Search Engines: Real-time Search

Phil Bradley

Phil Bradley looks at the concept of real-time search and points to some of the functionality that users can and should expect to find when exploring these engines.

There’s a lot of talk going around at the moment on the subject of ‘real-time search’, so I thought it might be useful to look at the concept in a little more detail, and to explore some of the search engines that say they offer it. This is by no means an attempt at a comprehensive listing, as new search engines are appearing if not daily, then certainly every week. Rather, it’s an attempt to define the area, using examples, and to point to some of the functionality that users can and should expect to find when exploring these engines themselves.

What is ‘Real-time Search’?

Firstly, what is meant by ‘real-time search’, and what does it encompass? A ‘real-time search’ could mean that it’s possible to find something on the Web that was made available in the last few seconds, irrespective of what that item might be. It could be a blog post that took the author 5 hours to write, and which he or she has just published. In one sense, that’s real-time searching since you can get access to that post within seconds of its going online.

However, one could also argue that it’s not ‘real time’ at all, given the amount of time that it took to write! Immediately we’re faced with the ambiguity of the concept: is the emphasis on the ‘real-time’, or is it on the ‘search’? Opinions differ here. Danny Sullivan, in his article What is real-time search?, defines the concept as ‘…looking through material that literally is published in real time. In other words, material where there’s practically no delay between composition and publishing.’ [1] This would therefore exclude the idea of the 5-hour blog post, even if the searcher found it seconds after it was published.

On the other hand, Kimbal Musk, from the OneRiot [2] search engine, defines real-time search as ‘finding the right answer to your question based on what’s available right now, about the subject you care about right now. Realtime search is finding the “Right Answer, Right Now”’. [3] My personal definition is more akin to Danny’s: when I want to know what’s happening in the world right now, I’m interested in gaining access to data that’s been created (written, on video or in photographs) in the last few seconds, not data that has simply been made available in the last few seconds.

It’s a subtle, but important distinction, and at the moment I don’t feel that there’s a clear-cut definition, so if you hold to a different view, that’s fine as well – as long as we all recognise that when discussing the subject we might actually be discussing different things. Is it important? I think it is, as I’ll be using a range of search engines that use microblogging services such as Twitter, rather than news resources such as Google News.

However, there are further issues to take into account with real-time search before we can start to look at the search engines themselves. They are the usual suspects such as spam and authority. Once any subject becomes a hot topic some individuals will try to piggyback to get across their sad marketing message; for example, ‘Interesting news about <hot topic of the moment>, but come and visit my get-rich-quick website’ are tweets that I’m sure we’ve all seen before.

If the idea of authority on the Web is taxing, at least with traditional Web publishing it’s possible to check a Web site, read back through a series of blog posts or see who links to a specific page. With real-time search we’re depending on the fact that someone who says that they’re at the scene of a specific event actually is; it’s all too easy for a malicious individual to ‘report’ something that’s happening when it really isn’t. As we have little to go on other than brief details that person has supplied in their biography (if indeed they have), it becomes much more difficult to sort the wheat from the chaff.

A third issue is the sheer amount of information that’s being produced. If searchers are obtaining information on a newly breaking story, they will have to deal with new tweets or posts appearing in their hundreds every couple of minutes. In comparison, the idea of a static 5,000 results returned by a traditional search engine, through which you can wade at your leisure, sounds almost idyllic!

Twitter

Of course, when looking at real-time search, we need to find the content to search, and where better than Twitter? Twitter [4] has its own search engine, and if you’ve done much real-time searching yourself, it’s a reasonable assumption that this is the one you’ve used. A major problem with this search engine is that Twitter as a resource isn’t as careful with content as one may assume; in fact it seems to lose data on a regular basis, so don’t rely on a search being fully comprehensive because it won’t be.

Having said that, the search engine has some interesting features along with the basic functionality. A visit to the Advanced Search page at http://search.twitter.com/advanced (which, annoyingly, remains unlinked to from the Twitter home page, thus decreasing the likelihood of it being used) shows that it’s possible to do a location-based search within a specific radius. Searches can also be limited by date (although, as previously mentioned, this isn’t as helpful as it could be), by a positive or negative bias towards the subject (based on the two emoticons :) and :( ), and to tweets that contain a question, by using the ? symbol as a limiting factor.
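As a rough illustration of how those limiting options combine into a single query, here is a minimal Python sketch. The operator spellings (near:, within:, since:, until:), the base address and the q parameter are assumptions made for the example rather than anything confirmed above, so treat it as a sketch of the idea, not a description of Twitter’s actual interface.

```python
from urllib.parse import urlencode


def build_query(terms, near=None, within=None, since=None, until=None,
                attitude=None, question=False):
    """Combine search terms with optional limiting operators.

    The operator names used here are assumptions for illustration.
    """
    parts = [terms]
    if near:
        # Location-based search: a place name plus an optional radius.
        parts.append(f'near:"{near}"')
        if within:
            parts.append(f"within:{within}")
    if since:
        parts.append(f"since:{since}")
    if until:
        parts.append(f"until:{until}")
    # Positive or negative bias is expressed with an emoticon.
    if attitude == "positive":
        parts.append(":)")
    elif attitude == "negative":
        parts.append(":(")
    # A bare question mark limits results to tweets asking a question.
    if question:
        parts.append("?")
    return " ".join(parts)


def build_url(query, base="http://search.twitter.com/search"):
    """URL-encode the query; `base` and the `q` parameter are assumed."""
    return base + "?" + urlencode({"q": query})


query = build_query("library closures", near="London", within="15mi",
                    since="2009-10-01", attitude="negative")
print(query)
print(build_url(query))
```

Keeping the query builder separate from the URL builder means the same query string could be pasted straight into a search box as well as sent programmatically.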

How real-time is the Twitter search engine? I updated my status at 11.50 am using the term “asd123poi456” and it was returned in a search within 25 seconds.
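The same test can be repeated against any engine by posting a unique term and polling until it turns up. The Python sketch below shows one way of timing that; the search callable is hypothetical and stands in for whichever engine is being tested, and the demonstration uses a simulated engine and clock rather than a live service.

```python
import time


def measure_latency(search, term, posted_at, timeout=120.0, interval=5.0,
                    clock=time.time, sleep=time.sleep):
    """Poll `search(term)` until it returns a result, then report the delay.

    `search` is a hypothetical callable returning a list of matches.
    Returns the seconds elapsed since `posted_at`, or None on timeout.
    """
    deadline = posted_at + timeout
    while clock() <= deadline:
        if search(term):
            return clock() - posted_at
        sleep(interval)
    return None


# Demonstration with a simulated engine that indexes the post after 25 seconds.
now = [0.0]


def fake_clock():
    return now[0]


def fake_sleep(seconds):
    now[0] += seconds


def fake_search(term):
    return ["status update containing " + term] if now[0] >= 25 else []


delay = measure_latency(fake_search, "asd123poi456", posted_at=0.0,
                        clock=fake_clock, sleep=fake_sleep)
print(delay)
```

Note that the result is only as precise as the polling interval: with a five-second interval, a post indexed after 21 seconds would also be reported as taking 25.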

Other Real-time Search Engines

Sency [5] is a new search engine, just out of beta, which provides a simple interface, with two columns below the search box offering links to current trends and recent trends. My assumption, based on running searches, is that it is limited to Twitter’s database, since it found my strange search term in a timely fashion. However, a lack of search functionality – it couldn’t complete my location search using Twitter search terms – limits its value, and it was disappointing that there were no links to a search or FAQ page. One useful option provided by Sency however was the ability to create an HTML-based feed for a particular term or phrase which could then be incorporated onto a Web site or into a blog.

Collecta [6] is a search engine that takes a rather broader remit than simply searching Twitter content. It pulls content from blogs, blog comments, Twitter [4], Jaiku [7], Identi.ca [8], Flickr [9], TwitPic [10], yFrog [11] and video from YouTube [12] and Ustream [13].

Collecta has a three-column approach: the left-hand column for search, hot now topics, options and the like, the centre column for results, and the right-hand column to expand the result. There are options to filter and limit a search based on five different criteria: Stories, Comments, Updates, News and Video (with various options within each criterion, so limiting to Updates would simply pull in content from Twitter, Jaiku and Identi.ca), and searches can be shared across a broad range of social media sites such as Facebook [14] and Delicious [15]. What particularly impressed me about Collecta was that it was able to find my search and date-stamp it at exactly 11.49.25.

Searches are saved while you’re on the site, allowing searchers to go back easily and check to see if new material has been added since the search was originally run. The centre column displays a small icon next to each result to indicate source, and in the right-hand pane each result can be viewed in more detail, with a link to the original source (such as a blog posting) if it’s too large to be displayed in its entirety by Collecta.

The downsides to Collecta, however, are a lack of RSS feeds, very limited help functionality and no listing of possible search syntax.

OneRiot [2] also claims to search ‘the realtime web’. They say of themselves ‘OneRiot crawls the links people share on Twitter, Digg and other social sharing services, then indexes the content on those pages in seconds. The end result is a search experience that allows users to find the freshest, most socially-relevant content from across the realtime web.’ [16] It has two search options, Web and Video, and below the search box are a number of trending topics. Disappointingly however, OneRiot did not find my recent tweet even though (at 45 minutes) it was by then quite elderly. The reason for this however quickly becomes clear when looking at the results of searches that are returned, since OneRiot is interested in links to resources, rather than more general content.

Consequently, searching on a trending topic ‘optical illusions’ provides access to the original resource being discussed, the number of times that it has been linked to, and when. The engine draws a distinction between two types of data – ‘real-time’ and ‘pulse’. They explain the difference thus: ‘When you sort your search results by “Realtime” (our default sorting system), you’ll find search results that reflect the most recently shared content on the Web as related to your query. When you sort by “Pulse,” you’ll find the most socially valued content on the Web as related to your query - a ranking that takes into account stuff like number of shares, rate of share and so on’.
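To make the distinction between the two sort orders concrete, here is a small Python sketch. OneRiot does not publish its formula in the passage quoted, so the pulse_score function, its weighting and the sample data are all invented for illustration; only the general idea (recency versus number and rate of shares) comes from their description.

```python
from dataclasses import dataclass


@dataclass
class SharedLink:
    url: str
    last_shared: float      # seconds from a reference point; larger is newer
    shares: int             # total number of times the link was shared
    shares_last_hour: int   # a crude stand-in for "rate of share"


def sort_realtime(links):
    """Most recently shared first."""
    return sorted(links, key=lambda link: link.last_shared, reverse=True)


def pulse_score(link, rate_weight=5.0):
    """Hypothetical score: total shares plus a weighted recent share rate."""
    return link.shares + rate_weight * link.shares_last_hour


def sort_pulse(links):
    """Highest invented 'social value' first."""
    return sorted(links, key=pulse_score, reverse=True)


# Sample data invented for the example.
links = [
    SharedLink("http://example.com/a", last_shared=100.0, shares=900, shares_last_hour=2),
    SharedLink("http://example.com/b", last_shared=300.0, shares=40, shares_last_hour=30),
    SharedLink("http://example.com/c", last_shared=200.0, shares=10, shares_last_hour=1),
]

realtime_order = [link.url[-1] for link in sort_realtime(links)]
pulse_order = [link.url[-1] for link in sort_pulse(links)]
print(realtime_order)  # ['b', 'c', 'a']
print(pulse_order)     # ['a', 'b', 'c']
```

The same three links come back in a different order each time: the newest link leads the first list, while the most widely shared one leads the second.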

Scoopler [17] is another engine that takes a fairly broad remit of the resources that it uses, indexing Twitter, Flickr, Digg [18], Delicious and others. It takes a similar approach to that of OneRiot, as it has a two-column display of results, ‘popular shares’ and ‘real-time’ results. The popular shares automatically include images and videos, although they can be disabled if required. Live posts certainly seemed to be real-time: the searches that I ran covering popular news items were returning results of posts that had been published in the last minute, and it found my odd search term without any problem at all, date-stamped, with a link to my avatar and Twitter home page. Scoopler also offers a ‘hot topics’ option, links to ‘My searches’ and posts in key subject areas such as Technology, World Business, Sports and so on.

CrowdEye [19] is another beta Twitter search engine, but it does provide a wealth of information from that one resource. Results are split into different sections: a time span and filter option with a graph to show the tweet volume of a particular term (useful when you need to see if and when a term has peaked), a tag cloud to further narrow results down, a search filter column to limit to your own search terms, and hashtag-related queries. That’s just column one!

CrowdEye then goes on to provide access to the most popular links, such as blogs and news sites, and tweets which are sorted by relevance or time. An interesting display here is an indication of the influence of the person tweeting, based on some “cool math”. Stephen Fry has an influence of 86, the Daily Telegraph newspaper is rated at 50, with the Daily Mail on 46. CrowdEye provides access to hot searches and the current top 20 Web sites. This search engine is powerful, neatly laid out and easy to use. If it indexed content from more sites it would put itself in an almost invincible position.
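The tweet-volume graph that CrowdEye draws amounts to counting posts per time interval and seeing where the count is highest. A minimal Python sketch of that idea follows; the bucket size and the timestamps are invented for the example, and nothing here reflects how CrowdEye itself does the work.

```python
from collections import Counter


def volume_by_bucket(timestamps, bucket_seconds=60):
    """Count posts per time bucket; keys are the start of each bucket."""
    counts = Counter()
    for ts in timestamps:
        counts[int(ts // bucket_seconds) * bucket_seconds] += 1
    return dict(sorted(counts.items()))


def peak_bucket(volume):
    """Return (bucket_start, count) for the busiest bucket, or None if empty."""
    if not volume:
        return None
    return max(volume.items(), key=lambda item: item[1])


# Timestamps in seconds, invented for illustration.
stamps = [5, 42, 61, 70, 75, 118, 130, 200]
volume = volume_by_bucket(stamps)
print(volume)               # {0: 2, 60: 4, 120: 1, 180: 1}
print(peak_bucket(volume))  # (60, 4)
```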

The rather strangely named Stinky Teddy [20] has positioned itself as a realtime meta search engine, pulling in content from Bing [21], Yahoo [22], Videosurf [23], Twitter and Collecta. It provides details on how many tweets, Web results, video and image results it can find, and a buzz-o-meter telling users how many tweets there are per minute on Twitter for the subject being searched. Results are collated according to source and can be limited to news, Tweets, video and so on. It’s still early days for the search engine – it told me that there was one result for my strange search term, but then flatly contradicted itself and told me there were no results to view.
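For readers unfamiliar with how a meta search engine presents its results, the Python sketch below shows the general pattern of counting results by type, collating them by source and limiting them to one type. The tuple layout and the sample results are assumptions for the example; it is not based on Stinky Teddy’s own code.

```python
from collections import Counter, defaultdict


def count_by_kind(results):
    """How many results of each kind (tweet, web, video, ...) were found."""
    return dict(Counter(kind for _source, kind, _url in results))


def collate(results, only_kind=None):
    """Group result URLs by source, optionally limited to one kind."""
    grouped = defaultdict(list)
    for source, kind, url in results:
        if only_kind is None or kind == only_kind:
            grouped[source].append(url)
    return dict(grouped)


# Each result is (source, kind, url); the data is invented for illustration.
results = [
    ("Twitter", "tweet", "http://example.com/t1"),
    ("Bing", "web", "http://example.com/w1"),
    ("Twitter", "tweet", "http://example.com/t2"),
    ("Videosurf", "video", "http://example.com/v1"),
    ("Yahoo", "web", "http://example.com/w2"),
]

print(count_by_kind(results))
print(collate(results, only_kind="web"))
```

Deriving both the counts and the listing from the same set of results is one way to avoid the contradiction described above, where an engine reports a result it then cannot display.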

There are various other search engines that are currently developing in this area, such as Twazzup [24], with suggestions of Twitter users to follow, headline news, real-time tweets, news, top links, related photographs and links to key contributors. Twingly [25] has fairly basic results from Twitter and several other key resources such as Jaiku and Identi.ca, and it does have an RSS option. Topsy [26] is ‘a search engine powered by tweets’, and lists the usual trending topics and popular pages. Its major strength is listing links to sites that people are tweeting about, breakdowns of terms/time periods and top authors.

Traditional Search Engines

Given the rise in this new breed of search engines, one shouldn’t forget the old warhorses; how are they managing to integrate real-time content? Google is certainly indexing tweets, and found my sample tweet with no problem at all. Using the site search option of site:twitter.com provides 332,000,000 results, and these can then be reduced with other search terms, or by choosing ‘Recent results’ under ‘Show options’.

Since I began this piece there have been developments with two of the major players: Google, obviously, and also Bing. The latter recently announced an agreement [27] with Twitter to take their ‘firehose’ of tweets. If you’re keen to try this out, the first point to make is that you need to use either the US version or make sure you’re in Australia, Canada, Great Britain, Indonesia, Ireland, India, Malaysia, New Zealand, Philippines, Singapore, Arabia or South Africa, rather than any other country version. I am at a loss to explain why Microsoft appear unable to roll out working functionality across the site, but that’s the way that they have decided to do it.

The site itself is http://www.bing.com/twitter [28]. The first thing that we see is a tag cloud of hot topics on Twitter. This is not the same as the trending topics you see in Twitter; there are differences, so clearly Bing is already doing different things to the data that they’re getting. Trying a Windows 7 search, as that’s a hot topic of the moment, returned tweets that were 2 minutes old, yet a search directly at Twitter was providing content that was less than 30 seconds old. One might think that it’s uncharitable to quibble over 90 seconds, but the cumulative effect is worrying. In the time that it took me to run the searches, compare results and write a commentary, Twitter added another 376 new tweets, but even though Bing has a ‘Pause’ button (which I hadn’t used), its results were exactly the same, other than the fact that Bing was telling me that the results were now 7 minutes old.

Bing has an option to see ‘top links shared in tweets’ which is helpful, and indeed Bing is taking this a stage further by filtering results and ranking results based on various criteria such as the age of a link, how many people are retweeting links and the authority of people who are doing the retweeting.

Bing is still very much in beta phase with this resource, but nonetheless I do have concerns – it is supposed to break open all shortened URLs, such as those from bit.ly and other similar services, but I quickly found an example where this hadn’t happened and which took me to a spam site, and others (such as Danny Sullivan of SearchEngineLand [29]) have reported the same thing. This is apparently a ‘glitch’ but it doesn’t lead to much confidence in the service. While there are good things about Bing’s offering, I don’t currently see enough of a benefit to it to make me change from other options. I’m also not alone in this feeling – Karen Blakeman summed up her review of the service by saying, ‘Bing have yet again snatched defeat from the jaws of victory.’ [30] However, they’re still a good step ahead of Google who are able to make nothing more than the announcement of their Twitter agreement [31] and don’t even have a test site to share at the moment.

Icerocket [32], however, has been busily repositioning itself as a real-time search engine and did perform rather better. Not only did it find my sample tweet, it was pulling up results that had been published on Twitter in the last minute. Icerocket also has a ‘big buzz’ option which pulls in content from blogs, Twitter, video, news and images. It also offers auto-refresh and save search options.

Conclusion

Real-time search is still in its infancy, and there’s very little yet by way of ‘must-have’ functionality. Clearly any search engine that limits itself to Twitter is of limited value, and users may well decide to use the native search engine in most instances, unless there’s a compelling reason not to. CrowdEye certainly provides users with extra functionality and is one to keep a watch on. Traditional search engines are going to have to up their game in order to integrate content and, as yet, with one or two exceptions, have not really started to do this.

Finally, there are many engines that are busy in this area; it would have been easy to list another dozen, so it’s an area worth keeping an eye on.

References

[1] Danny Sullivan (2009). What Is Real Time Search? Definitions & Players, 9 July 2009, Search Engine Land http://searchengineland.com/what-is-real-time-search-definitions-players-22172
[2] OneRiot.com - Realtime Search for the Realtime Web http://www.oneriot.com/
[3] RE: What Is Real Time Search? Definitions & Players 7/09/09 - Posted by Kimbal Musk under ‘Industry.’ This is a response to Danny Sullivan’s recent post on SearchEngineLand http://blog.oneriot.com/content/2009/07/re-what-is-real-time-search-definitions-players/
[4] Twitter http://twitter.com/
[5] Sency - What’s Going On? http://sency.com/
[6] Collecta http://collecta.com/
[7] Jaiku - Your Conversation http://www.jaiku.com/
[8] Identi.ca http://identi.ca/
[9] Flickr http://www.flickr.com/
[10] TwitPic : Share Photos on Twitter http://twitpic.com/
[11] yFrog - Share your images/videos on Twitter! http://www.yfrog.com/
[12] YouTube - Broadcast Yourself http://www.youtube.com/
[13] Ustream http://www.ustream.tv/
[14] Welcome to Facebook! http://www.facebook.com/
[15] Delicious http://delicious.com/
[16] OneRiot.com - About Us http://www.oneriot.com/company/about
[17] Scoopler: Real-time Search http://www.scoopler.com/
[18] Digg - The Latest News Headlines, Videos and Images http://digg.com/
[19] CrowdEye http://www.crowdeye.com/
[20] Stinky Teddy http://stinkyteddy.com/
[21] Bing http://www.bing.com/
[22] Yahoo http://www.yahoo.com/
[23] Videosurf Video Search Engine http://www.videosurf.com/
[24] Twazzup http://www.twazzup.com/
[25] Twingly Microblog Search http://www.twingly.com/microblogsearch
[26] Topsy http://topsy.com/
[27] Bing is Bringing Twitter Search to You, 21 October 2009, 10.24 AM http://www.bing.com/community/blogs/search/archive/2009/10/21/bing-is-bringing-twitter-search-to-you.aspx
[28] Bing Twitter http://www.bing.com/twitter
[29] Danny Sullivan, Up Close With Bing’s Twitter Search Engine, 21 October 2009, 2.40pm ET http://searchengineland.com/live-today-bings-twitter-search-engine-28224
[30] Karen Blakeman’s Blog: Blog Archive and Twitter search in Bing and Google, 23 October 2009 http://www.rba.co.uk/wordpress/2009/10/23/twitter-search-in-bing-and-google/
[31] RT @google: Tweets and updates and search, oh my! Official Google Blog, 21 October 2009, 02:09 PM http://googleblog.blogspot.com/2009/10/rt-google-tweets-and-updates-and-search.html
[32] Icerocket http://www.icerocket.com/

Author Details

Phil Bradley
Independent Internet Consultant

Email: philb@philb.com
Web site: http://www.philb.com/
