Web Magazine for Information Professionals

Search Engines: Weblog Search Engines

Phil Bradley looks at the developments occurring with weblogs and how you can go about searching on or for them.

Weblogs are becoming increasingly important these days and it’s becoming harder to spend any time on the Web without seeing references to people’s weblogs or being invited to ‘read my weblog’ [1]. This is all well and good of course, but there’s still a lot of mystery surrounding not only what these creatures are, but how they can best be used and, of course, how to find them and, once you have found them, how to search them. Consequently I thought it might be interesting in this column to take a look at the whole subject of weblogs, though primarily from the viewpoint of either searching for them, or searching their content.

However, before we get into that area, it’s quite likely that you’re still a little in the dark over what exactly a weblog is. (If you do know, congratulations, and as a prize you can skip this paragraph and move straight onto the rest of the article!) A weblog is a website or page, generally produced by an individual or of non-commercial origin, that uses a date-limited or diary format and is updated daily, or at least regularly, with new information about a subject, a range of subjects, or personal details.
This information may have been written by the author of the log, obtained from other sources on the Web, contributed by others, or a combination of those. They are consequently usually topical and timely, and can be viewed as a developing commentary on a situation, event or subject.
Weblogs are also referred to as logs, Blogs, Web logs and so on. There appears to be no single standard way of referring to them.

There are a variety of different types of weblog, all doing different things. The single most popular weblog is probably Slashdot [2], which is the work of programmer and graphic artist Rob Malda and some of his colleagues. Slashdot is an extended weblog, in that it carries discussion threads that are contributed to by various individuals, and on many subject areas, such as games, hardware, programming and so on. To this extent, it may appear to be more akin to a portal than a diary, but then again, this is the Internet after all, and things have a habit of merging and morphing into something else, so don’t be overly concerned about a strict definition! At the other end of the spectrum is, for example, the weblog of Jenny Levine, The Shifted Librarian [3], which is a personal weblog of an information professional; it’s one of my own favourites.

Despite their differences, weblogs have several key elements in common:

It’s therefore rather difficult to search them in a traditional format. You can certainly use Google for example to identify weblogs that may interest you, simply by doing a subject search and adding in ‘weblog’ as a term. A search for ‘librarian weblog’ returns almost 57,000 hits and you’ll not only find some interesting weblogs written by librarians you can visit, but also some useful articles about them written for or by librarians themselves. However, while Google does index weblogs (which incidentally has been the cause of much concern and complaint), what is obviously being missed is the immediacy of weblogs which is their great value and strength.

Therefore, we must turn to other resources that are, dare I say it, better than Google in this area. Experienced Web searchers will of course be aware that there are various types of search engine available; I identify two types in particular: ‘free text’ engines, such as Google, Alltheweb and so on, and index/directory engines, such as Yahoo! Exactly the same division exists when looking at weblog search engines, and your choice of engine will depend on what exactly you want to find. If you already know what you’re interested in, and have some key words to search for, a good first visit would be to Daypop [4], which visits and indexes some 35,000 news sites, weblogs and feeds for new and currently breaking stories. I ran a search for Patriot Act and got about 400 references, most of which had been written in the last couple of days. While I’d get a lot more using Google (438,000 in fact), these were not as current, and even when I focussed my search on the Google News function I found that the Daypop results were still bringing me more recent material.

Daypop also provides some other useful functions as well as (it has to be said) a fairly basic search facility. The Daypop Top 40 is a list of links that are popular with webloggers around the world. It can be useful to spend some time looking at this to see exactly what people are talking about, though the disadvantage is that a lot of the stories are either related to bizarre ‘you’ll never believe this’ stories, or to news items of interest to the Internet community in general and webloggers in particular. I’m therefore not convinced that this is as valuable as it could be and, I hope, will be in the future. Their section on top news stories is slightly better, but the emphasis is still very strongly biased towards US news.

The search engine also does something called word and news bursts. It highlights various words that are either repeated in a lot of different entries in weblogs or in news headlines. The idea is that if lots of people/news resources are talking about something, it must be of interest. This idea works well up to a point, but given that some of the current (23 June) word bursts are ‘hellfire’, ‘harvesters’, ‘honeypot’, ‘celestial’, and ‘clatter’, I’ve really got to question their value. However, I can see how this might be useful in the future if and when the Daypop programmers can tighten their code up.
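To make the idea of a word burst a little more concrete, here is a minimal sketch in Python. It is not Daypop's code, which I have not seen; it simply assumes that a 'burst' is a word appearing in a noticeably larger share of recent entries than of older ones. The function names, the stopword list and the thresholds (`min_entries`, `ratio`) are all my own assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on", "for"}


def document_frequency(entries):
    """Count how many entries each word appears in (once per entry)."""
    counts = Counter()
    for text in entries:
        words = set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS
        counts.update(words)
    return counts


def word_bursts(recent, background, min_entries=2, ratio=2.0):
    """Return (word, recent_entry_count) pairs for words that are
    over-represented in recent entries compared with background entries.

    The thresholds are assumptions, not values taken from any real engine.
    """
    recent_df = document_frequency(recent)
    background_df = document_frequency(background)
    bursts = []
    for word, n in recent_df.items():
        if n < min_entries:
            continue  # ignore words that only one recent entry mentions
        recent_share = n / len(recent)
        # Add-one smoothing so words unseen in the background do not divide by zero.
        background_share = (background_df[word] + 1) / (len(background) + 1)
        if recent_share / background_share >= ratio:
            bursts.append((word, n))
    # Most widely repeated words first, ties broken alphabetically.
    return sorted(bursts, key=lambda item: (-item[1], item[0]))
```

A sketch like this also hints at why odd words can surface: any unusual term that happens to be repeated in a handful of recent entries will pass a simple ratio test, whether or not it reflects a real story.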

Daypop also lists top weblogs by citation and importance, so if you’re keen to jump into the whole field of weblogs but don’t know where to start, this facility is a good place to begin.

Other search engines in the same ‘free text’ category as Daypop are Blogdex [5], Feedster [6] and Detod Blawg Search [7], which is a search engine that specialises in legal information. It’s worth noting that this is only a partial list; there are many more search engines that focus on weblogs and news stories, and these are just a few. In my opinion they all seem to be in their infancy and have very limited functionality, though I’m sure that this is going to change in the coming months and years. However, for the time being they’re pretty much as good as it gets!

The second type of weblog search engine that I mentioned is the index or directory style, and there is a fairly wide choice available; none of them as yet stands out head and shoulders over the others. The Eatonweb Portal [8] is one of the oldest, and currently has 12,161 logs well categorised into subject, language, country and alphabetical order. As an aside, I think this last category really does highlight the extent to which weblogs are still in their infancy; can you imagine Yahoo! indexing all their listed websites into a single alphabetical list, for example? There is also the Globe of Blogs [9], which currently indexes 5,757 weblogs, and has options to search by topic or location. Several communities exist which allow webloggers to host and share their logs and these also provide some rudimentary search facilities, such as Network54 [10], Diarist [11] and LiveJournal [12]. Most of these community directories do however tend to emphasise personal weblogs which, it could be argued, have little relevance for us in a professional capacity. However, one nice collection of library weblogs can be found at Library Weblogs [13]. One final good collection is that provided by Guardian Unlimited [14], which arranges its listing by British weblogs, world blogs, news blogs, tech weblogs and other niches. If you’re interested in some general explorations of the weblog phenomenon, any of these would be a good place to begin.

In conclusion I think that weblogs already do have a value to us as information professionals; they keep us current with what people are thinking, writing about and linking to. They can be useful for current affairs and news stories, and as long as you accept that they can have a considerable personal bias in the way in which they’re written, they can be a useful starting point to obtaining more information. The weblog search engines are still very basic and have a long way to go before they will be truly useful, but they’re already a good way of gleaning some useful nuggets of information.

References

  1. Phil Bradley’s Blog http://www.philb.com/blog/blogger.html
  2. Slashdot http://slashdot.org/
  3. The Shifted Librarian http://www.theshiftedlibrarian.com/
  4. Daypop http://www.daypop.com/
  5. Blogdex http://blogdex.media.mit.edu/search.asp
  6. Feedster http://www.feedster.com/
  7. Detod Blawg Search http://blawgs.detod.com/
  8. Eatonweb Portal http://portal.eatonweb.com/
  9. Globe of Blogs http://www.globeofblogs.com/
  10. Network54 http://www.network54.com/
  11. Diarist.Net http://www.diarist.net/
  12. LiveJournal.com http://www.livejournal.com/
  13. Library Weblogs http://www.libdex.com/weblogs.html
  14. Guardian Unlimited http://www.guardian.co.uk/weblog/special/0,10627,744914,00.html

Author Details

Phil Bradley
5 Walton Gardens,
Feltham,
Middlesex

Email: philb@philb.com
URL: http://www.philb.com
Phil Bradley is an Independent Internet Consultant.

Article Title: “Search Engines: Weblog search engines”
Author: Phil Bradley
Publication Date: 30-July-2003
Publication: Ariadne Issue 36
Originating URL: http://www.ariadne.ac.uk/issue36/search-engines/