Book Review: Introduction to Modern Information Retrieval
This substantial (470-page) paperback is the second edition of one of the few UK-based textbooks on information retrieval (IR). The first edition appeared in 1999, and was criticised for being badly out of date and at times too complex for its intended undergraduate and postgraduate student audience. How does this second edition stack up?
The first thing to say is that it is a lot better than the first edition - there are a number of new chapters that are well written and up to date, and some of the chapters that also appeared in the first edition have had errors removed. But the bad news is that there is still quite a lot wrong with this book.
The book comprises 23 chapters, starting with basic concepts, database technology, bibliographic formats and cataloguing, then moving on to search and retrieval and user issues before considering evaluation methods. It concludes with a hotchpotch of miscellaneous topics, such as Web IR, intelligent IR, natural language processing, digital libraries and the future. Each chapter is supported by some references (wildly varying in number - one chapter has more than 100, but 20 is more typical). A few of the references are to Web sites, and there is a heavy emphasis on BUBL as a typical database - a bit unfortunate, as it seems to have closed in mid-2002.
There is no question that, if used judiciously, the book could form the basis of a text for teaching basic IR principles. However, the text is greatly diminished by its mistakes (recall and precision do not range between 1% and 100%, but between 0% and 100%; Memex was designed for microfilm retrieval and not online as the author claims), its inconsistencies (sometimes bibliographic databases include full-text ones, sometimes they do not; is it Theodore Nelson or Ted Nelson?), its omissions (nothing on searching e-journal collections; virtually nothing on the Open Archives Initiative; the list of key DIALOG commands omits the crucial TYPE command; too little on evaluation of Web search engines) and its badly out-of-date references (especially in Chapters 1, 2, 3, 6, 7, 9, 13 and 20). In some cases, the references are to ancient editions of books that have since been updated, e.g. Jenny Rowley's The Electronic Library and the UKMarc Manual. Does the author not check for more recent editions? In other cases, he refers to 'recent' articles that are in fact six, eight or 10 years old. Some chapters fail to refer to seminal works, such as Bourne and Hahn on the early history of online, Rowley on abstracting, Pedley on the invisible Web (confusingly called "deep Web" by Chowdhury) and Case on user studies.
There are many tedious lists of bullet points in some chapters, and over-complex explanations in others, for example on automatic classification and automatic indexing, vector processing, natural language processing and best match searching. Does the author really expect librarianship students to understand advanced calculus?
There are a small number of minor typos. From time to time the author introduces a concept, such as PRECIS, in one chapter but only explains it fully in a later one. It would be good to have some cross-referencing in place. The in-depth discussion of Farradane's relational indexing, which was only ever used in the one establishment where Farradane worked, and was dropped as soon as he left it more than 30 years ago, is pointless. Similarly, the descriptions of Uniterm, Peek-a-boo, certain "intelligent information retrieval" systems and Xanadu do not make it clear that these are systems that have long vanished (or, in the case of Xanadu, never got going), and are therefore of passing historical interest only.
To summarise - there is the basis of a good textbook on information retrieval here, but it needs to be revised, with the out-of-date references and the over-complex and erroneous material removed. Can we have a new, improved third edition soon, please?