Book Review: ARIST 39 - Annual Review of Information Science and Technology

Michael Day

Michael Day reviews another recent volume of this key annual publication on information science and technology.

The Annual Review of Information Science and Technology (ARIST) is an important annual publication containing review articles on many topics of relevance to library and information science, published on behalf of the American Society for Information Science and Technology (ASIST). Since volume 36 (2002), the editor of ARIST has been Professor Blaise Cronin of Indiana University, Bloomington.

Professor Cronin's introduction to the 2004 volume highlighted some of the difficulties with planning a publication like ARIST, noting that it has a habit of not quite turning out as it was initially conceived [1]. Reflecting on this with a wry reference to the modern trend of 'supersizing,' Cronin's introduction to the latest bumper-sized volume notes that ARIST 39 should contain "something for just about everyone" (p. vii). Despite this, however, a neat redesign by the publishers has resulted in a volume that actually contains fewer pages than volume 38. The volume contains fourteen chapters - two more than ARIST 38 - grouped into five sections relating to information retrieval, technology and systems, social informatics, national intelligence, and theory.

Information Retrieval

The opening chapter is a review of recent developments in the use of statistical language modelling for the retrieval of written text by Xiaoyong Liu and Bruce Croft of the University of Massachusetts, Amherst. The opening pages explain why statistical language modelling techniques, historically mainly used for things like automatic speech recognition or machine translation, have now been applied in support of information retrieval. The core of the chapter commences with a rather technical introduction to language models and their uses in information retrieval followed by a discussion of the 'smoothing' strategies used to make statistical distributions more uniform. Following some comparisons with traditional probabilistic information retrieval approaches, Liu and Croft sketch out some of the main application areas where language modelling has been used, including dealing with ambiguity in queries, providing relevance feedback, and supporting distributed and cross-lingual retrieval. A final section sketches out some future research directions.

Chapter 2, by Kiduk Yang of Indiana University, deals with the specific issue of information retrieval on the World Wide Web. The chapter starts with a consideration of some of the main characteristics of the Web (size, interconnectivity, etc.) and of users' information seeking behaviour. This includes the observation that the analysis of search engine query logs suggests that Web users "expect immediate answers while expending minimum effort" (p. 39). In his introduction, Yang notes that the Web is rich in types of information not present in most information retrieval test collections, including hyperlinks, usage statistics, markup tags, and subject-based directories (p. 33). The main body of the chapter explores in more detail how these types of information are used in Web retrieval research. A major focus is on link analysis algorithms that mine the human thought involved in creating hyperlinks to determine the likely relevance (or authoritativeness) of pages, with more detail provided on the uses of HITS (Hypertext Induced Topic Search) [2] and Google's PageRank [3] algorithms. Further sections deal with the mining of usage data, the Web track activity in TREC (the Text REtrieval Conference) [4], and attempts to bring a level of information organisation to the Web, e.g. using information from Web directories or automated classification techniques. The chapter concludes by recommending 'fusion' approaches to retrieval that combine multiple sources of Web evidence.

The statistical analysis of Web linking behaviour is also a key feature of the new research field of webometrics, which is introduced in the following chapter by Mike Thelwall of the University of Wolverhampton, Liwen Vaughan of the University of Western Ontario and Lennart Björneborn of the Royal School of Library and Information Science in Copenhagen. The concept of webometrics, as first defined by Almind and Ingwersen in 1997, originated from the realisation that informetric methods like citation analysis could also be applied to Web links [5]. The chapter itself focuses on four main areas of webometrics. Firstly, the authors introduce the basic concepts and methods used, highlighting problems with defining units of analysis and providing an overview of data collection methods and sampling techniques. A second section reviews research that looks at Web linking behaviour in the context of scholarly communication. This includes studies looking at research papers in e-journals and services like CiteSeer [6], as well as the country-based analyses of university Web sites pioneered by Thelwall and his colleagues. The section following covers more general issues, reviewing attempts to analyse the size and nature of the Web, as well as studies of user behaviour and commercial Web sites. The final section introduces the topological approaches that have done much to uncover the underlying structure of the Web as a complex network, including the discovery of scale-free network features and small-world properties. The discovery of any structure at all was a surprise for some. Steven Strogatz has written that while the Web is an "unregulated, unruly labyrinth where anyone can post a document and link it to any page at will ... [it] is apparently ordered in a subtle and mysterious way, following the same power-law pattern that keeps popping up elsewhere" [7]. 
The interesting behaviour of small-world networks has been explored in more detail in popular books written by Albert-László Barabási [8] and Duncan Watts [9].

Technology and Systems

The first chapter in the section on technology and systems concerns information visualisation, written by Bin Zhu of Boston University and Hsinchun Chen of the University of Arizona, Tucson. Visualisation techniques are becoming increasingly important in e-science, e.g. for helping to understand the large datasets generated by grid-based experiments or simulations [10]. In their introduction, Zhu and Chen (p. 139) note that visualisation provides a means of linking computing power and the human eye to help "identify patterns and to extract insights from large amounts of information." The opening sections of their chapter provide an overview of the topic, including information on theoretical foundations, application areas, and a framework (taxonomy) of technologies. Further sections explore emerging applications for visualisation techniques - focusing on digital libraries, the Web, and virtual communities - and evaluation methods. Colour versions of all the figures provided in the chapter are freely available from the ARIST Web pages [11].

The 'data deluge' in many scientific disciplines has resulted in a growing awareness of the need for computational tools that can help manage and interpret vast amounts of data. This is especially true in molecular biology, where the new interdisciplinary field of bioinformatics has emerged to deal with the large amounts of data being generated by sequencing initiatives and other biological projects. While the curation of data is a major concern, Luscombe, Greenbaum and Gerstein note that the aims of bioinformatics extend much further, i.e. to develop tools to support the analysis of data and to interpret results in a biologically meaningful manner [12]. Chapter 5 contains the first review of bioinformatics to appear in ARIST, and is written by Gerald Benoît of Simmons College, Boston. It commences with a very brief overview of biological data gathering techniques - e.g., nucleotide and protein sequencing - providing some examples of prominent projects and databases. A second section looks in more detail at definitions of bioinformatics, concluding with a long quotation from the article by Luscombe, et al. cited above. A short section on professional communication is followed by a more detailed review of different database types, the roles of data mining and visualisation tools, opportunities for collaboration and links with clinical (medical) informatics. The chapter is very interesting, but its structure is sometimes confusing and there is the odd inconsistency. For example, the list of journals publishing 25 or more items annually on bioinformatics in Appendix 5.2 surprisingly omits Nature, the Journal of Computational Biology, and Bioinformatics (these are, however, included in the list provided on p. 188). Others (Nucleic Acids Research, Proceedings of the National Academy of Sciences) are misspelled. On one occasion (at least) there is a lack of precision that could potentially be misleading. 
The extremely brief account of James Watson and Francis Crick's decoding of the structure of DNA (p. 181) implies that Watson was a British scientist (rather than an American scientist working at the Cavendish Laboratory in Cambridge) and that he and Crick themselves used X-ray crystallographic techniques to demonstrate the double helical structure of DNA. While they certainly did use X-ray evidence to support their theories - most famously using data generated by Rosalind Franklin of King's College, London - Watson and Crick's own accounts make it clear that their main work was based on model building and intuition [13], [14].

The following chapter is an extremely well written review of research initiatives related to electronic records management by Anne Gilliland-Swetland of the University of California, Los Angeles. While ARIST has previously covered general digital preservation topics [15], [16], this is the first chapter to specifically review electronic records management as a research topic. The chapter starts with a discussion of definitional issues, focused on debates about the nature of records and the role of archives, which have themselves (in part) been driven by the challenge of electronic records. Further sections review the history of electronic records research since the emergence of social science data archives in the 1940s, emphasising the importance of the 1991 meeting report Research Issues in Electronic Records, issued by the US National Historical Publications and Records Commission [17]. Gilliland-Swetland argues that this report marked the emergence of a new approach to electronic records management, one largely record- and evidence-driven, and informed by empirical study (p. 234). The chapter then reviews several of the research initiatives that embodied this 'second-generation' approach to electronic records management, including projects like the seminal Pittsburgh Project (Functional Requirements for Evidence in Electronic Recordkeeping) [18] and the more-recent InterPARES (International Research on Permanent Authentic Records in Electronic Systems) collaboration [19]. The remaining sections of the chapter look in more detail at issues relating to the reliability and authenticity of electronic records and the key topic of metadata, which Gilliland-Swetland notes is "likely to be a locus of considerable research and development for the foreseeable future" (p. 244).

Social Informatics

Social informatics is an area that has been covered in ARIST before, for example, by Bishop and Star in 1996 [20] and more recently by Sawyer and Eschenfelder in 2002 [21]. The late Rob Kling defined it as a new name for "the interdisciplinary study of the design, uses, and consequences of information technologies that takes into account their interaction with institutional and cultural contexts" [22]. The section contains three chapters, the opening one by Ewa Callahan of Indiana University reviewing the influences that cultural or linguistic differences can have on interface design. The chapter first looks at definitions of culture and methodological issues in cultural research, then reviews interface design with perspectives on language, graphical elements, structural presentation and usability. The second chapter in the section investigates "The social worlds of the Web" and is by Caroline Haythornthwaite and Christine Hagar of the University of Illinois at Urbana-Champaign. As the title might suggest, this is a preliminary look at the social networks that underpin the Web, concluding that we "are still at an early stage in understanding how the Web is affecting local, national, and global patterns of society" (p. 338). The final chapter in the section, by Andrew Large of McGill University in Montreal, concerns the use of the Web by children and teenagers. The chapter reviews a wide range of topic areas, including national surveys of Web use, studies of Web access and information-seeking behaviour, the use of the Web in educational contexts, and issues relating to content and personal safety.

National Intelligence

ARIST 36 included a chapter on "Intelligence, Information Technology, and Information Warfare" by Philip Davies [23]. Volume 39 goes one better and has a whole section devoted to national intelligence. First, the editor himself provides a chapter on "Intelligence, terrorism and national security," based on a public lecture delivered at St Antony's College, Oxford in 2003. In this, Professor Cronin looks at the nature of extreme terrorism, analysing the challenges faced by intelligence and counterintelligence services in the United States. The chapter is not primarily a review of information science research, but an analysis of national security challenges based on a comparison of the organisation and culture of US intelligence agencies with the decentralised, distributed networks used by some terrorist groups. A second chapter, by Lee Strickland, David Baldwin and Marlene Justsen of the University of Maryland, looks at "Domestic security surveillance and civil liberties." Again primarily focused on the United States context, this chapter first reviews the history of government surveillance legislation and guidance, culminating in the USA Patriot Act of late 2001. Further sections look at the impact of surveillance on citizen rights and propose a scheme for the management of surveillance in a representative democracy. This last section includes a brief look at the UK context, including some comments on oversight regimes.

Theory

The final section of volume 39 contains chapters on three different theoretical approaches to information science, nicely complementing Nancy Van House's chapter on science and technology studies in ARIST 38 [24]. Theoretical and philosophical perspectives originating in literary criticism and cultural studies have become extremely influential in the wider social sciences and humanities, and it is probably no surprise that interest in these matters is increasing in information science. Evidence of this trend can be found in recent special issues of the journals Library Trends and Journal of Documentation focusing on philosophical issues [25], [26]. Theoretical approaches have often been criticised for ignoring practical information science problems, but as Talja, Tuominen and Savolainen comment, practical solutions will always be "developed on the basis of theoretical and epistemological assumptions" [27] - whether stated or unstated.

The opening chapter in the theory section of ARIST 39 concerns the management of social capital and is by Elisabeth Davenport of Napier University and Herbert Snyder of North Dakota State University. Social capital refers to the benefits that accrue to individuals through their social networks, and this chapter investigates how social capital in organisations may be able to be managed with information and communication technologies. The chapter reviews a number of different approaches but the authors resist synthesising them into a grand narrative because, as they note, the topic is an emerging one and there are few robust, longitudinal studies (p. 539).

Chapter 13 is by Julian Warner of the Queen's University of Belfast and is a study of the role of labour in information systems. Warner himself acknowledges that no coherent tradition of attention to labour in information systems exists to be reviewed, so here he attempts to synthesise from implied concepts revealed elsewhere (p. 553). This chapter starts with a quotation from Genesis, then progresses rapidly through John Milton to Karl Marx. The thinking of Marx seems to underpin much of the chapter's argument, but the frequent use of technical terms means that it is not a particularly pleasant (or easy) read.

The final chapter investigates the (to me at least) unpromising topic of post-structuralism, described by Cronin in his volume introduction as "one of the major paradigms of twentieth-century literary theory" (p. x). In this chapter, Ronald Day of Wayne State University in Detroit first introduces post-structuralism with reference to some of the fashionable French theorists who first developed the concept - here chiefly Michel Foucault, Jacques Derrida and Gilles Deleuze. He then tries to relate post-structuralism to information science, noting the traditional dominance of theoretical approaches focused on particular types of information or users. From his overview of existing research, Day concludes that information studies theory has largely remained "a positivist exercise, squarely within the metaphysical tradition of Western Philosophy, in so far as it reifies meaning and understanding in language acts, replacing variable pragmatics with idealistic models" (p. 579). The chapter then proceeds to look at different information science issues from a post-structuralist perspective, focusing on issues like the correspondence of meaning (which is relevant to the development and use of knowledge organisation systems) and the importance of historicity. Further sections provide more detail on discourse analysis, hermeneutics, and events. Day argues that post-structuralism provides "a challenge to the metaphysical and epistemological assumptions that have, for so long, dominated" information science research and practice (p. 581) and concludes that it merits more attention by information studies researchers (p. 603). There seem to be at least two things missing from this chapter. Firstly, there is little attempt to 'position' post-structuralism in its wider philosophical context, although it must be acknowledged that there is no shortage of introductory texts that attempt to do this [28].
Also, it would have been useful to have some comments on the likely practical outcomes of post-structuralist discourse in information science. In this regard, it is interesting that Gary and Marie Radford elsewhere cite 'best match' retrieval techniques, the development of flexible metadata standards like Dublin Core, and Google's search algorithms as examples of post-structuralist tendencies in library and information science [29]. I do wonder how many of the developers of these tools were actually aware of this.

Conclusions

Professor Cronin's volume introduction (p. xi) acknowledges that ARIST will rarely be read from cover to cover. Reading patterns into what will to some extent be a random selection of topics is likely to be problematic, but it was striking in volume 39 that many chapters were focused on the nature of the Web (or its users) or on the characteristics of networks more generally. Others will have their own favourites, but I thought that the best chapters in ARIST 39 were Thelwall, Vaughan and Björneborn's introduction to the new research field of webometrics and Gilliland-Swetland's excellent review of electronic records management research. The most disappointing of all was the chapter on bioinformatics, which could now be supplemented by a new chapter focusing primarily on the biological problems that the field is intended to support. The final set of theoretical chapters was not to my personal taste, but they do contain ideas that will be of interest to others. The volume contains over 2,000 bibliographical references, which will provide a mine of information when readers need to investigate new topics of study.

On occasion, ARIST has been criticised for focusing mainly on US research. In this regard it is perhaps interesting to note that of the 22 contributors, 16 were from the USA, three from the UK, two from Canada, and one from Denmark, resulting in a lower proportion of non-US authors than in the previous volume.

That said, the continued high quality of ARIST means that we can have confidence in the editorial decisions of Professor Cronin, his associate editor Debora Shaw, and the ARIST Advisory Board. I, for one, am looking forward to the next volume.

References

1. Cronin, B., Introduction. Annual Review of Information Science and Technology, 38, 2004, vii.
2. Kleinberg, J. M., Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5), 1999, 604-632.
3. Brin, S., Page, L., The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7), 1998, 107-117.
4. TREC Web track: http://ir.dcs.gla.ac.uk/test_collections/
5. Almind, T. C., Ingwersen, P., Informetric analyses on the World Wide Web: methodological approaches to 'webometrics.' Journal of Documentation, 53(4), 1997, 404-426.
6. CiteSeer: http://citeseer.ist.psu.edu/
7. Strogatz, S., Sync: the emerging science of spontaneous order. London: Penguin, 2004, p. 255.
8. Barabási, A. L., Linked: the new science of networks. Cambridge, Mass.: Perseus, 2002.
9. Watts, D. J., Six degrees: the science of a connected age. New York: Norton, 2004.
10. Ball, P., Picture this. Nature, 417, 4 July 2002, 11-13.
11. Information visualization: http://www.asis.org/Publications/ARIST/vol39ZhuFigures.html
12. Luscombe, N. M., Greenbaum, D., Gerstein, M., What is bioinformatics? A proposed definition and overview of the field. Methods of Information in Medicine, 4, 2001, 346-358.
13. Watson, J. D., The double helix: a personal account of the discovery of the structure of DNA. London: Penguin, 1970.
14. Crick, F., The double helix: a personal view. Nature, 248, 26 April 1974, 766-769.
15. Yakel, E., Digital preservation. Annual Review of Information Science and Technology, 35, 2001, 337-378.
16. Galloway, P., Preservation of digital objects. Annual Review of Information Science and Technology, 38, 2004, 549-590.
17. National Historical Publications and Records Commission, Research issues in electronic records: report of the working meeting. St. Paul, Minn.: Minnesota Historical Society, 1991.
18. Pittsburgh Project: http://www.archimuse.com/papers/nhprc/
19. InterPARES: http://www.interpares.org/
20. Bishop, A. P., Star, S. L., Social informatics of digital library use and infrastructure. Annual Review of Information Science and Technology, 31, 1996, 301-401.
21. Sawyer, S., Eschenfelder, K. R., Social informatics: perspectives, examples, and trends. Annual Review of Information Science and Technology, 36, 2002, 427-465.
22. Kling, R., Learning about information technologies and social change: the contribution of social informatics. The Information Society, 16, 2000, 217-232.
23. Davies, P. H. J., Intelligence, information technology, and information warfare. Annual Review of Information Science and Technology, 36, 2002, 313-353.
24. Van House, N. A., Science and technology studies and information studies. Annual Review of Information Science and Technology, 38, 2004, 3-86.
25. Herold, K., ed., The philosophy of information. Library Trends, 52(3), 2004, 373-670.
26. Hjørland, B., ed., Library and information science and the philosophy of science. Journal of Documentation, 61(1), 2005, 5-163.
27. Talja, S., Tuominen, K., Savolainen, R., "Isms" in information science: constructivism, collectivism and constructionism. Journal of Documentation, 61(1), 2005, 79-101.
28. Culler, J., Literary theory: a very short introduction. Oxford: Oxford University Press, 1997.
29. Radford, G. P., Radford, M. L., Structuralism, post-structuralism, and the library: de Saussure and Foucault. Journal of Documentation, 61(1), 2005, 60-78.

Author Details

Michael Day
UKOLN, University of Bath

Email: m.day@ukoln.ac.uk
Web site: http://www.ukoln.ac.uk/
