Web Magazine for Information Professionals

British Library Corner: Text and the Internet

Graham Jefcoate, a Research Analyst from the British Library Research and Innovation Centre, will be writing this regular column for the remaining issues of Ariadne. In this issue, Graham gives us the text of his Libtech talk: Text and the Internet.

This is the text of a paper given at Libtech 96 on 5 September 1996 in a session organised by the British Library's Centre for the Book. The author regards it - in the nature of texts on the Internet - as a "work in progress" and welcomes corrections and comments. He hopes to improve the text on the basis of comments received and will, of course, acknowledge their source!

Abstract

The advent of the Internet is sometimes regarded as the beginning of the end of the book as we know it. It might be worth making a number of points, however, about the Net as a medium for the dissemination of texts, raising questions of functionality, content and access. The Web is essentially text-based and therefore depends on data that is intended to be read. Traditionally, high-quality discourse in the West has been disseminated as print on paper bound in the codex format but this is relatively inflexible. Texts printed in the machine age are for all intents and purposes "static". Digital documents delivered over networks could be characterised as "dynamic". Networked texts exist as constantly changing data in a "dynamic" digital environment. They are made available on a server for users to locate and read as they choose. The computer allows one to locate a particular text rapidly and readily and to search and manipulate it for local use. These functional characteristics of the Net raise serious questions about the possibility of serious discourse across open networks, but the dynamic nature of digital documents does also bring opportunities for the free flow of ideas.

Authors will increasingly choose the Net as their primary means of communication. In theory, the same library of documents could be available via the Internet to anybody anywhere. The entire, rapidly growing assembly of texts available through the World Wide Web is available to any user with appropriate browsing software at the point of use. No further mediation or dissemination is required. Many of the objections raised about the content of the World Wide Web are really objections about this functionality. Much of the content of the Internet is said to be junk, but a similar point could be made about almost any other given communications medium, print being no exception. One might argue that the Web is the only medium where information of real value is actually increasing.

There is no indication of a decline in the production of print. Particular texts will be disseminated using the medium that is most appropriate and convenient. Traditional literacy skills - the ability to read, interpret and evaluate text - will play just as important a role on the Internet as with print on paper. Perhaps the ability to exercise critical judgement will be even more important!


"A mass of trivial electronic information"

The advent of the Internet is sometimes regarded as the beginning of the end of the book as we know it. With the demise of the book, the cultural pessimists contend, we shall lose the medium through which complex ideas are expressed or serious arguments developed. The Internet will impoverish our culture by discouraging text-based discourse and reduce intellectual exchange to the level of the sound-bite, the trivial and the trite. Our attention span, the level of concentration required to receive, assimilate and evaluate propositions or concepts, will be reduced. In any case, people will not want to read long, discursive texts on a computer screen. If readers are to move from print to screen, then much will be lost. Computers, and powerful computer applications like the Internet, contend the pessimists, are therefore per se a bad thing.

An extreme example of such pessimism about the Internet could be found recently in the Daily Telegraph, where Edward Chancellor inveighed against the new, electronic Science and Business Library at the New York Public Library where access is provided to the Internet. According to Chancellor, this "is not a place to contemplate or write and its culture is profoundly anti-book. It has turned its back on its grand progenitor, whose building ennobles while enabling intellectual activity. It hails a world which is visual and fast-paced, not cerebral and reflective; a world which deadens the spirit and confuses the mind. Its frantic atmosphere will produce no Keyneses, no Edisons, and quite likely, no J. P. Morgans. It has cleaved the world of science and humanities in two. SIBL portends a world, disconnected from the past, where civilisation has been extinguished amid a mass of trivial electronic information" (24 August 1996).

One is reminded of Pope's bleak vision of the future at the end of the Dunciad (Variorum edition, London 1729, lines 335-356):

"Thy hand great Dulness! lets the curtain fall,

And universal Darkness covers all."

Before we accept this grim scenario unquestioningly, it might be worth making a number of points about the Internet as a medium for the dissemination of texts. But first, I should like to say something about my own personal background as this might explain how my thinking on these issues has developed. My only knowledge of - and interest in - the Internet and its applications is as a user. I have no technical expertise to offer; my degree is in English and I have a post-graduate diploma in library science. For as long as I can remember, I have enjoyed the physicality of the book and have appreciated elegant solutions to the problems raised by the presentation of text as print on paper. Indeed, I have spent by far the longest part of my working life cataloguing and researching early printed books and historic collections. My professional interest in texts has therefore been as keys to understanding the past.

But another aspect of text-based culture has also fascinated me from the beginning: the need in a democratic society for the widest possible access to information of high quality. Originally I saw physical libraries as the best means of achieving this goal. Like many others with a background in the humanities, I have now become intrigued by the possibility that the Internet in general, and the World Wide Web in particular, might be the means to provide that access.

My interest in the Internet is therefore not centred on technical issues but rather on its potential as a medium for disseminating text-based information more widely and equitably. This raises questions of functionality, content and access. How can digital documents carried over networks be compared with physical documents available in other formats carried by other, more traditional, media? How can they be accessed and used? What do World Wide Web documents actually contain? What special characteristics do they have? I cannot pretend to have any of the answers to these questions, but, in my new post as Research Analyst (Digital Library Research) within the British Library's Research and Innovation Centre, I should be happy at least to open a debate on some of the issues. I believe them to be crucial to the future of the Internet as a medium for the dissemination of serious text-based discourse.


The World Wide Web: A Text-Based Medium

The World Wide Web is one of the most popular and successful Internet applications, providing users with access to a vast assembly of disparate texts, images, audio and video files, and the means to retrieve them. The Web itself originated in the late 1980s at CERN, the European high-energy physics laboratory at Geneva, as a medium for exchanging project documentation using hypertext with integrated graphics. It quickly became the primary medium for disseminating text-based information over the Internet, establishing itself alongside email as one of the Internet's most popular functions.

It is often overlooked that the two most widely-used Internet applications, email and the Web, are both essentially text-based and therefore depend on data that is intended to be read. The telephone, radio, film and television, of course, are sound and image-based, but the Internet could be described as the first new text-based communications medium this century. Admittedly, hypertext allows us to order and manipulate texts in ways impossible with one-dimensional print on paper. But hypertext is still text. Unlike audio and video applications, it cannot be used by an illiterate person. Far from undermining our literate culture, will the Internet, by placing the written word at its centre, in fact reinforce it?


Static and Dynamic Documents

Traditionally, high-quality discourse in the West has been disseminated in the form of print on paper bound in the codex format established in ancient times, in other words pages arranged sequentially and held together between covers. This format has been tested over centuries. It is often convenient (normally allowing the user to carry even quite substantial texts around with them) but it is relatively inflexible.

Any bibliographer or historian of the book will tell you that compositorial practice meant that no two surviving copies of an early printed text issued as part of a single edition are exactly the same. Texts printed in the machine age, however, are for all intents and purposes "static". The data they carry can only be changed by manual amendment or by new print on paper replacing old information with new. Stasis in the evolution of a printed text is reached when it is received by a reader in the form it is published as, for example, an article in a serial, a book or even, more recently, on a CD-ROM disk. Once it has entered the public domain in this form it can be evaluated, recorded and archived. These versions represent the result of a decision by an author, publisher or bookseller to halt the process of change and issue the text to the reader. The process of authorial revision and editorial correction may then begin again, but this will produce a new edition of the text to replace the old. Bibliographers seek, if one will, to categorise and describe these "static" versions as separate and definable stages in the evolution of a printed text.

It is not merely the texts of printed documents that could be described as relatively static or inflexible. Their physical format means they need space and other resources if they are to be appropriately stored and preserved as archival documents. All this entails considerable and long-term costs. But their greatest drawback from the user's point of view is surely access. It is difficult to find information embedded within a discursive printed text unless it is supplied with a good index. Furthermore, the information contained in printed texts is not usually readily available to us at all unless we live or work in close proximity to a comprehensive collection of printed materials with an unusually good catalogue.

In contrast to the relative inflexibility of static print on paper, digital documents delivered over networks could be characterised as "dynamic". Potentially at least, they are easy to locate and to retrieve. Above all, networked documents are available to users when and where they need them. The Pope quotation is a case in point. I happen to work on the British Library's Soho site where no copy of Pope's works is likely to be found. I decided to check the reference through the Web by entering the keyword "Dunciad" in the search engine Alta Vista. The first option among the sites retrieved was a good resource description by the National Library of the Netherlands with a hyperlink to a site at the University of Maryland. Here Wendy J. Carter has mounted the text of the Dunciad from the Variorum edition of 1729 in a version that reflects the original orthography. By keyword-searching the text itself I quickly found the lines I wished to quote and (using the copy-and-paste facility within Windows) I incorporated them into the text of my talk today.

In the case of the Pope quotation, the computer therefore allowed me to locate a particular text of obvious quality rapidly and readily and to search and manipulate it for personal use. The process saved time, effort and resources. But the success of this one sample search can clearly not be used to justify the entire shift from print to digital information. What other considerations should we take into account?

Unlike print on paper, networked texts never reach true stasis. They exist as constantly changing data in a "dynamic" digital environment. In order to use them in the same way as conventional texts we would need to turn them into conventional documents, halting the process of or potential for change, for example by transferring them to a static medium such as print on paper. The act of publication, in which a "definitive" text is agreed and disseminated, even if only in interim form, is replaced by a process rather less easy to grasp. It becomes a different kind of transaction between the originator of the text and its user. To borrow the jargon of post-structuralism, the process of publication is itself "deconstructed" and the text itself becomes - in a very real sense - "unstable".

In the networked environment, the text is merely made available on a server for users to locate and download as they choose. Its author or any other person with access to the text on a server can - and often does - alter and refine the text after the original act of making it available. This is possible as extensively or as frequently as he or she chooses. At no time need anyone decide what is a "definitive" version of the text nor need they call a halt to the process of change. If the document's own unique identifier, the so-called uniform resource locator or URL, is changed or deleted, any hyperlinks made to it will result in an error message.

The originator might indeed decide to withdraw the text entirely from the public domain by deleting it on the server. If no one has turned the dynamic document into a static one, by printing it out or copying it to a secure electronic archive, citations of it or links and references to it will become instantly obsolete and the text in the form it was received and evaluated will be irretrievably lost. Of course, a text can also be downloaded by others, altered, manipulated and made available again in a form not envisaged or authorised by its originator. In the eighteenth century, this would have been called piracy. I could take Wendy Carter's Variorum text of the Dunciad, in which she has been careful to retain as much of the integrity of the printed original as possible, and create a completely new version full of my own editorial ideas and interpretational errors and offer it as an alternative to catch innocent users of search engines on the World Wide Web. Although nothing here would have been inherently impossible with printed texts, the nature of networked digital documents and the technology of the Internet seem almost to invite intervention or manipulation by the reader.
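One way to make a silent alteration of a cited text at least detectable is to record a fingerprint of the text at the moment it is cited. What follows is only a minimal sketch in Python, assuming the text has already been retrieved as a string; the function names `fingerprint` and `has_changed` are illustrative inventions and do not belong to any tool mentioned in this paper.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Return a SHA-256 digest of the text as a hexadecimal string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def has_changed(recorded_digest: str, current_text: str) -> bool:
    """Report whether the current text differs from the version that was cited."""
    return fingerprint(current_text) != recorded_digest


# Hypothetical example: two lines as cited, then a silently altered copy.
cited = "Thy hand great Dulness! lets the curtain fall,\nAnd universal Darkness covers all."
altered = cited.replace("Darkness", "Brightness")

digest_at_citation = fingerprint(cited)
print(has_changed(digest_at_citation, cited))    # False
print(has_changed(digest_at_citation, altered))  # True
```

A digest of this kind only reveals that a text has changed; it does not preserve the version that was read. For that, a static copy still has to be kept somewhere.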


Serious Discourse across Networks?

These functional characteristics of the Net raise serious questions about the possibility of serious discourse across open networks. What is the value of a text made available on the World Wide Web if it can be changed, moved or withdrawn by anyone with access to the server on which it is placed? The dynamic nature of the Web makes the traditional tasks of evaluating, citing, recording or archiving texts extremely difficult. Among other things, it makes life for electronic librarians very difficult. What version of a networked document should be "acquired"? How and when should it be described and archived? How indeed does a national library with an archival responsibility deal with the deposit and storage of such documents?

But the dynamic nature of digital documents delivered over networks does also bring opportunities for the free flow of ideas. These have been well argued in an article published last year by Fred Nash. Far from seeing difficulties in publishing scholarly or scientific articles on the Net, Nash identifies a number of real advantages:

"It can", for example, "accelerate the exchange of ideas by removing the vetting and publication lead time. A piece can be on the Internet in a matter of days, and responses to it can be added in a matter of minutes. ... It can enable us to generate genuine debate and real and open exchange of ideas between all who have something to say on the subject. ... The reader participates instead of reading a polished report. It can enable more papers, and views, to be published ...".

What is clear is that authors will increasingly choose the Net as their primary means of communication. If the functionality of the World Wide Web has, from a traditionalist's point of view, certain disadvantages, the benefits for scholarly exchange may surely not be left out of consideration. Thousands of individuals and organisations across the globe have seized on the Web as an attractive and effective way of presenting information on the Net. The Lycos online search service currently indexes nearly sixty million separate Web pages, and even this does not provide exhaustive coverage of the current resources of the Web. The rapidly-growing corpus of documents represented by the Web might well be the prototype of the global electronic library dreamt of by many politicians and journalists. As such it is clearly of the greatest significance to anyone concerned with the future of the written word and discourse based on texts.


Access to the Global Library

Access to this information is open to all who are linked to the network. World Wide Web documents are available, potentially at least, to an enormous and rapidly-growing community of Internet users world-wide. A report issued earlier this year estimated that some 35 million users would soon be connected to the Internet. By 2002, more than 200 million people were expected to be connected to at least part of the Internet. The World Wide Web itself was doubling in size every three months, and approaching 100,000 Web sites were forecast by this spring.
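To give a sense of what "doubling in size every three months" implies, the arithmetic can be sketched in a few lines of Python. This is purely illustrative: the function name `projected_count` is hypothetical, the starting figure is simply the forecast of roughly 100,000 sites quoted above, and nothing here claims that such a rate would in fact be sustained.

```python
def projected_count(initial: int, months: int, doubling_period_months: int = 3) -> int:
    """Project a count forward, assuming it doubles once per doubling period."""
    doublings = months // doubling_period_months
    return initial * 2 ** doublings


# Illustrative starting figure only: the roughly 100,000 sites forecast above.
for months in (3, 6, 12):
    print(months, projected_count(100_000, months))
# 3 200000
# 6 400000
# 12 1600000
```

In other words, a quantity that doubles every three months grows sixteen-fold in a year.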

In theory, the same library of documents could be available via the Internet to anybody anywhere: in remote third world villages just as in ancient university towns or in Silicon Valley. The educational value of the Internet clearly excites President Clinton, who, in his speech to the 1996 Democratic Convention in Chicago, sees the information superhighway leading across his bridge into the information age:

"We need schools that will take our children into the next century, ... with every single library and classroom in America connected to the information superhighway by the year 2000. ... Now folks, if we do these things, ... every 12-year-old will be able to log in on the Internet, ... and all Americans will have the knowledge they need to cross that bridge to the 21st century".

We too must ensure that there is the maximum access for everybody in the community: not just at school or at work, but increasingly in public libraries, citizens' advice bureaux and at home too. The needs of minority groups and the disabled should not be neglected. This priority was recognised here by the recent House of Lords Select Committee report on the information society. The danger of creating a society of information "haves" and "have nots" is all too apparent, but one we have the means to avoid. We also need to find the will.

The entire, rapidly-growing assembly of texts available through the World Wide Web is available to any user with appropriate browsing software at the point of use. No further mediation or dissemination is required. Ease of access to pages on the World Wide Web is ensured in two ways. Online services enable users to select, locate and access specific services from menus often supported by a graphical interface such as a map. A range of sites of this kind is already available. They are supplemented as finding aids by a variety of search engines, the so-called web crawlers or worms such as Alta Vista, which index and allow searches of a large sub-set of the World Wide Web by keyword. Once accessed, each of these documents might well contain embedded in its text numerous hyperlinks to other WWW documents deemed relevant by the author.
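The principle of indexing pages by keyword can be illustrated with a very small sketch in Python. It is an assumption-laden toy, not a description of how Alta Vista or any other real search engine works: the function names `build_index` and `search`, the sample pages and their addresses are all hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased word to the set of page addresses containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for address, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(address)
    return index


def search(index: dict[str, set[str]], keyword: str) -> list[str]:
    """Return the addresses of pages containing the keyword, sorted."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical pages; the addresses are placeholders, not real sites.
pages = {
    "http://example.org/pope": "The Dunciad, Variorum edition of 1729",
    "http://example.org/johnson": "Johnson's dictionary and edition of Shakespeare",
}
index = build_index(pages)
print(search(index, "Dunciad"))  # ['http://example.org/pope']
print(search(index, "edition"))  # both addresses, in alphabetical order
```

The index is built once, in advance, so that a later keyword search is a simple look-up and does not require every page to be read again.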


Content and Quality

By means of a World Wide Web browser, anyone connected to the Internet can therefore locate, often in a matter of seconds, material on any given subject. Many of the objections frequently raised about the content of the World Wide Web are really objections about this functionality. Broadcasters and publishers have until now determined what will be disseminated applying criteria of quality and decency and reflecting contemporary values. National governments and other agencies have sought to regulate what will be disseminated. Educationalists and politicians have drawn up curricula with lists of approved "set texts" for schools and colleges. Censorship has restricted published discourse to what authority found acceptable; legal sanctions were imposed on what was not. If not forbidden, material that did not conform to accepted criteria has traditionally been difficult to access. The nature of the Internet, its world-wide reach and ease of use, makes it highly suspicious in societies, such as our own perhaps, where regulation has been the norm. Objections have addressed issues both of decency and of quality.

Notoriously, the Net has brought pornographic material down from the newsagent's top shelf and made it as instantly and easily accessible as any other material. Important as questions of decency certainly are, academic users of the Net and librarians will be equally concerned with the issue of quality. Much of the World Wide Web could be described as a branch of the vanity publishing industry. The ease with which HTML pages can be made and mounted on a server has empowered tens of thousands of enthusiasts to give us access to their views on a myriad of subjects. Again, the only difference from conventional media is the relative accessibility of this material through search engines. And again, it is up to us first to ensure the content of the Internet includes more and more information of high quality and second to enable users to locate it readily. The eLib programme has shown the way here.

The Web already functions as an enormous library of hypertext documents. Common prejudice (too often confirmed by the random results of online keyword searches) tells us they are of little value: the weird and wonderful hypertext scribblings of naive and nerdy enthusiasts, mostly American and often interested, it seems, in computers, trivia and sex (or any combination thereof). New applications like Java are toys, tricks to impress others, of no real value, because they are about style and not about substance. And the enthusiasts rarely seem to master the basic tools of literacy: their spelling is notorious! According to witnesses before the House of Lords committee, much of the content of the Internet is junk.

Similar points could, of course, be made about almost any other given communications medium, print being no exception. Our television channels fill the schedules with the ephemeral and trite. We expend floods of useless words across the telephone. For each book of real value, the shelves of our newsagents, bookshops and, yes, public libraries groan under the weight of trivia and trash. And every day we hear complaints that standards are declining. The Dunciad, of course, was a denunciation of the floods of trivia emanating from the printing presses of eighteenth-century Grub Street. But, far from collapsing into the abyss where 'Dulness' reigns, the eighteenth century after Pope is accepted as one of the high points of our literary culture. Of course the presses produced trash; they also brought forth the prose and poetry of Samuel Johnson, his English dictionary and edition of Shakespeare.

One might argue that the Web is the only medium where information of real value is actually increasing. Hardly noticed by the Internet's critics (and often ignored by the computing enthusiasts), scholarly journal articles, scientific reports, newspapers (including the Daily Telegraph), new legislation and reports - public information of all kinds from across the globe - are being added to the global library hourly. Links will be found to information about or from innumerable central and local government organisations, universities and research institutions, libraries and museums, publishers and booksellers, commercial firms and so on. The Web is the only source for much of the information provided at these sites. Even if we have not quite reached the open road of the information superhighway, we have long since gone beyond the point at which anyone concerned with the provision of timely and appropriate information can afford to ignore the Net. The new illiterates, it could be argued, are those who have yet to grasp this.


A Future for the Codex?

That computer users will not wish to read long, discursive texts on screens is indubitably true. The Internet will therefore demand a more concise form of discourse and different forms of presenting arguments. This is not necessarily always a bad thing - how many traditional printed books contain no padding? The publishing industry needs to produce a certain number of books, each of which needs to be of a minimum length to be sold at a particular price. There will always be room for the codex. After all, it's a relatively cheap, durable and portable product. There is no indication of a decline in the demand for or production of print.

How then will the two media co-exist? The Internet may well become a bulletin-board carrying directory or encyclopaedia-type information, while the codex continues to carry novels or discursive arguments requiring sustained development. Different versions of one text may appear in both. A particular text will be disseminated using the medium which is most appropriate and convenient. Eventually, of course, technology will allow us to produce codices at will from networked, digitised data, thus giving the user maximum choice.


Literacy Skills for the Net

It could be argued that the advent of the Internet demands a range of literacy skills. The computer literacy required, it seems to me, is really a minimum. We must offer people training in the use and interpretation of documents on the Web to ensure they can find what they need and exploit it effectively and responsibly. Traditional literacy skills - the ability to read, interpret and evaluate text - will play just as important a role on the Internet as with print on paper. Perhaps the ability to exercise critical judgement will be even more important!

The World Wide Web will continue its rapid development. Its functionality will be further refined and its content greatly expanded. As awareness of its potential spreads, it is predictable that more and more individuals and organisations will see the Web as a primary medium for text-based discourse. We should remain aware of its apparent shortcomings but not blind to its numerous advantages. The information it provides will become de facto the most readily available on many given subjects. We must ensure that information of quality is accessible through it and that finding aids are developed to guide us to it. And finally, we should give no more credence to the cultural pessimists than to the blacker forebodings of Alexander Pope - and what would he have made of the availability of the Dunciad over the Internet and my easy piracy of it?


Contact Details

Graham Jefcoate
Research Analyst (Digital Library Research)
British Library, Research and Innovation Centre
Tel. +44 171 412 7109
graham.jefcoate@bl.uk

Portico - The British Library's WWW Server: http://portico.bl.uk/ric/research/digital.html