Web Magazine for Information Professionals

Text Encoding for Interchange: A New Consortium

Lou Burnard describes the new TEI Consortium, established to take the TEI Guidelines into the XML world.

The Text Encoding Initiative (TEI) was originally established in 1987 with the goal of creating a community-based standard for text encoding and interchange. It came into being as the result of a perception in many different parts of the academic research community that the rising tide of digitized media (largely known, in those distant days, as "electronic" or even "machine-readable" texts) threatened to engulf everything in a war of competing formats and encoding systems. Scholarship has always thrived on serendipity and on the ability to protect and pass on our intellectual heritage for re-evaluation in a new context; many at that time suspected (and events have not yet proved them wrong) that longevity and re-usability were not high on the priority lists of software vendors and electronic publishers. Even before the World Wide Web, it was clear that one of the striking virtues of converting resources into digital form was the consequent ability to integrate resources of different kinds and from different places into a single whole. At the end of the eighties there was a real concern that the entrepreneurial forces which (then as now) drive information technology forward would impede such integration by the proliferation of mutually incompatible technical standards.

Between 1987 and 1994, nearly two hundred different people from research and teaching establishments on both sides of the Atlantic, and elsewhere in the world, collaborated on a rather unusual research project. The aim was to achieve two contradictory goals: to agree and codify common practice in the digital representation of the texts which form the raw material of scholarship, while at the same time defining a mechanism which could be extended to encompass the full range of scholarly encoding practices. It might well have been a recipe for chaos. In practice, however, thanks to the enthusiasm and skill of the technical experts who contributed their services to this pioneering effort, the TEI's Guidelines were not only published and widely adopted, but still stand as one of the most complete surveys ever undertaken of the nature of textual structures.

In his preface to a collection of working papers about the TEI published in 1995 (Ide and Véronis, 1995), Charles Goldfarb, inventor of the SGML standard, remarks with typical prescience: "The vaunted 'information superhighway' would hardly be worth travelling if the landscape were dominated by industrial parks, office buildings and shopping malls. Thanks to the Text Encoding Initiative, there will be museums, libraries, theatres and universities as well." The TEI Recommendations have been endorsed by the American National Endowment for the Humanities, the UK's Arts and Humanities Research Board, the Modern Language Association, the European Union's Expert Advisory Group for Language Engineering Standards, and many other agencies that fund or promote digital library and electronic text projects. Recognizing the TEI's importance to the emerging digital library community, a workgroup sponsored by the Library of Congress has produced guidelines for best practice in applying the TEI metadata recommendations for interoperability with other standards, notably MARC (see TEI/MARC Best Practices, 1998; see also Pouchard, 1998).

Today, the TEI is internationally recognized as a critically important tool, both for the long-term preservation of electronic data and as a means of supporting effective use of such data in many subject areas. It is the encoding scheme of choice for the production of critical and scholarly editions of literary texts, for scholarly reference works and large linguistic corpora, and for the management and production of detailed metadata associated with electronic text and cultural heritage collections of many types. The TEI website maintains a list of about a hundred currently active TEI projects, ranging from small research applications to major encoding ventures (see The TEI Applications Page). But, it is reasonable to ask, does it have a future? Is its work complete? Has the scholarly community assimilated or outgrown it?

Bespoke Guidance

As first published in 1994, the TEI Guidelines took the form of a substantial 1,300-page reference manual, documenting and defining some 600 SGML elements which can be combined and modified in a variety of ways to create specific SGML document type definitions (DTDs) for particular purposes. A minor revision of this document and its modular DTD framework was produced in 1998, and is available online at a number of sites (TEI P3, 1998). From the start, however, it was recognized that specific customizations of this dauntingly large document would be needed for particular user communities. One such customization (known as TEI Lite) was developed specifically to address the needs of the TEI's core constituency in electronic text centres and digital libraries. A measure of its success is the number of translations of it which have been produced independently of the TEI (for details, see The TEI Lite Home Page, 1999); this DTD is probably the most widespread TEI application, and for many people is almost synonymous with the TEI itself. It is important to remember, however, that TEI Lite is only one view of the TEI scheme, and not necessarily the best fit for every application. Many (perhaps most) serious TEI applications have found it necessary to build their own customization of the full scheme in some way.

In one of the culinary metaphors which permeate the TEI mindset, the construction of an application-specific view of the TEI encoding scheme has been compared to the construction of a Chicago pizza. The designer reviews the available tagsets (collections of semantically related element definitions) and chooses how they are to be combined. Individual elements may be renamed, omitted, or modified, subject to some constraints, in the same way as a pizza can be ordered with a specific combination of toppings, subject only to some simple architectural constraints. In a pizza, there must be a single base, and there must be cheese and tomato sauce, but there need not be mushrooms, and you can add your own alfalfa if you insist. Similarly, in a TEI application, you must choose a single basic framework for your documents and you must include the TEI header, but you don't have to use the tags for marking editorial uncertainty, and you can add special tags for marking irony, if you insist.
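The pizza rules can be made concrete with a small model. The Python sketch below is purely illustrative and is not the Pizza Chef's own code: the `Customization` class, the `check` and `declarations` functions, and the tagset lists are all hypothetical, with tagset names modelled loosely on those of TEI P3. It enforces the two architectural rules described above (exactly one base tagset; the TEI header cannot be left out) and then renders a valid order as SGML parameter-entity declarations in the general style of a P3 DTD subset. The exact entity names emitted here are an assumption and should be checked against the P3 Guidelines before being used in a real document type declaration.

```python
from dataclasses import dataclass, field

# Hypothetical tagset names, modelled loosely on TEI P3; illustrative only.
BASE_TAGSETS = {"prose", "verse", "drama", "spoken", "dictionaries", "terminology"}
ADDITIONAL_TAGSETS = {"linking", "analysis", "certainty", "transcr",
                      "textcrit", "names.dates", "figures"}


@dataclass
class Customization:
    """A hypothetical 'pizza order' for a TEI-style application."""
    base: str                                                # exactly one base (the pizza base)
    additional: set[str] = field(default_factory=set)        # optional toppings
    renamed: dict[str, str] = field(default_factory=dict)    # element -> new name
    omitted: set[str] = field(default_factory=set)           # elements left out


def check(order: Customization) -> list[str]:
    """Return a list of constraint violations; an empty list means the order is valid."""
    problems = []
    if order.base not in BASE_TAGSETS:
        problems.append(f"unknown base tagset: {order.base!r}")
    unknown = order.additional - ADDITIONAL_TAGSETS
    if unknown:
        problems.append(f"unknown additional tagsets: {sorted(unknown)}")
    # The header is the 'cheese and tomato sauce': it cannot be omitted.
    if "teiHeader" in order.omitted:
        problems.append("the TEI header cannot be omitted")
    clash = set(order.renamed) & order.omitted
    if clash:
        problems.append(f"elements both renamed and omitted: {sorted(clash)}")
    return problems


def declarations(order: Customization) -> list[str]:
    """Render a valid order as P3-style parameter-entity declarations (illustrative)."""
    problems = check(order)
    if problems:
        raise ValueError("; ".join(problems))
    lines = [f"<!ENTITY % TEI.{order.base} 'INCLUDE'>"]
    lines += [f"<!ENTITY % TEI.{t} 'INCLUDE'>" for t in sorted(order.additional)]
    lines += [f"<!ENTITY % n.{old} '{new}'>" for old, new in sorted(order.renamed.items())]
    lines += [f"<!ENTITY % {name} 'IGNORE'>" for name in sorted(order.omitted)]
    return lines


if __name__ == "__main__":
    # Prose base, with linking and figures toppings, <p> renamed, <hi> left out.
    order = Customization(base="prose", additional={"linking", "figures"},
                          renamed={"p": "para"}, omitted={"hi"})
    for line in declarations(order):
        print(line)
    # Prints:
    # <!ENTITY % TEI.prose 'INCLUDE'>
    # <!ENTITY % TEI.figures 'INCLUDE'>
    # <!ENTITY % TEI.linking 'INCLUDE'>
    # <!ENTITY % n.p 'para'>
    # <!ENTITY % hi 'IGNORE'>
```

Keeping the base as a single string rather than a set makes the "single base" rule structural rather than something that has to be checked; the remaining rules are checked explicitly so that an invalid order is reported before any declarations are produced. Adding new elements (the irony tags) is deliberately left out of this sketch.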

This metaphor has been instantiated in a much-used set of TEI web pages, known inevitably as the Pizza Chef (TEI Pizza Chef, 1999), which goes some way towards simplifying and automating the design process. A free-standing version of the underlying application is also in preparation. It is a testimony to the power and flexibility of the TEI design that exactly the same mechanism is used to modify the underlying SGML definitions to produce XML-conformant document type definitions.

The Pizza Chef is more than just a way of simplifying the construction of SGML DTDs, however. It makes clear that the lasting achievement of the TEI lies not in its DTD, but in the creation of the intellectual model underlying it, which can continue to inform scholarship as technology changes. When the TEI was first thought of, it was by no means clear that SGML would be its sole form of expression; it was agreed early on to adopt that metalanguage for as long as it remained the tool best suited to the purpose. Fifteen years on, with the emergence of new, equally expressive XML tools that have far greater market acceptance than SGML ever did, the TEI is poised for metamorphosis.

From research project to consortium

In January 1999, the University of Virginia and the University of Bergen (Norway) presented a proposal to the TEI Executive Committee for the creation of an international membership organization, to be known as the TEI Consortium, which would maintain, continue to develop, and promote the TEI. This proposal (TEI Consortium, 1999) was accepted by the Executive Committee, and shortly thereafter Virginia and Bergen added two other host institutions with longstanding ties to the TEI, Brown University and Oxford University. Over the past year, these four hosts have established a new domain name for the TEI at tei-c.org, and started work on a new TEI web site, to include all the material formerly hosted by the University of Illinois at Chicago, and more besides. The new site is currently being developed at Oxford University (a beta version is visible at http://www.hcu.ox.ac.uk/TEI/) and will shortly be mirrored at the other three host sites. The four hosts have provided modest support of the TEI in cash and in kind, continuing editorial work on the DTD, promoting the use of TEI in various conferences and projects, and producing and distributing CDs with TEI documentation and examples. However, their major achievement over the last year has been to define a new constitutional framework for the TEI as a membership consortium.

The goal of the new TEI Consortium is to establish a permanent home for the TEI as a democratically constituted, academically and economically independent, self-sustaining, non-profit organization. This will involve putting the Consortium on a solid legal and organizational footing, developing training and consulting services that will attract paying members, and providing the administrative support that will allow it to continue to exist while income from membership grows. In the immediate future, the Consortium will launch a membership and publicity campaign intended to bring the new Consortium, and the opportunity to participate in it, to the attention of libraries, publishers, and text-encoding projects worldwide. Its key message is that the TEI Guidelines have a major role to play in the application of the new XML-based standards that are now driving the development of text-processing software, search engines, Web browsers, and indeed the Web in general.

The Future of TEI

The future usefulness of the vast collections of electronic text being created now, and to be created over the coming decades, will continue to depend on the thoughtful and well-advised application of non-proprietary markup schemes, of which the TEI is a leading example. We may expect that in the future some of the more trivial forms of markup will be done by increasingly sophisticated software, or even inferred from unmarked documents during processing. As XML and related technologies become ever more pervasive in the wired world, we may also expect to see a growing demand for interchangeable markup standards. What is needed to facilitate all of these processes is a sound, viable, and up-to-date conceptual model like that of the TEI. In this way, the TEI can help the digital library, the scholar's bookshelf, and the humanities textbook survive into a future in which they can respond intelligently to our queries, combine effectively with conceptually related materials, and adequately represent what we know about their structure, content, and provenance.

References

  1. Ide, Nancy and Véronis, J. (eds) 1995. Text Encoding Initiative: Background and Context. Kluwer.
  2. TEI/MARC "Best Practices", 25 November 1998. Available from http://www.lib.umich.edu/libhome/ocu/teiguide.html
  3. Pouchard, Line 1998. Cataloguing for Digital Libraries: the TEI and the TEI Header. The Katharine Sharp Review VI (Winter 1998). Available from http://www.lis.uiuc.edu/review/6/pouchard.pdff
  4. Text Encoding Initiative. The TEI Applications Page. Shortly to be available from http://www.hcu.ox.ac.uk/TEI/Applications/
  5. Sperberg-McQueen, C.M. and Burnard, L. (eds) 1994. Guidelines for Electronic Text Encoding and Interchange (TEI P3). Chicago and Oxford: ACH-ALLC-ACL Text Encoding Initiative. Revised reprint available from http://www.hcu.ox.ac.uk/TEI/Guidelines/
  6. Text Encoding Initiative 1999. The TEI Lite Home Page. Available from http://www.hcu.ox.ac.uk/TEI/Lite/
  7. Text Encoding Initiative 1999. The TEI Pizza Chef: a TEI tag set selector. Available from http://www.hcu.ox.ac.uk/TEI/pizza.html
  8. Text Encoding Initiative 1999. An Agreement to Establish a Consortium for the Maintenance of the Text Encoding Initiative. Available from http://www.tei-c.org/consortium.html

Author Details

Lou Burnard
Manager of the Humanities Computing Unit at Oxford University
(European Editor of the Text Encoding Initiative since 1990)
University of Oxford

Email: lou.burnard@computing-services.oxford.ac.uk
Web site: http://users.ox.ac.uk/~lou/