Web Magazine for Information Professionals

Unique Identifiers in a Digital World

Andy Powell reports on a seminar organised jointly by Book Industry Communication and UKOLN on the use of unique identifiers in electronic publishing.

On the afternoon of Friday 14 March, more than 50 people involved in electronic publishing met for a seminar reviewing recent developments in the unique identification of digital objects. Delegates included representatives of publishers, libraries and other organisations. The seminar was organised jointly by Book Industry Communication (BIC) and the UK Office for Library and Information Networking (UKOLN) with support from the eLib programme. A brief report follows:

Introduction - Why we need identifiers

Brian Green (BIC) and Mark Bide (Mark Bide and Associates) introduced the seminar with an overview of why the publishing industry needs identifiers [1]. Unique identifiers for digital objects are an essential part of the technology that allows:

Several issues were highlighted:

A further complication is that publishing is currently in a transitional period. Publishers must continue to handle traditional paper publications while also producing new electronic-only publications and parallel (print and electronic) publications.

The music industry is facing similar problems. In response, the International Confederation of Authors and Composers’ Societies (CISAC) [3] has been developing the Common Information System (CIS). This system includes identifiers for various manifestations of content and for creators and publishers. A recent development is the International Standard Work Code (ISWC), which identifies the musical composition itself rather than the recorded or printed expression of the work. It has been suggested that the ISWC might be extended to cover literature and the visual arts as well. Creators and publishers are identified by the Compositeur, Auteur, Editeur (CAE) number, which will be extended and renamed the Interested Party (IP) number.

The Digital Object Identifier

Carol Risher (Association of American Publishers, AAP) and Albert Simmonds (RR Bowker) gave an overview of the Digital Object Identifier (DOI) [4]. Their presentation included a video, based largely on the first public demonstration of the DOI given in February, that showed documents and other files being retrieved on the Web using DOIs rather than URLs. Development of the DOI is being carried out by RR Bowker and the Corporation for National Research Initiatives (CNRI) on behalf of the AAP.

A DOI contains two parts. The first part, known as the ‘Publisher ID’, indicates the numbering agency and publisher and is assigned by the DOI Agency. The second part, known as the ‘Item ID’, is assigned by the publisher and can be made up of any alphanumeric sequence of characters. The use of an existing standard scheme in the Item ID, a SICI or PII for example, is encouraged, though some publishers may choose to use a proprietary scheme. A DOI can be assigned to any digital object at whatever level of granularity the publisher finds appropriate. Typically this might mean that a separate DOI is assigned to each component (text, image, sound, video) of a multimedia document.
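The report does not give the written form of a DOI. The following is a minimal sketch that assumes the syntax used by the DOI system, in which the Publisher ID begins with "10." and is separated from the Item ID by the first slash. It splits a DOI into its two parts. The DOI below is hypothetical. Its Item ID reuses a SICI-style string, which shows why only the first slash counts: an Item ID may itself contain slashes.

```python
from typing import NamedTuple


class DOIParts(NamedTuple):
    """The two halves of a DOI as described in the report."""
    publisher_id: str  # assigned by the DOI Agency (assumed to start "10.")
    item_id: str       # assigned by the publisher; any character sequence


def split_doi(doi: str) -> DOIParts:
    """Split a DOI at the first '/' into Publisher ID and Item ID.

    Assumes the "10.<registrant>/<item>" syntax. Only the first slash
    separates the halves, so the Item ID may contain further slashes.
    """
    publisher_id, sep, item_id = doi.partition("/")
    if not sep or not publisher_id.startswith("10.") or not item_id:
        raise ValueError(f"not of the form 10.<registrant>/<item>: {doi!r}")
    return DOIParts(publisher_id, item_id)


# Hypothetical DOI whose Item ID embeds a SICI-style string.
parts = split_doi("10.1234/0095-4403(199502/03)21:3<12:WATIIB>2.0.TX;2-J")
print(parts.publisher_id)  # 10.1234
print(parts.item_id)
```

Because `str.partition` splits only at the first occurrence of the separator, the slash inside the chronology "199502/03" stays in the Item ID.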

The DOI system has two parts: the ‘DOI agency’ and the ‘DOI computers’. The DOI agency assigns Publisher IDs, issues guidelines for DOI usage and works with the relevant standards bodies to maintain the integrity of the system as a whole. The DOI computers form a distributed system that resolves any DOI to its associated URL. This system is based on the CNRI handle system [5]. Any user who knows the DOI of a digital object can query the DOI Directory directly by typing the DOI into a Web-based search form. Typically, however, DOIs are likely to be embedded in Web pages, hidden behind clickable buttons. Queries to the DOI Directory are resolved and the client is passed directly to the publisher’s system.

The current state of the DOI system is as follows:

Once assigned, a DOI remains unchanged. If the ownership of an object changes, the new owner registers the change with the DOI agency. If the object pointed to by the DOI moves (that is, the URL changes), the DOI entry for that object can be updated.
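The resolution and update behaviour described above can be modelled as a simple lookup table. The sketch below is a toy, in-memory stand-in. The class `DOIDirectory` and its methods are hypothetical and are not the interface of the handle system or the real DOI Directory. It illustrates the key property: the DOI stays fixed while the URL and owner recorded against it change.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    url: str    # where the object can currently be retrieved
    owner: str  # who currently holds the rights to the object


class DOIDirectory:
    """Toy, in-memory model of DOI resolution (hypothetical interface).

    The real DOI computers form a distributed system built on the CNRI
    handle system. This class only mimics the behaviour described in
    the report: a DOI is fixed once assigned, while the URL and owner
    recorded against it may change.
    """

    def __init__(self) -> None:
        self._entries: dict[str, DirectoryEntry] = {}

    def register(self, doi: str, url: str, owner: str) -> None:
        if doi in self._entries:
            raise ValueError(f"{doi} is already registered")
        self._entries[doi] = DirectoryEntry(url, owner)

    def resolve(self, doi: str) -> str:
        """Return the URL to which the client should be passed."""
        return self._entries[doi].url

    def owner_of(self, doi: str) -> str:
        return self._entries[doi].owner

    def move(self, doi: str, new_url: str) -> None:
        """The object has moved: update its URL; the DOI is unchanged."""
        self._entries[doi].url = new_url

    def transfer(self, doi: str, new_owner: str) -> None:
        """Ownership has changed: record it; the DOI is unchanged."""
        self._entries[doi].owner = new_owner


# Usage with a hypothetical DOI and URLs.
directory = DOIDirectory()
doi = "10.1234/item-42"
directory.register(doi, "http://publisher-a.example/item42.html", "Publisher A")
directory.transfer(doi, "Publisher B")
directory.move(doi, "http://publisher-b.example/item42.html")
print(directory.resolve(doi))  # http://publisher-b.example/item42.html
```

A link that embeds `10.1234/item-42` keeps working after the transfer and the move, because only the directory entry was updated.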

It is anticipated that the charges associated with registering with the DOI agency will be small enough that DOIs will be used in non-commercial areas of the Internet as well as by commercial publishers. The DOI agency will assign Publisher IDs to individuals and other organisations in addition to traditional publishers.

The DOI is non-proprietary and will be introduced to ISO in May. Development of the DOI system will continue over the summer culminating in a full demonstration at the Frankfurt Book Fair in October 1997.

The SICI and the BICI

Sandy Paul (SISAC/BISAC) gave an overview of the Serial Item and Contribution Identifier (SICI) [6], a scheme for identifying serials and parts of serials. The scheme has been in use since the late 1980s and is now widely used, mainly at the issue level, by a broad range of publishers in EDI message transactions and by libraries and subscription agents.

The original version of the SICI allowed an identifier to be assigned to each issue of a serial (the Serial Item Identifier) and to each contribution (article) within a serial (the Serial Contribution Identifier). Recently the SICI has been updated to identify fragments other than articles (for example a table of contents, an abstract or an index) and to identify particular physical formats. The SICI contains the ISSN of the serial.
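Because the SICI embeds the ISSN, software that handles SICIs can use the ISSN's mod-11 check digit as a first sanity test. The following is a minimal sketch. It assumes the segment layout ISSN(chronology)enumeration<contribution>control of the revised SICI. The regular expression, field names and example string are illustrative. The SICI's own check character at the end of the control segment is not verified here.

```python
import re
from typing import NamedTuple

# Assumed layout: ISSN(chronology)enumeration<contribution>control
SICI_PATTERN = re.compile(
    r"^(?P<issn>\d{4}-\d{3}[\dX])"   # ISSN of the serial
    r"\((?P<chronology>[^)]*)\)"     # e.g. year and month(s) of the issue
    r"(?P<enumeration>[^<]*)"        # e.g. volume:issue
    r"<(?P<contribution>[^>]*)>"     # empty for an issue-level SICI
    r"(?P<control>.*)$"              # version, codes and check character
)


class SICI(NamedTuple):
    issn: str
    chronology: str
    enumeration: str
    contribution: str
    control: str


def issn_check_digit(first_seven: str) -> str:
    """Mod-11 ISSN check digit: weights 8 down to 2; a value of 10 is 'X'."""
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def parse_sici(text: str) -> SICI:
    """Split a SICI into segments and verify the embedded ISSN."""
    match = SICI_PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognised SICI: {text!r}")
    sici = SICI(**match.groupdict())
    digits = sici.issn.replace("-", "")
    if issn_check_digit(digits[:7]) != digits[7]:
        raise ValueError(f"ISSN check digit mismatch in {sici.issn}")
    return sici


example = parse_sici("0095-4403(199502/03)21:3<12:WATIIB>2.0.TX;2-J")
print(example.issn, example.enumeration, example.contribution)
# 0095-4403 21:3 12:WATIIB
```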

A final draft of the Book Item and Component Identifier (BICI) [7] is now available. This is essentially a book version of the SICI, using the ISBN in place of the ISSN. The BICI can be used to identify a part, a chapter or a section within a chapter, or any other text component, such as an introduction, foreword or index. It can also identify an entry in a directory, encyclopaedia or similar work that is not structured into chapters.
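The BICI likewise inherits the ISBN's check digit. The following is a minimal sketch of the ISBN-10 test. It takes the ISBN on its own, because extracting it from a full BICI depends on the draft's syntax, which is not reproduced here. The function name is hypothetical. The first example is the ISBN of reference [1].

```python
import string


def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10: weights 10 down to 1, weighted sum divisible by 11.

    Hyphens are ignored; a final 'X' stands for the value 10.
    """
    chars = isbn.replace("-", "").upper()
    if len(chars) != 10:
        return False
    if not all(c in string.digits for c in chars[:9]):
        return False
    last = chars[9]
    if last != "X" and last not in string.digits:
        return False
    values = [int(c) for c in chars[:9]] + [10 if last == "X" else int(last)]
    total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
    return total % 11 == 0


print(isbn10_is_valid("1-873671-18-0"))  # True
print(isbn10_is_valid("1-873671-18-1"))  # False
```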

The PII

Norman Paskin (Elsevier Science) gave an overview of the Publisher Item Identifier (PII) [8] which was developed in 1995 by the Scientific and Technical Information (STI) group of publishers. The requirements for the PII were:

The PII is made up of 17 characters and contains the ISBN or ISSN in order to guarantee uniqueness. It is a ‘dumb’ identifier with a capacity of 10,000 items per journal per year. Future versions of the PII will have extensions to cover document components and versions. Development of any new version of the PII will take account of developments in other areas, for example the DOI system and URNs.

Some interesting figures were given for the numbers of identifiers required for the STI area of publishing. Estimating 1 million articles per year, identifying all the versions of all the components of those articles may require somewhere in the region of 10^14 identifiers! [9]

Group Sessions

The seminar closed with three group sessions covering:

These were followed by group reports and a plenary discussion. Some interesting issues were raised.

It was generally agreed that the group sessions could have gone on for far longer than the 45 minutes allocated and that follow-up meetings in specific areas may be required.

This was an interesting seminar and thanks are due to Brian Green (BIC) and Rosemary Russell (UKOLN) for organising a very successful event.

References

  1. Unique Identifiers: a brief introduction, Brian Green and Mark Bide, ISBN 1-873671-18-0
    http://www.bic.org.uk/bic/uniquid
  2. IETF URN Working Group,
    http://www.bunyip.com/research/ietf/urn-ietf/
  3. International Confederation of Authors and Composers’ Societies (CISAC),
    http://www.cisac.org/
  4. Digital Object Identifiers,
    http://www.doi.org/
  5. CNRI Handle System,
    http://www.handle.net/
  6. SICI standard,
    http://sunsite.Berkeley.EDU/SICI/
  7. A Standard Identifier for Book Items and Contributions - draft (Report prepared for BIC and the British National Bibliography Research Fund), David Martin - available after 21 April 1997
    http://www.bic.org.uk/bic/bici.html
  8. The PII as a means of Document identification,
    http://www.elsevier.nl/inca/homepage/about/pii/
  9. Information Identifiers, Norman Paskin, Learned Publishing (vol 10 issue 2, pp 135-156)

Author Details

Andy Powell,
Technical Development and Research Officer,
Email: A.Powell@ukoln.ac.uk
Web page: http://www.ukoln.ac.uk/~lisap/
Tel: +44 1225 323933
Address: UKOLN, University of Bath, Bath, BA2 7AY