Web Magazine for Information Professionals

The 2nd Workshop on the Open Archives Initiative (OAI)

William Nixon and Pauline Simpson report on the meeting held at CERN, Geneva, in October 2002.

CERN, the European Organisation for Nuclear Research, is the world’s largest particle physics centre. It is located just outside Geneva on the French-Swiss border. CERN is also the birthplace of the World Wide Web, created by Tim Berners-Lee in 1990.

About the Conference

The workshop was organized by LIBER, SPARC-Europe and the CERN Library, and sponsored by SPARC (Scholarly Publishing and Academic Resources Coalition), JISC (Joint Information Systems Committee), OSI (Open Society Institute) and ESF (European Science Foundation). Some 136 participants from 9 countries attended, and 22 presentations were given [1].

It was aptly subtitled “Gaining independence with e-prints archives and OAI” and its focus was on the challenges in establishing OAI services rather than the technical issues of implementing the Open Archives Initiative – Protocol for Metadata Harvesting (OAI-PMH).

It was also an opportunity to meet and listen to some of the founders of the Open Archives Initiative, such as Michael Nelson, Carl Lagoze and Herbert Van de Sompel, and, more importantly, to interact with the international community that is coming to grips with the implementation of e-Prints archives.

The Open Archives Initiative is a growing movement to enhance access to e-Print archives as a means of increasing the availability of scholarly communication [2].

Programme

The workshop programme opened with a presentation by Herbert Van de Sompel on the just-released OAI-PMH v.2.0. This provided a technical overview and a look at its new features and functionality, with a hint of possible future enhancements including certification, usage logs, citation data and rights metadata.

Case studies of discipline based and institutional archives

There was a mix of case studies, looking first at discipline based archives and then at institutional archives, with presentations from both service and data providers. These included DSpace at MIT and the California Digital Library.

Jörgen Eriksson from Lund University in Sweden gave a talk on the implementation of an e-Prints service at Lund and demonstrated how e-Prints could offer departmental branding and sit within a departmental website. This has been a key strategy in encouraging the use of the service [3].

Stephen Pinfield, Project Director of the CURL SHERPA Project [4], presented the UK perspective. He gave an overview of the state of OAI in the UK and noted that the real challenge for institutions is to get content for their repositories. The UK’s FAIR programme is intended to address a range of these issues and has funded projects such as RoMEO [5], TARDIS [6], SHERPA and DAEDALUS [7].

John Ober presented an excellent case study of the California Digital Library (CDL). CDL was founded in 1997 and was “charged to create a comprehensive system for the management of digital scholarly information.” The CDL eScholarship Program has established a number of discipline based repositories as examples of alternative publishing services, such as Working Papers in Social Sciences and Humanities.

Technical Topics

Eric Van de Velde from Caltech talked about Caltech CODA (Collection of Open Digital Archives). CODA has some 1100 documents now available and has been set up using the publicly available ePrints.org and Virginia Tech e-theses software. They have established repositories for a number of subjects, including Computer Science and Earthquake Engineering, as well as a repository for theses. Since July 2002, e-submission of theses at Caltech has been compulsory. Caltech have Use Licences in place for their repositories, which identify the rights and conditions that must be accepted if a paper is to be deposited. To ensure future access to the content of CODA, each paper is assigned a unique identifier and a Resolver has been implemented to keep track of its location.
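The idea of pairing a unique identifier with a resolver can be illustrated with a minimal sketch. The talk as reported here does not describe how the CODA Resolver is built, so everything below is an assumption: the class name, the method names, the identifier format and the URLs are all hypothetical. The sketch only shows the principle that a citation carries the stable identifier while the resolver alone knows the current location.

```python
class Resolver:
    """Hypothetical resolver: maps a persistent identifier to a paper's current URL."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, url):
        # An identifier is assigned once and never reused for another paper.
        if identifier in self._locations:
            raise ValueError(f"identifier already assigned: {identifier}")
        self._locations[identifier] = url

    def relocate(self, identifier, new_url):
        # The paper has moved; the identifier stays the same, only the location changes.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_url

    def resolve(self, identifier):
        # Look up where the paper lives right now.
        return self._locations[identifier]


# Hypothetical identifier and URLs, for illustration only.
resolver = Resolver()
resolver.register("example:thesis:2002-001", "http://old-server.example.org/theses/1")
resolver.relocate("example:thesis:2002-001", "http://new-server.example.org/etd/1")
current_url = resolver.resolve("example:thesis:2002-001")
```

A citation that quotes the identifier remains valid after the move, because only the resolver's table had to change.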

Michael Nelson gave a paper entitled “Service Providers: Future Perspectives” and talked about the growth of service providers. There is currently a ratio of five data providers to one service provider. He noted the shift in topics from the protocol itself to actual services that can take advantage of it. He listed a range of service providers, from the original ARC service at Old Dominion University to the University of Michigan’s OAIster, as well as discipline specific providers such as ARCHON. The code for the ARC service provider is now publicly available [8]. Controversially, he then posited that the OAI-PMH is not actually important! This is because users will not care about it, and if it is used right there is no reason for them to know about it. It is just a core technology like HTTP, whose presence will simply be assumed. He concluded that service providers are now becoming a competitive market with a growing sophistication of services. The OAI-PMH will have arrived when the protocol itself fades into the background.
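The division of labour between data providers and service providers can be made concrete with a minimal sketch of what a service provider does when it harvests. This is not the ARC or OAIster code. The base URL, the record identifier and the sample response are hypothetical, and the response is embedded in the script so that nothing is fetched over the network. Only the request arguments (verb, metadataPrefix, resumptionToken) and the XML namespaces are taken from OAI-PMH 2.0.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL for a data provider."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it is sent with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{{{OAI_NS}}}record"):
        identifier = rec.findtext(f"{{{OAI_NS}}}header/{{{OAI_NS}}}identifier")
        # Take the first Dublin Core title, if the record has one.
        title = next((el.text for el in rec.iter(f"{{{DC_NS}}}title")), None)
        records.append({"identifier": identifier, "title": title})
    token = root.findtext(f".//{{{OAI_NS}}}resumptionToken")
    # A missing or empty token means the list is complete.
    return records, (token or None)


# Hypothetical response from a hypothetical data provider.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2002-10-17</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A hypothetical e-print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>token-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("http://example.org/oai")
records, token = parse_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A real service provider would repeat the request with each returned resumptionToken until none is given, then index the harvested metadata for searching. The end user only ever sees that search service, which is the sense in which the protocol fades into the background.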

Gaining Independence

An adjunct to the sessions on the OAI was a range of talks under the banner “Gaining Independence”. Alison Buckholtz from SPARC-Europe discussed the role that OAI compliant institutional repositories can play in reclaiming scholarly communication. Fred Friend gave a talk about the Budapest Open Access Initiative, which supports both self-archiving and the creation of new open access journals. These and the other talks in this session provided a timely context in which to ground the opportunities offered by embracing the OAI protocol. However, the presentation on Elsevier’s Scirus service indicated how commercial interests were already ‘harvesting the fruits’ of OAI-compliant repositories such as CogPrints and arXiv.org.

Practical steps to promote the OAI: Technical Issues

Saturday began with two more excellent talks, which showcased freely available software that enables institutions or disciplines to build OAI compliant data providers. Jean-Yves Le Meur gave a paper on the CERN Document Server software. This was made publicly available in August 2002 and is currently in use at CERN. Chris Gutteridge provided an overview of the GNU Eprints 2 software. He identified a range of issues that need to be considered before the software is installed, including the need to decide what it is to be used for. Chris also discussed how he saw interoperability between different e-Prints software packages.

Particularly interesting throughout the three days was hearing about e-Prints software beyond the widely used GNU Eprints from the University of Southampton [9]. Mackenzie Smith walked us through DSpace from MIT (now available) [10]. One of her main messages for university based archives was to engage senior management before implementing the software (this was to be a recurring theme). DSpace has a core digital preservation element, and its information model includes a much wider community and aspects of revenue earning on premium services. From Max Planck we heard about their eDoc software, which offers an archive, publication tools and a workspace for working groups. From CERN we saw a demonstration of their own CDSWare [11], a particularly sophisticated package offering application tools for automated document conversion; extraction of citations and keywords from the full text is also in progress.

Practical steps to promote the OAI: Political Issues

In the final session, Jean-Claude Guédon from the Université de Montréal talked about independence from an “academic” point of view. He proposed open access archives incorporating the peer review process and looked ahead to ‘federal’ editorial boards organized by groups of universities. The final talk of the workshop, by Bas Savenije, focused on FIGARO, a European e-publishing initiative for the academic community. FIGARO will build an e-publishing infrastructure and foster innovation to enhance scientific communication, tenets central to the Open Access Initiative.

Breakout sessions

At the end of Friday we all elected to join one of the breakout groups, which discussed: long term e-Archiving; non-commercial scientific journals; OAI protocols; OAI software; OAI services; document provider (server) implementation; human related problems around institutional OAI servers; and “Mise en service d’une archive numérique” (setting up a digital archive). Possibly the most contentious was the long-term e-Archiving issue, for which there is no definitive answer as yet, but which the group recognized should be an international responsibility.

A talk by Chris Pressler gave hope that, at least nationally in the UK, the Digital Preservation Coalition [12] will “address the urgent challenges of securing the preservation of digital resources in the UK and to work with others internationally to secure our global digital memory and knowledge base.”

Conclusion

This was a fascinating and very worthwhile workshop, which struck the right balance between technical issues and the real world challenges in building OAI services. It left the attendees enthusiastic and encouraged about the OAI and committed to building upon the momentum this workshop inspired.

Presentations of the workshop (including slides and video recordings) are available at http://doc.cern.ch/age?a02333

References

[1]   Detailed Agenda  http://documents.cern.ch/AGE/current/fullAgenda.php?ida=a02333
[2]   Open Archives Initiative  http://www.openarchives.org
[3]   Lund Medical Virtual Journal  http://search.medfak.lu.se/
[4]   SHERPA Project  http://www.sherpa.ac.uk/
[5]   Project RoMEO  http://www.lboro.ac.uk/departments/ls/disresearch/romeo/
[6]   TARDIS Project  http://tardis.eprints.org/
[7]   DAEDALUS Project  http://www.gla.ac.uk/daedalus/
[8]   ARC Cross Archive Search Service   http://arc.cs.odu.edu/
[9]   GNU Eprints 2 software  http://software.eprints.org
[10]   DSpace  http://www.dspace.org
[11]   CERN Document Server  http://cdsware.cern.ch/
[12]   Digital Preservation Coalition  http://www.dpconline.org

Author Details

William J Nixon
w.j.nixon@lib.gla.ac.uk
William is the Deputy Head of IT Services, Glasgow University Library and Administrator of the Glasgow ePrints Service. He is also the Project Manager: Service Development for DAEDALUS (University of Glasgow).

Pauline Simpson
ps@soc.soton.ac.uk
Pauline is Head of Information Services, Southampton Oceanography Centre and Faculty Liaison Leader for Science, Engineering and Math, University of Southampton. She is also the TARDIS (University of Southampton e-Prints) Project Manager.