Climbing the Scholarly Publishing Mountain With SHERPA

John MacColl; Stephen Pinfield

John MacColl and Stephen Pinfield explore the SHERPA project, which is concentrating on making e-prints available online.

JISC announced its FAIR Programme (Focus on Access to Institutional Resources) in January of this year. The central objective of the Programme is to test ways of releasing institutionally produced content onto the web. FAIR describes its scope as:

“to support access to and sharing of institutional content within Higher Education (HE) and Further Education (FE) and to allow intelligence to be gathered about the technical, organisational and cultural challenges of these processes.… This programme is part of a broader area of development to build an Information Environment for the UK’s Distributed National Electronic Resource.”(1)

It specifically sought projects in the following areas:

· Support for disclosure of institutional assets including institutional e-print archives and other types of collections through the use of the OAI (Open Archives Initiative) protocol.

· Support for the harvesting of the metadata disclosed through this protocol into services which can be provided to the community on a national basis. These services may be based around subject areas or other groupings of relevance for learning and research.

· Support for disclosure of institutional assets through the use of other relevant protocols, for example Z39.50 and RSS.

· Exploration of the deposit of institutional collections with a community archive or to augment existing collections which have proven learning, teaching or research value.

· Experiments with the embedding of JISC collections and services in local institutional portals and how well they can be presented in conjunction with institutionally managed assets.

· Studies into the related issues and challenges of institutional asset disclosure and deposit, including collections management, IPR, technical, organisational, educational, cultural and digital preservation challenges.

FAIR awarded funding to 14 projects in five ‘clusters’: museums and images, e-prints, e-theses, intellectual property rights, and institutional portals (details are given in the Appendix).

The Open Archives Initiative lay very firmly behind FAIR, as the call document says:

“This programme is inspired by the vision of the Open Archives Initiative (OAI) (http://www.openarchives.org), that digital resources can be shared between organisations based on a simple mechanism allowing metadata about those resources to be harvested into services.… The model can clearly be extended to include…. learning objects, images, video clips, finding aids, etc. The vision here is of a complex web of resources built by groups with a long term stake in the future of those resources, but made available through service providers to the whole community of learning.”(2)

The SHERPA project(3) represents the response of a number of major research libraries to this vision. It is concentrating on making ‘e-prints’ (electronic copies of research papers) available online. The bid was put together under the auspices of CURL (the Consortium of University Research Libraries), which is also contributing to the project funding. The project is being hosted by the University of Nottingham.

The research library perspective

The starting point of SHERPA is the view that the current system of research publication is not working. In this system the research community (predominantly universities) generates research output in the form of papers, which it then gives away free of charge to commercial publishers, who in turn sell it back to the research community at high prices. And the research community gives away its services not just as authors, but also as referees, editors and editorial board members, mostly free of charge. Ironically, the system does not ultimately work in researchers’ favour. As authors, they find the potential impact of their research output limited, since commercial publishers normally shield their work behind ‘toll gates’ (journal subscriptions or article pay-per-view charges). As readers of the literature, they are prevented by these toll gates from gaining easy access to all of the publications in their field. Even libraries in large, well-funded universities cannot afford subscriptions to anywhere near all peer-reviewed journals(4).

Academic libraries are thus placed in a difficult position. Journals account for a large and growing proportion of most academic library budgets. Over the last 15 years, journal prices have risen by about 10% a year, at a time when library budgets have grown by no more than 2 or 3% a year. Libraries have had to divert money from other budgets to maintain subscriptions, or simply cancel titles; in most cases, they have done both. As a result, many library managers have become increasingly frustrated by the system, and those in research universities more than most. It is, after all, these institutions, more than any others, that generate the research output which they must then buy back in large quantities and at high prices in order to support ongoing research. Librarians buying these publications on behalf of their institutions have been leading voices in saying ‘we cannot go on like this’.
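To make the effect of compounding concrete, the short Python sketch that follows applies only the rounded rates quoted above (about 10% a year for journal prices, 2% or 3% a year for library budgets, over 15 years). It is an illustration of those rounded figures, not a calculation from actual price or budget data.

```python
# Illustrative only: compounds the rounded annual rates quoted in the text.
YEARS = 15
PRICE_GROWTH = 0.10                                 # "about 10% a year" for journal prices
BUDGET_GROWTH_LOW, BUDGET_GROWTH_HIGH = 0.02, 0.03  # "no more than 2 or 3%" for budgets


def compound(rate, years):
    """Growth factor after `years` of compounding at `rate` per year."""
    return (1 + rate) ** years


price_factor = compound(PRICE_GROWTH, YEARS)
budget_low = compound(BUDGET_GROWTH_LOW, YEARS)
budget_high = compound(BUDGET_GROWTH_HIGH, YEARS)

print(f"Journal prices:  x{price_factor:.2f}")
print(f"Library budgets: x{budget_low:.2f} to x{budget_high:.2f}")
# Change in the share of the budget consumed by an unchanged set of subscriptions
print(f"Budget share of a fixed set of titles: "
      f"x{price_factor / budget_high:.1f} to x{price_factor / budget_low:.1f}")
```

With these inputs, prices grow roughly fourfold (x4.18) while budgets grow by only x1.35 to x1.56, so an unchanged set of subscriptions would take roughly 2.7 to 3.1 times its original share of the budget.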

One possible solution is ‘self-archiving’: authors make their own research output freely available outside the confines of commercial journals. Until recently, the best way of doing this was simply to mount it on a web site. However, this is not a particularly attractive prospect. It requires those carrying out literature searches to visit the web sites of individuals and research groups in potentially hundreds of different locations, or else to rely on standard web search engines. Neither approach gives reliable, comprehensive access.

The Open Archives Initiative(5) Protocol for Metadata Harvesting (OAI-PMH) is a technical development which addresses this problem. Through the use of a ‘lowest common denominator’ metadata format (unqualified Dublin Core, which every compliant repository must support), it allows those producing metadata for all types of digital objects to ‘expose’ their metadata on the internet. The metadata can then be automatically harvested, collected together and made available in a searchable form. The real potential of the protocol lies in its support for interoperability. It is a tool for building union catalogues from a potentially vast range of different collections, and so exploits the ubiquity of the internet to make virtually possible what is physically impossible. E-prints, whether ‘pre-prints’ (which have not yet been peer-reviewed) or ‘post-prints’ (which have), can be deposited and described by the authors themselves, or perhaps by third parties, and made easily available to users. Through the OAI-PMH, the metadata created can contribute to a vast worldwide network of easily searchable resources.
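The harvesting side of this model can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not SHERPA's software: it uses the OAI-PMH 2.0 verb ListRecords with metadataPrefix oai_dc, the standard OAI and Dublin Core XML namespaces, and the resumptionToken mechanism a repository uses when it splits a long result list across several responses. The helper names and any repository base URL passed to them are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# XML namespaces used in OAI-PMH 2.0 responses and in unqualified Dublin Core
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumption token replaces the other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (records, token) for one response; token is None on the last page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", default="", namespaces=NS),
            "titles": [t.text for t in rec.iterfind(".//dc:title", NS)],
            "creators": [c.text for c in rec.iterfind(".//dc:creator", NS)],
        })
    # A missing or empty resumptionToken element means the list is complete
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token.strip() or None)


def fetch_url(url):
    """Fetch one response over HTTP (needs network access to a real repository)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def harvest(base_url, fetch=fetch_url):
    """Yield every record from one repository, following resumption tokens page by page."""
    token = None
    while True:
        records, token = parse_records(fetch(list_records_url(base_url, token)))
        yield from records
        if token is None:
            break
```

The `fetch` parameter is a design choice for the sketch: it lets the paging logic be exercised against canned responses, while `harvest("http://eprints.example.ac.uk/oai")` (a hypothetical address) would query a live repository.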

Of course, the ‘invisible college’ has always operated like this, albeit in a limited way. Researchers do in some cases make free copies of their research available to their peers, via conferences and on web sites. An interesting variant of this is the culture of working papers produced by academic staff at particular institutions. However, this is an exclusive method of communication. Senior researchers in any discipline will know which institutions across the world have the strongest departments, or those with research interests which match their own; but what about junior researchers, or researchers in interdisciplinary areas? They may miss out on this research, and its potential impact is then still limited. Making searchable metadata about these papers easily available would be a big step forward in addressing this problem.

Benefits of OAI-PMH to institutions and their libraries

With a system of OAI-compliant archiving, e-print repositories could replicate content otherwise available only commercially. Making content freely accessible in this way has the potential to improve scholarly communication (by lowering barriers to access and to impact), but it also has the potential to save institutions and their libraries money. Freeing up access to the research literature and ensuring it is easily searchable will mean that commercial publishers have to pare down their profit margins and concentrate on adding value in order to retain customers.

But of course, it is likely to take a long time before a critical mass of content is available. This is a massive mountain to be climbed. In some disciplines, real progress has already been made. The case of the high-energy physicists, who have been using arXiv.org(6) for more than a decade, is well known, but few other disciplines have yet shown an interest in organising themselves around a centralised, discipline-specific repository in this way.

One suggested means of redressing this is to put the emphasis on repositories at the institutional level rather than the disciplinary level. That is what the SHERPA project, located within the e-prints cluster of the FAIR Programme, will seek to test in the UK. If the impetus comes from within the university, with institutional support mechanisms in place to permit the growth of an institutional repository, then the current unevenness in the disciplinary spread of the free corpus may be reduced(7). Over time, the argument goes, a snowball effect will operate within institutions, and at a national (and international) level, so that a multi-disciplinary free collection of research literature can be built.

The institutional library service is in many ways the natural co-ordinator of this activity, performing the role of infrastructure provider. As part of the SHERPA project, a number of CURL libraries will begin to take on this role. Six open access e-print repositories will be funded within the project: at the Universities of Edinburgh, Glasgow, Oxford and Nottingham, together with a shared archive within the ‘White Rose’ partnership of York, Leeds and Sheffield, and one at the British Library for the research outputs of ‘non-aligned’ researchers. They will use the open source eprints.org(8) software produced by the University of Southampton. The project will investigate the technical and managerial aspects of running these repositories. After the initial work is complete, it is hoped that other institutions will be able to come on board.

SHERPA will be setting up OAI-compliant e-print repositories, but it will not (in the first instance at least) be creating aggregated search services. These will be developed by others, including new projects funded as part of FAIR. One such project, E-prints UK, will be working in partnership with SHERPA to establish the best ways of creating metadata so that it can be harvested effectively. One of the key elements of OAI is this separation between repositories (‘Data Providers’) and search services (‘Service Providers’). FAIR gives us an opportunity to try this model out within real organisations. With this experience, SHERPA hopes to be in a good position to advise others on setting up such services from scratch.
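The division of labour between Data Providers and Service Providers can be illustrated with a toy Service Provider in Python. It is a sketch under stated assumptions, not how E-prints UK or any other project works: it takes records already harvested from several repositories, each assumed to be a dictionary with hypothetical keys identifier, titles and creators, and merges them into one union catalogue with a simple word index. The repository names, identifiers and word-level matching are all illustrative; a real service would use a proper search engine.

```python
from collections import defaultdict


def build_union_index(harvests):
    """Merge records from several Data Providers into one catalogue plus a word index.

    `harvests` maps a (hypothetical) repository name to a list of record dicts,
    each with "identifier", "titles" and "creators" keys.
    """
    catalogue = {}
    index = defaultdict(set)
    for repository, records in harvests.items():
        for record in records:
            # Assumes each record identifier is unique across all repositories
            catalogue[record["identifier"]] = {**record, "source": repository}
            for text in record["titles"] + record["creators"]:
                for word in text.lower().replace(",", " ").split():
                    index[word].add(record["identifier"])
    return catalogue, index


def search(index, query):
    """Return identifiers of records whose titles or creators contain every query word."""
    words = query.lower().replace(",", " ").split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

For example, indexing one record from each of two hypothetical repositories and searching for "journal pricing" returns both identifiers, while "physics" matches only the record whose title contains that word; the repositories themselves never need to know about the search service.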

In the short term, the biggest challenge of all is not technical or managerial but cultural. We need to convince academics that they must also join the expedition. Librarians should now take on the role of advocates for change, and SHERPA will aim to contribute to this advocacy. Major campaigns supporting the institutional archive agenda will be mounted in CURL institutions, and SHERPA also hopes to contribute to the wider campaign beyond them, for example by putting the materials it uses and the lessons it learns into the public domain. It hopes to be one of the growing number of voices in the academic community arguing for change.

Quality content

One of the key ways of winning over researchers is by demonstrating that e-print repositories can provide access to quality literature. It is widely held that free literature on the web is usually of poor quality and that open access repositories are not an appropriate medium for publishing peer-reviewed research. For this reason SHERPA aims to concentrate on collecting refereed content. It will not reject other forms of paper, but it will seek post-prints as its first priority. Authors will be encouraged to deposit their work in their institutional repository as well as publishing it in journals. Having a good proportion of refereed articles searchable within the SHERPA corpus will help to demonstrate the viability of the approach. Another reason to focus on refereed material is that this is likely to determine which items in the SHERPA collections are selected for digital preservation. While a pre-print which an author never intends to submit for peer review may still be worth preserving, the general approach will be to preserve articles once they are in their final form, and this is most easily confirmed by their appearance in the journal literature.

SHERPA’s approach will therefore be to collect papers which have also been (or will be) published in the peer-reviewed literature. For this reason, SHERPA is keen to engage publisher support for the project. Indeed, the project’s name, ‘Securing a Hybrid Environment for Research Preservation and Access’, is designed to convey this. This particular ‘hybrid environment’ is one in which a free corpus of research literature can exist alongside a commercial one, without necessarily being in conflict with it. As the example of high-energy physics shows, open access e-print archives do not necessarily kill journals. Journals may, however, have to change their roles, perhaps focusing on managing the peer-review process and adding value to the basic content (both of which, of course, cost money) rather than acting as sole distributors of content. The SHERPA project wants to work alongside publishers to investigate how scholarly communication may take shape in the future.

Copyright

A key issue here is copyright. It is common for commercial publishers to require authors to sign over copyright to them before they will publish an article. In some cases, this will give the publisher exclusive publication rights and the author will not be able to self archive the paper. The idea that authors should continue to submit their work to journals but also post their work on e-print repositories runs into problems here. How can projects like SHERPA deal with this?

Firstly, it should be recognised that not all publishers require copyright sign-over. A good number of publishers allow authors to keep copyright. Since authors (to a certain extent) have a choice about where they place their papers, advocates of self-archiving can encourage authors to place their papers with publishers of this sort and thus retain copyright. Where publishers do require copyright sign-over, the author is sometimes still permitted to distribute a paper for non-commercial purposes outside the confines of the journal, and some publishers have copyright agreements which explicitly allow the posting of e-prints. Once again, authors can be encouraged to submit papers to these publishers. SHERPA will also examine the copyright agreements of different publishers and publicise what they do and do not allow.

Where exclusive rights are normally expected to be signed over, a number of strategies may be adopted. First, SHERPA intends to help authors negotiate with publishers for permission to self-archive. One way of doing this may be to produce a standard ‘back licence’ document that authors can append to publisher copyright agreements. Such a back licence might state that the author signs the publisher’s licence subject to the terms of the back licence, which in turn reserve to the author the right to self-archive the work in a non-commercial repository. In other cases, SHERPA hopes to negotiate directly with publishers to persuade them to grant the project a blanket waiver allowing articles to be posted on SHERPA archives, at least for the duration of the project.

This may not be as difficult as it might at first appear. One of the professors of informatics at the University of Edinburgh, who is editor-in-chief of an Elsevier journal in informatics, recently pressed Elsevier on its policy regarding e-prints. He received a reply in an article entitled ‘Recent Elsevier Science Publishing Policies’, published in the Bulletin of the European Association for Theoretical Computer Science for October 2001, which stated:

‘… the exclusive distribution rights obtained by Elsevier Science refer to the article as published, bearing our logo and having exactly the same appearance as it has in the journal. Authors retain the right to keep preprints of their articles on their homepages (and/or relevant preprint servers) and to update their content, for example to take account of errors discovered during the editorial process, provided these do not mimic the appearance of the published version. They are encouraged to include a link to Elsevier Science’s online version of the paper to give readers easy access to the definitive version.’(9)

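This is an interesting departure for Elsevier and perhaps indicates that some publishers are keen to investigate these issues further. Even where publishers show no such interest, authors still have options. SHERPA will also investigate ways in which the Harnad-Oppenheim strategy(10) (self-archiving the pre-print before submission and, if the publisher will not permit the refereed version to be archived, self-archiving a list of the corrections made during refereeing) can be employed effectively and appropriately.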

Digital preservation

The SHERPA project is also keen to pursue another objective. The CURL Directors, in considering the potential of the Open Archives Initiative, were very interested in the archiving dimension. They wanted a project which would ‘put the archiving into Open Archives’. The reason for this is that, as we move into an electronic-journal-dominated future for research, there are real concerns emerging about the preservation of digital material. Who should take responsibility for the preservation of the academic record? This has traditionally been a research library activity.

Peter Hirtle, writing in D-Lib in April 2001, stated:

“an OAI system that complied with the OAIS reference model, and which offered assurances of long-term accessibility, reliability, and integrity, would be a real benefit to scholarship.”(11)

OAIS is the Open Archival Information System(12) (a completely different standard from OAI-PMH), which emerged in 1999 from work at NASA on a reference model for preserving space data. The model was taken up by the digital preservation world generally, and used within the JISC-funded CURL Exemplars in Digital Archives (CEDARS) project(13). Having initiated the CEDARS project, which has been successfully developing the OAIS model since 1998, CURL has a strong interest in implementing an OAIS-based digital preservation project. We expect that SHERPA will also engage in digital preservation activity for the contents of its archives later in the project, and we are talking to funding agencies and various partners about the prospects for this.

Conclusion

The current structure of scholarly communication may have made some sense in a paper-based world. However, in a digital world it is looking increasingly anomalous. Where there is a need for the rapid and wide dissemination of content to the research community, it is found wanting. It is also extremely expensive for the very research community it is trying to serve. The development of institutional repositories is one possible response to the current problems, and SHERPA is one project which hopes to go some way towards testing this model. There are key technical, managerial and cultural issues which need tackling urgently. As the project begins to do this, it will disseminate the lessons learned to the wider community in the hope that others will begin the process as well. SHERPA is, of course, just one project within a larger programme, and FAIR is just one programme within a larger set of international developments. But it is hoped that FAIR projects, along with others working in this area, can begin to generate the momentum needed to improve the way in which scholarship is carried out in the future.

Appendix: FAIR projects

Museums and Images Cluster (4 projects)

· Petrie Museum, University College London - Accessing the Virtual Museum

· Fitzwilliam Museum, University of Cambridge; Archaeology Data Service, University of York - Harvesting the Fitzwilliam

· AHDS Executive, King’s College London; Theatre Museum, V&A; Courtauld Institute of Art, University of London; Visual Arts Data Service, University of Surrey; Performing Arts Data Service, University of Glasgow - Partial Deposit

· ILRT, University of Bristol; University of Cambridge - BioBank

E-Prints Cluster (4 projects)

· CURL (University of Nottingham; University of Edinburgh; University of Glasgow; Universities of Leeds, Sheffield and York (‘White Rose’ partnership); University of Oxford; British Library) - SHERPA (Securing a Hybrid Environment for Research Preservation and Access)

· RDN, King’s College London; University of Southampton; UKOLN, University of Bath; UMIST; University of Bath; University of Strathclyde; University of Leeds; ILRT, University of Bristol; Heriot Watt University; University of Birmingham; Manchester Metropolitan University; University of Oxford; University of Nottingham; OCLC - E-prints UK

· University of Strathclyde; University of St. Andrews; Napier University; Glasgow Colleges Group - Harvesting Institutional Resources in Scotland Testbed

· University of Southampton - Targeting Academic Research for Deposit and dISclosure

E-Theses Cluster (3 projects)

· Robert Gordon University; University of Aberdeen; Cranfield University; University of London; British Library - Electronic Theses

· University of Edinburgh - Theses Alive!

· University of Glasgow - DAEDALUS

Intellectual Property Rights Cluster (1 project)

· Loughborough University; Birkbeck College, University of London; University of Greenwich; University of Southampton - Machine-readable rights metadata

Institutional Portals Cluster (2 projects)

· University of Hull; RDN, King’s College London; UKOLN, University of Bath - Presenting natiOnal Resources To Audiences Locally

· Norton Radstock College, Bristol; City of Bath College; City of Bristol College; Filton College, Bristol; Weston College, Weston-super-Mare; Western College Consortium, Bristol - FAIR Enough

Author Details

John MacColl is Sub-Librarian (Online Services) and Director of SELLIC at the University of Edinburgh.

Stephen Pinfield is Assistant Director of Information Services at the University of Nottingham and Director of SHERPA.

Both are members of the CURL Task Force for Scholarly Communication.

References

(1) <http://www.jisc.ac.uk/pub02/c01_02.html>

(2) <http://www.jisc.ac.uk/pub02/c01_02.html>

(3) <http://www.sherpa.ac.uk>

(4) See Stevan Harnad, ‘The self-archiving initiative’, Nature: webdebates. <http://www.nature.com/nature/debates/e-access/Articles/harnad.html>

(5) See <http://www.openarchives.org>

(6) <http://www.arxiv.org>

(7) See Raym Crow, The case for institutional repositories: a SPARC position paper. Washington, DC: SPARC, 2002. Release 1.0. <http://www.arl.org/sparc/IR/ir.html>

(8) <http://www.eprints.org>

(9) Arjen Sevenster ‘Recent Elsevier Science publishing policies’. Bulletin of the European Association for Theoretical Computer Science 75, October 2001, 301-303

(10) Stevan Harnad, ‘For whom the gate tolls? How and why to free the refereed research literature online through author/institution self-archiving, now’, Section 6. <http://www.cogsci.soton.ac.uk/~harnad/Tp/resolution.htm#Harnad/Oppenheim>

(11) Peter Hirtle, ‘Editorial: OAI and OAIS: What’s in a name?’ D-Lib Magazine 7, 4, April 2001 <http://www.dlib.org/dlib/april01/04editorial.html>

(12) See Consultative Committee for Space Data Systems, Reference model for an open archival information system (OAIS), 1999. <http://www.ccsds.org/documents/p2/CCSDS-650.0-R-1.pdf>

(13) http://www.leeds.ac.uk/cedars/