How to Publish Data Using Overlay Journals: The OJIMS Project
The previous article about the Overlay Journal Infrastructure for Meteorological Sciences (OJIMS) Project [1] dealt with an introduction to the concept of overlay journals and their potential impact on the meteorological sciences. It also discussed the business cases and requirements that must be met for overlay journals to become operational as data publications.
There is significant interest in data journals at this time as they could provide a framework to allow the peer-review and citation of datasets, thereby encouraging data scientists to ensure their data and metadata are complete and valid, and granting them academic credit for this work. This would also benefit the wider community, as data publication would ensure that expensive (and often irreproducible) data are archived and curated appropriately. Science, as a discipline, benefits from publishing processes that facilitate the appropriate application of data and the reproducibility of experiments.
The OJIMS Project aimed to develop the mechanisms that could support both a new (overlay) Journal of Meteorological Data and an Open-Access Repository for documents related to the meteorological sciences. Its work was conducted by a partnership between the Royal Meteorological Society (RMetS) and two members of the National Centre for Atmospheric Science (NCAS), namely the British Atmospheric Data Centre (BADC) and the University of Leeds.
This article goes into more technical detail about the OJIMS Project, giving details of the software used to deploy a demonstration data journal and operational document repository and the form of the submission processes for each.
OJIMS Aims and Objectives
Aims
At the start of the OJIMS Project, there were three fundamental aims:
- Creation of overlay journal mechanics
- Creation of an open access subject-based repository for meteorology and atmospheric sciences
- Construction and evaluation of business models for potential overlay journals
The third aim has been detailed in our previous article [1], so this contribution will concentrate on the details of the first two aims.
Objectives
The specific objectives of the project were as follows.
Repository Set-up
Set up a repository for meteorology and atmospheric sciences capable of preserving documents relating to the subject area with the following in mind:
- The repository should take peer-reviewed publications, 'grey' literature (which includes technical reports, images, video, podcasts etc.) and structured metadata documents.
- Create the repository's deposit and access policies.
Demonstration Overlay System
Create a demonstration overlay journal system with the following aspects addressed:
- The system must present an online journal to the reader, and be capable of organising the workflows associated with the peer-review process.
- Construct a prototype data journal (MetData) in order to evaluate its sustainability. This will include review procedures, presentation and trial content.
- Construct a prototype 'star-rated' overlay journal (MetRep) in order to evaluate its sustainability. This will include review procedures, presentation and trial content.
Most of these objectives remained the same over the course of the project, though time spent working on the prototype 'star-rated' journal was reduced in order to spend more time on the construction of the prototype data journal. This was decided after in-depth user surveys (as reported in [2] [3]) suggested that the meteorological and atmospheric science communities were more interested in a data journal than the provision of a 'star-rated' overlay journal (mainly due to the low levels of documents in pre-existing repositories). It should be pointed out that the software developed to provide the overlay documents for the data journal is nonetheless equally applicable to the 'star-rated' journal.
However, after examining the business models, we discovered that the creation and operation of the data and 'star-rated' journals themselves fell outside the project scope, as such work requires a long-term commitment from a journal publisher.
Methodology
The main project issues were:
- Integration of RMetS current practice with the new overlay journals
- Copyright for overlay journals
- Copyright for documents deposited in the subject-based repository. The copyright issues for published papers are fairly clear; however the authorship of technical reports and other 'grey' literature is often less than completely clear
- Dataset peer-review processes
- Technical implementation of overlay journals
- Viability of business models
Figure 1 gives an overview of the components required for this project and their interactions. It is worth noting that the software requirements for the data journal and the overlay subject repository are very similar, hence the same basic software (with minor modifications) can be used for both the data journal and overlay subject repository.
The OJIMS Project Web site was produced to act as a dissemination point for the results of the project, and as a collaboration tool for the project partners. The Web site [4] will remain operational for several years after the project ends to publicise the project results.
Implementation
The work of the OJIMS Project was conducted by a partnership between the Royal Meteorological Society (RMetS) and two members of the National Centre for Atmospheric Science (the British Atmospheric Data Centre and the University of Leeds).
Building the National Centre for Atmospheric Science (NCAS) Document Repository
A key deliverable of the OJIMS Project was to create a discipline-based open access document repository embedded within the BADC. There were two main requirements for the subject repository:
- A suitable place to lodge grey literature
- Mechanics for the creation of records that describe documents in other repositories (overlay documents)
The overlay document requirements are considered in the data journal developments (see Creating the Infrastructure for Overlay Journals) so the subject repository development concentrated on identifying how to provide a suitable place to lodge grey literature.
The deposit policy, documentation and training process for maintenance of the repository system were all developed during the project. The full deposit policy is available on the repository site [5]. It is broken down into separate metadata, data, content, submission and preservation policies. Key parts of the policy are that anyone can access the metadata, full-text and other full data items stored in the repository free of charge, and that items stored in the repository will be retained indefinitely.
Implementation of the subject repository was done by installing the EPrints software (version 3) on a Xen (virtual server) platform running Red Hat Enterprise. The basic configuration was supplemented by:
- Using the standard subject categories used by NORA [6]
- Branding and look-and-feel tweaks
- Adding policy information (using the OpenDOAR [7] policy tool)
- SNEEP [8] extensions for adding comments and tags to repository content
After populating the repository with some sample content, and training BADC staff to administer the repository, the repository was launched on 30 October 2008, and advertised to BADC users. Documents already held by the BADC and NEODC were added to the repository. The repository has been running operationally since launch as the Centre for Environmental Data Archival Document Repository (CEDA Docs [9]).
The repository has the standard EPrints interface with the addition of the tags and comments extensions from the SNEEP Project. The standard repository workflows apply. The repository currently has over 200 items mainly added by BADC staff from existing material held within the data centre. 27 users are registered with the repository.
The OJIMS Project provided the funding to run the CEDA document repository for a year, with the principal expenditure devoted to moderating the deposit of new items into the repository. The sustainability and cost modelling of the repository were also investigated, and the costs of running the repository within the BADC in the long term were not found to be prohibitive. Hence the repository will be maintained for the foreseeable future now that the OJIMS Project has ended.
Creating the Infrastructure for Overlay Journals
The infrastructure requirements for the overlay journals are similar, regardless of whether the overlay journal is a data journal, or a 'star-rated' journal. The project team examined current overlay infrastructure tools and technologies and chose the Open Journal Systems (OJS) because of its open source nature and the ease of adaption. A series of interfaces and forms were generated for the publishers and authors, including a peer-review management interface and issue construction interface for publishers, and a submission interface form for authors.
Overlay Documents for the Repository and Data Journal
An overlay document is a structured document that is created to annotate another resource with information on the quality of the resource. This document can be referred to as the data description document. However, it contains more than just a description of the data, including, for example, details of the review process context for which it is constructed. It is for this reason that the term 'overlay document' has been coined. The document has three basic elements:
- metadata about the overlay document itself;
- information about and from the quality process for which the document was constructed; and
- basic metadata from the referenced resource to aid discovery and identification.
When considering how to encode this information, project staff considered various implementation methods; as this is an annotation document, RDF seemed appropriate. It is potentially harder to render RDF documents for human readers because of RDF's more complex data representation, but as the structure of these documents is not overly complex, it can be done. We took inspiration from annotations of Flickr photos by Masahide Kanzaki [10].
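To make the three-part structure concrete, the following is a minimal sketch of how such an overlay document might be serialised as RDF/XML using only the Python standard library. It is not the project's actual schema: the `oj` namespace, the `reviewProcess` and `annotates` property names, the function name and all example URIs are assumptions introduced for illustration. Only the RDF and Dublin Core namespaces are the standard ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace for review-process terms; NOT the project's real vocabulary.
OJ_NS = "http://example.org/overlay-journal#"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oj", OJ_NS)


def build_overlay_document(doc_uri, doc_title, doc_creator,
                           review_process, dataset_uri, dataset_title):
    """Return an RDF/XML string holding the three parts of an overlay document."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")

    # 1. Metadata about the overlay document itself (Dublin Core).
    overlay = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                            {f"{{{RDF_NS}}}about": doc_uri})
    ET.SubElement(overlay, f"{{{DC_NS}}}title").text = doc_title
    ET.SubElement(overlay, f"{{{DC_NS}}}creator").text = doc_creator

    # 2. Information about the quality process the document was built for.
    ET.SubElement(overlay, f"{{{OJ_NS}}}reviewProcess").text = review_process

    # Link from the overlay document to the resource it annotates.
    ET.SubElement(overlay, f"{{{OJ_NS}}}annotates",
                  {f"{{{RDF_NS}}}resource": dataset_uri})

    # 3. Basic metadata copied from the referenced resource to aid discovery.
    dataset = ET.SubElement(root, f"{{{RDF_NS}}}Description",
                            {f"{{{RDF_NS}}}about": dataset_uri})
    ET.SubElement(dataset, f"{{{DC_NS}}}title").text = dataset_title

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_overlay_document(
        "http://example.org/metdata/1",      # placeholder URI
        "Rain gauge data description",
        "A. Author",
        "Reviewed by two independent referees",
        "http://example.org/dataset/42",     # placeholder URI
        "Rain gauge measurements",
    ))
```

Keeping the overlay document and the dataset as separate RDF descriptions mirrors the idea that the overlay annotates the dataset rather than containing it.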
Only openly available software was used to create the overlay document editor and the structure for the data journal. Any modifications made to the software during the project have been made freely available in the Subversion repository on the OJIMS Web site [4].
The creation of the overlay documents used in the overlay journals required a custom-built editor system. This was written using the Pylons Web application framework. The editor system supported creation of documents with XML schema, Dublin Core fields for the overlay documents themselves and, for the overlaid dataset, metadata for the data centre. The OJIMS editor is also freely available from the Subversion repository on the OJIMS site and will remain there for the foreseeable future.
Policies and Procedures for the 'Star-rated' and Data Overlay Journal
This work, led by the RMetS, concentrated on producing viable business plans, as well as submission and acceptance policies for the data and 'star-rated' journal.
The main tasks for the data journal included:
- Work out acceptance policy for datasets
- Formalise interaction with the overlay journal infrastructure
For the 'star-rated' overlay journal, the tasks included:
- Establish 'kite-marking'/'star-rating' criteria and methodologies
- Formalise the 'star-rating' process
Both types of overlay journal required sustainability and business modelling. Full details of the policies and procedures for data and star-rated journals can be found in the business models report [11].
For the data journal the acceptance policy for datasets depends on the subject area covered by the data journal and whether the datasets are stored in an existing data centre that satisfies standards of good practice in archiving and data management and which is registered with the data journal. For example, for a data journal specialising in meteorological data, a dataset of rain gauge measurements stored in the BADC (or other accredited data centre) would be appropriate for publication, while a dataset on road traffic flows would not.
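The two conditions in this acceptance policy can be expressed as a simple check. The sketch below is illustrative only: the `Dataset` fields, the subject labels and the set of registered data centres are assumptions, and a real journal would apply editorial judgement rather than string matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    title: str
    subject: str      # subject area of the dataset (hypothetical labels)
    data_centre: str  # data centre in which the dataset is archived


def is_acceptable(dataset, journal_subjects, registered_centres):
    """Apply both policy conditions: in scope AND held in a registered data centre."""
    in_scope = dataset.subject in journal_subjects
    properly_archived = dataset.data_centre in registered_centres
    return in_scope and properly_archived


# Example configuration for a meteorological data journal (assumed values).
journal_subjects = {"meteorology", "atmospheric science"}
registered_centres = {"BADC"}

rain = Dataset("Rain gauge measurements", "meteorology", "BADC")
traffic = Dataset("Road traffic flows", "transport", "BADC")

if __name__ == "__main__":
    for candidate in (rain, traffic):
        verdict = is_acceptable(candidate, journal_subjects, registered_centres)
        print(candidate.title, "->", "accept" if verdict else "reject")
```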
The contents of the data journal could be categorized in the following ways:
- Experimental campaigns
- Numerical modelling projects
- Operational systems (systems which are delivering a service and so have to be resilient and available, e.g. collection of radar data for input into numerical weather models or weather forecasts)
- Instruments and observing facilities (as used for scientific campaigns etc., where precision may be more important than resilience)
For the overlay journal and document repository, two types of ratings for the referenced documents were proposed. The first rating advises readers on how far the material has gone through the independent peer-review process, giving four ratings as explained in Figure 3.
The second form of rating comes from the users of the overlay journal (Figure 4), where users could rate the entry out of 10. The average rating would be displayed alongside the number of reviews and number of downloads.
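As an illustration of the figures such a display would need, the sketch below tracks user ratings and downloads for one entry and derives the average. The class and method names are hypothetical, and the accepted range of 0 to 10 inclusive is an assumption, since the proposal only states that entries are rated out of 10.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class EntryStats:
    """User feedback for one overlay journal entry (hypothetical structure)."""
    ratings: list = field(default_factory=list)
    downloads: int = 0

    def add_rating(self, score):
        # Assumption: scores run from 0 to 10 inclusive.
        if not 0 <= score <= 10:
            raise ValueError("rating must be between 0 and 10")
        self.ratings.append(score)

    def record_download(self):
        self.downloads += 1

    def summary(self):
        """Return the values shown next to the entry."""
        average = round(mean(self.ratings), 1) if self.ratings else None
        return {
            "average": average,
            "reviews": len(self.ratings),
            "downloads": self.downloads,
        }


if __name__ == "__main__":
    stats = EntryStats()
    for score in (7, 8, 9):
        stats.add_rating(score)
    stats.record_download()
    print(stats.summary())
```

Returning `None` for an entry with no ratings avoids displaying a misleading average of zero.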
The Data Journal
A demonstration overlay journal system used to produce a data journal has the following requirements:
- Tools to create the data description documents for the author
- Inclusion of simple metadata in the data description documents about the document and the dataset referenced
- Inclusion of data description documents in standard journal processes, like submit, search, view and review. These same functions are expected of normal journal articles
- Unambiguous reference to datasets in long-term data centres
The production of an overlay document repository can be done using an analogous process.
Figure 5 gives a schematic view of the data journal structure. The data journal contains a database of XML documents relating to various published datasets. These XML data description documents contain links to the datasets as they are published in various accredited data repositories. The data journal editor edits these XML files, but does not make any changes whatsoever to the underlying datasets.
The approach taken in developing the demonstration system was to use as many standard online journal technologies as possible, thereby introducing all the functions of journals without engineering new solutions. Various online journal systems were considered, including the Open Journal Systems (OJS), Digital Publishing System (Dpubs) and Hyperjournal. OJS was chosen because of its open source nature and the ease of adaption. The RIOJA [12] Project also used this software for exactly these reasons.
The approach used was to add the data description documents into the standard workflow of the journal software. The additional elements needed were a tool to author the data description documents and a method to render the documents.
To create these documents a Web-based authoring tool was developed. This was done using the Pylons Web application framework, which allows the rapid development of Web applications in the Python programming language. The code for this application is available from the Subversion repository on the OJIMS Web site [13]. The editor requires input of metadata about the overlaid dataset and other information such as the author of the document. It also adds information set and constrained by the data journal's review processes. For example, a text description of the review process is the same for all documents and is simply inserted from the editor's configuration.
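The split between author-supplied metadata and journal-fixed text can be sketched as follows. This is not the OJIMS editor's code and does not use Pylons: the element names, the configuration keys and the placeholder review text are all assumptions made for illustration, using only the Python standard library.

```python
import xml.etree.ElementTree as ET

# Values fixed by the journal rather than by the author.
# The keys and the review text are hypothetical placeholders.
EDITOR_CONFIG = {
    "journal_name": "MetData",
    "review_process": "Placeholder description of the journal's review process.",
}


def create_description_document(author, dataset_title, dataset_url,
                                config=EDITOR_CONFIG):
    """Combine author input with journal-wide settings into one XML document."""
    root = ET.Element("dataDescription")

    # Supplied by the author through the editor form.
    ET.SubElement(root, "author").text = author
    dataset = ET.SubElement(root, "dataset", {"href": dataset_url})
    ET.SubElement(dataset, "title").text = dataset_title

    # Inserted unchanged from the editor's configuration for every document.
    ET.SubElement(root, "journal").text = config["journal_name"]
    ET.SubElement(root, "reviewProcess").text = config["review_process"]

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(create_description_document(
        "A. Author",
        "Rain gauge measurements",
        "http://example.org/dataset/42",  # placeholder URL
    ))
```

Because the review text comes from configuration, every document produced by the same editor instance carries an identical description of the review process, and authors cannot alter it.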
The XML documents produced by the editor were rendered into a human-readable document using an XSLT style sheet when viewing through the data journal interface (see screenshots below).
Outcomes of the OJIMS Project
The main project achievements have included:
- The project has developed a business case for data journals on behalf of the academic publishing community. The RMetS has evaluated the technologies and business cases associated with new overlay journals. It is hoped that this will lead to the publication of a data journal in the near future.
- The project has also developed some of the software technologies required to run a data journal.
- The project brings data journals closer to actual realisation. Should a data journal be developed and run, this will allow data scientists to gain academic credit for their work producing the datasets.
- Further, peer review of datasets will ensure the quality of the datasets while publishing will ensure that more datasets are properly curated and archived, and are more widely available.
- The document repository [9] is now fully operational and will be a resource freely available to members of the atmospheric science community and a source for documentation about the datasets stored at the BADC.
- Expanding document repositories to include a wider range of citable material is of benefit and interest to the user community. The CEDA docs repository will continue to collect material ('grey' literature) outside the scope of journal articles from numerous sources.
Impact on the Meteorological Sciences Research Community
A significant part of the OJIMS project work was the survey of scientists and organisations, which served both to introduce the work of the project and to capture the requirements for the data journal and document repository. The results from these surveys are documented in the reports OJIMS Survey of Organisations [2] and OJIMS Survey of Scientists [3].
These surveys and presentations at conferences and meetings served to kick-start a community debate on what materials need archiving and which should be regarded as 'publication-quality'. The OJIMS project has a high profile within the repository and atmospheric science communities. At the recent NERC Data Management Workshop (February 2009 [14]) the OJIMS Project was mentioned in more than one keynote speech, with special emphasis on the data journal and its potential ability to provide academic credit for those data scientists who publish their data.
Conclusions and Recommendations
The OJIMS Project has demonstrated that standard online journal technologies are suitable for the development and operation of a data journal as they allow the use of all the functions of journals without the need to engineer new solutions.
OJIMS also showed that there is a significant desire in the meteorological sciences community for a data journal, as this would allow scientists to receive academic recognition (in the form of citations) for their work in ensuring the quality of datasets. The funders of the research that produces these data also benefit from data publication as it raises the profile of the data, ensuring reuse. Furthermore, such publication encourages the scientists involved to submit to accredited data repositories, where their data will be properly archived.
With regard to standards, the OJIMS data journal system chosen was the Open Journal Systems (OJS) and the repository software was EPrints. Both OJS and EPrints were chosen because of their open source nature and their ease of adaption. They also offer standard interfaces such as OAI-PMH [15].
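As an indication of what an OAI-PMH interface offers to harvesters, the sketch below builds request URLs using the six verbs defined by the protocol and the `oai_dc` metadata prefix that every OAI-PMH repository must support. The base URL is a placeholder, not the address of the CEDA repository or of any OJS installation, and the helper function is hypothetical. No network request is made.

```python
from urllib.parse import urlencode

# The six request types defined by OAI-PMH version 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, arguments=None):
    """Build an OAI-PMH request URL; the caller supplies the repository base URL."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments or {})
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    base = "https://repository.example.org/oai"  # placeholder endpoint
    # Ask for all records as simple Dublin Core.
    print(oai_request_url(base, "ListRecords", {"metadataPrefix": "oai_dc"}))
    # Ask the repository to describe itself.
    print(oai_request_url(base, "Identify"))
```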
The overlay document schema incorporated Dublin Core metadata and used RDF to encode the needed information.
The project endeavoured to make use of pre-existing and mature software to implement the document repository and the overlay journal infrastructure, modifying it as appropriate. This was to ensure ease of use and stability of the resulting software.
The OJIMS Project recommends that further work be done on the implementation and operation of a data journal. The authors are aware of one data journal currently in operation, the Earth System Science Data Journal (ESSD) [16], which has four papers in its library at the time of writing.
Acknowledgements
The authors would like to acknowledge the Joint Information Systems Committee (JISC) as the principal funder of the OJIMS Project under the JISC Capital Programme call for Projects, Strand D: 'Repository Start-up and Enhancement Projects' (4/06). Complementary funding was provided by NCAS through the BADC core agreement, and also by the Natural Environment Research Council.
References
- Sarah Callaghan, Fiona Hewer, Sam Pepler, Paul Hardaker and Alan Gadian, "Overlay Journals in the Meteorological Sciences", July 2009, Ariadne, Issue 60 http://www.ariadne.ac.uk/issue60/callaghan-et-al/
- Fiona Hewer, OJIMS Survey of Organisations, Version 2.0, March 2009 http://proj.badc.rl.ac.uk/ojims/attachment/wiki/WikiStart/FRK_RMetSOJIMS_SurveyOfOrgsV2%209Mar2009.pdf
- Fiona Hewer, OJIMS Survey of Scientists, Version 2.0, March 2009 http://proj.badc.rl.ac.uk/ojims/attachment/wiki/WikiStart/FRK_RMetSOJIMS_SurveyOfScientistsV2%209Mar2009.pdf
- Overlay Journal Infrastructure for Meteorological Sciences (OJIMS) - Trac http://proj.badc.rl.ac.uk/ojims
- Policies: CEDA Repository http://cedadocs.badc.rl.ac.uk/policies.html
- NERC Open Research Archive http://nora.nerc.ac.uk/
- Directory of Open Access Repositories http://www.opendoar.org/
- Social Networking Extensions for EPrints http://sneep.ulcc.ac.uk/wiki/index.php/Main_Page
- CEDA Repository http://cedadocs.badc.rl.ac.uk/
- Image Annotator: The Web Kanzaki http://www.kanzaki.com/docs/sw/img-annotator.html
- Fiona Hewer, OJIMS Business Models Report, March 2009 http://proj.badc.rl.ac.uk/ojims/attachment/wiki/WikiStart/FRK_RMetSOJIMS_BusinessModelsV2p1.pdf
- Repository Interface for Overlaid Journal Archives (RIOJA) http://www.ucl.ac.uk/ls/rioja/
- OJIMS - Trac http://proj.badc.rl.ac.uk/ojims/browser
- 2009 Workshop Programme - NERC Data Management Workshop - CEH Wiki http://wiki.ceh.ac.uk/display/nercworkshop/2009+Workshop+Programme
- The Open Archives Initiative Protocol for Metadata Harvesting http://www.openarchives.org/OAI/openarchivesprotocol.html
- Earth System Science Data (ESSD) http://www.earth-system-science-data.net/