Enhancing Scientific Communication through Aggregated Publications
The Internet has caused a revolution in the way scientists and scholars have access to scholarly output. Only 15 years ago, the (university) library decided what sources should be offered to the staff and individual scientists could only hope the librarian would listen to their wishes. In this system scientists frequently had no instantaneous access to the information they wanted. In such instances they had to rely on the Interlibrary Loan System.
Nowadays, researchers may access scientific publications anytime and anywhere, provided that their institution has fulfilled its obligation to pay the access fees to the publisher. Of course this system carries a risk of its own: universities outside Western countries in particular face the problem of finding the money to pay for their researchers’ access to digital output.
One of the solutions to overcome this problem is the rise of the Open Access movement. Its view is that the results of publicly funded research should be publicly available. Thanks to the development of institutional repositories and of service providers like NARCIS [1], many scientific publications have become accessible to a broad public. Furthermore, the switch from the classic business model (where the user pays) to a new Open Access business model (where the author pays) may be helpful in improving the access to scientific output.
Due to these developments, the relationships between Science and Society have become stronger. Journalists, policy makers and the broad public are offered the opportunity to become acquainted with new scientific views at an early stage. Thanks to new developments, researchers are offered the opportunity to strengthen these relationships in what are termed Enhanced Publications (EPs). This article describes a way researchers can implement EPs in Aggregated Publications Environments.
From Scholarly Publication to Scholarly Communication
The publication process represents but one single aspect of scientific work. Science is flourishing thanks to communication, a much broader concept than publishing. Long before the final book or article is published, researchers will have had extensive discussions with their peers to share their views, ideas and opinions in order to check the validity of their claims. The raw data on which the publication will be based may be shared as well. Unfortunately, in recent times the improvement of scientific communication has not appeared to be a top priority for the major publishers. Even the announcement of Elsevier’s ‘Article of the Future’ [2] does not appear to be as promising as had been expected. Its end-product is still a publication rather than a communication object.
On the other hand, it is certainly the case that publishers have invested a lot of money in the development of excellent search and alerting tools. A good example is the service that Scopus [3] offers scientists, providing them with the opportunity to search citations from thousands of journals and scientific Web pages. This sort of service is of course useful, but it may still be seen by some as a form of one-way traffic: the publisher offers information to researchers. They can subsequently re-arrange subsets of this information according to their own wishes. There is no option, however, to use this kind of service to comment on findings or to discuss them with other researchers.
Collaboratories
Apart from the traditional publication process, researchers themselves have begun to set up what are termed collaboratories in all kinds of disciplines. The goal of such collaboratories is to enable researchers to co-operate in distributed teams and to share tools and resources (e.g. datasets). Naturally, the collaboratory gives the team the chance to write and discuss manuscripts as well. A good international example is Cx-Nets, a collaboratory on complex networks [4]. This collaboratory supports co-operation among teams from the USA, Italy and France.
The information shared within a collaboratory is not necessarily available to everybody. So a collaboratory may display all the characteristics of modern scientific communication, but is not primarily focused on the open exchange of information.
Enhanced Publications in Relation to Aggregated Publications Environments
There is now a new publication and communication model which combines the properties of traditional publications with the improvements from the collaboratories. In this model the information object plays a central role. It may be any kind of object: a traditional publication, a comment on that publication, a dataset, an image, an audio fragment, and so on. Furthermore, the model (Jane Hunter has called it a ‘scientific publication package’ [5]) describes the relationships between the objects. All relevant steps in the scientific workflow that are necessary to produce a publication may be traced.
In this new publication type, termed an Enhanced Publication (EP), it will be clear to readers what for instance is the relationship between a dataset and a traditional publication built upon it. But they will also have the opportunity to comment on or to build upon the EP. In fact, the EP is not a static document in the way a journal article is. The EP has a dynamic character. Moreover, readers may have the opportunity to reuse certain parts of the EP, either to create another EP or a traditional publication.
Implementing EPs is technically rather complex. The challenge is to support researchers with tools that will make it easy to aggregate objects and to describe the relationships between them. The Aggregated Publications Environments (APEs) have been set up to make this happen. Usage of EPs will support the interaction between Science and Society. A policy maker will be offered the opportunity to put together an EP in which parts of the original scientific publication have been incorporated into a policy paper. This example also shows that the components of an EP may have different authors from different fields.
OAI-ORE
New techniques have had to be developed to provide the functionality required to create EPs. In this respect, the completion of the OAI-ORE model of Herbert Van de Sompel has been welcomed by the EP community [6].
In short, OAI-ORE is a means to describe aggregations of Web resources, while identifying the boundaries of such aggregations at the same time. Examples of aggregations are collections of images, a book with a large number of chapters, an article with related datasets. (See also the OAI-ORE introduction given by Rumsey and O’Steen [7]).
A central distinction made by the OAI-ORE model is that among Resource Map, Aggregation and Aggregated Resources. Figure 1, taken from Van de Sompel’s OAI-ORE primer [6], depicts an example of a Resource Map.
The OAI-ORE model can easily be translated into a model for the EP. The overall description of the EP is provided by the Resource Map. This Resource Map has a machine-readable representation that provides details about the Aggregation. The Aggregation lists the information objects that have been used in the EP, and each Aggregated Resource represents an individual information object in the EP.
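The mapping described above can be made concrete with a minimal sketch in Python. It represents one EP as plain (subject, predicate, object) triples. The `ore:describes` and `ore:aggregates` terms come from the OAI-ORE vocabulary; every `example.org` URI and both helper function names are placeholders invented for illustration and are not taken from the ESCAPE software.

```python
# ORE and DCTERMS are the published namespaces of the OAI-ORE and Dublin Core
# vocabularies. All example.org URIs are hypothetical placeholders.
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"

REM = "http://example.org/ep/1/rem"           # the Resource Map
AGG = "http://example.org/ep/1/aggregation"   # the Aggregation it describes
ARTICLE = "http://example.org/objects/article"
DATASET = "http://example.org/objects/dataset"

triples = [
    (REM, ORE + "describes", AGG),
    (REM, DCTERMS + "creator", "http://example.org/people/researcher"),
    (AGG, ORE + "aggregates", ARTICLE),
    (AGG, ORE + "aggregates", DATASET),
]


def described_aggregation(statements, resource_map):
    """Return the Aggregation that a Resource Map describes, or None."""
    for s, p, o in statements:
        if s == resource_map and p == ORE + "describes":
            return o
    return None


def aggregated_resources(statements, aggregation):
    """Return the URIs of the information objects that an Aggregation lists."""
    return [o for s, p, o in statements
            if s == aggregation and p == ORE + "aggregates"]


if __name__ == "__main__":
    agg = described_aggregation(triples, REM)
    for uri in aggregated_resources(triples, agg):
        print(uri)
```

Keeping the Resource Map and the Aggregation as two separate URIs mirrors the distinction the model draws: the first is the description, the second is the thing described.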
There even exists the option to divide traditional information objects into smaller ones. For instance, it is perfectly possible to describe every single chapter in a book or to split a journal article into units such as Introduction, Materials & Methods, and Discussion.
In this sense, OAI-ORE will realise what Harmsze, Van der Tol and Kircz [8] described in the late 1990s. They proposed a model for modular versions of scientific publications. They argued that instead of using PDF, which after all only creates an electronic counterpart of the paper version, it would be advisable to split publications into modules. By doing so, it would become possible to refer to a single module of a published document. In fact, what Elsevier presents in Cell Press beta [9] is not entirely dissimilar to their idea.
It is up to the creators of an EP to decide on the degree of granularity they wish to adopt while putting the EP together. For instance, with conference proceedings, the individual papers may best represent the information unit, but in the example of a journal article it may be useful to choose a greater degree of granularity.
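The choice of granularity can be illustrated with a short Python sketch. It aggregates a journal article either as one object or as one object per section. The URI pattern `/modules/<slug>` and the function names are assumptions made for this example; OAI-ORE itself does not prescribe how module URIs are minted.

```python
import re

ORE = "http://www.openarchives.org/ore/terms/"


def module_uri(article_uri, section_title):
    """Mint a URI for one module; the '/modules/<slug>' pattern is an assumption."""
    slug = re.sub(r"[^a-z0-9]+", "-", section_title.lower()).strip("-")
    return f"{article_uri}/modules/{slug}"


def aggregate(aggregation_uri, article_uri, sections=None):
    """Aggregate the article as a whole, or one resource per section if given."""
    if not sections:
        return [(aggregation_uri, ORE + "aggregates", article_uri)]
    return [(aggregation_uri, ORE + "aggregates", module_uri(article_uri, s))
            for s in sections]


if __name__ == "__main__":
    agg = "http://example.org/ep/2/aggregation"      # hypothetical
    article = "http://example.org/objects/article"   # hypothetical
    sections = ["Introduction", "Materials & Methods", "Discussion"]
    for triple in aggregate(agg, article, sections):
        print(triple)
```

Calling `aggregate` without `sections` gives the coarse view suited to a paper in conference proceedings; passing the section titles gives the finer view in which each module can be cited on its own.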
ESCAPE
The OAI-ORE model is now being introduced in the world of libraries and research. In the Netherlands, SURFfoundation has launched a tender [10] for the implementation of EPs. The goal of this tender is to demonstrate the advantages of EPs to the field of scholarly communication. Enhanced Scientific Communication by Aggregated Publications Environments (ESCAPE) [11] is one of the projects that have received a SURFfoundation grant.
Various groups of scientists and researchers from both the University of Twente and the University of Groningen in the Netherlands are involved in this project to demonstrate the added value of the Enhanced Publication model. They are working together in a rather unique setting with programmers, library staff and information specialists from their own institutions as well as from the Royal Netherlands Academy of Arts and Sciences (KNAW).
Where hyperlinks are being used to interconnect existing Web documents, a web of data can be created by interconnecting data by means of RDF-triples [12]. In fact, OAI-ORE has RDF as one of its foundations, as Van de Sompel states in his ORE Primer [6]. By using RDF, it becomes possible to identify persons and even abstract entities like ideas on the Web. This is very important in the case of EPs, because it is possible to include background information on the authors (their activities, their affiliations, etc.). Moreover, RDF can be used to indicate the relationships between the information objects.
The EP (Resource Map) has its own Uniform Resource Identifier (URI) and so has every single information object that is part of it. This approach is essential in order to meet the goals of OAI-ORE, i.e. object re-use and exchange.
Researchers can incorporate a specific information object from an existing EP in a new one. For example, if they have followed exactly the procedures described in an earlier EP, they can include the description of that research method in the new EP unchanged. Likewise, if their writing relies on exactly the same dataset as an earlier EP, they can incorporate that whole dataset in the new EP.
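Because every information object has its own URI, reuse amounts to listing the same URI in a second aggregation. The following Python sketch shows this under simplifying assumptions: an EP is reduced to a dictionary, all URIs are placeholders, and the function names `compose_ep` and `shared_objects` are invented for the example.

```python
def compose_ep(aggregation_uri, own_objects, reused_from=None, reused_objects=()):
    """Build a new EP that lists its own objects plus objects reused by URI."""
    reused_objects = list(reused_objects)
    if reused_objects:
        available = set(reused_from["aggregates"]) if reused_from else set()
        missing = [u for u in reused_objects if u not in available]
        if missing:
            raise ValueError(f"not part of the source EP: {missing}")
    return {"aggregation": aggregation_uri,
            "aggregates": list(own_objects) + reused_objects}


def shared_objects(ep_a, ep_b):
    """Return the URIs that two EPs have in common, in sorted order."""
    return sorted(set(ep_a["aggregates"]) & set(ep_b["aggregates"]))


# Hypothetical earlier EP with a method description, a dataset and an article.
earlier_ep = {
    "aggregation": "http://example.org/ep/2008/aggregation",
    "aggregates": [
        "http://example.org/objects/method",
        "http://example.org/objects/dataset",
        "http://example.org/objects/article-2008",
    ],
}

# The new EP adds its own article and reuses the method and dataset by URI.
new_ep = compose_ep(
    "http://example.org/ep/2009/aggregation",
    own_objects=["http://example.org/objects/article-2009"],
    reused_from=earlier_ep,
    reused_objects=["http://example.org/objects/method",
                    "http://example.org/objects/dataset"],
)
```

Nothing is copied in this sketch: both EPs point at the same URIs, so a reader of either one arrives at the same method description and dataset.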
Initiatives like ESCAPE are evolving worldwide. Some important examples are:
- eSciDoc [13]: Within the Max Planck Society, the eSciDoc Project will implement an infrastructure for its researchers which combines different types of information objects in one scientific knowledge space.
- SCOPE [14]: The Australian project SCOPE (Scientific Compound Object Publishing and Editing) has goals related to those of the ESCAPE project: offering authoring tools to researchers to enable them to create their own EPs.
- ICE-TheOREM [15] - End to End Semantically Aware eResearch Infrastructure for Theses: ICE-TheOREM couples the functionalities of a content management system with those of OAI-ORE in a single system.
Aggregated Publications Environments as Part of ESCAPE
Important developments in the field of Linked Data (and the related RDF and Semantic Web) are described by Bizer et al. [16], but they will not be a principal feature of this article. Instead the focus will be on the potential producers of EPs: researchers in fields like biology or archaeology. These researchers presumably want tools to relate information objects to one another without needing detailed knowledge of OAI-ORE and RDF.
The most important part of the Aggregated Publications Environments (APEs) therefore, is the development of tools that can enable researchers to produce their own EPs. Obviously, these tools are being developed by programmers in close collaboration with those researchers. The deliverables of the ESCAPE Project will be tested and embedded in the regular workflow of the researchers. These APEs can be regarded as elements in the service layer of the OAI data/services model. No major changes in the mode of operation of traditional repositories are needed to deliver objects as items in an aggregation with other objects. The ESCAPE Project is based on Web architecture principles, mainly using individual resources (objects) and effecting unique identification of the resources by means of a URI (in most cases a URL). In FRBR terminology [17] the core of description is at item [18] level.
All the individual objects may become part of a structure of related individual objects. This structure is not fixed. It may be inspired by library principles (like FRBR) or by content-based relationships between objects. In the latter case, the kind of content relations may depend on the scientific discipline. As the OAI-ORE model is completely compatible with Web architecture principles, ESCAPE uses OAI-ORE to describe EPs instead of XML containers like DIDL (Digital Item Declaration Language).
Information specialists, library personnel and programmers from Twente University, RUG and KNAW are working together to develop APEs for the following research groups:
- Centre for Conflict, Risk and Safety Perception (iCRiSP) [19]. With these researchers from Twente University an APE is being developed as part of the Web site CRiSP.
- Brandaris128 [20]. Within the BRANDARIS Project, the applications of an ultra high-speed camera are being studied. The results have a major impact on society. As part of the Web site Brandaris128, an APE will be presented with EPs that show the relationships between the original research results and derived material (like television programmes and popular scientific publications).
- Centre for Public Order and Security [21]. Within this Centre, studies are being set up to measure the impact and reuse of annotations (comments on legal decisions) on local authorities.
Editing Rights in APEs
EPs offer new opportunities for co-operation. Naturally, a research group does not want everybody to be able to add or edit objects within its EP. Therefore three levels of usage have been implemented. The creators have all the rights to add, edit or delete objects. They may also appoint other researchers as ‘Editors’. These editors may add information to the EP (for instance by tagging or by commenting on parts of the EP), but they do not have the right to delete or edit existing information objects. The general public is able to view the EPs, but has no further rights.
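The three levels of usage can be summarised as a small permission table. The Python sketch below is one possible reading of the paragraph above, not the ESCAPE implementation: the action names (`annotate`, `add_object` and so on) are invented labels, and treating tagging and commenting as a single `annotate` action is an assumption.

```python
from enum import Enum


class Role(Enum):
    CREATOR = "creator"
    EDITOR = "editor"
    PUBLIC = "public"


# Action names are hypothetical labels for the rights described in the text.
PERMISSIONS = {
    Role.CREATOR: frozenset({"view", "annotate", "add_object", "edit_object",
                             "delete_object", "appoint_editor"}),
    Role.EDITOR: frozenset({"view", "annotate"}),   # tag or comment only
    Role.PUBLIC: frozenset({"view"}),
}


def is_allowed(role, action):
    """Return True if the given role may perform the given action on an EP."""
    return action in PERMISSIONS[role]


if __name__ == "__main__":
    for role in Role:
        print(role.value, sorted(PERMISSIONS[role]))
```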
First Results of the ESCAPE Project
To start with, a technical infrastructure has been established for the archiving of resource maps, based on the OAI-ORE data model [22] and its elaboration in DRIVER [23]. As explained earlier, the resource maps describe and give access to the objects (aggregated resources) and their mutual relationships within them. Moreover, the objects within the resource maps (ReMs) may be edited by library personnel or scientists.
Repository of ReMs
It was decided to base the repository on Fedora 3.1, especially because of its built-in RDF (triple store) support. The repository stores the individual entities of the OAI-ORE data model and exposes them through OAI-ORE resource maps. These resource maps are harvestable making use of the classic OAI-PMH protocol [24]. Consequently, no other harvesting protocol is required. The resource maps may be serialized in Atom or in RDF for harvesting.
Therefore in response to a particular publication or information need, it will be possible to offer a view of a network of content-related objects displayed together with the relevant document. In this way, the relationship between the original scientific publication and a derived information object – for instance a policy paper – will be made visible to the public.
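The harvesting of resource maps over OAI-PMH described in this section comes down to an ordinary `ListRecords` request. The Python sketch below only builds the request URL. The verb and the `metadataPrefix`, `from` and `set` arguments are defined by OAI-PMH; the base URL and the prefix value `ore_atom` are placeholders, since the prefix under which a given repository exposes its resource maps is repository-specific.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date      # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec        # selective harvesting by set
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical repository endpoint and metadata prefix.
    print(list_records_url("http://repository.example.org/oai",
                           "ore_atom", from_date="2009-01-01"))
```

A harvester would issue this request, read the returned records, and follow any resumption token the repository supplies, exactly as it would for conventional metadata records.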
In some cases one may want to refer to an information object in a particular setting. A good example is the description of a particular painting as part of a specific exhibition. This situation can be described using the proxy URI [25] as proposed by Van de Sompel.
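A proxy can be sketched in a few lines of Python. The `ore:proxyFor` and `ore:proxyIn` terms belong to the OAI-ORE vocabulary; the painting, exhibition and proxy URIs, the wall-text sentence and the function name are placeholders made up for this example.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"


def proxy_triples(proxy_uri, resource_uri, aggregation_uri):
    """Tie a proxy to a resource as it appears in one specific aggregation."""
    return [
        (proxy_uri, ORE + "proxyFor", resource_uri),
        (proxy_uri, ORE + "proxyIn", aggregation_uri),
    ]


# Hypothetical URIs: one painting shown in one exhibition.
PAINTING = "http://example.org/objects/painting"
EXHIBITION = "http://example.org/exhibitions/2009/aggregation"
PROXY = "http://example.org/exhibitions/2009/proxy/painting"

statements = proxy_triples(PROXY, PAINTING, EXHIBITION)
# A statement made about the proxy applies to the painting in this
# exhibition only, not to the painting in general.
statements.append((PROXY, DCTERMS + "description",
                   "Wall text written for this exhibition"))
```

Attaching the description to the proxy rather than to the painting keeps context-dependent statements out of other aggregations that include the same painting.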
Resource Map Editor
In order to be able to establish a repository based on resource maps (ReMs), researchers had first to be in a position to put together these very same ReMs. The project therefore devoted effort to the development of a special editor that they could use to produce and amend resource maps in a user-friendly manner. The resource map editor is a tool for defining and describing relations between objects, by authorised professionals using standardised descriptions. These descriptions can be viewed and edited. Unauthorised users will also have the opportunity to add free descriptions to the ReMs. Work is in progress to realise this option.
The Resource Map Editor and a tool for browsing and searching the aggregated resources are combined in the Aggregated Publications Environments.
Resource Map Vocabularies
Researchers can indicate the existing relationships between information objects by the use of vocabularies. Within the ESCAPE Project it was decided to rely as much as possible on already existing vocabularies like dcterms [26], FOAF [27] and SWAN [28]. This would help to make the outcomes of the project interoperable with other systems.
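As a small illustration of how terms from existing vocabularies end up in statements, the following Python sketch expands prefixed names into full URIs. The `dcterms` and `foaf` namespaces are the published ones; the article and person URIs, the literal values and the function name `expand` are placeholders.

```python
# Published namespace URIs for two of the vocabularies named above.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie):
    """Expand a prefixed name such as 'dcterms:title' into a full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES or not local:
        raise KeyError(f"unknown prefix or empty local name in {curie!r}")
    return PREFIXES[prefix] + local


# Hypothetical article and author.
ARTICLE = "http://example.org/objects/article"
AUTHOR = "http://example.org/people/researcher"

statements = [
    (ARTICLE, expand("dcterms:title"), "An example article"),
    (ARTICLE, expand("dcterms:creator"), AUTHOR),
    (AUTHOR, expand("foaf:name"), "A. Researcher"),
]
```

Because the predicates are shared, widely used URIs, another system can interpret these statements without knowing anything about the tool that produced them.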
In the provisional example of an EP below, you can see a part of an EP with some of its information objects.
Every single information object may be edited by authorised users. The vocabulary used in an information object (e.g. name of the organisation, title of an article, and so on) is shown via a mouse roll-over functionality.
The main focus of the ESCAPE Project will be the development of functionality. Next year, more attention will be paid to the improvement of graphical representation.
In Figure 4 an information object – in this case a journal article – is shown as part of the Enhanced Publication of Figure 3. The creator of the EP, Michel Versluis, has decided that this article is related to a video with the name ‘On sound of snapping shrimp’.
Researchers are offered terms from a broad set of vocabularies. Besides terms from already existing vocabularies they have the option to indicate other (free) terms to describe potential relationships. The existing vocabularies do not offer terms to describe specific content relations; for example, when a publication is adopted by another publication, as in the case of a policy paper.
Accordingly, the ‘relationAnnotation’ object has been introduced in ESCAPE. So using the normal dcterms:relation, researchers can indicate that there is a relationship between two objects, but by using the relation annotation object (with its own URI) they can now also make the relationship identifiable. In the example below, the relation annotation makes it clear that the creator of the EP has incorporated the BBC animal show ‘Weird Nature’, as it is a ‘Beautiful non-specialist video of the scientific study into cavitating bubbles’.
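As a rough illustration of the idea, and not the project's actual serialisation, a relation annotation can be sketched in Python as a resource with its own URI that points at both ends of a relationship and carries a description. `dcterms:relation` and `dcterms:description` are real Dublin Core terms; the `ESC` namespace, the property names `annotatesSource` and `annotatesTarget`, and all `example.org` URIs are hypothetical.

```python
DCTERMS = "http://purl.org/dc/terms/"
ESC = "http://example.org/escape-terms/"   # hypothetical namespace


def annotated_relation(annotation_uri, source_uri, target_uri, note):
    """Return triples for a plain relation plus an identifiable annotation."""
    return [
        # The plain, untyped link between the two objects.
        (source_uri, DCTERMS + "relation", target_uri),
        # The annotation has its own URI and says what the link means.
        (annotation_uri, ESC + "annotatesSource", source_uri),
        (annotation_uri, ESC + "annotatesTarget", target_uri),
        (annotation_uri, DCTERMS + "description", note),
    ]


statements = annotated_relation(
    "http://example.org/ep/3/relations/1",
    "http://example.org/objects/article",
    "http://example.org/objects/weird-nature-video",
    "Beautiful non-specialist video of the scientific study "
    "into cavitating bubbles",
)
```

Since the annotation is itself addressable, other statements (who made it, when, in which EP) can later be attached to it.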
The ESCAPE Project will end in December 2009. By that time, tools for composing EPs will have been delivered. As a result, researchers will be able to produce EPs and deposit them in an institutional repository. This development will mean that existing service providers like NARCIS will be adapted in order to present the added value of the EPs to their users.
Conclusions
The emergence of the semantic Web/linked data and OAI-ORE represents great potential. Researchers will have the tools to identify each object with its own URI. Moreover, they will be able to describe the relationships between these objects and to arrange these related objects in a resource map. The boundaries of these resource maps, each with a URI of its own, can be defined by researchers themselves.
However, these developments may also generate difficulties themselves. First of all, the creation of a resource map with aggregation and aggregated resources is a more complex job than just depositing a publication in an institutional repository. It will require extra analytical effort, because researchers must be completely informed about the objects they want to include in the resource map. Moreover, they must know, and indicate, the relationships between the objects. This means that authors have to keep a ‘helicopter view’.
Furthermore, the composer of an EP will be confronted by certain practical problems, such as the user-friendliness of the tools, the completeness of the vocabularies and the discovery of resource maps in which a specific object has been placed. However, it is anticipated that such difficulties will disappear within a short time.
There are some issues though that will require special attention. In an EP environment, researchers will need to enter an RDF statement (URI) to describe the name of the creator of an object. But which one should they choose? In many cases several URIs will exist describing the same author (for instance, a URI provided by the university and one provided by the cycle club of which the author is a member). Which URI is correct will depend on the situation. So there may be a need for an aggregation of URIs for a specific author. As a consequence, in this respect, there could well be a role for digital author identifiers.
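One way to picture such an aggregation of author URIs is a simple registry that groups the URIs known for one person. The Python sketch below is only a thought experiment: the class name, the local author key and all URIs are invented, and it says nothing about how a real author identifier service would work.

```python
class AuthorIdentities:
    """Group the different URIs that are known to denote the same author."""

    def __init__(self):
        self._by_author = {}   # local author key -> set of URIs
        self._by_uri = {}      # URI -> local author key

    def register(self, author_key, uri):
        """Record that a URI denotes the author; a URI has a single owner."""
        owner = self._by_uri.setdefault(uri, author_key)
        if owner != author_key:
            raise ValueError(f"{uri} is already registered for {owner}")
        self._by_author.setdefault(author_key, set()).add(uri)

    def same_author(self, uri_a, uri_b):
        """Return True if both URIs are registered for one and the same author."""
        owner = self._by_uri.get(uri_a)
        return owner is not None and owner == self._by_uri.get(uri_b)

    def uris_for(self, author_key):
        """Return all URIs registered for an author, in sorted order."""
        return sorted(self._by_author.get(author_key, ()))


registry = AuthorIdentities()
# Hypothetical URIs: one issued by a university, one by a cycle club.
registry.register("author-1", "http://university.example.org/staff/jdoe")
registry.register("author-1", "http://cycleclub.example.org/members/42")
```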
The last thing to deal with is the fact that it is not easy to discover in which EPs an individual object (for instance an object from a traditional repository) has been described. For science research it would be very interesting to identify who created the EPs in which a specific information object has been incorporated. A solution to this ‘discovery’ problem could be the introduction of a PING or trackback functionality. Every time a specific information object is incorporated in a new EP, the URI of that EP (and the URI of the object) would be sent back to the repository where that specific item resides.
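The proposed trackback can be sketched in Python as follows. The sketch keeps everything in memory and skips the network step entirely; the class and function names and all URIs are hypothetical, and a real service would send the notification over HTTP to the repository that holds the object.

```python
from collections import defaultdict


class Repository:
    """Home repository that remembers which EPs have incorporated its objects."""

    def __init__(self):
        self._used_in = defaultdict(set)   # object URI -> set of EP URIs

    def receive_ping(self, object_uri, ep_uri):
        """Record that the object has been incorporated in the given EP."""
        self._used_in[object_uri].add(ep_uri)

    def eps_containing(self, object_uri):
        """Return the URIs of all EPs known to include the object."""
        return sorted(self._used_in.get(object_uri, ()))


def incorporate(ep, object_uri, home_repository):
    """Add an object to an EP and send the trackback to its home repository."""
    ep["aggregates"].append(object_uri)
    home_repository.receive_ping(object_uri, ep["uri"])


repo = Repository()
ARTICLE = "http://repository.example.org/objects/article"   # hypothetical
ep_one = {"uri": "http://example.org/ep/1", "aggregates": []}
ep_two = {"uri": "http://example.org/ep/2", "aggregates": []}
incorporate(ep_one, ARTICLE, repo)
incorporate(ep_two, ARTICLE, repo)
```

After these two calls, `repo.eps_containing(ARTICLE)` lists both EPs, which is the information the 'discovery' problem asks for.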
These problematic matters notwithstanding, the introduction of linked data, RDF and OAI-ORE in the world of scientific communication represents a big step forwards. Scholars will have the opportunity to relate the items they consider important. For the readers of their EPs, it will become clear on what basis a publication has been created. In this respect these conclusions are quite similar to those drawn by Van de Sompel et al. [29].
There are already some interesting use cases. Within the project DatapluS [30] researchers are composing EPs in which traditional articles are being linked to specific subsets of survey data. The Centre for Conflict, Risk and Safety Perception of the University of Twente [19] will use the ESCAPE Resource Map editor to allow researchers and other interested parties to share (social scientific) knowledge on public risk perception and public adaptive behaviour toward water safety. Their goal is to stimulate interaction between researchers and policy makers. Lawyers will have the opportunity to compose EPs based on existing legislation with all the annotations, legal definitions, reports and case law related to it.
It is expected the EPs will make research more transparent. As a result, one may hope that scientific/scholarly progress will be accelerated and that as a consequence, the bond between Science and Society will be strengthened.
References
- Narcis: the Gateway to Dutch Scientific Information http://www.narcis.info
- Schemm, Y., “Elsevier announces ‘Article of the Future’”, 20 July 2009 http://www.elsevier.com/wps/find/authored_newsitem.cws_home/companynews05_01279
- Scopus overview http://info.scopus.com/overview/what/
- Cx-Nets: Complex Networks Collaboratory http://cxnets.googlepages.com/
- Hunter, J., “Scientific Publication Packages – A Selective Approach to the Communication and Archival of Scientific Output”, International Journal of Digital Curation 1(1), 2006 http://www.ijdc.net/index.php/ijdc/article/viewFile/8/4
- Lagoze, C., Van de Sompel, H., “ORE User Guide – Primer”, 17 October 2008 http://www.openarchives.org/ore/1.0/primer
- Rumsey, S., O’Steen, B., “OAI-ORE, PRESERV and Digital Preservation”, October 2008, Ariadne, Issue 57 http://www.ariadne.ac.uk/issue57/rumsey-osteen/
- Harmsze, F.A.P., Van der Tol, M.C., Kircz, J.G., “A Modular Structure for Electronic Scientific Articles”, Proceedings of the Conference on Information Sciences, Amsterdam, 1999 http://www.science.uva.nl/projects/commphys/papers/infwet/infwet.html
- Cell Press beta, “Article of the Future”, 20 July 2009 http://beta.cell.com/index.php/2009/07/article-of-the-future/
- SURFfoundation, “SURFshare Tender Project”, 2008 http://www.surffoundation.nl/en/themas/openonderzoek/surfsharetenders/pages/default.aspx
- ESCAPE, 2009 http://escapesurf.wordpress.com/
- W3C, “RDF Primer”, 10 February 2004 http://www.w3.org/TR/REC-rdf-syntax/
- eSciDoc https://www.escidoc.org/
- Cheung, K., Hunter, J., Lashtabeg, A., Drennan, J., “SCOPE: a Scientific Compound Publishing and Editing System”, International Journal of Digital Curation, 3(2), 2008 http://www.ijdc.net/index.php/ijdc/article/view/84
- Sefton, P., Downing, J., Day, N., “ICE-TheOREM: End to End Semantically Aware eResearch Infrastructure for Theses”, paper presented at the Open Repositories conference, Atlanta, 19 May 2009 http://eprints.usq.edu.au/5248/1/ice-theorem-paper-OR09.htm
- Bizer, Chr., Cyganiak, R., Heath, T., “How to Publish Linked Data on the Web”, 27 July 2007 http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
- Davis, I., Newman, R., “Expression of Core FRBR Concepts in RDF”, 16 May 2009 http://vocab.org/frbr/core
- In this terminology a ‘work’ stands for the creation of a person; an ‘expression’ is a version, translation or realisation of that ‘work’; the ‘manifestation’ is the particular form (print, electronic); and the ‘item’ stands for an exemplar of the manifestation (a specific book in a library).
- iCRiSP, Centre for Conflict, Risk and Safety Perception, 2009 http://www.ibr.utwente.nl/icrisp/
- Brandaris 128: Ultra High Speed Imaging http://www.brandaris128.nl
- Centre for Public Order and Security (in Dutch) http://www.rug.nl/rechten/faculteit/vakgroepen/alrg/arw/centrumvooropenbareordehandhaving/index
- Lagoze, C., Van de Sompel, H., “ORE Specification – Abstract Data Model”, 17 October 2008 http://www.openarchives.org/ore/1.0/datamodel
- Verhaar, P., “Enhanced Publications: Object Models and Functionalities”, 18 February 2009 http://www.driver-repository.eu/component/option,com_jdownloads/Itemid,58/task,view.download/cid,54/
- Open Archives Initiative, “The Open Archives Initiative Protocol for Metadata Harvesting”, 14 June 2002 http://www.openarchives.org/OAI/openarchivesprotocol.html
- Lagoze, C., Van de Sompel, H., “Proxy URIs”, 17 October 2008 http://www.openarchives.org/ore/1.0/http.html#Proxy
- Dcterms, 2008 http://purl.org/dc/terms/
- FOAF specifications http://xmlns.com/foaf/spec/
- W3C, “Semantic Web Applications in Neuromedicine (SWAN) Ontology”, 17 September 2009 http://www.w3.org/2001/sw/hcls/notes/swan/
- Van de Sompel, H., Lagoze, C., Nelson, M.L., “Adding eScience Assets to the Data Web”, paper presented at the Linked Data on the Web Workshop, 20 April 2009, Madrid http://events.linkeddata.org/ldow2009/papers/ldow2009_paper8.pdf
- DatapluS http://www.surffoundation.nl/en/projecten/Pages/Dataplus.aspx