Web Magazine for Information Professionals

eSciDoc Days 2011: The Challenges for Collaborative eResearch Environments

Ute Rusnak reports on the fourth in a series of two-day conferences called eSciDoc Days, organised by FIZ Karlsruhe and the Max Planck Digital Library and held in Berlin on 26-27 October 2011.

eSciDoc is a well-known open source platform for creating eResearch environments using generic services and tools based on a shared infrastructure. This approach makes it possible to manage research and publication data together with related metadata, internal and/or external links, and access rights. Development of eSciDoc was initiated as a collaborative venture between FIZ Karlsruhe – Leibniz Institute for Information Infrastructure and the Max Planck Digital Library (MPDL) and was funded by the German Federal Ministry of Education and Research. Today, both partners promote its further development as an open source project supported by a growing number of scientific communities. The source code of the eSciDoc infrastructure can be downloaded from the eSciDoc Web site [1].

This is the fourth time the two organisations have held eSciDoc Days, which provide extensive information about the challenges of collaborative eResearch environments. The topics addressed included the sustainable management of an ever-increasing volume of data of differing kinds throughout the research process, and the provision of a publishing environment for research results together with research data. Scientists from research organisations and academic institutions, information experts and software developers discussed the latest developments in building the underlying digital information infrastructures. More than 100 international participants from a variety of backgrounds attended the conference, including repository managers and librarians as well as academics from disciplines ranging from traditional engineering and the sciences to the arts and the humanities.

Keynote: The JISC Managing Research Data Programme

Simon Hodson, Programme Manager for Managing Research Data, JISC UK

Simon Hodson, Programme Manager for Managing Research Data (JISCMRD), was the keynote speaker at this year’s eSciDoc Days. In his presentation entitled “Helping UK Universities meet the research data challenge, lessons and implications”, Simon addressed challenges, opportunities and benefits, and shared his expertise in promoting and supporting good data management and data sharing in Higher Education and research. The rapidly growing amount of research data, together with the increasing awareness that such data represent a distinct asset, challenges academic institutions to rethink their research data management. It is not just a matter of addressing storage issues, but also of establishing guidelines for good practice and incentives to make research data available for verification and reuse.

To address these challenges, JISC established its MRD programmes [2], which fund projects that provide the UK Higher Education sector with examples of good research data management (RDM) throughout the full data lifecycle. Similar discussions are underway in other countries; for example, the Alliance of German Science Organisations adopted the “Principles for the Handling of Research Data” in June 2010. The first JISCMRD programme (2009-2011) focused on more general topics such as RDM infrastructure and systems, planning tools, advice and guidance, and training materials. The current JISCMRD programme places greater emphasis on institutional development, with pilots and the transition to a service.

Simon’s presentation introduced current UK projects and initiatives and highlighted the benefits that the UK national data centres bring to research data management, as summarised in the study report “Data centres: their use, value and impact”. A key factor in mastering these challenges is to clarify who will bear the costs. EPSRC, the main UK government agency for funding research and training in engineering and the physical sciences, has stated that ‘it is reasonable and appropriate to use public funds for research to also fund the associated data management costs’. Ultimately, effective data management throughout the research lifecycle will ensure that data continue to work as productively as the research that produced them.

eSciDoc: Concepts and Generic Approaches

Malte Dreyer, MPDL & Matthias Razum, FIZ Karlsruhe

The first day provided an overview of eSciDoc with its concepts, basic services and customised applications as well as its growing community.

Malte Dreyer (Max Planck Digital Library, MPDL) and Matthias Razum (FIZ Karlsruhe), who are responsible for the eSciDoc project at their respective organisations, gave an overview of its concepts and generic approaches.

eSciDoc was developed for the global and interdisciplinary collaboration of academic communities. Librarians, software developers and researchers can access research data, create new methods of publication and pioneer new ways of academic collaboration. The open source platform eSciDoc has laid the foundations for open and sustainable access to research data from academic institutions and research organisations. The core features of eSciDoc are a central repository with basic data management services (the ‘eSciDoc Infrastructure’, built mainly on Fedora) and numerous complementary services, including access rights management, metadata organisation, support for data imports and analysis tools (the ‘eSciDoc Services’). The infrastructure provides the basis for specific applications (‘eSciDoc Applications’) that use its internal services and, where necessary, also integrate external services provided by third parties. eSciDoc’s main asset is its modular structure: the service-oriented architecture (SOA) allows almost any kind of eResearch scenario to be implemented. The eSciDoc software is released as open source under the Common Development and Distribution License (CDDL) and is available from the eSciDoc Web site [1].

eSciDoc Applications and Their Usage Scenarios

The main aim of the eSciDoc platform is to provide sustainable research data management together with a publishing environment for research data and research findings. Based on the eSciDoc Infrastructure, scientists from various disciplines are developing application software for individual research tools or for entire eResearch environments.

eSciDoc Applications for Publication Management

Michael Franke (MPDL) presented the current status of PubMan [3], an eSciDoc application which allows members of research organisations to store, manage and enrich their publications and to provide these data for reuse by other Web services. PubMan has been running in production mode since May 2009. As open source repository software, it is under continuous development. Both end-users and developers are invited to join the PubMan Community and to share ideas on improvements and new features.

The following presentations, while also touching on aspects of publication management, focused on research data management in different disciplines, ranging from art history images to electronic lab-book data.

eSciDoc Applications for Research Data Management in the Arts and the Humanities

Andreas Vollmer, Karsten Asshauer (Humboldt University of Berlin) and Julian Röder (Free University of Berlin) presented their further development of Imeji [4], the current eSciDoc application for image-based research. They use it for the new virtual research environment (VRE) and repository at the Institute for Art and Visual History. Imeji enables users to create their own image collections, to use different metadata schemata to describe various types of images, and to share them with artists or artist groups. Even large quantities of images, mainly historical collections and works of art, can easily be uploaded and efficiently managed via the browser.

Jean-Philippe Magué (Ecole Normale Supérieure de Lyon) introduced Amalia, an eSciDoc-based solution for collaborative corpus management in the humanities and the social sciences. The project is creating integrated access to digital data and documents by drawing on the strengths of eSciDoc: the powerful central repository for reliable data storage, authentication with object-based authorisation, and support for the whole document lifecycle, including version management. Currently, the only weakness in Amalia is that eSciDoc does not yet provide an appropriate workflow for digitisation projects.

The recently launched Digitization Lifecycle project aims to provide comprehensive support for digitisation efforts within the Max Planck Society (MPG), including proposals for digitisation workflows [5]. Andrea Kulas (MPDL) described the development of this integrated service environment, which provides generic tools. Kulas also explained that, in addition to the technical developments, guidelines for digitisation projects are to be established and an advisory group of experts will be formed. The project results, including the open source software, will be made available to other institutions.

eSciDoc Applications for Research Data Management in Science and Engineering

Guido Lonij (RWTH Aachen) gave a status report on a project which aims to build a virtual work environment for mechanical engineers, focusing on the early design stages of gear mechanisms and the related transmission systems. RWTH Aachen University, TU Ilmenau and FIZ Karlsruhe are working together on this project, called e-Kinematix [6]. They plan to integrate existing information resources such as the Digital Mechanism and Gear Library (DMG-Lib), patent information systems, library systems, e-journals and e-books, as well as visualisation tools. The technical implementation is based on the eSciDoc Infrastructure, which supports an open and modular development strategy.

Masao Takaku (National Institute for Materials Science, NIMS) outlined the activities of the growing eSciDoc community in Japan. NIMS eSciDoc is a research information infrastructure built on the eSciDoc Infrastructure and the eSciDoc Applications PubMan and Imeji. The information system for NIMS’s outreach activities consists of the NIMS Digital Library as a self-archiving and dissemination platform for publications, the NIMS Researchers Database SAMURAI [7] as a directory of researchers’ profiles linked to the Digital Library, and The Library of Materials Science as a research data repository. Interoperability with internal and external services is a key issue in the step-by-step expansion of content and functionality.

Jens Klump (GFZ German Research Centre for Geosciences) explained that projects at GFZ generate a great variety of research data, both big and small, requiring a flexible data management toolbox combined with an institution-wide data management infrastructure. A research data object may have more than one metadata object, and different metadata schemata may be chosen according to project requirements. To meet these requirements, the panMetaDocs project [8] enhances the existing information exchange platform panMetaworks by replacing its file-system storage with the eSciDoc Infrastructure. panMetaDocs also integrates the authentication feature of the eSciDoc Infrastructure.

On the second day, Matthias Razum (FIZ Karlsruhe) gave a project summary of BW-eLabs [9]. In this project the universities of Stuttgart and Freiburg, Stuttgart Media University (HdM) and FIZ Karlsruhe are working together to create digital environments for virtual and remotely controlled laboratories based on the eSciDoc Infrastructure. Key concepts include the reproducibility of experiments, the discoverability of and access to primary data, and the storage and curation of all artefacts created throughout the research process. Nanotechnology researchers will gain efficient and easy access to expensive laboratory equipment through remote access to real laboratories and through the provision of virtual laboratories.

Information Infrastructure and Integration Projects

The second day’s sessions focused on the topics “Infrastructure and Integration Projects” and “Research Data Management” with up-to-date information and talks with practical relevance that also covered non-eSciDoc scenarios.

Patrick Harms (SUB Göttingen) opened with a brief overview of the DARIAH initiative (DigitAl Research Infrastructure for the Arts and the Humanities). Going into more detail, he explained that DARIAH [10] will facilitate long-term access to, and use of, all European arts and humanities digital research data. The DARIAH infrastructure will be a connected network of people, information, tools and methodologies for investigating, exploring and supporting work across the broad spectrum of the digital humanities. Within the project, the eSciDoc partner MPDL is producing generic software packages compatible with the repository software of the DARIAH infrastructure. Experience gleaned from eSciDoc service concepts is proving of great benefit.

Through an effective depiction of the project goals, Jochen Büttner (Max Planck Institute for the History of Science) gave his audience a vivid idea of how the Digital Scrapbook project proposal could be put into practice. A scrapbook provides researchers with an individual workbench offering a window on relevant publications and research data, together with tools for annotating, commenting on and linking any digital sources (e.g. texts, images). Interlinking existing scrapbooks as soon as researchers have released them is another important aim. The speaker is a strong supporter of the new publication paradigm in a world of dynamic objects: ‘publish as early as possible and update frequently.’ The eSciDoc software is to be used to realise these demanding project goals.

Andreas Vogler and Natasa Bulatovic (MPDL) presented the MPG-driven project Astronomer’s Workbench (AWOB) [11]. AWOB is a Web-based platform for communication and collaboration which enables scientists to access measurement data and scientific information jointly, supports them in sharing and discussing their findings, provides a means of handling metadata, and offers persistent storage. The challenge in the astrophysical community is to provide this environment for project teams of up to several hundred collaborators handling huge amounts of data in observational and theoretical astrophysics. Applications such as AWOB show the potential of eSciDoc. The eSciDoc Service Infrastructure makes it possible to reuse existing eSciDoc Applications such as PubMan, to plug in existing external research tools and to develop new components.

Research Data Management

Ross King (Austrian Institute of Technology, AIT) introduced SCAPE [12], an EU-funded project for building a scalable digital preservation environment. SCAPE is about planning and managing computationally intensive digital preservation processes, such as the ingestion or migration of very large (multi-terabyte) datasets. The SCAPE consortium, co-ordinated by AIT, brings together a broad spectrum of expertise from 16 international partners (among them FIZ Karlsruhe) drawn from universities, memory institutions, research labs, data centres and industrial companies. SCAPE will advance the state of the art of digital preservation in three ways: by developing infrastructure and tools for scalable preservation actions; by providing a framework for automated, quality-assured preservation workflows; and by integrating these components with a policy-based preservation planning and watch system. The project results will be validated in three large-scale testbeds from diverse application areas: digital repositories from the library community, Web content from the Web archiving community, and research datasets from the scientific community.

Dirk Fleischer (IFM-GEOMAR) took a less technical and more organisational and cultural view of research data management. Filling a research data repository [13] requires data sharing. Most researchers happily embrace the idea of sharing, but in practice the advantages often fail to outweigh their concerns and uncertainty about how their own research data will be used. Undoubtedly, data sharing opens up the possibility of independent scrutiny, fosters new partnerships and encourages further investigation of existing datasets. At present, however, these advantages are not widely appreciated. Institutions therefore need to provide appropriate research environments that make data capture as convenient as possible for researchers. Research sites should provide tools that capture raw data together with metadata at the time of creation, store analytic procedures as provenance information, and transfer data from one structured data repository to another with a single mouse click.

eSciDoc: Getting Started and Demos

This session covered the first steps in setting up one’s own eSciDoc application using the eSciDoc Infrastructure and/or existing eSciDoc Applications. Frank Schwichtenberg (FIZ Karlsruhe) presented the new tools eSciDoc Browser [14] and eSciDoc Admin Tool [15] in detail. These tools make it easier to start development with the eSciDoc Infrastructure by supporting the creation of one’s own environment(s) with content models and the definition of organisational units, contexts, users and roles.

Demos of existing eSciDoc Applications showed how solutions can be developed and operated in various environments and scientific disciplines: PubMan, with its Control of Named Entities (CoNE) service [3], for creating institutional repositories; ViRR, the virtual research platform for the digital preservation and dissemination of cultural heritage content [16], and Digitization Lifecycle (DLC) [5] for the publication and use of digitised objects; and Imeji [4] for image data management.

Developer Workshop

On the second day, an additional hands-on workshop with comprehensive tutorials gave participants with a technical background the opportunity to familiarise themselves with installing the eSciDoc Infrastructure (Basic concepts of eSciDoc; Getting core services up and running) and developing their own eSciDoc Applications (Installation and customising; Configuring own eResearch environments).

Conclusion

Reliable, efficient research data management has become one of the most important demands on research institutions, scientific organisations and enterprises involved in R&D. eSciDoc is a well-known open source platform for creating eResearch environments and forms the basis for numerous international R&D projects. Its role is:

  1. to ensure that research findings in the digital age can be securely stored and that controlled access is guaranteed;
  2. to assist researchers in collecting, processing and documenting their own research data, and to provide researchers with easy, fast and sustainable access to other research data;
  3. to support researchers in collaborating in virtual work groups across all disciplines and geographical boundaries; and
  4. to increase the visibility of research results by supporting sustainable publishing with due consideration for copyright issues.

References

  1. eSciDoc - the Open Source e-Research Environment
    http://www.escidoc.org
  2. JISC Managing Research Data programme (JISCMRD)
    http://www.jisc.ac.uk/whatwedo/programmes/mrd.aspx
  3. PubMan - the eSciDoc Solution for Publication Management
    http://colab.mpdl.mpg.de/mediawiki/Portal:PubMan
  4. IMEJI - an eSciDoc application to manage images together with their specific metadata
    http://www.imeji.org/
  5. Max Planck Digital Library (MPDL): Digitization Lifecycle - DLC
    http://www.mpdl.mpg.de/projects/intern/dl_en.htm
  6. e-Kinematix - a virtual research environment (VRE) targeted at mechanical engineers
    http://misc.jisc.ac.uk/vre/projects/e-kinematix
  7. NIMS Researchers Database SAMURAI
    http://samurai.nims.go.jp/index-e.html
  8. panMetaDocs - a tool for collecting and managing digital objects in a scientific research environment
    http://sourceforge.net/apps/trac/panmetadocs/
  9. BW-eLabs: Knowledge Management in Virtual and Remote Labs
    http://www.bw-elabs.org/
  10. DARIAH (Digital Research Infrastructure for the Arts and Humanities)
    http://www.mpdl.mpg.de/projects/extern/dariah_en.htm
  11. Astronomer’s Workbench
    http://www.mpdl.mpg.de/projects/intern/awob_en.htm
  12. SCAPE - SCAlable Preservation Environments
    http://www.scape-project.eu/
  13. Data Management of Kiel Marine Sciences
    http://www.geomar.de/en/institute/central-facilities/rz/daten/
  14. eSciDoc Browser - a Rich Internet Application for browsing digital assets in an eSciDoc Core infrastructure
    https://github.com/escidoc/escidoc-browser
  15. eSciDoc Infrastructure Administration Tool
    https://www.escidoc.org/JSPWiki/en/ESciDocInfrastructureAdministrationTool
  16. ViRR - a virtual research platform for the digital preservation and dissemination of cultural heritage content
    http://colab.mpdl.mpg.de/mediawiki/ViRR

Author Details

Ute Rusnak
Head of Public Research and Education
FIZ Karlsruhe – Leibniz Institute for Information Infrastructure
Germany

Email: ute.rusnak@fiz-karlsruhe.de
Web site: http://www.fiz-karlsruhe.de

Ute Rusnak is Head of Public Research and Education at FIZ Karlsruhe. She holds a university degree in biology and has an additional qualification as a software engineer. For many years her focus has been on the design and development of reference databases and academic information portals. Currently, she provides project support for information infrastructure developments. She is interested in the ongoing changes in academic data management and its development towards covering the entire research data lifecycle.