Workshop on E-Research, Digital Repositories and Portals
This workshop was held at the University of Lancaster Centre for e-Science. The organisers were Rob Crouchley, Rob Allan and Caroline Ingram; there were 17 other attendees.
The main aim of this workshop was to explore the relationship between digital repositories, e-Research and Portals in the UK with a view to discovering e-infrastructure gaps and articulating requirements. The hosts had been commissioned by JISC to undertake the ITT: JISC Information Environment Portal activity - supporting the needs of e-Research [1].
The aims and objectives of the study were:
- To scope the requirements of e-research within the area of resource discovery with reference to portal type services and tools;
- To identify gaps and duplication within the current provision (with reference to JISC portal and other relevant activity), and thereby to identify potential areas for networking and possible synergies that could offer a more holistic approach than is currently available;
- To highlight issues and challenges that will need to be addressed in terms of serving e-Research requirements and in terms of enhancing portal activity for the Information Environment (IE) more generally;
- To make recommendations for portal-related activity that could be taken forward by JISC.
The workshop was recorded and will be made available as a resource on the Web for future reference (as has been done with previous workshops)[2].
The event was a lively one with a wide range of talks. These included: Data Webs and repositories for subject-specific collections in Zoology and for Geospatial Information; Open Archival publication and Digital Curation; the need to provide a greater variety of Information Environment tools, inter-operating with existing ones such as the Information Environment Service Registry (IESR); and the need to link data with information and make both available in the ‘Discovery to Delivery’ cycle.
Most of the speakers are involved with JISC-funded projects, or have been at one time. We now provide a short summary of each presentation.
Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data
Simon Coles, National Crystallography Service, University of Southampton
Simon gave a wide-ranging introduction from the perspective of a practising chemist who is also the head of the UK Crystallography Service, run from the University of Southampton. He has been inspirational in the e-Bank and R4L projects, which are investigating tools to support the research life cycle and linking repositories directly into the laboratory for archival of data and associated metadata. A key message was that underlying data is not suitable for the ‘printed page’, as attempts to represent it result in a loss of information. Published papers should contain the interpretation and intellectual input, together with a link to the actual data for re-evaluation if necessary. Current tools to discover and interpret data are, however, barely adequate. Simon reported that Google copes quite well with the chemical identifiers (InChI) used for indexing purposes. Current interfaces to the UK Crystallography Service are included in portals such as Intute: Science, Engineering and Technology (formerly PSIGate), and OAI is used for the repositories.
Is the Institutional or Subject Repository Best for Researchers’ Needs?
Fred Friend, University College London
Fred continued the line of discussion started by Simon, but addressed the question of where the archives should be hosted. Several factors are involved, including cultural or loyalty, political, and convenience considerations. It was suggested that there is no perfect solution and that multiple repository types would persist. There were questions such as “who holds my publications if I change institution?” and “how do we classify a facility provider such as CCLRC which fits neither the institution nor subject category?”. Ultimately, users will probably choose one of:
- what is easiest;
- what their colleagues are using; or
- the most prestigious because of external factors like the RAE.
Repository providers will need to supply tools which enable cross searching.
Institutional Management of Portals: Facilitating Portal Use across Boundaries
Chris Awre, University of Hull
Chris outlined the history of portals as we now see them and some JISC-funded projects. There are many different portals for different purposes. Within an institution these have to be managed, and a common interface with single sign-on provided for the users. Standards such as WSRP (Web Services for Remote Portlets) and JSR-168 facilitate this and are supported by many Java frameworks. uPortal is the most commonly used open-source institutional portal. Administration, library and teaching/learning functionalities are being brought together in the institutional portal. A portal can now be viewed as a thin layer that aggregates, integrates, personalises and presents information, transactions and applications to the user seamlessly and securely, according to their role, location and preferences, and in a manner independent of browser platform or device. Portals complement repositories by providing a user interface to the content.
CCLRC’s Institutional Repository
Catherine Jones, CCLRC
Cathy outlined CCLRC’s ePubs Project to develop an open archival repository for publications by its staff and facility users. Several interesting factors have emerged from this work. Take-up of ePubs is strongest in departments with an existing culture of publication and information collection. There can be a competitive element when staff see how many publications their peers are producing. Organisation of the content is also important. There needs to be a culture change in the deposit of publications for the success of ePubs. One possibility is keeping a version which can be deposited (e.g. a preprint). Cathy also mentioned some related work on the JISC-funded CLADDIER Project which is investigating linking data to publications, and on a digitisation programme for older technical reports going back to the early 1960s. This aims to capture much research-related technical information which would otherwise be hard to access.
EVIE: What Real Researchers Want (from their institutional environment)
Derek Sergeant, University of Leeds
Derek’s presentation took us through a survey of user needs for the JISC-funded EVIE Project (Embedding a VRE in an Institutional Environment) [3]. This analysed the research life cycle from the perspective of several different research disciplines. The main findings of this study were that resource discovery was deemed most essential for the portal, but provision of support for research outputs less so; and that users wanted to see all databases and resources that are available for their subject, but wanted a single Google-style search box. However, no one size fits all users. The collaboration tools most frequently requested were a meeting organiser and the ability to share files.
Integrative Biology Virtual Research Environment (IB VRE)
Matthew Mascord, University of Oxford
Matthew presented subject-specific work to enhance the IB project through the provision of a portal. A 3-month research process analysis was carried out with some key users from the heart disease and cancer modelling communities. This identified tools to be provided in the portal, including a repository interface for studies and results of model runs, a collaboration tool for sharing and annotating moving images (Vannotea is being evaluated), and a management tool (Transparent Approach to Costing (TRAC) is being used). The use of a USB digital pen is also being investigated as a means of remotely sharing discussions of the underlying mathematics, particularly for the cancer models.
Using Familiar Data: The Link between e-Research, Portals and Preservation
David Giaretta, Digital Curation Centre, CCLRC
David explained some work going on in the EU-funded CASPAR Project which is concerned with ingestion and publication of scientific data. A key part of this process is the ‘representation information’ which captures knowledge of data bit structure and any other information needed to interpret it in the future.
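To make the idea of representation information concrete, the following is a minimal sketch in Python. It is not the CASPAR format: the dictionary layout, field names and type labels are all hypothetical, chosen only to show how a stored description of the bit structure lets a future reader decode otherwise opaque bytes.

```python
import struct

# Hypothetical representation information: a description of the byte layout
# of one record, kept alongside the data so it can be interpreted later.
REPRESENTATION_INFO = {
    "byte_order": "little",
    "fields": [
        {"name": "station_id", "type": "uint16"},
        {"name": "temperature_c", "type": "float32"},
    ],
}

# Mapping from the (hypothetical) type labels above to struct format codes.
_TYPE_CODES = {"uint16": "H", "int32": "i", "float32": "f", "float64": "d"}


def decode_record(raw, rep_info):
    """Decode raw bytes into named values using the layout description."""
    prefix = "<" if rep_info["byte_order"] == "little" else ">"
    fmt = prefix + "".join(_TYPE_CODES[f["type"]] for f in rep_info["fields"])
    values = struct.unpack(fmt, raw)
    return {f["name"]: v for f, v in zip(rep_info["fields"], values)}


# Illustrative data: one archived record, written as 6 raw bytes.
raw_record = struct.pack("<Hf", 7, 21.5)
decoded = decode_record(raw_record, REPRESENTATION_INFO)
```

Without the description, the six bytes carry no recoverable meaning; with it, the same bytes can be read back as a station identifier and a temperature.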
JISC IE and Portals: Meeting the Needs of e-Research
Rob Allan, CCLRC
Rob presented the work which had been done to date in the JISC ITT mentioned in the Introduction to this paper. A full final report and additional background information will be available soon. In addition to the issues identified by the earlier presenters, Rob mentioned the need for researchers to combine and cross-search data and information, possibly in a collaborative manner, as well as the need for both wide (e.g. Google) and deep (e.g. subject-based) discovery facilities.
IESR: The Information Environment Service Registry
Ann Apps, MIMAS, Manchester
Ann presented the current status and functionality of the IESR, which is an example of the components of the IE. It is a machine-to-machine registry providing: descriptions of collections of resources (e.g. census, e-learning resources); descriptions of services that make resources available; agents; transactional services, all of which are needed for resource discovery. A portal builder would not need to know about all the separate underlying services. A number of protocols are supported and integration with other services, such as UDDI and RSS, is being considered. The biggest problem at present seems to be take-up of the IESR and people need to be made aware of the advantages of using it.
StORe: Source to Output Repositories
Ken Miller, UK Data Archive, Essex
Ken’s talk also addressed the linking of data (source) to publications (research output) in the JISC/CURL-funded StORe Project. Ken reported on a detailed survey of researcher method and practice in the use and management of digital repositories, involving the research communities of 7 scientific disciplines [4]. Implementations are now beginning to address issues of how researchers can deposit and link data and share it with known peers.
The New Improved OpenDOAR (linked to SHERPA Project)
Peter Millington, University of Nottingham
Peter described a directory of open archive repositories, OpenDOAR, which includes institutional and subject repositories and funders’ OA archives, but not OA journals. An aim is to provide access to authoritative, evaluated data. Two related projects, RoMEO and JULIET, are concerned respectively with publishers’ policies (a prototype API is available) and funders’ policies on mandatory deposition of research results. Although the project is relatively new, interest is growing. Only 33% of OAI archives currently have policies published via OAI-PMH, and more are being encouraged to submit them via suggested defaults. More machine-to-machine interfaces are being developed.
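For readers unfamiliar with OAI-PMH, the sketch below shows in Python what a machine-to-machine request and response look like. The base URL and the sample record are hypothetical and no network call is made; only the `verb` and `metadataPrefix` arguments and the XML namespaces come from the OAI-PMH 2.0 and Dublin Core specifications. It is an illustration of the protocol, not of how OpenDOAR itself is implemented.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"


def build_request(base_url, verb, **params):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


def extract_titles(xml_text):
    """Collect every Dublin Core title found in an OAI-PMH response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter("{%s}title" % DC_NS)]


# Hypothetical response body, trimmed to a single record for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example deposited report</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Hypothetical repository endpoint; a real harvester would fetch this URL.
url = build_request("https://repository.example.org/oai",
                    "ListRecords", metadataPrefix="oai_dc")
titles = extract_titles(SAMPLE_RESPONSE)
```

A harvester would issue the same kind of request with the `Identify` verb to read repository-level descriptions, which is where a published policy statement could be exposed.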
Data Webs: New Visions for Research Data on the Web
David Shotton, University of Oxford
Zoologists are developing the BioImage database and finding ways to populate it. David described the difference between a database, in which data is confined, and a Data Web, in which metadata harvesting is used to discover data and information from independent sources. The BioImage Data Web registry provides interoperability, gathering, ordering and integrating the metadata from across the Web into a single searchable graph, then directs users to original sources. Primary data holders benefit by increased user traffic, but retain control locally. The project leverages many of the advantages of Web 2.0 and includes subject-specific semantic tagging.
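The Data Web pattern described above can be sketched in a few lines of Python. This is an assumption-laden illustration rather than the BioImage design: a plain keyword index stands in for the searchable graph, and the source names, records and URLs are invented. The point it shows is that only metadata is gathered centrally, while each hit points back to the original holder.

```python
from collections import defaultdict


def build_index(sources):
    """Merge harvested metadata from independent sources into one index."""
    index = defaultdict(list)
    for source in sources:
        for record in source["records"]:
            for keyword in record["keywords"]:
                index[keyword.lower()].append({
                    "title": record["title"],
                    "url": record["url"],      # link back to the primary holder
                    "source": source["name"],
                })
    return index


def search(index, keyword):
    """Return all harvested records tagged with the keyword."""
    return index.get(keyword.lower(), [])


# Hypothetical harvested metadata from two independent data holders.
SOURCES = [
    {"name": "Lab A", "records": [
        {"title": "Dividing cell, time lapse", "keywords": ["mitosis", "video"],
         "url": "https://lab-a.example.org/images/1"},
    ]},
    {"name": "Lab B", "records": [
        {"title": "Spindle formation", "keywords": ["Mitosis"],
         "url": "https://lab-b.example.org/items/42"},
    ]},
]

index = build_index(SOURCES)
hits = search(index, "mitosis")
```

Because the index stores only descriptions and links, the data itself never leaves the holders' own servers, which matches the local control described in the talk.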
The Geospatial Repository for Academic Data and Extraction (GRADE) Project
James Reid, EDINA
James looked at an infrastructure for improved geospatial data sharing through better access via digital repositories. The project addresses the issues of unwillingness to share and the difficulty of locating data, and explores mechanisms for sharing and locating data. A demonstrator is currently linked into the LandMap and DigiMap services and has search, upload and download tools.
Discussion Session
Caroline Ingram, CSI Consultancy
The second day ended with an active discussion session led by Caroline Ingram. The main research resource discovery issues drawn out from this meeting were:
- Need to emphasise relationship between data and outputs - for both preservation and later access;
- Users need easy access to resources and tools to find resources;
- Perceived barriers to use are not just technical - examples include awareness, Intellectual Property Rights (IPR), prestige, how the success of a project is judged;
- There have to be more resources for people to discover - should there be incentives to encourage more researchers to deposit data, results, reports and publications in repositories available to them?
- Sharing data is perceived to be important, but in some disciplines more than others - we need to explain the benefits;
- The need for people to adopt and use services such as IESR and OpenDOAR as these could facilitate easier resource discovery and reuse;
- How researchers handle their own research data is currently different to how they use other people’s data - tools to facilitate interoperability are needed.
It was stated that “It is better to address un-satisfied requirements now than to deploy new frameworks”. But David Shotton reminded the group that a tool has to be fit for purpose, not shoehorned into place.
Conclusion
In conclusion, the workshop was extremely useful and interesting, not only in terms of contributing to the e-research and portals study, but also more generally for the community with regard to considerations for the future development of portals and digital repositories. Thanks again from the organisers to all participants, who contributed considerably to a successful couple of days of discussion.
References
- JISC ITT: JISC Information Environment portal activity
http://www.jisc.ac.uk/fundingopportunities/funding_calls/2006/03/funding_portaleresearch.aspx
- ReDRess Presentations
http://redress.lancs.ac.uk/Presentations.html
- D.M. Sergeant, S. Andrews and A. Farquhar, Embedding a VRE in an Institutional Environment (EVIE). Workpackage 2: User Requirements Analysis. User Requirements Analysis Report (University of Leeds, 2006)
http://www.leeds.ac.uk/evie/workpackages/wp2/evieWP2_UserRequirementsAnalysis_v1_0.pdf
- StORe: the Source to Output Repositories. Reports available via JISCStore http://jiscstore.jot.com/WikiHome or CURL (Consortium of University Research Libraries) http://www.curl.ac.uk/