Web Magazine for Information Professionals

Digital Curation: Digital Archives, Libraries and e-Science Seminar

Neil Beagrie and Philip Pothen report on the Digital Preservation day in October 2001, held in London.

Digital preservation remains a significant and growing challenge for libraries, archives, and scientific data centres. This invitational seminar, held in London on 19 October and sponsored by the Digital Preservation Coalition and the British National Space Centre, brought together international speakers to discuss leading-edge developments in the field. Three developments were key to the timing and organisation of this international event: firstly, the imminent approval of the Open Archival Information System (OAIS) Reference Model as an ISO standard; secondly, the launch of the Digital Preservation Coalition (DPC), a cross-sectoral coalition of over 15 major organisations; and, finally, the development of the e-science programme to build the research Grid in the UK.

Neil Beagrie (JISC Digital Preservation Focus and Secretary, Digital Preservation Coalition), in outlining the main objectives of the event, suggested that there was a need to raise the profile of relevant British and international standards and initiatives in the UK, to show their practical application in the sciences, libraries and archives, and to illustrate their role in securing and promoting access to digital resources for current and future generations.

The first session of the seminar dealt with the OAIS Reference Model and digital archive certification.

Lou Reich of NASA described how NASA and the Consultative Committee for Space Data Systems (CCSDS) had been central to the development of the OAIS Reference Model, while ensuring widespread consultation and cooperation with the archive and library communities, both in the US and internationally. The resulting model had therefore been developed as an ‘open’ and public model and was already being widely adopted as a starting point in digital preservation efforts. He explained that a new version of the OAIS Reference Model had been delivered to the ISO and CCSDS Secretariats in July 2001 for a two-month public review period, and that a final standard should be produced in late 2001. Dr Reich concluded by outlining some of the uses and implementations of the OAIS Model, including the Networked European Deposit Library (NEDLIB), the National Library of Australia, the CEDARS project in the UK, the US National Space Science Data Center (NSSDC), the US National Archives and Records Administration (NARA), and others.

Bruce Ambacher of NARA spoke on certification efforts based on the OAIS model. The October 1999 Archival Workshop on Identification, Ingest and Certification (AWIICS) had been particularly concerned with certification. The AWIICS Certification Working Group developed a preliminary checklist for certification that sets out best practices and procedures for each aspect of the OAIS model, including legal issues, mission plans, compliance with relevant regulations, relationships with data providers, ingest procedures, data fidelity and life-cycle maintenance. Further work on certification based on OAIS was now being proposed.

Robin Dale of RLG spoke on the RLG and OCLC report Attributes of a Trusted Digital Repository for Digital Materials. She also emphasised the importance of certification as a key component of a trusted digital repository; self-assessment, she said, will not always be adequate. There is a need, therefore, for certification practices to be formalized and made explicit. The AWIICS draft report had suggested the need to establish an official certifying body, to identify the attributes to be measured and to define the conditions for revoking certification. But many questions remained to be answered, including who will sit on such a body, who will set it up and which stakeholders will be represented on it.

Collaboration was an important theme of the day, and David Ryan of the UK Public Record Office, as a founder member of the DPC, emphasised this point. He outlined the PRO’s mandate to store and make available comprehensive ‘born digital’ public records and how its activities and future plans in this area were a core part of the PRO e-business strategy.

A discussion followed on the relative costs of printed and digital storage. The duty of care and the costs associated with traditional special collections and archives were cited. With digital storage, the costs of computer storage are diminishing constantly, so costs would be related primarily to the staff effort required for long-term preservation. The degree of automation that could be implemented for future migrations and preservation efforts would therefore be critical to relative and absolute costs over the long term. It was argued that issues such as appraisal and migration represented ongoing costs. It would be easy to underestimate the costs of long-term digital preservation where it was dependent on human intervention and perhaps could not be scaled across collections.

In the second session of the day, on data curation and the Grid, Professor Tony Hey, Director of the UK e-science programme, began by stating that e-science is about global collaboration in key areas of science and the next generation of infrastructure that will enable it. He quoted John Taylor, Director General of the Research Councils, who said that ‘e-science will change the dynamic of the way science is undertaken.’ NASA’s Information Power Grid has promoted a revolution in how NASA addresses large-scale science and engineering problems by providing a persistent infrastructure for ‘highly capable’ computing and data management services. The Grid, by providing pervasive, dependable and inexpensive access to advanced computational capabilities, will provide the infrastructure for e-science, said Professor Hey.

The UK e-science initiative represents £120m of funding over the next three years to provide next-generation IT infrastructure to support e-science and business: £75m for Grid applications in all areas of science and engineering, £10m for the supercomputer and £35m for the Grid middleware. It uses SuperJANET, and all e-science centres will donate resources to form a UK National Grid.

Peter Dukes from the Medical Research Council outlined the overall scope of its programmes and its current work to develop a data archiving policy. Outlining some of the access issues, such as rights and ownership, consent and ethics, he stressed that the research Grid would provide tremendous opportunities for advancing science. However, work on research data policies and practice was also needed to help unlock the potential of the Grid for collaborative scientific research.

The main challenges involved in scientific data curation are a rapidly increasing capability to generate data in many different formats in the physical and life sciences, the increasingly expensive facilities needed to generate this data, the irreplaceability of much of the data, and the increasing need for access to be on a global scale. David Boyd of the CLRC e-science Centre looked at how the Grid can help address some of the challenges of curating scientific data.

Paul Jeffreys, Director of the Oxford e-Science Centre, spoke about the Oxford-wide collaboration that the centre is involved in, such as the work with the Oxford Digital Library, the Oxford Text Archive and Humbul. Although, Dr Jeffreys said, global science is driving the initiative, the interest is much wider, and these areas of collaboration suggest that the centre will become a core part of the University’s life.

The key issue that came up in the second discussion session was the importance of data curation and the need to look at data policies, archival models, and how to incentivize the submission of primary research in digital form with appropriate metadata. Ideas put forward ranged from financial incentives, through increasing and enhancing recognition of the value of digital resources among the research and scholarly community, to persuading funding councils, the RAE and publishers to take these matters more seriously and to build such considerations into their funding and reward processes.

In the final session of the day, on the curation of digital collections, Maggie Jones and Derek Sergeant from the JISC-funded CEDARS project spoke about some of the lessons of the project, including the centrality of metadata to the preservation of resources and the increasing consensus that is emerging about standards.

At present, the British Library’s digital collecting was based on voluntary deposit, purchases made by the BL, and digital resources created by the BL itself. Among its main priorities, said Deborah Woodyard, Digital Preservation Coordinator at the British Library, were to ensure improved coverage of the UK’s National Published Archive, to increase the collection of digital materials and to continue to collaborate with other major players in the field.

Kevin Ashley of the University of London Computing Centre (ULCC) and the National Digital Archive of Datasets (NDAD) spoke about ULCC’s role under contract to the PRO and others for digital preservation and their practical experience of running a digital preservation service. He also spoke about the OAIS model. Its advantages were clear, he said, in that it eases procurement of hardware and software, interworking with compliant systems, and migration tasks. There are, however, question marks about interworking with traditional repositories and about how it works with mixed-mode models, questions which will need to be looked at closely in the future.

Discussion continued on the potential value and limitations of the OAIS model. Its value in the early stages of system design and development was recognised, but at the same time it would not provide the detailed implementations and practice. Documenting and sharing practical experience in this area will be vital.

In summarising the lessons and next steps that came out of the day, David Giaretta (Rutherford Appleton Laboratory/BNSC and chair of the CCSDS panel developing the OAIS model) noted that the next international CCSDS meeting, which would discuss OAIS and archive certification, was being held in Toulouse the following week. He would report on the UK seminar and its discussion. It was also important to continue co-ordination across sectors, and the Digital Preservation Coalition could be immensely valuable in achieving this.

Neil Beagrie suggested that closer collaboration between JISC and the research councils to support users of the Grid and primary research data would be important. The creation of a JISC research committee chaired by Tony Hey could clearly have an important role in this area. He hoped to see close links with e-science and a growing membership of the DPC amongst the research councils and data centres. Tony Hey noted that he was open to discussion of possible projects in data curation, or indeed in other areas raised during the seminar, but stressed the current need for industry involvement and funding in such proposals.

The seminar was felt by participants to have been a great success, with outstanding speakers and leading-edge discussion of theory and practice, and to have established an essential cross-sectoral dialogue. For those who wish to learn more about the seminar and presentations, a fuller meeting report and the speakers’ presentations are now available on the JISC website at:

http://www.jisc.ac.uk/dner/preservation/digitalarchives.html

 

Neil Beagrie and Philip Pothen