Digital Curation and Preservation: Defining the Research Agenda for the Next Decade
Over recent years it has become clear that accessing and preserving digital data is increasingly important across a wide range of scientific, artistic and cultural activities. There has been a growing recognition of the need to address the fragility and accessibility of the digital information collected in all aspects of our lives. Access to digital information lies at the heart of the scientific and technical innovation vital for modern economies. A two-day workshop took place on 7 - 8 November 2005 at the University of Warwick to address these issues and to map out a future research agenda for digital curation and preservation. Sponsored by JISC, the Digital Curation Centre (DCC), the British Library and the Council for the Central Laboratory of the Research Councils (CCLRC), the invitation-only event drew a wide range of national and international experts to explore the current state of play with a view to shaping future strategy.
Day One
Malcolm Read, Executive Secretary of JISC, delivered the welcome address, saying at the outset that there had been a 'sea-change' in this area and that 'the debate is now very much about "how?" we preserve digital information rather than "whether?"'. Policy makers had taken up this issue, with the Office of Science and Technology (OST) working group on e-Infrastructure establishing 'curation and preservation' of digital information as one of six key components in the future national e-infrastructure. The OST working group is charged with mapping out relevant developments, gaps and challenges in digital curation and preservation over the next 10 years, Dr Read said.
He looked forward to a time when 'research resources and research data will be very much more readily available than is the case at the moment.' However, he warned, a 'culture change' was needed to achieve this vision. The issue of making data more openly available was an important one, and the time was right to look at this question more closely, something he hoped the two-day workshop would address.
Professor John Wood, Chief Executive of the CCLRC, followed with the keynote address. Setting the scene for the two days, Professor Wood spoke of the policy background and in particular the House of Commons Science and Technology Committee's report last year which called for the development of institutional repositories to store research outputs and, echoing Dr Read's words, their linking to primary datasets. Digital media are especially vulnerable to loss, he continued, with fewer than one in ten software packages lasting more than ten years. These facts made the deliberations of the attendees of particular importance to the development of the national research infrastructure over the coming years, Professor Wood concluded.
Drivers, Data Life Cycle Management and Technical Issues
Delegates split into three groups for the afternoon session. One looked at the drivers - policy, technical and cultural - behind digital curation and preservation and the barriers to their effective implementation. Professor Christine Borgman of UCLA spoke of the 'value chain of information' - the relationships between information sources, their history, their provenance - which was at the heart of scholarly communications. Greater value, status, and indeed reward, needed to be given to the role of information management - the importance of documentation, of adding full metadata to allow greater visibility of research outputs, for example - within the scholarly and research community, she said. Other barriers or disincentives to a fully open scholarly communications environment were, she said, the protectiveness of many scholars who had acquired their data often with great difficulty, and questions, and often misunderstandings, about copyright and intellectual property. The need to mitigate or overcome these barriers through appropriate incentives was, she said, the 'highest priority' facing policy makers in this area.
Professor Laurie Hunter of the University of Glasgow looked at the question of digital preservation from the perspective of an organisation's or community's holding of an intangible asset. It was a characteristic of the post-industrial economy, he said, with its emphasis on services as opposed to production, and its concomitant dependence on digital media, that organisations often held a wealth of intangible assets, the value of which they needed to understand more fully. We need to see spending decisions as investments, Professor Hunter said, with a clear view of the costs and benefits of activities through the employment of rigorous systems of metrics. A way forward was to examine more closely the social and organisational benefits of digital curation and preservation, and to place greater emphasis on education.
Robert Sharpe of Tessella, a commercial organisation specialising in digital preservation, spoke of the 'sticks' and 'carrots' behind organisational efforts at digital preservation and said that too many organisations were responding to the sticks, whether legal or funding requirements, rather than the carrots. There was a range of other reasons why preservation procedures should be adopted more rigorously by organisations, and a more effective business case needed to be made for these, he concluded.
A second group looked at the question of data life cycle management, exploring the question of where delegates hoped the community would be in ten years' time. The session began with thought-provoking presentations on the scholarly life cycle by Dr Jeremy Frey of Southampton University, on the data policies of the research councils by Mark Thorley of NERC, and on life cycle costings for digital material by Helen Shenton of the British Library.
In discussion, delegates spoke of the need for an environment in which data resources were interoperable, easily discovered and seamlessly searched, with appropriate appraisal mechanisms for the selection of resources. There was also a need for a unified and clearly understood policy framework, for trust within the management of the digital life cycle, and for a better understanding of the economics and costs over time and of how these vary with different factors. Technical competence and a continued emphasis on the provenance of information sources were also absolutely vital.
The question of education was a recurrent theme. 'Where are the digital curators of the future?' asked one delegate, 'and how do we train them?' While the barriers between librarians, archivists and technical specialists were breaking down slowly, the wider question of training and education needed to be addressed, delegates agreed.
Expectations surrounding the networked world had outrun delivery, argued one delegate, which meant that digital curation and preservation were being assumed by both policy makers and end-users, with inappropriate funding and status structures. The concept of 'academic literacy' was changing, however, with greater technical skills now apparent, but greater incentives were needed for researchers to undertake curation and archiving themselves. Once again, appropriate business models were needed, delegates agreed. With research centres, universities and national data centres all involved in digital curation and preservation, there was a need for them to work together, perhaps at different stages of the digital life cycle, and to understand how such models might work in practice. Funding, however, was perhaps the greatest priority - to incentivise researchers, to deliver the appropriate infrastructure and to ensure the long-term sustainability of services.
The third group looked at technical aspects, including curation techniques and the underlying technologies and hardware. As with the other groups, there was first a review of the current state of play, with Olaf Barring from CERN and Steve Hughes from JPL setting the scene. The view to the 5- and 10-year horizons was addressed by Bruce Wright of the UK Met Office and Reagan Moore of the San Diego Supercomputer Center, well known for his SRB (Storage Resource Broker) distributed storage system.
The vision was one of increasing 'virtualisation' as a means of keeping the knowledge that we wish to preserve independent of the underlying systems. This will ensure that as those underlying systems change, as they will, the knowledge will remain accessible and understandable. Lou Reich, NASA/CSC, who was the editor of the OAIS Reference Model, described several of the significant challenges which remain to be addressed.
The majority of this group's time was spent in detailed discussion of the technical issues, and a report from that session [1] is available on the DCC Web site.
Day Two
Speaking on the second day of the workshop, Peter Tindemans, Chair of the European Union's Task Force on Permanent Access, began by giving some of the background to the Task Force's work, and in particular the EU conference in November 2004 in The Hague at which participants agreed the need to create a European infrastructure for the long-term preservation and access (LTPA) to the records of science. The Task Force was subsequently set up with an LTPA remit and with a focus on research and the testing and prototyping of new technologies.
Peter Tindemans spoke of the need to devise new approaches to the development of IT solutions from the perspective of the durability of information, and of the need for life cycle costings, value chain analyses and other economic models to support sustainable long-term preservation. There are two main strands, he said, driving this agenda: cultural heritage and the needs of the scientific community in the digital age.
Addressing the first strand, he said there was a need to get stakeholders involved in issues of digital heritage at 'board level' by emphasising the economic and cultural importance of LTPA for their own strategic development. Political attention, through organisations such as the EU and UNESCO, was also supporting this particular driver. As for the scientific community, he said, their need was greatest.
There was a need for more focused action by a critical mass of stakeholders to create an infrastructure to support LTPA as well as the need for a consensus among communities and organisations around what was required to place these issues within the 'real life' structure of organisations. Among the conditions which Peter outlined were: frameworks for metadata, persistent identifiers and registries; a common framework of principles and guidelines for the management of access and rights; financial mechanisms for developing and testing tools and techniques, and common certification and accreditation mechanisms.
There was, however, Peter Tindemans concluded, a consensus of sorts emerging which could be built upon, around standards, and through national and EU-funded projects, as well as public-private agreements such as those between libraries and publishers. Finally, calling on stakeholders to continue their current efforts and their investments in this area, he said that 1 million Euros were needed to establish the European Alliance for Permanent Access to the Record of Science, an action recommended by the Task Force to meet the challenge of LTPA.
Neil Beagrie, BL/JISC Partnership Manager, gave the concluding address and summed up the background and aims of the workshop as well as the next steps that need to be taken. Neil began by giving the background to the work of the Preservation and Curation Working Group of which he is chair. The UK Government's 'Science and Innovation Investment Framework: 2004 - 2014' identifies systematic preservation of digital information as an important component of the information infrastructure. The Department of Trade and Industry (DTI) established 6 sub-groups to take forward the published framework, one of them being the Preservation and Curation Working Group. A report from the working group will be delivered to the DTI by the end of March 2006. The Warwick workshop would be an important input into this process, he said.
The second area of discussion in the workshop concerned developments in the European Union. Summarising potential future initiatives including proposals for the Framework 7 programme, final projects emerging from Framework 6, and the consultation on the i2010 digital libraries initiative, Neil reported that a number of working groups, including ESFRI (European Strategy Forum on Research Infrastructures), the e-Infrastructure Reflection Group (e-IRG), and the KB (Koninklijke Bibliotheek) Task Force on Permanent Access to the Record of Science, were now working on shaping and influencing the EU agenda for future research and infrastructure. Many of these working groups would be reporting at the EU Research Infrastructures conference in Nottingham in December 2005, including the Task Force which would report on plans for a European Alliance for Permanent Access to the Record of Science.
Conclusion
Neil concluded by outlining the recommendations of the previous Warwick workshop held in 1999 and reviewing the progress that had been made in implementing them over the subsequent five years. Many had been addressed with substantial success, such as the need for greater awareness of the issue of preservation and the need for cross-sectoral communication. He cited the work of the Digital Preservation Coalition in achieving these important goals. Guidelines and support materials too had been developed and continued to be developed and were of value to the community, while a growing network of preservation centres was, he said, becoming established. However, the areas where he felt most work was still needed were the development of certification criteria, checklists to determine complexity and cost, and new research, particularly into emulation and dynamic data. In discussion, all were agreed that much had been done in the last five years and that important policy activity was emerging both in the UK and in Europe. However, there was still a great deal to be done and a need to continue to step up activity as part of the UK research agenda for the next decade.
References
- [1] Digital Curation and Preservation: Defining the research agenda for the next decade, Warwick Workshop, 7/8 November 2005, Curation Services and Technologies Session Report
http://www.dcc.ac.uk/training/warwick_2005/Warwick_Workshop_report.pdf