Web Magazine for Information Professionals

CEDARS

Kelly Russell explores the main deliverables of the CEDARS project: recommendations and guidelines, plus practical, robust and scaleable models for establishing distributed digital archives.

In recent years, libraries have been fortunate to have increasing access to new and innovative digital resources. A number of factors contribute to this trend:

Libraries in the UK over the past few years have certainly benefited from new funding, technology and business approaches. Relatively speaking, this has happened very fast, and consequently our reliance on these resources is growing exponentially.

This article will begin with some background and "scene setting" related to digital preservation, move into consideration of some of the challenges for providing long-term access to digital resources, and consider briefly the three main "strategies" for preserving digital resources. Digital Preservation is a complex and exciting area – this article will merely skim the surface of what is a very big pond. Further reading is recommended below for more detailed consideration of many of these issues.

As suggested above, information and networked technologies have provided libraries with new opportunities to provide faster and easier access to a wider range of scholarly information. Ours is not the first generation to experience such a boom. Let us cast our minds back to an earlier period in our history when a new technological innovation allowed cheaper and faster information delivery: the mid-19th century saw the introduction of acidic paper. This was a cheaper and faster method for producing paper products. At the time few realised the long-term implications of this new approach: the paper decayed dangerously quickly and could be reduced to dust within a century. Eventually, in response to the alarming speed with which much 19th-century paper was deteriorating, there was a significant co-ordinated movement to raise awareness of this critical situation. In the 1980s (i.e. 130 years after the introduction of acidic paper) the Commission on Preservation and Access commissioned a short film called Slow Burning Fires to bring the issue of brittle books to the fore. Fortunately a great deal of time and effort has been dedicated to alleviating this problem, and the film contributed to this effort enormously. Most research libraries in the US now have Brittle Books programmes working to conserve or preserve fragile books by reformatting (to microfilm or, more recently, to digitised images).

If we compare this to the present day, we too are inundated with resources produced on a new medium. Libraries now rely increasingly on digital resources, many of which do not have a print equivalent. Examples include new electronic journals, web-based information gateways and electronic courseware. Since the introduction of digital material (over the last decade), the technologies through which many digital resources are delivered have continued to evolve, mutate and develop. In fact a great deal of the technology we used in the 1980s is already obsolete. In 1997 (i.e. within a decade of the introduction of digital scholarly resources into libraries) the Commission on Preservation and Access commissioned another film, this time entitled Into the Future, in an attempt to facilitate discussion about the implications of dependence on relatively unstable and unreliable technologies. It is estimated that within two years over 75% of the US government’s information will be available only in digital formats. Libraries in the UK have pursued relatively conservative collections management policies, and continue to retain print journals where there are new electronic versions. However, electronic journals (even those published in parallel with print) are beginning to include technology-dependent content such as animation, sound or video which is simply not possible in the print version. It is unlikely that many libraries in the UK have policies which address long-term access to and preservation of electronic materials; this remains an assumption, which a study currently funded by the Research Libraries Group (RLG) is intended to support. Our investment in digital resources and our reliance on them unfortunately extend far beyond our current capacity to preserve and integrate them. We will need not only to confront the changing technology and adapt to it, but also to make adjustments in our organisational and management structures to allow us to do so indefinitely.

Despite this potential gap, libraries continue to invest in digital resources that may not have a print equivalent, and funding continues to be poured into digitisation projects without a clear understanding of the future of the digital files.

In research libraries, preservation of material has always been a priority. Indeed research libraries (and in particular the UK copyright deposit libraries) are largely responsible for preserving access to our most valued UK scholarly resources. However, as has been illustrated above, the urgency with which we must address this responsibility has no parallel in the print world. Where it might take 50 years for a book printed on acid paper to decay, a digital resource dependent on specific technology could be inaccessible in a matter of five years or less. It is probably the case that libraries already contain material which is inaccessible. One need only think of print resources which were produced over the last decade with a digital supplement (e.g. a floppy disk). Imagine that the information contained on that disk was in a proprietary file format and was created to run in a specific (now possibly obsolete) technical environment. Would a library be able to deal with an 8" floppy disk (or even a 5¼" one) presented to a member of staff on the help desk? Although this is arguably not yet a common problem, technological obsolescence and the loss of resources are a growing concern for libraries. What is more, libraries do not usually possess the know-how necessary for addressing such problems.

The Cedars Project

Although much of current digital libraries activity may have happened anyway in the fullness of time, there has been a surge of new funding (both in the UK and abroad) for the creation and application of new technology in libraries. Most notably in the UK, the eLib Programme has overseen the establishment of over 70 different projects and studies in the UK in digital libraries (this work has involved over 100 different higher education institutions directly). As a result of this, eLib has recognised that some investment into the problems of long-term access to digital resources was probably wise.

The Cedars project is funded as part of eLib Phase 3. It is administered through the Consortium of University Research Libraries, and its name stands for "CURL exemplars in digital archives" (Cedars). The main objective of the project is to address strategic, methodological and practical issues and provide guidance in best practice for digital preservation. The project will do this by working on two levels: first, through practical demonstrator projects which will provide concrete experience in preserving digital resources, and second, through strategic working groups based on broad concepts or concerns, which will articulate preferences and make recommendations of benefit to the wider community. The main deliverables of the project will be recommendations and guidelines as well as practical, robust and scaleable models for establishing distributed digital archives. It is expected that the outcomes of Cedars will influence the development of legislation for legal deposit of electronic materials and feed directly into the emerging national strategy for digital preservation currently being developed through the National Preservation Office of the British Library.

What is digital preservation?

Digital preservation is a process by which digital data is preserved in digital form in order to ensure the usability, durability and intellectual integrity of the information contained therein. Although a list of working definitions is still evolving, the project has agreed a more specific definition for digital preservation, which is "storage, maintenance and access to digital objects/materials over the long term. This may involve one or more digital preservation strategies including technology preservation, technology emulation or digital information migration." According to Hendley, at a basic level, digital preservation involves the following tasks:

Cedars is most concerned with digital preservation beyond this basic level to ensure that the information can be retrieved and processed in a meaningful way in the future and is therefore useable.

Defining the Terms

The relative novelty of digital preservation as an issue for libraries, and the fact that expertise in this area resides in other sectors (e.g. electronic records management), mean that defining what we mean by specific terms is sometimes contentious. Where librarians, archivists, records managers and computing technologists assemble, the term "archive" can (and does) mean very different things. Definition of terms continues to be an issue within the Cedars project. Outside of Cedars there are also some misconceptions that need correcting. For example, there is some confusion within the wider HE library community about the difference between digital preservation and digitisation (converting analogue material into digital) as it is used in libraries as a preservation measure. Cedars needs to make clear that digital preservation involves the maintenance of, and long-term access to, digital files. This may include digital image surrogates taken from a physical original (e.g. a manuscript), but "digital preservation" is by no means limited to digitised material. Digital preservation includes many materials for which there is no print equivalent; these materials have been referred to as "born digital". Indeed it is the latter which present the most complex problems of preservation, because they are often inextricably linked to the technical environment in which they are produced. Digitised images may need routine "refreshing" (see above), but will generally not require the disentangling from a technical environment which is often necessary with a "born digital" object.

Although Cedars, in its proposal to JISC, has specified that it is concerned primarily with preserving the intellectual content of the resources and not the physical medium, this distinction is not always easy to make in practice. The waters are muddied by digital objects that are intertwined with their technical environment, and it is sometimes unclear where the intellectual content of the object stops and the technical environment begins. In a product like Microsoft’s Encarta, it may be nonsensical to imagine disentangling bits of sound, image and text in order to preserve them when the intellectual content of the object involves a specific technical manifestation of these different components. However, how far do we extend our considerations in this area? Although we can understand the interdependencies within a digital object at the conceptual level, what does this mean practically? Does it mean, in the case of Encarta, preserving a copy of a specific version of Windows 3.1 and a copy of a specific version of the DOS operating system? If this depends on a specific PC with a particular configuration, do we preserve that as well? And if so, how?

Strategies for Digital Preservation

1) Technology Preservation

Preservation of the technical environment by conserving copies of the software and specific hardware is referred to as "technology preservation". For some digital objects this may be the best solution, at least in the short term, because it ensures the material is accessible by preserving the access tools as well as the object itself. In the longer term, however, this is more problematic: issues of space and maintenance for hardware, as well as costs, may make this an impractical solution.

2) Technology Emulation

There are other options, however, which also focus on the environment of a digital object. In the case of something like Encarta, it is possible to preserve data about how the technical environment was created in the first place, in order to re-create a new technical environment which mimics the original. If we preserve enough information about Windows 3.1 then, instead of preserving it, we can simply re-engineer it when we need it. This option is called "emulation" and is probably the most complex option for preserving digital resources. It relies on a robust system for preserving the metadata which describes the technical environment. In an emulation situation like the case described above, nothing is done to the original object (it is left simply as a bitstream); it is the environment which is re-created. The costs of emulation are as yet unknown, and it is expected that the costs of re-creating complex technical environments could be astronomical. However, unlike the technology preservation model, the costs fall further along in the resource’s lifecycle. Instead of spending money now and for the foreseeable future by preserving both software and hardware, emulation loads the costs at the far end: only if a resource is needed in future are resources required to emulate the necessary technological environment. Emulation is the "just in time" option, whereas technology preservation means we will have the necessary software and hardware "just in case". (However, emulation also requires a leap of faith in the power of future technologies and in the abilities of future software engineers.)
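To make the idea of environment metadata a little more concrete, the following is a minimal sketch in Python. It is not a Cedars deliverable or schema: the class names, the field names and the `can_emulate` helper are all hypothetical, chosen only to show an untouched bitstream stored alongside a description of the environment it needs.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalEnvironment:
    """Hypothetical description of what an object needs in order to run."""
    operating_system: str
    hardware: str
    dependencies: tuple = ()


@dataclass(frozen=True)
class PreservedObject:
    """The object itself is kept as an unmodified bitstream."""
    title: str
    bitstream: bytes
    environment: TechnicalEnvironment

    def checksum(self) -> str:
        # A digest lets a future archivist confirm the bitstream is unchanged.
        return hashlib.sha256(self.bitstream).hexdigest()


def can_emulate(obj: PreservedObject, emulated_systems: set) -> bool:
    """True if every piece of software the object needs can be re-created."""
    needed = {obj.environment.operating_system, *obj.environment.dependencies}
    return needed <= set(emulated_systems)


# Illustrative values only; the bitstream is a placeholder.
encarta = PreservedObject(
    title="Encarta",
    bitstream=b"placeholder bitstream",
    environment=TechnicalEnvironment(
        operating_system="Windows 3.1",
        hardware="PC with a particular configuration",
        dependencies=("DOS",),
    ),
)

ready_now = can_emulate(encarta, {"Windows 3.1"})
ready_later = can_emulate(encarta, {"Windows 3.1", "DOS"})
```

In this sketch the cost of building an emulator is only incurred when `can_emulate` is eventually asked about a particular object, which mirrors the "just in time" character of the strategy.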

3) Migration

Where the two options described above focus on the environment of the object, preserving the resource by re-creating or preserving the necessary operating environment, another strategy for digital preservation is what has been called "migration". A report commissioned by the Research Libraries Group and the Commission on Preservation and Access in the US helpfully distinguishes between migration and what has been termed "refreshing" (mentioned above). The report suggests that:

"... migration is a broader and richer concept than "refreshing" ... Migration is a set of organized tasks designed to achieve the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation. The purpose of migration is to preserve the integrity of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology."

It is this last strategy in which many libraries and archives are already involved, and many believe that it is the most practical approach, at least for the short and medium term. For objects like Encarta, or electronic journals that contain sound and video, migration is not a simple matter. The costs of migration may, in the long run, exceed the costs of preserving either the technology itself or the detailed technical specification which will allow future emulation.
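The distinction between refreshing and migration can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a recommended workflow: refreshing is taken here to mean copying the bitstream unchanged to a new medium, and a change of character encoding stands in for the far more involved format conversions that real migration requires. The function names are hypothetical.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Digest used to check whether a bitstream has changed."""
    return hashlib.sha256(data).hexdigest()


def refresh(data: bytes) -> bytes:
    """Copy the bitstream as-is (a stand-in for writing to a new medium)."""
    copy = bytes(bytearray(data))
    if fixity(copy) != fixity(data):
        raise ValueError("refresh altered the bitstream")
    return copy


def migrate(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Re-encode text: the bits change, but the content should not."""
    text = data.decode(source_encoding)
    migrated = text.encode(target_encoding)
    if migrated.decode(target_encoding) != text:
        raise ValueError("migration lost content")
    return migrated


# Illustrative data: a short text stored in an older encoding.
original = "Café".encode("latin-1")
refreshed = refresh(original)
migrated = migrate(original, "latin-1", "utf-8")
```

After refreshing, the checksum is identical to the original; after migration it is not, so integrity has to be judged on the content rather than on the bits. For objects that mix sound, image and text, no such simple content check exists, which is one reason migration costs are hard to predict.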

It is clear, even at this early stage in the Cedars project, that costs are a key component for providing guidance to libraries on digital preservation. The strategies described above all require resources. When and how these resources are deployed will depend on the perceived value of the digital object. Through its demonstrator projects, Cedars aims to explore the strategies described above and make recommendations on cost implications.

This article has not addressed the issue of selection, that is, what resources are selected for long-term preservation and how these decisions are reached. This is a critical issue. The exponential growth of digital resources and our dependence on them, together with the speed with which technology changes, mean that assessing the value of digital resources can be much more difficult.

Conclusions

This article has attempted to provide a simple overview of digital preservation and the issues with which libraries will be increasingly concerned. This is not the first time libraries have been confronted with technical innovation, but the urgency with which we need to address it may be unprecedented. There are strategies available. However, long-term solutions to the problems will require cooperation across traditional boundaries to draw on experience gained outside the library, including that of university computing services, university archives, electronic records managers, data suppliers and others both within and outside the public sector. Cedars will work to build these partnerships and to produce robust guidance for HE libraries as well as scaleable, practical models for distributed digital archives. More information about the project can be found on the Web at: http://www.curl.ac.uk/projects.shtml.

Further Reading

Web Sites

1. The Arts and Humanities Data Service Digital Preservation Resources

2. Preserving Access to Digital Information - maintained by the National Library of Australia

Other Resources

1. Preserving Digital Information: Report of the Task Force on Archiving of Digital Information. Commissioned by the Commission on Preservation and Access and the Research Libraries Group, 1996.

2. JISC/NPO Preservation Studies (available on the Web at http://www.ukoln.ac.uk/services/elib).

3. Building the digital research library: preservation and access at the heart of scholarship. P.S. Graham. Follett Lecture Series, Leicester University, 19 March 1997.

4. Ensuring the longevity of digital documents. Rothenberg J. Scientific American, 272 (1), 1995.

5. Guidelines on Best Practices for Using Electronic Information. DLM-Forum. Luxembourg: Office for Official Publications of the European Commission, 1997.

6. Intellectual preservation: electronic preservation of the third kind. P.S. Graham. Washington, D.C.: Commission on Preservation and Access, 1994.

7. Preserving the intellectual record: a view from the archives. M. Hedstrom. In Networking and the Future of Libraries 2: Managing the Intellectual Record. Mowat et al (eds). London: Library Association Publishing, 1995, 179-191.

8. Reference Model for an Open Archival Information System (OAIS). Consultative Committee for Space Data Systems, 15 April 1998 (http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html)

9. Report of the First National Consultative Meeting on the Management of Australian Physical-Format Digital Publications. National Library of Australia, October 1997.

10. The Electronic Information Initiative, Phase 1 Final Report: The National Agricultural Library

11. Digital preservation: a time bomb for Digital Libraries. M. Hedstrom. http://www.uky.edu/~kiernan/DL/hedstrom.html

Author Details

Kelly Russell
CEDARS Project Manager
Edward Boyle Library
University of Leeds
Leeds
LS2 9JT
Email: k.l.russell@leeds.ac.uk