Web Magazine for Information Professionals

Metadata for Digital Preservation: An Update

Michael Day discusses 'Metadata for Digital Preservation'.

In May 1997, the present author produced a short article for this column entitled "Extending metadata for digital preservation" [1]. The article introduced the idea of using metadata-based methods to help manage the process of preserving digital information objects. When the article was first published, the term 'metadata' was just beginning to be used by the library and information community (and others) to describe 'data about data' that could be used for resource discovery. The best-known metadata initiative, for example, was (and remains) the Dublin Core Metadata Initiative, which was initially concerned with defining a core metadata element set for Internet resource discovery [2]. It is now widely accepted that identifying and recording appropriate metadata is a key part of any strategy for preserving digital information objects.

This brief update will report on a number of more recent initiatives relevant to preservation metadata, taking a specific look at currently proposed digital preservation strategies and at the development of recordkeeping metadata schemes. It will also introduce the Open Archival Information System (OAIS) reference model, which is beginning to influence a number of digital preservation projects. This review of activities is partly based on a survey of preservation metadata initiatives carried out for the Cedars project in the summer of 1998 [3], but it has been updated to include additional projects and standards.

Digital preservation strategies and metadata

If one ignores the technology preservation option, there are currently two main proposed strategies for long-term digital preservation: first the emulation of original hardware, operating systems and software; and secondly the periodic migration of digital information from one generation of computer technology to a subsequent one [4].

Emulation strategies are based on the premise that the best way to preserve the functionality and 'look-and-feel' of digital information objects is to preserve them together with their original software, so that they can be run on emulators that mimic the behaviour of obsolete hardware and operating systems. Emulation strategies would involve encapsulating a data object together with the application software used to create or interpret it and a description of the required hardware environment - i.e., a specification for an emulator. It is suggested that these emulator specifications will require human-readable annotations and explanations (metadata). Jeff Rothenberg says, for example, that the emulation approach requires "the development of an annotation scheme that can save ... explanations [of how to open an encapsulation] in a form that will remain human-readable, along with metadata which provide the historical, evidential and administrative context for preserving digital documents" [5].
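As a rough illustration of what such an encapsulation might contain, the Python sketch below bundles a data object with an identifier for its application software, a description of the hardware and operating system an emulator would need to mimic, and human-readable annotations. Rothenberg does not define a concrete schema, so every name here (`EmulatorSpec`, `Encapsulation`, `readme`) is a hypothetical illustration rather than part of his proposal.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical sketch: class and field names are illustrative only and are
# not taken from Rothenberg's proposal or any published specification.

@dataclass
class EmulatorSpec:
    """Description of the obsolete environment an emulator would need to mimic."""
    hardware_platform: str
    operating_system: str

@dataclass
class Encapsulation:
    """A data object bundled with what is needed to render it in future."""
    data_object: bytes            # the preserved bit stream
    application_software: str     # identifier for the original application
    emulator_spec: EmulatorSpec   # required hardware environment
    annotations: List[str] = field(default_factory=list)  # human-readable explanations

    def readme(self) -> str:
        """Render the bundle's description as plain text a future reader can follow."""
        lines = [
            f"Application: {self.application_software}",
            f"Hardware: {self.emulator_spec.hardware_platform}",
            f"Operating system: {self.emulator_spec.operating_system}",
        ]
        lines.extend(f"Note: {note}" for note in self.annotations)
        return "\n".join(lines)
```

Keeping the annotations as plain text reflects Rothenberg's point that the explanations must remain human-readable even if the software inside the bundle can no longer be run.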

Migration - the periodic migration of digital information from one generation of computer technology to a subsequent one - is currently the most tried-and-tested preservation strategy. However, as Seamus Ross points out, data migration inevitably leads to some losses in functionality, accuracy, integrity and usability [6]. In some contexts, these losses are likely to matter. David Bearman, for example, has pointed out that if electronic records are migrated to new software environments, "content, structure and context information must be linked to software functionality that preserves their executable connections" [7]. If this cannot be done, however, he suggests that "representations of their relations must enable humans to reconstruct the relations that pertained in the original software environment". Successful migration strategies will, therefore, depend upon metadata being created to record the migration history of a digital object and its contextual information, so that future users can either reconstruct or - at the very least - begin to understand the technological environment in which a particular digital object was created.
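A minimal Python sketch of the kind of migration history this implies follows. It assumes, purely for illustration, that each migration is logged as an event recording the date, source and target formats, the conversion tool used and any known losses, with a note on the original creation environment kept alongside. The names `MigrationEvent` and `DigitalObjectHistory` are hypothetical; no particular metadata standard is being implemented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Hypothetical sketch: field names are illustrative, not drawn from any standard.

@dataclass
class MigrationEvent:
    when: date
    source_format: str
    target_format: str
    tool: str                                              # conversion software used
    known_losses: List[str] = field(default_factory=list)  # e.g. lost formatting

@dataclass
class DigitalObjectHistory:
    identifier: str
    original_format: str
    creation_environment: str   # contextual note on the original technology
    migrations: List[MigrationEvent] = field(default_factory=list)

    def current_format(self) -> str:
        """Format after the most recent migration (or the original format)."""
        return self.migrations[-1].target_format if self.migrations else self.original_format

    def record_migration(self, event: MigrationEvent) -> None:
        """Append an event, refusing one that would break the chain of formats."""
        if event.source_format != self.current_format():
            raise ValueError(
                f"event starts from {event.source_format!r}, "
                f"but the object is currently {self.current_format()!r}"
            )
        self.migrations.append(event)

    def all_known_losses(self) -> List[str]:
        """Every loss recorded across the object's migration history, in order."""
        return [loss for event in self.migrations for loss in event.known_losses]
```

Rejecting an event whose source format does not match the object's current format is one simple way, assumed here, of keeping the recorded chain unbroken, so that a future user can trace an object back to its original environment.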

There is currently a debate about the relative merits of emulation and migration strategies. Rothenberg, for example, claims that migration has little to recommend it and calls it "an approach based on wishful thinking". He criticises the approach because he feels that it is impossible to predict exactly what will happen and because the approach is labour-intensive and expensive [8].

In the absence of any alternative, a migration strategy may be better than no strategy at all; however, to the extent that it provides merely the illusion of a solution, it may in some cases actually be worse than nothing. In the long run, migration promises to be expensive, error-prone, at most partially successful, and ultimately infeasible.

Bearman disputes this view, arguing that Rothenberg is mistaken because he assumes that what needs to be preserved is the information system itself, rather than what the system produces. For Bearman, preserving what the system produces means capturing "all transactions entering and leaving the system when they are created, ensuring that the original context of their creation and content is documented, and that the requirements of evidence are preserved over time". In any case, Bearman argues that the emulation approach is itself extremely complicated [9].

Rothenberg's proposal does not even try to define the elements of metadata specifications that would be required for the almost unimaginably complex task of emulating proprietary application software of another era, running on, and in conjunction with, application interface programs from numerous sources, on operating systems that are obsolete, and in hardware environments that are proprietary and obsolete.

The debate is likely to continue for at least as long as it takes to test the emulation approach. Some work on the use of emulation techniques for preservation is already being carried out, for example as part of the JISC/NSF-funded Cedars 2 project [10] and the European Union-funded NEDLIB project.

Regardless of whether emulation-based or migration-based preservation strategies are adopted - and it is likely that both will have some role - the long-term preservation of digital information will involve the creation and maintenance of metadata. Clifford Lynch describes the function of some of this metadata as follows [11]:

Within an archive, metadata accompanies and makes reference to each digital object and provides associated descriptive, structural, administrative, rights management, and other kinds of information. This metadata will also be maintained and will be migrated from format to format and standard to standard, independently of the base object it describes.
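Lynch's point that metadata is migrated independently of the object it describes can be sketched in a few lines of Python. The example assumes a plain dictionary per record, grouped by category, and a hypothetical 'crosswalk' that renames elements when moving from one metadata standard to another; the function names and the sample mapping are illustrative assumptions only.

```python
from typing import Dict

# Hypothetical sketch: a record groups elements by the kinds of information
# Lynch lists (descriptive, structural, administrative, rights management, ...).
MetadataRecord = Dict[str, Dict[str, str]]

def migrate_metadata(record: MetadataRecord, crosswalk: Dict[str, str]) -> MetadataRecord:
    """Return a new record with element names mapped through `crosswalk`.

    Elements without a crosswalk entry keep their existing names; the input
    record is not modified.
    """
    return {
        category: {crosswalk.get(name, name): value for name, value in elements.items()}
        for category, elements in record.items()
    }

def migrate_archive_metadata(archive: Dict[str, dict], crosswalk: Dict[str, str]) -> None:
    """Replace each entry's metadata with its migrated form.

    Only entry["metadata"] is reassigned; entry["object"] (the base digital
    object) is neither read nor changed.
    """
    for entry in archive.values():
        entry["metadata"] = migrate_metadata(entry["metadata"], crosswalk)
```

Passing a crosswalk such as `{"Title": "dc:title"}`, for instance, renames only the `Title` element in every record, while the stored bytes of each object stay exactly as they were.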

As a result, preservation metadata has become a popular area for research and development in the archive and library communities. Archivists and records managers have concentrated on the development of recordkeeping metadata, while other groups have defined metadata specifications for particular needs. For example, the library and information community has initiated some important work:

Archivists and "Recordkeeping metadata"

As has been mentioned, the archives and records management communities also have a keen interest in long-term digital preservation. Traditional approaches to the archival management of records and archives tended to be based on physical records being transferred into the physical custody of an archival repository at the end of their active life-cycle. The growing number of digital records, however, has prompted a widespread reassessment of archival theory and practice [15]. In the digital environment, for example, it is no longer sufficient for archivists to make decisions about the retention or disposal of records at the end of their active life; by that time it may be too late to ensure their preservation in any useful form. Greg O'Shea of the National Archives of Australia has commented that the ideal time for archivists' attention to be given to digital records "is as part of the systems development process at the point systems are being established or upgraded, i.e. even before the records are created" [16]. Some archivists, particularly in Australia, have begun to shift attention from the traditional 'life-cycle' approach to records and have started to develop archival management practices based on the concept of a 'records continuum'.

A continuum approach to records means that a major change in the understanding of archival description (or metadata) is required. Sue McKemmish and Dagmar Parer have summarised what this means [17].

If archival description is defined as the post-transfer process of establishing intellectual control over archival holdings by preparing descriptions of the records, then those descriptions essentially function as cataloguing records, surrogates whose primary purpose is to help researchers find relevant records. In the continuum, archival description is instead envisaged as part of a complex series of recordkeeping processes involving the attribution of authoritative metadata from the time of records creation.

This metadata is commonly known as 'recordkeeping metadata', or "any type of data that helps us manage records and make sense of their data content" [18]. McKemmish and Parer have defined the concept more formally as "standardised information about the identity, authenticity, content, structure, context and essential management requirements of records" [19].
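Because recordkeeping metadata is attributed from the moment of creation rather than added after transfer, a system might check at creation time that each facet named in McKemmish and Parer's definition has been captured. The Python sketch below models those six facets as optional fields; the class name and the `missing` helper are hypothetical, and the sketch does not reproduce the schema of any published recordkeeping metadata standard.

```python
from dataclasses import dataclass, fields
from typing import List, Optional

# Hypothetical sketch of McKemmish and Parer's facets; this is not the schema
# of any published recordkeeping metadata standard.

@dataclass
class RecordkeepingMetadata:
    identity: Optional[str] = None
    authenticity: Optional[str] = None
    content: Optional[str] = None
    structure: Optional[str] = None
    context: Optional[str] = None
    management_requirements: Optional[str] = None

    def missing(self) -> List[str]:
        """Names of facets not yet captured, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

A records system following the continuum approach might, for instance, decline to register a new record while `missing()` still returns a non-empty list.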

A variety of research projects and practically-based initiatives have been concerned with the development of recordkeeping metadata schemes and standards:

The OAIS Model

Most of the initiatives mentioned so far originated in the library and archives communities. Another important recent development with preservation metadata implications has been the drafting of a Reference Model for an Open Archival Information System (OAIS). The development of this model is being co-ordinated by the Consultative Committee for Space Data Systems (CCSDS) at the request of the International Organization for Standardization (ISO). The CCSDS is an organisation established by member space agencies to co-ordinate their information requirements. ISO requested that the CCSDS co-ordinate the development of standards in support of the long-term preservation of digital information obtained from observations of the terrestrial and space environments. The result, the Reference Model for an Open Archival Information System, is currently being reviewed as an ISO Draft International Standard.

The document defines a high-level reference model for an Open Archival Information System or OAIS, which is defined as an organisation of people and systems that have "accepted the responsibility to preserve information and make it available for a designated community" [29]. Although development of the model originated in and has been led by the space data community, the model is intended to be adoptable by other communities as well.

The OAIS model is not just concerned with metadata. It defines and provides a framework for a range of functions that are applicable to any archive - whether digital or not. These functions include those described within the OAIS documentation as ingest, archival storage, data management, administration and access. Amongst other things, the OAIS model aims to provide a common framework that can be used to help understand archival challenges and especially those that relate to digital information.

As part of this framework, the OAIS model identifies and distinguishes between the different types of information (or metadata) that will need to be exchanged and managed within an OAIS. Within the draft recommendation, these types of metadata are defined as part of what is called a Taxonomy of Information Object Classes [30]. Within this taxonomy, an Archival Information Package (AIP) is perceived as encapsulating two types of information: Content Information and any associated Preservation Description Information (PDI) that will allow the Content Information to be understood over an indefinite period of time. The Content Information is itself divided into a Data Object - which would typically be a sequence of bits - and the Representation Information that gives meaning to this sequence. Descriptive Information, which can form the basis of finding aids (and other services), can be based on the information stored as part of the PDI, but is logically distinct from it.
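The relationships in the taxonomy can be expressed as a small set of Python dataclasses. The class names follow the OAIS terms, but their layout, the use of a plain dictionary for the PDI, the assumed `"reference"` key and the `describe` helper are choices made for this sketch, not definitions taken from the draft recommendation.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical sketch: class names follow OAIS terms, but the structure is an
# illustrative assumption, not taken from the draft recommendation.

@dataclass
class ContentInformation:
    data_object: bytes               # typically just a sequence of bits
    representation_information: str  # what gives those bits meaning, e.g. a format description

@dataclass
class ArchivalInformationPackage:
    content: ContentInformation
    # Preservation Description Information, simplified here to a mapping.
    pdi: Dict[str, str] = field(default_factory=dict)

@dataclass
class DescriptiveInformation:
    """Finding-aid entry: may draw on the PDI but is held outside the AIP."""
    title: str
    identifier: str

def describe(aip: ArchivalInformationPackage, title: str) -> DescriptiveInformation:
    """Build descriptive information from the AIP's PDI (assumed 'reference' key)."""
    return DescriptiveInformation(title=title, identifier=aip.pdi.get("reference", "unknown"))
```

Keeping `DescriptiveInformation` outside `ArchivalInformationPackage` mirrors the point that descriptive information may be derived from the PDI while remaining logically distinct from it.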

The OAIS Taxonomy of Information Object Classes further subdivides the PDI into four groups. These are based on concepts described in the 1996 report Preserving Digital Information, produced by the Task Force on Archiving of Digital Information commissioned by the Commission on Preservation and Access (CPA) and the Research Libraries Group (RLG). The task force wrote that "in the digital environment, the features that determine information integrity and deserve special attention for archival purposes include the following: content, fixity, reference, provenance and context" [31]. Accordingly, with content itself represented by the Content Information, the OAIS taxonomy divides PDI into Reference Information, Context Information, Provenance Information and Fixity Information.
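Fixity Information is the most obviously computable of the four groups. The Python sketch below assumes, for illustration only, that fixity is recorded as a SHA-256 digest of the Data Object; that choice is an assumption of the sketch, not a requirement taken from the OAIS document. The other three groups are kept as free-text strings, and `make_pdi` and `fixity_ok` are hypothetical helper names.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical sketch: SHA-256 is an assumed choice of fixity method, not a
# requirement stated in the OAIS document; helper names are illustrative.

@dataclass
class PreservationDescriptionInformation:
    reference: str   # identifier(s) for the Content Information
    context: str     # relationships to its environment and other objects
    provenance: str  # history of origin and custody
    fixity: str      # hex SHA-256 digest of the Data Object

def make_pdi(data_object: bytes, reference: str, context: str,
             provenance: str) -> PreservationDescriptionInformation:
    """Create PDI, computing the fixity value from the Data Object's bits."""
    digest = hashlib.sha256(data_object).hexdigest()
    return PreservationDescriptionInformation(reference, context, provenance, digest)

def fixity_ok(data_object: bytes, pdi: PreservationDescriptionInformation) -> bool:
    """True if the Data Object still matches the digest recorded in its PDI."""
    return hashlib.sha256(data_object).hexdigest() == pdi.fixity
```

A matching digest shows only that the bit sequence is unchanged after storage or migration; on its own it does not show that the object can still be understood, which is why fixity sits alongside reference, context and provenance.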

There is, as yet, no clear understanding of how the Taxonomy of Information Object Classes defined in the OAIS model is meant to be implemented. It is possible, for example, that it could itself provide a basis for the development of a metadata schema.

Several European library-based projects have expressed an interest in implementing parts of the OAIS model, including its Taxonomy of Information Object Classes:

Conclusion

For a variety of reasons, this column has concentrated on identifying relevant projects and initiatives rather than on describing any of them in detail. Those with a deeper interest in the subject would profit from following up some of the hypertext links listed in the References section below.

References

  1. Day, M., 'Extending metadata for digital preservation.' Ariadne (Web version), 9, May 1997.
    http://www.ariadne.ac.uk/issue9/metadata/
  2. Dublin Core Metadata Initiative.
    http://purl.org/DC/
  3. Day, M., Metadata for preservation. CEDARS project document AIW01. Bath: UKOLN, UK Office for Library and Information Networking, 1998.
    http://www.ukoln.ac.uk/metadata/cedars/AIW01.html
  4. Ross, S., 'Consensus, communication and collaboration: fostering multidisciplinary co-operation in electronic records.' In: Proceedings of the DLM-Forum on Electronic Records, Brussels, 18-20 December 1996. INSAR: European Archives News, Supplement II. Luxembourg: Office for Official Publications of the European Communities, 1997, pp. 330-336; here p. 331.
  5. Rothenberg, J., Avoiding technological quicksand: finding a viable technical foundation for digital preservation. Washington, D.C.: Council on Library and Information Resources, 1999, p. 27.
    http://www.clir.org/pubs/reports/rothenberg/contents.html
  6. Ross, S., 'Consensus, communication and collaboration,' op. cit. p. 331.
  7. Bearman, D., Electronic evidence: strategies for managing records in contemporary organizations. Pittsburgh, Pa.: Archives and Museum Informatics, 1994, p. 302.
  8. Rothenberg, J., Avoiding technological quicksand, op. cit., pp. 13-16.
  9. Bearman, D., 'Reality and chimeras in the preservation of electronic records.' D-Lib Magazine, 5 (4), April 1999.
    http://www.dlib.org/dlib/april99/bearman/04bearman.html
  10. Cedars 2.
    http://www.leeds.ac.uk/cedars/cedars2/index.htm
  11. Lynch, C., 'Canonicalization: a fundamental tool to facilitate preservation and management of digital information.' D-Lib Magazine, 5 (9), September 1999.
    http://www.dlib.org/dlib/september99/09lynch.html
  12. RLG Working Group on Preservation Issues of Metadata, Final report. Mountain View, Calif.: Research Libraries Group, May 1998.
    http://www.rlg.org/preserv/presmeta.html
  13. Cameron, J. and Pearce, J., 'PANDORA at the crossroads: issues and future directions.' In: Sixth DELOS Workshop: Preservation of Digital Information, Tomar, Portugal, 17-19 June 1998. Le Chesnay: ERCIM, 1998, pp. 23-30.
    http://www.ercim.org/publication/ws-proceedings/DELOS6/index.html
  14. National Library of Australia, Request for Tender for the provision of a Digital Collection Management System. Attachment 2 - Logical data model. RFT 99/11. Canberra: National Library of Australia, 23 August 1999.
    http://www.nla.gov.au/dsp/rft/index.html
  15. Dollar, C.M., Archival theory and information technologies: the impact of information technologies on archival principles and methods. Macerata: University of Macerata Press, 1992.
  16. O'Shea, G., 'Keeping electronic records: issues and strategies.' Provenance, 1 (2), March 1996.
    http://www.netpac.com/provenance/vol1/no2/features/erecs1a.htm
  17. McKemmish, S. and Parer, D., 'Towards frameworks for standardising recordkeeping metadata.' Archives and Manuscripts, 26, 1998, pp. 24-45; here p. 39.
    http://www.sims.monash.edu.au/rcrg/publications/recordkeepingmetadata/smckrmp1.html
  18. McKemmish, S., Cunningham, A. and Parer, D., Metadata mania. Paper given at: Place, Interface and Cyberspace: Archives at the Edge, the 1998 Annual Conference of the Australian Society of Archivists, Fremantle, Western Australia, 6-8 August 1998.
    http://www.sims.monash.edu.au/rcrg/publications/recordkeepingmetadata/sm01.html
  19. McKemmish and Parer, op. cit., p. 38.
  20. Duff, W., 'Ensuring the preservation of reliable evidence: a research project funded by the NHPRC.' Archivaria, 42, 1996, pp. 28-45. See also the project's Web site at:
    http://www.lis.pitt.edu/~nhprc/
  21. Bearman, D. and Sochats, K., Metadata requirements for evidence. Pittsburgh, Pa.: University of Pittsburgh, School of Information Science, 1996.
    http://www.lis.pitt.edu/~nhprc/BACartic.html
  22. Duranti, L. and MacNeil, H., 'The protection of the integrity of electronic records: an overview of the UBC-MAS research project.' Archivaria, 42, 1996, pp. 46-67. See also the project's Web site at:
    http://www.slais.ubc.ca/users/duranti/intro.htm
  23. Duranti, L., 'Reliability and authenticity: the concepts and their implications.' Archivaria, 39, 1995, pp. 5-10.
  24. Reed, B., 'Metadata: core record or core business.' Archives and Manuscripts, 25 (2), 1997, pp. 218-241.
    http://www.sims.monash.edu.au/rcrg/publications/recordscontinuum/brep1.html
  25. National Archives of Australia, Recordkeeping metadata standard for commonwealth agencies, version 1.0. Canberra: National Archives of Australia, May 1999.
    http://www.naa.gov.au/govserv/techpub/rkms/intro.htm
  26. Acland, G., Cumming, K. and McKemmish, S., The end of the beginning: the SPIRT Recordkeeping Metadata Project. Paper given at: Archives at Risk: Accountability, Vulnerability and Credibility, the 1999 Annual Conference of the Australian Society of Archivists, Brisbane, Queensland, 29-31 July 1999.
    http://www.archivists.org.au/events/conf99/spirt.html
  27. Ibid. See also: McKemmish, S. and Acland, G., Accessing essential evidence on the Web: towards an Australian recordkeeping metadata standard. Paper given at: AusWeb99, the Fifth Australian World Wide Web Conference, Ballina, New South Wales, 17-20 April 1999.
    http://ausweb.scu.edu.au/aw99/papers/mckemmish/
  28. Rothenberg, J. and Bikson, T., Carrying authentic, understandable and usable digital records through time: report to the Dutch National Archives and Ministry of the Interior. The Hague: Rijksarchiefdienst, 6 August 1999.
    http://www.archief.nl/DigiDuur/index.html
  29. Consultative Committee for Space Data Systems, Reference model for an Open Archival Information System (OAIS), Red Book, Issue 1. CCSDS 650.0-R-1. Washington, D.C.: National Aeronautics and Space Administration, p. 1-11.
    http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html
  30. Ibid. pp. 4-21 - 4-27.
  31. Garrett, J. and Waters, D., (eds.), Preserving digital information: report of the Task Force on Archiving of Digital Information commissioned by the Commission on Preservation and Access and the Research Libraries Group. Washington, D.C.: Commission on Preservation and Access, 1996.
    http://www.rlg.org/ArchTF/
  32. Bearman, D. and Lytle, R.H., 'The power of the principle of provenance.' Archivaria, 21, 1985-86, pp. 14-27.
  33. Duranti, L., Eastwood, T. and MacNeil, H., The preservation of the integrity of electronic records. Vancouver: University of British Columbia, School of Library, Archival & Information Studies, 1996.
    http://www.slais.ubc.ca/users/duranti/intro.htm
  34. Lynch, C.A., 'Integrity issues in electronic publishing.' In: Peek, R.P. and Newby, G.B., (eds.), Scholarly publishing: the electronic frontier. Cambridge, Mass.: MIT Press, 1996, pp. 133-145.
  35. Werf-Davelaar, T. van der, 'Long-term preservation of electronic publications: the NEDLIB project.' D-Lib Magazine, 5 (9), September 1999.
    http://www.dlib.org/dlib/september99/vanderwerf/09vanderwerf.html
  36. Russell, K., 'The JISC Electronic Libraries Programme.' Computers and the Humanities, 32, 1998, pp. 353-375.
  37. Russell, K., 'CEDARS: long-term access and usability of digital resources: the digital preservation conundrum.' Ariadne, 18, December 1998.
    http://www.ariadne.ac.uk/issue18/cedars/
  38. Russell, K. and Sergeant, D., 'The Cedars project: implementing a model for distributed digital archives.' RLG DigiNews, 3 (3), 15 June 1999.
    http://www.rlg.org/preserv/diginews/diginews3-3.html

Acknowledgements

Cedars is a Consortium of University Research Libraries (CURL) project funded by the Joint Information Systems Committee (JISC) of the UK higher education funding councils through its Electronic Libraries Programme (eLib).

UKOLN is funded by the Library and Information Commission, the JISC, as well as by project funding from the JISC and the European Union. UKOLN also receives support from the University of Bath, where it is based.

Author details

Michael Day
Research Officer
UKOLN: the UK Office for Library and Information Networking
University of Bath
Bath BA2 7AY, UK
E-mail: m.day@ukoln.ac.uk