Web Magazine for Information Professionals

Metadata Corner: Working Meeting on Electronic Records Research

Michael Day reports from the Working Meeting on Electronic Records Research, held in Pittsburgh, Pennsylvania, May 29-31, 1997.

Archivists and records managers share an interest in the archival management and preservation of what are today known as electronic records. Recognition of important issues related to the archival management of electronic records dates back to the early 1970s, when archivists began to investigate the accessioning of what were then known as machine-readable data files. It has long been recognised that the archival community and the library community have shared concerns in this area, and this was demonstrated by the recently published report of a US Task Force on Archiving of Digital Information commissioned by the Commission on Preservation and Access and the Research Libraries Group [1]. These shared concerns mean that other information professionals, including librarians, information scientists and computing scientists, will have a potential interest in the archival community’s response to electronic recordkeeping.

The Working Meeting on Electronic Records Research was organised by the Pittsburgh-based Archives & Museum Informatics [2] and sponsored by the Centre for Electronic Recordkeeping & Archival Research (CERAR) at the University of Pittsburgh [3]. There were around fifty invited participants and, as it was a working meeting, virtually all of them were at some point called on either to present papers or to lead break-out group sessions. The meeting was held at the Embassy Suites Hotel, close to Pittsburgh International Airport but twelve miles from the downtown area. This isolation was intentional, as it left participants with fewer potential distractions. Half of the participants came from the United States, the remainder representing Canadian, Australian or European organisations. The intention of the Working Meeting was to identify areas for future research and implementation.

David Bearman of Archives & Museum Informatics introduced the meeting with a brief contextual paper describing the previous ten years of electronic records research and practice. In 1987 the archival profession’s interest was largely focused on appraisal techniques and on media longevity issues. Over the next ten years, further technological development combined with an added emphasis on functional requirements led to a significant change in focus. Interest in media longevity and ‘refreshing’ techniques, for example, has developed into a concern with data migration in an environment of software-dependence. Bearman briefly described the various international meetings and conferences which had taken place over the last ten years and then identified the five general subjects which were to be discussed throughout the rest of the meeting.

The purpose of the meeting was to identify a few unresolved issues and to suggest desirable research methodologies and sensible research outcomes for them. Each of the five sessions started with two or more short presentations outlining current research outcomes or experiences with electronic records, which were intended to identify relevant open issues. These issues were then taken up for discussion by the six break-out groups, which reported back to the whole meeting at the end of the session.

This report will not attempt to describe the proceedings of the meeting in detail but will pick up on particular themes and, it is hoped, demonstrate some shared concerns between the archives and records management professions, on the one hand, and the library and information professions, on the other.

Electronic records

In the same way that the coming of the digital library has prompted a certain amount of reassessment of the role of libraries and information workers, the electronic record has prompted some discussion of the way archivists do their work. Much of the debate has been concerned with defining what exactly electronic records are and how they should be dealt with. Two North American projects have investigated different aspects of this problem and representatives from both gave short papers at the meeting. No attempt will be made to outline the projects here as descriptions are available elsewhere [4] [5].

Wendy Duff (University of Toronto) and Richard J. Cox (University of Pittsburgh) represented the University of Pittsburgh Electronic Records Project [6]. Their presentations at the Working Meeting elaborated on the concept of “literary warrant”, which can be defined as the mandate from outside the archives profession - from law, professional best-practice and other social sources - which requires the creation and maintenance of records. It is thought that the concept of warrant might be helpful in fostering the understanding of records within an organisation and might, in addition, provide the authority necessary for records professionals to perform their important role within it.

The other project looking at this general area concerned “The Preservation of the Integrity of Electronic Records” and was based at the University of British Columbia (UBC). The methodological approach of the UBC project was to determine whether the general premises about the nature of records in diplomatics and archival science were relevant and useful in an electronic environment. Diplomatics has been defined as a body of concepts and methods, dating from the seventeenth and eighteenth centuries, with the purpose of “proving the reliability and authenticity of documents” [7]. At the Working Meeting, Maria Guercio (Ufficio Centrale Beni Archivistici, Rome) and Luciana Duranti (University of British Columbia) outlined the contribution that they felt archival science and diplomatics could make to a better understanding of electronic records.

The UBC project has adopted the concepts of reliability and authenticity from diplomatics. Duranti has defined both of these terms as follows [8]:

Reliability is, therefore, something exclusively linked to record creation while authenticity, according to Duranti and Heather MacNeil, is linked to “the record’s mode, form, and state of transmission, and to the manner of its preservation and custody, and [ensures that it] is protected and guaranteed through the adoption of methods that ensure that the record is not manipulated, altered, or otherwise falsified after its creation …” [9]. This type of terminology might be useful when discussing, for instance, the long-term preservation of electronic journals. A future reader of an article in an electronic journal will want to know about the journal’s provenance and whether the article has been peer-reviewed (its reliability), and also whether the article being read has been deliberately or accidentally changed since it was first published (its authenticity). Systems that records professionals devise to maintain these qualities over time may, therefore, be of interest to information professionals concerned with digital preservation. Authenticity, indeed, is a key component of Peter Graham’s description of “intellectual preservation” [10].

The UBC project has also been concerned with preserving the concept of “archival bond” in electronic records. Archival bond refers to what Duranti and MacNeil call “the link that every record has with the previous and subsequent one in the conceptual net of relationships among the records produced in the course of the same activity” [11]. It is interesting to speculate whether links in a hypertext database or Web page pose the same type of conceptual problem.

The UBC project elaborated the idea of a “record profile” which would contain “all the elements of intellectual form necessary to identify uniquely a record and place it in relation to other records belonging in the same aggregation” [12]. The record profile is therefore essentially a type of annotation (or metadata) which would be linked to the record for its lifetime. The identification of relevant metadata and its capture was also a major preoccupation of the Pittsburgh project, and it is this subject that will be dealt with next.

Metadata

For several years some archivists have been advocating a metadata approach to the archival management of electronic records [13] [14]. This has, in part, built on an awareness of the inadequacies of traditional archival description when applied to records in the electronic era [15]. Margaret Hedstrom has commented that the “types of information needed to describe electronic records will differ from, and may exceed, that needed to describe records in paper formats” [16]. The description would have to include sufficient information about the record to enable identification, access, understanding of its provenance, interpretation, authenticity and management over time. The Pittsburgh project built up a specification of the attributes of evidentiality, and this provided a foundation for the identification of a specific set of metadata [17].

Building on this, David Bearman’s short paper at the Working Meeting on “Research issues in metadata” raised many important issues. Just a few will be outlined here:

Metadata linkage: how can the relevant metadata be securely linked to the record content itself over time?

Metadata semantics: Bearman commented that records metadata “must be semantically homogenous” but that it was also desirable that it should be “syntactically heterogeneous”.

Structural metadata and migration: in what way can metadata about the structure of a record ensure “least-loss” migration of evidence over time?
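The linkage question can be pictured with a small sketch. This is an illustration only and not something presented at the meeting: it assumes that a cryptographic digest (SHA-256 is an arbitrary choice here) is used to bind a metadata description to the record content, so that a later change to either can be detected. The function names `seal_record` and `verify_record` and the metadata fields are hypothetical.

```python
import hashlib
import json


def seal_record(content: bytes, metadata: dict) -> dict:
    """Return an envelope binding metadata to content via SHA-256 digests."""
    content_digest = hashlib.sha256(content).hexdigest()
    # Canonical JSON (sorted keys) so identical metadata always hashes identically.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    seal = hashlib.sha256((content_digest + canonical).encode("utf-8")).hexdigest()
    return {"metadata": metadata, "content_digest": content_digest, "seal": seal}


def verify_record(content: bytes, envelope: dict) -> bool:
    """Check that neither the content nor the metadata changed since sealing."""
    expected = seal_record(content, envelope["metadata"])
    return (expected["content_digest"] == envelope["content_digest"]
            and expected["seal"] == envelope["seal"])


# Hypothetical record and metadata, for illustration only.
envelope = seal_record(b"Minutes of the board meeting",
                       {"creator": "Records Office", "date": "1997-05-29"})
print(verify_record(b"Minutes of the board meeting", envelope))  # True
print(verify_record(b"Minutes of the bored meeting", envelope))  # False
```

A digest of this kind only detects change. Guarding against deliberate falsification would also require the seal to be held somewhere the record's custodian cannot alter it, which is an organisational question as much as a technical one.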

The discussion following Bearman’s paper indicated that there was a need for what one working group described as “generic records metadata standards”, and for further analysis of metadata attributes and semantics. Interest was shown from some quarters in the Dublin Core Metadata Element Set (DC) [18], and it was suggested that research could be carried out into the minimum elements which would need to be added to DC to make it useful in the records context. It was also recognised that resolution of many of these (and other) problems depended upon intelligent implementation in test environments, and this seemed to be the immediate way forward.

Migration and software-dependence issues

Bearman, in his introduction to the workshop, noted that archivists’ concerns had moved from media longevity ten years ago to an awareness of the need for ongoing data migration from one medium or software format to another. Margaret Hedstrom (University of Michigan), in her paper on “Migration and long-term preservation”, pointed out that migration had recently replaced the concept of digital refreshing. The Task Force on the Archiving of Digital Information, of which Hedstrom was a member, defined migration as “the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation” [19]. The concept of data migration has evolved as a response to two related problems:

Avra Michelson and Jeff Rothenberg have defined software-dependent records as “electronic documents that can be read only by using a particular piece of computer software” [20], and - defined in this way - all electronic documents are software-dependent, even if some are stored in what are currently considered to be relatively simple formats like ASCII. Unfortunately, all software formats will at some point become obsolete, as will the hardware platforms for which they are designed.

Migration, in its widest sense, might also mean something more than periodic file conversion, and might include, for example: transfer to a human-readable medium like paper or microfilm; the use of software-independent formats; the creation of surrogates; and possibly the development of systems capable of emulating obsolete software and associated data [21]. These options need further research and it is unlikely that any single approach will be suitable for application to all types of electronic records.

The Working Meeting identified several areas of potential interest for research:

Defining acceptable data loss. Data migration is and will be a complex procedure, and is likely to result in some degree of data loss or degradation. What level of loss or degradation would be acceptable?

Documentation of the migration process. Subsequent users of records will need to determine which characteristics of a document were lost in each format conversion, the reasoning behind the migration strategy chosen and the authority responsible for implementing it.

The development of migration agents. “Self-migrating” records managed by artificial agents might be a long term goal, but any feasible system will have to be designed in collaboration with software engineers.

Cost models. Some research needs to be done into cost models for the different approaches to migration.

Professional collaboration. Hedstrom asked whether the requirements for the long-term preservation of electronic records are fundamentally different from the requirements for the preservation of other types of digital information. If not - and this is itself an area for legitimate research - what sort of collaborative activity would be appropriate, and with whom?
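The item on documentation of the migration process can be made concrete with a small sketch of a migration log kept alongside a record. It is an illustration under stated assumptions rather than a design proposed at the meeting: the class `MigrationEvent`, the function `cumulative_losses`, all field names and the sample entries are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MigrationEvent:
    """One format conversion applied to a record (all field names hypothetical)."""
    date: str            # when the migration was carried out
    source_format: str
    target_format: str
    rationale: str       # reasoning behind the migration strategy chosen
    authority: str       # who was responsible for implementing it
    losses: List[str] = field(default_factory=list)  # characteristics lost


def cumulative_losses(history: List[MigrationEvent]) -> List[str]:
    """List every characteristic lost across the history, in order, without repeats."""
    seen: List[str] = []
    for event in history:
        for loss in event.losses:
            if loss not in seen:
                seen.append(loss)
    return seen


# Invented sample history, for illustration only.
history = [
    MigrationEvent("1997-06-01", "word-processor file", "plain text",
                   "source software obsolete", "Records Office",
                   losses=["fonts", "page layout"]),
    MigrationEvent("2002-06-01", "plain text", "structured markup",
                   "repository standardisation", "Records Office"),
]
print(cumulative_losses(history))  # ['fonts', 'page layout']
```

Keeping the rationale and the responsible authority in each entry lets a later user judge not only what was lost but whether the loss was a considered decision.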

Conclusions

With a working meeting held over three days, a conference report like this cannot attempt to cover more than a fraction of the topics raised for discussion; for example, there has been no mention of involvement in developing international standards or of the importance of risk management [22]. The last afternoon provided a forum for defining consensus and, at the risk of distortion, here is a personal interpretation of the main themes of the meeting.

The Working Meeting on Electronic Records Research was a valuable chance for a diverse group of specialists to get together and discuss important issues in a specific context. One surprise was that there was no significant discussion of electronic records and the World Wide Web during the meeting. This is a subject that will have to be addressed in the near future.

Acknowledgements

I would like to thank David Bearman and Richard J. Cox for their invitation to the Working Meeting on Electronic Records Research.

This visit was supported by grants from the UK Electronic Libraries Programme (eLib).

References

[1] Task Force on Archiving of Digital Information, Preserving digital information. Commissioned by: the Commission on Preservation and Access and the Research Libraries Group. Washington, D.C.: Commission on Preservation and Access, 1 May 1996,
http://www.rlg.org/ArchTF/

[2] Archives & Museum Informatics, Pittsburgh, PA,
http://www.archimuse.com/

[3] University of Pittsburgh, Centre for Electronic Recordkeeping and Archival Research,
http://www.lis.pitt.edu/~cerar/

[4] University of Pittsburgh, School of Information Sciences. Functional requirements for evidence in recordkeeping,
http://www.lis.pitt.edu/~nhprc/

[5] University of British Columbia, School of Library, Archival and Information Studies. The preservation of the integrity of electronic records,
http://www.slais.ubc.ca/users/duranti/intro.htm

[6] Duff, W. Ensuring the preservation of reliable evidence: a research project funded by the NHPRC. Archivaria, 42, Fall 1996, 28-45.

[7] Duranti, L. and MacNeil, H. The protection of the integrity of electronic records: an overview of the UBC-MAS Research Project. Archivaria, 42, Fall 1996, 46-67, p. 47.

[8] Duranti, L. Reliability and authenticity: the concepts and their implications. Archivaria, 39, Spring 1995, 5-10.

[9] Duranti and MacNeil. The protection of the integrity of electronic records, Archivaria, 42, Fall 1996, 46-67, p. 56.

[10] Graham, P.G. Intellectual preservation: electronic preservation of the third kind. Washington, D.C.: Commission on Preservation and Access, March 1994,
http://www-cpa.stanford.edu/cpa/reports/graham/intpres.html

[11] Duranti and MacNeil. The protection of the integrity of electronic records, Archivaria, 42, Fall 1996, 46-67, p. 53.

[12] Duranti and MacNeil. The protection of the integrity of electronic records, Archivaria, 42, Fall 1996, 46-67, p. 51.

[13] Wallace, D. Metadata and the archival management of electronic records: a review. Archivaria, 36, Autumn 1993, 87-110.

[14] Wallace, D. Managing the present: metadata as archival description. Archivaria, 39, Spring 1995, 11-21.

[15] Bearman, D. Documenting documentation. Archivaria, 34, Summer 1992, 33-49.

[16] Hedstrom, M. Descriptive practices for electronic records: deciding what is essential and imagining what is possible. Archivaria, 36, Autumn 1993, 53-63, p. 55.

[17] Bearman, D. and Sochats, K. Metadata requirements for evidence. 1996,
http://www.lis.pitt.edu/~nhprc/BACartic.html

[18] The Dublin Core Metadata Element Set home page,
http://purl.org/metadata/dublin_core

[19] Task Force on Archiving of Digital Information, Preserving digital information. Washington, D.C.: Commission on Preservation and Access, May 1996, p. 6.

[20] Michelson, A. and Rothenberg, J. Scholarly communication and information technology: exploring the impact of changes in the research process on archives. American Archivist, 55, Spring 1992, 236-315, p. 298.

[21] Rothenberg, J. Ensuring the longevity of digital documents. Scientific American, 272 (1), January 1995, 24-29.

[22] Bearman, D. Archival data management to achieve organizational accountability for electronic records. Archives and Manuscripts, 21 (1), May 1993, 14-28.

 

Author Details

Michael Day,
Metadata Research Officer,
UKOLN
Email: M.Day@ukoln.ac.uk
Tel: 01225 323923
Fax: 01225 826838