Web Magazine for Information Professionals

Migration: A CAMiLEON Discussion Paper

Paul Wheatley explores migration issues for the long-term preservation of digital materials.

Aims

This paper is intended to continue the debate on the different uses of migration for the long-term preservation of digital materials. This discussion will hopefully form the basis of future comparisons between migration and emulation as part of the CAMiLEON project’s investigation of emulation as a digital preservation strategy (A look at some of the practical aspects of an emulation preservation strategy can be found in “Emulation, Preservation and Abstraction” [1] by David Holdsworth and Paul Wheatley).

To this end there are three key aims:

This paper provides no easy answers and there is still a long way to go. Hopefully it does take us at least one step nearer to the confident use of migration as a viable tool for long term digital preservation.

What is migration?

Migration of digital materials in the traditional sense has typically only dealt with relatively simple digital objects that are converted in a series of steps to current platforms. Little research has been conducted as to its use beyond this level. Access to word processor files, statistical data files and bitmap image files has been maintained by migrating from one format to another, often by the import facilities of current application software. With more complex digital objects it is not immediately clear how these would be migrated or what migration in this context actually means. Furthermore, is migration at all useful in this field or is emulation always a better strategy for preserving these more complex objects?

The CPA/RLG report [2] provides a useful and broad definition of migration as “…a set of organised tasks designed to achieve the periodic transfer of digital materials from one hardware/software configuration to another, or from one generation of computer technology to a subsequent generation.” This paper will attempt to describe what these different tasks could be in a practical sense and discuss their relative usefulness in different contexts. Implementations of some or all of these migrations may provide comparisons for the user testing of these resources in an emulated environment (The CAMiLEON [3] project will be performing user testing of various digital resources both in their original environments and in emulated environments). This paper will also introduce the concept of recreation as an alternative digital preservation strategy that is related to migration.

Migration, the story so far

The influential Open Archival Information System (OAIS) reference model [4] breaks down migration into four categories: refreshment, replication, repackaging and transformation. This is useful, but we can certainly go much further. In particular, refreshment, replication and repackaging are distinct processes in themselves, which are more appropriately associated with the general management of an archive. These processes merely ensure that we maintain a reliable copy of the bytestream of our digital object (refreshment) or a manageable package within an OAIS archive (replication and repackaging). It seems sensible not to group these under the overall moniker of migration. OAIS transformation actually modifies the bytestream of a digital object (so that it can be interpreted on a current computer platform) and this is what will specifically be considered as the process of migration within this paper.

OAIS goes further in breaking down this migration into two more categories: reversible migration and non-reversible migration. This is a crucial distinction. The only way to guarantee that no information is lost when a migration is performed is the litmus test of a backwards migration that recreates the original object precisely.
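The litmus test described above can be sketched in a few lines of Python. This is only an illustration: the function name `is_reversible` and the toy converters (`to_hex`, `from_hex`, `strip_high_bytes`) are hypothetical and do not come from OAIS or from any real migration tool.

```python
from typing import Callable

Converter = Callable[[bytes], bytes]


def is_reversible(original: bytes, forward: Converter, backward: Converter) -> bool:
    """Return True if forward then backward migration recreates the original bytes."""
    return backward(forward(original)) == original


# Hypothetical converters, used only to exercise the test.
def to_hex(data: bytes) -> bytes:
    # Lossless: every byte becomes two ASCII hex digits.
    return data.hex().encode("ascii")


def from_hex(data: bytes) -> bytes:
    return bytes.fromhex(data.decode("ascii"))


def strip_high_bytes(data: bytes) -> bytes:
    # Lossy: bytes outside the 7-bit ASCII range are discarded.
    return bytes(b for b in data if b < 0x80)


def identity(data: bytes) -> bytes:
    return data


sample = b"Hello\x80\x0dWorld"
lossless = is_reversible(sample, to_hex, from_hex)
lossy = is_reversible(sample, strip_high_bytes, identity)
```

Passing the test for one object shows only that this particular object survived the round trip. A tool would need to pass it for every object in a collection before the migration could be treated as reversible for that collection.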

Charles Dollar [5] begins to break down the overall task of preservation into different categories and makes some important distinctions between migration and related processes (see “Authentic Electronic Records: Strategies for Long-Term Access” by Charles Dollar). He is keen to highlight the differences between maintaining the “processibility” of digital objects and the function of actual migration. The former relies on import/export facilities of current software to convert digital objects to a current format, and the latter is a far more involved process of performing a migration to a current format without these existing application aids and perhaps with little knowledge of the formats and systems involved. However, a third and related strategy, which needs to be highlighted, is the Cedars Project’s cautious but sensible take on migration. This aims to preserve both the original bytestream and a tool for understanding or interpreting the original object. This is discussed in more detail under section 3.5.

Dollar identifies maintaining the authenticity of a digital object as a key aim in the migration process. He suggests that migrating objects may result in some loss of their original properties. However, we have to be far more pessimistic and accept that migration will always result in some losses of this kind. This makes it essential to identify these side effects in order that the best preservation method is chosen for each circumstance and that the implications of any migration action that is taken are fully understood.

“Risk Management of Digital Information: A File Format Investigation” [6] by Lawrence et al. attempts to quantify the risks involved in the use of migration and analyses several commercially available migration tools for their relative accuracy. This practical investigation has led to the identification of some key requirements for migration software:

These requirements are equally applicable for tools used as part of the three migration strategies identified above, but few existing tools come close to meeting all of them.

Dollar envisions a diminished role for migration in the future as the producers of digital objects move towards standard or open formats and the application software which renders proprietary format objects provides built in export and import features which can be relied on to perform format “conversion”. Unfortunately it is not in the interest of software developers to rely completely on open formats like RTF and there is little interest from publishers in providing digital objects in any form other than those desired by current users. The trend of increasing complexity in ‘published’ digital objects also looks set to continue. With these driving forces working against the common sense voice of the preservation community it seems rather optimistic to expect that our preservation challenge will become easier with time. To this end we must continue to be sceptical of the longevity of current format ‘standards’ (and platforms) and develop suitable migration or emulation strategies, before it is too late to preserve much of our digital heritage.

The difficulties inherent in the understanding and interpretation of data formats, whose design has been influenced by commercial software developers, are illustrated all too well by Lawrence et al. The report reveals that even when format specifications are publicly available they are often incomplete. Furthermore a substantial component of file format specifications often consists of non-standard elements which “provide the competitive edge for third-party software and rarely are openly circulated.”

Who are we preserving for?

This is a broad look at preserving a range of digital objects and no assumptions about who the materials are being preserved for have been made. A software historian will obviously have very different requirements to those of a sociologist solely interested in the raw intellectual content of an object. Each digital object has a range of significant properties (see “A blueprint for Representation Information in the OAIS model” [7] by David Holdsworth and Derek Sergeant) that we may be interested in preserving, but few preservation methods retain all of them. Performing migration in order to preserve is, to a certain extent, always going to be a contradiction. By changing the form or even the structure of a digital object the final output will not be a strict preservation of the original. In many cases this is acceptable but it is important to consider very carefully and understand fully the potential loss of content and structure when using migration as a preservation strategy. The only kind of migration which does not pose a significant risk of loss is a reversible migration.

A useful series of questions that should be asked before preserving a digital object might be:

It should be noted that assumptions have not been made as to any particular starting point from which a preservation action may take place. Access to the object running on the original platform may be available in some cases. Alternatively future historians may be engaged in ‘digital archaeology’ and have no information or experience of a certain object’s format, never mind access to its original platform and running environment.

BBC materials

The BBC Micro platform [8] was chosen as a source for the test materials for several key reasons. Primarily it offers a source of material that is at risk of loss and at the same time contains some valuable intellectual content that is worth preserving. A wide range of educational material was produced for the BBC Micro platform, much of which is in danger of being lost. Examples include a range of software produced for the Microelectronics Education Programme (MEP) and the BBC Domesday Project, which collected together two laser discs’ worth of social and geographical material on the UK. A comprehensive description of the Domesday Project, by one of the original project team, can be found online [9].

Working with material that is not current places us in the role of the archivist working to preserve materials that are becoming obsolete, rather than simply performing arbitrary and unrealistic tests on current materials.

The BBC materials were selected to provide a cross section of data types and complexity across a fairly small number of resources. All these resources and their respective software applications (where applicable) are available for download on the Internet for use on the original hardware, under emulation or as the source for a migration of some kind. Appendix A provides a brief description of these applications. Obviously these examples are not directly analogous to every instance where migration could be used, but they do offer a starting point for this bottom up development and discussion of migration strategies.

Terminology

It is abundantly clear that many migration terms (namely conversion, translation, transformation and indeed migration itself) have taken on meanings from several disciplines, leading to much potential for confusion. In the following discussions an attempt has been made to avoid misleading terms and to use new, more explicit ones where appropriate.

Preservation processes

The following represents an attempt to break down migration as a generalised term into the different practical migration tasks or methods available for digital preservation. I will then describe the outputs of these preservation methods when applied to our test materials in the table below. The final section of the paper will discuss these options, their implications and their usefulness (or otherwise) in different situations.

Minimum preservation

Minimum preservation refers to preserving a copy of the bytestreams that make up the original object (a copy of the original bytestream should always be retained in addition to any other migrations that modify the original).
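As a minimal sketch of what retaining a reliable copy of a bytestream could involve, the following Python copies a file into an archive directory and confirms that the copy is byte-for-byte identical by comparing digests. The use of SHA-256 checksums is an assumption added for illustration and is not described in this paper. The function names, the file name `LETTER` and its contents are likewise hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def preserve_bytestream(source: Path, archive_dir: Path) -> str:
    """Copy source into archive_dir unchanged and return its verified digest."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / source.name
    shutil.copyfile(source, target)
    original_digest = sha256_of(source)
    if sha256_of(target) != original_digest:
        raise IOError(f"copy of {source} does not match the original")
    return original_digest


# Illustrative run on an invented file in a temporary directory.
with tempfile.TemporaryDirectory() as work:
    source = Path(work) / "LETTER"
    source.write_bytes(b"\x80H1Dear Sir\r")
    digest = preserve_bytestream(source, Path(work) / "archive")
    archived = (Path(work) / "archive" / "LETTER").read_bytes()
```

Recording the digest alongside the archived copy would allow later checks that the bytestream has not changed.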

Minimum migration

This method represents fairly trivial migration tasks that require very little technical work. This is a simple way of improving a minimum preservation for human viewing. These tasks could be performed by hand or automated using a software tool as a simple example of automatic conversion migration (see below). A possible example of a minimum migration could be a word processor file that is stripped of all but the common ASCII characters. This process would naturally remove all formatting and structure from the document, but would be a very simple, cheap and easy way of gaining access to the raw text of the document.
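The word processor example above might be automated along the following lines. This is a sketch under stated assumptions: the function `strip_to_ascii` is hypothetical, the sample bytes are invented rather than taken from a real file, and treating a carriage return (0x0D) as the end of a line is an assumption about the source format.

```python
def strip_to_ascii(data: bytes) -> str:
    """Keep printable ASCII, tabs and line feeds; drop every other byte."""
    kept = []
    for byte in data:
        if byte == 0x0D:
            # Assumption: the source format marks line ends with a carriage return.
            kept.append("\n")
        elif byte in (0x09, 0x0A) or 0x20 <= byte <= 0x7E:
            kept.append(chr(byte))
        # Control codes and bytes above 0x7E are silently discarded.
    return "".join(kept)


# Invented sample: readable text mixed with a few control codes.
sample = b"\x80Dear Sir,\rThank you\x1b\x00 for your letter.\r"
text = strip_to_ascii(sample)
```

If a format stores its formatting commands as printable characters, those characters would survive this process and would still need removing by hand.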

Preservation Migration

The three forms of preservation migration defined below combine the most basic form of access to the intellectual content with a non-technical way of preserving some of the look and feel of the original digital object. Dollar notes that “In some instances where it is technically possible to preserve all the functionality and integrity associated with electronic records during migration, the costs of doing so may exceed available resources.” A preservation migration is an example of a possible strategy in this situation. Obviously the additional information recorded will create a new preservation issue in itself, so the data formats used should be considered carefully, with an eye to longevity.

Recreation

A recreation is the re-coding of a digital object by hand. With a document this could mean re-typing the text in a current application and adding formatting to match the original. At the other end of the scale, a complex software object could be re-coded on a current platform from scratch.

Human conversion migration

A human conversion migration uses exactly the same processes of reproducing the function of the object, but some element of the original object (usually the data rather than the software) will be incorporated in the final migrated object. An example would be a human conversion migration of the BBC Domesday [9] object which combined the original (or automatically conversion migrated versions of the original) image and textual data with a newly recreated or re-coded software front end.

Automatic conversion migration

An automatic conversion migration uses a software tool to interpret and modify a digital object into a new form. A typical example would be to take a word processor file from the BBC and output the object as a Word98 document. This is a good example of a traditional view of migration.

The following table describes the practical output of these different methods of preservation on the BBC example resources. Note that I have not considered emulation strategies to be within the scope of this document. For an in depth technical look at emulation, see “Emulation, Preservation and Abstraction” [1] by David Holdsworth and Paul Wheatley.

| BBC Resource | View document | ViewStore document | Human Digestive System | Chuckie Egg |
| --- | --- | --- | --- | --- |
| Resource Description | Document for use with typical BBC word processor. | Document for use with typical BBC database program. | Interactive multimedia program with sound and animations. | Classic ‘platform’ arcade game. |
| Minimum preservation | Copy of document’s bytestream. Can be examined in ASCII/hex viewer. | Copy of the bytestream of all files making up the database. | Copy of the bytestreams of the files that make up the program. BASIC files can be viewed in tokenised BASIC viewer. | Copy of the bytestreams of the files that make up the program. BASIC files can be viewed in tokenised BASIC viewer. |
| Minimum migration | Edited/processed copy of the document as a text file. Control codes removed in ASCII editor. Remaining file can be viewed in text viewer. | Textual dump of spreadsheet view of database. Data is then viewed in text viewer. | Screendumps converted to GIF files for use in graphics viewer. Tokenised BASIC converted to raw text to view in text viewer. | Level data cut and pasted from program code to view in a Hex viewer. Tokenised BASIC converted to raw text to view in text viewer. |
| Basic preservation migration | As above with screenshots (either screendumps grabbed with a utility on the BBC and then converted to a current graphics format (e.g. GIF) or graphics files digitised from the BBC’s video output) of several views of the document in the word processor. | As above with screenshots of key moments in the use of the program (e.g. Each table view). | As above with screenshots of key moments in the use of the program (e.g. Each diagram). | As above with screenshots of key moments in the use of the program (e.g. Loading screens, and several in game views). |
| Annotated preservation migration | As above with annotations to the screenshots. | As above with annotations to the screenshots. | As above with annotations to the screenshots. | As above with annotations to the screenshots. |
| Complex preservation migration | As above with video sequences of the word processor while viewing the document. Video clips accompanied by textual descriptions and explanations of program processes. | As above with video sequences of the database while viewing the data. Video clips accompanied by textual descriptions and explanations of program processes. | As above with video sequences of key moments in the use of the program (e.g. Each digestion animation). Video clips accompanied by textual descriptions and explanations of program processes. | As above with video sequences of key moments in the use of the program (e.g. Playing through a level of the game, losing a life, etc). Video clips accompanied by textual descriptions and explanations of game processes. |
| Recreation | Document is “re-keyed” into a current word processor, based on a print out or display of the original document in View. | Database data is re-keyed and database structure is reproduced in a current database application. | Program software is re-coded on a current platform in a current computer language, based on appearance and function of original. | Program software is re-coded on a current platform in a current computer language, based on appearance and function of original. |
| Human conversion migration | - | - | As above, but the text and image data from the original object is incorporated into the migrated object. | As above but the level data is incorporated into the migrated object. |
| Automatic conversion migration | Format conversion program transforms the document from View format to a current word processor format. | Format conversion program transforms the tables, data and structure from ViewStore format to a current database format. | Original program code is converted line by line to a current language. This is combined with program data and graphics to produce a working conversion of the original. | Original program code is converted line by line to a current language. This is combined with program data and graphics to produce a working conversion of the original. |

Discussion of migration possibilities

Minimum migration

The minimum migration method collects together methods of migration that require a minimum of technical work. The raw bytestreams are hand edited in hex or text editors to produce a clear human understandable text output. In the case of the View files, a look at the raw bytestream in an editor reveals readable chunks of ASCII text with some control codes which can be stripped away by hand or a simple migration tool (perhaps even the search and replace features of modern text editors). This could provide a better representation of one of the significant properties (i.e. the intellectual content of the raw textual data) of word processor files that do not rely on a large amount of formatting. The advantage of this method is its simplicity and ease of execution in comparison to an automatic conversion migration or complete re-keying of the data. With the ViewStore database, this method also seems to make some sense where cost is an issue. Saving a text dump of the database view loses all of the database functionality, but it does give a cheap and reliable way of getting to a substantial amount of the intellectual content. As long as the original bytestream is maintained along with any relevant documentation (in this case the valuable original ViewStore manual which describes the ViewStore file format in detail) for the possibility of better preservation work in the future, this migration method could be a useful stop gap measure.

In the case of the Human Digestive System object, graphical data is converted to a current graphic format. It could be argued that this is a more complex operation that should come under the banner of an automatic conversion migration. However, at this level I’ve considered the conversion between bitmap graphics formats as a relatively trivial task and so have included it here.

Obviously with regard to migration costs this is not very technical work, so unit costs will be small but the overall cost for a large number of files could be prohibitive. If an automated approach is taken where a simple conversion tool is produced to perform the migrations, there will be a single initial cost after which unit costs will be low.
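The trade-off between hand editing and building a conversion tool can be made concrete with a simple break-even calculation. The sketch below assumes a linear cost model, which is a simplification introduced here for illustration. The function names and all of the figures are hypothetical.

```python
import math


def manual_cost(files: int, unit_cost: float) -> float:
    """Total cost when every file is migrated by hand."""
    return files * unit_cost


def automated_cost(files: int, tool_cost: float, unit_cost: float) -> float:
    """One-off cost of building the tool plus a small cost per file."""
    return tool_cost + files * unit_cost


def break_even_files(tool_cost: float, manual_unit: float, automated_unit: float) -> int:
    """Smallest number of files for which the automated route is strictly cheaper."""
    saving_per_file = manual_unit - automated_unit
    if saving_per_file <= 0:
        raise ValueError("automation never pays off if it saves nothing per file")
    return math.floor(tool_cost / saving_per_file) + 1


# Invented figures: a tool costing 400 units to build, 2.5 units per file
# by hand, and 0.5 units per file once the tool exists.
threshold = break_even_files(tool_cost=400, manual_unit=2.5, automated_unit=0.5)
```

With these invented figures the two routes cost the same at 200 files, and the tool becomes the cheaper option from 201 files onwards.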

Preservation migrations

If emulation or code conversion of the Chuckie Egg object is not possible for technical or cost reasons then it would seem prudent to preserve as much of the look and feel of the original as possible by taking a visual record. For the object to be recreated in the future by re-coding, a video record of the game in action would be invaluable. There are obviously doubts over authenticity of this kind of “preservation” and important questions about what we’re actually trying to preserve. For many preservation purposes these strategies could be seen as woefully inadequate, but in the absence of alternatives, a last ditch effort to record as much information as possible may be all that can realistically be done. With the rapid obsolescence of computing platforms and the vast amount of digital material at risk, compromises between the quality of preservation work and the reality of losing valuable data may have to be made.

With regard to more simple digital objects, a preservation migration of a View file could actually be seen as a sensible way of preserving digital objects of this kind. A large quantity of data in View format could be preserved with a minimum migration or (more usefully) an automatic conversion migration, but with a small number of files preserved additionally using a preservation migration. We then retain at least some record of the look and feel of the original environment in which the objects were created. This goes a long way to reduce the costs involved in this kind of preservation, where we require at least some record of the object’s original environment. This would be far cheaper than the up front cost of producing an emulator or a recreation of the object (or its application software in the case of a digital document).

Recreation

It’s important to consider recreation as a different process to migration (despite many similarities to the execution of a human conversion migration) as it is a process of creating a new object which reproduces the significant properties of the original without incorporating any elements of the original object.

The re-keying of a View document into a current word processor is probably the least technical approach to preservation here and is obviously open to the introduction of errors or other changes to the data. However, in some cases it could be the method of preservation chosen. When we apply the same strategy to more complex objects, we quickly see a range of further issues and possible problems developing. Re-keying a ViewStore database into a current database application brings up many questions of authenticity and accuracy, but even before that it’s important to look at how the work would be done. With the object running on the original BBC system and software, reproducing the intellectual content on a current platform seems quite reasonable. But without the original to work from, a recreation looks like a difficult job. Working from a minimum migration with little technical knowledge of the ViewStore file structure, much of the intellectual content could be lost. A good preservation migration may have helped to save this data for a future recreation. It should be noted that leaving preservation work until after the platform in question has become obsolete and unavailable will introduce further risks of the loss of significant properties.

A recreation of Chuckie Egg complicates these issues further and this is illustrated very well by the existing implementations of this method available on the Internet. What constitutes an authentic recreation of an arcade game? The best Chuckie Egg conversion provides an almost pixel perfect visual copy of the original game. But place a hardened games player with experience of the original game in front of it and they will quickly pick up on subtle differences in the game play from the original. Asked the question “Is this Chuckie Egg?”, they might reply with a definite “no”. To many, the exact details of the ‘feel’ of a computer game may seem irrelevant in this context, but in fact Chuckie Egg acts as a very clear example of the potential for the loss of what constitutes a key part of the intellectual content of many interactive digital objects.

Further to the notion of recreating an object authentically, are there in fact cases where recreating the dated user interface of an object may not be ideal? Does the preserver take the opportunity to enhance access to the reproduced intellectual content of the original?

Recreation has a high unit cost, and depending on the complexity of the object in question this could be prohibitive in a lot of situations. If a considerable amount of effort is going to be spent recreating several objects from one particular platform it may be more worthwhile to produce one emulator which will preserve all of these objects.

A well implemented recreation of a complex object may offer advantages that make the high unit cost of such work worthwhile in some circumstances. It could be argued that such a recreation of Chuckie Egg in a modern high level language like C would actually preserve many of the software functions behind the game in an easily understood way, far more understandable at least than the 6502 machine code in which the original is encoded. For an exceptional digital object, which warranted good preservation, this could be seen as a useful strategy.

Human conversion migration

Much of my discussion of recreation as a preservation strategy is applicable in the use of human conversion migration which recreates software elements of a digital object while re-using as much of the data of the original object as possible. The advantage of this strategy is that a migration of the data elements of the original object could contribute to a more accurate reproduction of the object’s significant properties.

Automatic conversion migration

Automatic conversion migration uses a software tool to automatically convert digital objects to a current environment. With word processor and database files, this can be seen as one of the traditional uses of migration. Although many consider this to be the best way to migrate and preserve an object, it should be noted that preservation of this kind still only retains some of the intellectual content. Nothing of the original environment in which the object was created is preserved and there is no guarantee that all of the formatting and structure will be retained.

Although automatic conversion migration could be used to convert a batch of files to a current format, which would then be used and preserved as the primary objects, this process will have to be repeated when the current format becomes obsolete. Hence further migration costs will be incurred and additional losses of significant properties may occur.

A more useful approach is that of migration on request, conceived by the CEDARS project [10]. Original objects are maintained and preserved in addition to a migration tool which runs on a current computing platform. This would be employed by users to convert the original bytestream of the object they want to use into a current format (This process will be formalised in chapter 5 of the CEDARS project [10] final report). When the current platform becomes obsolete the migration tool will no longer work, so the preservation problem in this case is obviously focused on the maintenance of the migration tool. Many of the strategies used in emulator development are equally applicable to ensuring the longevity of a migration tool (See “Emulation, Preservation and Abstraction” [1] by David Holdsworth and Paul Wheatley), but much further work needs to be done in this area. Is there an existing format which would be an ideal middle ground for a migration tool to convert objects to, before output to a current format for use? And hence can a standard migration tool framework be developed, into which read or write modules (which understand specific file formats) could be plugged?
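One way to picture such a framework is a small registry into which read and write modules are plugged, with every conversion passing through a shared intermediate form. The Python below is a hypothetical sketch only and is not the CEDARS design. The class `MigrationTool`, the format names, the choice of a list of paragraphs as the intermediate form and the example modules are all assumptions made for illustration.

```python
import html
from typing import Callable, Dict, List

# Assumed intermediate form: a document is simply a list of paragraphs.
Document = List[str]
Reader = Callable[[bytes], Document]
Writer = Callable[[Document], bytes]


class MigrationTool:
    """Hypothetical framework into which format-specific modules are plugged."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}
        self._writers: Dict[str, Writer] = {}

    def register_reader(self, format_name: str, reader: Reader) -> None:
        self._readers[format_name] = reader

    def register_writer(self, format_name: str, writer: Writer) -> None:
        self._writers[format_name] = writer

    def migrate(self, original: bytes, source: str, target: str) -> bytes:
        """Convert the preserved bytestream on request; the original is untouched."""
        if source not in self._readers:
            raise ValueError(f"no read module for format {source!r}")
        if target not in self._writers:
            raise ValueError(f"no write module for format {target!r}")
        intermediate = self._readers[source](original)
        return self._writers[target](intermediate)


def read_cr_text(data: bytes) -> Document:
    # Illustrative read module: ASCII text with carriage-return paragraph ends.
    text = data.decode("ascii", errors="ignore")
    return [line for line in text.split("\r") if line]


def write_plain_text(document: Document) -> bytes:
    return "\n".join(document).encode("ascii")


def write_html(document: Document) -> bytes:
    body = "".join(f"<p>{html.escape(paragraph)}</p>" for paragraph in document)
    return body.encode("ascii")


tool = MigrationTool()
tool.register_reader("cr-text", read_cr_text)
tool.register_writer("plain", write_plain_text)
tool.register_writer("html", write_html)

original = b"First paragraph\rSecond & last\r"
as_html = tool.migrate(original, "cr-text", "html")
as_text = tool.migrate(original, "cr-text", "plain")
```

In a design like this, supporting a new output format when the current one becomes obsolete means adding one write module, rather than one converter for every preserved source format.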

There are many advantages to this migration approach:

 

With complex objects it’s not even clear whether an automatic conversion migration of the original code is possible, especially if associated resources like graphics or data files also have to be dealt with. With many objects (e.g. multimedia programs) this necessitates a specific solution in each case. This is highly technical and therefore likely to be costly. Alternative strategies like recreation or emulation are likely to be more suitable in this situation.

Conclusion

It is clear that migration and emulation strategies will both play an important role in the long-term preservation of digital materials. Migration will be crucial for the preservation of more simple data objects and emulation will undoubtedly be essential for preserving complex objects that incorporate software elements. However, the use of migration or recreation strategies for preserving objects of particular outstanding value should not be underestimated. For many objects both migration on request and emulation strategies which interpret the original bytestream will provide useful methods of access for different users of these materials. It is not unrealistic to consider providing more than one means to render a digital object, especially if both objects and rendering tools are managed in a sensible way (see Representation Nets in “A blueprint for Representation Information in the OAIS model” [7] by David Holdsworth and Derek Sergeant).

Although much recent work has gone a long way to establish the terminology that provides a common language for the discussion of preservation issues, the description of migration terms in particular has remained relatively confused. To continue progress in the development of practical preservation strategies we should not be afraid to introduce new and more specific language or to redefine ambiguous terms of old. This paper has gone some way to tackling this problem, but there is still a long way to go.

Appendix

View

View was one of the first and most widely used word processing applications on the BBC Micro, and was released under the Acornsoft label.

ViewStore

ViewStore shared the same origins as View, but targeted the database market. ViewStore was quite advanced for a home database application of its day, but it only allowed very simple relationships between tables of data.

Human Digestive System

This educational software was typical of many new learning tools of the day. Textual information and diagrams are combined with programmed animation to describe the functions of the human digestive system. The application also includes test sections which quiz the user on their learning. The Human Digestive System is a good example of software produced under the Microelectronics Education Programme (MEP).

Chuckie Egg

Chuckie Egg was considered one of the classic arcade games of its day. Although not technically advanced for its time, the game’s popularity presents interesting options for user testing comparisons of the game under emulation or in a migrated state. Will the game under emulation bring back the original look and feel for a user who has played the game in the eighties?

Acknowledgements

Many thanks to both the CAMiLEON and CEDARS project teams for their invaluable contribution of ideas and critique. In particular I’d like to thank Kelly Russell, Derek Sergeant and Nancy Elkington.

References

  1. Holdsworth, D and Wheatley, P, “Emulation, Preservation and Abstraction”, http://www.rlg.org/preserv/diginews/diginews5-4.html#feature2
  2. The Commission on Preservation and Access and The Research Libraries Group, Inc “Preserving digital information: Report of the Task Force on Archiving of Digital Information”, (1996), http://www.rlg.org/ArchTF/tfadi.index.htm
  3. CAMiLEON project at http://www.si.umich.edu/CAMILEON/
  4. Consultative Committee for Space Data Systems, “Reference Model for an Open Archival Information System (OAIS)” (Red Book, May 1999), http://ssdoo.gsfc.nasa.gov/nost/isoas/ref_model.html
  5. Dollar, C M, “Authentic Electronic Records: Strategies for Long-Term Access” (1999)
  6. Lawrence, G W, Kehoe, W R, Rieger, O Y, Walters, W H and Kenney, A R, “Risk Management of Digital Information: A File Format Investigation” (June 2000), http://www.clir.org/pubs/reports/pub93/contents.html
  7. Holdsworth, D and Sergeant, D M, “A blueprint for Representation Information in the OAIS Model”, (1999), http://www.personal.leeds.ac.uk/~ecldh/cedars/ieee00.html
  8. BBC Micro platform at http://bbc.nvg.org/history.php3
  9. Finney, A, “The Domesday Project - November 1986”, http://www.atsf.co.uk/dottext/domesday.html
  10. CEDARS project at http://www.leeds.ac.uk/cedars

Author Details

Paul Wheatley
CAMiLEON Project http://www.si.umich.edu/CAMILEON/
University of Leeds http://www.leeds.ac.uk/
LS2 9JT, UK

Article Title: “Migration - a CAMiLEON discussion paper”
Author: Paul Wheatley
Publication Date: 02-October-2001
Publication: Ariadne Issue 29
Originating URL: http://www.ariadne.ac.uk/issue29/camileon/