Web Magazine for Information Professionals

Australian Co-operative Digitisation Project, 1840-45

Ross Coleman describes a project which will create a unique research infrastructure in Australian studies through the digital conversion of Australian serials and fiction of the seminal period 1840-45.

The Australian Cooperative Digitisation Project, 1840-45 [1] (ACDP) is a collaborative project between the University of Sydney Library, the State Library of New South Wales, the National Library of Australia and Monash University Library, funded through an Australian Research Council (ARC) Research Infrastructure (Facilities and Equipment) Program grant.

This funding, unlike the Elib [2] projects or projects in the US sponsored under the auspices of the Commission for Preservation and Access or the National Digital Library Federation [3], is not directed to the funding of digital library initiatives. ARC funds are openly competitive for the development of research infrastructure, and so submissions like ours are considered broadly in competition with facilities for science, technology, social sciences and the humanities. That the ARC decided to fund a digital library project against such a broad field is a significant recognition by the major Australian research funding body that such initiatives are of major importance in the support and furthering of research, in our case historical and literary research in nineteenth-century Australian studies.

The assessment criteria for ARC funding include such broad-ranging variables as “excellence of research activity to be supported”, “degree of concentration and quality of the research group”, “value to industry and other users of research results and potential for commercial development of research results leading to national benefit”, “increased institutional capacity for consulting, contract research and other service activities”, “ability to contribute to international links in research and innovation leading to national benefits”, and “ability to contribute to effective research training”. These are, in general, fairly open criteria, and ones we were able to answer adequately enough in our submission to be successful. So, while these criteria may not appear as technically rigorous as perhaps they are for digital library projects in the UK and US (though there was some technical assessment), to succeed we needed to establish the importance of the creation of digital resources against the full range of major academic research facilities.

In this project we focus on the digital conversion of journals, newspapers and fiction of the period 1840-45 - a significant period recording the emergence of a colonial identity. The period 1840-45 has long been regarded by many scholars as seminal in the development of an Australian colonial culture. This period, following the end of convict transportation to NSW and preceding the influx of the gold-rush migrations, heralded the agitation for, and introduction of, representative government in NSW in 1842 and witnessed the early years of mass free migration to the Australian colonies. It was a period marked by an upsurge in local publication in both the older and newer centres of settlement. The project primarily concentrates on the journals and newspapers that began publication in the period - a record of settlement and activity still largely untapped by researchers. We also look to the fiction published in the period - novels that for the first time were largely inspired by the changed circumstances of Australian life, and that signalled the concentration on the description of bush life which was to be so dominant in later fiction.

The period and content were determined by an academic reference group of leading historical and literary scholars in nineteenth-century Australian studies - a group that also gave the project strong academic credentials in the funding process. The intention was to be comprehensive in our coverage of the period, and our bibliographical source for material was Ferguson’s Bibliography of Australia - the most comprehensive bibliography of nineteenth-century Australian material. A total of 75 serial titles and four novels were identified. The majority of this material was either scarce or fragile, and was mostly held in the two major repositories of Australian material - the Mitchell Library in the State Library of New South Wales and the National Library of Australia. Sydney University Library, the oldest and largest university library in Australia, which also has extensive nineteenth-century holdings, provided the academic rationale for the project.

The primary purpose of the project is to enhance literary and historical research on nineteenth-century Australia by providing improved access to, and preservation of, scarce primary material confined to a few major library collections.

The primary character of the project - the focus on preservation of, and access to, retrospective print collections - was inspired by work being carried out in the USA under the auspices of the Commission for Preservation and Access. Our project is modelled on initiatives carrying out large-scale digital conversion of existing textual collections for network access and provision of long-term preservation in microform, particularly work at Cornell and Yale University libraries.

The primary mechanism of the project is to establish a production process that would be carried out under contract by vendors with expertise in filming and imaging. This initially involved the development of a set of workable technical specifications (documentation from the Library of Congress digital project proved very helpful) and testing the capacity of the local industry to do this level of work.

Technically the project is an integrated process of microfilming, imaging and networking, with the production work (microfilming and imaging) being carried out by contract. The production process of microfilming first, then scanning from the film, may appear conservative, but given the fragility and scarcity of the original material and the lack of experience in Australia with imaging to the standards required (and conversely the expertise in microfilming), this approach is both appropriate and sound.

Unlike in the UK, contract microfilming of collections, even of scarce material, is common among the large research libraries in Australia. The use of contractors for production processes - rather than establishing costly in-house facilities - is cost-effective. The fostering and development of industry expertise, firstly in microfilming and now in imaging, is a critical aspect of this project. It will help create facilities and industry experience that will facilitate similar projects in the future.

The scale of the project was initially estimated at about 150,000 images - a middling scale project, but large enough to be useful and enable us to investigate the management and technical issues of working with this kind of material.

The collaborative model for the project has set its operational organisation - each library participant has a general area of responsibility, by virtue of expertise or position. Each participant is also involved in all aspects of the project to gain the benefit of experience of the project as a whole. In these arrangements Sydney is generally responsible for overall management, coordination and academic specification; the State Library of NSW, for technical and design standards; the National Library of Australia, for network and design standards; and Monash, for conversion of digital images of the fiction to ASCII, with Sydney enhancing these to produce the SGML etexts. All participants are involved in the common issues of content, preparation, design, technical production, user focus, and broad management issues, through participation in operational committees. Through this model each partner will develop broad management and technical skills and expertise in planning and implementing large-scale digital conversion projects.

The Project Management Committee is chaired by Professor Elizabeth Webby, Professor of Australian Literature at the University of Sydney. The key operational groups managing the project are the Steering Committee (Alan Ventress from the State Library, Colin Webb from the National Library, Robert Stafford from Monash and Ross Coleman from Sydney, who is also the Committee Convenor), the Technical Advisory Group (chaired by Alan Howell from the State Library and including Diana Dack of the National Library) and site coordinators at the State Library (Andrew Heath and Rebecca Thomas), the National Library (Lawrie Salter) and Sydney University (Julie Price). So in general the project would -

To help us develop sound and workable technical specifications for production work, and to determine the capacity of local vendors to do the work, a test phase was introduced into the project.

We had some concerns about the capabilities of local vendors to do the work to the standards that we required. Many imaging firms had extensive experience of production work with large-scale corporate record imaging, but very little experience of imaging to the standards that we were setting in regard to the handling, filming, imaging and delivery of this kind of material - that is, to digital library standards. This is also a problem in the UK.

A Sydney company was selected for this test phase through a call for Expressions of Interest, and five journal titles that would provide a range of technical problems, including page sizes, were selected. Of the five titles, three were to be filmed and scanned from the original print version and two were to be scanned directly from existing microfilm. This test phase has been completed, final technical specifications have been developed, and production work on the bulk of the material is now out for tender.

Digitisation of the fiction was completed some time ago, but for various reasons OCR conversion and SGML mark-up are only now taking place. The works of fiction will be mounted as TEI-encoded electronic texts through Sydney’s Scholarly Electronic Text and Image Service (SETIS) [4] and will probably be mirrored through the University of Virginia’s Etext Centre and the Oxford Text Archive.

The process of ARC funding determined the nature of the audience of the project - it is primarily an initiative to support researchers. Researchers determined the content of the project, and researchers - through an academic advisory group - will test the interface and delivery of images. Testing of the interface and of the delivery and output of images, electronically and in print form, will take place over the next few months as the images from the completed production test phase are loaded onto our website.

However, once these digital resources are created and accessible via the web they will be available openly for browsing, for study and teaching at schools, to independent scholars irrespective of their location, and for use in the creation of other digital products.

In a technical sense the project is quite simple and basic. As this is a film-then-scan process, a lot of attention is being given to the quality of the microfilming to ensure that the subsequent imaging meets the quality standards and throughput required. As the material is predominantly textual, most images will be scanned as bitonal images at 400 dpi - with provision for 200 dpi 8-bit greyscale where appropriate. Many of our concerns have been to do with determining the optimum file sizes for efficient network delivery of quality images for browsing, downloading and printing. The images of the journals and newspapers will be provided in bitmap form - there will be no OCR conversion for display or indexing purposes. In a sense we are providing a “raw” digital library, in a production environment, where researchers can access, download, use and enhance the data as best fits their purposes.
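The file-size concern can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name and the 8 x 10 inch page are assumptions chosen for the example, not figures from the project specifications, and the results are for uncompressed data, so real delivery files would normally be smaller once compressed.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the raw (uncompressed) size of one scanned page in bytes."""
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8        # round up to whole bytes


# Hypothetical 8 x 10 inch page (an assumption, not a project figure).
bitonal = uncompressed_bytes(8, 10, dpi=400, bits_per_pixel=1)
greyscale = uncompressed_bytes(8, 10, dpi=200, bits_per_pixel=8)

print(f"400 dpi bitonal:         {bitonal:,} bytes")
print(f"200 dpi 8-bit greyscale: {greyscale:,} bytes")
```

On these assumptions the 200 dpi greyscale page is twice the raw size of the 400 dpi bitonal page, even though it carries a quarter of the pixels, because each pixel takes eight bits rather than one.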

One area of some concern has been the delivery of images of broadsheet newspapers - these have very large file sizes, and more investigation is necessary before finally deciding if this type of newspaper should be excluded from the project. Network delivery of images is fundamental to the project, and if material cannot be effectively delivered over the net then it may be excluded. In relation to newspapers, work being done on the tiling of large images for network delivery is among several developments that we are monitoring.
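As a rough illustration of what tiling involves, the sketch below divides a large page image into a grid of smaller rectangles that could each be delivered separately. It is a minimal sketch of the general idea, not any particular system under investigation; the function name and the tile size are assumptions.

```python
def tile_boxes(width_px, height_px, tile_px):
    """Return (left, top, right, bottom) boxes covering the image, row by row.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    boxes = []
    for top in range(0, height_px, tile_px):
        for left in range(0, width_px, tile_px):
            right = min(left + tile_px, width_px)
            bottom = min(top + tile_px, height_px)
            boxes.append((left, top, right, bottom))
    return boxes


# A deliberately tiny example: a 5 x 3 pixel "image" cut into 2-pixel tiles.
for box in tile_boxes(5, 3, 2):
    print(box)
```

A viewer would then request only the tiles that fall within the part of the page the reader is looking at, rather than the whole broadsheet image.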

The document control structure and directory structure for the delivery of the digital images are also quite simple - basically a hierarchical tree structure allowing browsing from title to issue to page. This proposed structure will be tested by the academic advisory group. During this test phase images will be available as GIF, TIFF or PDF to help determine the best means of access for the range of researchers expected to use the material.
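A title-to-issue-to-page tree of this kind might be sketched as follows. The directory names, the zero-padded page numbering and the function name are all hypothetical, chosen only to show the shape of such a hierarchy; the project's own naming is set out in its technical specifications.

```python
from pathlib import PurePosixPath

# Formats mentioned for the test phase, mapped to file extensions.
EXTENSIONS = {"gif": ".gif", "tiff": ".tif", "pdf": ".pdf"}


def page_path(title, issue, page_number, image_format):
    """Build a relative path of the form title/issue/page.ext."""
    extension = EXTENSIONS[image_format]      # KeyError for unknown formats
    filename = f"{page_number:04d}{extension}"  # zero-pad so pages sort in order
    return PurePosixPath(title) / issue / filename


# Placeholder identifiers, not real titles or issues from the project.
print(page_path("example-title", "issue-001", 7, "gif"))
```

Browsing then amounts to listing one level of the tree at a time: the titles, then the issues of a chosen title, then the pages of a chosen issue.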

Copies of the technical specifications developed for the production work are available from the author and will soon be accessible from the project website.

We have been working closely with similar projects in the US and the UK. This project was inspired by the work done at Cornell by Anne Kenney and at Yale by Paul Conway - both have acted as technical advisers to the project, and the benchmarking work they have done through the Cornell Digital Library prototype and the Open Book Project has been a great help. Two members of our project team have attended, at different times, the Digital Imaging Workshop run through Cornell University Library. Current work on the Making of America Project (Cornell and Michigan) and the Library of Congress digital library project has been of specific interest, and documentation and advice from both have been of great assistance.

In the UK our major contact (including discussions in Oxford in 1996) has been with the Elib-funded Internet Library of Early Journals [5] (ILEJ) - a project dealing with material of similar types, content and age. The distinctions between the ILEJ and the ACDP generally relate to process, purpose, and sophistication. The ACDP production process is predicated on the use of external vendors working by contract to technical specifications that can provide benchmarks for other projects. This is quite different from the creation and use of in-house facilities that characterises the ILEJ. This difference is partly historical - in-house facilities for microfilming are not common in Australia while they seem to be in the UK - and partly due to the design and intent of the projects. With the ACDP we hope to facilitate (and seem to have succeeded in facilitating) the development of a skills and hardware base with local vendors that will support other digital conversion projects. The ACDP is a preservation and access project while the ILEJ focuses on access.

The ILEJ is a more sophisticated project because of its use of OCR (or ICR) technologies with fuzzy matching software to provide some level of indexed access to the journals being imaged. This is something that we have discussed in the ACDP but, apart from the fiction, it is not within our project parameters as funded by the ARC. However, if possible, it is an area that we need to investigate further. The use of OCR or ICR to create forms of keyword indexing (albeit “dirty” indexing) has enhanced access to the content of digitally converted journal publications. This is quite exciting from a researcher’s or user’s point of view. Other examples of the use of OCR in retrospective digital conversion, such as the JSTOR project, confirm this potential.
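To give a sense of how fuzzy matching can make “dirty” OCR text searchable, the sketch below uses the similarity ratio from Python's standard difflib module to find words that are close to, but not exactly, the search term. This is only an illustration of the general technique: it is not the software used by the ILEJ, and the function name, the sample text and the 0.8 threshold are assumptions.

```python
import re
from difflib import SequenceMatcher


def fuzzy_find(query, ocr_text, cutoff=0.8):
    """Return (position, word) pairs for words that resemble the query.

    Similarity is difflib's ratio (0.0 to 1.0); words scoring at or above
    the cutoff count as hits, so minor OCR misreadings are still found.
    """
    words = re.findall(r"[a-z]+", ocr_text.lower())
    hits = []
    for position, word in enumerate(words):
        score = SequenceMatcher(None, query.lower(), word).ratio()
        if score >= cutoff:
            hits.append((position, word))
    return hits


# Invented sample in which OCR has misread "colony" as "colonv".
print(fuzzy_find("colony", "The colonv of New South Wales"))
```

An exact-match index would miss the misread word entirely; a lower cutoff finds more such errors at the cost of more false hits, which is the trade-off behind “dirty” indexing.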

The similarities between all these projects at this stage of digital development are much more fundamental: each uses the project to develop the kinds of technical and management skills and expertise, within and between the partner libraries, that enable us to develop and extend the creation of digital library resources. Our experience in the ACDP is that international cooperation and exchange of documentation and information have been of critical importance as we create digital resources in the Australian situation. It is these kinds of collaborations that underpin the success of these projects and that, in turn, provide the basis of the global digital library.

References

  1. Australian Cooperative Digitisation Project 1840-45 Web Site,
    http://www.nla.gov.au/ferg/fergproj.html
  2. eLib programme Web pages,
    http://www.ukoln.ac.uk/elib/
  3. US Digital Libraries Initiative Web Site,
    http://dli.grainger.uiuc.edu/national.htm
  4. SETIS: Electronic Texts at the University of Sydney Library,
    http://www.ukoln.ac.uk/ariadne/issue8/scholarly-electronic/
  5. Internet Library of Early Journals (eLib project) Web Site,
    http://www.bodley.ox.ac.uk/ilej/

Author Details

Ross Coleman,
Collection Management Librarian
Email: collm@extro.ucc.su.oz.au
Phone: +61 2 9351 3352
Fax: +61 2 9351 7305
National Library of Australia Home Page: http://www.nla.gov.au/
Address: University of Sydney Library, University of Sydney, New South Wales, AUSTRALIA. 2006