ROADS: Resource Organisation and Discovery in Subject-Based Services
As MARC and cataloguing give way to metadata and resource description, the true impact of the internet is realised. Cataloguers are being transformed into ... metaloguers(?). The ranks of library school students who sat bemused through lectures on UKMARC and AACR need to indulge in a bit of reconstruction. Really they were applying a canonical syntactical representation to related manifestations, and maybe occasionally considering extensibility. They were doing metadata. And if we had realised that a bit earlier, maybe we would be as rich as Jerry Yang and David Filo: as reported in mid-April, the public share offering in the internet 'catalogue' Yahoo! brings their personal worth to $100 million (each).
The ROADS project is very much concerned with metadata: how it should be created, organised, searched and presented to the user. ROADS is an eLib-funded project to develop discovery software for internet resources, and the software will be shared freely within the academic community. ROADS will be suitable for a number of 'directory' applications (research registers, project databases), but it is being developed primarily to meet the needs of the eLib subject-based services, and it is they who feed their requirements into the project.
I am a research officer on ROADS, based at UKOLN. I have particular responsibility for the design and development of the metadata format, and I work with ROADS staff at the University of Bristol Centre for Computing in the Social Sciences to feed requirements to the development team at Loughborough University. The University of Bristol is also responsible for liaison with information providers, as well as for documentation and project management. In addition, UKOLN's role in the project involves investigating interoperability, both in terms of record exchange and in the provision of gateways to the Z39.50 search and retrieval protocol.
At the beginning of the project it was necessary to choose a metadata format and a search and retrieval protocol on which to base the ROADS system. The project started from the premise that internet resources differ from hard copy resources in a number of ways, e.g. in their location details, internal document structure, lack of stability of location and document versioning, and method of publication. These characteristics of network resources, and their effect on users' search behaviour and the search process, influence the design of a resource discovery system. A simple illustration when considering metadata is the importance of non-bibliographic information in the description of internet resources. The administrator of a web site that hosts a resource may well differ from the organisation that 'publishes' the resource. The metadata format should be able to identify each of these clearly.
ROADS chose a simple attribute:value record structure for its metadata format, based on the IAFA template definition. It is a text-based record, human-readable throughout, with no sub-fields or numeric tags. The format was designed for the purpose of describing internet resources, so it does not contain redundant features applicable only to hard copy resources. The simplicity and availability of the record structure allowed for speedy start-up of the subject services. A significant factor in the choice of the IAFA template is that the simplicity of the record structure facilitates the involvement of information providers in the description of their own resources. ROADS maintains that the contribution of 'self-descriptions' from authors and web site administrators is essential to a sustainable service, and a simple record structure will facilitate this process.
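To give a feel for how little machinery an attribute:value record needs, here is a minimal sketch of a parser in Python. It is an illustration only, not the ROADS software: the attribute names in the sample record (Template-Type, Title, URI, Description, Admin-Name, Publisher-Name) are chosen for illustration and should not be read as the exact IAFA attribute set, and the treatment of indented lines as continuations is an assumption.

```python
def parse_record(text):
    """Parse one attribute:value record into a dict of attribute -> list of values.

    Assumptions (not taken from the IAFA definition): one attribute per line,
    the first colon separates name from value, and a line starting with
    whitespace continues the previous value.
    """
    record = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace() and last is not None:
            # Continuation line: append to the most recent value.
            record[last][-1] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not an attribute:value line: {line!r}")
        last = name.strip()
        # An attribute may repeat, so values are kept in a list.
        record.setdefault(last, []).append(value.strip())
    return record


# Illustrative record; all names and values are hypothetical.
SAMPLE = """\
Template-Type: DOCUMENT
Title: An example resource
URI: http://www.example.org/resource.html
Description: A short description that
 continues on a second line.
Admin-Name: A. Webmaster
Publisher-Name: Example Organisation
"""

record = parse_record(SAMPLE)
```

Note that the site administrator and the publishing organisation sit in separate attributes, which is the kind of non-bibliographic distinction discussed above, and that an information provider could write such a record in any text editor.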
It is acknowledged that other record formats will be used for the description of internet resources in other contexts, and that the ROADS software will need to interoperate with these. An important aspect of ROADS is to track developments in metadata during the life of the project, and to contribute to discussions and emerging standards in this area. There are interesting possibilities in the ROADS context for interoperability using the 'Warwick framework'. The recent OCLC/UKOLN Warwick Metadata workshop proposed an architecture for an extended record made up of the Dublin Core set of elements with additional packages containing data relevant to a particular constituency, e.g. terms and conditions of use, PICS ratings.
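The idea of an extended record built from packages can be sketched as a simple data structure. The Python below is a rough illustration under stated assumptions, not the Warwick workshop's specification: the class names Container and Package, the package labels and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    kind: str   # hypothetical label for the package type, e.g. "dublin-core"
    data: dict  # the package's own attribute:value content


@dataclass
class Container:
    """An extended record: a core package plus any number of others."""
    packages: list = field(default_factory=list)

    def add(self, package):
        self.packages.append(package)
        return self

    def select(self, kind):
        # A service reads only the package types it understands
        # and can safely ignore the rest.
        return [p for p in self.packages if p.kind == kind]


# Illustrative content only.
container = Container()
container.add(Package("dublin-core", {"Title": "An example resource", "Subject": "metadata"}))
container.add(Package("terms-and-conditions", {"Access": "Free for academic use"}))

core = container.select("dublin-core")
unknown = container.select("pics-ratings")  # not present in this record
```

The point of the design is that a service which understands only the core elements can still use the record, while another constituency can add its own package without disturbing the core.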
The ROADS system will use the whois++ directory service protocol for the search and retrieval of records. This is a lightweight protocol to implement, and was designed for compatibility with the IAFA template. It allows searches to be refined (e.g. by attribute, or by truncation of search terms) and allows the inclusion of special multilingual characters. The whois++ protocol offers exciting possibilities for searching across multiple services, which would allow indexing effort to be shared. The mechanism for achieving this cross-searching is the index summary or 'centroid'. Whois++ enables the information contained in the inverted index of a particular service to be summarised, the summary consisting of a list of the unique words associated with each attribute. The summary (centroid) is then available for use by other compatible sites to increase the access range for the end-user. Subject services might gather centroids from other sites in their subject area; in addition they could forward their own centroid to interested parties. Alternatively, centroids could be used for cross-disciplinary searching. The whois++ protocol allows centroids placed on different index servers to be linked in a 'whois++ mesh', so that searches can optionally be referred onward across the mesh.
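The centroid idea can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions, not the whois++ protocol itself: the function names build_centroid and servers_to_query, the service names and the sample records are hypothetical, and real centroids are exchanged between servers in a defined wire format rather than held as in-memory sets.

```python
import re
from collections import defaultdict


def build_centroid(records):
    """Summarise a service's records as attribute -> set of unique words."""
    centroid = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            words = re.findall(r"[a-z0-9]+", value.lower())
            centroid[attribute.lower()].update(words)
    return dict(centroid)


def servers_to_query(centroids, attribute, term):
    """Return the services whose centroid shows the term under the attribute.

    Only these services are worth referring the search to; the others
    cannot hold a matching record.
    """
    attribute, term = attribute.lower(), term.lower()
    return sorted(
        name for name, centroid in centroids.items()
        if term in centroid.get(attribute, set())
    )


# Two hypothetical services with one illustrative record each.
service_a = [{"Title": "Social survey methods", "Keywords": "statistics, surveys"}]
service_b = [{"Title": "Clinical statistics primer", "Keywords": "medicine, statistics"}]

centroids = {
    "service-a": build_centroid(service_a),
    "service-b": build_centroid(service_b),
}

both = servers_to_query(centroids, "Keywords", "statistics")
only_b = servers_to_query(centroids, "Title", "clinical")
none = servers_to_query(centroids, "Title", "geology")
```

Because a centroid lists each word once per attribute, it is much smaller than the index it summarises, yet it is enough for an index server to decide where a search should be referred.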
Since the start of the project, the IAFA template format has been adapted for use with the whois++ protocol and a revised definition is now available. Bunyip, who are associate partners in ROADS from the commercial sector, are making a significant contribution to the development of whois++ and the associated template. The ROADS project will feed in experience of working with bibliographic descriptions to the development of whois++.
Where are we now in the ROADS project? Two eLib subject services (OMNI and SOSIG) are in production mode using a prototype version of ROADS. This is ROADS version 0 and it incorporates the IAFA metadata format. ROADS version 1 is due to go into alpha test shortly and will incorporate the whois++ protocol. This version will provide standalone whois++ servers, which the subject services will use independently of one another. The following version of ROADS, version 2, is due for release in early 1997 and will implement centroids. It is intended that this version will allow a directory of services to be established via a whois++ mesh, giving unified access to distributed services.
Further details of the ROADS system and its partners are available on the ROADS web pages. There is considerable technical information on these pages contributed by Loughborough University which refers on to technical documentation elsewhere.