Metadata Corner: DC5 - the Search for Santa

Tony Gill; Paul Miller

Tony Gill and Paul Miller's report from the 5th Dublin Core metadata conference in Helsinki.

Largely in recognition of the sterling work of the Nordic Metadata Project [1], invited representatives of the informal Dublin Core community set off to Finland’s lovely capital for the fifth Dublin Core workshop [2]. Following the success of their exploits Down Under [3], the authors once more fearlessly packed their rucksacks and embarked on a long and arduous voyage for the sake of Ariadne readers, selflessly braving outrageous Scandinavian beer prices and over-zealous representatives of Her Majesty’s Customs & Excise in their efforts to bring the latest news on Dublin Core to an anxiously waiting readership.

Representatives in Helsinki were drawn from around the world, and included the now traditional mix of librarians, computer scientists and subject specialists. For the first time, those actually implementing the Dublin Core in real-life situations were well represented amongst the 70-odd participants [4].

The UK was once more well represented, with attendees from the Arts & Humanities Data Service [5], eLib subject gateways [6], UKOLN [7], Reuters [8], and the museums world.

Hel-where?

[Photo: Helsinki docks]

Helsinki, capital since 1812 of one of Europe’s oft-forgotten member countries, is a comfortably sized city of some 520,000 inhabitants situated on the Gulf of Finland at approximately the same latitude (60° North) as Unst, the northernmost of the United Kingdom’s Shetland Islands. For those looking at these things from a UK perspective, there is a distinctly Russian feel to much of the architecture, which is perhaps unsurprising with St. Petersburg (formerly Leningrad, for those readers not ‘up’ on the latest round of name changes across the planet) only a few hours away on the train. Indeed, we were told that several films have used parts of Helsinki as stand-ins for the less accessible cities of the former Soviet Union, so maybe Helsinki appears more Russian than St. Petersburg or Moscow to our Hollywood-befuddled eyes.

[Photo: Tony Gill of the ADAM subject gateway]

Finland itself is a country of just over five million inhabitants, covering 338,000 square kilometres and stretching from the Gulf of Finland in the south to well inside the Arctic Circle to the north. About 70% of the country is covered in forest, with another 10% under the water of some 188,000 lakes and countless bogs. Free of foreign rule since the Russian Revolution of 1917, Finland spent much of the twentieth century under the shadow of the former Soviet Union, and is now a member state of the European Union, and an increasingly - and deservedly - popular travel destination.

[Photo: The garlic beer restaurant]

For those - like us - fitting their sightseeing in between a hectic round of workshop sessions and breakout group brainstormings in and around the bar, highlights of the city definitely include the Lutheran and Orthodox cathedrals, the fish market by the harbour, a church cut straight into solid rock, and a restaurant with a penchant for garlic (yes, this did include garlic beer, which turned out to be surprisingly palatable!). Another ‘interesting’ Helsinki highlight was the tar-flavoured ice cream. Some 70 rather bemused workshop participants are still trying to work out whether this really is a Finnish delicacy, or merely a rather strange joke on the part of the chef. Still, we ate it, and we’re not dead yet.

[Photo: Tony gets lost in Estonia (and perhaps should have looked up)]

The capital of Estonia, Tallinn, is also close, and several Dublin Core-ites took the hydrofoil across the Gulf of Finland to this little Baltic state before or after the workshop.

The Dublin Core

For those of you who have spent the past few years pretending to be dead for tax reasons (apologies to Douglas Adams), this talk of a Dublin Core workshop probably makes little sense.

To recap briefly, the Dublin Core is a set of fifteen elements [9] identified by international, interdisciplinary consensus as being ‘core’ to the process of describing diverse objects in such a way that they may be effectively discovered and evaluated. The Dublin Core is not a replacement for existing detailed metadata structures such as the library world’s MARC [10] or the geospatial community’s Content Standards for Digital Geospatial Metadata [11], but can rather be seen as a means of describing the essence - or ‘core’ - of both library books and maps - and many other types of digital and non-digital resource.

[Photo: Stu Weibel, exhorting the troops towards consensus]

The Dublin Core effort has been moved forward - under the guidance of Stu Weibel of OCLC - over the past three years by means of five international workshops and an active electronic mailing list. Those who want to know more can find further information in several articles from earlier issues of Ariadne, or by consulting the official workshop reports published in D-Lib Magazine [12]. The Dublin Core site on the world wide web [13] also includes a host of useful links.

Although the Dublin Core is further refined in the wake of each workshop, a significant usage base is beginning to emerge around the planet, and representatives of many organisations using Dublin Core managed to travel to Helsinki. Those using Dublin Core include many projects within UK Higher Education, as well as others such as Reuters, the Danish Government’s Information Service, and Environment Australia. Synopses of this growing adoption were gathered prior to the Helsinki meeting, and over thirty of them remain available on the workshop web site [2]. During presentations from a handful of implementers, the results of the AHDS/UKOLN evaluation of Dublin Core [14] were circulated, as was an example of simple usage guidelines (prepared in this case for Interconnect Technologies Corporation by Diane Hillmann of New York’s Cornell University [15]).

Issues Explored in Helsinki

A great deal was discussed over the three days of the workshop, and will be formally reported in the official workshop report, due to appear in D–Lib Magazine early in 1998. Several key issues which appeared especially important to the authors can be identified, and we’ll discuss each of them briefly, below.

The Resource Description Framework — and a formal data model for Dublin Core?

A major debate at Canberra was the way in which Dublin Core information might be embedded within HTML’s basic <META> tag without breaking existing automatic HTML validation tools. Discussions explored the need to add functionality to this <META> tag within the (then) forthcoming HTML 4.0 specification, and extended to a potential future solution offered by embedding Dublin Core within the structure of a Platform for Internet Content Selection (PICS) header.

The draft HTML 4.0 specification has now been released by the World Wide Web Consortium [16] and, following recommendations from the Dublin Core group, includes the capability to handle Dublin Core’s two qualifiers, SCHEME and LANG, explicitly.

Where pre-HTML 4.0 metadata had to be forced into the tag using kludges along the lines of:

<META NAME="DC.Format" CONTENT="(SCHEME=IMT) (LANG=en) text/html">

HTML 4.0 now allows the far neater

<META NAME="DC.Format"
SCHEME="IMT"
LANG="en"
CONTENT="text/html">

which means the same, but is easier for both humans and machines to parse, as the value of CONTENT (which is what most people are probably searching for) is identifiably separate from any qualifiers.
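To make the equivalence of the two forms concrete, here is a minimal sketch in Python of how a harvester might read both into the same structure. It uses only the standard library's html.parser module; the names DublinCoreMetaParser and parse_dc, and the shape of the resulting dictionary, are our own assumptions for illustration and are not part of any Dublin Core or HTML specification.

```python
import re
from html.parser import HTMLParser

# Matches one "(SCHEME=...)" or "(LANG=...)" qualifier at the start of a
# CONTENT string, as used in the pre-HTML 4.0 kludge.
_QUALIFIER = re.compile(r"^\s*\((SCHEME|LANG)=([^)]*)\)", re.IGNORECASE)


class DublinCoreMetaParser(HTMLParser):
    """Hypothetical helper: collects DC.* META tags as dictionaries."""

    def __init__(self):
        super().__init__()
        self.records = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if not name.lower().startswith("dc."):
            return
        record = {
            "element": name[3:],
            "scheme": attributes.get("scheme"),
            "lang": attributes.get("lang"),
        }
        content = attributes.get("content") or ""
        # Peel off any old-style qualifiers embedded in CONTENT.
        match = _QUALIFIER.match(content)
        while match:
            record[match.group(1).lower()] = match.group(2)
            content = content[match.end():]
            match = _QUALIFIER.match(content)
        record["value"] = content.strip()
        self.records.append(record)


def parse_dc(html):
    """Return a list of Dublin Core records found in an HTML string."""
    parser = DublinCoreMetaParser()
    parser.feed(html)
    parser.close()
    return parser.records


old_style = '<META NAME="DC.Format" CONTENT="(SCHEME=IMT) (LANG=en) text/html">'
new_style = '<META NAME="DC.Format" SCHEME="IMT" LANG="en" CONTENT="text/html">'
print(parse_dc(old_style) == parse_dc(new_style))  # prints True
```

The old form needs a regular expression to dig the qualifiers out of CONTENT, whereas the HTML 4.0 form hands them over as ordinary attributes, which is really the whole argument for the change.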

More excitingly for the future, the work of Eric Miller, Renato Iannella and others on extending the functionality of PICS has evolved to become the World Wide Web Consortium-backed Resource Description Framework (RDF), a model for which was unveiled in Helsinki [17]. RDF offers the potential both for expressing the detail of the most complex Dublin Core records, and for realising many of the aspirations of the Warwick Framework [18] by permitting the creation of metadata records comprising multiple metadata ‘sets’ compiled from different cataloguing paradigms.

The RDF work has also resulted in the development of a technique for modelling various metadata structures. This technique is applicable to the Dublin Core in general, rather than merely to the Dublin Core implemented within RDF, and a working group is exploring this work in order to derive a formal data model for the Dublin Core effort as a whole.

Z39.50 and the Dublin Core

The Z39.50 protocol (now internationally recognised as ISO 23950) is currently talked about almost as much as the Dublin Core, and is probably understood even less! A number of UK projects, including AHDS, are making use of this protocol in order to allow the integration of disparate - and remote - databases behind a seamless search mechanism. Z39.50 supports a series of ‘profiles’ in order to enable translation between the various databases, and in the past Dublin Core elements have always been squeezed into either the bib-1 or GILS profiles, neither of which necessarily handles the detail of Dublin Core qualifiers very well.

Recognising the value of querying distributed Dublin Core–based databases via Z39.50, a number of organisations within the Dublin Core and Z39.50 communities are now exploring the feasibility of creating a specific Dublin Core profile.

The ‘1:1’ debate (or, what should Dublin Core metadata describe?)

In the course of actually using the Dublin Core to describe ‘real–world’ networked resources, implementers have come across one of the age–old problems of cataloguing ‘complex’ objects — just what should the metadata describe? This issue was raised in Helsinki both by a proposal from the Research Libraries Group, who wanted to use Dublin Core for museum object records on the web, and in the AHDS/UKOLN report.

Strictly speaking, metadata should describe the properties of an object which is itself data, for example a web page, a digital image or a database — this is analogous to the librarian’s practice of cataloguing ‘the thing in hand’. But with networked resources, these properties are often not very interesting or useful for discovery; for example, if a researcher is interested in discovering images of famous artworks on the web, they would generally search using the properties of the original artworks (e.g. CREATOR = Picasso, DATE = 1937), not the properties of the digital copies or ‘surrogates’ of them (e.g. CREATOR = Scan–O–Matic Imaging Labs Ltd., DATE = 1997).

This problem is exacerbated by the fact that networked resources can contain a large number of digital objects that have been derived from diverse sources; for example, consider a web page about an architect created by an academic that includes a scanned image of a photograph, taken by a famous photographer, of one of the architect’s buildings - who is the creator of this ‘digital object’? The architect, the photographer and the academic all have a valid claim to the title CREATOR, and future generations of researchers may even be interested in the creator of the digital surrogate!

Of course, this is not a new problem - traditional guides to information resources, such as librarians, museum curators and archivists, have been wrestling with the seemingly impossible task of ‘Modelling the World’ in order to describe information resources for decades. The draft IFLA Functional Requirements for Bibliographic Records [19], for example, makes a distinction between works, expressions, manifestations and items. SPECTRUM: The UK Museum Documentation Standard [20] discusses objects and reproductions, and the VRA Core [21] refers to works and visual documents.

It became clear in Helsinki that the only way to address this problem coherently, and without contravening the semantics of the Dublin Core elements, is to use a separate ‘set’ of metadata for each discrete object, and to create links between the various sets using the RELATION element. This approach became known as the “1:1 approach”, because it entails the creation of one discrete metadata set for every object.
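The 1:1 approach can be sketched, very roughly, as a data structure. The following Python fragment is only an illustration: the Record class, the identifiers and the ‘IsSurrogateOf’ relation type are hypothetical (the working group had yet to agree on relationship types), while the element values are taken from the Picasso example above.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One Dublin Core metadata set describing exactly one object."""
    identifier: str
    elements: dict
    # Each relation is a (relation_type, target_identifier) pair.
    relations: list = field(default_factory=list)


# Element values come from the example in the text; the identifiers and
# the "IsSurrogateOf" relation type are invented for this sketch.
artwork = Record("obj:painting", {"CREATOR": "Picasso", "DATE": "1937"})
scan = Record(
    "obj:scan",
    {"CREATOR": "Scan-O-Matic Imaging Labs Ltd.", "DATE": "1997"},
    relations=[("IsSurrogateOf", "obj:painting")],
)
records = {r.identifier: r for r in (artwork, scan)}


def discover(records, element, value):
    """Return identifiers of records that match directly, plus records
    linked to a matching record through a RELATION entry."""
    direct = {i for i, r in records.items() if r.elements.get(element) == value}
    linked = {
        i
        for i, r in records.items()
        for _relation_type, target in r.relations
        if target in direct
    }
    return sorted(direct | linked)


print(discover(records, "CREATOR", "Picasso"))  # ['obj:painting', 'obj:scan']
```

Each record keeps its own honest CREATOR and DATE, yet a search for Picasso still turns up the scan, because the link between the two is carried by RELATION rather than by bending the semantics of either set.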

A working group was formed to collate and categorise relationship types between different objects, so that digital resources can be discovered when searching for information about their contents. However, although not impossible, it is likely to prove impractical to create multiple distinct metadata sets in HTML using the <META> tag, which is still viewed as the most important application of Dublin Core metadata. An interim proposal, to embed another Dublin Core set describing a source object within the SOURCE element using qualifiers, was reluctantly accepted by the group. The 1:1 approach becomes much more feasible, however, in the more sophisticated environment offered by RDF.

Perhaps the most useful outcome of these discussions, though, was the formal recognition of the problems faced when describing complex, mixed–media resources for discovery purposes.

Expressions of Date

The Dublin Core DATE element has long proved contentious, with many different viewpoints as to what it’s for, how it might be used, and whether or not ‘normal’ use of it breaks a guiding principle from Canberra that qualifiers should refine rather than extend the meaning of any element. Although there is still no agreement on how DATE should be used, there has been a growing recognition that the current definition is unsatisfactory.

Officially, DATE currently has an extremely narrow definition, stating that the element is used to store the date that a resource was ‘made available in its present form’. Following the recent AHDS/UKOLN workshops in the UK [14], AHDS proposed a broader definition of DATE:

Dates associated with the creation and dissemination of the resource. These dates should not be confused with those related to the content of a resource (AD 43 in a database of artefacts from the Roman conquest of Britain) which are dealt with under COVERAGE or its subject (1812, in relation to Tchaikovsky’s eponymous overture) which are dealt with under SUBJECT.

This definition was felt by AHDS to usefully broaden the definition of DATE in its first sentence and, importantly, to clarify in its second sentence the function of DATE with respect to the two elements users appear to confuse with it most often. This definition, along with others, is being considered by the DATE working group with a view to resolving the current confusion felt by so many.

Sub–elements in the Dublin Core

[Photo: Discussing sub-elements, and other matters]

Almost since the creation of the Dublin Core, there have been those who sought to refine the 13 (15 since 1996) core elements by means of the TYPE qualifier, which was recently renamed SUBELEMENT to reduce ambiguity. Various mechanisms were used to do this within HTML, such as:

<META NAME="DC.Creator" CONTENT="(TYPE=email) collections@ads.ahds.ac.uk">

or, since Canberra,

<META NAME="DC.Creator.Email" CONTENT="collections@ads.ahds.ac.uk">
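For those with older records to migrate, the step from the first notation to the second is mechanical. The Python sketch below shows one possible way of doing it; the function names to_dotted and render_meta are hypothetical, and capitalising the sub-element name is simply a cosmetic choice made here to match the example above, not a rule from any Dublin Core document.

```python
import re
from html import escape

# Matches a leading "(TYPE=...)" qualifier inside a CONTENT value.
_TYPE = re.compile(r"^\s*\(TYPE=([^)]+)\)\s*", re.IGNORECASE)


def to_dotted(name, content):
    """Rewrite ("DC.Creator", "(TYPE=email) x") as ("DC.Creator.Email", "x").

    Values without a TYPE qualifier are returned unchanged.
    """
    match = _TYPE.match(content)
    if not match:
        return name, content
    # Capitalisation is an assumption made for this sketch only.
    sub_element = match.group(1).strip().capitalize()
    return f"{name}.{sub_element}", content[match.end():]


def render_meta(name, content):
    """Render a (name, content) pair as an HTML META tag."""
    return f'<META NAME="{escape(name)}" CONTENT="{escape(content)}">'


name, content = to_dotted("DC.Creator", "(TYPE=email) collections@ads.ahds.ac.uk")
print(render_meta(name, content))
# <META NAME="DC.Creator.Email" CONTENT="collections@ads.ahds.ac.uk">
```

What the sketch cannot do, of course, is decide which sub-element names are legitimate.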

A number of lists have evolved, each defining those sub-elements of value to a particular community or world view. However, none of these lists has yet been universally accepted (as can be seen by the continued trend to create new ones!), and the mechanisms by which ‘core’ sub-elements might be specified, or how community-specific sub-elements might be added to this core, have never been formally defined.

In order to remedy this obvious problem, a working group was set up at Helsinki with a remit both to define the mechanisms by which new local and global sub–elements might be defined, and to draw up a non–exclusive list of those key sub–elements most likely to be required across the Dublin Core implementation community. As with the other working groups of the Dublin Core, further details of this effort are available on the Dublin Core web site [13].

[Photo: The strain is obviously getting to Stu…]

Results from the Workshop

Arguably the most important single outcome from the Helsinki workshop is that the band of Dublin Core implementers around the world have been shown that they are not working alone; that they are, in fact, part of a concerted international effort to help organise networked information which is continually gathering support and momentum. Although there was not much time during the formal workshop sessions for implementation discussions divorced from the purity (or pedantry?) of data models and semantics, a great deal of informal discussion did take place in bars and over dinner, and many contacts were established that will surely prove extremely useful to several fledgling projects in the coming months. Indeed, some commentators might argue that the BAR-BOFs, alcohol-assisted ‘Birds of a Feather’ sessions, are where the real work gets done!

[Photo: Workshop attendees hard at work in a BAR-BOF]

Although the more ‘academic’ discussion of semantics, terminologies, data models etc. continues to be a vital part of the Dublin Core effort, there is perhaps a case to be made for the next workshop including an Implementers Day before or after the main event, where only real implementations are discussed, and questions of the ‘what problems did you encounter with x?’ and ‘how did you get around y?’ variety are encouraged.

[Photo: 'What did we decide, again…?' Summing up on the last day]

A second outcome from Helsinki was an explosion in the number of working groups addressing issues related to the Dublin Core. These working groups, which are discussed further on the Dublin Core web site, include those addressing the COVERAGE and DATE elements, the procedures for adding sub-elements to the Core, the creation of an encompassing data model, and clarification of the relationship between the SOURCE and RELATION elements.

Finally, the formalisation of the Dublin Core as an Internet standard continues to move forward, with various people tasked with the production of six draft RFCs (Requests for Comments; an information publishing mechanism of the Internet Engineering Task Force) covering the semantics, HTML and RDF implementations for both ‘simple’ and ‘qualified’ Dublin Core. These documents should begin to appear in the not too distant future, and certainly before the next workshop in approximately six months’ time.

Acknowledgements

Thanks to Kelly Russell at eLib for finding the money to send us both to Helsinki. On behalf of all who were there, thanks also to Juha Hakala and his team for facilitating the meeting, and for keeping their cool in the face of seventy very strong–willed individuals. Stu Weibel also deserves a special mention for his continuing drive and enthusiasm, without which the ongoing Dublin Core effort would be much lessened.

Oh yes, and thanks to Ford for sponsoring the multilingual T-shirt!

The search for Santa?

Finally, despite having spent several days in the country where Santa allegedly hides out during most of the year, neither of us managed to spot him in order to deliver our Christmas lists (although Paul did see one of his reindeer in Stockholm). So if you’re reading this, Santa, Tony would like someone to filter his e-mail, and Paul would really be quite happy to settle for a good book and a bigger travel budget for next year. You see, he’s never been to the Americas or Africa, and he’s just sure there must be a nice conference out there somewhere…

References

[1] Nordic Metadata Project,
http://linnea.helsinki.fi/meta/

[2] The 5th Dublin Core Metadata Workshop,
http://linnea.helsinki.fi/meta/DC5.html

[3] Miller, P. & Gill, T., 1997, Down Under with the Dublin Core,
http://www.ariadne.ac.uk/issue8/canberra-metadata/

[4] DC5 group photo,

[5] The Arts & Humanities Data Service,
http://ahds.ac.uk/

[6] Electronic Libraries Programme, Access to Network Resources,
http://www.ukoln.ac.uk/services/elib/projects/#anr

[7] The UK Office for Library & Information Networking,
http://www.ukoln.ac.uk/

[8] Reuters,
http://www.reuters.com/

[9] Dublin Core Element Set reference definition,
http://purl.org/metadata/dublin_core_elements

[10] MARC Maintenance Agency,
http://lcweb.loc.gov/marc/

[11] Content Standards for Digital Geospatial Metadata, revised draft,
http://www.mews.org/nsdi/revis497.pdf

[12] D–Lib Magazine,
http://www.dlib.org/
also mirrored at http://mirrored.ukoln.ac.uk/lis-journals/dlib/dlib/

[13] The Dublin Core,
http://purl.org/metadata/dublin_core/

[14] Miller, P. & Greenstein, D., (Eds.), 1997, Discovering Online Resources Across the Humanities: A Practical Implementation of the Dublin Core,
http://ahds.ac.uk/public/metadata/discovery.html

[15] Dublin Core Metadata Element Set: Guidelines for Use,
http://www.interconnect.com/dc_guide.html

[16] HTML 4.0 Specification,
http://www.w3.org/TR/WD-html40/

[17] Resource Description Framework (RDF) Model and Syntax,
http://www.w3.org/TR/WD-rdf-syntax/

[18] Lagoze, C., 1996, The Warwick Framework: a container architecture for diverse sets of metadata,
http://www.dlib.org/dlib/july96/lagoze/07lagoze.html

[19] IFLA Functional Requirements for Bibliographic Records,
http://www.nlc-bnc.ca/ifla/VII/s13/frbr/frbr-toc.htm

[20] SPECTRUM: The UK Museum Documentation Standard,
http://www.open.gov.uk/mdocassn/spectrum.htm

[21] VRA Core,
http://www.oberlin.edu/~art/vra/wc1.html

Author details

Paul Miller
Collections Manager
Archaeology Data Service
King’s Manor
YORK YO1 2EP, UK
E–mail: collections@ads.ahds.ac.uk
Web page: http://ads.ahds.ac.uk/ahds/
Tel: 01904 433954

Tony Gill
ADAM & VADS Programme Leader
Surrey Institute of Art & Design
FARNHAM GU9 7DS, UK
E–mail: tony@adam.ac.uk
ADAM web page: http://www.adam.ac.uk/
VADS web page: http://vads.ahds.ac.uk/
Tel: 01252 722441