Web Magazine for Information Professionals

DC 2006: Metadata for Knowledge and Learning

Julie Allinson, Rachel Heery, Pete Johnston and Rosemary Russell report on DC 2006, the sixth international conference on Dublin Core and Metadata Applications, held 3 - 6 October 2006.

DC-2006 [1], the annual conference of the Dublin Core Metadata Initiative (DCMI), took place this year in the city of Manzanillo, on the Pacific coast of Mexico, with a subtitle of ‘Metadata for Knowledge and Learning’. The four-day conference was organised by the University of Colima [2], and the venue for the event was the Karmina Palace Hotel, a large hotel set within its own complex of restaurants, bars, shops and swimming pools.

The conferences of the DCMI emerged from the earlier series of workshops focused primarily on the activity of DCMI’s own working groups. Working group meetings remain a significant feature of the conferences, providing an opportunity for members to discuss progress and to disseminate information on their activities to a larger audience. However, the conferences also offer a platform for a broader community to present their thoughts and experiences in the form of research papers. A third strand of the conference is that of the ‘special sessions’, which typically combine a few short presentations with informal discussion around some particular topic of interest.

This account seeks only to highlight a few of the contributions and discussions which took place during the four days. Presentations for all sessions remain available on the Conference Web site [3].

Conference Opening

After an early morning tutorial on basic semantics, the conference was formally opened by Makx Dekkers, DCMI Managing Director, with some brief remarks about the event, which brought together approximately 200 delegates from across the world. This was the second DCMI conference to be held in a Spanish-speaking country and the first to take place in Latin America.

Thomas Baker, DCMI Director of Specifications and Documentation and Chair of the Usage Board, continued the opening with a description of the Dublin Core Metadata Initiative today, looking beyond the core elements (ISO 15836) to the interoperability framework (an essential building block for Semantic Web applications) and DCMI support for communities. Drawing out some key themes for the conference, Tom pointed to vocabularies, with papers on SKOS (Simple Knowledge Organization System) core and registries, a tutorial and the Registry Working Group. With a paper session on ‘metadata models’ and the Architecture Working Group, Tuesday was branded ‘architecture day’, with frameworks and models another key theme of the conference. The issue of metadata interoperability was tackled through a DCMI/IEEE LTSC Task Force session, the DCMI Libraries Working Group session and a special session on Resource Description and Access (RDA). Education was well represented with a paper session on ‘metadata for education’, a ‘reports from the field’ session, the Education Working Group and a closing keynote from Michael Crandall. Discussions of application profiles for the description of ePrints, Collections, and for Kernel metadata, papers on implementation and deployment, and special topics such as dates, social networks, tagging and productivity all contributed to sharing application experience. The conference also sought to provide a forum for the various communities that have coalesced around DC implementation.

In the opening plenary, Abel Packer of BIREME, the Latin-American and Caribbean Center on Health Sciences Information [4], took as his starting point the view that information was the raw material of society. He examined how networks facilitated access to information in flexible and continuously changing ways, so that a ‘virtuous cycle’ developed in which information generated new information. He highlighted some particular features of present-day information networks: we are all globally interconnected (‘no one is more than a few handshakes away from anyone else’), but we also see the phenomenon of ‘preferential attachment’ where popularity becomes attractive and, for example, pages with many inward hyperlinks attract more inward links. He examined how a number of Web-based services in Latin America had developed around access to scientific, technical and medical publications, and he proposed a three-layer model of data, indexes (metadata) and interfaces, and argued that the principle of open access to publicly funded resources was compatible with a market of services built on those resources. He closed on a cautionary note, highlighting the “know-do gap” between theory and practice, and emphasising that there remained fundamental inequalities in access to information networks.

Models and Frameworks

Three of the four contributions in the first session of research paper presentations were closely related to the recent work within DCMI to develop the DCMI Abstract Model (DCAM) [5], a specification which describes the components and constructs which make up the information structure which it calls a DC ‘description set’ and how that information structure is to be interpreted.

Mikael Nilsson, Royal Institute of Technology, Sweden, argued that the DCMI Abstract Model was just one component within a larger set of components that constitute a ‘metadata framework’ used by DCMI, consisting of an abstract model, a vocabulary model (and instances of that model in the form of the metadata vocabularies owned and maintained by DCMI), a profile model (and instances of that model in the form of specific DC application profiles), and a set of metadata formats. While these components have been formalised to a greater or lesser degree within DCMI, they are clearly identifiable as distinct parts of the whole. Mikael extended his analysis to the IEEE LOM (Learning Object Metadata) standard and the Semantic Web set of specifications, pointing to corresponding components in those contexts. He concluded with the suggestion that we should abandon, or at least clearly qualify, our use of terms such as ‘metadata standard’ and ‘metadata schema’, and specify instead which of the components within the framework we are concerned with.

Pete Johnston, Eduserv Foundation, presented DC-Text, a simple text-based format for DC metadata [6], introducing a simple method for writing descriptions and description sets based on the DCMI Abstract Model. As a human-readable format, DC-Text has proved useful for the exchange of examples during discussions of the DCAM, and also for the presentation of examples in the new ‘encoding guidelines’ specifications which are currently under development. The usefulness of DC-Text as a machine-readable format is less clear at this point. A Backus-Naur Form (BNF) description of the format is available, but if a machine-readable plain text format for DC metadata is required, it may be more effective to develop an encoding based on YAML [7] or JSON (JavaScript Object Notation) [8].
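To make the JSON suggestion concrete, the following is a minimal sketch of how a description set might be written as JSON, assuming the DCAM structure of descriptions made up of statements, each with a property URI and optional value URI, vocabulary encoding scheme URI and value strings. The key names (descriptions, statements, propertyURI and so on), the helper make_statement and the example.org resource are illustrative assumptions only; this is neither DC-Text nor any format defined by DCMI.

```python
import json


def make_statement(property_uri, value_uri=None, scheme_uri=None, value_strings=None):
    """Build one statement as a dict. All key names are hypothetical."""
    statement = {"propertyURI": property_uri}
    if value_uri is not None:
        statement["valueURI"] = value_uri
    if scheme_uri is not None:
        statement["vocabularyEncodingSchemeURI"] = scheme_uri
    if value_strings:
        statement["valueStrings"] = value_strings
    return statement


# A description set holding a single description of one (invented) resource.
description_set = {
    "descriptions": [
        {
            "resourceURI": "http://example.org/doc/1",
            "statements": [
                make_statement(
                    "http://purl.org/dc/elements/1.1/title",
                    value_strings=[{"literal": "An Example Document", "language": "en"}],
                ),
                make_statement(
                    "http://purl.org/dc/elements/1.1/subject",
                    scheme_uri="http://purl.org/dc/terms/LCSH",
                    value_strings=[{"literal": "Metadata"}],
                ),
            ],
        }
    ]
}

# Serialise to JSON text and read it back to confirm a lossless round trip.
encoded = json.dumps(description_set, indent=2)
decoded = json.loads(encoded)
```

The attraction of such an approach is that parsing and serialisation come for free from existing libraries, whereas a bespoke text syntax needs its own parser built from the BNF.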

Sarah Pulis, La Trobe University, Australia, approached the DCMI Abstract Model from the viewpoint of a vehicle for communication between the developers of DC application profiles and system developers using UML, and developed modified, UML-conformant versions of the DCAM resource model and description model which are usable by system architects using UML. In the course of this work she encountered a number of characteristics of the DCAM which made that process difficult, and made some suggestions for changes and clarifications.

In a meeting of the DCMI Architecture Working Group [9] Pete summarised the current work in progress on producing a revised version of the DCAM. This work makes explicit some of the (currently implicit) distinctions between components within the model highlighted by Mikael (particularly the articulation of a ‘vocabulary model’ describing the types of terms referenced in DC metadata descriptions and the type of relationships which may exist between terms); it also corrects some errors and omissions, including at least some of the problems identified by Sarah. One of the most significant changes will be a clarification of the concept of Vocabulary Encoding Scheme in order to dovetail better with DCMI’s historical use of that term and to produce a closer correspondence with the notion of a ‘Concept Scheme’ as that term is used in the context of SKOS.

Metadata Interoperability

The thorny question of interoperability between ‘metadata standards’ - across specifications developed within frameworks using different, often incompatible, abstract models - was an issue in at least three sessions. In the course of their work on the development of a DC application profile, the DCMI Libraries Working Group [10] has been grappling with the differences between the hierarchical model used by XML Schema-based specifications such as MODS and the statement-oriented model described by the DCAM.

The DCMI/IEEE LTSC Task Force [11], whose work was presented in a ‘special session’, seeks to illustrate how this question can be addressed for the case of the IEEE LOM standard and Dublin Core, by producing a mapping from the set of data elements described by the IEEE LOM standard - strictly speaking, from the particular ‘application profile’ described within the standard - to a set of terms that can be used within DC metadata description sets.

Related to the DCMI Libraries Working Group was a special session on RDA (Resource Description and Access) [12] that looked at the new content standard being developed to replace AACR2. Such rules are an important aspect of interoperability, and this session, which included presentations by Mikael Nilsson and Robina Clayphan, British Library, sought comments from the assembled Dublin Core community to contribute to the current comment period for this new standard, being undertaken to align RDA with other metadata standards.

Vocabularies

In the paper session on ‘ontologies and controlled vocabularies’, Alistair Miles, CCLRC, presented a novel perspective on SKOS [13], by taking as his starting point not existing models for thesauri and controlled vocabularies, but rather an examination of the scope and purpose of SKOS. He identified two usage patterns for a SKOS workflow of creation, indexing and retrieval, where, in the first, a single vocabulary is used for both indexing and retrieval, or, in the second, two different vocabularies are used by the indexer and by the retriever, with the additional need to tie the two together. Use cases will help to establish the requirements for SKOS, and Alistair presented a hypothetical case to demonstrate this.

SKOS again featured in a presentation by Corey Harper, University of Oregon, which looked at approaches to encoding Library of Congress Subject Headings (LCSH). Corey explained his efforts to respond to Tim Berners-Lee’s exhortation to make the content of existing databases available on the Semantic Web, in this case by transforming a MARC representation of LCSH to a representation based on SKOS. He concluded with various suggestions for tools and services which might be built on top of such a resource, including facilitating the use of LCSH (or other controlled vocabularies) in the context of ‘social tagging’ applications.
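As a rough illustration of the kind of transformation involved, the sketch below converts simplified authority records into SKOS-style triples. It is not Corey Harper's method. It assumes the usual MARC 21 authority conventions (150 for the topical heading, 450 for a 'see from' tracing, 550 for a 'see also from' tracing, with $w beginning 'g' marking a broader term); the record layout, the identifiers, the base URI and the sample headings are all invented for the example and have not been checked against LCSH.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/subjects/"  # hypothetical base URI for concepts

# Simplified stand-ins for MARC authority records: (tag, subfields) pairs.
records = [
    {"id": "ex0001", "fields": [
        ("150", {"a": "Cats"}),
        ("450", {"a": "Domestic cats"}),
        ("550", {"w": "g", "a": "Felidae"}),
    ]},
    {"id": "ex0002", "fields": [("150", {"a": "Felidae"})]},
]


def build_heading_index(records):
    """Map each established heading (field 150) to its concept URI."""
    index = {}
    for record in records:
        for tag, subfields in record["fields"]:
            if tag == "150":
                index[subfields["a"]] = BASE + record["id"]
    return index


def to_skos(records):
    """Return a list of (subject, predicate, object) triples."""
    index = build_heading_index(records)
    triples = []
    for record in records:
        subject = BASE + record["id"]
        triples.append((subject, RDF_TYPE, SKOS + "Concept"))
        for tag, subfields in record["fields"]:
            label = subfields.get("a")
            if tag == "150":
                triples.append((subject, SKOS + "prefLabel", label))
            elif tag == "450":
                triples.append((subject, SKOS + "altLabel", label))
            elif tag == "550" and label in index:
                # $w starting with 'g' marks a broader term; otherwise related.
                broader = subfields.get("w", "").startswith("g")
                relation = "broader" if broader else "related"
                triples.append((subject, SKOS + relation, index[label]))
    return triples


triples = to_skos(records)
```

Tracings whose target heading is not present in the input are skipped here; a fuller conversion would need to decide how to handle them.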

Other papers focusing on vocabularies included a review of the National Science Digital Library (NSDL) registry project which is supporting the NSDL’s requirement for development and reuse of small vocabularies, particularly in the area of education; and use of OWL to express relationships within the AGROVOC vocabulary to support an Agricultural Ontology Service.

The Cornell NSDL Registry team, Diane Hillmann and Jon Phipps, gave a more detailed update and demonstration of the NSDL Registry in the DCMI Registries Working Group meeting. Other registry activity updates covered The European Library Metadata registry, introduced by Christine Frodl, Die Deutsche Bibliothek, and the JISC Information Environment Metadata Registry, presented by Rachel Heery, UKOLN [14].

The Registry Working Group went on to consider management of change within vocabularies, a topic of interest across metadata registries. Joe Tennis, University of British Columbia, also associated with the NSDL Registry project, led a discussion on modelling concept change within vocabularies, and how such changes might be encoded using SKOS. There was some lively debate on how the notion of ‘concept’ was being used, and the validity of distinguishing ‘snapshots’ of terms and vocabularies from ‘versions’. Once again the Functional Requirements for Bibliographic Records (FRBR) model was invoked to help analysis, for example in distinguishing an ‘abstract concept’ from a ‘concept instance’. Time was too short to come to any conclusions; there will be further discussion on the Registry mailing list of this and other topics that, due to pressure of time, slid off the agenda.

Application Experience

Implementations

A paper session on implementations gave an opportunity to learn how Dublin Core is being used ‘in the wild’. Papers from implementers provide valuable insights into the ways DC can be used to meet requirements.

Chiung-min Tsai, National Taiwan University, related experience of implementing an institutional repository for Digital Archive Communities at the University. In this system the original rich metadata is used for display, and is mapped to simple Dublin Core for search purposes. Leif Andresen, Danish National Library Authority, gave an account of DC providing a common presentation for data from archives, libraries and museums. He outlined mappings from sector-specific formats and the DC extensions required. Michael Toth, Michael Toth Associates, fascinated the audience with a description of the role of DC metadata within the Archimedes Palimpsest Manuscript Imaging project. Multispectral imaging has been used to identify layers of text in this thousand-year-old manuscript, and the content of the resulting images is described using DC, alongside richer metadata. For those who have been associated with the DCMI over the years it is a real thrill to see how DC is proving useful in providing access to such a significant cultural artefact [15].
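The pattern of keeping rich metadata for display while mapping it to simple Dublin Core for search can be sketched as a small crosswalk. The rich field names, the sample record and the mapping choices below are hypothetical and are not taken from any of the systems described; only the target element names (title, creator, date, relation) belong to the Dublin Core element set.

```python
# Hypothetical crosswalk from rich, source-specific fields to simple DC elements.
CROSSWALK = {
    "main_title": "title",
    "subtitle": "title",
    "photographer": "creator",
    "calligrapher": "creator",
    "date_photographed": "date",
    "collection_name": "relation",
}


def to_simple_dc(rich_record):
    """Collapse a rich record into repeatable simple DC elements.

    Fields with no entry in the crosswalk are dropped from the search
    view; the rich record itself is left untouched for display.
    """
    simple = {}
    for field_name, value in rich_record.items():
        element = CROSSWALK.get(field_name)
        if element is None:
            continue
        values = value if isinstance(value, list) else [value]
        simple.setdefault(element, []).extend(values)
    return simple


rich = {
    "main_title": "Temple gate",
    "subtitle": "north side",
    "photographer": "A. Example",
    "negative_number": "N-1",  # no simple DC equivalent in this crosswalk
}
simple = to_simple_dc(rich)
```

The mapping is deliberately lossy: distinctions such as photographer versus calligrapher disappear in the search index, which is why the original record is retained for display.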

Collection Description

Marty Kurth and Jim LeBlanc of Cornell University Library presented a paper on their use of the model developed by Michael Heaney in An Analytical Model of Collections and their Catalogues [16] and of the application profile being developed by the DCMI Collection Description Working Group [17] to support the description of operations on catalogues, i.e. on collections of metadata records. While the profile developed by the DCMI CD WG encompasses the concept of services related to collections, it concerns itself only with services which implement a generic retrieval operation, with ‘services which provide access to the items within collections’. Kurth and LeBlanc, on the other hand, extended their view to services performing a range of ‘maintenance’ functions on the catalogue (including accrual, deletion, modification, migration etc).

ePrints Application Profile

The ePrints Application Profile [18] was a new piece of work presented this year in the form of a special session. The profile was developed in the UK by Andy Powell, Eduserv Foundation, and Julie Allinson, UKOLN, with funding from JISC (Joint Information Systems Committee). This session in Mexico provided an opportunity to seek validation of the profile from the assembled expert community. Julie and Andy presented an introduction to and overview of the work and the different work elements. The requirements-gathering exercise that formed the initial part of this work demonstrated the need for a more complex application model than that provided by ‘Simple DC’. The ensuing model adopted the FRBR entities and was based on the DCMI Abstract Model notion of a ‘description set’ to capture descriptions about each of the main FRBR entities (works, expressions, manifestations and items). The resulting application profile provides a richer metadata set, in terms of both the properties captured about the eprint and the relationships expressed between entities. Reactions to this more complex approach were generally positive. There was some concern expressed about the retrospective editing for repositories and also some questions relating to the implementation by software developers. It was also accepted that there is still work to do, particularly in creating an XML schema. There were some useful suggestions - for example that we talk to citation services about the work, and that we look to extend the profile to model e-theses fully. The final question put to the group was whether they felt a DC taskforce for eprints would be useful, on which there was universal agreement.
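The layered structure that FRBR brings to such a model can be sketched with a few linked record types. This is only an illustration of the four entity levels named above; the class attributes (title, version, media_format, location and so on), the helper locations and the sample data are assumptions made for the example and are not the properties defined by the ePrints Application Profile.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    location: str  # where one copy can be obtained (hypothetical attribute)


@dataclass
class Manifestation:
    media_format: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    version: str
    language: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str
    creators: List[str]
    expressions: List[Expression] = field(default_factory=list)


def locations(work):
    """List every copy location reachable from a work, in traversal order."""
    return [
        item.location
        for expression in work.expressions
        for manifestation in expression.manifestations
        for item in manifestation.items
    ]


# One work, with a preprint held as PDF in two places and a published
# version held as HTML in one place (all values invented).
eprint = Work(
    title="An Example Paper",
    creators=["A. Author"],
    expressions=[
        Expression("preprint", "en", [
            Manifestation("application/pdf", [
                Item("http://repository.example.org/1/paper.pdf"),
                Item("http://mirror.example.org/1/paper.pdf"),
            ]),
        ]),
        Expression("published", "en", [
            Manifestation("text/html", [
                Item("http://journal.example.org/paper.html"),
            ]),
        ]),
    ],
)
```

Separating the levels in this way is what allows versions and formats of the same paper to be described once each and related to one another, something a single flat ‘Simple DC’ record cannot express.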

FRBR also featured in a presentation by another conference newcomer, Ayako Morozumi, University of Tsukuba, who outlined Japanese work to map Dublin Core and other metadata models to FRBR, in order to assess the usefulness of this model for an inclusive information environment.

Communities, and the Rest

Such a varied programme with a wealth of parallel sessions presents a considerable challenge to capture in such a relatively short report. Other highlights of the conference included well-attended early morning tutorials on basic semantics (Marty Kurth, Cornell University Library), basic syntax (Andy Powell, Eduserv Foundation), Vocabularies (Joe Tennis, University of British Columbia) and Application Profiles (Diane Hillmann, Cornell University). In addition to those already mentioned, Working Group meetings were held for the Kernel, Date, Agents, Accessibility, Tools, Government and Localization and Internationalization groups [19], along with the Global Corporate Circle and special sessions for French Language Projects and Portuguese and Spanish Language Projects.

Paper sessions and topics ranged from the interoperability framework and application profile development, through SKOS, OWL and the LCSH, to DC implementations in imaging projects and cultural heritage preservation databases, metadata for French PhD theses, semantic mediacasting and digital signatures. For Education there was a Working Group meeting, reports from the field and a paper session on Metadata for Education featuring a presentation from CETIS colleagues on the JISC scoping study on vocabularies for describing pedagogical approaches in e-learning.

Finally, the special session on metadata and social tagging highlighted the importance of the more informal and community-oriented approaches to metadata creation that have emerged through social bookmarking and other ‘tagging’-based services, and suggested that this is an area that DCMI may wish to explore further in the future.

Conference Closing

On the final day, Mike Crandall continued the theme of DC-Education with his closing keynote presentation. Mike talked about the Product, the Process and the People involved in DCMI. By taking the products (the element set and abstract model), the processes of collaboration and community development and the enthusiasm of the people involved in DCMI, Mike outlined success stories already out there on the Web. For the future, he talked about taking forward the DCMI message to educate the wider metadata community, reiterating opening comments by Thomas Baker on the advantages offered by the metadata element set and the Abstract Model working together to create community-specific application profiles.

Reflections

The overall impression of DC-2006 was that it was characterised not so much by any Big New Ideas which set people talking animatedly in the sessions and in the bars afterwards, but rather by a sense of consolidation. As Andy Powell [20] and Stu Weibel [21] note in their weblog posts, at DC-2006 there was a strong feeling that this was the year in which the DCMI Abstract Model became firmly ‘embedded’ in the activity of DCMI, of working groups and of DC implementers.

Whereas in 2004 and 2005, the feeling may have been one of people asking, ‘What is this thing?’ and ‘Do I really need this? Why should I bother?’, at DC 2006, in both the working group meetings and the research paper sessions, there was a clear sense that people were using the DCAM as their foundation, and indeed in some cases providing critical feedback on the DCAM itself. Of course, there is much work still to be done in this area, not least in the revisions to the model itself that are currently under discussion, and in the finalisation of the ‘encoding guidelines’ specifications which build directly on the DCAM to describe how to represent DC metadata description sets in concrete forms. But it seems the value of an abstract model in capturing a shared conception of ‘what DC metadata is’ has been accepted, and the use of that model is becoming well established amongst implementers.

As before, one of the most rewarding and interesting parts of the conference was to meet new people, particularly from the Latin American community. It is in meeting and conversing with those new to the DC conference that one learns what is of interest from the wider international perspective. Delegates left the Karmina Palace Hotel having enjoyed a stimulating conference, and having made many new friends and professional contacts. We hope the DCMI can build on the outcomes of this successful conference over the next year.

References

  1. DC-2006, International Conference on Dublin Core and Metadata Applications: Metadata for Knowledge and Learning. October 3-6, 2006. Manzanillo, Colima, Mexico http://dc2006.ucol.mx/
  2. University of Colima, Mexico http://www.ucol.mx/
  3. International Conference on Dublin Core and Metadata Applications, Programme http://dc2006.ucol.mx/program.htm
  4. BIREME http://www.virtualhealthlibrary.org/
  5. Powell, Andy, Nilsson, Mikael, Naeve, Ambjorn and Johnston, Pete. DCMI Abstract Model. DCMI Recommendation. March 2005.
    http://dublincore.org/documents/abstract-model/
  6. Johnston, Pete. DC-Text: A Text Syntax for DC Metadata. Working Draft.
  7. YAML http://www.yaml.org/
  8. JavaScript Object Notation http://json.org/
  9. DCMI Architecture WG http://dublincore.org/groups/architecture/
  10. DCMI Libraries WG http://dublincore.org/groups/libraries/
  11. DCMI/IEEE Learning Technology Standards Committee (LTSC) Task Force
    http://dublincore.org/educationwiki/DCMIIEEELTSCTaskforce
  12. RDA http://www.collectionscanada.ca/jsc/rda.html
    Editor’s note: this issue carries an article by Ann Chapman of UKOLN and Chair of CILIP/BL Committee on AACR, entitled RDA: A New International Standard.
  13. Simple Knowledge Organisation System (SKOS)
  14. DCMI Registry WG http://dublincore.org/groups/registry/
  15. Archimedes Palimpsest Program Documentation http://archimedespalimpsest.org/programmanage_documents.html
  16. Heaney, Michael. An Analytical Model of Collections and their Catalogues http://www.ukoln.ac.uk/metadata/rslp/model/
  17. DCMI Collection Description Application Profile http://dublincore.org/groups/collections/collection-application-profile/
  18. ePrints Application Profile http://www.ukoln.ac.uk/repositories/digirep/index/Eprints_Application_Profile
  19. DCMI Working Groups http://dublincore.org/groups/
  20. Powell, Andy. ‘Big Crashing Sounds - I can hear you’, eFoundations, 6 October 2006
    http://efoundations.typepad.com/efoundations/2006/10/big_crashing_so.html
  21. Weibel, Stu. ‘It’s the model, stupid’, Weibel Lines, 4 October 2006
    http://weibel-lines.typepad.com/weibelines/2006/10/its_the_model_s.html

Author Details

Julie Allinson
Repositories Research Officer
UKOLN, University of Bath

Email: j.allinson@ukoln.ac.uk
Web site: http://www.ukoln.ac.uk/

Rachel Heery
Assistant Director, Research and Development
UKOLN
University of Bath

Email: r.heery@ukoln.ac.uk
Web site: http://www.ukoln.ac.uk/

Pete Johnston
Technical Researcher
Eduserv Foundation

Email: pete.johnston@eduserv.org.uk
Web site: http://www.eduserv.org.uk/foundation/people/petejohnston/

Rosemary Russell
Interoperability Focus
UKOLN
University of Bath

Email: r.russell@ukoln.ac.uk
Web site: http://www.ukoln.ac.uk/
