Metadata (1): Encoding OpenURLs in DC Metadata
This article proposes a mechanism, based on the OpenURL [2], for embedding machine-parsable citations into Dublin Core (DC) metadata records [1]. It suggests providing partial OpenURLs using the DC Identifier, Source and Relation elements together with an associated 'OpenURL' encoding scheme. It summarises the relevance of this technique to reference linking and considers mechanisms for providing richer bibliographic citations. A mapping between OpenURL attributes and Dublin Core Metadata Element Set (DCMES) [3] elements is provided.
The OpenURL
The OpenURL provides a mechanism for encoding a citation for an information resource, typically a bibliographic resource, as a URL. The OpenURL is, in effect, an actionable URL that transports metadata or keys to access metadata for the object for which the OpenURL is provided. The target of the OpenURL is an OpenURL resolver that offers localized services in an open linking environment. The OpenURL resolver is typically referred to as the user's Institutional Service Component (ISC). The remainder of the OpenURL transports the citation.
The citation is provided by either using a global identifier for the resource, for example a Digital Object Identifier (DOI) [4], or by encoding metadata about the resource, for example title, author, journal title, etc., or by some combination of both approaches. It is also possible to encode a local identifier for the resource within the OpenURL. In combination with information about where the OpenURL was created, this allows software that receives the OpenURL to request further metadata about the information resource. However, this article focuses on the OpenURL metadata encoding mechanism rather than on the specific details of how OpenURLs are processed and used by resolvers and other software.
Originally known as the SFX-URL, the OpenURL's roots lie in the SFX research on reference linking in hybrid library environments [5]. At the time of writing, the OpenURL is most appropriate for citing bibliographic resources, although this is expected to change as the OpenURL develops and moves through the standardization process. Furthermore, the OpenURL has been developed primarily to support 'reference linking' applications. On its own, it does not provide enough richness to form the basis for detailed, full bibliographic citations; for example, it includes only the first author of the work.
An OpenURL comprises two parts, a BASEURL and a QUERY. The BASEURL identifies the OpenURL resolver that will provide context sensitive services for the OpenURL. The BASEURL is specific to the particular user that is being sent the OpenURL - it typically identifies the ISC offered by the institution to which the user belongs. Services that embed OpenURLs in their Web interfaces, for example in their search results, must develop mechanisms for associating a BASEURL with each end-user. One way of doing this is to store the BASEURL in a cookie in the user's Web browser, another is to store the BASEURL along with other user preferences.
The QUERY part can be made up of one or more DESCRIPTIONs. Each DESCRIPTION comprises the metadata attributes and values that make up the citation for the resource. A full breakdown of the components of the DESCRIPTION is not provided here. See the OpenURL specification for full details [6].
Here is an example OpenURL:
http://resolver.ukoln.ac.uk/openresolver/?sid=ukoln:ariadne&genre=article &atitle=Information%20gateways:%20collaboration%20on%20content &title=Online%20Information%20Review&issn=1468-4527&volume=24 &spage=40&epage=45&artnum=1&aulast=Heery&aufirst=Rachel
In this example the BASEURL is <http://resolver.ukoln.ac.uk/openresolver/>, the URL of the UKOLN OpenResolver demonstrator service. The rest of the OpenURL is the QUERY, which is made up of a single DESCRIPTION of an article entitled 'Information gateways: collaboration on content' by Rachel Heery. The article was published in 'Online Information Review' volume 24.
Notice that, because the OpenURL is a URL, it is encoded in such a way that special characters, for example space characters, are represented by a percentage sign followed by two hex digits. This process is known as mandatory escape encoding.
(Note that all the OpenURL examples in this article have been split across multiple lines for display purposes. Note also that the optional OpenURL 'sid' attribute, set here to 'ukoln:ariadne', indicates the service that generated the OpenURL. For simplicity, other example OpenURLs in this article do not contain a 'sid' attribute.)
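The structure of the example can be sketched in Python. This is only an illustration, not part of the OpenURL specification: the function name `build_openurl` is hypothetical, and leaving ':' unescaped is an assumption made to match the example above.

```python
from urllib.parse import quote, urlencode

# BASEURL of the user's resolver (here, the demonstrator from the example above).
BASEURL = "http://resolver.ukoln.ac.uk/openresolver/"


def build_openurl(baseurl, attributes):
    """Join a BASEURL and an escape-encoded QUERY into a full OpenURL.

    Hypothetical helper for illustration only.
    """
    # quote_via=quote writes spaces as %20 rather than '+'.
    # safe=":" leaves colons unescaped (assumption, to match the example).
    query = urlencode(attributes, quote_via=quote, safe=":")
    return baseurl + "?" + query


# Attribute values copied from the example OpenURL.
citation = {
    "sid": "ukoln:ariadne",
    "genre": "article",
    "atitle": "Information gateways: collaboration on content",
    "title": "Online Information Review",
    "issn": "1468-4527",
    "volume": "24",
    "spage": "40",
    "epage": "45",
    "artnum": "1",
    "aulast": "Heery",
    "aufirst": "Rachel",
}

print(build_openurl(BASEURL, citation))
```

Keeping the BASEURL as a separate argument reflects the point made earlier: the same DESCRIPTION can be delivered to different users with different resolvers.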
Proposals
This article makes two proposals. Firstly, that an OpenURL may be given as the value of a DC Identifier element as a way of providing a citation for the resource being described by the DC record. Secondly, that an OpenURL may also be given as the value of a DC Source or Relation element as a way of providing citations for resources that are related to the resource being described.
The mechanism used in both cases is the same - a partial OpenURL is placed in the element value. A partial OpenURL is an OpenURL without a BASEURL. This is because, at the time at which the OpenURL is placed into the DC element value, there is no knowledge of which end-user(s) will receive the OpenURL. It is therefore not possible or sensible to embed the BASEURL part of the OpenURL in the element value. Only the DESCRIPTION part of the OpenURL should be placed in the element value.
A DC encoding scheme [7] of 'OpenURL' should be used to indicate that the value forms part of an OpenURL. The DESCRIPTION part of the OpenURL should be fully mandatory escape encoded before it is placed in the DC element value. Furthermore, any ampersand ('&') characters that appear in the OpenURL as attribute separators must be encoded as '&amp;'.
Software that processes DC metadata records containing OpenURL DESCRIPTIONs will have to decode any '&amp;' sequences back to '&' characters and add a BASEURL in order to deliver full OpenURLs to the end-user.
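A minimal sketch of that processing step follows, assuming the element value has been read as a raw string. The function name `to_full_openurl` and the resolver address are hypothetical. Removing whitespace is an additional assumption, made because the examples in this article are split across lines for display.

```python
def to_full_openurl(baseurl, element_value):
    """Turn a stored OpenURL DESCRIPTION into a full OpenURL.

    Hypothetical helper: 'baseurl' is whatever resolver address the
    delivering service has associated with the current end-user.
    """
    # Decode the '&amp;' attribute separators back to plain '&'.
    description = element_value.replace("&amp;", "&")
    # Assumption: any whitespace is a display artefact, because a
    # properly escape-encoded DESCRIPTION carries spaces as %20.
    description = "".join(description.split())
    return baseurl + "?" + description


stored = "genre=article &amp;issn=1468-4527&amp;volume=24"
print(to_full_openurl("http://resolver.example.org/", stored))
```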
Proposal 1 - providing a citation for the resource being described
In order to provide a citation for the resource being described by a DC record, place an OpenURL DESCRIPTION for the resource in the value of a DC Identifier element and indicate a scheme of 'OpenURL'.
Here is an example, encoded using the XHTML <meta> tag:
<meta name="DC.Identifier" scheme="OpenURL"
content="genre=article
&amp;atitle=Information%20gateways:%20collaboration%20on%20content
&amp;title=Online%20Information%20Review&amp;issn=1468-4527&amp;volume=24
&amp;spage=40&amp;epage=45&amp;artnum=1&amp;aulast=Heery&amp;aufirst=Rachel" />
Note that the 'OpenURL' scheme is not yet formally recognised by the Dublin Core Metadata Initiative as a recommended Dublin Core qualifier.
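For services that generate such tags from a database, the two encoding steps (escape encoding of values, then encoding of the '&' separators as '&amp;') could be sketched as below. The function name `openurl_meta` is hypothetical and the output layout is only one possibility.

```python
from html import escape
from urllib.parse import quote, urlencode


def openurl_meta(element, attributes):
    """Render a <meta> tag carrying an OpenURL DESCRIPTION.

    Hypothetical helper: 'element' is a DC element name such as
    'Identifier', 'Source' or 'Relation'.
    """
    # Step 1: escape-encode each value (spaces become %20).
    description = urlencode(attributes, quote_via=quote, safe=":")
    # Step 2: encode the '&' separators as '&amp;' for the attribute value.
    content = escape(description, quote=True)
    return f'<meta name="DC.{element}" scheme="OpenURL" content="{content}" />'


print(openurl_meta("Identifier", {"genre": "article", "issn": "1468-4527", "volume": "24"}))
```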
A fuller set of XHTML <meta> tags for this resource might be:
<meta name="DC.Title" content="Information gateways: collaboration on content" />
<meta name="DC.Creator" content="Heery, Rachel" />
<meta name="DC.Identifier" scheme="OpenURL"
content="genre=article
&amp;atitle=Information%20gateways:%20collaboration%20on%20content
&amp;title=Online%20Information%20Review&amp;issn=1468-4527&amp;volume=24
&amp;spage=40&amp;epage=45&amp;artnum=1&amp;aulast=Heery&amp;aufirst=Rachel" />
In this case some information is duplicated in both the OpenURL DESCRIPTION and DC elements. This article makes no recommendations about whether it is sensible to duplicate the metadata in this way.
Note that for some applications, the citation provided by the OpenURL DESCRIPTION will not be sufficiently detailed. In such cases, a rich citation for the resource being described by the metadata record may only be achieved by combining the OpenURL DESCRIPTION with DCMES elements and possibly elements from other namespaces.
Proposal 2 - providing a citation for a related resource
In order to provide a citation for a resource that is related to the resource being described, place an OpenURL DESCRIPTION for the related resource in the value of a DC Source or Relation element and indicate a scheme of 'OpenURL'.
For example, imagine that an HTML version of the journal article mentioned above is made available on the Web. Its embedded metadata might be:
<meta name="DC.Title" content="Information gateways: collaboration on content">
<meta name="DC.Creator" content="Heery, Rachel">
<meta name="DC.Format" content="text/html">
<meta name="DC.Identifier" content="http://www.ukoln.ac.uk/~lisrmh/infogate.html">
<meta name="DC.Source" scheme="OpenURL"
content="genre=article
&amp;atitle=Information%20gateways:%20collaboration%20on%20content
&amp;title=Online%20Information%20Review&amp;issn=1468-4527&amp;volume=24
&amp;spage=40&amp;epage=45&amp;artnum=1&amp;aulast=Heery&amp;aufirst=Rachel">
<meta name="DC.Relation.references" scheme="OpenURL"
content="id=doi:10.1045/december99-dempsey&amp;genre=article
&amp;atitle=International%20Information%20Gateway%20Collaboration:%20report%20of%20the%20first%20IMesh%20Framework%20Workshop
&amp;title=D-Lib%20Magazine&amp;issn=1082-9873&amp;date=1999-12&amp;volume=5
&amp;artnum=12&amp;aulast=Dempsey&amp;aufirst=Lorcan">
This DC record refers to two related resources - the original journal article from which the Web version is derived (using DC Source) and an article published in D-Lib Magazine that is cited in the article (using DC Relation).
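Software harvesting such a page needs to pick out the elements that carry the 'OpenURL' scheme. The sketch below uses Python's standard `html.parser` module; the class name `OpenURLMetaParser` is hypothetical, and stripping whitespace from the value is an assumption based on the display splitting used in this article.

```python
from html.parser import HTMLParser


class OpenURLMetaParser(HTMLParser):
    """Collect (element name, DESCRIPTION) pairs from DC <meta> tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.citations = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if attributes.get("scheme") == "OpenURL" and name.startswith("DC."):
            # HTMLParser has already decoded '&amp;' to '&' in the value.
            content = attributes.get("content") or ""
            # Assumption: whitespace is a display artefact, so drop it.
            self.citations.append((name, "".join(content.split())))


page = (
    '<meta name="DC.Format" content="text/html">'
    '<meta name="DC.Source" scheme="OpenURL" '
    'content="genre=article &amp;issn=1468-4527&amp;volume=24">'
)
parser = OpenURLMetaParser()
parser.feed(page)
print(parser.citations)
```

Each collected DESCRIPTION still needs a BASEURL before it can be delivered to an end-user as a full OpenURL.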
Rich citations and strategies for handling duplicate information
The example OpenURLs shown above are ideal for supporting 'reference linking' applications. However, in some cases more detailed citation information may be required.
Consider this example DC record for a journal article:
<meta name="DC.Title" content="International Information Gateway Collaboration: report of the first IMesh Framework Workshop">
<meta name="DC.Creator" content="Lorcan Dempsey">
<meta name="DC.Creator" content="Tracy Gardner">
<meta name="DC.Creator" content="Michael Day">
<meta name="DC.Creator" content="Titia van der Werf">
<meta name="DC.Publisher" content="Corporation for National Research Initiatives">
<meta name="DC.Date" content="1999-12">
<meta name="DC.Type" content="article">
<meta name="DC.Language" content="en-us">
<meta name="DC.Rights" content="Copyright (c) 1999 Lorcan Dempsey, Tracy Gardner, Michael Day, and Titia van der Werf">
<meta name="DC.Identifier" scheme="DOI" content="10.1045/december99-dempsey">
<meta name="DC.Identifier" content="http://www.dlib.org/dlib/december99/12dempsey.html">
<meta name="DC.Identifier" scheme="OpenURL"
content="id=doi:10.1045/december99-dempsey&amp;genre=article
&amp;atitle=International%20Information%20Gateway%20Collaboration:%20report%20of%20the%20first%20IMesh%20Framework%20Workshop
&amp;title=D-Lib%20Magazine&amp;issn=1082-9873&amp;date=1999-12&amp;volume=5
&amp;artnum=12&amp;aulast=Dempsey&amp;aufirst=Lorcan">
Notice that there is information contained in the DC elements that is not available in the OpenURL - for example the names of multiple authors. There is also information in the OpenURL that is not available in the DC elements, and that could not be embedded into DC elements - for example the volume and article numbers. There is information that is more accessible for machine parsing in the OpenURL such as the author's family and given names. Finally, there is some information that is duplicated in both the DC elements and in the OpenURL.
(Note: in the general case, one can imagine information about the affiliations of the authors also being embedded into the DC metadata, though details of the mechanism to do this have not yet been agreed by the DCMI.)
In some cases it might be useful to remove the duplicated information from the DC record. One approach would be to remove attributes from the OpenURL DESCRIPTION where the same information is available in other DC elements. So, in the DC record above, the 'atitle' and 'id' attributes might be removed. In other cases it might also be possible to remove the 'date', 'aufirst' and 'aulast' attributes. Software that processes the DC record could attempt to reconstruct a full OpenURL by adding information to the partial DESCRIPTION based on the DC element values.
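The reconstruction step might look something like the following sketch. It assumes the DC record has already been parsed into a simple dictionary; the keys 'DOI', 'Title' and 'Date' and the function name `complete_description` are hypothetical. Only three attributes are restored here, since mapping author names back to 'aulast' and 'aufirst' is harder.

```python
from urllib.parse import parse_qsl, quote, urlencode


def complete_description(partial, dc):
    """Fill gaps in a reduced DESCRIPTION from DC element values.

    Hypothetical helper: 'partial' is a DESCRIPTION with plain '&'
    separators, 'dc' maps illustrative keys to DC element values.
    """
    attributes = dict(parse_qsl(partial))  # also undoes the %XX escapes
    candidates = {
        "id": "doi:" + dc["DOI"] if "DOI" in dc else None,
        "atitle": dc.get("Title"),
        "date": dc.get("Date"),
    }
    for name, value in candidates.items():
        # Never overwrite an attribute that the DESCRIPTION already has.
        if value and name not in attributes:
            attributes[name] = value
    # ':' and '/' are left unescaped to match the article's DOI example.
    return urlencode(attributes, quote_via=quote, safe=":/")


record = {
    "DOI": "10.1045/december99-dempsey",
    "Title": "International Information Gateway Collaboration: "
             "report of the first IMesh Framework Workshop",
    "Date": "1999-12",
}
print(complete_description("genre=article&title=D-Lib%20Magazine&volume=5", record))
```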
However, in many cases, particularly where metadata is embedded into a resource dynamically based on a back-end database, the cost of duplicating information in both DC elements and the OpenURL is probably not very high. Clearly, where metadata and OpenURLs are created and maintained manually, there will be consistency implications for any duplicated information.
A DC/OpenURL crosswalk
The table below gives the definitions of the current OpenURL attributes:
The table below provides a mapping from OpenURL attributes to unqualified DC elements.
The table shows OpenURL attributes against the genres for which they are allowed to be used. Mappings to DC elements are shown at appropriate points. An X in the table indicates that the OpenURL attribute may be used with the particular genre, but that there is no sensible DC mapping at that point.
The OpenURL 'genre' can be mapped to the DC Type element, although the list of OpenURL genres does not correspond with the list of types in the recommended DCMIType encoding scheme qualifier [8].
Note that five (author-related) OpenURL attributes are shown mapping to the DC Creator and Contributor elements. In general, several of these OpenURL attributes must be combined to form a complete DC Creator or Contributor value (for example aufirst and aulast). Depending on the formatting of a DC Creator or Contributor element value, mapping back from DC to these OpenURL attributes may be difficult because of the problems of splitting a single name into multiple components.
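The difficulty of mapping back from DC to the OpenURL name attributes can be seen in a naive splitter such as the one below. The function `split_creator` and its heuristics are hypothetical, not a recommended algorithm.

```python
def split_creator(value):
    """Guess 'aulast' and 'aufirst' from a DC Creator value.

    Hypothetical heuristic for illustration only.
    """
    if "," in value:
        # 'Family, Given' form: the comma marks the boundary.
        last, first = value.split(",", 1)
        return {"aulast": last.strip(), "aufirst": first.strip()}
    parts = value.split()
    if len(parts) < 2:
        return {"aulast": value.strip()}
    # 'Given Family' form: assume the final word is the family name.
    return {"aulast": parts[-1], "aufirst": " ".join(parts[:-1])}


print(split_creator("Heery, Rachel"))
print(split_creator("Titia van der Werf"))
```

The first value splits cleanly because the comma marks the boundary. The second is split after 'der', so the family name 'van der Werf' is truncated to 'Werf', which is exactly the kind of error the paragraph above warns about.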
A richer crosswalk would be possible using qualified Dublin Core elements but this has not been presented here.
OpenURL standardization and future work
A request for fast-track standardization of the OpenURL was approved by NISO during its December 2000 SCD meeting. The expectation is that "NISO's aim will be to move rapidly towards a Draft Standard for Trial Use". Work is currently underway with NISO to establish a Steering Committee to work on the standardization. However, at the time of writing no firm timescales had been established.
It is anticipated that there will be some changes to the OpenURL specification during the standardization process. The nature of the changes will be:
- disentangling the syntax from protocol issues (basically describing the format in an HTTP-independent manner, with HTTP encodings as "examples")
- the introduction of a pointer to metadata on the OpenURL (which is available in an implicit way at this point). The result will be that OpenURL will make metadata available either by value (on the OpenURL) or by reference (pointer on the OpenURL).
- generalizations: the OpenURL must become applicable in a broader context than only scholarly bibliographic information. Generally speaking this means that there is the need to be able to describe objects via an OpenURL by means of a choice of metadata schema (the current bibliographic schema being a special case). Therefore, the notion of a metadata schema identifier will be introduced as a new parameter on the OpenURL. Metadata elements used to describe an object via an OpenURL must be defined within the schema represented by the metadata schema identifier on the OpenURL. The current bibliographic metadata schema will receive an identifier and current tags such as 'aulast', 'aufirst', 'issn', etc. will be defined within that schema.
(The authors would like to thank Herbert Van de Sompel, Cornell University, for providing background information for this section.)
Relation to DC Citation Working Group recommendations
The DC Citation Working Group was set up in November 1998 and was responsible for identifying standard methods for including bibliographic citation information about resources in their own metadata, and related problems of identifying resource version information. The group concentrated specifically on an article's placement within a journal, volume, and issue. The group has made several proposals for qualifiers to the Dublin Core Metadata Element Set (DCMES) to achieve this aim. Specifically:
- In the metadata for an article, DC.Relation.isPartOf could indicate the SICI, DOI and/or URL for the issue. Then, the metadata for the issue could indicate "isPartOf" the DOI and/or URL for the volume. Similarly, the metadata for the volume could indicate the ISSN, DOI and/or URL for the journal.
- Full citation information, including the page range (or other equivalent locator information for non-page-based articles) should go into DC.Identifier. Encoding this information in DC.Identifier recognises the fact that the citation information of journal title, volume number and start page effectively identifies a journal article.
- Furthermore, the working group recommended that DC.Identifier have an Element Qualifier of DC.Identifier.citation for the citation string.
- The text string that follows could also comply with a DC Citation Scheme (or Value Qualifier set) to specifically indicate the structural components of the citation, such as Journal Title (full and abbreviated), Journal Volume, etc. Specific title abbreviations can themselves be referred to external schemes, such as ISO 4, Index Medicus, Chemical Abstracts, Vancouver, World List, and so on. (Not all these schemes are well documented on the Web; they are mentioned here solely to indicate that there are a number of possible "standard" ways of abbreviating journal titles.)
- Other Identifiers could also of course go into DC.Identifier since all DC tags are repeatable, so the SICI, PII, DOI and/or URL for the article could also go here. (Note: in the metadata for an article, the SICI, for example, that is entered into DC.Identifier is the SICI for the article, but the SICI that goes into DC.Relation "isPartOf" is the SICI for the issue.)
- Chronology should be indicated in DC.Date.
- The working group agreed a possible structured-value set:
JournalTitleFull
JournalTitleAbbreviated
JournalVolume
JournalIssueNumber
JournalPages
with the associated semantic definitions of these terms. While this set does not cover every eventuality it deals with the vast majority of cases and will give (together with the article metadata in DC.Title, DC.Creator and DC.Date) complete information for any reference-citation record that anyone might want to extract.
It is worth noting that the working group's proposed structured-value set can be mapped directly to available OpenURL attributes as follows:
Proposed structured value | OpenURL attribute |
---|---|
JournalTitleFull | title |
JournalTitleAbbreviated | stitle |
JournalVolume | volume |
JournalIssueNumber | issue |
JournalPages | spage, epage, pages |
More recently the working group began discussing a related problem of how to capture bibliographic citation information about conference papers, with a view to including other bibliographic genres in the future. OpenURLs provide a way to encode citation information for books, book parts, conference proceedings and papers. However, some conference proceedings are also journal issues. In this case, to capture citation information for an article as both a conference item and a journal item, it would be necessary to include two OpenURLs within repeated DC Identifier elements.
Therefore, the OpenURL DESCRIPTION appears to offer all the functionality identified by the working group for encoding bibliographic citations for simple resource discovery, albeit using a less human-readable syntax than that proposed by the working group. However, it may not offer the required functionality for individual Dublin Core based applications.
(The authors would like to thank Cliff Morgan, John Wiley & Sons, Ltd. (previous chair of the DC-Citation Working Group) for supplying background information for this section.)
Conclusion
The main purpose of this article has been to propose the adoption of an 'OpenURL' encoding scheme for the DC Identifier, Source and Relation elements. By doing this, the DCMI will provide users of DC metadata with a simple method of encoding machine-readable citations for bibliographic resources within their metadata, in particular supporting a mechanism for linking between digital resources and non-digital resources. We have also provided a crosswalk between unqualified DC and the OpenURL attributes and shown how a combination of both OpenURLs and DC metadata can be used to provide richer citations than those provided by either technology on its own.
References
1. Dublin Core Metadata Initiative <http://dublincore.org/>
2. OpenURL <http://www.sfxit.com/openurl/>
3. Dublin Core Metadata Element Set (DCMES) <http://dublincore.org/documents/dces/>
4. Digital Object Identifier (DOI) <http://www.doi.org/>
5. Reference linking in a hybrid library environment. Part 3: Generalizing the SFX solution in the "SFX@Ghent & SFX@LANL" experiment. Van de Sompel, Herbert and Hochstenbach, Patrick. D-Lib Magazine, October 1999. <http://www.dlib.org/dlib/october99/van_de_sompel/10van_de_sompel.html>
6. OpenURL Syntax Description <http://sfx1.exlibris-usa.com/openurl/openurl.html>
7. Dublin Core Qualifiers <http://dublincore.org/documents/dcmes-qualifiers/>
8. DCMI Type Vocabulary <http://dublincore.org/documents/dcmi-type-vocabulary/>
Author Details
Andy Powell, Ann Apps
[Andy is a member of the Dublin Core Advisory Committee. Ann is chair of the Dublin Core DC-Type Working Group and a member of the Dublin Core Advisory Committee. Ann is also a member of the OpenURL NISO Standards Committee.]