5 Step Guide to Becoming a Content Provider in the JISC Information Environment
This document provides a brief introduction to the JISC Information Environment (JISC-IE) [1], with a particular focus on the technical steps that content providers need to take in order to make their systems interoperable within the JISC-IE technical architecture. The architecture specifies a set of standards and protocols that support the development and delivery of an integrated set of networked services that allow the end-user to discover, access, use and publish digital and physical resources as part of their learning and research activities. Examples of the kind of activities supported by the architecture include:
- Integration of local and remote information resources with a variety of 'discovery' services (for example the RDN subject portals [2], institutional and commercial portals and personal reference managers) allowing students, lecturers and researchers to find quality assured resources from a wide range of content providers including commercial content providers and those within the higher and further education community and elsewhere. (Examples of the kinds of content that are available through the JISC Information Environment include scholarly journals, monographs, textbooks, learning objects, abstracts, manuscripts, maps, music scores, Internet resource descriptions, still images, geospatial images and other kinds of vector and numeric data, as well as moving picture and sound collections).
- Seamless linking from 'discovery' services to appropriate 'delivery' services.
- Integration of information resources and learning object repositories with Virtual Learning Environments (for example, allowing seamless, persistent links from a course reading list or other learning objects to the most appropriate copy of an information resource).
- Open access to e-print archives and other systems for managing the intellectual output of institutions.
A general introduction to the technical architecture of the JISC-IE is available in The JISC Information Environment and Web services, Ariadne issue 31 [3]. The architecture itself, as well as other supporting material, is available through the JISC IE technical architecture Web site [4]. This document is based on a short presentation given to the PALS conference: Delivering Content to Universities and Colleges in June 2002 [5] (on a day that was somewhat sad for English football [6]).
The technical architecture specifies four broad classes of service components within the JISC-IE.
- In the provision layer, content providers make their content (bibliographic resources, full-text, data-sets, images, videos, learning objects, etc.) available to other components.
- In the fusion layer, brokers and aggregators take metadata from one or more content providers and combine it in various ways, making the resulting metadata records available to other components.
- In the presentation layer, service components interact with content providers, brokers and aggregators to provide services targeted at real end-users. It is probably worth noting that the technical architecture tends to use the word 'portal' when referring to components in the presentation layer. This is possibly somewhat misleading. There will be a large number of different kinds of services in the presentation layer (some of which we haven't thought of yet!), including subject portals, portals offered by publishers and commercial intermediaries, reading list and other tools in VLEs, library portals (e.g. Zportal or MetaLib) [7], SFX service components, personal desktop reference managers, etc.
- Finally, a set of shared services support the activity of all the other service components, for example, by providing shared authentication, authorisation and service registries.
However, there are no firm boundaries between these classes and it is often the case that service providers will offer a mix of the components outlined above. In addition, it must be stressed that content providers will usually want to support both human-oriented Web site access and machine-oriented access to their content.
The remainder of this document is targeted at technical staff and service managers in institutions and organisations who have content to make available to other service components. So if you are a book or journal publisher, or the administrator of an institutional eprints archive, or if you look after a learning object repository, or digital image archive, or you have some other kind of material of value to the UK HE/FE community to make available, please read on...
What do you need to do?
The main focus of the technical architecture is on the standards and protocols needed to support machine-to-machine (m2m) interaction in order to deliver resource discovery services. Most content providers will already offer a Web site through which end-users can access their content. To be a part of the JISC-IE, content providers also need to support machine-oriented interfaces to their resources. This is very much in line with the general trend towards supporting 'Web services' [8] (although some of the protocols and standards in the JISC-IE technical architecture may not currently be considered true 'Web service' standards) and is compatible with the specifications being developed by the IMS Digital Repositories working group [9].
Note that you do not need to follow all five steps (though it would be nice!). Following just one of the steps below will significantly increase the integration of your content with other services in the JISC-IE and with other developments taking place outside the UK HE/FE community.
Step 1: Expose metadata about your content
Make the metadata about your content available to other service components, either for distributed searching or for harvesting (you can do both if you like!).
Support searching using Z39.50
Support distributed searching of your content by remote services by offering a Z39.50 target compliant with functional area C of the Bath Profile [10]. In other words, use Z39.50 to expose simple Dublin Core metadata [11] about your content.
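As a rough illustration of what 'simple Dublin Core' means in practice, the Python sketch below builds a flat record using only the fifteen elements of DCMES 1.1, each of them optional and repeatable. The `record` wrapper element and the `simple_dc` helper are hypothetical. The actual record syntax returned by a Bath Profile target depends on the profile and on your Z39.50 toolkit.

```python
from xml.etree import ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

# The fifteen elements of the Dublin Core Metadata Element Set, Version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def simple_dc(fields: dict) -> ET.Element:
    """Build a flat simple-DC record from {element name: [values]}.

    Every element is optional and repeatable; anything outside the
    fifteen DCMES elements is rejected rather than silently dropped.
    """
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not simple DC elements: {sorted(unknown)}")
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name in DC_ELEMENTS:  # emit elements in DCMES order
        for value in fields.get(name, []):
            ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    return record


rec = simple_dc({"title": ["Maps of Bath"], "type": ["Image"]})
print(ET.tostring(rec, encoding="unicode"))
```

Mapping your own richer fields onto these fifteen names is the main design decision. Richer structure is lost by design, which is what makes the records easy for any remote service to consume.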
Note that Z39.50 is sometimes not seen as being a very Web-friendly protocol! In the future it is anticipated that SOAP (Simple Object Access Protocol) [12] will be used to support distributed searching, perhaps based on SRW [13].
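SRW also has a URL-based companion, SRU, which carries the same searchRetrieve request as HTTP GET parameters. The following is a minimal sketch of building such a request. It assumes SRU version 1.1 parameter names, a CQL query, and a server that recognises the short schema name `dc` for simple Dublin Core. The base URL is hypothetical.

```python
from urllib.parse import urlencode


def sru_search_url(base_url: str, cql_query: str,
                   start: int = 1, maximum: int = 10) -> str:
    """Build an SRU 1.1 searchRetrieve URL asking for simple DC records."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,              # CQL, e.g. dc.title = "maps"
        "startRecord": str(start),       # 1-based position of first record
        "maximumRecords": str(maximum),
        "recordSchema": "dc",            # assumption: server's name for simple DC
    }
    return f"{base_url}?{urlencode(params)}"


print(sru_search_url("http://search.example.ac.uk/sru", 'dc.title = "maps"'))
```

The same query could equally be wrapped in a SOAP envelope for SRW. The URL form is simply the easiest to try from a browser.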
Support harvesting using the OAI Protocol for Metadata Harvesting
Enable remote services to gather your metadata records by offering an Open Archives Initiative repository using the OAI Protocol for Metadata Harvesting (OAI-PMH) [14]. In other words, use the OAI-PMH to expose simple DC metadata about your content.
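OAI-PMH requests are plain HTTP GETs against a base URL, carrying a `verb` and a handful of arguments. The sketch below builds such URLs and checks two rules from the OAI-PMH 2.0 specification: each verb has required arguments, and `resumptionToken` is an exclusive argument. The base URL is hypothetical, and a real repository must of course also generate the XML responses.

```python
from typing import Optional
from urllib.parse import urlencode

# Arguments each OAI-PMH 2.0 verb requires (when no resumptionToken is given).
REQUIRED_ARGS = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def oai_request(base_url: str, verb: str,
                args: Optional[dict] = None) -> str:
    """Build an OAI-PMH request URL; args uses OAI names such as 'from'."""
    args = dict(args or {})
    if verb not in REQUIRED_ARGS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    if "resumptionToken" in args:
        # resumptionToken is exclusive: no other arguments may accompany it.
        if len(args) > 1:
            raise ValueError("resumptionToken must be the only argument")
    else:
        missing = REQUIRED_ARGS[verb] - set(args)
        if missing:
            raise ValueError(f"{verb} requires {sorted(missing)}")
    return f"{base_url}?{urlencode({'verb': verb, **args})}"


# Incremental harvest of simple DC records added since 1 June 2002.
print(oai_request("http://eprints.example.ac.uk/oai", "ListRecords",
                  {"metadataPrefix": "oai_dc", "from": "2002-06-01"}))
```

The `oai_dc` metadata format is the simple Dublin Core format that every OAI-PMH repository is required to support.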
Step 2: Share news/alerts using RSS
If you support news and/or alerting services, consider offering them in a machine-readable format. News and alerts might typically include:
- service announcements,
- list(s) of new resources.
Use RDF Site Summary version 1.0 (RSS) [15], a simple XML application, to share your news feeds. Use RSS in addition to existing email alerting if appropriate.
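The following is a minimal sketch of an RSS 1.0 channel listing new resources, built with only Python's standard library. It follows the RSS 1.0 structure: an `rdf:RDF` root, a `channel` whose `items` sequence lists each item's URI, and one `item` element per entry. The channel URI, titles and links in the usage line are made up.

```python
from xml.etree import ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS = "http://purl.org/rss/1.0/"


def rss10(channel_uri: str, title: str, link: str, description: str,
          items: list) -> str:
    """Serialise (title, link) pairs as an RSS 1.0 channel."""
    ET.register_namespace("rdf", RDF)
    root = ET.Element(f"{{{RDF}}}RDF")
    channel = ET.SubElement(root, f"{{{RSS}}}channel",
                            {f"{{{RDF}}}about": channel_uri})
    for tag, text in (("title", title), ("link", link),
                      ("description", description)):
        ET.SubElement(channel, f"{{{RSS}}}{tag}").text = text
    # The channel's table of contents: an ordered rdf:Seq of item URIs.
    seq = ET.SubElement(ET.SubElement(channel, f"{{{RSS}}}items"),
                        f"{{{RDF}}}Seq")
    for item_title, item_link in items:
        ET.SubElement(seq, f"{{{RDF}}}li", {f"{{{RDF}}}resource": item_link})
        # Each item is a sibling of the channel, not nested inside it.
        item = ET.SubElement(root, f"{{{RSS}}}item",
                             {f"{{{RDF}}}about": item_link})
        ET.SubElement(item, f"{{{RSS}}}title").text = item_title
        ET.SubElement(item, f"{{{RSS}}}link").text = item_link
    # RSS 1.0 elements go in the default namespace; RDF keeps its prefix.
    return ET.tostring(root, encoding="unicode", default_namespace=RSS)


print(rss10("http://www.example.ac.uk/news.rss", "Example news",
            "http://www.example.ac.uk/", "New resources",
            [("New maps", "http://www.example.ac.uk/maps/1")]))
```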
Step 3: Become an OpenURL source
If your Web site supports the discovery of bibliographic resources, e.g. books, journals and journal articles, adopt open, context-sensitive linking by adding OpenURLs [16] into search results. This is often, somewhat misleadingly, referred to as 'adding SFX buttons' next to each result (because the original OpenURL resolver was SFX from Ex Libris). In order to embed OpenURLs in search results you will have to support some mechanism for associating a preferred OpenURL resolver with each user - e.g. by using cookies or a user-preferences database.
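The sketch below shows one way to attach an OpenURL to a search result. It reads the user's preferred resolver from a cookie and encodes the citation with OpenURL 0.1-style keys (`genre`, `issn`, `volume`, `spage`, `id=doi:...` and so on). The cookie name, the `sid` value and the resolver address are all hypothetical. If no resolver is known for the user, the link is simply omitted.

```python
from http.cookies import SimpleCookie
from typing import Optional
from urllib.parse import urlencode

SID = "ExamplePub:ExampleDB"          # hypothetical OpenURL source identifier
RESOLVER_COOKIE = "openurl_resolver"  # hypothetical cookie name
OPENURL_KEYS = ("genre", "issn", "isbn", "volume", "issue", "spage",
                "date", "atitle", "title", "aulast")


def preferred_resolver(cookie_header: str) -> Optional[str]:
    """Return the user's chosen resolver base URL, if the cookie is set."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(RESOLVER_COOKIE)
    # A Morsel with no attributes is falsy, so compare with None explicitly.
    return morsel.value if morsel is not None else None


def openurl_for(citation: dict, cookie_header: str) -> Optional[str]:
    """Build an OpenURL (0.1-style key/value syntax) for one search result."""
    base = preferred_resolver(cookie_header)
    if base is None:
        return None  # no known resolver: show no OpenURL link for this user
    params = {"sid": SID}
    params.update((k, citation[k]) for k in OPENURL_KEYS if citation.get(k))
    if citation.get("doi"):
        params["id"] = "doi:" + citation["doi"]
    return f"{base}?{urlencode(params)}"


print(openurl_for({"genre": "article", "issn": "1234-5678", "volume": "31"},
                  "openurl_resolver=http://resolver.example.ac.uk/sfx"))
```

A user-preferences database would replace `preferred_resolver` without changing the rest of the function.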
Step 4: Become an OpenURL target
If your Web site provides access to bibliographic resources, allow links back into your services from OpenURL resolvers. To do this you need to publicise your identifier-based 'link-to' syntax, e.g. the fact that you support ISBN-based, ISSN-based or DOI-based URLs. This implies that you must support deep-linking to resources, either:
- direct to each resource, or
- indirect via an abstract page for each resource.
Publicising your 'link-to' syntax allows OpenURL resolvers to generate URLs for your resources based on metadata (typically identifiers) extracted from the OpenURL.
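To see why a published link-to syntax matters, consider what a resolver might do with it: pull identifiers out of an incoming OpenURL and fill in the target's URL templates. The `LINK_TO` templates and host name below are hypothetical. What matters is that the templates are simple, documented and stable, so a resolver can build deep links from metadata alone.

```python
from typing import Optional
from urllib.parse import parse_qs, quote

# Hypothetical 'link-to' syntax that a publisher might publicise.
LINK_TO = {
    "doi": "http://www.publisher.example.com/doi/{}",
    "isbn": "http://www.publisher.example.com/isbn/{}",
    "issn": "http://www.publisher.example.com/issn/{}",
}


def link_to(openurl_query: str) -> Optional[str]:
    """Turn identifiers in an OpenURL query string into a deep link."""
    q = parse_qs(openurl_query)
    # OpenURL 0.1 carries a DOI in the 'id' key as 'doi:<DOI>'.
    for value in q.get("id", []):
        if value.startswith("doi:"):
            return LINK_TO["doi"].format(quote(value[4:], safe="/"))
    for key in ("isbn", "issn"):  # fall back to less specific identifiers
        if key in q:
            return LINK_TO[key].format(quote(q[key][0]))
    return None


print(link_to("genre=book&isbn=0-000-00000-0"))
```

An ISSN alone only reaches journal level. Linking to a specific article needs either a DOI or a template that also takes volume, issue and start page.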
Step 5: Use persistent URIs
Z39.50, OAI-PMH and RSS expose your metadata to other services. As noted above, you should allow deep-linking from your metadata (e.g. from your search results) to your resources. Your deep-linking URLs should be unique and persistent. For example, they may be based on DOI [17] or PURL [18] technologies. This will ensure long-term use of your URLs, in course reading lists or embedded into other learning resources for example.
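PURL-style persistence works by indirection: the URL you hand out never changes, and a resolver redirects it to wherever the resource currently lives. The following is a minimal sketch of that idea as a WSGI application issuing HTTP 302 redirects. The path, the target location and the in-memory table are hypothetical; a real service would more likely keep the table in a database.

```python
# Hypothetical table: persistent path -> current location. When a resource
# moves, only this table changes; persistent URLs already handed out
# (in reading lists, learning objects, etc.) stay the same.
LOCATIONS = {
    "/res/maps-0001": "http://www.example.ac.uk/new-server/maps/0001.html",
}


def resolver_app(environ, start_response):
    """Minimal WSGI app redirecting persistent paths to current locations."""
    target = LOCATIONS.get(environ.get("PATH_INFO", ""))
    if target is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown identifier\n"]
    start_response("302 Found", [("Location", target)])
    return [b""]
```

It can be tried locally with `wsgiref.simple_server.make_server("", 8000, resolver_app).serve_forever()`. A DOI-based URL follows the same pattern, with the DOI system's resolver performing the redirect.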
Some issues
Various issues need to be considered when adding machine-oriented interfaces to your content and metadata.
Authentication and access control
Many content providers maintain some level of control over who can access their content. Typically this is done using Athens [19], local usernames and passwords, IP address checking or some combination of these. Similar controls can be placed in front of your machine-oriented interfaces. However, it is important to distinguish between controlling access to your content and controlling access to the metadata that describes your content. Furthermore, it is important to note that where the user is interacting with a remote portal that is then performing a cross-search of your metadata, the portal is likely to have challenged the user for authentication information. There will probably have to be some level of trust between you and the portal provider in determining who has rights to search what. The ZBLSA project [20] has specified a lightweight mechanism by which portals can ask content providers if a particular end-user (or class of end-user) is likely to be able to gain access to a particular resource.
In all cases, you retain ultimate control over access to your content at the point that the end-user selects a link in search results or OpenURL resolver, irrespective of how and where the user has obtained the link. Furthermore, you retain control over the richness of the metadata records that you expose to other services, either for searching or for harvesting.
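As an illustration of IP address checking at the point the end-user follows a link, the sketch below matches the client address against subscribing institutions' network ranges using Python's `ipaddress` module. The institution names and ranges are hypothetical (they are reserved documentation prefixes). As noted above, such checks are often combined with Athens or username/password controls.

```python
import ipaddress
from typing import Optional

# Hypothetical subscriber ranges, using reserved documentation prefixes.
SUBSCRIBER_NETWORKS = {
    "example-university": [ipaddress.ip_network("192.0.2.0/24")],
    "example-college": [ipaddress.ip_network("198.51.100.0/25")],
}


def subscriber_for(client_ip: str) -> Optional[str]:
    """Return the subscribing institution whose range contains client_ip."""
    addr = ipaddress.ip_address(client_ip)
    for institution, networks in SUBSCRIBER_NETWORKS.items():
        # Only compare addresses and networks of the same IP version.
        if any(addr.version == net.version and addr in net for net in networks):
            return institution
    return None


print(subscriber_for("192.0.2.17"))
```

Because the check runs when the link is followed, it applies equally to links taken from your own site, a remote portal or an OpenURL resolver.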
Branding vs. visibility
There is some concern that exposing metadata to external services may lead to a loss of branding. However, this may not be a significant problem. External services can be expected to carry the original content provider's branding as a 'quality stamp' on the content. For example, each RSS channel carries the content provider's name, URL and logo. Furthermore, following URLs in search results leads the end-user direct to the content provider's Web site - and so should ultimately result in more visibility rather than less.
Information flow
The JISC-IE is not just about a one way flow of information from 'publishers' to 'consumers'. As indicated above, most service providers will offer a number of service components. Many publishers are developing (or have developed) 'portal' offerings. Such portals will be able to interact with content made available within the UK HE/FE community (and elsewhere) using the standards described above. For example, the RDN offers Z39.50 access to almost 60,000 Internet resource descriptions (and hopes to offer a SOAP interface fairly soon) [21]. Similarly, metadata about the publications being deposited in institutional eprint archives can typically be freely harvested by anyone using the OAI-PMH.
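Because eprint metadata can be harvested by anyone, a minimal harvester is short. The sketch below issues ListRecords requests for `oai_dc`, collects `dc:title` values, and follows each `resumptionToken` until the repository returns an empty one, which OAI-PMH 2.0 uses to mark the end of a list. The `fetch` function is supplied by the caller (for example a wrapper around `urllib.request.urlopen`), and the sample response is made up.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text):
    """Return (titles, next resumptionToken or None) from one response."""
    root = ET.fromstring(xml_text)
    titles = [t.text for t in root.iterfind(
        "oai:ListRecords/oai:record/oai:metadata/oai_dc:dc/dc:title", NS)]
    token = root.find("oai:ListRecords/oai:resumptionToken", NS)
    # An empty resumptionToken element marks the last page of the list.
    return titles, (token.text if token is not None and token.text else None)


def harvest_titles(fetch, base_url):
    """Yield every dc:title from a repository, page by page."""
    args = {"metadataPrefix": "oai_dc"}
    while True:
        url = f"{base_url}?{urlencode({'verb': 'ListRecords', **args})}"
        titles, token = parse_list_records(fetch(url))
        yield from titles
        if token is None:
            return
        args = {"resumptionToken": token}  # exclusive argument


SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
 <responseDate>2002-06-21T12:00:00Z</responseDate>
 <request verb="ListRecords">http://eprints.example.ac.uk/oai</request>
 <ListRecords>
  <record>
   <header><identifier>oai:eprints.example.ac.uk:42</identifier>
    <datestamp>2002-06-20</datestamp></header>
   <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>An example eprint</dc:title>
    </oai_dc:dc>
   </metadata>
  </record>
  <resumptionToken>page2</resumptionToken>
 </ListRecords>
</OAI-PMH>"""
```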
Advertising your collection(s)
So, you have some content (a collection of resources) to make available and you've done the work to support Z39.50 and/or the OAI-PMH. How do brokers, aggregators, portals and other presentation services know that your metadata is available for searching and/or harvesting?
This is actually a two-fold problem. Services (and end-users!) need to know something about the content of your collection (subject coverage, resource types, media formats, etc.) and some technical details about the particular network services that you offer to make that collection available (protocol, IP address, port number, etc.). The JISC IE architecture describes the use of a 'service registry' to provide access to collection and service descriptions. MIMAS are currently being funded by JISC for one year to develop a pilot JISC IE service registry to investigate some of the issues of running such a service.
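The two halves of the problem map naturally onto two kinds of record: one describing the collection and one describing each network service that exposes it. The sketch below only illustrates that split. The field names are hypothetical and are not taken from the pilot registry or from any collection description standard.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDescription:
    protocol: str        # e.g. "Z39.50" or "OAI-PMH"
    host: str
    port: int
    path: str = ""       # e.g. Z39.50 database name or OAI base URL path


@dataclass
class CollectionDescription:
    title: str
    subjects: list[str]
    resource_types: list[str]
    services: list[ServiceDescription] = field(default_factory=list)


def find_services(registry, subject, protocol):
    """Find (collection title, service) pairs matching subject and protocol."""
    return [(c.title, s) for c in registry if subject in c.subjects
            for s in c.services if s.protocol == protocol]


# Made-up registry entry: one collection offered over two protocols.
registry = [
    CollectionDescription(
        "Example map collection", ["geography"], ["image"],
        [ServiceDescription("Z39.50", "z3950.example.ac.uk", 210, "maps"),
         ServiceDescription("OAI-PMH", "www.example.ac.uk", 80, "/oai")]),
]
print(find_services(registry, "geography", "OAI-PMH"))
```

Keeping collection and service descriptions separate means one collection can be advertised over several protocols without repeating its subject coverage.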
Richer metadata
Typically, the metadata you hold about your content will be significantly richer than simple Dublin Core. Given the architecture described above, you will need to map your rich metadata to simple Dublin Core in order to make it available for searching and/or harvesting. You may also choose to expose your richer metadata as well. For example, you may choose to make MARC records available using Z39.50, or you may allow people to harvest your IMS metadata records using the OAI-PMH.
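Mapping rich metadata down to simple DC is essentially a lookup table plus some clean-up. The sketch below handles a handful of MARC fields, loosely following the Library of Congress MARC-to-Dublin-Core crosswalk. MARC records are represented here as hypothetical `(tag, [(subfield code, value)])` tuples rather than through any particular MARC library. A real crosswalk covers many more fields.

```python
# (MARC tag, subfield code) -> simple DC element; a small subset only.
CROSSWALK = {
    ("020", "a"): "identifier",   # ISBN
    ("100", "a"): "creator",      # main entry, personal name
    ("245", "a"): "title",
    ("260", "b"): "publisher",
    ("260", "c"): "date",
    ("520", "a"): "description",  # summary note
    ("650", "a"): "subject",      # topical subject heading
}


def marc_to_dc(fields: list) -> dict:
    """Map (tag, [(code, value), ...]) MARC fields to {DC element: [values]}."""
    dc: dict = {}
    for tag, subfields in fields:
        for code, value in subfields:
            element = CROSSWALK.get((tag, code))
            if element is None:
                continue  # no simple-DC home for this subfield; dropped
            # Crude clean-up of trailing ISBD punctuation such as " /" or ",".
            dc.setdefault(element, []).append(value.rstrip(" /:;,."))
    return dc


record = [
    ("100", [("a", "Smith, John")]),
    ("245", [("a", "Maps of Bath /"), ("c", "John Smith.")]),
    ("260", [("a", "Bath :"), ("b", "Example Press,"), ("c", "1902.")]),
]
print(marc_to_dc(record))
```

The `rstrip` clean-up is deliberately naive: it would also remove a genuine final full stop, such as one after an initial. Information that has no simple DC equivalent is simply lost, which is why exposing the richer records as well can be worthwhile.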
Alternatively, you may expose simple Dublin Core metadata to support discovery of your content, but deliver content 'packages' that include both the resource and some richer metadata about the resource at the point that end-users or other services access your content. For example, a learning object repository is likely to use IMS metadata [22] internally. It may map each IMS metadata record to simple Dublin Core and expose them for harvesting using the OAI-PMH. At the point people access the learning objects (by 'clicking on the links' in search results), an IMS content package [23] may be delivered that includes both the learning object and the full IMS metadata about that object.
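The sketch below assembles such a package with Python's standard library. It produces a zip file whose root holds an `imsmanifest.xml` describing one `webcontent` resource, with the full metadata record embedded in the manifest's `metadata` element. The namespace URI is an assumption, since it differs between versions of the IMS Content Packaging specification. The identifiers and file names are hypothetical.

```python
import io
import zipfile
from xml.etree import ElementTree as ET

# Assumption: an IMS CP 1.1.x namespace; check the version you target.
IMSCP = "http://www.imsglobal.org/xsd/imscp_v1p1"


def build_package(object_name: str, object_bytes: bytes,
                  metadata: ET.Element) -> bytes:
    """Zip one learning object with a manifest carrying its full metadata."""
    ET.register_namespace("imscp", IMSCP)
    manifest = ET.Element(f"{{{IMSCP}}}manifest", identifier="MANIFEST-1")
    # The rich (e.g. IMS) metadata record travels inside the manifest.
    ET.SubElement(manifest, f"{{{IMSCP}}}metadata").append(metadata)
    ET.SubElement(manifest, f"{{{IMSCP}}}organizations")
    resources = ET.SubElement(manifest, f"{{{IMSCP}}}resources")
    resource = ET.SubElement(resources, f"{{{IMSCP}}}resource",
                             identifier="RES-1", type="webcontent",
                             href=object_name)
    ET.SubElement(resource, f"{{{IMSCP}}}file", href=object_name)

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The manifest must sit at the root of the package.
        zf.writestr("imsmanifest.xml",
                    ET.tostring(manifest, encoding="utf-8",
                                xml_declaration=True))
        zf.writestr(object_name, object_bytes)
    return buf.getvalue()


pkg = build_package("object.html", b"<html></html>", ET.Element("lom"))
print(len(pkg), "bytes")
```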
Conclusion
This article has summarised five steps that content providers can take to integrate their content more fully within the JISC Information Environment, and has discussed some of the issues associated with doing so. The intention has been to show that becoming part of the JISC IE is not overly difficult, that in many cases it will be in line with the things that content providers are doing anyway, that it is compatible with other initiatives and general Web trends, and that integrating access to content within the JISC IE should result in more visibility and use of the high-quality resources that are being made available to the UK higher and further education community.
References
[1] Information Environment: Development Strategy 2001-2005 (Draft), <http://www.jisc.ac.uk/dner/development/IEstrategy.html>
[2] The RDN Subject Portals Project, <http://www.portal.ac.uk/spp/>
[3] The JISC Information Environment and Web services, Andy Powell & Liz Lyon, UKOLN, University of Bath, <http://www.ariadne.ac.uk/issue31/information-environments/>
[4] JISC Information Environment Architecture, Andy Powell & Liz Lyon, UKOLN, University of Bath, <http://www.ukoln.ac.uk/distributed-systems/jisc-ie/arch/>
[5] 10 minute practical guide to the JISC Information Environment (for publishers!), Andy Powell, PALS conference: Delivering Content to Universities and Colleges, <http://www.alpsp.org/PALS02.htm>
[6] Brazil end England's dream, BBC Sport, <http://news.bbc.co.uk/sport3/worldcup2002/hi/matches_wallchart/england_v_brazil/newsid_2049000/2049924.stm>
[7] Library orientated portals solutions, Andrew Cox and Robin Yeates, LITC, South Bank University, <http://www.jisc.ac.uk/techwatch/reports/index.html#libportals>
[8] An Introduction to Web Services, Tracy Gardner, IBM United Kingdom Laboratories (Hursley), <http://www.ariadne.ac.uk/issue29/gardner/>
[9] IMS Digital Repositories Specification, <http://www.imsproject.org/digitalrepositories/index.cfm>
[10] The Bath Profile Maintenance Agency, <http://www.nlc-bnc.ca/bath/bath-e.htm>
[11] Dublin Core Metadata Element Set, Version 1.1: Reference Description, <http://dublincore.org/documents/dces/>
[12] Simple Object Access Protocol (SOAP) 1.1, <http://www.w3.org/TR/SOAP/>
[13] Search/Retrieve Web Service, <http://www.loc.gov/z3950/agency/zing/srw.html>
[14] The Open Archives Initiative, <http://www.openarchives.org/>
[15] RDF Site Summary (RSS) 1.0, <http://purl.org/rss/>
[16] OpenURL Overview, <http://www.sfxit.com/openurl/>
[17] Digital Object Identifier, <http://www.doi.org/>
[18] Persistent URLs, <http://purl.org/>
[19] Athens Access Management System, <http://www.athens.ac.uk/>
[20] The JOIN-UP Programme, <http://edina.ed.ac.uk/projects/joinup/>
[21] Working with the RDN, Pete Cliff, Pete Dowdell & Andy Powell, UKOLN, University of Bath, <http://www.rdn.ac.uk/publications/workingwithrdn/>
[22] IMS Learning Resource Meta-data Specification, <http://www.imsglobal.org/metadata/index.cfm>
[23] IMS Content Packaging Specification, <http://www.imsproject.org/content/packaging/index.cfm>
Author Details
- Andy Powell
UKOLN, University of Bath
a.powell@ukoln.ac.uk