SEAMLESS: Introduction to the Project
SEAMLESS is a two year research project, funded by the British Library, which aims to develop a new model for citizens’ information - one which is distributed, and based on partnerships and common standards.
The objectives of the SEAMLESS project are to:
- build strong and sustainable partnerships between the various information providers operating in the region
- develop and implement common standards (technical and informational) so as to achieve interoperability between their systems and data
- develop a SEAMLESS interface which will allow simultaneous querying of distributed information sources (whether stored in a database, made available on a website, or in word processed documents) and return all the information back to the user in a unified list
- facilitate electronic communication between the information providers and their customers, and between the various participating agencies
- develop a current awareness/alerting service for users (second phase)
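The third objective, simultaneous querying with a unified result list, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the in-memory `SOURCES`, the `query_source` and `unified_search` helpers, and the simple title match stand in for whatever remote search mechanism the project actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for three kinds of provider source; a real
# deployment would query remote systems instead of in-memory lists.
SOURCES = {
    "database": [{"title": "Housing benefit advice", "source": "District Council"}],
    "website": [{"title": "Benefit claim forms", "source": "County Council"}],
    "documents": [{"title": "Childcare directory", "source": "Voluntary group"}],
}


def query_source(name, term):
    """Return the records from one source whose title contains the term."""
    term = term.lower()
    return [dict(rec, origin=name) for rec in SOURCES[name]
            if term in rec["title"].lower()]


def unified_search(term):
    """Query every source concurrently and merge the hits into one list."""
    with ThreadPoolExecutor() as pool:
        result_lists = pool.map(lambda name: query_source(name, term), SOURCES)
        merged = [rec for hits in result_lists for rec in hits]
    # Sort by title so the user sees one list, not one list per source.
    return sorted(merged, key=lambda rec: rec["title"])


results = unified_search("benefit")
```

Each hit keeps an `origin` field so that the unified list can still show which provider supplied the record.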
Currently the project team (Essex Libraries, Fretwell Downing Data Systems Ltd. and Education for Change Ltd.) are working with 29 organisations in Essex (national government departments, County Council departments, District Councils, Health authorities, business organisations, educational establishments, CABs, voluntary and charitable groups etc.) to develop the necessary standards and set up a prototype system. The application of metatags and the creation of a common thesaurus are being investigated. Once the system has been tested, modified, and proved viable, it is hoped that it will be opened up to all information providers in the region, and that it will form the basis for the development and delivery of citizens’ information in Essex in the future.
Why do we need a profile for citizens’ information?
Developments in three discrete, but inter-related areas are converging to create a need for a new profile, or standard attribute set, for citizens’ information which will support greater interoperability, help to improve resource description and discovery, and act as a basis for the development of new e-services:
- a growing emphasis on joint working and partnership arrangements, often described as ‘joined up thinking’ or ‘joined up government,’ amongst organisations which provide services to the public. Increasingly, this includes cross-sectoral initiatives involving players from the public, private and voluntary sectors. Examples include: Health Action Zones, Employment Action Zones, Regeneration (Pathfinder) Projects, social exclusion initiatives, Local Agenda 21 projects, the implementation of the Crime and Disorder Act, the development of a National Childcare Strategy and implementation of Early Years Development Plans, Lifelong Learning, University for Industry, and the setting up of Regional Development Agencies. All depend on strong local partnerships and in this context the ability to share information effectively is a key requirement, one which in turn rests upon the development and adoption of suitable standards.
- a concern for the users of local services, and the difficulties they face in trying to access the services and information they need in an increasingly complex and fragmented information environment. In order to live their lives and play their full part in society people need information from a wide range of public, private and voluntary sector organisations, nationally and locally. Most of these organisations produce information in a variety of printed and electronic formats, many of which are available in the public library. Users are confused by a multitude of overlapping information sources, with a multitude of different search interfaces, which act as a barrier to easy access.
Although this approach is gradually being upgraded to a www based environment, which allows access to an increasing amount of information, it still produces information ‘islands’ which can only be bridged through superficial high level hyper-links. Another problem, which is well recognised by information professionals, is that the current state of indexing and description of documents and resources on the web is inadequate, which means that searches tend to favour recall at the expense of precision. There is a need for further development, and practical application, in areas such as the use of metadata, and automated techniques based on harvesting and web crawlers, in order to improve this situation.
- the publication of the influential report on the future for Britain’s public libraries ‘New Library - the Peoples’ Network’ [1] has highlighted the need for public libraries to be linked up to a high speed, high capacity digital network. Attention now is turning to the content and services that public libraries will be able to deliver over that network. The provision of citizens’, or community, information has traditionally been one of the public library’s core functions and there is considerable interest in the question of whether and how citizens’ information resources held locally can be aggregated, or made available, as a national resource.
Related research
The British Library funded CIRCE project (http://www.gloscc.gov.uk/circe/index.htm ) has been investigating the potential for networking public library community information databases. The fundamental difference between CIRCE and SEAMLESS is that the SEAMLESS team do not see a long term future for public library community information databases as such. Rather they take the view that there is a danger that public libraries may become marginalised as information providers unless the twin ‘threats’ of competition from other information providers and the trend to remote access to information encouraged by the development of the www are addressed. The SEAMLESS project proposes to develop, test and evaluate a new model for citizens’ information provision in which the public library becomes the facilitator, co-ordinator and standard setter for a distributed system (made up of the information resources of a network of local information providers) and provides expertise and training on demand.
Two basic, but crucial, pre-conditions underpin this new model. The first is that a substantial degree of co-operation is needed between the various information providers in any given locality: no one organisation can provide a successful citizens’ information service in isolation. The second is that some common technical and information standards need to be developed and adopted in order to facilitate successful co-operation and to enable the necessary sharing of data between partners and efficient dissemination of data to the wider public.
One of the key aims of the SEAMLESS project is to test whether some of the large body of previous research into interoperability and metadata could beneficially be applied to a new domain - that of citizens’ information. (See www.ukoln.ac.uk/metadata/, www.ukoln.ac.uk/elib/ and www2.echo.lu/libraries/en/metadata/matahome.html for more information on a number of European Union (EU) and Joint Information Systems Committee (JISC) projects funded under the Telematics for Libraries and Electronic Libraries (e-Lib) programmes.) Interest in this area continues to grow and JISC and BLRIC (British Library Research and Innovation Centre) have recently established UK Interoperability Focus to explore, publicise and mobilise the benefits and practice of interoperability across diverse information sectors (www.ukoln.ac.uk/interop-focus/).
Extant profiles
Standard attribute sets are a useful starting point for considering data representation in any area. A number of these either currently exist or are emergent in the area of citizens’ information. The SEAMLESS team studied existing standard attribute sets and compared their elements and possible application. The team also looked at a variety of sources describing the general application of metadata [2] [3] [4] [5] [6].
- US MARC Community Information Format
- This is the extension of the US MARC attribute set that covers cataloguing of community information. Further details about this attribute set are available at the Library of Congress website (http://lcweb.loc.gov/marc/community/eccihome.html).
- Dublin Core
- The Dublin Core seeks to establish a way to describe documents and “document-like objects” such as web pages, in a way which will enable search engines to index and retrieve them. Further information is available from the website ( http://purl.org/dc/ ).
- GILS
- The Government (or Global) Information Locator Service is the result of an international agreement (based on original work among government departments in the US) to provide a standard for locating information, whether held in libraries, data centres, or published on the Internet. The standard adopted for this service is ISO 23950, also known as (ANSI) Z39.50.[7] Further information is available from the website ( http://www.usgs.gov/gils/ ).
- CIMI
- Consortium for the Computer Interchange of Museum Information. Since 1990 CIMI has made substantial progress in the development of standards for structuring museums’ data and enabling widespread search and retrieval capabilities. Further information is available from the CIMI website ( http://www.cimi.org ).
- IMS
- Instructional Management Systems. The IMS Project is developing and promoting open specifications for facilitating online activities such as locating and using educational content, tracking learner progress, reporting learner performance and exchanging student records between administrative systems. Further information can be found on the IMS website (http://www.imsproject.org/what.html ).
Development of the SEAMLESS profile
SEAMLESS was established with the intention that a wide range of types of organisation should be included, so it was important to ensure that the final system would be hospitable to different types of information and that it would meet the needs of varying types of organisation and the particular needs of their customers. In setting out to define a common information profile (attribute set) the project team contacted a wide variety of potential partner organisations, selected to include some who had expressed interest following the launch conference, some who had worked with the library service before, and some who, it was felt, would enhance the variety of information challenges for the pilot project.
Meetings were held with each organisation during Spring 1998 to give them more information about SEAMLESS and to collect information about their role and services, and a workshop was held in April. An Information Audit was carried out during June and July 1998 to analyse the organisations’ information products and systems in detail and to assist them in the selection of information sets to make available for the pilot project. This information was then collated and there followed an iterative process of developing a set of information attributes which were both broad enough to encompass the range of domains represented and suitably constrained so as to be manageable in the real world working environment of the organisations concerned.
Following the research into existing standards the team undertook a detailed analysis of the sample data supplied by partner organisations during the Information Audit. The team identified and mapped the various elements within each data set to establish overlaps and common terms. Research staff from Essex Libraries, Education for Change and Fretwell Downing then met to discuss the various options.
The original proposal for the SEAMLESS project postulated an information profile based upon the Dublin Core. Initial research within the project indicated that GILS provided a better basis for development. It was felt that it provided a more hospitable attribute set for the elements identified within the sample data than Dublin Core, while being less complicated to apply and offering more potential for accommodating future developments than USMARC. It is also compliant with the international standard for information searching, ISO 23950 (Z39.50), which is used in the project.
Having decided that GILS might be the standard to use, the research team then undertook detailed matching of the data obtained from partners in the Information Audit to the full GILS Core Elements. The profile had to be able to cope with elements from three data formats: databases, where every field would need to be tagged in order to be displayed; web pages, where only searchable elements were required; and word processed documents, where again searchable elements were needed but where substantial editing might be required to produce usable data.
This work proved that the majority of data would fit into the GILS Core Elements. The major gap was for information relating to educational courses where there seemed to be nowhere to include information about entry requirements, resulting qualification, target audience or the duration or type of course.
The team therefore reconsidered the other extant standards and decided that the IMS profile included elements which would plug this gap. Following advice from Fretwell Downing four IMS elements were included in the SEAMLESS Information Profile as a Learning Provision Subset. In addition, discussions with the participating information providers indicated a desire to incorporate the Alta Vista format for the keyword and description attributes. These therefore appear in the SEAMLESS profile without the SEAMLESS prefix (se.), the intention being that these tags can be recognised by the Alta Vista robots as well as by SEAMLESS.
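The prefix convention described above can be sketched in a few lines. This is an illustration only: the article states that SEAMLESS tags carry the `se.` prefix and that the keyword and description attributes do not, but the `<meta name=... content=...>` layout, the `to_meta_tags` helper, and the decision to leave `ims.` names unprefixed are assumptions made here.

```python
from html import escape

# Elements that, per the text, are written without the "se." prefix so
# that Alta Vista's robots can also read them.
UNPREFIXED = {"keywords", "description"}


def to_meta_tags(record):
    """Render a flat SEAMLESS record as HTML <meta> tags (assumed layout)."""
    lines = []
    for name, value in record.items():
        # Assumption: names already carrying the "ims." prefix are kept as-is.
        if name in UNPREFIXED or name.startswith("ims."):
            tag_name = name
        else:
            tag_name = "se." + name
        lines.append('<meta name="%s" content="%s">'
                     % (escape(tag_name), escape(value)))
    return "\n".join(lines)


example = {
    "title": "Adult education courses",
    "keywords": "adult education, evening classes",
    "description": "Part-time courses for adults in Chelmsford.",
}
tags = to_meta_tags(example)
```

With this layout a general web robot would see ordinary `keywords` and `description` tags, while a SEAMLESS-aware indexer could additionally read the `se.`-prefixed ones.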
Matching also showed that for the majority of the data currently included in the project, the full set of GILS Core Elements was not required. GILS includes some quite complicated nested tags and requires some expertise to implement correctly. The intention is that partner organisations will add the tags themselves and the team was conscious that the process had to be simplified as much as possible. The workload involved in manipulating data for SEAMLESS had already been identified as a potential problem by many of the organisations and it was felt that any long and complicated tagging process might cause some organisations to drop out of the project.
After discussion with GILS experts at Fretwell Downing and Sebastian Hammer of Index Data, Denmark, the team developed a set of 33 SEAMLESS information attributes (the ‘SEAMLESS Information Profile’) which can for the most part be mapped directly onto the equivalent GILS Core Elements.
Details of the SEAMLESS profile
The 33 elements are (mandatory elements in bold type):
Element No. | Name | Description |
1 | title | assigned title or description of the resource |
2 | source | the organisation or provider who is making the information available to SEAMLESS |
3 | date-last-modified | in the form DD/MM/YYYY |
4 | channel | term(s) from the SEAMLESS Channels list |
5 | keywords | term(s) from the SEAMLESS thesaurus |
6 | originator | the body primarily responsible for the intellectual content of the information. |
7 | contact-name | the person to contact for more information |
8 | contact-organisation | the name of the organisation to contact for more information |
9 | contact-address | the address of organisation to contact for more information |
10 | contact-network-address | Email address to contact for more information |
11 | distributor | This element will apply mainly to bibliographic items |
12 | cost | cost information |
13 | begin-date | in the form DD/MM/YYYY |
14 | end-date | in the form DD/MM/YYYY |
15 | time-textual | Time/date expressed in words |
16 | linkage | Show URL, URI, SICI, PII, DOI, PURL, ISBN, ISSN etc. here |
17 | linkage-type | e.g. HTML, MIME, plain text etc. |
18 | medium | e.g. CD-ROM, Book, Video etc. |
19 | place | one term plus its post town, e.g. Chelmsford |
20 | description | a textual description relating to the general nature and content |
21 | contributor | e.g. co-author |
22 | date-of-publication-structured | in the form DD/MM/YYYY |
23 | date-of-publication-textual | date expressed in words |
24 | language | language of the intellectual content of the resource |
25 | general-constraint | e.g. copyright, use & reuse, intellectual property etc. |
26 | control-identifier | any local reference number that uniquely identifies the resource within its domain |
27 | record-review-date | in the form DD/MM/YYYY |
28 | supplemental-information | a field to map miscellaneous information |
29 | body | Body text (where appropriate). Basic formatting (white space) is preserved. |
Learning provision sub-set | ||
30 | ims.prerequisite | entry requirements for courses |
31 | ims.educationalobjective | qualification or intended learning result of course |
32 | ims.level | the target audience or level of the course |
33 | ims.duration | length of the course and/or the type of study e.g. full time, part time etc. |
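As a rough illustration of how a partner might check a record against the profile before submitting it, the sketch below tests element names and the DD/MM/YYYY date form. The element names are copied from the table; the `check_record` helper and its messages are hypothetical, and mandatory elements are not checked because the listing here does not identify them.

```python
import re
from datetime import datetime

# Element names copied from the profile table above.
ELEMENTS = [
    "title", "source", "date-last-modified", "channel", "keywords",
    "originator", "contact-name", "contact-organisation", "contact-address",
    "contact-network-address", "distributor", "cost", "begin-date",
    "end-date", "time-textual", "linkage", "linkage-type", "medium", "place",
    "description", "contributor", "date-of-publication-structured",
    "date-of-publication-textual", "language", "general-constraint",
    "control-identifier", "record-review-date", "supplemental-information",
    "body", "ims.prerequisite", "ims.educationalobjective", "ims.level",
    "ims.duration",
]

# Elements whose description is "in the form DD/MM/YYYY".
DATE_ELEMENTS = {
    "date-last-modified", "begin-date", "end-date",
    "date-of-publication-structured", "record-review-date",
}


def is_valid_date(value):
    """True if value is a real calendar date written as DD/MM/YYYY."""
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", value):
        return False
    try:
        datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        return False
    return True


def check_record(record):
    """Return a list of problems found in a flat SEAMLESS record."""
    problems = []
    for name, value in record.items():
        if name not in ELEMENTS:
            problems.append("unknown element: " + name)
        elif name in DATE_ELEMENTS and not is_valid_date(value):
            problems.append("bad date in %s: %s" % (name, value))
    return problems
```

For example, `check_record({"begin-date": "1999-09-01"})` reports a bad date, while a record using `01/09/1999` passes.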
Mapping of SEAMLESS Profile Attributes to GILS Core Elements
The mappings are as shown in the table below. Note that where GILS provides several groupings of sub-elements, the decision was taken within the SEAMLESS project to provide a “flat” (i.e. non-nested) schema, which it was felt would ease the process of data preparation across a wide variety of locations and by staff with varying levels of technical understanding.
SEAMLESS Element No. | Name | GILS Element No. | Equivalent GILS Core Element |
1 | title | 4 | Title |
2 | source | 1019 | Record source |
3 | date-last-modified | 1012 | Date of last modification |
4 | channel | 2074 | Controlled Subject Index sub-group: Controlled term |
5 | keywords | 2074 | Controlled Subject Index sub-group: Controlled term |
6 | originator | 1005 | Originator |
7 | contact-name | 2023 | Point of Contact sub-group: Contact Name |
8 | contact-organisation | 2024 | Point of Contact sub-group: Contact Organization |
9 | contact-address | 2025 - 2029 | Point of Contact sub-group: Contact Street Address, Contact City, Contact State or Province, Contact Zip or Postal Code |
10 | contact-network-address | 2030 | Point of Contact sub-group: Contact Network Address |
11 | distributor | 2006 | Availability sub-group: Distributor Name |
12 | cost | 2055 | Order Process sub-group: Cost Information |
13 | begin-date | 2072 | Availability sub-group: Beginning Date |
14 | end-date | 2073 | Availability sub-group: Ending Date |
15 | time-textual | 2045 | Availability sub-group: Available Time Textual |
16 | linkage | 2021 | Availability sub-group: Linkage |
17 | linkage-type | 2022 | Availability sub-group: Linkage Type |
18 | medium | 1031 | Availability sub-group: Medium |
19 | place | 2042 | Spatial Domain sub-group: Place Keyword |
20 | description | 62 | Abstract |
21 | contributor | 1003 | Contributor |
22 | date-of-publication-structured | 31 | Date of Publication sub-group: Date of Publication Structured |
23 | date-of-publication-textual | 31 | Date of Publication sub-group: Date of Publication Textual |
24 | language | 54 | Language of Resource |
25 | general-constraint | 2005 | Use Constraint |
26 | control-identifier | 1007 | Control Identifier |
27 | record-review-date | 2051 | Record Review Date |
28 | supplemental-information | 2050 | Supplemental Information |
29 | body | None | None |
Learning provision sub-set | |||
30 | ims.prerequisite | None | SEAMLESS/IMS specific sub-group |
31 | ims.educationalobjective | None | SEAMLESS/IMS specific sub-group |
32 | ims.level | None | SEAMLESS/IMS specific sub-group |
33 | ims.duration | None | SEAMLESS/IMS specific sub-group |
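The flat mapping in the table lends itself to a simple lookup. The sketch below is illustrative only: the element numbers are copied from the table, while the `to_gils` helper and its return shape are assumptions. It returns a list of pairs rather than a dictionary because `channel` and `keywords` both map to GILS element 2074.

```python
# SEAMLESS element -> GILS element number(s), copied from the table above.
# "body" and the four ims.* elements have no GILS equivalent and are omitted.
SEAMLESS_TO_GILS = {
    "title": "4",
    "source": "1019",
    "date-last-modified": "1012",
    "channel": "2074",
    "keywords": "2074",
    "originator": "1005",
    "contact-name": "2023",
    "contact-organisation": "2024",
    "contact-address": "2025 - 2029",
    "contact-network-address": "2030",
    "distributor": "2006",
    "cost": "2055",
    "begin-date": "2072",
    "end-date": "2073",
    "time-textual": "2045",
    "linkage": "2021",
    "linkage-type": "2022",
    "medium": "1031",
    "place": "2042",
    "description": "62",
    "contributor": "1003",
    "date-of-publication-structured": "31",
    "date-of-publication-textual": "31",
    "language": "54",
    "general-constraint": "2005",
    "control-identifier": "1007",
    "record-review-date": "2051",
    "supplemental-information": "2050",
}


def to_gils(record):
    """Split a flat record into (GILS number, value) pairs and leftovers."""
    mapped, unmapped = [], {}
    for name, value in record.items():
        if name in SEAMLESS_TO_GILS:
            mapped.append((SEAMLESS_TO_GILS[name], value))
        else:
            unmapped[name] = value
    return mapped, unmapped


mapped, unmapped = to_gils({
    "title": "Walk-in clinics",
    "channel": "Health",
    "keywords": "clinics",
    "body": "Opening hours and locations.",
})
```

Here `mapped` holds two entries for element 2074 and `unmapped` keeps the `body` text, which has no GILS counterpart.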
Mapping of SEAMLESS Profile Attributes to Dublin Core Elements
During discussion several partners expressed concern about implementing a SEAMLESS attribute set which would not provide additional retrieval advantages in the wider web community beyond those systems already recognising GILS. There was some feeling, particularly in the academic organisations, that they did not wish to cut themselves off from the Dublin Core community. The team therefore decided to include a mapping of SEAMLESS attributes to Dublin Core Elements as part of the system. This is shown in the following table. For details of similar work see the ‘Dublin Core/MARC/GILS crosswalk’ [8].
SEAMLESS ELEMENT | Purpose | DUBLIN CORE ELEMENT | Purpose |
title | The assigned title or description of the resource. | Title | The name given to the resource by the Creator or Publisher. |
originator | To identify the organisation(s) or person(s) responsible for the creation of the resource. | Creator | The person(s) or organisation(s) primarily responsible for the intellectual content of the resource. |
keywords | To specify the subject or topic of the resources using a controlled vocabulary that describes its content for resource description and discovery purposes. | Subject | The topic of the resource, or keywords or phrases that describe the subject or content of the resource. |
description | A textual description relating to the general nature and content of the resource. | Description | A textual description of the contents of the resource, including abstracts in the case of document-like objects or contents descriptions in the case of visual resources. |
distributor | To identify the entity responsible for making the resource available in its present form such as a publishing house, university department or corporate entity. | Publisher | The entity responsible for making the resource available in its present form, such as a publisher, a university department, or a corporate entity. |
contributor | To identify other significant contributors to the intellectual content of the resource in addition to the originator. | Contributor | Person(s) or organisation(s) in addition to those specified in the Creator element who have made significant intellectual contributions to the resource but whose contribution is secondary to the individuals or entities specified in the creator element. |
date of publication | To show the date the resource was published. | Date | The date the resource was made available in its present form. |
medium | To specify the physical format and data representation of the resource. | Type | The category of the resource, such as home page, novel, poem, working paper, technical report, essay, dictionary. |
linkage | To provide the location or address of an automatic linkage to an electronic resource. | Identifier | String or number used to uniquely identify the resource. |
linkage-type | To identify the data content type associated with the electronic resource e.g. HTML for a web page, PDF for a Portable Document Format file. | Format | The data representation of the resource, such as text/html, ASCII, Postscript file, executable application, or JPEG image. |
None at present (GILS: SOURCES OF DATA) | | Source | The work, either print or electronic, from which this resource is derived, if applicable. |
language | To indicate to the user the language of the intellectual content of the resource. | Language | Language of the intellectual content of the resource. |
None at present (GILS: CROSS REFERENCE RELATIONSHIP, CROSS REFERENCE LINKAGE) | | Relation | Relationship to other resources. |
begin-date end-date time-textual place | To indicate any start or end dates associated with the resource; to indicate the expression of dates and times in words; to indicate the location where the activity occurs | Coverage | The spatial locations and temporal durations characteristic of the resource. |
general-constraint | To indicate if any access constraints pertain to the use of the resource. | Rights | The content of this element is intended to be a link (a URL or other suitable URI as appropriate) to a copyright notice, a rights-management statement, or perhaps a server that would provide such information in a dynamic way. The intention of specifying this field is to allow providers a means to associate terms and conditions or copyright statements with a resource or collection of resources. No assumptions should be made if such a field is empty or not present. |
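The crosswalk in the table can be expressed as a small lookup, sketched below. The pairings follow the table; the `to_dublin_core` helper is hypothetical, and mapping both the structured and textual date-of-publication elements to Date is an assumption, since the table lists only ‘date of publication’. Values are collected in lists because four SEAMLESS elements share the single Coverage element.

```python
from collections import defaultdict

# SEAMLESS element -> Dublin Core element, following the table above.
SEAMLESS_TO_DC = {
    "title": "Title",
    "originator": "Creator",
    "keywords": "Subject",
    "description": "Description",
    "distributor": "Publisher",
    "contributor": "Contributor",
    # Assumption: both date-of-publication elements map to Date.
    "date-of-publication-structured": "Date",
    "date-of-publication-textual": "Date",
    "medium": "Type",
    "linkage": "Identifier",
    "linkage-type": "Format",
    "language": "Language",
    "begin-date": "Coverage",
    "end-date": "Coverage",
    "time-textual": "Coverage",
    "place": "Coverage",
    "general-constraint": "Rights",
}


def to_dublin_core(record):
    """Group a flat SEAMLESS record's values under Dublin Core elements."""
    dc = defaultdict(list)
    for name, value in record.items():
        element = SEAMLESS_TO_DC.get(name)
        if element is not None:  # elements with no DC equivalent are dropped
            dc[element].append(value)
    return dict(dc)


dc_record = to_dublin_core({
    "title": "Summer fair",
    "begin-date": "01/06/1999",
    "place": "Chelmsford",
    "cost": "Free",
})
```

In this example `cost` has no Dublin Core counterpart in the table and is left out, while the date and place are both gathered under Coverage.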
Searchable attributes
For the initial implementation the following attributes will be searchable:
keyword, subject, name, place and date.
Comments please
The SEAMLESS team would welcome comments on the proposed citizens’ information profile as outlined above from colleagues active in the fields of metadata and interoperability research and from public libraries and other organisations providing information to the public. Please contact either Mary Rowlatt (maryr@essexcc.gov.uk ) or the SEAMLESS team (seamless@essexcc.gov.uk ).
References:
[1] New Library - the Peoples’ Network
Library and Information Commission, 1997
Available from: http://www.lic.gov.uk/publications/newlibrary.html
[2] Dempsey, Lorcan and Heery, Rachel
Metadata: a current view of practice and issues
Journal of Documentation, Vol. 54(2), March 1998, p145 - 172
[3] European Commission, DGXIII -E4
Report of the Metadata Workshop held in Luxembourg, 1st and 2nd December, 1997
[4] Dempsey, Lorcan and Heery, Rachel, with contributions from Martin Hamilton, Debra Hiom, Jon Knight, Traugott Koch, Marianne Peereboom and Andy Powell
A review of metadata: a survey of current resource description formats
DESIRE 1, deliverable 3.2(1), March 1997
Available from: http://www.ukoln.ac.uk/metadata/desire/overview/
[5] Younger, Jennifer A
Resource description in the digital age
Library Trends, Vol. 45(3), Winter 1997, p462 - 487
[6] Heery, Rachel
Review of metadata formats
Program, Vol. 30(4), October 1996, p345 - 373
[7] ISO 23950 1998/ANSI/NISO Z39.50 1995
Information retrieval (Z39.50): application service definition and protocol specification
ISO, 1998
[8] Dublin Core/MARC/GILS crosswalk
Network Development and MARC Standards Office, last updated 04/07/97
Available from: http://www.loc.gov/marc/dccrocc.html
Author details
Mary Rowlatt
Information Services Manager and Project Leader for SEAMLESS
maryr@essexcc.gov.uk
Cathy Day
Research Assistant
SEAMLESS project
seamless@essexcc.gov.uk
Jo Morris
Research Assistant
SEAMLESS project
seamless@essexcc.gov.uk
Kevin Atkins
Network Services Consultant
Fretwell-Downing Data Systems Ltd.
katkins@fdgroup.co.uk