Web Magazine for Information Professionals

Metadiversity

Michael Day reports on a biodiversity conference in the United States that focused on metadata.

Introduction and context

First, we simply need to be moving faster to coordinate the information that already exists, on file cards and computers, scattered around the world’s major and minor museums and other collections. … Second these databases must be widely available and ‘customer friendly’. We need to accelerate current efforts for international cooperation and coordination, so that common formats are increasingly agreed and used.
Robert M. May (1994) [1].

 

Biodiversity information management

The management and exchange of information is an important part of the ongoing management of biodiversity and ecosystems. This point was emphasised in a report produced in March 1998 by the US President’s Committee of Advisers on Science and Technology (PCAST) Panel on Biodiversity and Ecosystems [2]. The authors of the report suggest that research should be carried out into the development of information systems that can combine the large amounts of data relevant to biodiversity.

We need … mechanisms that can, for example, efficiently search through terabytes of Mission to Planet Earth satellite data and other biodiversity and ecosystems datasets, make correlations among data from disparate sources, compile those data in new ways, analyze and synthesize them and present the resulting information in an understandable and usable manner (Section IV).

The PCAST report also suggested the creation (and funding) of a “next generation” National Biological Information Infrastructure (NBII-2). Its ambition would be to provide a ‘research library system’, an “enabling framework that could unlock the knowledge and economic power lying dormant in the masses of biodiversity and ecosystems data that we have on hand” (Section IV).

Metadiversity - the symposium

The PCAST report was the catalyst for the National Federation of Abstracting and Information Services (NFAIS), under a co-operative agreement with the Biological Resources Division of the United States Geological Survey (USGS/BRD), to organise a symposium that could discuss the challenges posed by biological diversity information. Thus in mid-November 1998, delegates representing a variety of different communities, including biological and environmental information specialists, museum curators and librarians, began to gather at Natural Bridge, Virginia - described as one of the seven natural wonders of the world.

Figure: Natural Bridge, Virginia, November 1998.

Concepts of biodiversity

‘Biodiversity’ is a contraction of ‘biological diversity’, a concept that has been recognised for a long time in a variety of biological and ecological contexts. The contraction was originally coined for a meeting, the ‘National Forum on BioDiversity’, held in Washington, D.C. in September 1986 and was used as the title of its published proceedings [3]. J.L. Harper and D.L. Hawksworth point out that the term is usually used to refer to biological diversity at three distinct levels: genetic (within species) diversity, species (or organismal) diversity and ecological (or habitat) diversity [4]. The much cited Article 2 of the Convention on Biological Diversity (1992) defines biological diversity as follows:

… the variability among living organisms from all sources including, inter alia, terrestrial, marine and other aquatic ecosystems and the ecological complexes of which they are part; this includes diversity within species, between species and of ecosystems [5].

Biodiversity is seen as important because it is a moral imperative … and because it may also be the source of raw materials for human exploitation, including medicines, food and other products [6].

The conservation of biodiversity is seen as important because it might help counter the global decline of biological diversity due to human impacts in the form of habitat destruction and fragmentation, pollution, over-exploitation of resources, global climate change and other factors. Conservation practice is based to some extent on the availability of information about biodiversity. Inventorying biodiversity is important because, as Nigel Stork points out, it can “go some way towards providing a clear view of the magnitude of diversity on Earth and its rate of loss” [7]. Inventorying can also help with developing sustainable approaches to the utilisation of natural resources and the discovery of ‘new’ substances which can be used by the pharmaceutical and biotechnology industries.

Metadata, interoperability and biodiversity information management

The Metadiversity symposium subtitle, “Responding to the grand challenge for biodiversity information management through metadata”, emphasised the importance of metadata issues to biodiversity information. Metadata is generally understood to mean ‘structured data about data’ and in the biodiversity context can relate to a wide range of information types including bibliographic data, specimen data from museums, taxonomies developed by systematic biologists and the products of research and geospatial surveying. For this reason, participants in the conference represented many relevant communities, including museums, database producers, biological systematics and libraries.

The metadata concept was introduced by Stu Weibel (Online Computer Library Center) who gave a presentation on the ‘metadata landscape’ with particular reference to the Dublin Core initiative [8]. In so doing, he defined three separate levels of interoperability: semantic, structural and syntactic. Weibel also outlined concepts of modular extensibility with regard to metadata that would enable the addition of metadata elements for discipline-specific requirements including elements tailored for rights management data or biological specimens. This modular-type architecture would enable the invention of new semantics and their integration into systems being built now. As an example of this, Weibel described the Resource Description Framework (RDF) being developed by the World Wide Web Consortium (W3C) as a metadata framework for the Web [9].
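The idea of modular extensibility can be illustrated with a minimal sketch in Python. It assumes a record whose element names carry a prefix identifying the module they belong to. The `dc` elements are a subset of the Dublin Core element names, while the `specimen` module, its element names and all the values are hypothetical and invented purely for illustration. A client that only understands the core can ignore the extension without losing the shared description.

```python
# A subset of the Dublin Core element names, shared by all communities.
DC_CORE = {"title", "creator", "date", "identifier"}


def split_record(record):
    """Group a flat, prefixed record into one dictionary per module."""
    modules = {}
    for qualified_name, value in record.items():
        prefix, _, element = qualified_name.partition(":")
        modules.setdefault(prefix, {})[element] = value
    return modules


def core_view(record):
    """Return only the elements a generic, core-only client understands."""
    core = split_record(record).get("dc", {})
    return {name: value for name, value in core.items() if name in DC_CORE}


# Hypothetical record: the "specimen:" elements are an invented
# discipline-specific module, not part of any published standard.
record = {
    "dc:title": "Herbarium sheet, Quercus robur",
    "dc:creator": "Example Museum",
    "dc:date": "1998-11-10",
    "specimen:catalogNumber": "EX-0001",
    "specimen:locality": "Natural Bridge, Virginia",
}

print(core_view(record))
print(split_record(record)["specimen"])
```

A generic search service would index only the result of `core_view`, while a museum system could also read the `specimen` module. RDF addresses the same need with namespaces rather than the simple prefixes assumed here.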

In his paper on ‘Building digital libraries for metadiversity’ Clifford Lynch (Coalition for Networked Information) probed the meaning of words like ‘digital libraries’, ‘interoperability’, ‘federation’ and ‘infrastructure’. He noted the difficulty of creating interoperable systems that cover a wide range of data types. He stressed that this was not just a technical issue, but also involved resolving methodological and economic differences between the diverse communities that ‘own’ biodiversity information. Carl Lagoze (Cornell University) additionally pointed out that traditional metadata creation methods (e.g. cataloguing) are neither appropriate nor sufficient in the networked information environment.

Frameworks and infrastructure

Initiatives for biodiversity information management systems exist at a number of different levels: global, national, regional and local. Part of the challenge for biodiversity information management is ensuring that these initiatives work in co-operation with one another. A variety of presentations at the Metadiversity symposium outlined some of these initiatives and associated infrastructure issues.

The Clearing-House Mechanism of the Convention on Biological Diversity

The Convention on Biological Diversity (CBD) is an initiative of the United Nations Environment Programme (UNEP) and was opened for signature at the UN Conference on Environment and Development (the “Earth Summit”) held in Rio de Janeiro in 1992. The CBD, which has been signed by over 150 countries, contains provisions relating to a wide range of issues, including the need to establish educational programmes, to promote research, to encourage technology transfer and to facilitate the exchange of information. Article 17 states that:

The Contracting Parties shall facilitate the exchange of information, from all publicly available resources, relevant to the conservation and sustainable use of biological diversity, taking into account the special needs of developing countries.

The convention specifies that this information should include the exchange of the results of technical, scientific and socio-economic research and should, where feasible, include the repatriation of information.

It is up to national signatories to decide their response to these particular articles in the CBD. The UK Government, for example, produced a Biodiversity Action Plan that proposed the setting up of a Biodiversity Steering Group that reported in 1995. Among its recommendations, the report suggested improving the quality and accessibility of data and biological reporting by maximising the use of existing data, developing a United Kingdom Biodiversity Database (UKBD) and developing locally based biodiversity information systems [10]. A UK Biodiversity Group has been set up with an Information Group which aims to improve the “accessibility and co-ordination of existing biological datasets, to provide common standards for future recording and to facilitate the creation of a UK Biodiversity Database” [11].

Article 18 of the CBD specifically articulates the need to create a ‘clearing house mechanism’ (CHM) to promote and facilitate technical and scientific co-operation, including the exchange of information [12]. At the Metadiversity symposium, Beatriz Torres (Secretariat of the Convention on Biological Diversity, UNEP) introduced this CHM and explained that it embodied a transparent and decentralised approach to disseminating information.

US initiatives: NBII, FGDC, etc.

The United States has also had a longstanding interest in developing a national focus for biodiversity information. Natural Heritage Data Centers, usually based on individual States, have existed since the 1970s [13]. The creation of a co-ordinating national centre for biodiversity information has been suggested by a variety of US organisations over the past few years [14]. The 1998 PCAST report has noted the importance of this task and suggested that it should develop out of the existing National Biological Information Infrastructure (NBII).

NBII is co-ordinated by the USGS and consists of an electronic gateway to biological data and information maintained by US government agencies at federal, state, and local levels and other partners [15]. A key component of NBII is the NBII Metadata Clearinghouse, a database of descriptions of biological databases and information products developed and maintained by the USGS and NBII partner organisations [16].

Two presentations at the Metadiversity symposium specifically concerned the NBII. Dr. Anne Frondorf (USGS/BRD) outlined some of the challenges involved in developing and implementing metadata standards for the diverse data types and information products that need to be included within NBII. She also stressed the importance of linking with parallel infrastructure efforts, for example in the spatial data community and, in particular, the Federal Geographic Data Committee (FGDC). The NBII Metadata Clearinghouse uses metadata in the form of a Biological Profile of the Content Standard for Digital Geospatial Metadata developed by the FGDC.

With reference to NBII-2, James L. Edwards (National Science Foundation) outlined the NBII Framework Plan, described as a “roadmap for interoperable sharing of biodiversity information”. NBII-2 would, of necessity, be a distributed facility which would require the development of interoperable systems that would permit the simultaneous searching of diverse types of data and which would also present this data in a visually useful way. NBII would be a nationally-based gateway to a variety of locally and regionally based biological data but would also need to interact with global gateways like the Global Biodiversity Information Facility (GBIF) being proposed by the Biodiversity Information Subgroup of the OECD Megascience Forum [17, 18].

John Moeller (FGDC) described the infrastructure issues that related specifically to geographic information from the perspective of the US National Spatial Data Infrastructure (NSDI). One key to the successful implementation of an NSDI has been the development of a metadata standard in the form of the (recently revised) FGDC Content Standard for Digital Geospatial Metadata [19]. The FGDC is also working with International Organization for Standardization (ISO) Technical Committee 211 (TC-211) Working Group 3 on the development of an international standard for geospatial metadata (ISO 15046-15) [20]. The NSDI’s current implementation of a distributed resource discovery system (based on FGDC metadata) is called the Clearinghouse [21]. The Clearinghouse uses Web technologies for the client side and ANSI Z39.50 for querying, searching, and the presentation of search results to the Web client. The NBII Clearinghouse is one of the ‘participating nodes’ in the Clearinghouse.

Metadata challenges

The symposium largely consisted of a series of presentations outlining the metadata challenges of biodiversity information in a variety of different contexts. Many of these described particular projects. Wayne Moore (Stanford University) described an implementation of the Lightweight Directory Access Protocol (LDAP) developed for managing flow cytometry data at Stanford’s Herzenberg Laboratory [22]. The session on libraries included a brief description of the Alexandria Digital Library (ADL) project by Linda Hill (University of California at Santa Barbara) [23]. ADL is concerned with the creation of a distributed digital library for georeferenced information. Hill described ADL’s concept of a gazetteer - consisting of indexes of geographical names, containing attributes for names, spatial co-ordinates and categories. Using a gazetteer like this, links can be made between spatial co-ordinates and indirect names (geographic names). Other presentations concerned particular aspects of biodiversity information management relating to species or ecosystems diversity.
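The gazetteer concept described by Hill can be sketched in a few lines of Python. This is an illustrative sketch only, not ADL's actual data model: the `GazetteerEntry` structure, the category labels and the bounding boxes (rough, approximate values in decimal degrees) are all assumptions made for the example. It shows the two directions of linkage: from a geographic name to a spatial footprint, and from a co-ordinate pair back to the names that cover it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    category: str
    # Bounding box in decimal degrees: (west, south, east, north).
    bbox: tuple


# Illustrative entries; the boxes are rough approximations, not survey data.
ENTRIES = [
    GazetteerEntry("Natural Bridge", "landform", (-79.6, 37.6, -79.5, 37.7)),
    GazetteerEntry(
        "Great Smoky Mountains National Park", "park", (-84.0, 35.4, -83.0, 35.8)
    ),
]


def footprint(name):
    """Name -> spatial footprint (case-insensitive match), or None."""
    for entry in ENTRIES:
        if entry.name.lower() == name.lower():
            return entry.bbox
    return None


def names_at(lon, lat):
    """Co-ordinates -> names of all entries whose box contains the point."""
    return [
        entry.name
        for entry in ENTRIES
        if entry.bbox[0] <= lon <= entry.bbox[2]
        and entry.bbox[1] <= lat <= entry.bbox[3]
    ]


print(footprint("natural bridge"))
print(names_at(-83.5, 35.6))
```

With a structure of this kind, a dataset indexed only by place name and one indexed only by co-ordinates can be searched together, because either form of reference can be converted into the other.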

Biological (species) information and museum data

One approach to helping conserve biodiversity is surveying and inventorying the diversity of life that exists on the planet. Edward O. Wilson says that biologists are hampered in this task because they have “only the faintest idea of how many species there are on earth or where most occur” [24]. Current estimates of the number of extant species on the planet are between 5 and 30 million, although the figure may be much higher, while only about 1.5 million species are currently known [25]. In addition, Robert May suggests that it is possible that half of all extant species will become extinct in the next 50 to 100 years - if current rates of tropical deforestation continue [26]. It is, therefore, important that something is done now. In this context, biological systematists have long been aware of the potential role of automation in the creation of an international integrated database of species or a taxonomic information system [27]. Stephen Blackmore of the Natural History Museum, for example, has noted that “foremost amongst the basic details needed in biological information systems are the names and the systematic relationships of all known species” [28]. An additional problem is that the sources of this information are themselves diverse and in different formats, both manual and digital. Any system would need to take account of the large amounts of legacy data (and metadata) that exist in the world’s museums, botanical gardens and libraries [29, 30].

A number of initiatives exist in this area and some of these were described at the Metadiversity symposium. Frank Bisby (The University of Reading) outlined progress on an ambitious project called Species 2000. This is attempting to create a distributed index of all the world’s known species [31, 32]. Species 2000 aims to create stable taxonomic indexes for individual groups of organisms. These indexes would be produced by a number of global species databases (GSDs) that would define a species index for a particular taxon or region. Examples of GSDs currently operating within Species 2000 are the International Legume Database and Information Service (ILDIS) [33, 34] and the Zoological Record from BIOSIS UK [35].

Projects based on specific locations or regions offer a potentially scalable approach to the wider biodiversity information problem. This is an approach taken by the Discover Life in America initiative. John Pickering (University of Georgia) described an initial project concerned with creating a comprehensive inventory of all life forms in the Great Smoky Mountains National Park called the All Taxa Biodiversity Inventory (ATBI) [36]. This project utilises both specialists (taxonomists and ecologists) and many other partners - including educational institutions, museums, government agencies and volunteers - to attempt to create a publicly accessible inventory of the estimated 100,000 species in the Park.

Dr. Bruce Collette (National Marine Fisheries Service (NMFS) Systematics Laboratory) described ITIS (Integrated Taxonomic Information System) - a relational database of scientific and common names for plants and animals [37]. Collette explained that ITIS was designed to replace the list of scientific names maintained by the US National Oceanographic Data Center (NODC) and currently contains over 266,000 names of plants and animals.

Museums of natural history contain many biological and palaeontological specimens (and their associated metadata) which relate to biodiversity. The Natural History Museum in London, for example, has approximately 68 million biological specimens [38]. Dr. Julian Humphries (University of New Orleans) commented that the management and wide dissemination of museum-created metadata about specimens had traditionally not been seen as being as important as conserving and managing access to the specimens themselves. The use of microcomputers by museums had resulted in the creation of a variety of disparate information systems. The challenge for museums is to make these heterogeneous systems interoperable and to be able to add metadata that is currently either in analogue form or non-existent. Dr. Ray Lester (Natural History Museum) followed this by asking important (and neglected) questions about who should pay for the creation (or retrospective conversion) of this potentially expensive metadata.

Ecosystem information

The ecological level of biodiversity is as important as species or genetic diversity. Large amounts of scientific data relating to ecology are currently gathered from a variety of sources: for example, from land stations, oceanographic surveying, satellites and remote sensing devices. Economic, socio-economic and demographic data can also be relevant. This data can, collectively, be used to monitor ecosystems and to provide information for policy making. One major problem for integration is that different systems use different spatial units. Dr. Roberta Balstad Miller (Consortium for International Earth Science Information Network (CIESIN), Columbia University) commented that remote sensing data, for example, is usually based on grids while socio-economic data typically uses political or jurisdictional boundaries. One challenge is integrating data access and dissemination via interoperable metadata systems. CIESIN itself has provided some metadata guidelines and also provides a gateway that gives access to GILS records describing information resources and data access systems available from CIESIN, and from its international co-operative partners [39, 40]. CIESIN is also developing a ‘Unified Metadatabase’ to integrate CIESIN’s diverse data, metadata, and document resources [41].
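The mismatch between gridded and jurisdictional spatial units can be made concrete with a small Python sketch. It is a deliberately simplified assumption, not a method described at the symposium: each grid cell is assigned to the region that contains its centre point, regions are plain rectangles rather than real boundary polygons, and the region names and values are invented.

```python
def aggregate_grid(cells, regions):
    """Average gridded values over regions.

    cells:   list of (lon, lat, value) tuples giving each cell centre.
    regions: dict mapping a region name to (west, south, east, north).
    Returns a dict of region name -> mean value (None if no cell falls inside).
    """
    totals = {name: [0.0, 0] for name in regions}
    for lon, lat, value in cells:
        for name, (west, south, east, north) in regions.items():
            if west <= lon < east and south <= lat < north:
                totals[name][0] += value
                totals[name][1] += 1
                break  # each cell contributes to at most one region
    return {
        name: (total / count if count else None)
        for name, (total, count) in totals.items()
    }


# Hypothetical one-degree grid cells and two invented rectangular regions.
cells = [(0.5, 0.5, 10.0), (1.5, 0.5, 20.0), (2.5, 0.5, 30.0), (3.5, 0.5, 50.0)]
regions = {"County A": (0.0, 0.0, 2.0, 1.0), "County B": (2.0, 0.0, 4.0, 1.0)}

print(aggregate_grid(cells, regions))
```

Once the gridded values are expressed per region, they can be compared directly with socio-economic figures collected for the same units. Real boundaries are irregular, so a production system would need proper polygon overlay and area weighting rather than the centre-point rule assumed here.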

The remainder of the presentations on ecosystem information management consisted of descriptions of particular initiatives and services. Lola M. Olsen (NASA Goddard Space Flight Center) described NASA’s Global Change Master Directory (GCMD), a metadata based directory of data sets of relevance to global change research [42]. The metadata format used is called GCMD’s Directory Interchange Format (DIF) [43]. DIF records can, if necessary, also be output as FGDC or GILS-type records.

Eliot Christian (USGS) followed this with an account of international work being carried out on a prototype service developed as part of the Environment and Natural Resources Management project of the G-7 Global Information Society Initiative (G7-ENRM). The service is called the Global Environmental Information Locator Service (GELOS) and uses a minimum set of metadata elements developed by the G7-ENRM Metainformation Topic Working Group (MITWG) [44]. In GELOS the metadata element set is used in conjunction with the ANSI Z39.50 search and retrieval protocol to demonstrate the integration of different types of environment and natural resources information held in numerous locations world-wide [45].

Stefan Jensen (Niedersächsischen Umweltministerium, Hannover) described the European Topic Centre on Catalogue of Data Sources (ETC/CDS), founded by the European Environment Agency (EEA) [46]. CDS supports the operation of the EEA-EIONET (European Environment Agency - Environmental Information and Observation Network) by giving access to metadata (‘meta-information’) relating to European environmental resources. The CDS core data model is based on an extended GELOS metadata element set.

Gerald S. Barton (National Oceanic and Atmospheric Administration) outlined recent work carried out by the Working Group on Information Systems and Services (WGISS) of the Committee on Earth Observation Satellites (CEOS). WGISS aims to help improve the accessibility of earth observation data, to enhance its complementarity, interoperability and standardisation and to facilitate the easier exchange of this data through networks [47]. A Task Team within WGISS is developing a Catalogue Interoperability Protocol (CIP) to facilitate the access, searching and retrieval of earth observation data [48].

Figure: Watercolour of the Blue Ridge Mountains, near Natural Bridge, Virginia, November 1998.

Summing up

The Metadiversity symposium demonstrated the extremely wide range of initiatives and projects that currently exist in the area of biodiversity information. The important role of metadata was noted by virtually all of the speakers. Achieving interoperability across distributed (federated) systems was also seen as a key challenge for the future management of biodiversity information. The final sessions were devoted to smaller working groups that built on the challenges identified by the speakers and began to suggest some solutions. What follows is a summary of some of these points.

  • It was noted that many different communities were represented at the symposium. This was felt to be useful. Successful biodiversity information management will depend upon this diversity of organisations and individuals talking to one another and sharing challenges and proposed solutions. This will need some degree of leadership. At least two of the working groups suggested the creation of an organisation that could co-ordinate work in this area, an international association of biodiversity data providers and users.
  • Funding biodiversity information services, including funding the ‘retrospective conversion’ of existing manual data, was seen as another major challenge. It was suggested that progress could be made in this area by encouraging cross-community linkages and with the co-operation of the information industry and the support of professional and learned societies.
  • There was a level of confidence that the technology existed (or was being developed) which would allow the creation of distributed and interoperable systems. However, ensuring the related ‘cultural change’ was more problematic. It was noted several times that interoperability is not just a technological problem but also needs to be applied to the professional cultures of various communities.

The Metadiversity symposium was a good opportunity for a wide range of people interested in biodiversity information management to meet and learn about ongoing projects, common problems and the role of metadata. The longer-term success of the symposium, however, depends on what comes next. It is important that the wide range of organisations involved in biodiversity information find ways to ensure continued communication and co-operation.

References

  1. President’s Committee of Advisers on Science and Technology, Teaming with life: investing in science to understand and use America’s living capital. Washington, D.C.: Executive Office of the President of the United States, March 1998.
    http://www.whitehouse.gov/WH/EOP/OSTP/Environment/html/teamingcover.html
  2. Dublin Core initiative.
    http://purl.org/dc
  3. Resource Description Framework.
    http://www.w3c.org/rdf/
  4. Biodiversity Action Plan Secretariat, Biodiversity Fact Sheet 2: The UK Biodiversity process. Peterborough: Joint Nature Conservation Committee, 1998.
    http://www.jncc.gov.uk/ukbg/fs2.htm
  5. The Clearing House Mechanism of the Convention on Biological Diversity.
    http://www.biodiv.org/chm.html
  6. National Biological Information Infrastructure (NBII).
    http://www.nbii.gov/
  7. NBII Metadata Clearinghouse.
    http://www.emtc.usgs.gov/http_data/meta_isite/nbiigateway.html
  8. Global Biodiversity Information Facility (GBIF). York: Biosis, September 1998.
    http://www.york.biosis.org/gbif/index.htm
  9. Hannu Saarenmaa, A possible technical implementation of the Global Biodiversity Information Facility. Copenhagen: European Environment Agency, 1998.
    http://www.eionet.eu.int/gbif/gbif-implementation-latest.html
  10. Federal Geographic Data Committee, Content standard for digital geospatial metadata (revised). FGDC-STD-001-1998. Washington, D.C.: Federal Geographic Data Committee, June 1998.
    http://www.fgdc.gov/metadata/csdgm/
  11. ISO/TC 211 Geographic information/Geomatics.
    http://www.statkart.no/isotc211/welcome.html
  12. FGDC Spatial Data Clearinghouse.
    http://www.fgdc.gov/clearinghouse/index.html
  13. Stanford University Medical School, Stanford Shared FACS Facility.
    http://curie.stanford.edu/sff/
  14. Alexandria Digital Library (ADL).
    http://www.alexandria.ucsb.edu/
  15. Species 2000.
    http://www.sp2000.org/
  16. International Legume Database and Information Service (ILDIS).
    http://www.ildis.org/
  17. BIOSIS, Zoological Record. Information available from:
    http://www.york.biosis.org/zrdocs/zrprod/zoorec.htm
  18. All Taxa Biodiversity Inventory (ATBI), Great Smoky Mountains National Park, 107 Park Headquarters Road, Gatlinburg, Tennessee, TN 37738, USA.
    http://www.discoverlife.org/
  19. Integrated Taxonomic Information System (ITIS).
    http://www.itis.usda.gov/plantproj/itis/
  20. CIESIN Metadata Administration, CIESIN Metadata Guidelines (draft). University Center, Mich.: Consortium for International Earth Science Information Network, January 1998.
    http://www.ciesin.org/metadata/documentation/guidelines/
  21. CIESIN GILS Access System.
    http://wwwgateway.ciesin.org/cgi-bin/zgate
  22. CIESIN Initiatives.
    http://www.ciesin.org/metadata/TOC/init.html
  23. Global Change Master Directory (GCMD).
    http://gcmd.nasa.gov/
  24. Lola Olsen, Directory Interchange Format (DIF): writer’s guide, Version 6.0. Greenbelt, Md.: Global Change Master Directory, March 1998.
    http://gcmd.gsfc.nasa.gov/difguide/difman.html
  25. Global Environmental Information Locator Service (GELOS).
    http://ceo.gelos.org/
  26. G7-ENRM Metainformation Topic Working Group, ENRM Metadata Element Definition Paper, 21 July 1997.
    http://unfccc.gelos.org/free/REPORTS/Attribute.html
  27. European Topic Centre on Catalogue of Data Sources (ETC/CDS).
    http://www.mu.niedersachsen.de/cds/
  28. CEOS Working Group on Information Systems and Services (WGISS).
    http://193.36.230.105/wgiss/
  29. CEOS Protocol Task Team, Catalogue Interoperability Protocol (CIP), 13 July 1998.
    http://harp.gsfc.nasa.gov/~eric/cip-page3.html

Acknowledgements

The author would like to thank Dr. Dick Kaser (Executive Director, NFAIS) for his invitation to participate in the Metadiversity symposium and for help with travel expenses. A copy of the paper the author delivered at the Metadiversity symposium can be found at: http://www.ukoln.ac.uk/metadata/presentations/metadiversity/.

Author details

Michael Day
Research Officer
UKOLN: the UK Office for Library and Information Networking
University of Bath
Bath, BA2 7AY, UK
Email: m.day@ukoln.ac.uk