NetLab's Digital Library Gâteau

Jessica Lindholm

Jessica Lindholm reports from the conference "NetLab and friends: Tribute and outlook after 10 years of digital library development". The conference was held in Lund, Sweden, 10-12 April 2002.

Every future must have a past

How did you celebrate your tenth birthday? Perhaps by making a nice birthday cake with all your favourite ingredients to share with your friends? NetLab [1], the research and development department at Lund University Libraries [2], celebrated its tenth anniversary in April 2002 with a three-day conference in Lund, Sweden [3]. This gâteau consisted of topics on digital library development, divided into five pieces: “Semantic web and knowledge organisation”; “Interoperability and integration of heterogeneous sources”; “Visions, future issues and current development”; “The Nordic situation”; and the surprise session “Tension between visions and reality”.

When a conference is announced as a ten-year anniversary, some looking back at the past must be expected, and yes, NetLab was duly celebrated with both flowers and nostalgia from speakers and participants. There were 140 participants at the conference, from 23 countries.

The welcome speech was given by university librarian Lars Bjørnshauge (Lund University Libraries), with a warm welcome to all partners, colleagues and friends present. Lars Bjørnshauge also paid tribute to NetLab’s contribution to digital library related technology on the web, and in particular to the ongoing development of digital libraries and related work carried out during the past ten years.

Visions, future issues and current development

The opening and closing speeches were on the theme “Visions, future issues and current development”. No bold predictions about the future were made, but the speakers presented some of the topical issues currently occupying the community.

Stuart Weibel (OCLC Office of Research) gave the view from a standards community involved in the development of metadata frameworks and vocabularies, i.e. the Dublin Core Metadata Initiative (DCMI) [4]. Stuart Weibel started by giving an overview of Dublin Core development, and then went on to experiences of workflows and business models for standard-building communities.

DCMI works as a consensus-building community, and consensus often means finding a balance between being really good and good enough: how do we keep a standard that aims to facilitate the discovery of information resources both simple enough and rich enough to be useful? DCMI strives for global consensus, which is inevitably hard to achieve. At the same time DCMI encourages discussion in order to maintain this balance between simplicity and richness; in a sense, better is the enemy of good enough. Standard-building communities work from shared values: neutral models; open collaboration in which ideas are monitored; and work built on trust. These shared visions and motivations also mean sharing common problems and objectives.

How does the business model for a standard-building community work? It is really hard to sell paper copies of your standard when you are on the Web. The standards community sells a good idea, in this case the value of structured metadata and metadata management. Stakeholders hopefully adopt the idea, and bring services and applications of the standard to the users. The community that finishes the provided prototypes will be the one to see “the light in the user’s eyes”. It is the practitioners who make the tools usable, whereas too many researchers risk making the ideas invisible. This is a balancing act: the juicy stuff must also be kept, in order to keep the researchers interested and development ongoing.

Lorcan Dempsey (OCLC Office of Research) gave an overview of life in a shared network space. The journey started from the tradition of sharing on which libraries rely: shared cataloguing (60s), shared resources (70s), and shared discovery and delivery (90s). In the 00s the discussion and work are on collaborative reference, archiving, digitisation and scholarly communication. As resources and services have become digital, libraries have progressively entered a shared workspace. Being digital suggests a reorganisation from vertical organisation around the collection to horizontal organisation around the processes, to best leverage individual and collective strengths.

As regards scholarly communication, we are moving from the chain

author -> publisher -> distributor -> library

to all actors being on the Web, where they can potentially all be in contact with each other. We find co-evolved institutional forms for research, learning and cultural engagement in the services that are being offered: e.g. in the digital library; the public website; the corporate intranet; the student intranet; and in lots of other services. These represent some of the initiatives that could be better coordinated. In many cases the overlaps have to do with the current organisational frameworks.

At the other end of the line, researchers produce material without knowledge of how their work is being aggregated by libraries. What is today’s role for the library? Lorcan Dempsey provided some discussion points. Services like Google and Amazon make you believe that you access everything - but do you really? In a shared workspace we handle ‘non-published unique material’ (e.g. preprints), but also the traditional ‘published non-unique material’. Catalogues and search engines make ‘second order judgements’ more popular, and Google tends to be the judge of what people will use (a threat for librarians?). There are many different models for future collaboration, but we cannot really know which models will be sustained. Also, as the environment changes so quickly, the shared space calls for active reshaping in a new medium. Libraries need to have organisational frameworks for how to provide specialisation in the networked environment.

Interoperability and integration of heterogeneous sources

The next session focused on building and deploying services to provide interoperability. The main focus of the speakers was on different ways of assuring the users’ access to information and providing reusable data, e.g. via Z39.50, RDF (Resource Description Framework) and LDAP (Lightweight Directory Access Protocol).

Andy Powell (UKOLN, University of Bath) gave an overview on current digital library technical standards; ways those standards are being combined to support various initiatives; trends in portal developments; and the impact on development of digital library services.

One of the many angles in Andy’s presentation concerned exposing metadata. To enable discovery of and access to resources, content providers need to make their metadata available through searching (such as Z39.50), harvesting (such as OAI-PMH) and/or alerting (such as RSS (RDF Site Summary)) technologies. One ‘collection description’ may have more than one service description, e.g. OAI harvesting and a Z39.50 target. Fusion services, such as brokers (searching) and aggregators (harvesting and alerting), may sit between the actual provider and the portals.
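
As a rough illustration of the harvesting route, an OAI-PMH request is an ordinary HTTP request carrying a verb parameter. The sketch below only builds such a request URL and does not contact any server; the base URL and the function name are hypothetical and are not taken from any service presented at the conference.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Build an OAI-PMH ListRecords request URL (no network access)."""
    # 'verb' and 'metadataPrefix' are standard OAI-PMH request arguments;
    # 'oai_dc' is the unqualified Dublin Core format.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        # Optional selective harvesting: only records changed since this date.
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical repository address, used purely for illustration.
url = build_list_records_url("http://repository.example.org/oai",
                             from_date="2002-04-10")
print(url)
```

An aggregator in the sense described above would issue requests of this kind against many providers and merge the returned records.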

Andy Powell also discussed the possibilities of portlets. Portlets provide ‘the building blocks for portals’, and typically each portlet offers a small chunk of functionality, such as a cross-search or the display of a news channel (i.e. the components end users see within portal pages).

Sebastian Hammer (Indexdata) gave a presentation on the search and retrieval protocol Z39.50. Z39.50 was once seen as the protocol that would solve many of our problems with cross-searching and accessing multiple collections. But as things have evolved, Z39.50 has remained within the library community. The virtual union catalogue itself is no longer seen as a replacement for the union catalogue, but rather as a supplement. Z39.50 works well for localised implementations. Z39.50 is successful in communication between different services and platforms, but an increased number of resources brings increased problems with reliability and availability. Z39.50 solutions with service-specific profiles, such as the Bath profile, make the protocol valuable and useful. The real problem is the semantics of the metadata: its cataloguing rules and editorial functions.

Libby Miller (ILRT, University of Bristol) gave an illuminating presentation on RDF and RDF Query. Libby stressed that RDF is not the syntax but the underlying informational model, and that RDF describes data as graphs. Libby Miller built her presentation on examples, such as her work with Squish, an RDF query language [5], to illustrate how to access information in a way that mirrors the flexibility of the RDF informational model. As Libby herself pointed out, “it is easy to do the demo, but harder to show you the fantastic possibilities, a typical example of the Semantic Web”.

Why use RDF as the structured data format, with its associated modelling strategies and tools (‘Semantic Web technologies’)? The answer Libby Miller gives is that if you have, and always will have, control over your data, you do not need RDF. Control also means knowing that the data will not change over time, and knowing what you might want to combine your data with in the future. RDF is instead useful with other people’s data (data which you do not own) and for reuse of data. Distributed data is difficult to control, since you rely on other people to provide it. The RDF model uses certain principles for modelling data which help with interoperability. There are three principles for creating interoperable data: use RDF as the informational model; use universal resource identifiers (such as URLs); and define the structure of information (write schemas).

Libby’s paper “RDF Query by example”, prepared for the conference, is available from <http://ilrt.org/discovery/2002/04/query/>.
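
The idea that RDF is a graph model rather than a syntax, and that shared identifiers make other people's data combinable, can be sketched in a few lines. This is a minimal illustration in plain Python and not Squish or any RDF toolkit: the `match` helper and the example.org resources are hypothetical, and only the Dublin Core element namespace is a real identifier.

```python
# Each statement is a (subject, predicate, object) triple. Subjects and
# predicates are URIs, so statements from independent sources can be merged.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CREATOR = "http://purl.org/dc/elements/1.1/creator"

# Hypothetical data held by "us".
library_graph = {
    ("http://example.org/doc/1", DC_TITLE, "RDF Query by example"),
}

# Hypothetical data published by someone else about the same resource.
other_graph = {
    ("http://example.org/doc/1", DC_CREATOR, "Libby Miller"),
    ("http://example.org/doc/2", DC_CREATOR, "Someone Else"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Merging two graphs is just a set union, because identifiers are global.
merged = library_graph | other_graph

creators = [o for _, _, o in match(merged,
                                   subject="http://example.org/doc/1",
                                   predicate=DC_CREATOR)]
print(creators)
```

The query pattern with wildcards is the same basic idea that RDF query languages build on, although real query languages add variables shared across several patterns.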

Peter Gietz (DAASI International) ended this session with a presentation on LDAP-based repositories (LDAP as a network protocol and an information model for accessing information in directories) with particular focus on metadata and ontologies.

Semantic web and knowledge organisation

This theme focused on knowledge structuring and classification issues. The overview was given in an inspirational speech by Diane Vizine-Goetz (OCLC Office of Research), outlining the Semantic Web vision, its core technologies and the need for knowledge organisation. The vision was shown e.g. by quoting Tim Berners-Lee:

	The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.
		Berners-Lee, T, Hendler, J & Lassila, O. “The Semantic Web”, Scientific American, May 2001.

The key technologies are XML, RDF, XML namespaces (such as DC1.1), knowledge organisation systems (e.g. Dublin Core) and ontologies. Ontologies give for instance the definitions of terms, relationships and data structures.

The Semantic Web is a very compelling goal, but will no doubt require extra effort and extra cost. Playfully, but with conviction, Diane Vizine-Goetz said that it not only could, but also should, be librarians who define and carry out the task of getting the information well defined. As this happens, software agents will become more widely available.

Joseph Busch (Interwoven) also presented views on the Semantic Web: we are still in the early days of this development, but there is interest in what is being done (also from the industrial sector). Joseph Busch was one of the few speakers from the commercial sector, and introduced himself and his company as selling librarians’ thoughts in a commercial world. Joseph Busch discussed XML DTDs and other alternatives, such as the framework VocML (Vocabulary Markup Language).

One of the issues he stressed was that even though the structure may be important, the content is more so. If we have control of the processes and resources we can use them. Put simply, we need to have the content published, present the framework for describing this content, and provide some applications. Libraries must define their role in this process: to focus on being the transparent middle layer, assembling content in a seamless way that is not visible to the users. Joseph also brought up the subject of NKOS and the registration of vocabularies. From such registries we can expect more issues to discuss, but also knowledge organisation schemes and lots of interest. The state of the art is less advanced than we would like it to be.

The two other presentations in this session gave further depth to the work at hand. Martin Doerr (ICS-FORTH) gave a presentation on the challenge of semantic problems and aspects in thesaurus mapping. “RDF Schema for Thesauri” was a presentation given by Phil Cross (ILRT, University of Bristol) on the interesting possibilities for storing and expressing thesaurus terms and thesauri in RDF Schema for Thesauri (a proposal) - to leverage the thesaurus usage on the Semantic Web.

Nordic libraries and their digital library solutions

The Nordic countries were relatively early implementers, yet despite apparently similar circumstances and conditions there are highly contrasting Nordic digital library solutions and policies.

Mogens Sandfaer (chair) and Juha Hakala

An overview of past and current Nordic projects was given by Juha Hakala (Helsinki University Library). The Nordic libraries can rely on a long tradition of collaboration, and there are also some Nordic projects on the web which have collaborated successfully. One of the examples given was the Nordic Web Archive (NWA) [6], whose goal is to archive Web documents for future generations (as a part of each library’s legal deposit obligation). The Swedish Web space has been archived several times with the Combine harvester, which originates from NetLab but has been modified to serve the special purposes of harvesting for preservation (e.g. scheduling rules: the harvester must fetch an entire document at the same time, including all images etc.). Indexing is done by the Norwegian search engine FAST. FAST recognises Nordic languages and can hold billions of files.

Juha also briefly presented the coming Scandinavian Virtual Union Catalogue (SVUC) [7], which will have full functionality by next summer, with approx. 15-20 million records. Juha pointed out the long tradition of cooperation between libraries in the Nordic countries, e.g. in inter-library loans (ILL).

The national libraries seem to be shifting from bibliographic to content description. BIBSYS [9] is the union catalogue of Norway, and is used by all Norwegian university libraries, the national library, all college libraries, and a number of research libraries. Ole Husby (BIBSYS) opened his presentation by saying that in a way we still live in the 1960s when it comes to our insufficient understanding of bibliographic elements and structure, attributes and relations. The model of thinking that Ole and BIBSYS have embraced is IFLA’s model of functional requirements for bibliographic records (FRBR) [10]. BIBSYS has not yet entered the implementation stage. One reason for this wait is the almost impossible task of converting MARC records automatically into a FRBR-based system. The first BIBSYS implementation is likely to be for periodicals, based on the four levels in FRBR.
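
The four levels referred to are FRBR's work, expression, manifestation and item. A minimal sketch of that hierarchy for a periodical might look as follows; the class layout, field names and sample data are assumptions made for illustration and do not describe the BIBSYS design.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    identifier: str  # one concrete copy, e.g. a shelf mark (hypothetical)


@dataclass
class Manifestation:
    description: str  # a physical or electronic embodiment
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    description: str  # a realisation of the work, e.g. a language version
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str  # the abstract intellectual creation
    expressions: List[Expression] = field(default_factory=list)


def count_items(work: Work) -> int:
    """Count the individual copies reachable from a work."""
    return sum(len(m.items)
               for e in work.expressions
               for m in e.manifestations)


# Hypothetical periodical with a print and an online manifestation.
journal = Work(
    title="Example Journal",
    expressions=[
        Expression(
            description="English text",
            manifestations=[
                Manifestation("Print edition", [Item("copy-1"), Item("copy-2")]),
                Manifestation("Online edition"),
            ],
        )
    ],
)
print(count_items(journal))
```

A flat MARC record typically mixes attributes from several of these levels, which gives some intuition for why automatic conversion is described as so difficult.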

FRBR has many connections in the community, with similarities to e.g. ABC Harmony and RDFS. FRBR also moves the perspective closer to the descriptive focus of the web, by being more subject focused than item focused.

Furthermore we were given input from Sweden on issues surrounding management of a pre-prints service for Economics, in a talk given by Sune Karlsson (Stockholm School of Economics) with experiences from Swopec [8] and knowledge organisation in the service.

Denmark was represented by Birte Christensen-Dalsgaard (State university library, Århus). She gave a presentation on the interplay between local and national solutions in Denmark, which can be summarised in an important appeal: Share work, experiences and costs! We must consider how we get and maintain data, and keep the local services focused on the target audience. By also providing facilities for e.g. harvesting and cross-searching, the metadata can be repurposed and reused in more generic services.

Tension between visions and reality

Two shared presentations were given on the theme “Tension between visions and reality”. This theme was also known as the “Surprise session”, since the four extra speakers were invited only a few weeks before the conference.

Slightly unexpected at a conference in such northern latitudes were the representatives from university libraries in nine Sub-Saharan African countries. Their participation was due to workshops held in conjunction with the conference for the CELI Project [12].

One of these visitors was also invited to speak in the session “Tension between visions and reality”. Vitalicy Chifwepa (University of Zambia) gave the paper “Internet and information provision: the promises and challenges in African university libraries”, in a presentation shared with Jörgen Eriksson (NetLab, Lund University Libraries). One conclusion of Vitalicy’s presentation (among other things that came up in the surrounding discussions) was that “as African countries become connected, issues of accessing and disseminating worldwide information resources have been improving. To address some of the challenges and constraints above there has been a number of projects and programmes” but “these will, however, need to be backed by solid provisions for sustainability by the African Universities.” [13] The full paper prepared for the conference is to be found at <http://www.lub.lu.se/netlab/conf/chifwepa.html>.

Michael Day and Roddy MacLeod

The next presentation, “Two views of the digital library”, aimed at illustrating the difference between the researcher and the practitioner. Due to lack of time this discussion was unfortunately kept very short. The one who gets to see the “light in the user’s eyes” in this session was Roddy MacLeod (Heriot-Watt University) as manager of EEVL. EEVL [11] provides access to information resources in engineering, mathematics and computing. In the other corner of the ring was the researcher Michael Day (UKOLN, University of Bath), representing applied and project-related research in a multidisciplinary environment.

Closing session

During these three days many subjects were explored, and different speakers repeatedly called attention to the importance of well-defined content (in order for the structures to succeed), whether for harvesting, searching or other purposes. Other key questions are long-term preservation and the need to define tasks and find ways of improving collaboration. The event itself was surrounded by interesting meetings and activities with connections to current activities among NetLab staff, such as the NKOS special meeting (on standards developments regarding the use of thesauri, classification and other knowledge organisation systems on the internet [14]) and project meetings for the European projects ETB (European Treasury Browser) [15] and Renardus [16]. DCMI had a board meeting, and workshops were held by NetLab staff on the development of access to information and subject gateways in Africa, i.e. CELI [12].

The closing session was held by Anna Brümmer (BIVIL), who blew out the ten candles on NetLab’s anniversary gâteau and thanked everyone for coming and for being good friends. Not only was this a nice way of celebrating an anniversary; as pointed out already in the welcome speech, this conference is likely to pave the way for interesting new ideas for projects and collaboration.

References

[1] NetLab, <http://netlab.lub.lu.se/>
[2] Lund University Libraries, <http://www.lub.lu.se>
[3] NetLab and Friends, Conference <http://netlab.lub.lu.se/conf/>
[4] DCMI <http://dublincore.org>
[5] Inkling: RDF query using SquishQL, <http://swordfish.rdfweb.org/rdfquery/>
[6] Nordic Web Archive, <http://nwa.nb.no/>
[7] SVUC, <http://www.lib.helsinki.fi/svuc/>
[8] Swopec, <http://swopec.hhs.se>
[9] BIBSYS, <http://bibsys.no>
[10] IFLA Functional Requirements for Bibliographic Records, 1998 <http://ifla.org/Vii/s13/frbr/frbr.pdf>
[11] EEVL, <http://www.eevl.ac.uk>
[12] CELI, <http://netlab.lub.lu.se/sida/celi/>
[13] Chifwepa, Vitalicy, “Internet and information provision: the promises and challenges in African university libraries”. Presented on 10 April at the NetLab and Friends Conference, 10-12 April 2002. <http://www.lub.lu.se/netlab/conf/chifwepa.html>
[14] NKOS Special meeting at the NetLab conference, April 12, 2002. <http://www.lub.lu.se/~traugott/NKOS-Lund.html>
[15] ETB Project <http://eun.org/etb/>
[16] Renardus Project <http://renardus.org>

Author Details

Jessica Lindholm is part of the Research & Development team within UKOLN, University of Bath. At the time this issue of Ariadne is published she has resumed her work at NetLab, Lund University Libraries.