Fedora Users Conference
The Fedora Users Conference 2006, for users of the open source Fedora repository system [1], was the second to be run under the auspices of the core Fedora development team, following an initial conference at Rutgers University in June 2005. It is, though, one among a collection of conferences and meetings that have taken place over the past year based on using Fedora, with venues as diverse as Copenhagen, Aberystwyth, Sydney, and Hull. There has been a great deal of activity in the past twelve months and this was reflected in the wide range of presentations in Charlottesville.
Full details of the conference, including presentations, are available via the conference Web site [2]. The presentations are reported here individually or in groups, according to themes that emerged during the conference.
Opening Plenary Session
Thornton Staples, University of Virginia and co-Director of Fedora’s development
Thorny Staples started the conference with two assumptions: that attendees are in the business of dealing with information/content that is intended to persist, and that the Web is the way of working with this information/content. Fedora is designed to meet both.
In considering the information/content we work with today, there is a need to think in terms of ‘aggregation objects’. These have existed in traditional forms for many years: a newspaper, for example, is an aggregation of many individual articles. More recently, every Web site is an aggregation object, combining many smaller pieces and files. How can we deal with this level of aggregation, and potential disaggregation, of digital objects? Standards (technical, structural and organisational) will be of huge benefit here. Fedora, for its part, offers a foundation on which to build different architectures. It allows arbitrary units of content to be assigned behaviours and structural relationships with other units, so that they can be flexibly managed and recombined as required. Supporting this requires a pragmatic approach to breaking digital objects down into their constituent parts, but an atomistic view is beneficial. Decisions about the level and degree of aggregation/disaggregation are, he concluded, an art, not a science.
Plenary Session:
Fulfilling the Fedora Vision: Taking on Complex Problems and Providing Flexible Solutions
Sandy Payette, Cornell University and co-Director of Fedora’s development
Sandy Payette followed Thorny Staples’ opening plenary by describing her own vision for Fedora and how the system can meet current demands. There is no doubt that the environment is changing: Grid and Humanities computing have produced novel and innovative ways of facilitating science and digital scholarship. Repositories sit at the heart of these developments and of trends such as SOA, Web 2.0 and the Semantic Web, and Fedora can fit in with all of these. However, it is vital to look at what people want rather than give them what we think they need. Repositories are enablers within this networked environment, not just places for storage, search and access.
How does Fedora measure up currently in its ability to deal with the multitude of digital content? Fedora has a flexible digital object model that can accommodate many different scenarios, and can enable relationships between objects through the use of RDF. The Fedora Services Framework allows other components to be sited alongside Fedora and integrated with it. Fedora can also support long-term digital preservation: its use has been mapped onto the OAIS Reference Model to demonstrate how Fedora can sit within an overall preservation workflow. Additionally, Fedora makes use of XACML (eXtensible Access Control Markup Language [3]) to support flexible and fine-grained authorisation. The software has been scaled to 2 million objects, amounting to over 160 million RDF triples in the associated Resource Index.
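The aggregation idea and Fedora’s use of RDF for inter-object relationships can be illustrated with a minimal Python sketch that stores relationships as subject-predicate-object triples and queries them in both directions. The object identifiers (`demo:issue1`, `demo:article2` and so on) are hypothetical, and the `isMemberOf` predicate and its namespace are modelled on the relationship vocabulary used in Fedora’s RELS-EXT datastream, which should be treated as an assumption; this is neither Fedora’s own API nor the Resource Index query interface.

```python
from typing import NamedTuple

# Assumption: predicate namespace modelled on Fedora's relations-external vocabulary.
PREFIX = "info:fedora/"
MEMBER_OF = PREFIX + "fedora-system:def/relations-external#isMemberOf"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def pid_uri(pid: str) -> str:
    """Turn a (hypothetical) object PID into an info:fedora URI."""
    return PREFIX + pid


def member(child: str, parent: str) -> Triple:
    """Build an isMemberOf triple linking a child object to an aggregation."""
    return Triple(pid_uri(child), MEMBER_OF, pid_uri(parent))


# A newspaper issue aggregating three articles; article2 is also reused
# in a separate thematic collection without being copied.
triples = [
    member("demo:article1", "demo:issue1"),
    member("demo:article2", "demo:issue1"),
    member("demo:article3", "demo:issue1"),
    member("demo:article2", "demo:themeCollection"),
]


def members_of(container: str, graph: list[Triple]) -> list[str]:
    """Aggregation view: PIDs of everything that is a member of `container`."""
    target = pid_uri(container)
    return sorted(t.subject[len(PREFIX):] for t in graph
                  if t.predicate == MEMBER_OF and t.obj == target)


def containers_of(pid: str, graph: list[Triple]) -> list[str]:
    """Disaggregation view: PIDs of every aggregation `pid` belongs to."""
    source = pid_uri(pid)
    return sorted(t.obj[len(PREFIX):] for t in graph
                  if t.predicate == MEMBER_OF and t.subject == source)


print(members_of("demo:issue1", triples))       # ['demo:article1', 'demo:article2', 'demo:article3']
print(containers_of("demo:article2", triples))  # ['demo:issue1', 'demo:themeCollection']
```

Because article 2 belongs to both the issue and a thematic collection, the same unit of content takes part in two aggregations without being duplicated, which is the kind of flexible recombination described in the opening plenary.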
Current development priorities are, first, a content model dissemination architecture (CMDA) [4], which will provide a mechanism for sharing content models and associating them with disseminators (a development current Fedora users consider extremely useful); and second, continued work on the Fedora Services Framework, including a message-brokering service to handle communication between the different, and growing number of, components that can link into the Framework.
A VITAL Business? | Evaluating the Role of Vendor Support for Open Source Repository Components in a Research Library Environment
Carl Grant, President and COO, VTLS Inc. / Jeffrey Barnett, Yale University
The means by which open source software can be provided and supported was the topic of two paired presentations early on in the conference. Carl Grant from VTLS [5] presented his views of how a commercial company can make use of open source software to provide services that the community finds valuable. VITAL [6] is VTLS’s repository product based on Fedora. Jeffrey Barnett from Yale University described the use of Fedora with vendor support within Yale University Library.
For a vendor such as VTLS there is an issue about what is open and what is confidential. Customers expect openness because the software is open source, but they also need to respect VTLS’s confidentiality where it applies to VITAL-specific add-ons to Fedora; ongoing communication between the company and its customers is essential to make this work. VTLS, for their part, recognise their dependency on Fedora development and their need to contribute back to the open source community from which they have benefited.
Selling support is becoming increasingly important as the main business model for VITAL. This was not always the case: many early Fedora adopters had their own development teams and so had little interest in vendor support. Fedora use has since spread, however, to institutions where such in-house support is less prevalent.
Yale does have a development team that could support Fedora, but there was a desire not to be the first on the block with such software, and vendor support offered reassurance that the issues had been addressed previously.
The benefits of such support were summarised as follows:
- It reduces risk
- You get more functionality (at least in the case of VITAL, which comes with additional supported modules)
- You get better support (based on previous experience)
- It reduces the learning curve
- It can lead to possible development partnerships with the company, reducing the costs of custom development
Notwithstanding the attraction of supported software, the TANSTAAFL (There Ain’t No Such Thing As A Free Lunch) principle applies. Significant training was required to bring library staff up to speed on metadata and XML. Meeting the need for community input also takes time.
Yale’s tests revealed a number of gaps between VITAL’s functionality and Yale’s requirements. However, VTLS’s position as the supplier of Fedora means that these gaps can be addressed jointly. In summary, Yale’s experience of using VITAL has been a positive one.
A Fedora Architecture to Support Diverse Collections | An Approach to Modelling Rich Content and Disseminators: Promoting Interoperability and Reuse of Content in the Tufts Digital Repository
Ryan Scherle and Jon Dunn, Indiana University/Rob Chavez, Tufts University
Indiana [7] and Tufts [8] Universities both described their use of the Fedora repository to manage a wide range of digital content that had previously been managed in many different ways. At Indiana, this situation motivated the adoption of a repository as a single management tool, for the following reasons:
- A repository allows the centralisation of access and preservation functions
- A repository reduces the level of staff time required to attend to the many different current systems used
- A repository permits easier generation of new collections
- A repository enables digital preservation
Both Indiana and Tufts are re-engineering their digital collections around Fedora. Despite the diverse range of media types, they have aimed to produce a small set of behaviours that can apply across all these types, to make them easier to manage. Indiana has also produced, and continues to produce, a range of tools to help with working with Fedora, including ingest tools and a METS metadata navigator (available through SourceForge [9]).
One particular area of interest at Tufts is “asset actions”. These have emerged from work undertaken within the DLF Aquifer Project [10] and represent ways of providing actionable URIs in harvestable metadata. They are not exclusive to Fedora, but can be implemented with Fedora disseminators to provide interoperable ways of delivering access to objects.
Publishing in the NSDL: Fundamental Concepts for Creating and Reusing Content | Building a National Science Digital Library on Fedora
Carol Minton Morris, NSDL Communications Team, Cornell University/Dean B. Kraft, NSDL
The NSDL (National Science Digital Library) [11] is the online library for the NSF (National Science Foundation) in the US. The NSDL has a Communications Team, based at Cornell University, which aims to promote the purpose and use of the NSDL among its target populations. A key focus of this team is to bring people together to avoid failures of communication that can lead to technical failures.
Carol described the background to NSDL’s On Ramp initiative, an ongoing effort, continuing into 2007, to develop user services on top of NSDL’s Fedora-based content management system, the NSDL Data Repository (NDR). Terminology was a key aspect of this work, to ensure a common understanding of what the NSDL was offering.
The current NSDL instance contains over 100 collections within Fedora. The aim of NSDL 2.0 (the current version) is not unlike that of Web 2.0: to provide a system and content that is service-based and re-mixable. It was this aim that led the NSDL team to select Fedora for the NSDL Data Repository that underpins NSDL 2.0. Key features of Fedora identified included the ability to work with multiple object types and the use of RDF to express relationships between objects.
Dataset Acquisition, Accessibility, and Annotation e-Research Technologies (DART) Project: Fedora and the New Collaborative e-Research Infrastructure | The eSciDoc Project: An Overview
Andrew Treloar, Monash University/Matthias Razum, FIZ Karlsruhe
Andrew Treloar described the work of the DART initiative [12], a large-scale project (A$3.23 million) running over 18 months and employing 40 staff. DART is investigating new ways of curating e-research datasets and allowing them to be used effectively within the scholarly communication chain. Issues being addressed include the required infrastructure, the processes of deposit, access and annotation, and IPR concerns. The work has been inspired by the Pathways Project [13] at Cornell University and Los Alamos National Laboratory.
The project is building a series of demonstrators that can be embedded within research teams and developed iteratively based on their feedback. Their role is to show the value of the end-to-end DART lifecycle approach. The test subject areas are protein X-ray crystallography, climate research, and digital history. Fedora sits at a number of junctures within the DART lifecycle: it is the basis for managing data and information, it facilitates collaboration and annotation, it assists with publication, and it acts as the main access point for discovery.
Matthias Razum described the equally ambitious eSciDoc Project [14], a US$12 million, five-year project to build an integrated information, communication and publishing platform for Web-based scientific work within the Max Planck Society. It is not just an R&D exercise: it aims to deliver a working system for production use.
Repositories help curate the institutional memory of an organisation and, like memory, should be open to new input and associations. If they are, they can form the basis of new systems to support scholarship. Such repositories need to be interdisciplinary; moreover, they need to be open, application-independent and flexible enough to allow their content to be re-purposed. eSciDoc supports the generation of knowledge through such repositories, from first ideas through to publishing, including collaborative work and interactive authoring; Fedora was identified as a system that can enable these aims.
Development of the Fedora Generic Search Service | Considerations about a Peer-to-peer Service for Fedora
Gert Schmeltz Pedersen, Technical University of Denmark
Gert Schmeltz Pedersen gave two presentations on separate developments to enhance the searching of Fedora repositories. The Technical University had previously developed a tool that allowed Fedora to be searched using the Zebra search engine. It then decided to turn this into a more generic tool that could accommodate any search engine of choice through a plug-in model.
The new generic search service [15] has been developed as a Web application that can sit within the Fedora Services Framework and connect an existing Fedora repository to an indexing and searching engine of choice. It does not supplant Fedora’s inbuilt search facility, which is provided as part of its management functions; rather, it provides search across the datastreams inside a Fedora object that reference its digital content.
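The plug-in model behind the generic search service can be sketched as an abstract engine interface with interchangeable implementations. The class and method names below are hypothetical, and the two toy engines stand in for real products such as Zebra; the sketch only shows the design idea that the service code stays the same while the engine underneath is swapped, not how the actual service is implemented.

```python
from abc import ABC, abstractmethod


class SearchEnginePlugin(ABC):
    """Hypothetical plug-in contract that every search engine must satisfy."""

    @abstractmethod
    def index(self, pid: str, text: str) -> None: ...

    @abstractmethod
    def search(self, term: str) -> list[str]: ...


class InvertedIndexEngine(SearchEnginePlugin):
    """Toy engine: builds a word -> set-of-PIDs inverted index at ingest time."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = {}

    def index(self, pid: str, text: str) -> None:
        for word in text.lower().split():
            self._postings.setdefault(word, set()).add(pid)

    def search(self, term: str) -> list[str]:
        return sorted(self._postings.get(term.lower(), set()))


class ScanEngine(SearchEnginePlugin):
    """Toy engine: keeps each object's words and scans them at query time."""

    def __init__(self) -> None:
        self._words: dict[str, list[str]] = {}

    def index(self, pid: str, text: str) -> None:
        # Append rather than overwrite, so several datastreams per object accumulate.
        self._words.setdefault(pid, []).extend(text.lower().split())

    def search(self, term: str) -> list[str]:
        return sorted(pid for pid, words in self._words.items() if term.lower() in words)


class GenericSearchService:
    """Hypothetical service layer: identical code whichever engine is plugged in."""

    def __init__(self, engine: SearchEnginePlugin) -> None:
        self.engine = engine

    def ingest(self, pid: str, datastreams: dict[str, str]) -> None:
        # Index the text of every datastream under its parent object's PID.
        for text in datastreams.values():
            self.engine.index(pid, text)

    def query(self, term: str) -> list[str]:
        return self.engine.search(term)


for engine in (InvertedIndexEngine(), ScanEngine()):
    service = GenericSearchService(engine)
    service.ingest("demo:1", {"DC": "Protein crystallography data", "OCR": "climate records"})
    service.ingest("demo:2", {"DC": "Climate model output"})
    print(type(engine).__name__, service.query("climate"))  # both engines: ['demo:1', 'demo:2']
```

Swapping the engine requires no change to `GenericSearchService`, which is the point of a plug-in design: an institution can keep the indexing product it already uses and still expose a common search interface over its Fedora content.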
In his second presentation, Gert described current thinking around the possibility of a peer-to-peer service based on Fedora. The background to this thinking is the Alvis Project [16], an EU initiative to develop an open source distributed semantic-based search engine prototype. Alvis has investigated the role of peer-to-peer as a means of supporting its search functionality and Fedora has been tested within this as a source of content for searching. Work on the project is taking place during summer 2006, and an evaluation is planned towards the end of the year.
The RepoMMan Project: Automating Metadata and Workflow for Fedora | An Institutional Repository for the University of Hull: Supporting User Needs
Richard Green, University of Hull/Chris Awre, University of Hull
RepoMMan [17], a JISC-funded project within the Digital Repositories Programme, was the only UK example of Fedora usage reported at the conference, though there were also attendees from the National Libraries of Wales and Scotland. A poster was also presented highlighting the wide range of Fedora investigation and usage that had been reported at the first meeting of the UK & Ireland Fedora Users Group in May 2006.
Richard Green gave an overview of the RepoMMan project and described in greater detail its work on workflow (building a workflow engine based around BPEL (Business Process Execution Language for Web Services [18])) and on automated metadata generation. In a separate presentation, Chris Awre focused on the user requirements gathering taking place within the project to underpin the development of these proposed tools. The aim is not just to ensure that the repository meets the needs of end-users (researchers in the first instance), but also to investigate how it can fit into their workflows and be used as a working tool, rather than simply as a place where content has to be put at some point in the production process.
The ARROW Project: A Fedora-based Institutional Repository
Andrew Treloar, ARROW Consortium, Monash University
In his second presentation, Andrew Treloar described the ARROW Project [19], a three-year initiative in Australia to develop a generalised institutional repository solution for research information management across a number of partners, co-ordinated through Monash University in Melbourne. It has examined a number of different initiatives for managing and exposing research information, initially addressing digital surrogates of traditional “print equivalent” publications. ARROW is making use of open standards wherever possible and aims to deliver open source software components for use by others as part of its work.
The project is also making use of open source software for a number of purposes. It is, of course, using Fedora, as supplied by VTLS through its VITAL product. ARROW selected Fedora because it offers a robust architecture, a flexible object-oriented data model, clear versioning, persistent identifiers at a high level of granularity, and clean Web Service interfaces.
Getting content into repositories is a major issue for ARROW, as it is elsewhere. The project has followed a mediated deposit model for now, but is gradually moving towards a direct deposit model where appropriate.
Don’t Keep It Under Your Hat!
Christiaan Kortekaas, University of Queensland
Christiaan Kortekaas is the lead developer of Fez [20], a front-end client for Fedora. He used his presentation to describe Fez and how it is being used to support research assessment within the University of Queensland as well as nationally.
Fez is defined as “a free, open source, flexible, highly configurable digital repository and workflow management system based on Fedora 2.” It is currently at version 1.2 (April 2006) and has been developed through grant funding as part of wider projects (e.g., APSR (Australian Partnership for Sustainable Repositories) and MAMS (Meta Access Management Project)). The system is written in PHP 5.0+ and optimised for MySQL 5 and for use with Fedora 2.1.1. It can manage a wide range of digital content, both individually and in combination, and can be configured through a Web GUI to work with the content models, workflows and security requirements of choice. Security and workflows can also be based on meaningful roles: creator, editor, approver, viewer, etc. In essence, Fez helps manage content within Fedora.
Fez was developed in response to an identified need for a clear way to manage diverse digital content. UQ eSpace is a specific adaptation of Fez to support research assessment. The University has carried out a dry run in preparation for the national Research Quality Framework (RQF) in 2007, a process adapted directly from the UK’s RAE process.
The workflow for submission, reviewing and publishing is configurable, and different parts of it can be role-based. Some steps are automatic (e.g., images are automatically processed to produce thumbnails), but generally everything else is flexible. Content model development has so far focused on the requirements of research assessment, so content models now exist for most standard publication types, e.g., journal articles, conference papers, patents, books, book chapters, etc.
The internal dry run has been judged a success, and has also highlighted a few lessons to be learned before the full RQF in 2007. For example, the best way to bug-test a system is to give it to users and get them to enter content: this identifies bugs early and helps raise data quality. It is also important that users understand the fields they are asked to fill in, so that they do not enter the wrong information.
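The role-based workflows described for Fez and UQ eSpace can be illustrated with a small state machine in which a configuration table records which roles may move a record between states. The state names, the transition table and the `advance` and `can_view` functions are all hypothetical illustrations built on the roles named above (creator, editor, approver, viewer); they do not reproduce Fez’s own PHP implementation.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    SUBMITTED = "submitted"
    APPROVED = "approved"
    PUBLISHED = "published"


# Hypothetical configuration table: which roles may make each transition.
TRANSITIONS: dict[tuple[State, State], set[str]] = {
    (State.DRAFT, State.SUBMITTED): {"creator", "editor"},
    (State.SUBMITTED, State.DRAFT): {"editor", "approver"},  # returned for changes
    (State.SUBMITTED, State.APPROVED): {"approver"},
    (State.APPROVED, State.PUBLISHED): {"approver"},
}

WORKFLOW_ROLES = {"creator", "editor", "approver"}


class WorkflowError(Exception):
    """Raised when a transition is undefined or the role lacks permission."""


def advance(current: State, target: State, role: str) -> State:
    """Return the new state if `role` may move a record from `current` to `target`."""
    allowed = TRANSITIONS.get((current, target))
    if allowed is None:
        raise WorkflowError(f"no transition {current.value} -> {target.value}")
    if role not in allowed:
        raise WorkflowError(f"role {role!r} may not move {current.value} -> {target.value}")
    return target


def can_view(state: State, role: str) -> bool:
    """Published records are visible to every role; unpublished ones only to workflow roles."""
    return state is State.PUBLISHED or role in WORKFLOW_ROLES


state = advance(State.DRAFT, State.SUBMITTED, "creator")
state = advance(state, State.APPROVED, "approver")
try:
    advance(state, State.PUBLISHED, "creator")  # creators cannot publish
except WorkflowError as err:
    print(err)  # role 'creator' may not move approved -> published
```

Keeping the permissions in a table rather than in the transition logic mirrors the configurability reported for Fez: another institution could change who approves or publishes without touching `advance` itself.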
Fez has just been awarded a further year’s grant funding in its own right for ongoing development.
Using DPubS to Publish Fedora Content
David Ruddy, Cornell University
David Ruddy also described a system that can work on top of Fedora. DPubS (Digital Publishing System) [21] is an electronic publishing application developed to provide an alternative model for scholarly publishing; it is currently being packaged for open source distribution. Recent activity has focused on layering DPubS on top of institutional repository software so that the content held there can be published; such publications could draw on distributed content from multiple repositories as well as on a single source. Initial investigations have looked at how the DPubS Repository Service API can interface with Fedora and DSpace. There are no firm conclusions as yet, though there is clear potential for Fedora to help with versioning within publications and to provide granular access to individual datastreams for specific publishing requirements.
Sakai Fedora Repository Tool
Beth Kirschner, University of Michigan
The Sakai Fedora tool illustrates a slightly different way in which user-facing systems can work with Fedora. Its origins lie in needs identified by the eResearch community at the University of Michigan. Projects in this area needed a collaboration environment to facilitate interaction, for which they are using Sakai [22], and a repository in which to store and manage the outputs of experiments, for which they are using Fedora. The Sakai Fedora tool links the two systems, enabling users both to deposit datasets in Fedora and to search them from within Sakai. It has been designed as a generic framework to support various eResearch activities within Sakai, acting as a window between the two systems. Access is provided through the generic search service described earlier. Current work is considering how disseminators could extend what can be done with objects from within Sakai.
IRIaB: Institutional Repository In a Box
Christian Tønsberg, Technical Knowledge Center of Denmark
Finally, there was a presentation on a possible way of bundling Fedora with additional tools as a complete package. The Technical Knowledge Center has had its ORBIT (Online Research database In Technology) system in place for over six years. This system allowed the capture and cataloguing of content using the MetaToo Web-based input management tool. Records were then exported from the resulting SQL database into a storage filesystem, from where they could be indexed and made available for searching. Input was role-based, and records could be edited by a number of roles as part of an input workflow.
The ORBIT system was designed as a simple system to meet a clear need. Needs over time have multiplied and become more complex, however, and ORBIT is not now considered an adequate architecture to meet current and future needs. The Technical Knowledge Center has thus looked at incorporating Fedora into its architecture to provide a more flexible capability.
The new architecture still makes use of the MetaToo tool as this is still considered a valuable front-end to Fedora. They are building a connector service that uses the Web Service interfaces on both MetaToo (REST (Representational State Transfer)) and Fedora (SOAP (Simple Object Access Protocol)) to pass catalogued records between the two. The generic search service and API-A interfaces to Fedora can then be used to facilitate access to the content.
In order to manage this system and make it available for others they are packaging it, hence the concept of a repository in a box. The initial version of IRIaB [23] contains the MetaToo tool, the connector and the search interface - not, though, Fedora itself: a future version, IRIaB++, may well include Fedora as a .war file once this packaging of Fedora is available (which it should be shortly).
Conclusion
The second Fedora Users Conference contained a wide and extensive collection of presentations, as can be seen from the descriptions above. Additional sessions also reported on the advantages of an SOA (Service-Oriented Architecture) approach in planning the implementation of a repository like Fedora, and how smaller institutions can make use of Fedora. There was also a presentation on the work of the Preservation Services Working Group, one of three Working Groups (the others being Workflow and Search) [24] that are involving members of the community outside the core development team.
Two notable points emerged from these presentations: first, that other than the overviews offered in the plenary sessions, there were no presentations from the core developers; and second, that this was largely because so many presentations from elsewhere demonstrated how Fedora was being used and built upon for a wide variety of purposes. In other words, presentations from the Fedora team were not necessary. This was very much a ‘Users’ conference and it was both gratifying and welcome to see the extensive range of uses already in existence.
Notwithstanding this, it also became apparent through the presentations that whilst Fedora is a usable and workable system now, it is not static but is being actively developed. Although funding for Fedora development formally expires in September 2007, the evidence from this conference suggests that it will live on long beyond then as successful open source software.
References
- Fedora http://www.fedora.info
- Fedora Users Conference website http://www.lib.virginia.edu/digital/fedoraconf/index.shtml
- XACML background information http://xml.coverpages.org/xacml.html
- Payette S., Shin E., Wilper C., Wayland R., “Fedora Proposal: Content Model Dissemination Architecture”, Fedora Project 2006 http://www.cs.cornell.edu/payette/fedora/designs/cmda
- VTLS, Inc. http://www.vtls.com/
- VITAL http://www.vtls.com/Products/vital.shtml
- Indiana University Digital Library Infrastructure Project Wiki http://wiki.dlib.indiana.edu/confluence/display/INF
- Tufts University Digital Repository Program http://dca.tufts.edu/tdr
- METS Navigator tool http://sourceforge.net/projects/metsnavigator
- DLF Aquifer Project http://www.diglib.org/aquifer/index.htm
- National Science Digital Library http://nsdl.org/
- DART Project http://dart.edu.au/
- The Pathways Project http://www.infosci.cornell.edu/pathways/
- eSciDoc http://www.escidoc-project.de/homepage.html
- Fedora generic search service http://defxws2006.cvt.dk/fedoragsearch/
- Alvis Project http://www.alvis.info/alvis/
- RepoMMan Project http://www.hull.ac.uk/esig/repomman/
- BPEL background information http://en.wikipedia.org/wiki/BPEL
- ARROW Project http://arrow.edu.au/
- Fez http://www.apsr.edu.au/fez.htm
- DPubS http://dpubs.org/
- Sakai Collaboration & Learning Environment http://www.sakaiproject.org/
- Institutional Repository In a Box (IRIaB) http://defxws.cvt.dk/projects/fedora-generic/dataintegration/iriab/
- Fedora Working Groups http://www.fedora.info/wiki/index.php/Working_Groups