Web Magazine for Information Professionals

Fedora UK & Ireland / EU Joint User Group Meeting

Chris Awre reports on the first coming together of two regional user groups for the Fedora digital repository system, hosted by the University of Oxford in December 2009.

The Fedora digital repository system [1] is an open source solution for the management of all types of digital content. Its development is managed through DuraSpace [2], the same organisation that now oversees DSpace, and carried out by developers around the world. The developers, alongside the extensive body of Fedora users, form the community that sustains Fedora.

Although there have been regular international user group meetings for the Fedora community, hosted in recent years as part of the Open Repositories conference, there have also been a number of more regional initiatives to foster interaction amongst Fedora users and provide assistance to those adopting the software. The Fedora UK & Ireland User Group was founded in May 2006 and has held meetings approximately every six months since that time, promoting the exchange of views, information and experience while fostering collaboration between organisations on subsequent projects. Whilst many delegates are university-based, the user group also encompasses commercial consultancies, the BBC and national libraries, amongst other organisations with an interest in how Fedora can support their work.

The Fedora EU User Group was founded in 2008 as a way of bringing together a growing body of knowledge and expertise across EU countries, and as a way of fostering collaboration within the EU. Distance has limited meetings to roughly annual intervals, though there have also been discussions at the Open Repositories conference and other relevant meetings (e.g., ECDL). Those attending predominantly come from Denmark and Germany, though the meeting in Oxford also welcomed representatives from Sweden and Holland.

The organisation of the annual UK All Hands meeting and IEEE e-Science conference, with its Digital Repositories and Data Management track [3], in Oxford prompted the idea of a joint meeting between the user groups at the same time, to maximise value for those colleagues travelling. The meeting covered two themes: e-research environments and content models [4].

e-Research Environments

An ongoing initiative of the DuraSpace organisation is to foster the coming together of virtual solution communities to help address community-wide digital content management issues and identify how digital repositories can support them. One such area of interest is how digital repositories can support e-research, or scholarship in its widest sense. How can repositories provide the relevant tools to support research practice? The development of ideas in response to these questions is being taken forward by the Scholars Workbench Solution Community [5], and it was this group that led and moderated the morning’s presentations.

The first two presentations covered initiatives to provide fully fledged system environments that could be used and adapted to support research practice in different forms.

The Hydra Initiative

Chris Awre, University of Hull

The Hydra initiative [6] is a collaboration between the University of Virginia, Stanford University, the University of Hull and DuraSpace to model and develop a reusable framework for multi-purpose, multi-function, multi-institutional Fedora repository-enabled solutions. Hydra recognises that repositories can be used to manage content at different stages in its lifecycle, and hence need to provide different points of interaction to support that management. Just as a Hydra has many heads on one body, a repository can usefully have many entry points for different purposes onto a single body of content. Hydra is focusing on the separate service components that may be involved and how they can be combined in flexible ways using workflow technologies. This component approach is intended to provide others with the ability to feed into Hydra in the future, and to foster community collaboration and development.

eSciDoc-based Virtual Research Environments

Matthias Razum, FIZ Karlsruhe

The eSciDoc Project [7] has recently reached the end of its five-year funding and is now a mature framework of repository and related services based on Fedora that can be combined to serve different research needs. Matthias described two particular examples of how the services have been combined to form virtual research environments (VREs). One project on computer linguistics is using a repository to manage content at different stages of processing. The project is fostering collaboration between librarians, who provide data management input, and the computer linguistic scientists, who are contributing their subject knowledge. Another example was a method to capture lab book information into the repository, using QR codes as the means of transfer between systems.


The next two presentations focused on the practicalities of working with researchers to support their work using repositories.

Repositories and Research Pools in Scotland

James Toon, University of Edinburgh

James is project manager for the ERIScotland Project [8], a Scotland-wide project based at Edinburgh, which is working with institutions and researchers across Scotland to identify effective ways of managing and disseminating outputs from research. Scotland has introduced research pools, where funding is directed at a subject rather than an institution specifically, and researchers are having to collaborate more across institutions as a result. This poses a challenge for institutional repository management, increasing the emphasis on the subject area, and has highlighted the disparity between researcher and library management of research outputs. There is a risk that libraries may become myopic in their institutional repository strategy, and they need to become more embedded in the research lifecycle.

Fedora for Scientific Data Repositories

Mark Hedges, King’s College London

Mark described a way in which repositories and information professionals can become more closely involved. The BRIL Project [9] is working with the Department of Cell and Molecular Biophysics to establish ways of capturing research data straight from the equipment generating them. This avoids the concept of deposit as a separate process and seeks to lower barriers to adoption. Capturing data in this way potentially allows for processes to be re-run to validate outcomes.

Although the meeting had been organised through the Fedora User Groups, the concept of repositories supporting e-research and the Scholars Workbench is not platform-specific. It is recognised that other repository platforms are also addressing the research support issue, and that the different platform communities have much they can learn from each other. As such, presentations from practitioners using EPrints and DSpace were included in the day in order to highlight similarities and directions.

Institutional Research Data Management: A 10-year Blueprint

Les Carr, University of Southampton

The University of Southampton has long been involved in investigating how repositories can support research. Under the JISC Research Data Management Programme, it is now examining this aspect in the long term, seeking to build policy and service-oriented computer infrastructure for the institution as a whole [10]. Part of this work recognises that the repository is often best placed when it is supporting and enabling in a hidden capacity, and that researcher value lies outside the repository. It also notes that the repository needs to have effective interaction with a range of other services and systems.

Edinburgh DataShare: Achievements and Aspirations

Stewart MacDonald, University of Edinburgh

The University of Edinburgh has, through the JISC-funded DISC-UK DataShare Project [11], examined a variety of issues relating to how institutions can best manage research data for their researchers. The project is now informing the development of the Edinburgh DataShare service [12], which is looking to work with research teams within the research pools mentioned by James Toon in his earlier presentation. The project has provided a good basis for the service, producing a policy-making guide for research data and testing the Data Audit Framework, which has helped greatly with engagement with researchers.

The morning ended with two presentations describing different instances of how repositories might support research, at the broad and focussed levels.

Metadata for Reuse: ANDS and the role of IRs

Andrew Treloar, Australian National Data Service

The Australian National Data Service [13] has been funded by the Australian Government to build a virtual research data commons. It will capture data from a variety of different sources, both academic and public sector, and is seeking to make this available for reuse to increase the value gained from its generation. This involves the generation of information to aid discovery. (Note, however, that it is not being labelled as metadata, but is instead described according to the use to which it will be put.) Such information will be produced so that others can find the data in the first place, sitting alongside information for appreciation of value, for access, and for reuse itself: this includes each collection having its own crawlable Web page. One of the major challenges in collating diverse sources of data is the different infrastructural organisation at different institutions, which makes it difficult to describe data collections in a comparable manner.

Fedora-based Portal for Geo-tagged Audio Comments with a Mobile Client

Andreas Hense, Bonn-Rhein-Sieg University of Applied Sciences

Andreas presented some work he is involved with to aid the capture of audio comments into a repository using mobile devices. The intention is to allow comments to be captured at a relevant location, and for the comments to be tagged with relevant geographic information. These comments could then be shared in a similar way to images (e.g. Flickr) or videos (e.g. YouTube). Speech-to-text translation may add to the usability of the comments, though it was noted that a high level of accuracy was required to avoid user frustration.
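As a rough illustration only (it is not taken from Andreas's system), the idea of tagging a comment with a location and then finding comments recorded nearby might be sketched in Python as follows. The AudioComment class, its field names and the file names are all hypothetical, and the distance is computed with the standard haversine formula, assuming a mean Earth radius of 6371 km.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioComment:
    """Hypothetical record for one audio comment and where it was captured."""
    audio_file: str
    latitude: float   # decimal degrees
    longitude: float  # decimal degrees


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (haversine, Earth radius assumed 6371 km)."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def comments_near(comments, lat, lon, radius_km):
    """Return the comments captured within radius_km of the given point."""
    return [
        c for c in comments
        if distance_km(c.latitude, c.longitude, lat, lon) <= radius_km
    ]


# Made-up sample data: one comment near Bonn, one near Oxford.
comments = [
    AudioComment("comment-bonn.mp3", 50.7374, 7.0982),
    AudioComment("comment-oxford.mp3", 51.7520, -1.2577),
]
nearby = comments_near(comments, 50.73, 7.10, radius_km=10)
```

A real service would more likely hold the coordinates as repository metadata and use a spatial index rather than scanning every comment.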

DuraSpace

Thornton Staples provided a brief overview of developments within the DuraSpace organisation:

Content Models

The afternoon session was given over to a discussion of content models within Fedora [16]. Content models are important to Fedora in defining the way digital objects are structured and managed. As such, when creating a Fedora repository it is vital that the content is analysed and appropriate content models established to guide development and implementation.

The next two presentations highlighted the different extents to which the content model architecture in Fedora can be used.

Content Model-driven Software

Asger Askov Blekinge, State and University Library, Denmark

Asger’s presentation described the way in which additional information can be stored in the content model over what is often included: an enhanced content model. His team have also taken an atomistic approach to content models, breaking materials down to their constituent parts and describing each part. The captured information and the atomistic approach allow relationships between objects to be built up and logical views presented according to context. Interfaces can be auto-generated from the content models according to need. They are currently investigating how to support search using the content models, and are confident of using the enhanced content model approach as the basis for other functionality as well.
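The atomistic idea can be illustrated with a small Python sketch. It is an assumption-laden toy rather than the Danish team's software or the Fedora API: the DigitalObject class, the "isPartOf" relation name and the identifiers are all hypothetical. Each part is its own object, and a logical view of a book is assembled by following the relationships.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical stand-in for one atomistic repository object."""
    pid: str            # object identifier (made up for this example)
    content_model: str  # name of the content model the object conforms to
    relations: list = field(default_factory=list)  # (predicate, target pid)


def parts_of(objects, parent_pid):
    """Return the pids of objects that declare themselves part of parent_pid."""
    return [
        obj.pid
        for obj in objects
        if ("isPartOf", parent_pid) in obj.relations
    ]


objects = [
    DigitalObject("demo:book1", "Book"),
    DigitalObject("demo:page1", "Page", [("isPartOf", "demo:book1")]),
    DigitalObject("demo:page2", "Page", [("isPartOf", "demo:book1")]),
    DigitalObject("demo:image1", "Image", [("isPartOf", "demo:page1")]),
]

# A "book view" is derived from the relationships, not stored on the book.
book_view = parts_of(objects, "demo:book1")
```

Because each part is described separately, the same objects could equally be presented in a page-level or image-level view, depending on context.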

Content Models in the Hydra Project

Richard Green, Hydra Initiative

By contrast, the Hydra approach to content models is to keep them as simple as possible. This is partly driven by the desire to make Hydra an environment that can be used by many, and being too specific with content models could be a barrier for some. Hydra has separated out content from the metadata describing it, and proposes separate content models for each. However, within them, it is feasible to have optional datastreams (individual parts of the model), and it is intended that Fedora return a clean error message if a datastream is not available when the model is implemented. Fedora was designed with the idea of using disseminators to deliver content, and Hydra will seek to use them by default (an approach also adopted by the State and University Library in Denmark - see paragraph above).
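The notion of optional datastreams with a clean error can be sketched in a few lines of Python. This is a hypothetical illustration, not Hydra or Fedora code: the datastream IDs, the CONTENT_MODEL dictionary and the MissingDatastreamError class are all invented for the example.

```python
class MissingDatastreamError(LookupError):
    """Raised with a readable message when a datastream is not present."""


# Hypothetical content model: the datastream IDs are illustrative only.
CONTENT_MODEL = {
    "required": {"DC", "content"},
    "optional": {"thumbnail"},
}


def missing_required(datastreams, model=CONTENT_MODEL):
    """Return a sorted list of required datastream IDs the object lacks."""
    return sorted(model["required"] - set(datastreams))


def get_datastream(datastreams, ds_id, model=CONTENT_MODEL):
    """Return a datastream, or raise a clear error saying what is missing."""
    if ds_id in datastreams:
        return datastreams[ds_id]
    if ds_id in model["optional"]:
        kind = "optional"
    elif ds_id in model["required"]:
        kind = "required"
    else:
        kind = "unknown"
    raise MissingDatastreamError(
        f"{kind} datastream '{ds_id}' is not available on this object"
    )


# An object that omits the optional thumbnail still conforms to the model.
obj = {"DC": "<dc/>", "content": b"data"}
problems = missing_required(obj)
```

Requesting the absent thumbnail from this object raises MissingDatastreamError with a message naming the datastream, rather than failing obscurely further down the line.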

Although slightly diverting from the main topic of content models at times, the next two presentations offered a contrasting view of how to manage content models within Fedora.

Fedora Content Modelling

Gert Schmeltz Pedersen, Technical University of Denmark

Gert presented three examples of how content models had been used, the common denominator between them being the use of XML serialisation to implement them in practice. This was a conscious effort to use XML and RDF instead of an RDBMS approach.

Fedora Custom Database Extension

Lodewijk Bogaards, DANS, Netherlands

Lodewijk, on the other hand, presented an approach that reflected that some queries into Fedora can become too complicated, and that a database approach can overcome this. It is unclear which approach is favoured, and there are performance matters to address for both, but Gert’s and Lodewijk’s approaches highlighted the flexibility that could be adopted according to need and knowledge.
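The contrast between the two approaches can be illustrated, in a very simplified way, with the Python sketch below. Neither function reflects Gert's or Lodewijk's actual implementation: a plain scan over subject-predicate-object triples stands in for an RDF query, Python's built-in sqlite3 module stands in for a database extension, and the identifiers and relation names are made up.

```python
import sqlite3

# Illustrative relationships only; the identifiers are invented.
triples = [
    ("demo:page1", "isPartOf", "demo:book1"),
    ("demo:page2", "isPartOf", "demo:book1"),
    ("demo:book1", "isMemberOf", "demo:collection1"),
]


def pages_in_collection_triples(triples, collection):
    """Two-step traversal over triples: collection -> books -> pages."""
    books = {s for s, p, o in triples if p == "isMemberOf" and o == collection}
    return sorted(s for s, p, o in triples if p == "isPartOf" and o in books)


def pages_in_collection_sql(triples, collection):
    """The same question asked as a single relational join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rel (subj TEXT, pred TEXT, obj TEXT)")
    conn.executemany("INSERT INTO rel VALUES (?, ?, ?)", triples)
    rows = conn.execute(
        """
        SELECT part.subj
        FROM rel AS part
        JOIN rel AS member ON part.obj = member.subj
        WHERE part.pred = 'isPartOf'
          AND member.pred = 'isMemberOf'
          AND member.obj = ?
        ORDER BY part.subj
        """,
        (collection,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Both functions return the same pages for demo:collection1; the difference lies in where the complexity sits, which is the trade-off the two presentations highlighted.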


Remaining presentations offered other examples of how content models are being used. Their full use is still maturing, but they offer a powerful way of structuring repository content and metadata so they can be actively managed and used, rather than comprise merely a static collection within the repository.

Conclusion

The day covered a lot of ground. It was useful to have such a mix of people from different European countries, since delegates could all bring their different perspectives; everyone left with food for thought as well as new contacts. Future joint meetings will be held as the opportunity arises. The two halves of the day highlighted two key aspects: the use of repositories is becoming more flexible and powerful as they are applied to a range of situations in research; and the richness with which Fedora is able to structure research materials will allow it to be used in flexible ways as demand requires.

Note

Further information on Fedora User Group activity can be found through the respective JISCMail lists:

References

  1. Fedora http://www.fedora-commons.org/
  2. DuraSpace http://duraspace.org/
  3. 5th IEEE International Conference on e-Science, 9-11 December 2009, Oxford
    http://www.oerc.ox.ac.uk/ieee/programme
    (see programme for details of Digital Repositories and Data Management track presentations)
  4. Details of the meeting and presentations available via the Fedora Commons wiki
    http://www.fedora-commons.org/confluence/pages/viewpage.action?pageId=13762803
  5. Scholars Workbench Solution Community
    http://www.fedora-commons.org/confluence/display/FCCWG/Scholars+Workbench
  6. The Hydra Project http://fedora-commons.org/confluence/display/hydra/The+Hydra+Project
  7. eSciDoc https://www.escidoc.org/
  8. ERIScotland Project http://eriscotland.wordpress.com/
  9. BRIL (Biophysical Repositories in the Lab) Project http://bril.cerch.kcl.ac.uk/
  10. Southampton Data Management: Institutional Data Management Blueprint (IDMB) Project
    http://www.southamptondata.org/
  11. DISC-UK DataShare Project http://www.disc-uk.org/datashare.html
  12. Edinburgh DataShare http://datashare.edina.ac.uk/dspace/
  13. Australian National Data Service (ANDS) http://ands.org.au/
  14. Fedora Repository Development Wiki
    http://www.fedora-commons.org/confluence/display/FCREPO/Fedora+Repository+Development+Wiki
  15. DuraCloud http://www.duraspace.org/duracloud.php
  16. Content models in Fedora
    http://www.fedora-commons.org/confluence/display/FEDORACREATE/Content+Models

Author Details

Chris Awre
Head of Information Management
Academic Services
University of Hull

Email: c.awre@hull.ac.uk
Web site: http://www2.hull.ac.uk/ACS/information management.aspx
