Fedora UK & Ireland / EU Joint User Group Meeting
The Fedora digital repository system [1] is an open source solution for the management of all types of digital content. Its development is managed through DuraSpace [2], the same organisation that now oversees DSpace, and carried out by developers around the world. The developers, alongside the extensive body of Fedora users, form the community that sustains Fedora.
Although there have been regular international user group meetings for the Fedora community, hosted in recent years as part of the Open Repositories conference, there have also been a number of more regional initiatives to foster interaction amongst Fedora users and provide assistance to those adopting the software. The Fedora UK & Ireland User Group was founded in May 2006 and has held meetings approximately every six months since that time, promoting the exchange of views, information and experience while fostering collaboration between organisations on subsequent projects. Whilst many delegates are University-based, the user group also encompasses commercial consultancies, the BBC and National Libraries amongst other organisations with an interest in how Fedora can support their work.
The Fedora EU User Group was founded in 2008 as a way of bringing together a growing body of knowledge and expertise across EU countries, and as a way of fostering collaboration within the EU. Distance has limited meetings to roughly annual intervals, though there have also been discussions at the Open Repositories conference and other relevant meetings (e.g., ECDL). Those attending predominantly come from Denmark and Germany, though the meeting in Oxford also welcomed representatives from Sweden and Holland.
The organisation of the annual UK All Hands meeting and IEEE e-Science conference, with its Digital Repositories and Data Management track [3], in Oxford prompted the idea of a joint meeting between the user groups at the same time, to maximise value for those colleagues travelling. The meeting covered two themes: e-research environments and content models [4].
e-Research Environments
An ongoing initiative of the DuraSpace organisation is to bring together virtual solution communities that help to address community-wide digital content management issues and identify how digital repositories can support them. One such area of interest is how digital repositories can support e-research, or scholarship in its widest sense. How can repositories provide the relevant tools to support research practice? Ideas in response to these questions are being taken forward by the Scholars Workbench Solution Community [5], and it was this group that led and moderated the morning’s presentations.
The first two presentations covered initiatives to provide fully fledged system environments that could be used and adapted to support research practice in different forms.
The Hydra Initiative
Chris Awre, University of Hull
The Hydra initiative [6] is a collaboration between the University of Virginia, Stanford University, the University of Hull and DuraSpace to model and carry out work towards development of a reusable framework for multi-purpose, multi-function, multi-institutional Fedora repository-enabled solutions. Hydra recognises that repositories can be used to manage content at different stages in its lifecycle, and hence need to provide different points of interaction to support that management. Just as a Hydra has many heads on one body, a repository can usefully have many entry points for different purposes onto a single body of content. Hydra is focusing on the separate service components that may be involved and how they can be combined in flexible ways using workflow technologies. This component approach is intended to provide others with the ability to feed into Hydra in the future, and foster community collaboration and development.
eSciDoc-based Virtual Research Environments
Matthias Razum, FIZ Karlsruhe
The eSciDoc Project [7] has recently reached the end of its five-year funding and is now a mature framework of repository and related services based on Fedora that can be combined to serve different research needs. Matthias described two particular examples of how the services have been combined to form virtual research environments (VREs). One project on computer linguistics is using a repository to manage content at different stages of processing. The project is fostering collaboration between librarians, who provide data management input, and the computer linguistic scientists, who are contributing their subject knowledge. Another example was a method to capture lab book information into the repository, using QR codes as the means of transfer between systems.
The next two presentations focused on the practicalities of working with researchers to support their work using repositories.
Repositories and Research Pools in Scotland
James Toon, University of Edinburgh
James is project manager for the ERIScotland Project [8], a Scotland-wide project based at Edinburgh, which is working with institutions and researchers across Scotland to identify effective ways of managing and disseminating outputs from research. Scotland has introduced research pools, where funding is directed at a subject rather than an institution specifically, and researchers are having to collaborate more across institutions as a result. This poses a challenge for institutional repository management, increasing the emphasis on the subject area, and has highlighted the disparity between researcher and library management of research outputs. There is a risk that libraries may become myopic in their institutional repository strategy; they need to become more embedded in the research lifecycle.
Fedora for Scientific Data Repositories
Mark Hedges, King’s College London
Mark described a way in which repositories and information professionals can become more closely involved. The BRIL Project [9] is working with the Department of Cell and Molecular Biophysics to establish ways of capturing research data straight from the equipment generating them. This avoids the concept of deposit as a separate process and seeks to lower barriers to adoption. Capturing data in this way potentially allows for processes to be re-run to validate outcomes.
Although the meeting had been organised through the Fedora User Groups, the concept of repositories supporting e-research and the Scholars Workbench is not platform-specific. It is recognised that other repository platforms are also addressing the research support issue, and that the different platform communities have much they can learn from each other. As such, presentations from practitioners using EPrints and DSpace were included in the day in order to highlight similarities and directions.
Institutional Research Data Management: A 10-year Blueprint
Les Carr, University of Southampton
The University of Southampton has long been involved in investigating how repositories can support research. Under the JISC Research Data Management Programme, it is now examining this aspect in the long term, seeking to build policy and service-oriented computer infrastructure for the institution as a whole [10]. Part of this work recognises that the repository is often best placed when it is supporting and enabling in a hidden capacity, and that researcher value lies outside the repository. It also notes that the repository needs to have effective interaction with a range of other services and systems.
Edinburgh DataShare: Achievements and Aspirations
Stewart MacDonald, University of Edinburgh
The University of Edinburgh has, through the JISC-funded DISC-UK DataShare Project [11], examined a variety of issues relating to how institutions can best manage research data for their researchers. The project is now informing the development of the Edinburgh DataShare service [12], which is looking to work with research teams within the research pools mentioned by James Toon in his earlier presentation. The project has provided a good basis for the service, producing a policy-making guide for research data and testing the Data Audit Framework, which has helped greatly with engagement with researchers.
The morning ended with two presentations describing different instances of how repositories might support research, at the broad and focussed levels.
Metadata for Reuse: ANDS and the role of IRs
Andrew Treloar, Australian National Data Service
The Australian National Data Service [13] has been funded by the Australian Government to build a virtual research data commons. It will capture data from a variety of different sources, both academic and public sector, and is seeking to make this available for reuse to increase the value gained from its generation. This involves the generation of information to aid discovery. (Note, however, that it is not being labelled as metadata, but is instead described according to the use to which it will be put.) Such information will be produced so that others can find the data in the first place, sitting alongside information for appreciation of value, for access, and for reuse itself: this includes each collection having its own crawlable Web page. One of the major challenges in collating diverse sources of data is the different infrastructural organisation at different institutions, which makes it difficult to describe data collections in a comparable manner.
Fedora-based Portal for Geo-tagged Audio Comments with a Mobile Client
Andreas Hense, Bonn-Rhein-Sieg University of Applied Sciences
Andreas presented some work he is involved with to aid the capture of audio comments into a repository using mobile devices. The intention is to allow comments to be captured at a relevant location, and for the comments to be tagged with relevant geographic information. These comments could then be shared in a similar way to images (e.g. Flickr) or videos (e.g. YouTube). Speech-to-text translation may add to the usability of the comments, though it was noted that a high level of accuracy was required to avoid user frustration.
DuraSpace
Thornton Staples provided a brief overview of developments within the DuraSpace organisation:
- There is now a new developer’s wiki presence [14] to aid communication to and among developers working on Fedora
- A developer committers community has been formed for Fedora to support the growing number of practitioners contributing directly to the Fedora codebase
- A ‘getting started’ site is being scoped to allow potential users to have a play with Fedora and understand better how it works
- The development of DuraCloud [15], DuraSpace’s cloud computing management layer service, is moving ahead and will be open for use in 2010
Content Models
The afternoon session was given over to a discussion of content models within Fedora [16]. Content models are important to Fedora in defining the way digital objects are structured and managed. As such, when creating a Fedora repository it is vital that the content is analysed and appropriate content models established to guide development and implementation.
The next two presentations highlighted the different extents to which the content model architecture in Fedora can be used.
Content Model-driven Software
Asger Askov Blekinge, State and University Library, Denmark
Asger’s presentation described how a content model can store additional information beyond what is usually included: an enhanced content model. His team have also taken an atomistic approach to content models, breaking materials down to their constituent parts and describing each part. The captured information and the atomistic approach allow relationships between objects to be built up and logical views presented according to context. Interfaces can be auto-generated from the content models according to need. They are currently investigating how to support search using the content models, and are confident of using the enhanced content model approach as the basis for other functionality as well.
Content Models in the Hydra Project
Richard Green, Hydra Initiative
By contrast, the Hydra approach to content models is to keep them as simple as possible. This is partly driven by the desire to make Hydra an environment that can be used by many, and being too specific with content models could be a barrier for some. Hydra has separated out content from the metadata describing it, and proposes separate content models for each. However, within them, it is feasible to have optional datastreams (individual parts of the model), and it is intended that Fedora return a clean error message if a datastream is not available when the model is implemented. Fedora was designed with the idea of using disseminators to deliver content, and Hydra will seek to use them by default (an approach also adopted by the State and University Library in Denmark - see paragraph above).
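The idea of separate content and metadata models with optional datastreams can be illustrated with a minimal sketch. It is not Hydra or Fedora code: the class names, the datastream identifiers (`content`, `thumbnail`, `descMetadata`, `rightsMetadata`) and the `fetch` helper are all hypothetical, chosen only to show how a missing datastream might produce a clean error rather than an opaque failure.

```python
from dataclasses import dataclass, field


class DatastreamNotAvailable(Exception):
    """Clean, descriptive error for a datastream the object does not hold."""


@dataclass(frozen=True)
class ContentModel:
    """Hypothetical, simplified model: just the datastream IDs it names."""
    name: str
    required: frozenset
    optional: frozenset = frozenset()


@dataclass
class DigitalObject:
    pid: str
    datastreams: dict = field(default_factory=dict)  # datastream ID -> content


# Illustrative models only: one for the content, one for its metadata.
CONTENT_MODEL = ContentModel(
    "content", frozenset({"content"}), frozenset({"thumbnail"})
)
METADATA_MODEL = ContentModel(
    "metadata", frozenset({"descMetadata"}), frozenset({"rightsMetadata"})
)


def fetch(obj, ds_id, models=(CONTENT_MODEL, METADATA_MODEL)):
    """Return a datastream, or raise a clear error if it is not available."""
    for model in models:
        if ds_id in model.required or ds_id in model.optional:
            if ds_id in obj.datastreams:
                return obj.datastreams[ds_id]
            kind = "optional" if ds_id in model.optional else "required"
            raise DatastreamNotAvailable(
                f"{obj.pid}: {kind} datastream '{ds_id}' of model "
                f"'{model.name}' is not available"
            )
    raise KeyError(f"no model declares datastream '{ds_id}'")


# Usage: an object that holds no optional datastreams.
obj = DigitalObject("demo:1", {"content": b"bytes", "descMetadata": "<dc/>"})
print(fetch(obj, "content"))
try:
    fetch(obj, "thumbnail")
except DatastreamNotAvailable as err:
    print(err)
```

Keeping each model to a short list of datastream names reflects the stated aim of simplicity; anything richer would be a design choice beyond what was presented.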
Although slightly diverting from the main topic of content models at times, the next two presentations offered a contrasting view of how to manage content models within Fedora.
Fedora Content Modelling
Gert Schmeltz Pedersen, Technical University of Denmark
Gert presented three examples of how content models had been used, the common denominator between them being the use of XML serialisation to implement them in practice. This was a conscious effort to use XML and RDF instead of an RDBMS approach.
Fedora Custom Database Extension
Lodewijk Bogaards, DANS, Netherlands
Lodewijk, on the other hand, presented an approach that reflected that some queries into Fedora can become too complicated, and that a database approach can overcome this. It is unclear which approach is favoured, and there are performance matters to address for both, but Gert’s and Lodewijk’s approaches highlighted the flexibility that could be adopted according to need and knowledge.
Remaining presentations offered other examples of how content models are being used. Their full use is still maturing, but they offer a powerful way of structuring repository content and metadata so they can be actively managed and used, rather than comprise merely a static collection within the repository.
Conclusion
The day overall covered a lot of ground. It was useful to have such a mix of people from different European countries since delegates could all bring their different perspectives; everyone left with food for thought as well as new contacts. Future joint meetings will be held as opportunity arises. The two halves of the day highlighted two key points: the use of a repository is becoming more flexible and powerful as it is applied to a range of situations in research; and the richness with which Fedora can structure research materials will allow it to be used in flexible ways as demand requires.
Note
Further information on Fedora User Group activity can be found through the respective JISCMail lists:
References
- Fedora http://www.fedora-commons.org/
- DuraSpace http://duraspace.org/
- 5th IEEE International Conference on e-Science, 9-11 December 2009, Oxford http://www.oerc.ox.ac.uk/ieee/programme (see programme for details of Digital Repositories and Data Management track presentations)
- Details of the meeting and presentations available via the Fedora Commons wiki http://www.fedora-commons.org/confluence/pages/viewpage.action?pageId=13762803
- Scholars Workbench Solution Community http://www.fedora-commons.org/confluence/display/FCCWG/Scholars+Workbench
- The Hydra Project http://fedora-commons.org/confluence/display/hydra/The+Hydra+Project
- eSciDoc https://www.escidoc.org/
- ERIScotland Project http://eriscotland.wordpress.com/
- BRIL (Biophysical Repositories in the Lab) Project http://bril.cerch.kcl.ac.uk/
- Southampton Data Management: Institutional Data Management Blueprint (IDMB) Project http://www.southamptondata.org/
- DISC-UK DataShare Project http://www.disc-uk.org/datashare.html
- Edinburgh DataShare http://datashare.edina.ac.uk/dspace/
- Australian National Data Service (ANDS) http://ands.org.au/
- Fedora Repository Development Wiki http://www.fedora-commons.org/confluence/display/FCREPO/Fedora+Repository+Development+Wiki
- DuraCloud http://www.duraspace.org/duracloud.php
- Content models in Fedora http://www.fedora-commons.org/confluence/display/FEDORACREATE/Content+Models