DAEDALUS: Initial Experiences With EPrints and DSpace at the University of Glasgow
DAEDALUS [1] is a three-year JISC-funded project under the FAIR Programme [2] which will build a network of open access digital collections at the University of Glasgow. These collections will enable us to unlock access to a wide range of our institutional scholarly output. This output includes not only published and peer-reviewed papers but also administrative documents, research finding aids, pre-prints and theses. DAEDALUS is also a member of the CURL (Consortium of University Research Libraries) SHERPA Project [3].
The project began in July 2002 and this article provides an overview of our initial experiences with EPrints and DSpace. It is based on a presentation given at the Digital Preservation Coalition forum meeting in June 2003 [4] and is archived in our DSpace service.
Why EPrints and DSpace?
The most frequently asked questions which the DAEDALUS team receive (after ‘Why DAEDALUS?’ [5] and ‘How much content do you have?’) are ‘Why are you using both EPrints and DSpace?’ and ‘Why not one or the other?’.
The answers to the latter questions lie in the origins of the DAEDALUS Project. DAEDALUS evolved from our initial experiences with EPrints and the creation of a pilot institutional service. This pilot was set up to accept a wide range of content, from preprints and published papers to theses, and to be an OAI (Open Archives Initiative)-compliant service (which was the case). Once this pilot service was up and running, using EPrints 1.x, it became very apparent that there were different criteria which should be applied to the different types of content which we accepted.
In our initial implementation it was also not easy to identify the type of content which was on display. EPrints requires details such as publication status and document type but these were not displayed in our first release. To remedy this we made some changes to the record level description for items so that they would display status and type. These fields are all displayed in GNU EPrints 2.x and it is apparent what the status and type of an item are.
The proposal for DAEDALUS was to take this one step further and to build a range of distinct services which would be differentiated by content type so that peer-reviewed content for instance would not be interfiled with technical reports or theses. The collections were identified as:
- Published and peer-reviewed papers
- Grey literature including technical reports and working papers
- Theses
- Administrative documents
- Research finding aids
During the lifetime of the project these individual collections will be underpinned by a local OAI search service to enable users to cross-search our range of collections. Work has not yet begun on that service. This decision then provided us with the opportunity to use different pieces of software such as GNU EPrints, which we had experience with, and the Virginia Tech software for e-theses. At the time of writing the bid for the FAIR Programme (March 2002), DSpace was not publicly available and it was our intention to use the GNU EPrints software for both published and peer-reviewed material as well as the pre-prints / grey literature material.
One of the aims of DAEDALUS is to gain experience in the installation and rollout of different pieces of software and we will publish a full comparison of the software on the project Web site. Work has already been undertaken by the ‘Theses Alive!‘ Project to look at DSpace and the ETD-db software [6]. The GNU EPrints software has been used for our published and peer-reviewed papers service. The role of this service has been expanded beyond our initial proposal also to include bibliographic records (without full text) as a means of seeding the service with publication details. This approach has already been taken by Lund University in Sweden [7] using EPrints.
The decision to swap DSpace into the mix for the pre-prints and grey literature service was made for a variety of reasons.
We felt that we were in a position (with both the hardware and technical support) to run DSpace in addition to the ETD-db and GNU EPrints software.
The DSpace model of Communities and Collections and the institutional origins of the software provided administrative features sufficiently different from GNU EPrints to make it an interesting proposition for departmental content. The future possibility of devolving the administration of Communities to individual departments was another key area which we are interested to explore. We are also particularly interested in the digital preservation components of DSpace. The focus for DAEDALUS is on text-based research material rather than datasets, images and other types of digital content - much of which drove the development of DSpace at MIT.
About EPrints
GNU EPrints 2.x ‘is free software which creates online archives. The default configuration creates a research papers archive’ [8]. With its origins in the Scholarly Communication movement, the default configuration of EPrints is geared to research papers but it can be adapted for other purposes and content. It was developed in the Intelligence, Agents, Multimedia Group at the Electronics and Computer Science Department of the University of Southampton. There are currently some 106 EPrints archives listed on the GNU EPrints Web site. Anecdotal evidence would indicate that there are many more in development or use.
Within the FAIR Programme in the UK the TARDis Project [9] is developing an EPrints Archive at the University of Southampton and feeding comments back to EPrints software development, much of which will be included in v.2.3.
GNU EPrints is freely distributable and subject to the GNU General Public License [10]. Further information about GNU EPrints, together with the software, documentation and a full list of features, is available from its Web site [8]. A list of (known) installed sites is also available.
Figure 1: software.eprints.org
About DSpace
DSpace is ‘a digital repository designed to capture, store, index, preserve, and redistribute the intellectual output of a university’s research faculty in digital formats.’ [11] Like EPrints, it is an open-source system and is freely available for anyone to download and run at any type of institution, organisation, or company (or even just an individual). DSpace was jointly developed by MIT Libraries and Hewlett-Packard. Users of DSpace are also allowed to modify DSpace to meet an organisation’s specific needs. The BSD distribution license describes the specific terms of use [12]. Within the UK, other projects such as DSpace@Cambridge [13] are using DSpace to build digital repositories.
Further information about DSpace is available from the DSpace Federation Web site [14]. This has been recently relaunched and contains a wealth of information about DSpace as well as an FAQ section, implementation guidelines and links to the software itself together with the DSpace mailing lists.
Figure 2: DSpace.org
DSpace, as a product, emerged from an institutional imperative to manage and preserve digital content and that imperative has shaped its development and the model of “Communities and Collections” which it has adopted. Communities can be mapped to administrative units of an organisation and at Glasgow, like MIT, we have created Communities which map to our “early adopter” departments. Collections for different types of content, research groups, etc. can then be created within communities.
Figure 3: DSpace Communities at Glasgow
Our initial experiences with DSpace were that the current implementation of Communities and Collections was too flat. This issue is being addressed and the addition of sub-communities is on the shortlist of enhancements for the 1.2 release [15]. This enhancement will enable us to nest academic departments within faculties in a more hierarchical fashion, similar to the Browse by Departments option we provide in EPrints.
Implementation at Glasgow
At Glasgow both GNU EPrints and DSpace are running on the same server, namely a Sun Fire server with the Solaris Operating System, 4 GBytes of memory and 2 x 36 GByte disks. The specifications of this server are in excess of what both EPrints and DSpace would recommend for a minimal installation of the software but we felt that it was important to “future proof” the server for the lifetime of the project and beyond. The advice from both EPrints and DSpace is to start small. They can both be installed on a desktop specification PC running a flavour of Linux and this is a useful starting point to gain experience and also to run a pilot service.
Installation of GNU EPrints
EPrints was the first package installed on the DAEDALUS server since we had previous experience with the original version of the software. It took about a week and a half to do the initial installation of the software and to get an ‘out of the box‘ service up and running. We then spent several additional weeks configuring and customising the service. These configurations ranged from the ‘look and feel‘ of the service to the addition of new fields and format types.
EPrints is now on version 2.x, and the software was substantially rewritten from version 1.x, so much so that EPrints recommended a clean, new installation of the software rather than an upgrade. It was not until version 2.x that it became GNU EPrints. It is also worth noting that the installation routines in EPrints 2.x were very much improved over version 1.x, as was our experience of installing it.
Figure 4: Glasgow EPrints Service
Installation of DSpace
The installation of DSpace was a more complicated affair. We had no previous experience with it until the publicly released version 1.0 in November 2002. Work began on the installation of DSpace in February 2003. In all, it took us some 3 months (off and on) to install and configure the software fully - other work was ongoing around this process. Much of the time devoted to installation can be attributed to the version of Tomcat which was already installed. Tomcat is the servlet engine which handles requests for Java servlets which are passed to it by Apache. We assumed DSpace would work with this installation, but it could not be forced to work with it and so we had to go back and re-install Tomcat from source.
DSpace has also been installed on Solaris rather than Linux and many of the problems we encountered were associated with the specific versions of packages which needed to be installed to make DSpace run. This may also have implications for DSpace upgrades, particularly if there is other software running on the server which also requires specific versions. Ultimately the software was further upgraded to version 1.1 in May 2003 with little difficulty. At the next release of DSpace we will consider omitting Apache and using Tomcat standalone, or using mod_jk, options which other institutions have examined.
Other sites such as the University of Tennessee have provided their experiences on installing DSpace on Solaris on their Web site ‘DSpace for Dummies‘ [16]. Like EPrints, DSpace also has a technical list, DSpace-tech [17] at SourceForge which is very active and an invaluable resource for installing and configuring DSpace.
Figure 5: Glasgow DSpace Service
Skill Sets and Platforms
EPrints and DSpace use different backend programming languages, Perl for EPrints and Java for DSpace. Staff with experience in Java or Perl, depending on the choice of software, are highly recommended.
Feature | GNU EPrints | DSpace |
Operating System | Unix / Linux | Unix / Linux |
Backend programming language | Perl | Java |
Database Management System | MySQL | PostgreSQL |
Configuration
The configuration of the services can range from the ‘look and feel’ to the implementation of additional functionality. Both products can be used ‘out of the box’ as they are installed. It is very likely, however, that sites will want to do some elements of customisation even at the most basic level of the ‘look and feel’.
Configuring GNU EPrints
To date the most extensive configuration work we have done has been with the GNU EPrints software. This has ranged from implementing a Glasgow ‘look and feel‘ to the creation of additional fields and document types.
The range of configuration work which we have done with EPrints has been as a result of our familiarity with the software and the strength of our skills base in Perl. These additional fields have included one to enable us to provide a link out to DSpace if a preprint is available.
Figure 6: EPrints - Preprint available
By default EPrints deals with the existence of preprint (and postprint) versions very elegantly and provides both backward and forward links to different versions automatically. Our decision to split the location of the preprint from the final version has meant that we could not take advantage of this excellent feature. With GNU EPrints the configuration changes are done at the code level and changes must be made in various configuration files in the /cfg directory (Perl modules such as ArchiveRenderConfig.pm and ArchiveConfig.pm, as well as XML files). The EPrints documentation provides a helpful range of How To guides for adding additional fields as well as other refinements [18].
Configuring DSpace
DSpace like EPrints can also be significantly modified and it provides a breadth of functionality. This ranges from the submission process through to the administration of the service. To date we have not made any substantial code changes to our installation. We have changed the ‘look and feel‘ of our DSpace implementation with a local stylesheet and imposed our own colour scheme. We have also added our own Glasgow header and footer. The service is currently branded ‘The Glasgow DSpace Service‘ but beyond that we have not altered the basic functionality of the service.
It is possible, indeed encouraged by DSpace, that sites work with the code. The University of Edinburgh’s ‘Theses Alive!’ Project has developed an add-on module for theses for DSpace [19]. At the Administrator level it is possible to make various changes to the Community and Collection home pages in DSpace. These can be customised with an image as well as descriptive text. There is also a field for adding copyright information about a Community or Collection. A list of ‘Recent Submissions’ is displayed in the sidebar.
Figure 7: DSpace Community - Department of Slavonic Studies
There is also a Template feature which can pre-populate fields with content which you may wish to see added to each item in a collection. In the DAEDALUS collection we added text in the Sponsor field to acknowledge that the project has been funded under the JISC FAIR Programme. This content is now automatically added to each item which we deposit in this collection. It can also be removed at the point of submission if the content of a field is inappropriate.
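The Template behaviour described above can be pictured as merging a set of collection-level defaults into each new item. The following is only a minimal sketch in Python, not DSpace code: the field name `sponsor`, the wording of the default and the function `new_item_from_template` are all hypothetical and chosen purely for illustration.

```python
from copy import deepcopy

# Hypothetical collection template: field name -> default values.
# The field name and text are illustrative, not DSpace's internal schema.
DAEDALUS_TEMPLATE = {
    "sponsor": ["Funded under the JISC FAIR Programme"],
}


def new_item_from_template(template, submitted, drop_fields=()):
    """Return item metadata: template defaults first, then submitter values.

    Fields named in drop_fields are removed from the defaults, mirroring
    the option to delete a pre-populated value at the point of submission.
    """
    item = deepcopy(template)  # never mutate the shared template
    for field in drop_fields:
        item.pop(field, None)
    for field, values in submitted.items():
        item.setdefault(field, []).extend(values)
    return item


item = new_item_from_template(DAEDALUS_TEMPLATE, {"title": ["Project report"]})
print(item)
```

Copying the template before merging keeps the collection-level defaults intact, so removing a value for one item does not affect later deposits.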
Submission
Both EPrints and DSpace are designed for self-archiving and the deposit of content by individual authors (or submitters); however within the FAIR projects alternative submission processes are being investigated. Part of the TARDis Project is to offer an assisted deposit gateway to authors and a new interface has been configured for this. Under the auspices of DAEDALUS we are currently providing a mediated submission service rather than actively encouraging our users to deposit their own content; a self-perpetuating, self-archiving model is one to which we would want to move in the longer term.
DAEDALUS content can only be submitted to the service by registered users and it is necessary to login to each service to deposit content.
Submission to GNU EPrints
In EPrints, after the user has logged in, they must first select the content type from a drop-down menu; this determines the range of fields to be completed, some of which are mandatory. A typical set of options is shown below. In the Glasgow EPrints Service, we have configured the service to display only journal article as an option.
Figure 8: Drop-down list of content for EPrints
Content types can be added and deleted in the metadatatypes.xml file in the /cfg directory.
It is then necessary to enter the bibliographic data about your item. Fields marked with a * in this section are mandatory. An additional field which we added here, with a radio button option was to confirm if the full text of content would also be deposited.
Figure 9: EPrints - Full Text Provided
After completing the bibliographic details, it is necessary to upload the files into EPrints. This is done by selecting the format and indicating the number of files. We have added a range of file formats such as XML Docbook to our service. EPrints will then display what the record looks like and the content for the various metadata fields.
The final step of the process is the ‘click-thru’ agreement:
Figure 10: EPrints Click Thru Agreement
Content submitted into EPrints automatically goes into the Submission Buffer, even if you are an Administrator. Once in this buffer, it can be viewed, edited, accepted or rejected. When the content is accepted, it becomes available through the search interface. It is not displayed in the Browse listings until the generate_views routine is run. This routine should be a scheduled cron job in the EPrints service and be set to run at least daily.
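The lifecycle described above (buffer, acceptance, search, then Browse listings only after the scheduled routine has run) can be summarised in a toy model. This is a sketch for illustration only and is not EPrints code: the class `ToyArchive` and its methods are hypothetical, and its `generate_views` method merely stands in for the real generate_views routine that a cron job would run.

```python
class ToyArchive:
    """Toy model of the deposit lifecycle described in the text."""

    def __init__(self):
        self.buffer = []   # deposited, awaiting an editorial decision
        self.archive = []  # accepted, findable through search
        self.browse = []   # snapshot rebuilt only by generate_views()

    def deposit(self, title):
        # Every deposit lands in the buffer, even an Administrator's.
        self.buffer.append(title)

    def accept(self, title):
        # Move an item from the buffer into the main archive.
        self.buffer.remove(title)
        self.archive.append(title)

    def search(self, term):
        # Accepted items are searchable straight away.
        return [t for t in self.archive if term.lower() in t.lower()]

    def generate_views(self):
        # Stand-in for the scheduled routine that rebuilds Browse listings.
        self.browse = sorted(self.archive)


service = ToyArchive()
service.deposit("Open access at Glasgow")
service.accept("Open access at Glasgow")
print(service.search("glasgow"), service.browse)
service.generate_views()
print(service.browse)
```

The gap between the two print calls is the window in which an accepted item is searchable but not yet browsable, which is why running the routine at least daily is advisable.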
Submission to DSpace
With DSpace the submission process is very similar; but the initial stages are centred on the collection into which the content will be deposited, rather than the type of content.
After logging into My DSpace, the user can click on Start a New Submission and then select the Collection to which the item is to be added from a drop down list. The full list of Collections is displayed in alphabetical order. We found it necessary to give all of our collections different names to ensure that there was no confusion as to which Community each belonged to. It is also possible to start from the Collection and to use the ‘Submit to this collection’ button.
Figure 11: Submission to a DSpace Collection
Once a Collection has been selected the user is presented with some initial choices about the item:
Figure 12: DSpace item choices
The DSpace submission process has a ‘sausage bar’ progress indicator to show where you are in the seven-step submission process. The current section (in the case below, verifying the submission) is highlighted in red.
Figure 13: DSpace deposit bar
The Describe sections are similar to the Bibliographic Details entry page in EPrints. There are only three mandatory fields in the default DSpace: title, date and language. Like EPrints, you can also save the work in progress. It is also possible to move back to earlier submission stages by clicking on a previously completed stage in the progress bar. At the Upload stage, rather than selecting a file type and then uploading it, in the way in which EPrints does, DSpace compares the file type to its bitstream registry and then assigns a file type to it. Additional file types can be added to this registry through the Administration section of DSpace.
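The registry lookup at the Upload stage can be sketched as a simple table of known formats. The example below is an illustration in Python rather than DSpace's own implementation: it assumes that matching is done on the file extension, and the registry contents and the function names `identify_format` and `register_format` are hypothetical.

```python
import os

# Hypothetical registry: file extension -> format name.
FORMAT_REGISTRY = {
    ".pdf": "Adobe PDF",
    ".xml": "XML",
    ".txt": "Text",
}


def identify_format(filename, registry=FORMAT_REGISTRY):
    """Look up a file's format by extension; unknown types are flagged."""
    ext = os.path.splitext(filename)[1].lower()
    return registry.get(ext, "Unknown")


def register_format(extension, name, registry=FORMAT_REGISTRY):
    """Add a format, as an administrator might through the admin pages."""
    registry[extension.lower()] = name


print(identify_format("thesis.PDF"))
print(identify_format("notes.tex"))
register_format(".tex", "LaTeX")
print(identify_format("notes.tex"))
```

The contrast with EPrints is that the submitter never chooses a format here; the service assigns one, and unrecognised files are simply flagged until the registry is extended.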
It is also possible to provide some descriptive text about the file and this is particularly useful for items with multiple files.
Figure 14: DSpace items
The final stage of the DSpace submission process is the click-thru licence.
DSpace comes with an example MIT licence for informational purposes — this will need to be customised. A copy of the licence is held with the record and can be viewed by the Administrator.
License granted by William Nixon (w.nixon@lib.gla.ac.uk) on 2003-06-18T08:56:16Z (GMT)
The workflow rules which are in place for the Collection will determine if the item is to be made immediately available or if it will go into the ‘Pool‘ - the DSpace equivalent of the EPrints Submission Buffer.
My DSpace and (My)EPrints
Both EPrints and DSpace have a ‘MyRepository‘ feature. This provides a Web interface for registered users to submit content, view items which they have deposited and return to any content which is already in progress.
GNU EPrints
Within EPrints there is a User Area Homepage where users can return to any documents they are currently submitting, can view submissions they have already made, or any which are pending in the submission buffer.
Figure 15: EPrints User Area
My DSpace
In DSpace, users who have a role in the workflow of a collection can also see any tasks which are in the ‘Pool‘ for them to take. They are also notified by e-mail when a new task has been assigned to them.
Figure 16: My DSpace
Administration
The administration of the services is handled through a Web interface. This enables the Administrator to manage the registered users (Users or E-People), approve/delete items and to create additional communities or subject headings.
Administration of GNU EPrints
In EPrints, it is possible to manage user details as well as the content of the service. Content may be edited directly or moved into the submission buffer and out of the main archive before any work can be done on it. There is also a system status option for the Administrator account. This lists the number of items in the service, the current release and the amount of space currently in use. It is also possible in version 2.x of EPrints to add or delete entries from the subject tree through the web interface.
Figure 17: EPrints - Subject Editor
Administration of DSpace
DSpace has a well developed Web user interface for the administration of the service and provides access to the range of areas which an administrator may want to update. The left-hand column indicates the range of areas which may be administered.
Figure 18: DSpace Administration
The Communities and Collections model in DSpace means that it is possible to be very granular with the access to the content of collections. Within EPrints access control can be applied to individual items. Different workflows can also be put in place as well as different policies for access to individual collections. These are managed through the use of Groups and Policies to control access to Collections and the items which they contain. The Administer Authorization Policies tool provides a powerful range of options for managing access to DSpace content.
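The idea of Groups and Policies can be illustrated with a small model in which a policy grants one action on a collection to one group. This is a sketch under stated assumptions, not the DSpace authorisation code: the classes `Policy` and `Collection`, the group names and the action names `READ` and `SUBMIT` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    group: str   # hypothetical group name, e.g. "Anonymous"
    action: str  # hypothetical action name, e.g. "READ" or "SUBMIT"


@dataclass
class Collection:
    name: str
    policies: set = field(default_factory=set)


def is_allowed(user_groups, collection, action):
    """True if any of the user's groups holds a policy for the action."""
    return any(Policy(g, action) in collection.policies for g in user_groups)


preprints = Collection("Departmental preprints")
preprints.policies.add(Policy("Anonymous", "READ"))
preprints.policies.add(Policy("Department staff", "SUBMIT"))

print(is_allowed(["Anonymous"], preprints, "READ"))
print(is_allowed(["Anonymous"], preprints, "SUBMIT"))
```

Attaching policies to the collection rather than to each item is what makes it practical to give different departments different access and submission rules.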
Figure 19: DSpace Policies Tool
DSpace provides system status information via e-mail. These e-mails are comprehensive reports, providing similar information to the EPrints system status option as well as detailed information about user logins, logs of searches in the service and the number of times items have been viewed.
Further experiences
There remains a range of work to be done at Glasgow with EPrints and DSpace but the services are now in place and no longer ‘vapourware‘. GNU EPrints and DSpace are richly featured products which are still in the early days of their development. They are both in ongoing development by their own developers as well as by active communities of users.
Ongoing areas which we will continue to investigate include but are not limited to:
- Import/Export of bibliographic details and content
- Scalability of content
- OAI-PMH implementation
- Implementation of a handle server to manage persistent URLs
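For the OAI-PMH item in the list above, and the cross-search service mentioned earlier, the basic exchange can be sketched in a few lines of Python. This is a minimal illustration rather than our implementation: the base URLs are placeholders, the sample response is hand-written, and the function names are hypothetical. The `ListRecords` verb, the `oai_dc` metadata prefix and the XML namespaces are those defined by OAI-PMH 2.0 and Dublin Core.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL for one repository."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def titles_from_response(xml_text):
    """Extract Dublin Core titles from a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iterfind(".//oai:record//dc:title", OAI_NS)]


# Hand-written sample response, so the sketch runs without network access.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

# Placeholder base URLs standing in for the separate services.
for base in ["http://eprints.example.org/oai", "http://dspace.example.org/oai"]:
    print(list_records_url(base))
print(titles_from_response(SAMPLE))
```

A cross-search service would harvest each repository in this way and index the combined records locally; handling of resumption tokens and incremental harvesting is left out of the sketch.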
Conclusion
This article set out to provide a flavour of two of the software options available for building institutional repositories. They have much in common and the choice of which, or both, or neither [20], will hinge on a range of local factors. It is not a question of which software is better but rather which is appropriate for the institutional services which you are building, their purpose and the content. Will it be to free research papers or is it to manage and preserve digital content, or both?
The choice of software is only one component in a larger collection of issues for the implementation of an institutional repositories service. There is a range of policy decisions which must be made and from that will flow the decisions on assets, advocacy, access and audience.
At the University of Glasgow we see GNU EPrints and DSpace as complementary products which have enabled us to take a twin-track approach to our advocacy work in gathering different institutional assets which present different challenges. Ultimately it is the cultural change and advocacy work which will ensure that these services have content and do not languish empty and unfulfilled. Experience has shown us that it is not enough merely to build such services; the real challenge is to gather the content, but that is another article.
Acknowledgements
My thanks to my DAEDALUS colleagues, in particular Stephen Gallacher, Lesley Drysdale and Morag Mackie whose work, comments and assistance have been invaluable.
(Editor’s note: Readers may also be interested to read DSpace vs. ETD-db: Choosing software to manage electronic theses and dissertations by Richard Jones in issue 38).
References
- DAEDALUS Project http://www.lib.gla.ac.uk/daedalus/index.html
- The JISC FAIR Programme http://www.jisc.ac.uk/index.cfm?name=programme_fair
- The SHERPA Project http://www.sherpa.ac.uk/
- Digital Preservation Coalition Forum, 24 June 2003. http://www.dpconline.org/graphics/events/24603dpcforum.html
- Nixon, William J, “DAEDALUS: Freeing Scholarly Communication at the University of Glasgow”, Ariadne 34, December 2002/January 2003 http://www.ariadne.ac.uk/issue34/nixon/
- Jones, Richard, ‘DSpace and ETD-db Comparative Evaluation‘, http://www.thesesalive.ac.uk/arch_reports.shtml
- LU:Research http://lu-research.lub.lu.se/
- EPrints.org http://software.eprints.org/
- TARDis Project http://tardis.eprints.org/
- EPrints GNU License http://software.eprints.org/gnu.php
- DSpace FAQ: What is DSpace? http://dspace.org/faqs/dspace.html#what
- BSD License http://www.opensource.org/licenses/bsd-license.php
- DSpace@Cambridge http://www.lib.cam.ac.uk/dspace/
- DSpace Federation Web site http://dspace.org/
- DSpace-general list: Preview of next release of DSpace http://mailman.mit.edu/pipermail/dspace-general/2003-September/000006.html
- ‘DSpace for Dummies’ http://sunsite.utk.edu/diglib/dspace/
- DSpace Tech List http://lists.sourceforge.net/lists/listinfo/dspace-tech
- EPrints 2.2 Documentation - How-To Guides http://software.eprints.org/docs/php/howto.php
- Theses Alive! Project, DSpace Add-on for Theses http://www.thesesalive.ac.uk/dsp_home.shtml
- There is other software such as CDSware: http://cdsware.cern.ch/
Author Details
William J Nixon
William is the Deputy Head of IT Services, Glasgow University Library and Administrator of the Glasgow ePrints Service. He is also the Project Manager: Service Development for DAEDALUS (University of Glasgow)
Email: w.j.nixon@lib.gla.ac.uk
Web site: http://www.gla.ac.uk/daedalus
Article Title: “DAEDALUS: Initial experiences with EPrints and DSpace at the University of Glasgow”
Author: William Nixon
Publication Date: 30-October-2003
Publication: Ariadne Issue 37
Originating URL: http://www.ariadne.ac.uk/issue37/nixon/