Web Magazine for Information Professionals

Take a Peek Beneath the EPrints V3 Wrappers

With v3 officially launched at the Open Repositories Conference in San Antonio last week, William Nixon and Peter Millington report on the EPrints 3 pre-launch briefing in London, 8 December 2006.

The EPrints 3 unwrapped event at the Congress Centre in London was billed as ‘an early Christmas present’ [1] and was an opportunity for the EPrints community to enjoy a preview of EPrints 3 (hereafter referred to as EP3) scheduled for release at the end of January 2007. The official launch was held at the Open Repositories Conference [2] in San Antonio, Texas on the 24th of January and the software is now in final release.

The London meeting generated a lot of interest from the delegates, a lot of interaction with the EP3 team and a lot of discussion about new features. At times there was so much discussion that the timetable looked in danger of slipping; however, that just underscored the level of interest from the floor. It was also an opportunity for Chris Gutteridge, lead developer on EP3, to respond to questions, to make suggestions and to get new ideas in both the morning and afternoon sessions. The morning session, led by Les Carr, EPrints Technical Director, was a demonstration and a walkthrough of the software. Jessie Hey then led a discussion about the features and activities which a Librarian/Administrator would like to have in EP3.

The afternoon session, led by Timothy Miles-Board, provided a technical overview of the software and covered installation, upgrades and the configuration/use of new features such as plug-ins, flexible workflows and autocompletion. Les Carr underlined the fact that EP3 is more than just an upgrade; it is a whole new piece of software. EP3 will bring new features to the repository party: some minor, such as easily re-sorting search results, and some major, for example autocompletion, the architecture and flexible workflows. For existing users, especially those who have heavily configured their installation, upgrading to EP3 will be challenging - but then anything worth doing rarely is otherwise.

An Introduction to EPrints 3

Leslie Carr, EPrints Technical Director, University of Southampton

The EP3 event promised ‘Christmas magic’ and was opened by a very skilful conjuror who used such commonplace items as a newspaper, torn up and then magically reassembled, to draw parallels with the search functionality of the new software.

EP3 though is no parlour trick and it looks like an exciting, flexible and feature-rich release which helps to address some of the broader challenges faced by those of us populating our institutional repositories. These challenges, such as metadata quality, ease of self-deposition and the use of embargo options for content not immediately available, reflect the ‘on the ground’ reality for many repositories and institutions. EP3 appears to be evolving in line with many of the needs of its original community base.

Les began with the history of EPrints, from its initial development back in 1999 with funding from JISC. There are now over 210 repositories using the EPrints software. He gave an update on the EPrints team and some background on the drivers for the development of EP3 including more flexibility and more power for users and administrators.

He provided a live walkthrough and demonstration of the new EP3 features and enhancements. An EP3 demonstration repository [3] is live on the EPrints Web site. Although at first glance it looks similar to EP2 (and familiarity was an element of the redesign), that first glance is deceptive. There is also a new Web site in place to host the EP3 software and related files, plug-ins and other add-ons [4]. Ironically, at the time of the meeting that site was running EP2.

Range of New Features

So what is new and would you want it? Should you upgrade to EP3, and if so what are the implications? The upgrade issues were discussed in the afternoon session. An ‘at-a-glance’ view of the new features in EP3 includes:

Key features and Quicktime demos are now available [5].

Searching or browsing an EP3 repository appears to be much the same as previous versions, although there are some added features including the opportunity to sort results lists by author, title, year etc without having to re-run the search. An RSS feed can also be created based on particular searches. The ‘Latest Additions’ page displays the most recently deposited items, with the added option of the list being available as a news feed in Atom, RSS 1.0 and RSS 2.0 formats.

When selecting a deposited item from a list, EP3 now shows all the associated files as a collection of thumbnails, including the first page of a PDF file. Hovering over the thumbnail brings up a larger version of the image. This felt like a ‘cool’-feature-looking-for-an-application that could soon become irritating - like animated GIFs and Flash movies. Administrators might therefore want to switch it off. On the other hand, there are no doubt document types and applications where this feature would be extremely useful - e.g. multimedia materials - although these could perhaps be implemented in a different form, such as the virtual light box for collections of images, as requested by one of the delegates.

The file list also includes files that are not publicly available, typically because of an embargo. However, in such cases there is a ‘Request a copy’ button to email the depositor. All readers have to do is enter their email address and the rest is automatic. When depositors receive the resultant message, they need only click on buttons either to send the requested file, or to send a polite rejection. This feature is intended to forestall concerns about depositing items before their embargo period has expired, or where an author wishes to retain control over the item’s distribution (hardly within the spirit of the OAI). It is likely to achieve its aim.

Importing and Exporting Data

There are numerous format options for exporting search results including Dublin Core and METS as well as bibliographic software formats like Reference Manager. These are implemented using plug-ins described as ‘kind of cool’, written by Southampton, but to which other users can also contribute.

Another ‘cool’ feature is latitude and longitude fields which can be used to export data to Google Maps and look like fun to play with. In the context of scholarly publication, the potential uses still need thinking through. The demonstration showed locations on a Google Map using co-ordinates added into some sample records, but this was hardly a serious application. There may be scope for using the feature to show the location of field research sites, assuming the depositors/mediators are prepared to provide the necessary metadata. It is an indicator though of the way in which EP3 is evolving to provide hooks into other services and to provide new opportunities to work with the material deposited.

In summary, there have been some interesting improvements to the search and browse interfaces, but they are, as yet, of unproven usefulness.

Deposit and Workflow

There have been major changes to the deposit process, in order to make it more user-friendly and to encourage further take-up of self-deposition. This is one of the key changes in EP3 and the one which could potentially have the greatest impact on those institutions which are using a mediated deposit model. When implemented in conjunction with the autocompletion feature, EP3 provides an easier interface for self-depositors.

The first major change is the introduction of a set of tabs for the various deposit stages:

Type -> Upload -> Details -> Subjects -> Deposit

These feel similar to the ‘sausage bar’ used in DSpace’s deposit process. The tabs are effective because they make it clear from the outset what stages are involved. In principle they also show depositors where they are in the process, although the stages do not necessarily have to be completed in the order given, and repositories can customise the sequence. There is error-checking for obligatory data before final deposition.

Users no longer need to specify the file type they are depositing before they upload it; the EP3 software will now recognise it.

The tabs also mean that if you need to go back to make amendments, you no longer need to navigate sequentially through all the intervening pages to return to the section you need; you can now skip directly to the required section. The interface is also “Back button-friendly”, a feature of which Chris Gutteridge was very proud and one which could minimise user confusion and frustration. Further details on how to configure the workflow and the deposit process were covered in the afternoon’s technical session.
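As a rough illustration of the tabbed deposit process described above, the following Python sketch models the five stages, lets them be filled in any order, and only checks for obligatory data at the point of final deposit. The stage names come from the tabs listed earlier; the required field names and the functions are hypothetical and are not taken from EPrints itself.

```python
# Stage names follow the tabs described in the article; the required
# fields per stage are illustrative assumptions, not EPrints settings.
STAGES = ["Type", "Upload", "Details", "Subjects", "Deposit"]

REQUIRED = {
    "Type": ["item_type"],
    "Upload": ["files"],
    "Details": ["title", "creators"],
    "Subjects": ["subjects"],
}


def missing_fields(record):
    """Return {stage: [missing field names]} for obligatory data not yet supplied."""
    problems = {}
    for stage in STAGES:
        missing = [name for name in REQUIRED.get(stage, []) if not record.get(name)]
        if missing:
            problems[stage] = missing
    return problems


def can_deposit(record):
    """Final deposit is allowed only when no obligatory field is missing."""
    return not missing_fields(record)


# Stages may be filled in any order: here 'Details' is completed before 'Upload'.
record = {"item_type": "article", "title": "A paper", "creators": ["Smith, J."]}
print(missing_fields(record))
print(can_deposit(record))
```

Reporting the missing fields per stage is what would allow an interface to send the depositor straight to the relevant tab rather than back through every page.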

Autocompletion

There was much play made of various fields having autocompletion. The default ones are journal, author and ISSN. Autocompletion uses JavaScript to monitor what is being typed and queries the relevant EPrints MySQL tables. This means that as more data is added to the archive, the more useful autocompletion becomes. In theory, the autocompletion features could be made to query an external database (such as Zetoc or SHERPA/RoMEO), but network connections are likely to be too slow for this to be effective. One exception might be using LDAP on a local network to validate ‘creators’.
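The lookup behind autocompletion can be sketched in a few lines. The example below is only an approximation of the idea: it uses Python with an in-memory SQLite database standing in for the EPrints MySQL tables, and the table name, column names, journal titles and ISSNs are all invented for illustration. In EP3 itself the request would come from JavaScript in the browser.

```python
import sqlite3

# In-memory SQLite stands in for the repository database; the table and
# column names are hypothetical, not the real EPrints schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE journals (title TEXT, issn TEXT)")
conn.executemany(
    "INSERT INTO journals VALUES (?, ?)",
    [
        # Fictional titles with dummy ISSNs.
        ("Annals of Example Studies", "0000-0001"),
        ("Annals of Sample Research", "0000-0002"),
        ("Bulletin of Testing", "0000-0003"),
    ],
)


def suggest(prefix, limit=10):
    """Return (title, issn) rows whose title starts with the typed prefix."""
    # Escape LIKE wildcards so that the typed text is matched literally.
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT title, issn FROM journals "
        "WHERE title LIKE ? ESCAPE '\\' ORDER BY title LIMIT ?",
        (escaped + "%", limit),
    )
    return rows.fetchall()


print(suggest("annals"))
```

Because the suggestions are drawn from records already in the table, every new deposit adds to the pool of possible matches, which is why the feature becomes more useful as the archive grows.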

The autocompletion for journals in the demonstration archive was linked to Southampton’s version of the RoMEO database. Matched journals were displayed along with some RoMEO data and Southampton’s version of the RoMEO colour codes.

Another approach could be to create pre-populated local databases of researchers and staff, journals, etc., from such sources as the institution’s publications database, so that they can then be used for look-up and validation. In the case of journals, there is an opportunity for external databases to offer a facility for checking titles in bulk and returning appropriate data for storage locally. This could be done through an API and/or something new such as a Web form where a list of titles could be copied and pasted for processing. Such a facility would also have other uses.

The autocompletion feature, for authors’ names in particular, raises an interesting authority issue. Should the repository use the authority name for an author or his or her name as cited in the published paper or research output? Perhaps there is scope for a separate author authority field? Autocompletion can also be combined with conditional workflows to provide a customised self-deposit process.

To select a subject or academic unit, the user is no longer presented with one massive hierarchical list. Instead, only the top levels in the hierarchy are displayed, which can be expanded/contracted to show the lower/higher levels. This approach is similar to the way folders can be expanded and contracted in Windows Explorer. It can also be applied to other EPrints pages where little-used fields can be collapsed out of the way until needed. What is more, these views can be made conditional. So, for instance, for a depositor from the History Department, the history subject classes could be automatically expanded to their full depth while keeping the other subjects in a collapsed state.
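The conditional expansion described above might work along the following lines. This Python sketch is an assumption about the general approach, not EPrints code: the subject names, the department-to-subject mapping and the function names are invented, and subject names are assumed to be unique across the tree.

```python
# A toy subject hierarchy; names and the department mapping are invented.
TREE = {
    "History": {"Medieval": {}, "Modern": {"20th Century": {}}},
    "Physics": {"Optics": {}, "Mechanics": {}},
}

DEPARTMENT_SUBJECT = {"History Department": "History"}


def subtree_names(name, children):
    """Collect a node's name together with the names of all its descendants."""
    names = {name}
    for child, grandchildren in children.items():
        names |= subtree_names(child, grandchildren)
    return names


def initial_expansion(department):
    """Expand the depositor's own subject to full depth; leave the rest collapsed."""
    top = DEPARTMENT_SUBJECT.get(department)
    if top is None:
        return set()
    return subtree_names(top, TREE[top])


def render(tree, expanded, depth=0):
    """Return display lines, descending only into branches marked as expanded."""
    lines = []
    for name, children in sorted(tree.items()):
        is_open = bool(children) and name in expanded
        marker = "-" if is_open else "+" if children else " "
        lines.append("  " * depth + marker + " " + name)
        if is_open:
            lines.extend(render(children, expanded, depth + 1))
    return lines


print("\n".join(render(TREE, initial_expansion("History Department"))))
```

A depositor from an unlisted department simply sees the top-level subjects, all collapsed.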

Poor usability is one of the barriers that deter authors from self-archiving their publications. The suite of new deposit, workflow and autocompletion features in EPrints 3 has gone a long way in making the deposit process much more user-friendly and intuitive. It will be interesting to see over the next year if repository administrators can exploit these new features to encourage more self-deposit into their repositories.

There still remains the significant barrier of having to create PDF or XML versions of articles, but this is outside the remit of EPrints 3. A plug-in anyone?

Requests for Features

The morning session also provided an opportunity to ask questions and to make suggestions for future developments and enhancements. These included the addition of a third choice, “unknown” for the refereed field, an option for RefWorks as a format for export and a script for the global updating of authors in the name authority file.

EP3 will also collect and store statistics (including Apache logs) and the team plans to introduce ways to provide access and views to these statistics in future releases. Although it will not be possible to view statistics via EP3 in the initial release, the software will start collecting data from the moment it is installed and up and running. The EPrints team will also be looking at COUNTER [6]. It is anticipated that these additional statistics services will be available in the first quarter of 2007.

We also raised the idea of an e-mail alert when an embargo is about to be lifted, particularly for theses - which would provide a warning that a thesis is soon to be released and should be reviewed. However, there are situations where we would not want to have embargoes automatically lifted, but would want an alert about them.
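The kind of alert we had in mind could be as simple as the following Python sketch. It is our own illustration of the request rather than anything in EP3: the record structure, the `embargo_until` field name and the 30-day warning period are all assumptions.

```python
from datetime import date, timedelta


def embargoes_ending_soon(items, today, warn_days=30):
    """Return items whose embargo ends within warn_days of today, soonest first."""
    cutoff = today + timedelta(days=warn_days)
    due = [item for item in items if today <= item["embargo_until"] <= cutoff]
    return sorted(due, key=lambda item: item["embargo_until"])


# Invented sample records; 'embargo_until' is a hypothetical field name.
theses = [
    {"title": "Thesis A", "embargo_until": date(2007, 2, 10)},
    {"title": "Thesis B", "embargo_until": date(2008, 1, 1)},
]

for item in embargoes_ending_soon(theses, today=date(2007, 1, 24)):
    # Only a warning is raised; nothing is released automatically.
    print("Embargo ending soon:", item["title"], item["embargo_until"])
```

Keeping the check separate from any release step reflects the point above: the alert prompts a review, and the decision to lift the embargo stays with the administrator.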

What Would a Librarian/Administrator Want to Be Able to Do?

Jessie Hey, University of Southampton

Jessie Hey provided a more high-level view of EP3, looking at the activities which a Librarian/Administrator would like to carry out and which could be made easier. Jessie set the scene for this short session using her experiences at Southampton. There was some further workflow discussion and also a suggestion for de-duplication and checking. This will not be in the final release of EP3 but the software team recognised it as an important feature. Workflow configuration was also discussed, in particular how complex or easy it was to set up based on different scenarios.

A full list of the various requests [7] and an opportunity to submit additional ones is available via the EPrints Web site.

EP3 Technical Overview

The afternoon session was a technical overview of EP3 by Tim Miles-Board and included installation and upgrades, configuring the deposit process and working with plug-ins, as well as details on the XML format and the improved indexer. This session, like the morning’s, was interactive and Chris Gutteridge, although not leading it, was in the audience responding to a wide range of queries.

Installation and Upgrades

On the face of it, EP3 continues the reputation of its predecessors in being easy to install, with an administration tool called epadmin and a wiki that walks you through the process step by step - ideal for new repositories.

The EP3 team did note though that this is a very significant upgrade and migration from EP2 to EP3 should not be attempted as a straight upgrade. Instead they suggest that EP3 is installed on the same server as your current EPrints service but in a different place. There is a migration tool which, when run, will copy all of the data, fields and metadata into EP3. This though is likely to be just the start of the upgrade process, especially if there has been a lot of customisation.

We concur that things will not be simple if a repository is being upgraded from an earlier version. Upgrades and post-installation modifications may be difficult due to the apparent lack of maintenance utilities and poor advice on customisation. We have worries that the answer to many of the customisation-related questions from the audience was “just delete it from the config/xml file”. We would advise that for maintainability, settings should be commented out, not deleted. Commenting out means that the setting can be reinstated at a later date simply by un-commenting it. Otherwise (as we have found to our cost), the person doing the modifications may have no clue as to what needs re-inserting, where, and in what format.
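To make the distinction concrete, the Python sketch below replaces a setting with an XML comment containing its original markup, so that it can be reinstated later by removing the comment markers. The configuration fragment and its element names are invented for illustration and do not reflect the real EPrints configuration files; the function is likewise hypothetical.

```python
from xml.dom import minidom

# Hypothetical configuration fragment; the element names are invented and
# do not reflect the real EPrints configuration files.
CONFIG = """<config>
  <field name="title"/>
  <field name="isbn"/>
</config>"""


def comment_out(doc, tag, attr, value):
    """Replace matching elements with an XML comment holding their markup."""
    count = 0
    for node in list(doc.getElementsByTagName(tag)):
        if node.getAttribute(attr) == value:
            markup = node.toxml()
            if "--" in markup:
                # XML does not permit '--' inside a comment.
                raise ValueError("'--' is not allowed inside an XML comment")
            comment = doc.createComment(" " + markup + " ")
            node.parentNode.replaceChild(comment, node)
            count += 1
    return count


doc = minidom.parseString(CONFIG)
comment_out(doc, "field", "name", "isbn")
print(doc.documentElement.toxml())
```

The commented-out line stays in place in the file, showing exactly what was removed, where, and in what format.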

More generally, we got the feeling that there was too much reliance on manual editing of configuration files. No doubt some manual editing may be required, but this should be the exception. Many things could be implemented using an installation and configuration tool - if one existed - with radio buttons, check boxes, drop-down pick lists and the like for selecting settings such as the repository’s language, permitted document types, etc.

The EP3 final release is now available and can be downloaded from http://files.eprints.org. The EP3 team recommends Red Hat Linux as the Unix platform but EP3, like its predecessors, will run on various flavours of Unix. It will also continue to require the core software of Apache 2.0, mod_perl, Perl and MySQL. The EP3 team also suggested the key activities which should be done once EP3 is up and running. Although these should be done already, sometimes it doesn’t hurt to repeat their importance: keep the operating system patched and updated, keep EPrints and its plug-ins up to date and make backups. The backups issue was one about which Chris Gutteridge felt particularly passionate, and he advocated not only the need for backups but also regular checks to ensure that they will restore the system.

The remainder of the technical session focused on a number of new features and gave a broad brush insight into how they could be implemented.

The deposit process can now be configured to provide different workflows but these should still feel familiar to users. Workflows are defined in XML and divided into stages. It is now much easier to group related fields together, e.g. publication data such as ISSN, journal title and publisher; the file upload can now be done at any stage, and EP3 will recognise the file type. Fields or groups can be collapsed for a more streamlined look (this uses JavaScript). Text can also be inserted into the workflow to provide further help, guidance and explanation.

For Systems Administrators, EP3 will now provide an alert if the indexing has stopped rather than relying on administrators to keep an eye on it, or worse, to be alerted by a user.

There will continue to be a range of support and help options available for EP3. EP3 installation information is available on the EPrints wiki.

This support and the documentation available will continue to develop through each subsequent release of the software.

Extending EP3 functionality with Plug-ins

The technical overview also provided more details on the use of plug-ins to extend EP3’s functionality. Plug-ins can be used for:

One of the key design goals for the EP3 team was that plug-ins should be easy to write and require minimal coding. A couple of slides then followed, looking at the coding for a range of plug-ins: import and export; screen; and input component, e.g. drawing a molecule in an applet. Import plug-ins can be harder to write but there are many existing libraries available to help.
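The general idea of an export plug-in can be conveyed with a small sketch. The example below is written in Python purely for illustration and does not use the EPrints plug-in API, which is Perl-based; the registry, the decorator and the record fields are all assumptions of ours.

```python
import json

# Registry mapping a format name to an export function. This mirrors the
# general plug-in idea only; it is not the EPrints plug-in interface.
EXPORTERS = {}


def exporter(name):
    """Decorator that registers an export function under a format name."""
    def register(func):
        EXPORTERS[name] = func
        return func
    return register


@exporter("json")
def export_json(record):
    """Serialise the whole record as JSON with stable key order."""
    return json.dumps(record, sort_keys=True)


@exporter("text")
def export_text(record):
    """Produce a simple one-line citation."""
    return "{creators} ({year}) {title}".format(
        creators="; ".join(record["creators"]),
        year=record["year"],
        title=record["title"],
    )


def export(record, fmt):
    """Dispatch to whichever plug-in is registered for the requested format."""
    if fmt not in EXPORTERS:
        raise ValueError("no export plug-in registered for " + repr(fmt))
    return EXPORTERS[fmt](record)


# Invented sample record.
record = {"creators": ["Smith, J."], "title": "A paper", "year": 2007}
print(export(record, "text"))
```

Adding a new format means writing one function and registering it; nothing else in the sketch needs to change, which is the property that makes a plug-in design attractive for contributions from other users.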

The latest updates and plug-ins will be available from the EPrints Web site and these will also include subject trees, translations and themes.

Conclusions

All in all, EP3 looks like a significant improvement over the earlier versions and a significant milestone in the journey towards the ideal repository software. EP3 addresses real issues for repository managers such as controlling quality, encouraging the take-up of self-deposit and embedding the repository in the broader institutional context. The deposit process in particular has been made more usable and user-friendly, thus removing one of the deterrents to self-deposition.

The search and browse facilities are fundamentally much the same as before, which some may find disappointing. It remains to be seen which of the new features are really useful and which are just icing on the cake.

While installation appears to be a straightforward process, we have concerns about how easy it will be to migrate from heavily customised earlier versions but we understand that a migration tool is being developed. There are concerns about how easy it will be to make post-installation modifications, should they be required, due to the reliance on manual editing of configuration files. EPrints.org could significantly improve matters by developing a suitable configuration tool for administrators.

EP3 looks like a feature-rich upgrade which builds on the success of the platform to date. While there are concerns about the upgrade issues, the new features and the shift in architecture render the time required to upgrade worthwhile.

EP3.1 is scheduled for release around Easter 2007 and once the initial upgrade to 3.x has been done, we would anticipate that subsequent 3.x upgrades should prove more straightforward.

References

  1. EPrints v3 Briefing http://www.eprints.org/software/v3briefing.php
  2. Open Repositories Conference, San Antonio, Texas, USA http://openrepositories.org/
  3. EPrints v3 Demonstration site http://demoprints3.eprints.org/
  4. EPrints filestore http://files.eprints.org/
  5. Introducing EPrints 3 http://www.eprints.org/software/v3/
  6. COUNTER http://www.projectcounter.org/
  7. EP3 Requirements http://www.eprints.org/software/v3/v3requirements.php

Author Details

Peter Millington
SHERPA Technical Development Officer
University of Nottingham

Email: peter.millington@nottingham.ac.uk
Web site: http://www.sherpa.ac.uk/

William J. Nixon
Digital Library Development Manager
University of Glasgow

Email: w.j.nixon@lib.gla.ac.uk
Web site: http://www.gla.ac.uk/enlighten/
