Web Magazine for Information Professionals

Open Repositories 2009

Adrian Stevenson reports on the four-day annual Open Repositories conference held at Georgia Tech in Atlanta, GA, USA over 18 - 21 May 2009.

I recently attended the annual Open Repositories 2009 Conference [1] in Atlanta, Georgia, which hosted 326 delegates from 23 countries. For me, as the SWORD [2] Project Manager, the event proved to be very worthwhile. My colleague Julie Allinson and I were able to give a plenary presentation on the first day and a half-day workshop on the final day.

Much of the conference addressed developments surrounding the Fedora, DSpace and EPrints systems that have occurred over the last year. There were also a number of presentations and discussions about the new strategic partnership between the DSpace Foundation and Fedora Commons, including much mention of the new DuraSpace Project [3], a joint endeavour to provide a Web-based service for storing digital content in the cloud. Microsoft was much in evidence at the event as well, launching its new Zentity repository system as well as demonstrating and presenting its range of scholarly tools [4].

The Cutting Edge of SWORD

Adrian Stevenson, UKOLN and Julie Allinson, University of York

Julie began by giving some background to the SWORD2 Project; she explained that SWORD (Simple Web-service Offering Repository Deposit) is concerned with lowering the barriers to deposit and that before SWORD, there was no accepted, standardised means to deposit into repositories. The project team decided that the development of a profile of the Atom Publishing Protocol [5] was the best fit to meet the use cases and requirements. SWORD facilitates remote deposit of items into a repository. This could be from a user's desktop, for instance from word-processing applications. It can also facilitate multiple deposit, for example to a funder's repository, a national subject-based repository or an institutional repository, in one click. The SWORD profile provides extensions and constraints to the Atom Publishing Protocol to meet the needs of the repository community, in particular addressing requirements such as package deposit and mediated deposit, i.e. deposit on behalf of another. The project has developed and provided demonstrator repository installations of Fedora, DSpace, EPrints and Intralibrary which any user can access to test SWORD deposit. The project has also provided a number of demonstrator deposit clients that can be used with the demo repositories or any other SWORD-enabled repositories to test SWORD implementations. I then gave a live demonstration of a number of deposit clients that the SWORD2 Project has developed including a Web page-based client and a Facebook client. The presentation was well received judging by the number of questions, feedback direct to us after the talk and via some positive comments over Twitter [6].
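To make the idea of a packaged, mediated deposit a little more concrete, here is a minimal sketch in Python of how such a request might be described. It is not taken from the SWORD Project's own clients. The header names X-Packaging and X-On-Behalf-Of and the packaging identifier are given as I recall them from the SWORD 1.3 profile and should be checked against the specification; the collection URL, user name and the DepositRequest type are purely hypothetical. Nothing is sent over the network.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DepositRequest:
    """A plain description of an HTTP request; nothing is sent."""
    method: str
    url: str
    headers: Dict[str, str] = field(default_factory=dict)
    body: bytes = b""


def build_deposit(collection_url: str, package: bytes, filename: str,
                  packaging: str,
                  on_behalf_of: Optional[str] = None) -> DepositRequest:
    """Describe an AtomPub-style POST of a zipped package to a collection."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        # Assumed SWORD 1.3 extension header naming the package format.
        "X-Packaging": packaging,
    }
    if on_behalf_of is not None:
        # Assumed SWORD extension header for mediated deposit,
        # i.e. deposit on behalf of another user.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return DepositRequest("POST", collection_url, headers, package)


if __name__ == "__main__":
    request = build_deposit(
        "http://repository.example.org/sword/deposit/collection1",  # hypothetical
        b"...zip bytes...",
        "article.zip",
        "http://purl.org/net/sword-types/METSDSpaceSIP",  # example identifier
        on_behalf_of="jbloggs",  # hypothetical user
    )
    print(request.method, request.url)
    for name, value in request.headers.items():
        print(f"{name}: {value}")
```

The returned description could be handed to any HTTP client. Keeping request construction separate from sending makes the headers easy to inspect against one of the demonstrator repositories.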

Georgia Tech campus

Connecting Authors and Repositories through SWORD

Pablo Fernicola, Microsoft Scholarly Communications

We were followed by Pablo Fernicola from Microsoft. Pablo talked about and demonstrated the 'Authoring Add-in for Word 2007' [7] that Microsoft has developed which incorporates SWORD and which means it's now possible for authors to deposit articles directly from Word. The SWORD-related information is incorporated into template files, so all the author has to do is click a button. Moreover, through the add-in, author metadata can be gathered in a largely automated fashion, reducing duplication in data entry and author irritation.

I think it's fair to say that SWORD was a very popular topic throughout the conference, getting numerous name checks in presentations. Pablo as well as Alex Wade and Lee Dirks, also of Microsoft, have been doing a great job of marketing SWORD in their presentations about the Word Authoring Add-in and the new SWORD-enabled Zentity repository system [8] that was officially launched at the event.

Strategies for Innovation and Sustainability: Insights from Leaders of Open Source Repository Organisations

DSpace Foundation and Fedora Commons

Michele Kimpton, DSpace Foundation and Sandy Payette, Fedora Commons

Sandy and Michele talked about the new Duraspace initiative and how it is relevant to the broader issue of sustainability. There was talk about how Duraspace can add value to the general strategy of being mission driven. A key goal of Duraspace is to support the two communities and seek synergies to achieve a critical mass for innovation. The benefits of this approach are efficiency and widening access to funding opportunities, partly by reducing the amount of competition. There was acknowledgement that there is some degree of risk of over-committing given the limited resources available.

Eprints

Les Carr, University of Southampton

Les began with a fairly detailed history of Eprints [9] in order to provide some historical context to the issue of sustainability. According to Les, the bottom line was that sustainability came down to funding in the case of Eprints. Resources were needed for innovation, and were typically being provided by funding bodies such as JISC [10] as well as some commercial customers. The innovation itself was being carried out mainly by the existing Eprints team. Les saw the benefit of the Eprints approach as one of achieving coherent developments with effective innovation, whilst striking a good balance between providing many free services and some paid ones. Some disadvantages were low commitment from platform adopters, little inter-adopter contact and reduced visibility in the wider community.

The Microsoft Business Model

Lee Dirks, Microsoft Research

Lee explained some of the thinking behind the current Microsoft business model. The idea centred on the transformation of scholarly communication, with the salient points being that interoperability was essential, and that using community standards such as SWORD was 'a must'. He also drew attention to the launch of the new Microsoft Research-Outputs Repository system, Zentity v1.0. Lee stated that the Microsoft Scholarly Research Group is not actually a new product group, rather one that is now taking an 'open-edge' approach, with Zentity representing one component of wider efforts to participate in the scholarly lifecycle.

Keynote Address: Locks and Gears: Digital Repositories and the Digital Commons

John Wilbanks, Vice-President of Science at Creative Commons

John Wilbanks was next up, repeating the frequently cited idea that mandates are necessary to solve the deposit motivation problem. In time there would be less need for sticks such as these, as the benefits of repositories become more evident. He highlighted issues such as one mentioned in the 'Caveat Lector' blog [11], that it took an hour to change a link in DSpace. Clearly this is a problem. He also mentioned that there has still not been enough digitisation of materials.

Talking about the Web, he noted that it gives us the ability to move knowledge around, and that the existing system of articles and citations is by contrast a 'tin can' network, by which he meant that it is a relatively poor and inefficient system for sharing information. He highlighted the OAI-ORE specification [12] as something that promises to take us from the tin can to the Internet. A key aspect of the take-up of the Web related to the ability to make and pass round copies of resources. The old way was to make Web pages by copying the HTML and then editing the content. The content of these Web pages was out of copyright of course, but this was the way information was shared.

John went on to say that as useful as the Web has been for sharing data, it needed an upgrade. This upgrade to the Web is the Semantic Web [13], which allows computers to understand things. We also have to teach the Web how different words can mean the same thing using ontologies. He mentioned that you can use the same method of copying code and changing a few details to apply to other subject areas, a bit like the old days of HTML. John made a stand against the creation of further standards. The 5% potential advantage they might represent was just not worth the effort in the long run. We should collaborate, not compete, and he cited Duraspace as a good example of such an approach.

Naming, Branding and Promoting the Institutional Repository: A Social Marketing Approach from the Canadian Perspective

Wayne Johnston, University of Guelph Library

Wayne explained how social marketing uses a commercial marketing approach that is customer-oriented, insight-driven, and theory and exchange-based. He outlined the 'four Ps': the product; the price; the place, i.e. the how and where, or the convenience factor; and the promotion, i.e. what to say and which channel to use. An example given was a naming contest as a possible way to develop a brand name. It is a way to engage the target audience, although the resulting name may not be suitable, and it can be very time-consuming. There is also the think-tank approach with people who have a vested interest. Any name needs to be tested to check what comes to mind and whether it is memorable. He mentioned that some naming firms charged as much as $75,000 for a name, which indicated how important it was considered to be. Wayne then went on to talk about branding. A brand was important as it captured and carried a complex message in simple form and could assume a life of its own.

Secrets of Success: Identifying Success Factors in Institutional Repositories

Elizabeth Yakel, University of Michigan

Elizabeth outlined some case studies which illustrated that institutional repositories tended to be successful when they achieved broad and voluntary participation. It was important for sustainability that the repository be absorbed into the larger institutional planning although in reality this very rarely happened. We needed to take a longer view to assess impact as institutional repositories were still quite new. The case studies were developed by holding on-site interviews. The impact areas considered were the content and the technology such as building technical competence and experience with new and different technologies. The role of the library was considered, given that institutional repositories had brought the role of the library into focus. The importance of the longer-term view was stressed along with the proposition that we will look back in five years and be surprised at the impact.

Adding OAI-ORE Support to Repository Platforms

Alexey Maslov, Texas Digital Library

Alexey reported on the Texas Digital Library's experiences of adding harvesting support to their DSpace repository using both OAI-PMH [14] and the OAI-ORE specification. He commented that, although OAI-PMH was useful, it had no dissemination facility. A possible solution was to use packaging formats to disseminate over OAI-PMH and then unpack, but this approach was subject to erroneous interpretation and ambiguities at the PMH layer. OAI-ORE however was both specialised for the purpose and simple. There were three aspects to mapping DSpace to ORE: aggregations, resources and resource maps (ReMs) [15]. These three aspects were mapped onto the DSpace software model, thereby establishing a mapping between the ORE data model and their own data model. They used DSpace 'bundles' for storage and went with the Atom XML for the ORE serialisation [16].
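The three ORE notions mentioned here can be sketched as simple data structures. The following Python is only an illustration of the ORE data model as I understand it (a Resource Map describes one Aggregation, which groups a set of Aggregated Resources). The choice of a repository item as the aggregation and its files as the aggregated resources, the function name and the URI patterns are my own assumptions, not details of the Texas Digital Library implementation.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class AggregatedResource:
    uri: str
    mime_type: str


@dataclass
class Aggregation:
    uri: str
    resources: List[AggregatedResource] = field(default_factory=list)


@dataclass
class ResourceMap:
    """A ReM is itself a resource with a URI, and describes one Aggregation."""
    uri: str
    describes: Aggregation


def item_to_resource_map(item_uri: str,
                         files: Iterable[Tuple[str, str]]) -> ResourceMap:
    """Build a ReM for a repository item from (file URL, MIME type) pairs.

    The '#aggregation' and '/rem.atom' suffixes are hypothetical URI patterns.
    """
    aggregation = Aggregation(uri=item_uri + "#aggregation")
    for url, mime_type in files:
        aggregation.resources.append(AggregatedResource(url, mime_type))
    return ResourceMap(uri=item_uri + "/rem.atom", describes=aggregation)


if __name__ == "__main__":
    rem = item_to_resource_map(
        "http://repository.example.org/item/123",  # hypothetical item URI
        [("http://repository.example.org/item/123/thesis.pdf", "application/pdf")],
    )
    print(rem.uri, "describes", rem.describes.uri)
    for resource in rem.describes.resources:
        print("  aggregates", resource.uri, resource.mime_type)
```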

Crosswalks were implemented from ORE ReMs to DSpace items and the harvester was implemented at DSpace collection level. The implementation allowed harvesting of the metadata and the file bitstreams if the harvester supported ORE, or it was possible to harvest just the metadata. The result was that harvesting had been achieved using both OAI-PMH and OAI-ORE.
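For readers unfamiliar with the PMH layer, a harvest is driven by simple HTTP GET requests. The sketch below builds a ListRecords request URL in Python. The verb, metadataPrefix and set parameters, and the mandatory oai_dc prefix, come from the OAI-PMH 2.0 specification; the base URL and set name are hypothetical, and the metadata prefix under which a repository might expose ORE resource maps is deliberately left to the caller because the talk did not specify it.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL, optionally limited to one set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical endpoint and set name, for illustration only.
    print(list_records_url("http://repository.example.org/oai/request",
                           set_spec="collection_42"))
```

A harvester that understands ORE would then follow the links in each returned record to fetch the files themselves, whereas a plain PMH harvester would stop at the metadata.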

Promoting Your Research with Citeline: An Advanced Bibliographic Citation Publishing Service

Sean Thomas, MIT

Sean initially described the problematic aspects of promoting research. Currently there was no centralised citation process. It was disorganised and bitty, and lacked standardisation. For example, dates varied, and widely differing citation styles were used. This made things difficult for users. Neither were page headers and HTML used consistently. In addition, there were multiple exposure channels with publication lists sprinkled amongst many sites. Many were not timely or persistent and had out of date pages and broken links. There were many inefficiencies and consequently the resources could not be described as very useful.

Citeline [17] aimed to provide a centralised service for citation data publishing. It removed technical barriers and provided an out-of-the-box presentation. Citeline offered rich features and simple workflows, supporting once-only updating, and with the information then being re-purposed onto any number of different Web sites. Citeline also allowed data export to other systems employing semantic Web technologies, such as RDF/XML, by using the SIMILE toolkit [18].

Sean then demoed Citeline. It looked impressive and seemed quick as well.

SWORD Futures Workshop

SWORD Futures Workshop

Julie Allinson, University of York and Adrian Stevenson, UKOLN

On the final day Julie Allinson and I led the half-day SWORD Futures workshop. We were delighted to see a packed house for the show-and-tell section.

Pablo Fernicola from Microsoft talked about their use of SWORD. Authors could deposit through Office with a simple one-click interaction with repositories from software that was, for many disciplines, the main authoring tool. Currently, all of the SWORD demonstration repositories [19] work with the Word Authoring plug-in.

Simeon Warner from arXiv talked about the SWORD implementation for arXiv by Thorsten Schwander [20]. In particular Simeon talked through their workflow for 'replace', which has extended SWORD in a purely AtomPub way by using PUT to the edit URL specified in the Atom response document.
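The 'replace' pattern can be sketched roughly as follows. This Python fragment is not arXiv's code; it simply shows the general AtomPub idea of reading the link with rel="edit" from the Atom entry returned after a deposit, which is the URI a client would then PUT to. The sample entry and its URLs are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = "http://www.w3.org/2005/Atom"

# Invented example of an Atom entry returned after a deposit.
SAMPLE_ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom">
  <id>urn:example:deposit:1</id>
  <title>Example deposit</title>
  <link rel="edit" href="http://repository.example.org/sword/edit/1"/>
</entry>"""


def find_edit_uri(atom_entry_xml: str) -> Optional[str]:
    """Return the href of the first atom:link with rel="edit", if any."""
    root = ET.fromstring(atom_entry_xml)
    for link in root.iter(f"{{{ATOM_NS}}}link"):
        if link.get("rel") == "edit":
            return link.get("href")
    return None


if __name__ == "__main__":
    edit_uri = find_edit_uri(SAMPLE_ENTRY)
    # A replacing client would now send the new version with HTTP PUT here.
    print("PUT", edit_uri)
```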

Julian Cheal of UKOLN demonstrated a prototype desktop repository uploader tool built using Adobe Air [21]. The tool used the Sherpa Romeo API [22] for journal/publisher information and drew upon data from the JISC-funded NAMES Project [23] for author names. It worked in a similar way to the Flickr uploader tool and had the potential to be used as a batch uploader tool for repository managers and administrators.

Glen Robson from the National Library of Wales then talked through the SWORD Fedora 3 plug-in. It worked using a configuration file and a set of file handlers that were passed the content type and MIME type. Currently there were four file handlers: one for JPEG images (which ties into the in-built Fedora disseminators), one for zip files (which unpacks them and stores the contents as separate datastreams), one for METS (which retrieves the contents) and one for anything else (which simply stores the file).
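The handler arrangement described above resembles a familiar dispatch pattern: try each handler in turn and fall back to a catch-all. The Python below is a hypothetical sketch of that pattern only. It is not the plug-in's code, and the handler names, the matching rules and the use of a packaging label to recognise METS are all my own assumptions.

```python
from typing import Callable, List, Optional, Tuple

# Each entry pairs a handler name with a predicate over
# (MIME type, optional packaging label). Order matters: first match wins.
Handler = Tuple[str, Callable[[str, Optional[str]], bool]]

HANDLERS: List[Handler] = [
    ("jpeg", lambda mime, packaging: mime == "image/jpeg"),
    ("zip", lambda mime, packaging: mime == "application/zip"),
    ("mets", lambda mime, packaging: packaging == "METS"),
    ("default", lambda mime, packaging: True),  # catch-all: simply store the file
]


def choose_handler(mime_type: str, packaging: Optional[str] = None) -> str:
    """Return the name of the first handler that accepts the deposit."""
    for name, accepts in HANDLERS:
        if accepts(mime_type, packaging):
            return name
    raise LookupError("no handler matched")  # unreachable while a catch-all exists


if __name__ == "__main__":
    for mime, packaging in [("image/jpeg", None), ("application/zip", None),
                            ("text/xml", "METS"), ("application/pdf", None)]:
        print(mime, packaging, "->", choose_handler(mime, packaging))
```

Driving the list from a configuration file, as the plug-in reportedly does, would allow new handlers to be added without touching the dispatch logic.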

The second part of the workshop went into the finer detail of SWORD implementation including discussions on some ongoing issues concerning matters such as packaging and HTTP headers. It was certainly a useful session for the SWORD Project, not only to note further requirements but also to get to meet a number of SWORD implementers face to face.

Poster Session

The poster session 'minute madness' was entertaining with all the speakers explaining their poster to a time-bomb animation backdrop. Mahendra Mahey of UKOLN and John Robertson from CETIS hosted the poster for the UKOLN/CETIS Repositories Research Team contribution. I wasn't involved directly, but it was clear they had a fair bit of interest. I noticed that both the Duraspace and Fedora Twitter feeds gave the poster a mention.

Repo Challenge

The RepoChallenge event sponsored by JISC and Microsoft also attracted a lot of interest. The winner was MentionIt by Tim Donohue of the University of Illinois, with a notable runner-up of FedoraFS by Rebecca Koeser of Emory [24]. The announcement and prize-giving were held at the end of the conference dinner at the Georgia Aquarium.

Conclusion

This was my first Open Repositories and I found it a really useful event in a number of ways. My work on the SWORD Project is clearly heavily focused on the repository community, so it was extremely useful to get a chance to put the message across to such a focused and receptive audience. It was also gratifying and very useful to see how well SWORD has already been adopted and how much of a talking point it appeared to be in this community at the moment. As John Robertson noted in his blog post [25], SWORD really was everywhere this year. The event was also an efficient opportunity to meet many members of the repository community face to face, whether they had a specific interest in SWORD or not. It was clear from many of the talks that there's still a way to go on the repositories road, and recent global events may even have played a part in hindering this progress. There does appear to be very real movement towards the overall goal however, and I hope to see further progress at next year's event, and in the coming years.

References

  1. Open Repositories 2009 Conference https://or09.library.gatech.edu/
  2. The SWORD Project http://www.swordapp.org/
  3. Duraspace http://www.duraspace.org/
  4. Microsoft Scholarly Communications http://www.microsoft.com/scholarlycomm
  5. Atom Publishing Protocol http://www.ietf.org/rfc/rfc5023.txt
  6. SWORD Project twitter feed http://www.twitter.com/swordapp
  7. Microsoft Article Authoring Add-in for Word 2007 http://research.microsoft.com/en-us/projects/authoring/
  8. Microsoft Zentity 1.0 Research-Output Repository Platform http://research.microsoft.com/en-us/downloads/48e60ac1-a95a-4163-a23d-28a914007743/
  9. Eprints http://www.eprints.org/
  10. Joint Information Systems Committee (JISC) http://www.jisc.ac.uk
  11. Caveat Lector Blog http://cavlec.yarinareth.net/
  12. Open Archives Initiative Object Reuse and Exchange http://www.openarchives.org/ore/
  13. Semantic Web http://en.wikipedia.org/wiki/Semantic_Web
  14. Open Archives Initiative Protocol for Metadata Harvesting http://www.openarchives.org/OAI/openarchivesprotocol.html
  15. OAI-ORE Resource Map http://www.openarchives.org/ore/1.0/datamodel#Resource_Map
  16. OAI-ORE Resource Map Implementation in Atom http://www.openarchives.org/ore/1.0/atom.html
  17. Citeline http://citeline.mit.edu/
  18. Simile Project http://simile.mit.edu/
  19. SWORD clients and demonstrators http://www.swordapp.org/sword/demonstrators
  20. arXiv.org SWORD/APP Deposit API http://arxiv.org/help/submit_sword
  21. IE Demonstrator Blog http://blog.iedemonstrator.org/
  22. Sherpa Romeo API http://www.sherpa.ac.uk/romeo/api.html
  23. NAMES Project http://namesproject.wordpress.com/
  24. RepoChallenge Winners http://dev8d.jiscinvolve.org/2009/05/20/repochallenge-winners/
  25. John's JISCCETIS Blog http://blogs.cetis.ac.uk/johnr/2009/06/05/open-repositories-2009/

Author Details

Adrian Stevenson
SWORD Project Manager
UKOLN
University of Bath

Email: a.stevenson@ukoln.ac.uk
Web site: http://www.swordapp.org/ and http://www.ukoln.ac.uk/
