A Recipe for Cream of Science: Special Content Recruitment for Dutch Institutional Repositories
Results
Cream of Science: The Challenge
One of the key challenges of the DARE Programme [1] is to encourage scholars to deposit digital versions of their research output in a university archive (institutional repository) that, in turn, can make this output accessible on the Internet. With this in view, a project called Cream of Science was initiated in the summer of 2004. One of the prime aims of Cream of Science is to unlock top quality content to the scientific community and make it more easily and digitally accessible. Another target is to demonstrate that scholars are willing to deposit their materials in a repository, thereby also increasing the awareness of other scholars. All DARE partners selected ten of their prominent scientists and made their complete publication list, with as much full text available as possible, visible and digitally available through DAREnet. All in all, almost 24,000 full text publications were made accessible by means of the repositories. The project ran from October 2004 to April 2005. The purpose of this article is to share the experience of the journey into Cream, the results and the lessons learned.
Selecting the Cream
Cream of Science aimed to create a selection of prominent Dutch academics, not necessarily the best 150 in the Netherlands. There were no objective criteria to define ‘prominent’, so each DARE partner was free to use whatever selection mechanism it wanted. Most of them used a formal method: selection by the Executive Board and/or a letter from the Dean inviting academics to be part of Cream. In some cases, the University Librarian signed the letter. Follow-up was carried out by (faculty) librarians, using personal contacts to convince academics to join the team. In some cases, Faculty Deans did this work and/or helped to facilitate the process. The response was generally positive, but a small group still held back. Nevertheless, between January and May, 207 academics joined Cream. Initially the goal was 150, so we were already witnessing a ‘me-too’ effect. It is interesting to note that nobody was forced to join Cream. Instead, Cream was joined by those who shared the project’s aims and objectives.
Launch and Celebration
The launch of Cream was part of a two-day leadership conference ‘Making the Strategic Case for Institutional Repositories’, organised by SURF, the Coalition for Networked Information (CNI) and the Joint Information Systems Committee (JISC) in the UK. All Cream Scientists were invited, as well as the Executive Boards of all universities and participating organisations and the seventy international repository experts attending the conference. Regrettably, a number of scholars were unable to attend the celebrations, which led to significant disappointment, but all in all a total of 60 scholars were able to attend and enjoy the festive occasion.
The site was launched in style on 10 May 2005 in the Royal Netherlands Academy of Arts and Sciences’ (KNAW) Trippenhuis, by Prof. Dr. Frits van Oostrom, KNAW’s President, who was petitioned by attending fellow ‘Cream of Scientists’ to open the site.
During the launch, six institutions signed the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. These were NWO, the University of Amsterdam, Utrecht University, Wageningen University and Research Centre, Leiden University and JISC, SURF’s counterpart in the United Kingdom. KNAW and SURF had signed the Berlin Declaration previously. Signatories intend to encourage their researchers/grant recipients to publish their work according to the principles of the open access paradigm. Since 10 May, two other universities have signed the Declaration (Delft and Groningen) and LIBER also signed the Declaration on 22 September 2005.
In addition to the official launch, all people involved in the project were invited to a surprise celebration. Sharing their experiences, successes and frustrations with colleagues from other universities in a relaxing atmosphere was highly appreciated.
Within just a day of its launch on May 10, DAREnet registered half a million hits. Over the next two days some very unexpected things started to happen: newspapers were publishing short items about Cream, radio reporters were calling for interviews and the Web site was overwhelmed by visitors. Subsequent technical problems caused the Web site to crash several times after this and it took two days to stabilise it.
More importantly, numerous Cream scientists were proud to be part of the project and mentioned it in their weblogs, Web sites etc. Researchers were also enthusiastic about the initiative. A number of them now refer to www.creamofscience.org on their own Web site for a complete overview of their work. They regard ‘Cream of Science’ as a hallmark of quality. Some media reports, particularly those published outside the Netherlands, suggested that the site was a declaration of war by universities against academic publishers. This is not the case. On the contrary, some publishers, such as Springer, are co-operating with the ‘Cream of Science’. DARE’s primary objective is to share knowledge and the site is one means of achieving this aim.
Results in Facts and Figures
Cream of Science consists of:
- 15 institutions
- 207 authors (187 male, 20 female)
- 40,479 records (average of 195 records per author; ranging from 3 to 1,224 records per author, per institution)
- 23,853 documents are full text available (59%, ranging from 19% full text availability to 96% per institute)
- 25% is copyright obstructed, 15% only metadata available at the moment, 1% is lost.
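The headline figures in this list can be cross-checked with a few lines of arithmetic. The sketch below uses only the totals quoted above; the variable names are illustrative.

```python
# Totals quoted in the list above.
authors = 207
records = 40_479
full_text = 23_853

# Average number of records per author.
avg_records = records / authors

# Share of records for which the full text is available.
full_text_share = full_text / records

print(f"average records per author: {avg_records:.1f}")
print(f"full-text share: {full_text_share:.1%}")
```

The average comes out at just over 195 records per author and the full-text share at just under 59%, in line with the figures given.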
Statistics for DAREnet
Month | Unique visitors | Number of visits | Pages | Hits | Bandwidth |
May 2005* | 35,762 | 49,538 | 746,208 | 1,471,585 | 8.74 GB |
June 2005 | 17,054 | 21,582 | 105,008 | 342,388 | 2.53 GB |
July 2005 | 9,462 | 11,676 | 45,990 | 140,967 | 1.07 GB |
* as of 10 May 2005
Due to both national and international media attention following the launch on 10 May, the initial number of visits to the Web site was unexpectedly high. Although predictions were about 50,000 searches between May and December 2005, this target was met in just a single day. This meant the site encountered an overload for a while, but the problem was soon resolved. It was, of course, impossible to keep this high score and the summer period certainly contributed to the drop in numbers. Statistics for the following months will show whether or not visitors find DAREnet and Cream of Science interesting enough to visit again.
Project Issues and Topics
Copyright
For most libraries and authors, copyright is an important issue. DARE has no copyright policy, but a copyright approach. A major step is the publication by DARE of an analysis of author - publisher contracts. Much to the surprise of the DARE community, it was clear that all materials published before or during 1997 could be part of a repository without any copyright restrictions.
Publishers had, up to then, not incorporated electronic publication into their contracts. After 1997, they changed their contracts and included the digital version as part of the copyright agreements. Another surprise was that most authors did not really know what exactly they had signed up to, thus giving away their copyright to a commercial publisher.
Most DARE partners decided to stay on the safe side, either because their scientists wanted it that way, or because the library had a ‘strict’ policy. But, at some locations, a majority of scientists decided that they wanted to include all publications in the repository, regardless of copyright.
Many libraries spent a lot of time and effort finding out which articles were copyright-free and which were not, by using the Publishers’ Copyright Listings via the Sherpa Web site [2]. This proved to be a very time-consuming task and in some cases this activity was therefore terminated. For materials published before or during 1997, both PDFs and metadata were stored in the repositories. For materials published after 1997, which still had copyright restrictions, only metadata was available.
As a result, Cream gives open access to almost 60% of the documents on the publication lists:
- publications whose copyrights have never been transferred like reports, dissertations and conference proceedings are open access;
- publications before or during 1997 where the digital copyright was never transferred. Most of these have been scanned;
- some publications published after 1997, where authors unconcerned by any potential confrontation with their publisher(s) explicitly directed their inclusion in Cream;
- permission was sought to scan a number of monographs and this was granted by the publisher without any problem; the publishers saw this as promotion of their printed collection;
- articles published in open access journals;
- last but not least: Springer co-operated with the Cream of Science project, allowing worldwide access via DAREnet to the articles of the 207 participating scientists.
The other 40% of Cream materials are not freely accessible. Copyright restrictions after 1997 represent the largest hurdle.
Content
Metadata
One important observation in Cream was that many libraries did not have metadata to cover the publication lists of the Cream scientists. They all had metadata records available for their monographs and journals, but most of them (except for the technical universities) did not hold metadata for the individual articles. Since metadata input was based on the publication lists provided by the authors (i.e. researchers etc., not cataloguers), the quality was sometimes very poor. Some institutions tried to add metadata (using both internal and external sources), but due to a lack of manpower and time this was not always feasible. For this reason, Cream does not always offer a complete metadata record.
Scanning
Less than 20% of materials published before or during 1997 is available in digital form in DARE repositories. So a lot of scanning needed to be done. This was done either by an internal scanning department or outsourced. The outsourcing was centrally organised by DARE and contracted to Strata.
All DARE partners gathered the articles from their library archives. Articles that were not available were obtained via centrally organised inter-library loan from other libraries. It turned out that 1% of the publications were lost because it was impossible to find a printed or digital version of the original document. This fact clearly shows that repositories serve a valuable purpose.
Strata developed dedicated software to link metadata with scanned object files. A very tricky logistical problem was the integration of the separate workflows - input of metadata, scanning, inter-library loan and the local lending process of volumes that had to be transported from library to Strata and vice versa. Special documentation was written to make this process work, but for most libraries this caused problems and misunderstandings. Quality issues needed to be addressed. Scanned articles that came out of inter-library loan requests were initially of low quality. Everybody realised this was all pioneering work, never attempted before elsewhere in this way. As a consequence, the expression ‘trial and error’ acquired a deeper meaning.
Harvesting
Because Cream of Science is part of DAREnet (read more about this in the next paragraph), sets [3] are used for harvesting.
The ARNO and DSpace sites had to implement a set mechanism because neither system supported sets as defined by OAI-PMH. Other systems found other working solutions. By 8 May 2005, just in time, harvesting of the final version was ready and the DAREnet platform was frozen in order to have a stable version for the official opening on 10 May.
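For readers unfamiliar with set-based harvesting, the sketch below builds an OAI-PMH `ListRecords` request restricted to a single set. `verb`, `metadataPrefix` and `set` are standard OAI-PMH request arguments; the base URL and the set name `cream` are hypothetical placeholders, not the values used by DAREnet or by any DARE partner.

```python
from urllib.parse import urlencode


def list_records_url(base_url: str, set_spec: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request limited to a single set."""
    query = urlencode({
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "set": set_spec,
    })
    return f"{base_url}?{query}"


# Hypothetical repository endpoint and set name, for illustration only.
url = list_records_url("https://repository.example.org/oai", "cream")
print(url)
```

A harvester would fetch this URL and, if the response carries a `resumptionToken`, keep requesting further batches until the list is complete. A repository without set support cannot answer such a selective request, which is why a set mechanism had to be added.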
Web site
Cream of Science [4] is a part of DAREnet [5]. DAREnet was launched in January 2004. Initially its purpose was to demonstrate the network of the local collections of digital documentation held by all the Dutch universities and several related institutions, presenting them to the user in a consistent form. This also makes it possible to search one or more of the repositories concerned. DAREnet is unique. No other nation in the world offers such easy access to its academic research output in digital form. In the space of a year, DAREnet now serves not only to demonstrate the network but also the usefulness of repositories, permanent storage and open access.
The 2004 version of DAREnet was a demonstrator, not a production platform. With Cream, this needed to change and performance, stability and functionality needed to be improved. DARE programme management decided to start a pilot with the SURFnet Search Engine (which uses the FAST software, also used by Scirus and others) for DAREnet. Later, in January, the pilot having proved itself successful, the new version of DAREnet was implemented, though this parallel activity did occasion additional pressure on the Cream Project.
DAREnet harvests all digitally available material from the local repositories, making it searchable. But it limits the harvest to those objects that are full content available to everyone. Toll-gated objects (e.g. publications held by publishers who only provide access through expensive licences) are not harvested by DAREnet and can only be found in the local repository. This means that the total content of all repositories in the Netherlands inevitably exceeds that of the content that can be found in DAREnet. However, DAREnet guarantees free and open access to all its content for everyone, with no restrictions.
Cream of Science is an exception to this rule of open access. In order to attract maximum interest to Cream of Science it was decided to give access to the complete overview of publications of the 207 selected scientists, therefore including copyright-restricted publications. Due to these copyright restrictions, about 60% (approx. 25,000 items) is full-content available. Of the other 40% only metadata is available.
For all Cream scientists, a personal page was set up containing basic information: photo, affiliation, research field(s) and specialism(s), awards and, if available, there is a link to their personal Web site. Added value is provided by the link to the most recent list of publications available through the repositories.
We have also provided a Project Chronology for readers interested in following the history of the project.
Lessons Learned
Evaluation: The Library’s View
We asked the libraries for their views on the results of Cream. Some general observations can be made although it must be stressed that local conditions vary greatly. One thing all DARE partners experienced was a ‘post-Cream anti-climax’. People had worked really hard to realize the project’s aims and some time to unwind was badly needed. Libraries reported the following benefits from Cream:
- For some, funding has been granted for follow-up projects on Cream to improve quality, add new content and add new academics to the local repository
- For some, temporary or longer term manpower has been added to the library’s workforce
- Great, some or no enthusiasm for DARE and Cream have all been attested by local academics
- Criticism of metadata, scan quality and DAREnet functionalities (too limited and sloppy; limited full-text indexing)
- Increased awareness of DARE, repositories, open access
- Improved relations with faculties
- Improved relations with academics
- Repositories are here to stay!
- Workflow and infrastructure still need a lot of tuning and optimisation
- Exposure (national): all institutional media paid attention to the Cream Project and its results
- Exposure (international): Google Scholar has indexed DAREnet, including Cream.
Because of the pioneering work that needed to be done, many people had to stretch their imagination, their skills, their patience and their commitment to collaborate and improvise. This was not an easy task, for anyone. Despite the time pressures, it must be said that Cream generated a lot of enthusiasm and dedication. It was fascinating to see people grow in terms of their personal development and skills. For them, this was not just a lesson learned but an important experience in their professional lives.
Problems and solutions
The work done by the DARE libraries was pioneering in many ways. Below, we offer the reader a list of the most important problems encountered and the solutions that we have chosen.
Problem | Solution |
Organisational problems | |
Local | |
Selection of Cream scientists | Local decision by library, faculties or university management board |
No manpower | Local: engage temporary manpower; Central: additional funding and creation of support DARE teams |
Central | |
No hard data for planning of necessary capacity | Using best-guess estimates and adjusting during the process |
High-volume contract but low-volume use of support | Use the money for higher quality (greyscale scan instead of bi-tonal), or more support from Strata to solve errors in metadata or downloads into the repository |
Workflow integration of metadata, scan, inter-library loan | No effective solution was found. Daily trouble-shooting was used to remedy this |
Incorrect ideas about copyright | Analysis of author - publisher contracts; Using the special SURF Web site on copyright |
Not 100% Open Access and digital availability | Accepting what is; Asking some publishers about their willingness to co-operate and to give special permission for Cream-publications; Experimenting with tweaking [6] of PDFs |
Technical problems and solutions | |
Author name variations | Use of a standard list during metadata input and a formatted list in DAREnet [7] |
Journal title abbreviations and variations | Use of a standard list during metadata input |
Crash after launch due to unforeseen high number of hits | Adding additional server(s) |
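The ‘standard list’ approach to author name variations can be illustrated with a small lookup. This is only a sketch under assumptions: the names, the variant spellings and the normalisation rule are invented for illustration and do not reflect the lists actually used in Cream.

```python
import unicodedata
from typing import Optional

# Hypothetical standard list: normalised variant -> preferred form.
STANDARD_NAMES = {
    "jansen j": "Jansen, J.",
    "jansen jan": "Jansen, J.",
    "j jansen": "Jansen, J.",
}


def normalise(name: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in unaccented)
    return " ".join(cleaned.lower().split())


def preferred_form(name: str) -> Optional[str]:
    """Return the preferred form, or None if the variant is not on the list."""
    return STANDARD_NAMES.get(normalise(name))


print(preferred_form("J. Jansen"))
print(preferred_form("Pietersen, P."))
```

Variants that are not on the list return `None` and would need manual review, which is one reason a list-based fix is only a temporary solution.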
Costs
The SURF Foundation is co-ordinator and co-funder of the DARE Programme. Therefore, SURF was also financially involved. The original budget for costs for Cream on the SURF-side of the project was Euro 100,000. Later, this budget was raised to Euro 200,000 because of the additional number of scientists. This budget was used to cover the costs of scanning, metadata input by support teams (labour) and inter-library loan. There are no exact figures on the costs incurred by the DARE partners at the local level.
But there are some estimates. The estimated total cost of Cream is Euro 10,000 per scientist. We estimate that Euro 8,000 per scientist was spent by the library and Euro 2,000 by SURF. Of these Euro 2,000, about Euro 1,000 was spent on scanning, metadata input and inter-library loan. All other costs, both at the SURF level and locally, related to staff effort (manpower).
This means that we spent Euro 50 for each Cream publication: Euro 45 on staff effort and Euro 5 on other costs. Compared to the average cost per hardcopy publication in a traditional library process, the costs per Cream publication are certainly not higher, even though Cream processes had to be invented on the spot and were not streamlined and standardised. We expect to be able to reduce the costs to about Euro 10 per publication once the processes and infrastructure have been improved.
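The per-publication figures can be reproduced from the per-scientist estimates and the average number of records per author reported in the facts and figures. This is a minimal worked check, assuming that ‘publication’ here means one record in the Cream collection (the text does not state the divisor explicitly).

```python
# Estimates quoted in the text (Euro per scientist).
total_per_scientist = 10_000
non_staff_per_scientist = 1_000  # scanning, metadata input, inter-library loan

# Average records per author, from the facts-and-figures list.
records_per_author = 40_479 / 207

cost_per_publication = total_per_scientist / records_per_author
other_per_publication = non_staff_per_scientist / records_per_author
staff_per_publication = cost_per_publication - other_per_publication

print(f"total per publication: Euro {cost_per_publication:.0f}")
print(f"staff effort:          Euro {staff_per_publication:.0f}")
print(f"other costs:           Euro {other_per_publication:.0f}")
```

Under this assumption the results come out at roughly Euro 51, 46 and 5, consistent with the rounded Euro 50, 45 and 5 quoted in the text.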
Conclusions
Scientists are willing to deposit their output in repositories when the circumstances are right. Cream resulted in a short-term ‘me-too’ effect and mid-term expansion via new local Cream-like projects. Cream has created much more awareness of repositories in the scientific community, improved relations between scientists and libraries and an important growth in the content of Dutch repositories.
Added value for the scholars involved is the digital availability of as many of their publications as possible, the logistics involved, and the resulting extra exposure.
Cream has demonstrated that repositories are here to stay and that they serve many purposes, one of them being the prevention of publication loss. Although this diverse collection of 207 leading Dutch scholars cannot (yet) be seen as a major source of academic information, the myth that material in institutional repositories is of low quality is now a thing of the past.
The Netherlands are at the forefront of international repository developments and Cream has been an enormous impulse for the open archive community, on a national and international level.
As much as we wanted Cream of Science to be Open Access (being part of DAREnet), offering the scholars their complete publication list (full text available or only metadata) was essential to securing their involvement.
Although challenging in its logistics, a project like Cream of Science can be done. The final product is not perfect at all, but, as in many other examples, the journey into Cream was more important in itself than the resulting Web site.
Average cost per document was Euro 50, which is less than process costs for a hardcopy document in a traditional library setting. Costs can be reduced to Euro 10 per document once processes and infrastructure become more standardised, thereby improving cost-efficiency.
The technical infrastructure needed for a networked national repository system, like that of the Netherlands, needs a lot of improvements and fine-tuning. Existing tools and solutions are still very young and will benefit greatly from being redesigned and/or enhanced.
Cream in itself is not enough to keep the DARE train rolling. New ideas need to be developed to improve awareness among academics and to ‘seduce’ them into loving repositories. The ego factor seems to play an important role, but that’s only on the outside. Libraries really need to go into the faculties, learn to speak the language of the scientific community and co-create the tools that are needed for new, innovative scientific communication.
It is difficult to anticipate the (media) attention such a project will receive. But the more the better. Don’t be too modest. Do anticipate an initial rush on the Web site.
Do’s and Don’ts
- Do involve scientists and reassure them that there are benefits in repositories. Connect with them and talk their language, not yours.
- Do co-ordinate the selection process for scholars carefully, both with executive boards, faculty and with scholars.
- Do use your enthusiasm to keep on track, despite adverse situations. Do allow people to develop, despite potential frustrations.
- Do make a project plan before you start (if you have the time). And discuss the plan with all the people involved to get the right basic data and commitment. But be aware that you will have to improvise. The plan itself is not a guarantee that the project will be successful.
- Do shop around and force suppliers to be competitive. Don’t go for a fixed-price, fixed-volume deal unless you are absolutely sure about your volumes.
- Do communicate a lot (daily) with the people who really do the work and help them as much as you can.
- Do not allow scanning quality to be poor. Demand better quality from inter-library loan suppliers.
- Do accept the fact that repositories and repository software are still very immature and that you will encounter things that do not work yet or not in the way you would want them to work. Be flexible and jot down what needs to be improved later.
- Do celebrate and reward the work by the people who made the end result possible.
- Don’t worry too much about copyright. For material published before or during 1997 it is not an issue; after 1997 it makes no sense to check each individual publication. Most of the post-1997 publications do have copyright restrictions and you have to make up your mind how you deal with this, whether in a liberal (e.g. tweaking) or stricter fashion.
- Don’t use separate suppliers and/or workflow streams for metadata input, scanning and ingest into the repository. Make the workflow as simple as possible.
- Don’t underestimate the workload on library personnel and don’t ignore current (old) library policies. For example, no head of a lending department allows journal volumes to leave the building.
Future Plans
Life after 10 May 2005
After the initial overload situation, traffic on the Web site slowed down, media attention disappeared and life went back to normal, or so it seemed. Libraries turned their attention to things that had been postponed because of the workload for Cream of Science. A new problem arose: what do we do with new material that Cream scientists now start to offer to the library?
Some libraries are writing project plans for follow-up on Cream. For example, Utrecht University has decided to add 15 scientists to their Cream scientists and Tilburg University has decided to spend additional money to expand the local Cream activities.
Improving Infrastructure
Because of the comments made about infrastructure and workflow in respect of Cream, DARE programme management discussed this with the DARE partners in meetings held in May and June 2005. There were two major themes. The first one was to stabilise and improve the infrastructure and workflow. There is still too much that has to be done manually, systems are not integrated seamlessly, input, processing and output still need a lot of attention; there are quality issues and so forth. The DARE partners stated that the last year of the DARE Programme needs to focus on these improvements. As a result, DARE is already drawing up a proposal for optimisation to be implemented in 2006.
The second theme was continuation. All DARE partners agreed that there needs to be a DARE 2 programme. Cream of Science is a catalyst, but more and longer-term effort is needed to realise a platform for improved scientific communication with repositories as a tool for the scientists. We are currently discussing what approach can help us to create a DARE 2 programme.
Acknowledgements
Many thanks to the project leaders involved who have given their input to this article.
References
[1] The DARE programme ‘Digital Academic Repositories’ is a joint initiative of Dutch universities to make all their research results digitally accessible in a standardised way. The programme is co-ordinated by the SURF Foundation http://www.surf.nl/
[2] Sherpa http://www.sherpa.ac.uk/
[3] Set = a term used in OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting) relating to a subset of metadata records in a repository database.
[4] Cream of Science http://www.creamofscience.org/
[5] DAREnet http://www.darenet.nl/nl/
[6] Tweaking: One library decided to experiment with tweaking PDFs. They took postprints, removed the publisher’s information and layout, reformatted the text into a new layout, while keeping the start and end page numbers the same, inserted university information (e.g. logo) and created a new PDF. The time required was too great to continue this strategy on a larger scale during the Cream Project. But the experiment has not been terminated and it is expected that a software tool can be written (or may already be available) to make this a more feasible approach.
[7] Standardisation of author names: The lack of standard forms of Cream author names caused several problems. For a project like Cream these problems can be overcome with temporary solutions. In the long run a more robust approach is necessary. As part of the DARE Programme a DAI (Digital Author Identifier) will be implemented in the first half of 2006. A special project has been started to prepare the DAI infrastructure. The PICA author name thesaurus will be used as a basis.