JISC and SURF International Workshop on Electronic Theses
Doctoral theses contain some of the most current and valuable research produced within universities, but are underused as research resources. Where electronic theses and dissertations (ETDs) are open access, they are used many times more often than paper theses that are available only via inter-library loan. Many universities and other organisations across Europe are now working hard to make ETDs more openly available and useful. In an attempt to co-ordinate this activity, an invitation-only workshop was held at the Vrije Universiteit Amsterdam in January, to see what could be learned from existing examples of best practice and to see how the participants might work together in the future. The workshop was attended by representatives from Belgium, Denmark, Finland, France, Germany, Italy, the Netherlands, Norway, Portugal, Sweden and the UK. A representative from the USA (Johns Hopkins University) also attended to ensure join-up with initiatives there.
National Activities
Prior to the workshop, representatives from the 11 participating European countries submitted a brief description of national ETD activities under a consistent set of headings, and these are summarised here.
1. Are electronic doctoral (PhD) theses being collected digitally and made accessible? In most countries, ETDs are being collected and made available, but in only a minority of countries is this systematic at a national level. For example, in Germany, Die Deutsche Bibliothek collects ETDs from universities, although the arrangement is voluntary pending new legal deposit arrangements. In most countries, the online availability of ETDs is a matter for the university granting the doctorate, so there can be considerable variation between institutions. In Belgium, for example, one university has only just started collecting ETDs, whereas in another it has been mandatory for some years. There are a number of national projects ongoing to develop better availability, such as EThOS [1] in the UK, and 'Promise of Science' in the Netherlands.
2. How many per year? What percentage of the total of ETDs? As you might expect from the answer above, this varies dramatically between, and within, countries. In Sweden over 4,500 ETDs are available (mostly via the DiVA [2] service), in France the figure is around 6,000 (via the TEL [3] and Cyberdocs [4] projects), in the Netherlands around 6,600 ETDs are accessible via DAREnet [5] and in Germany the figure is 40,000 via DissOnline [6]. In Finland, about 60% of the 1,430 doctorates awarded in 2005 resulted in an ETD (the equivalent percentage in Germany for 2004 was around 30%, and for France in 2005 it was 10-20%), whereas in the UK there are perhaps only a few hundred ETDs available.
3. Is anyone identifying and resolving legal (eg copyright) or plagiarism issues? Responsibility for warranting that ETDs can legally be shared is generally devolved to the first point in the chain, either the library or the author. Some good practice exists: in Finland, for example, there is a contract for publishing the thesis in the university archive, which mentions the requirement on the author to clear third-party rights. Plagiarism issues are rarely addressed, the notable exception being in Sweden.
4. Is anyone preserving ETDs? At a basic level, libraries / institutions holding ETDs have disk backup and similar procedures but, where long-term preservation is addressed, this is mostly by the national library, either using dedicated systems such as DIAS [7] in Germany or ENCompass [8] in Finland, or by Web archiving techniques, as in Norway. Many aspects of best practice are in place in Sweden, as a result of the SVEP [9] and DiVA [2] projects implementing elements of the OAIS (Open Archival Information System) model at both local and national levels. In the Netherlands e-theses form an integral part of the DARE [5] network of Institutional Repositories. They are therefore included in the agreement with the Royal Library of the Netherlands for preservation of the publications in their e-Depot [10].
5. Is anyone linking ETDs with related material on which they are based (including data, statistics, multimedia, etc)? The overwhelming impression is that, while practice in this area is patchy or non-existent at present, most countries expect it to assume much greater importance in future. Where repositories link ETDs to supporting files, notably in Germany, this is usually by wrapping those files into a single package, and linking from the primary ETD file. A project in Sweden, in collaboration with Johns Hopkins University, intends to build an infrastructure that will support persistent links between textual (eg ETD) repositories and data archives.
6. How are countries implementing syntactic interoperability (eg, simple / advanced cross-search, use of OAI-PMH harvesting protocol)? National-level interoperability obviously depends on where the ETDs are held. Where they are held centrally, typically by the national library, then discovery services are available via the online catalogue, services based on interoperability standards such as the OAI-PMH (Open Archives Initiative-Protocol for Metadata Harvesting), Z39.50 and SRU (Search/Retrieve via URL), full text indexing by Google, and so on. Where they are held locally, the situation is more patchy, and there is a need for a common metadata profile to support the harvesting model. Such profiles are developed in the UK, Germany, Finland and Sweden.
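As a rough illustration of the harvesting model mentioned above, the following Python sketch builds an OAI-PMH `ListRecords` request and pulls Dublin Core titles out of a response. The verb, the `metadataPrefix` and `set` parameters, and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the repository address, the set name `theses`, and the sample record are hypothetical and do not describe any of the services named in this report.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of unqualified Dublin Core, as carried in oai_dc records.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # A set restricts the harvest to part of the repository,
        # for example a collection containing only theses.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return every dc:title found in an OAI-PMH response document."""
    root = ET.fromstring(response_xml)
    return [title.text for title in root.iterfind(".//dc:title", NS)]


# Hypothetical response, trimmed to a single record for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example doctoral thesis</dc:title>
          <dc:type>Doctoral thesis</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # Hypothetical base URL; a real harvester would fetch this address.
    print(list_records_url("https://repository.example.org/oai", set_spec="theses"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to follow the `resumptionToken` that repositories return for large result sets. The sketch also shows why a common metadata profile matters: a harvester can only aggregate fields such as `dc:type` usefully if repositories fill them in consistently.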
7. How are countries implementing semantic interoperability (eg, access via disciplines / subjects, multilingual access)? Where the national library takes a leading role, they have often agreed on a single classification scheme, notably Germany (Dewey Decimal Classification (DDC)), Portugal (Universal Decimal Classification), and Sweden (a national scheme). Elsewhere, subject classification is much less likely, and variable where implemented locally.
Multilingual access seems to mean mostly the native language plus English, with abstracts and interfaces often available in both. The use of English for the thesis itself seems to be growing, in Portugal, Finland and The Netherlands for example.
The DiVA Project [2] is worth a special mention as a rare example addressing semantic markup of the ETD document itself, using an XML schema. Similar work has taken place in France in the Cybertheses Project [11].
8. What are the business models (financial sustainability: who pays)? Where the ETD service is run by the national library, this may be as a result of theses being included in the scope of national legal deposit arrangements (Germany, Norway, Portugal), although such arrangements do not always provide the business model for a comprehensive thesis service (Italy). Otherwise, the responsibility and business model may be owned by the university (Finland, Norway, Belgium, The Netherlands, France) or shared co-operatively between the university library and the national library (Sweden).
9. What are the organisational roles and responsibilities (who does what)? The business models are a reflection of the organisational roles. These roles, and the associated workflows, depend on the national legal deposit arrangements, on academic custom and practice, and on whether or not the thesis is commercially published. There are roles for the author, the faculty, the university library, the university administration and, often, the national library.
10. Who manages legal issues (copyright/licences, liability, etc)? Again, this varies widely. Some theses are collections of published articles and some are research reports, though with individual chapters, perhaps, being the basis for published articles either before or after the thesis is submitted and made available. Where theses are research reports, copyright and responsibility for checking for third-party copyright material typically rest with each author, though questions may be raised about these checks. Where theses are collections of published articles, university libraries sometimes attempt to negotiate with publishers to make them available.
Thematic Discussions
After an opening keynote by Susan Copeland (UK EThOS Project [1] and NDLTD [12] Board member), much of the workshop consisted of breakout groups, discussing three overlapping themes that covered many of the issues noted above:
- Interoperability (syntactic and semantic)
- Enriching ETDs (links to datasets and multimedia, preservation)
- Management issues (legal issues, business models)
The aim of the discussions was to identify areas of work where either co-ordination or development would be valuable at a European level. The discussions were informed by a presentation on Framework 7 by Chris Reilly from the UK Office of Science and Technology, who noted four criteria that might be used to assess such value, namely:
- European added value - will the activity add value that would not be realised if undertaken by member states individually?
- Additionality - is the work itself something that would not be undertaken anyway by member states?
- Political and economic will - can the activity demonstrate economic benefit?
- Capacity - can the sector absorb and use funds effectively?
Interoperability Discussion Group
Informed by a briefing paper by Traugott Koch (UKOLN [13]), and after agreement that the problem addressed by interoperability is discovery and access, much of the discussion centred on the value and practicalities of subject classification and providing multilingual access. The group agreed that we don't know enough about why and how people might access ETDs, and that further work would be useful on these questions. An animated discussion of the value of multilingual access ended with agreement that a mixed approach would be necessary, to cater for cases where English was acceptable as a common language, and where this was not the case. While DDC is translated into many languages, is this to a sufficient level of granularity to be useful when applied to the very highly specific work in ETDs? The group agreed that access based on DDC might need to be supplemented by using newer techniques that exploit full text indexing, machine learning, etc. Finally, though importantly, the group saw a real need for interoperable rights information, both in terms of resource-specific expression (that is, licences), and in terms of repository-specific granting of certain rights to certain users (that is, policies).
Enriching (Links to Data/Multimedia, and Preservation) Discussion Group
This discussion was informed by a briefing paper by Sayeed Choudhury (Johns Hopkins University) and Eva Müller (Uppsala University, Sweden). Given the sheer scale and complexity of data and multimedia material that may support ETDs, the group agreed that dedicated repositories would be needed for it. Therefore, there would be a need both for a persistent linking infrastructure between ETDs (perhaps in institutional repositories) and supporting material, and for a means of deciding on the boundary conditions between the two. However, would the supporting material, datasets for example, change over time, or would they have to be fixed? This issue illustrates the dual nature of theses, both as an examination (and hence referencing a fixed point in time), and as a research output (and therefore, perhaps, evolving). Other possibilities for enriching or adding value to ETDs exist, such as plagiarism detection and print-on-demand.
Management Issues Discussion Group
Wilma Mossink (SURF [14]) introduced her briefing paper on this topic. In terms of legal issues, there were two possibilities. Where the author retains copyright in a thesis, there needs to be greater clarity over what rights she should grant to a repository that makes the thesis available. Perhaps a European model licence or toolkit could be useful. Where the thesis includes published material, access depends on the terms and conditions of the agreement between the author and the publisher. Is it possible that such terms and conditions can be documented in the same way as is done for journals by the Sherpa/RoMEO [15] database? On more general management issues, the group discussed the need for thesis workflows to be determined locally, but for some co-ordination of these workflows at a national or (given the Bologna process [16]) a European level. Some sharing of best practice could be valuable at this stage. However, for those seeking to make ETDs open access, cultural change is likely to be at least as big a challenge as policies and formally agreed workflows. Like the interoperability group, this group felt that we need to know more about the users of ETD services, both depositors and those wanting to use ETDs.
General Discussion
While the thematic discussion groups had identified plenty of areas where work at a European level could really make a difference, there was one question that haunted the event from this point, which was whether or not any of the issues discussed were specific enough to ETDs (as compared with research outputs more generally) to warrant a specific programme of work. The workshop heard from two European projects, one (DART Europe [17]) that is founded on the belief that there are sufficient ETD issues to warrant European action, and one (DRIVER) that is building a generic repository infrastructure into which ETDs might fit. The remainder of the meeting was centred on this question, and good arguments were put for both sides of the debate.
Those arguing for a specific programme of work for ETDs noted that such a programme would address a clear, focused area, with a number of aspects that were particular to it. For example, the workflows associated with ETDs are not shared by other research outputs, and the copyright situation is often, though not always, different. As openly available, and highly detailed, records of research, ETDs do have value as resources for other researchers in many disciplines. They may also be a useful resource for social network services based on, for example, standardised citation listings. Furthermore, the Bologna process [16] of harmonisation within tertiary education may well bring into existence several reasons for making ETDs available, such as the need for metrics relating to the process, the need to compare doctoral (Ph.D.) theses, and the need to harmonise the extent to which doctoral students reflect on all aspects of research reporting (including the permissions they grant) as an integral aspect of their trade.
Those arguing that a dedicated ETD programme of work should not take place at a European level noted that there is no compelling reason to treat either ETDs (as a research output) or Europe (as a place) as separate. ETDs form but one element in an international research scene. Some argued that, in fact, theses are not useful research resources, being inferior in a number of ways to published papers. Finally, if a proposal were developed for European funding, this would need to be for a substantial amount of money to justify the considerable investment that would be necessary in putting the proposal together. There is no clear case for this kind of substantial funding, although there may be one for less ambitious co-ordination work, including with initiatives elsewhere in the world, particularly the USA.
The meeting did not reach consensus on this issue. It may be that ETDs and services based around them are just one set of bricks in a larger wall of research services. If so, then is the right approach to build these bricks, in co-operation with others working on nearby bricks, or to work on a whole region of the wall and have ETDs as just a single workpackage within that effort? To an extent, the answer will depend on the funding instruments available to us.
We agreed to nominate a small task group to develop a workplan for European co-ordination on ETDs. This workplan will describe both proposed activities and their rationale, bearing in mind Chris Reilly's four criteria, noted above. It will be circulated to those who attended the workshop, and to anyone else who is interested. There may be an email list, a shared Web area, and a follow-up meeting during 2006. Those interested in being involved should contact either Neil Jacobs or Gerard van Westrienen.
References
- EThOS http://www.ethos.ac.uk/
- DiVA (Digitala Vetenskapliga Arkivet) http://www.diva-portal.org/
- TEL (The European Library) http://www.theeuropeanlibrary.org/
- Cyberdocs http://sourcesup.cru.fr/cybertheses/
- DAREnet http://www.darenet.nl/en/page/language.view/home
- DissOnline http://www.dissonline.de/
- DIAS (Digital Information Archiving System) http://www-5.ibm.com/nl/dias/
- ENCompass http://www.endinfosys.com/
- SVEP (Samordning av den Svenska Högskolans Elektroniska Publicering) http://www.svep-projekt.se/english/
- e-Depot http://www.kb.nl/dnp/e-depot/e-depot-en.html
- Cybertheses http://mirror-fr.cybertheses.org/
- NDLTD (Networked Digital Library of Theses and Dissertations) http://www.ndltd.org/
- UKOLN http://www.ukoln.ac.uk/
- SURF http://www.surf.nl/en/home/index.php
- Sherpa/RoMEO database http://www.sherpa.ac.uk/romeo.php
- Bologna Process http://www.dfes.gov.uk/bologna/
- DART Europe http://www.dartington.ac.uk/dart/