Book Review: Managing Research Data

Sally Rumsey

Sally Rumsey reviews a book which describes and explains the topics of interest central to practitioners involved with research data management.

Higher Education institutions (HEIs) in the UK are planning and implementing infrastructure and services to manage research data more urgently than they did for research publications. One policy framework sent to UK vice-chancellors from a major UK funding body (EPSRC), which set out clear expectations of responsibilities for data management at institutions within a given timetable, appears to have been the spark that prompted research data management (RDM) to be taken up by the upper echelons of management, and concrete activities set in place to start addressing the problem.

Setting Out the Context

RDM provision will incur significant extra cost to institutions at a time of shrinking budgets. Despite this, institutions are taking the matter very seriously. This is because there are a number of critical factors that have forced institutions to consider RDM provision a core issue. They include the emergence and subsequent policing of research funders’ requirements, the fact that there are few existing institutional safe archival stores for research data, the realisation that researchers do not generally have easy access to other researchers’ data in the same way that many of them do to journal articles, and the fact that key journals are increasingly demanding a reliable citation to research data underpinning articles. Moreover, there have been a couple of recent high-profile negative data stories in the national press, for example, the so-called ‘Climategate’ incident, and the tree ring data collected by an academic at Queen’s University Belfast and sought under a Freedom of Information request [1][2]. Also, as Procter et al. state in chapter 7 of Managing Research Data, institutions are aware of the risks to their reputation if they do not in future ‘provide a fit-for-purpose research data management service.’ It is accepted that there are possible ways of using research funding to pay for RDM services, but this does not resolve the entire sustainability problem. Basically, RDM is something most institutions recognise they must do, even though they are wrestling with how they will fund provision in the long term.

A Practical Guide

Managing Research Data opens with information about the significant financial investment made in the UK to produce research data. It goes on to describe the data landscape that faces those involved in managing research data, particularly information professionals. The book serves as an excellent practical primer and guide to the world of RDM, especially for those coming fresh to this topic. In addition, existing information professionals who find themselves working in the area of RDM would benefit from consulting the book.

Chapter 1 would make a worthy item on library and information courses’ ‘required reading’ lists. It sets out the context clearly, complete with the dilemmas, difficulties and complexity that anyone involved with research data has to tackle. It is written with a sensitive understanding of the academic community and its viewpoints, for example a general distaste for high-level decrees, and how data curation can often work best as a collaborative effort between discipline experts and data curators. It is accepted (in chapter 7) that a combination of provision of RDM services within HEIs with data management mandates imposed by other bodies is not enough to change research practice overnight.

Each chapter of the book covers a broad topic relevant to research data management, such as policies, roles and responsibilities, and data management planning; the editor and authors should be commended for creating a volume that does this in a way that is clear, concise and of real practical value. The complexity of the landscape is acknowledged, including the impact of dealing with sensitive data.

It is inevitable that a book such as this provides a general, largely theoretical view. What those involved with RDM face daily are messier, real-life situations. For example, the expectation is that large amounts of data are influenced by policies and requirements for data management plans imposed by funders. However, the reality is that some institutions are faced with data produced by research that is unfunded and therefore not governed by external policies, or that is not, for whatever reason, able to be deposited in a specialist national data centre. Institutions need to work out what to do in these instances: what is to be retained, how to manage it, and how to pay for its curation.

Sarah Higgins describes the data management lifecycle, as conceptualised by the DCC (Digital Curation Centre) [3]. The discussions of the separate stages of the lifecycle could each stand perfectly well as independent briefings for those wanting a short overview.

The chapter describing developments in Australia and the US provides a contrast to the overall UK perspective of the book. It gives a useful overview of different models for broaching the problem of data management. I always admire the Australians and their propensity to think big, roll up their sleeves and act. There is a lot we in the UK can learn from what has been achieved and is planned on the other side of the world. The section on the US was interesting too, although it wasn’t clear how, if at all, DataOne relates to Data Conservancy, nor was it clear (as it was with the Australian model) how the US developments are being funded and how they therefore might continue in the long term.

Who Should Read This Book?

In the preface, the editor, Graham Pryor, explains that ‘initially, the aim of this book was to introduce and familiarise the library and information professional with the principal elements of research data management.’ He goes on to say that he believes it will serve a wider audience. I definitely agree, although I doubt that one group, active researchers who produce data, will read the book themselves (although I’d be delighted to be proved wrong).

Each chapter of the book provides a succinct and clear overview of key areas involved in RDM. Most notable are the chapters on policies, sustainability, emerging infrastructure and data management planning.

One topic I would like to have seen expanded is that of legal matters, rights, licensing and data ownership. These are areas that, as Angus Whyte says, ‘represent the most significant barrier to sharing data.’ This topic would therefore merit longer discourse within the book. Legal matters are key to the management and reuse of data, yet they are an area where there is a certain lack of knowledge. Practitioners implementing RDM infrastructures need to ensure adequate provision of information and guidance.

As Brian Lavoie points out in his chapter, researchers have generally lacked incentives to store, manage, curate and share their data. This is a key point. The situation does appear to be changing, and reliable data citation is becoming more important. The DataCite service [4] and the use of DOIs (Digital Object Identifiers) [5] are becoming de facto standards for identifying and referencing data. I believe that this is going to become more accepted, and researchers will soon expect datasets to be published and cited using persistent identifiers and links.

I have a bit of a gripe about the tables and diagrams in the book. Table 3.1, giving details of research funders’ data policies, is not laid out in a way that makes it easy to compare policy details. The shading in figure 9.3 showing the OAIS and Data Conservancy mapping is not clear, nor is the explanation of the diagram in the text. The reader is referred to nodes depicted as triangles and dots in the diagram demonstrating the conceptual overview of DataOne (Fig. 9.4), but they are too small to be immediately noticeable or useful.

Conclusion

This is an excellent book for anyone, not just information professionals, looking to ‘introduce and familiarize’ (Pryor, in the preface) themselves with a complex and challenging, yet increasingly important topic. The book benefits from a prestigious line-up of knowledgeable authors, including those who are actually ‘doing’ research and research data management. As an edited volume it fits well together as a single entity even though written by a number of individuals: chapters reference other chapters and the reader is not left with a sense of a ‘cobbled-together’ mix of disparate topics from different people. The content can be dipped into just as well as read from cover to cover.

There is always a danger with this type of book that the environment will have moved on since writing and publication, and indeed it has, with a fresh batch of JISC RDM [6] projects underway and significant reports being published. There are also emerging social network services such as collaboration tools like Colwiz [7] and Mendeley [8], sharing tools like Figshare [9], as well as blogs and wikis that are being increasingly used by researchers and which will have an impact on the research data management environment. However, I expect this book will remain a valuable resource for those working or intending to work in the field for some while yet.

List of Chapters

1. Why manage research data? - Graham Pryor
2. The lifecycle of data management - Sarah Higgins
3. Research data policies: principles, requirements and trends - Sarah Jones
4. Sustainable research data - Brian F. Lavoie
5. Data management plans and planning - Martin Donnelly
6. Roles and responsibilities – libraries, librarians and data - Sheila Corrall
7. Research data management: opportunities and challenges for HEIs - Rob Procter, Peter Halfpenny and Alex Voss
8. The national data centres - Ellen Collins
9. Contrasting national research data strategies: Australia and the USA - Andrew Treloar, William Michener and G Sayeed Choudhury
10. Emerging infrastructure and services for research data management and curation in the UK and Europe - Angus Whyte

References

1. BBC News, 'Show Your Working': What 'ClimateGate' means. 1 December 2009 http://news.bbc.co.uk/1/hi/8388485.stm
2. BBC News, University told to hand over tree ring data. 19 April 2010 http://news.bbc.co.uk/1/hi/northern_ireland/8623417.stm
3. DCC (Digital Curation Centre) http://www.dcc.ac.uk/
4. DataCite http://datacite.org/
5. DOI (Digital Object Identifiers) system http://www.doi.org/
6. JISC RDM http://www.jisc.ac.uk/whatwedo/programmes/mrd.aspx
7. Colwiz, Collective Wisdom http://www.colwiz.com/
8. Mendeley http://www.mendeley.com/
9. Figshare http://figshare.com/

Author Details

Sally Rumsey
The Bodleian Libraries
University of Oxford

Email: sally.rumsey@bodleian.ox.ac.uk
Web site: http://www.bodleian.ox.ac.uk/

Sally Rumsey is Digital Collections Development Manager at the Bodleian Libraries, University of Oxford. She manages the Oxford University Research Archive (ORA) publications repository, and is involved in developing and implementing services to support Oxford’s emerging research data management infrastructure.