Category Archives: Research Data Management

ORDO monthly online drop-ins

Did you know that on the first Thursday of every month, between 14.00 and 15.00, we run an online drop-in for ORDO, our research data repository?

We’re here to help, whether you’re interested in using ORDO but not sure where to start, or you’ve been using it for a while and have questions about how to make the most of it.

To join, go to our Adobe Connect “Research Support” page and click on “join room” (and if you find the link takes you to the “DISS Home” page instead, click on “Resources” at the top and scroll down to “Research Support”).

Dates for the next few months:

  • Thursday 1st August 14.00-15.00
  • Thursday 5th September 14.00-15.00
  • Thursday 3rd October 14.00-15.00

Hope to see you there!

Research data sharing: ensuring greater research integrity?

You may have read in the news recently about a scandal concerning the doctoring of research data in a lab run by a top UK academic. Earlier this month UCL released details of the inquiries into misconduct, which were undertaken in 2014 and 2015. Of the 60 papers reviewed, the panels found evidence of misconduct in 15. This included “cloning”, in which features were copied and pasted throughout an image, and some of the data fabrications were reportedly fundamental to the conclusions reached by the authors.

This news story struck me as a prime example of why data sharing is so important for research integrity. If the data underpinning the papers in question had been made publicly available in a trusted research data repository, it seems unlikely that misconduct on this scale would have happened. Data sharing should encourage greater transparency of results, making researchers less likely to falsify research findings or fabricate data, and meaning that if they do, this sort of misconduct could be spotted much more quickly. Would a culture of data sharing also have instilled in researchers a sense of responsibility to “do the right thing” rather than cut corners?

Sharing research data can seem like an onerous task; however, if a possible outcome of data sharing is greater research integrity, then it needs to be recognised as an important part of every researcher’s work.

ORDO best practice #2 Archiving a website

Continuing my series on best practice in ORDO, this time I’m going to trumpet The Robert Minter Collection: https://doi.org/10.21954/ou.rd.7258499.v1 which was deposited by Trevor Herbert in December 2018. According to the ORDO record:

This is a copy of the data underlying the website ‘The Robert Minter Collection: A Handlist of Seventeenth- and Eighteenth-Century Trumpet Repertory’ which contained a database of music collected by Robert L. Minter (1949-81).

Minter’s interest was in the collection of sources that contribute to our understanding of the trumpet at various points in its history before the twentieth century.

This is regarded as one of the world’s largest fully catalogued datasets about early trumpet repertoire.

The website in question was created in 2008 and is no longer active; however, it had been archived by the Internet Archive, most recently in May 2017. In 2018, Trevor approached the Library for help archiving the data contained on the website because he was aware that, although the Internet Archive had maintained much of the information, not all of the functionality and content had been preserved; most crucially, the database itself is no longer searchable.

ORDO was deemed a good fit for creating an archive of the website’s content. It accepts any file type and allows many of these to be viewed in the browser, so it is not always necessary to download documents in order to view them. By depositing the material in ORDO, Trevor also obtained a DOI (Digital Object Identifier): a persistent, reliable link to the record which will be maintained even if the materials are no longer available for any reason. Any materials added to ORDO are guaranteed to be maintained for a minimum of ten years.

Within the record there are four files: an Access database, a CSV copy of the data, a zip file containing information about the collection, database and website, and a list of the files in the zip file. The description in the record makes it clear to any potential users what they are accessing and how it can be used. Since it was deposited in December, the collection has been viewed 139 times and downloaded 18 times. Now that deserves a fanfare!
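A list of the files inside a zip archive, like the one in this record, lets users see what the archive holds without downloading it. If you would like to do the same for your own deposit, here is a minimal sketch in Python using only the standard-library `zipfile` module. We don’t know how the list in this record was produced; the function name `write_file_list` and the tab-separated name/size layout are our own illustrative choices, not an ORDO or Figshare requirement.

```python
import zipfile
from pathlib import Path


def write_file_list(zip_path, list_path):
    """Write a plain-text list of the files in zip_path to list_path.

    Each line gives a file's path inside the archive and its uncompressed
    size. Returns the listed names so the caller can check what was recorded.
    (Illustrative helper, not part of any ORDO tooling.)
    """
    with zipfile.ZipFile(zip_path) as archive:
        # Skip folder entries so only actual files are listed
        members = [info for info in archive.infolist() if not info.is_dir()]
    lines = [f"{info.filename}\t{info.file_size} bytes" for info in members]
    Path(list_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return [info.filename for info in members]
```

For example, `write_file_list("documentation.zip", "file_list.txt")` (both file names hypothetical) would write one line per file in the archive, in the order the files are stored, which you could then upload alongside the zip file.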

Call for Data Champions!

The Library is launching a new Data Champions programme, and we are looking for PGR students and staff who are interested in taking part.

What would this involve?

Data Champions are expected to:

  • Lead by example – make data open (via ORDO or other data repositories); share best practice through case studies and blog posts, and share Data Management Plans on the Library Research Support website 
  • Promote OU Research Data Management (RDM) services and tools within your unit
  • Provide discipline-specific data management advice and support to colleagues
  • Attend and contribute to Library-run events
  • Contribute to The Orb, the Open Research Blog
  • Offer feedback to Library Services to support RDM service development

What’s in it for me?

Data Champions will benefit from the following: 

  • Boost your CV – increase funding opportunities by having RDM “expert” status
  • Increase visibility – dedicated profile on the Data Champions webpage, opportunity to contribute to the successful Open Research Blog 
  • Opportunity to network with colleagues from across the OU 
  • Be instrumental in developing the OU Research Data Management Service and improving the culture of data sharing at the OU 
  • Receive 100 GB of data storage on ORDO by default
  • Attendance at the annual Figshare Fest conference in London for one Data Champion per year

Do I need to be a data expert?

No – we’re looking for a range of people from different disciplines who work in different ways with different types of data. You could be a research student, early career researcher, professor, member of research support staff or an IT specialist. You might have experience compiling surveys, collecting lab-based data, harvesting big data or creating video data. Whoever you are and whatever your area of interest, we’d love to hear from you.

Don’t worry if you don’t consider yourself a data expert: your knowledge of your specific area is invaluable, and training and support will be provided.

What’s the time commitment?

We expect the Data Champion role to require a commitment of 1-3 hours a month, but this can be flexible according to the amount of time you are able to give.

How do I apply?

Send an email to library-research-support@open.ac.uk by 31st July with the subject “Data Champions”, stating what type of research you are involved with and whether there’s any particular contribution you’d like to make.

When do I start?

We are going to launch the programme with a Data Champions Forum in September. This will be an opportunity to meet the other Data Champions, find out more and help shape the Data Champions programme.

 

ORDO best practice #1 Documenting data

Over the coming months I’m going to focus on some examples of best practice on ORDO. The creators of all the items in this series will receive a reusable Figshare coffee cup as a way of saying thanks and congratulations.

The first items I’m going to focus on are the OpenMARS Database datasets (https://doi.org/10.21954/ou.rd.c.4278950.v1), deposited by James Holmes (STEM) earlier this year. From the data record:

“The Open access to Mars Assimilated Remote Soundings (OpenMARS) database is a reanalysis product combining past spacecraft observations with a state-of-the-art Mars Global Circulation Model (GCM). The OpenMARS product is a global surface/atmosphere reference database of key variables for multiple Mars years.”

Since their deposit in February, these datasets have been downloaded a total of 291 times, making them some of the most popular items on ORDO. This is a fine reward for all the hard work that went into preparing them for sharing.

What’s so good about them?

There are four datasets, which are published individually and also grouped together as a collection. The most impressive thing about them is the excellent documentation that accompanies them:

  • On the landing page for each dataset is a description, which clearly details the provenance of the dataset and information about the OpenMARS project
  • Each dataset has a PDF reference manual that can be read in the browser. The datasets are large (~25 GB each) and use a file format (.nc, i.e. NetCDF) that requires specialist software and does not display in the browser, so the manual lets users decide whether the data is useful before downloading it
  • The documentation within the reference manual is very detailed and covers how to access the data (using a sample Python script included in the dataset), the structure of the dataset, provenance and quality assurance
  • The datasets clearly reference the funding body – the European Union’s Horizon 2020 Research and Innovation programme

Is it FAIR?

The gold standard for research data is that it should be FAIR – Findable, Accessible, Interoperable and Re-usable. These datasets fulfil all but one of the criteria in Sarah Jones and Marjan Grootveld’s FAIR data checklist (original version at https://doi.org/10.5281/zenodo.1065991). They fall short only in that the data are not in a widely available format, but given the nature of the data this would be very difficult to achieve, and the very accessible reference manuals go a long way towards addressing the issue. See the completed checklist.

And finally, a word from James…

‘Adding datasets produced by our team at the Open University that will be of interest to multiple different users was really simple to do using the ORDO system, and the team that manage it were very helpful if I had any questions during the process. Thanks!’

 

Upcoming training from the Research Support team

We’ll be delivering some training over the next few months on a range of topics, including using ORO, how to claim your research publications, managing and sharing research data, and academic profiles.

Something there for everyone, we hope!

All sessions will be recorded, so if you can’t make it along in person or online at the time, you can catch up later at your leisure (using the ‘View previous recordings’ link at the top of our Adobe Connect page).

  • Writing successful data management plans. Tuesday 22nd Jan, 14:00-14:30 (Online) Sign up at My Learning Centre
  • Working with research data. Wednesday 30th Jan, 11:30-12:00 (Online) Sign up at My Learning Centre
  • Data sharing: how, what and why? Monday 11th Feb, 14:00-14:30 (Online) Sign up at My Learning Centre
  • Data sharing: legal and ethical issues. Tuesday 19th Feb, 11:30-12:00 (Online) Sign up at My Learning Centre
  • Open Research Online (ORO) – An introductory session. Monday 11th Feb, 15:00-16:00 (face-2-face and online) Sign up at My Learning Centre
  • Open Research Online (ORO) – An advanced session. Wednesday 27th Feb, 11:00-12:00 (face-2-face and online) Sign up at My Learning Centre
  • Academic social networking/author profile systems. Wednesday 13th March, 10.30-11.30 (face-2-face and online) Sign up at Graduate School Network
  • Claiming your research publications: ORCIDs at the OU. Wednesday 20th March, 10:30-12:00 (face-2-face and online) Sign up at My Learning Centre

If you have any questions, please get in touch at library-research-support@open.ac.uk

Data Conversation – talking with researchers about open data

A couple of weeks ago we held an informal event for researchers to share their experiences and knowledge of working with research data.

The idea was to hear from researchers about how they work and what’s important to them, away from the (valuable but not always so exciting) talk about complying with funder policies and writing data management plans. We hoped this would start some conversations and potentially help build a community around research data management at the OU.

If that sounds familiar, it could be because it’s something Lancaster University have been doing very successfully for a while. The suggestion to plan a similar event at the OU came from talking with our friends at Figshare (the platform our research data repository, ORDO, runs on), in particular Megan, who also gave us lots of help before and during the event. So, with thanks, we pinched Lancaster’s idea and even the name ‘Data Conversation’.

We had a theme of ‘open data’ and invited OU researchers to come along to talk on that topic for about 15 minutes – and were delighted to have a brilliant line-up of talks.

Our speakers

David King – a Visiting Fellow in Computing & Communications, David talked about the history of his work with biodiversity and agriculture data, and the many systems he has used to manage and share information. We heard how technologies and tools like DOIs, institutional repositories (hello ORO and ORDO!) and collaborative document management tools like Office 365 can help us work with and share research data. David also touched upon his joint research in the Humanities with Francesca Benatti on the A Question of Style project. You can see David and Francesca’s slides here.

Sarah Middle – Sarah is a PhD student in Digital Humanities/Classical Studies, and talked about her PhD, which uses linked data in Ancient World research. Through examples of Sarah’s work linking UK Arts and Humanities project data, and working with the British Library on Privy Council appeals data, we saw how openly available data can be re-used. However, re-using that data can take a lot of work to convert it into a usable format, and to establish whether, and how, it can be shared further. Sarah also took us through the process she has followed to ensure the data she collects from surveys and interviews can be as open as possible, working with the OU’s ethics committee and library research support.

Nancy Pontika – Nancy is Open Access Aggregation Officer at CORE (the Open Access aggregator based in the OU’s Knowledge Media Institute), and told us about the work CORE does to provide research publications to anyone, anywhere, by harvesting content from open access repositories. CORE has over 135 million metadata records and 11 million full text items, and makes its API and dataset open for others to use freely. We also heard about the development of the upcoming analytics dashboard, which will allow institutions to assess the impact of their research outputs. You can see Nancy’s slides here.

Tony Hirst – A Senior Lecturer in Telematics, Tony gave us a whirlwind tour of the many ways he has used open data to answer topical questions, or really to investigate anything that he finds of interest (including the companies connected to Iron Maiden). It was a great demonstration of how an inquisitive and playful approach can produce novel information by combining freely available datasets. You can find many examples of Tony’s work in these links, and generally on his blog OUseful.Info. Tony had delivered an earlier session for the library team here at the OU, about how virtual machines and Jupyter notebooks can be used in teaching and research data sharing, which really piqued our interest too.

Discussion

Along to hear the talks and join the discussion was a mix of researchers, research support staff and librarians. After the talks and follow-up discussion we held round-table discussions on four ‘open data’ topics:

  • What most interests you about sharing your data openly?
  • What might prevent you from sharing your data?
  • Where might you go (or have you gone) for support on sharing your data?
  • Where might you look to deposit your data, and why?

These images show the ideas we captured (click on them to see in detail):

From all of this some themes emerged:

  • Making data ‘open’ can be a tricky thing to do. This echoes what we often find when working with researchers: working out where to put it, how to organise and describe it, and whether it is indeed OK to share it (e.g. for personal or otherwise sensitive data) takes time and effort. Then actually doing it takes time too.
  • There are lots of resources and people to go to for support and advice. This is great and shows a commitment from funders, institutions and, most importantly, researchers to work openly. Is there a risk that it can be hard to pick your way through to the relevant information you need? Possibly.
  • Is it intrinsic or something extra? For some, data sharing is part of their work (see our speakers, for example). For others it is seen as an extra task to do at the end of a project or when publishing.
  • There doesn’t have to be one ‘right way’. In the talks we heard positive examples of data being shared and used in a variety of ways. Things like ORCID, DOIs and metadata standards can help identify and link data consistently, but beyond that we don’t all have to use the same methods and systems.
  • It is well worth doing. We were to an extent preaching to the choir, but the mood in the room was that it is certainly well worth doing. Our speakers illustrated a variety of uses and approaches where open data enables and supports research, and the comments we noted for ‘What most interests you about sharing your data openly?’ highlighted benefits for data authors, data re-users, research participants and for generally improving research.

How can we help?

So what can we, as a library, do?

  • We can continue to provide the tools and systems to store, preserve and share research data.
  • We can support researchers in using them – and when they do, we can help promote and connect the data and other outputs they share.
  • We can continue to provide advocacy, training and advice on data sharing, so that researchers are aware of it and prepared to share when planning their work.
  • We can also continue to listen and have conversations with researchers about what they are doing, their priorities, and what would help them to do it.

Next Steps?

We’d love to have another Data Conversation in the new year on a new topic. If you’d like to take part, either to speak about your work or to hear what others have been up to, please get in touch at library-research-support@open.ac.uk

And thanks again to everyone who came along!

Written by Dan Crane, Research Support Librarian.

Training offer: Making your research data open

There are spaces available on our training session ‘Making your research data open‘ on Tuesday (27th November 2018), 10:00 to 11:30.

Photo by Finn Hackshaw on Unsplash

In this session we will look at why, how, what and when to share data:

  • Why should you share your data? We’ll discuss the benefits and the reasons why data sharing is such a hot topic at the moment.
  • How can you do it? We’ll take a look at the OU’s data repository, ORDO, and provide guidance on preparing data for sharing, including sensitive data.
  • What should you share? Do you need to share everything? What do funders and publishers want you to share?
  • When should you share? We’ll look at the stages of the research process when sharing data is most useful to you and others.

Sign up via My Learning Centre, and if you have any questions, get in touch at library-research-support@open.ac.uk.

Practical Strategies for Research Data Management: workshop slides

Yesterday I ran a session on Practical Strategies for Research Data Management, where we talked about the basics of research data management, including options for data storage and organising data. We also looked at how to write a data management plan using a DMP template, and ended with a game of DMP Bingo.

Thanks to everyone who took part and contributed to the discussions.

The slides are available here:

 

A reminder, too, that we will be running two online sessions covering the same material in January. Sign up and see full details on My Learning Centre.