NYU Health Sciences Library. Karen Hanson, Kevin Read, Alisa Surkis

Everybody wants to avoid data loss, article retractions and data errors, so it’s important to think about naming conventions, backups, workflows and variables.

Variables (Meet Dr B. Hartwell, Cardiology Researcher, and Dr M. Audaheer, Research Coordinator)

Dr Hartwell: I’ve been looking at our data. I think we might get some interesting results if we use the new Oxford BMI calculation rather than the standard BMI. Can we run that analysis?

Dr Audaheer: Uh… I need to show you something. We only recorded the standard BMI so we can’t calculate an alternate one!

Dr Hartwell: That’s too bad… that could have been a game changer but at least we can still do the standard analysis.

How could this have been avoided? Never record only a compound variable, such as BMI, when its component variables (here, height and weight) are available.
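As a minimal sketch of this principle, the snippet below stores only the components (height and weight) and derives BMI on demand, so a different formula can be applied later. The function names, field names and sample record are hypothetical. The alternative formula's exponent of 2.5 and scaling factor of 1.3 are an assumption about what the "new Oxford BMI" refers to and should be checked before any real use.

```python
def standard_bmi(weight_kg: float, height_m: float) -> float:
    """Standard BMI: weight divided by height squared."""
    return weight_kg / height_m ** 2


def alternative_bmi(weight_kg: float, height_m: float,
                    exponent: float = 2.5, scale: float = 1.3) -> float:
    """Alternative BMI with a configurable exponent and scale (assumed values)."""
    return scale * weight_kg / height_m ** exponent


# Only the component variables are recorded; nothing derived is stored.
record = {"patient_id": "P001", "weight_kg": 70.0, "height_m": 1.75}

print(round(standard_bmi(record["weight_kg"], record["height_m"]), 2))
print(round(alternative_bmi(record["weight_kg"], record["height_m"]), 2))
```

Because the raw measurements are kept, either index can be recomputed at any time. The reverse is not possible: a stored BMI cannot be split back into height and weight.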

Workflows

Dr Hartwell: We need to collect data from 300 patients with hypertension. In this folder are blank copies of the form I designed for data collection. I’ve also emailed each of you an empty spreadsheet for the results. Bye! Send me your spreadsheets when you’re done.

6 months later… Okay, let’s see what kind of results have been collected…

Wait, what is this? This data is a mess! I can probably sort out height and weight but I’ll have to leave smoking out of the analysis. I guess that will have to do.

When planning data collection, never assume that variable names mean the same thing to everybody. Be explicit about variable type: numeric, text, categorical. Be explicit about units of measurement: kgs, stones, lbs. Be explicit about definitions: current smoker? How many cigarettes a day? Ever smoked? Make sure everyone is on the same page BEFORE starting to collect data!
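One way to make these decisions explicit is a shared data dictionary that every collector's spreadsheet is checked against. The sketch below is illustrative only: the variable names, units and smoking categories are assumptions, not the definitions used in the study described above.

```python
# Hypothetical data dictionary: one entry per variable, agreed before collection.
DATA_DICTIONARY = {
    "height_cm": {"kind": "numeric", "unit": "cm"},
    "weight_kg": {"kind": "numeric", "unit": "kg"},
    "smoking_status": {"kind": "categorical",
                       "allowed": {"never", "former", "current"}},
    "cigarettes_per_day": {"kind": "numeric", "unit": "cigarettes/day"},
}


def _matches_kind(value, kind):
    """Return True if the value has the Python type expected for this kind."""
    if kind == "numeric":
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return isinstance(value, str)  # text and categorical values are strings


def validate_record(record):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, spec in DATA_DICTIONARY.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not _matches_kind(value, spec["kind"]):
            problems.append(f"{name}: expected {spec['kind']}, got {value!r}")
        elif "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{name}: {value!r} is not an allowed category")
    for name in record:
        if name not in DATA_DICTIONARY:
            problems.append(f"{name}: not defined in the data dictionary")
    return problems


good = {"height_cm": 172, "weight_kg": 80.5,
        "smoking_status": "former", "cigarettes_per_day": 0}
bad = {"height_cm": 172, "weight_kg": "11 stone",
       "smoking_status": "yes", "cigarettes_per_day": 0}

print(validate_record(good))
print(validate_record(bad))
```

Running a check like this on each spreadsheet as it comes in, rather than six months later, surfaces mismatched units and definitions while they can still be corrected.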

Naming conventions

Dr Audaheer: I finished entering the last of the data, but was confused about something… Some of the values for pulse-ox are over 100 per cent!

Dr Hartwell: That’s because that’s the pulse RATE… Wait a minute! The pulse-ox and pulse rate data have been mixed up. How can we possibly correct this?! The study is ruined!! Ruined!!

Ambiguous file or field names can be dangerous! Come up with a consistent naming convention that can be used by the entire team and document it!
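A documented convention can also be checked mechanically. The sketch below assumes one possible convention (lowercase words joined by underscores, ending in a unit suffix from an agreed list). The rule, the suffix list and the example names are all hypothetical and would need to be agreed by the team.

```python
import re

# Assumed convention: lowercase snake_case ending in an agreed unit suffix.
ALLOWED_UNIT_SUFFIXES = {"bpm", "pct", "kg", "cm", "mmhg"}
FIELD_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def follows_convention(name: str) -> bool:
    """Check a field name against the assumed team convention."""
    if not FIELD_NAME_PATTERN.match(name):
        return False
    suffix = name.rsplit("_", 1)[-1]
    return suffix in ALLOWED_UNIT_SUFFIXES


for field in ["pulse_rate_bpm", "spo2_pct", "pulse", "Pulse-Ox"]:
    status = "ok" if follows_convention(field) else "rename"
    print(f"{field}: {status}")
```

With the unit in the name, a column called pulse_rate_bpm is hard to confuse with one called spo2_pct, whereas a bare "pulse" could mean either.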

Backups

One final thought… life is unpredictable so create a backup plan! Keep your data on a secure server. Ensure copies are geographically dispersed. How many days of data can you afford to lose?

Transcribed by Library Services, The Open University

Video produced by the NYU Health Sciences Library

Data collection

This can include:

Digitisation and data entry

When data are digitised, transcribed, entered in a database or spreadsheet, or coded, quality can be ensured by standardised and consistent procedures for data entry with clear instructions.

This may include:

Data checking

Data checking is when data are edited, cleaned, verified, cross-checked and validated. Checking typically involves both automated and manual procedures, for example:
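One automated check of this kind can be sketched as a plausibility-range test that flags values for manual review. The field names and ranges below are illustrative assumptions, not clinical limits.

```python
# Hypothetical plausibility ranges; real limits should come from the study protocol.
PLAUSIBLE_RANGES = {
    "spo2_pct": (50, 100),        # oxygen saturation cannot exceed 100 per cent
    "pulse_rate_bpm": (20, 250),
}


def out_of_range(rows):
    """Return (row index, field, value) for every missing or implausible value."""
    flagged = []
    for index, row in enumerate(rows):
        for field, (low, high) in PLAUSIBLE_RANGES.items():
            value = row.get(field)
            if value is None or not (low <= value <= high):
                flagged.append((index, field, value))
    return flagged


rows = [
    {"spo2_pct": 98, "pulse_rate_bpm": 72},
    {"spo2_pct": 112, "pulse_rate_bpm": 97},  # likely swapped or mistyped
]

for index, field, value in out_of_range(rows):
    print(f"row {index}: check {field} = {value}")
```

The automated step only flags suspect entries; deciding whether a value is an error and how to correct it remains a manual step against the source forms.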

Video guides to managing data quality

Video: Data Quality Checking Produced by the Statistical Services Centre in collaboration with the University of Reading, this video discusses the importance of data quality checks throughout a research project.
