Data management and analysis
This module addresses some of the key concepts required for the traditionally important area of data management, and the increasingly important area of data analytics. You will gain a practical, legal and ethical understanding of how to access, query and manage data collections, using traditional relational databases and contemporary NoSQL approaches. Using real-world datasets, standard software packages and data visualisation techniques, you will learn how to organise and analyse data collections to answer questions about the world, as well as developing an appreciation of user needs surrounding data systems.
What you will study
This module will provide you with a broad overview of the concepts, techniques and tools of modern data management and analysis. It will compare traditional relational databases with an alternative model (a NoSQL database), and will help you learn how to choose the most appropriate means of storing and managing data, depending on the size and structure of a particular dataset and its intended use. You will be introduced to preliminary techniques in data analysis, starting from the position that data is used to answer a question, and to a range of data visualisation and analysis techniques that will help you understand how to start exploring a new dataset.
To ensure that you are comfortable with handling datasets, you will explore a range of real-world datasets to illustrate the key concepts in the module. Sources such as data.gov.uk, the World Bank, and a range of other national and international agencies may be used to provide appropriate data. You will divide your time approximately equally between issues in data management (technical and socio-legal issues in storing and maintaining datasets) and issues in data analytics (understanding how data can be used to answer questions).
The module is framed around a narrative that looks at how to manage and extract value and insight from a range of increasingly large data collections. At each stage, a comparison will be drawn between different ways of representing the data (for example, using different sorts of charts or geographical mapping techniques), and the limitations of the mechanisms presented will be considered. To enable you to get a feel for the use of data, each stage will also include an overview of some data analysis techniques, including summary reporting and exploratory data visualisation. This module is guided by Richard Hamming's famous quote: 'The purpose of computing is insight, not numbers'.
Some of the key ideas are:
Introducing data analysis
Starting with a data file such as a spreadsheet, this unit will provide you with a brief introduction to some basic operations on simple data files. This will give you an opportunity to study an outline of the key ideas in the module and help you become familiar with the module software.
Concepts in data management
You will look at three key areas in data management: data architectures and data access (CRUD), data integrity, and transaction management (ACID). Each of these topics will be illustrated using a relational database and one non-relational alternative. The advantages and limitations of each model will be discussed.
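As a rough illustration of what CRUD and ACID mean in practice, the following minimal sketch uses Python's built-in sqlite3 module with an in-memory relational database. It is not taken from the module materials: the account table and the function names (open_accounts, balances, transfer, close_account) are hypothetical and chosen only for this example. It shows the four CRUD operations, a CHECK constraint enforcing data integrity, and a transaction that is rolled back as a whole when one of its steps fails.

```python
import sqlite3


def open_accounts():
    """Create (the C in CRUD) a small in-memory table with two rows."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint is a simple data-integrity rule: no negative balances.
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commits the inserts if no exception is raised
        conn.executemany(
            "INSERT INTO account (name, balance) VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def balances(conn):
    """Read (the R in CRUD) all rows as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


def transfer(conn, source, target, amount):
    """Update (the U in CRUD) two rows inside a single transaction.

    Either both updates are committed or neither is (atomicity).
    Returns True on success and False if the transfer was rolled back.
    """
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            # This update violates the CHECK constraint if funds are insufficient.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def close_account(conn, name):
    """Delete (the D in CRUD) one row."""
    with conn:
        conn.execute("DELETE FROM account WHERE name = ?", (name,))


if __name__ == "__main__":
    db = open_accounts()
    print(transfer(db, "alice", "bob", 30), balances(db))
    print(transfer(db, "alice", "bob", 500), balances(db))
    close_account(db, "bob")
    print(balances(db))
```

In the failed transfer, the credit to the target account has already been applied when the debit is rejected, so the rollback is what keeps the data consistent. A non-relational store may offer different guarantees, which is one of the trade-offs this part of the module compares.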
Legal and ethical issues
Here you will consider the legal and ethical issues involved in managing data collections. You will be required to obtain and read (parts of) the Data Protection Act and the Freedom of Information Act, and demonstrate how these apply to issues in data management. You will also consider privacy, ownership, intellectual property and licensing issues in data collection, management, retrieval and reuse.
Concepts in data analytics
These sections will focus on using data to answer a real question. The emphasis will be on exploratory techniques (such as visualisation) and on formulating a question in a form that can be answered realistically using the data that is available. Issues in processing large and real-time streamed data collections will also be addressed, along with techniques and technologies (such as MapReduce) for handling them. In this part of the module you will use software such as the Python scientific libraries and/or ggplot2 to visualise the data and carry out appropriate analyses.
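To give a feel for the MapReduce idea mentioned above, here is a minimal single-process sketch in plain Python that counts words across a few lines of text. It is an illustration only, not the module's software and not a distributed framework: the function names (map_phase, shuffle, reduce_phase, map_reduce) are hypothetical, and a real system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit one (word, 1) pair for every word in a line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines):
    """Run the three steps in sequence and return a {word: count} dictionary."""
    emitted = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["data is used to answer a question", "a question about data"]
    print(map_reduce(sample))
```

Because each line is mapped independently and each key is reduced independently, both steps can in principle be spread over many workers, which is what makes the pattern attractive for large data collections.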
If you are considering progressing to The computing and IT project (TM470), this is one of the OU level 3 modules on which you could base your project topic. Normally, you should have completed one of these OU level 3 modules (or be currently studying one) before registering for the project module.
You will be provided with access to the module website, online study materials, sample datasets and module software.
A computing device with a browser and broadband internet access is required for this module. Any modern browser will be suitable for most computer activities. Functionality may be limited on mobile devices.
Any additional software will either be provided on physical media (for example, a DVD or USB stick) or be generally freely available. However, some activities may have more specific requirements. For this reason, you will need to be able to install and run additional software on a desktop or laptop computer with either:
- Windows 7 or higher
- macOS 10.7 or higher
- a modern Linux version
The screen of the device must have a resolution of at least 1024 pixels horizontally and 768 pixels vertically.
To participate in our online discussion area you will need both a microphone and speakers or headphones.
Our Skills for OU study website has further information including computing skills for study, computer security, acquiring a computer and Microsoft software offers for students.