What you will study
Applied statistical modelling (M348) will develop your general statistical modelling skills beyond those delivered by Analysing Data (M248). In this module, simple linear regression is extended to model a wide variety of dataset types.
Book 1: Linear models
You’ll start with a revision of simple linear regression, combined with an introduction to R, the statistical software used in the module. Simple linear regression is then extended in two separate ways: firstly, by including more than one continuous explanatory variable, and secondly, by allowing the explanatory variable to be categorical. You’ll see how these two extensions can be combined to form regression models with any number of explanatory variables, continuous or categorical. You’ll finish this book by putting the modelling techniques you’ve learned into practice, building a statistical model to predict success at the Olympics. In doing so, you’ll discover that fitting a model is only one part of using data to answer a question.
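As a rough illustration of how the two extensions combine, a linear model with one continuous explanatory variable and one two-level categorical explanatory variable could be written as below. The symbols are assumptions chosen for this sketch, not the module's own notation: $x_i$ is the continuous variable and $z_i$ is an indicator for the second level of the categorical variable.

```latex
Y_i = \alpha + \beta x_i + \gamma z_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2),
\qquad
z_i =
\begin{cases}
  1 & \text{if observation } i \text{ is in the second level,} \\
  0 & \text{otherwise.}
\end{cases}
```

In this form, $\gamma$ shifts the intercept for the second level, and simple linear regression is recovered when $\gamma = 0$.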
Book 2: Generalised linear models
All models you consider in Book 1 assume that the response variable is continuous and can be modelled, possibly after transformation, using a normal distribution. Although this is often sufficient for data analysis, there are situations where it is not. So, in Book 2, you’ll consider how linear models can be extended to cope with such situations. The resulting models are known as generalised linear models.
You’ll see that it’s possible to have models where the response distribution used is a binomial distribution instead of a normal distribution. You’ll then see that it’s possible to use other distributions as well, such as the Poisson distribution or the exponential distribution. Finally, in this book, you’ll see how a particular form of generalised linear model, the loglinear model, can be used to explore relationships between categorical variables. The loglinear model is particularly helpful when contingency tables relate to data with three or more categorical variables.
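One common way to write such models, given here as a sketch in assumed notation rather than the module's own, keeps the linear predictor from Book 1 but connects it to the mean of the response through a link function. With a single explanatory variable $x_i$, the logit link is a usual choice for a binomial response with success probability $p_i$, and the log link for a Poisson response with mean $\mu_i$:

```latex
\log\!\left(\frac{p_i}{1 - p_i}\right) = \alpha + \beta x_i
\quad \text{(binomial response)},
\qquad
\log \mu_i = \alpha + \beta x_i
\quad \text{(Poisson response)}.
```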
Book 3: Applications
Once you’ve extended the range of ‘regression’ tools in your data analysis toolbox, you’ll move on to Book 3, which focuses on two specialist applications of statistics: econometrics and data science. You’ll study only one of these.
In the econometrics strand, you’ll see how the assumptions associated with linear models can be problematic when applied to economic data. For example, data may represent observations made over time, so they’re not independent. In this strand, you’ll see how econometricians deal with such problems.
The data science strand focuses on a couple of topics of particular interest to data scientists. Firstly, you’ll focus on finding clusters in data: groups of observations that are similar to one another but different from observations in other groups. You’ll learn suitable techniques for grouping the data when you don’t have examples of any groups – or even know how many groups there should be! You’ll then consider the challenges that ‘big data’ brings and discuss what can be done to address some of these challenges.
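To give a flavour of what finding clusters involves, here is a minimal sketch of k-means, one standard clustering technique. It is an assumption that k-means is representative: the description above does not say which techniques the module teaches, and the module itself uses R rather than Python. The function name `k_means` and the data are made up for illustration. Note that k-means needs the number of groups `k` to be chosen in advance, which is exactly the difficulty mentioned above when that number is unknown.

```python
import math
import random


def k_means(points, k, n_iter=20, seed=0):
    """Group 2-D points into k clusters with a basic k-means loop."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # start from k randomly chosen observations
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centre.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centres[j]))
            for p in points
        ]
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centre if a cluster is empty
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centres


# Made-up data: two well-separated groups of three points each.
data = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels, centres = k_means(data, k=2)
print(labels)
```

The first three observations should end up sharing one label and the last three the other, although which group is labelled 0 and which 1 depends on the random starting centres.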
You’ll finish with a unit that will pull the content in the module together and help you prepare for the end-of-module assessment.
Vocational relevance
The ability to analyse and interpret data is central to many careers in, for example, government, health, business, finance and market research. The material in this module explores the fundamental statistical techniques required for analysing and interpreting data. Statistical software packages are important data analysis tools for practising statisticians: the use of one such statistical software package is integral to this module. Another vital skill required by practising statisticians is communicating the results from their data analyses. You’ll develop this skill through statistical report writing.