March update

The main focus for the project over the last few months has been a small-scale cohort study, exploring several aspects of library data and trying to build an understanding that goes beyond the quantitative data.

We are working with two academics who lead two modules in one of the University’s programmes. This means that we can draw not only on the library data, but can also set it alongside results from a reflective quiz and carry out follow-up interviews to understand more about what students are doing and what their motivations might be.

We’ve now completed some interviews and are in the data analysis stage: coding interview transcripts, looking at survey results and pulling together relevant library data.

We’ve also been providing data for a set of reports for each School about their engagement with the library. We’ve carried out a few briefing sessions for key stakeholders and put out a few internal communications messages.

The team also ran a session for library staff to get them to explore the library data using a scenario of reporting on library engagement with a course team and using data about library resource accesses, student results and completion figures.  This seemed to be a really useful exercise in getting people to think about how they might use the data in practice and it identified some of the challenges in building up data capabilities.

Library explorer badge

On the data analysis side we’ve carried out some work to look into whether students who complete modules are more likely to be accessing library resources.   We’ve also been writing up some of our early work for publication in a journal.

Posted in Update | Leave a comment

Library Data project – December update

The Library Data team have been out and about over the last three months with presentations at the Northern Collaboration Conference 2016 in September in Liverpool (Presentation slideshare) and at the UKSG Forum in London in November (Presentation PDF 123kb).   The presentations summarised early work that found a distinct pattern of increasing resource access as students study at higher levels and also a pattern where students who get better results in their modules are accessing more library resources.  But we’re also finding very great variation between different modules.

Since then the project team have been working on some correlation studies using two measures of student success, an overall continuous assessment score and an overall examination score.  Both of these scores are presented as percentages.   As with the initial work there is a lot of variation between different modules, so while some modules show positive correlations, others have no significant correlation.

We’ve now started to look at retention, focusing initially on module completion rates and investigating whether they differ between students who have accessed library resources and those who haven’t.  One approach has been to correlate the percentage of students completing a module with the average number of library resources accessed by students on that module, and also with the percentage of students on the module who have accessed library resources.  Again, we’ve got wide variations between modules.

Attention has also turned to some student interviews to try to understand more about the ways students use information resources within their module, and the motivations behind this. We are interested to find out how this qualitative data might shed light on our understanding and interpretation of the quantitative data about e-resource accesses.

Some progress has been made in discussions with corporate IT about putting e-resource access data into the institutional data warehouse.  Internally within the library we’ve been exploring the use of Elasticsearch and Kibana as a method of handling and visualising the library data.  This shows some promise as a way of allowing library staff to quickly see patterns in the data.

Finally we’ve been working with colleagues across the library to provide data that can be used in a new series of library reports aimed at the Faculties and Schools.

Posted in Update | Leave a comment

OU Library data project update September 2016

Over the summer we’ve continued working on our library data project and have managed to build on some of the early pilot work with further analyses using a larger pool of data.  We’re now able to run our own queries against the main institutional data warehouse, so we can look at wider trends.

Research Study 3
This has progressed quite a long way and we’ve been able to look at data from modules starting in 2014 and in 2015.  We’ve combined data on library resource accesses from both EZproxy and OpenAthens with student results data.

We’ve combined some of the results categories together to slightly simplify the interpretation.  So while level 1 modules generally have Pass and Distinction categories, level 2 and 3 modules tend to have Grade 2 Pass, Grade 3 Pass, Grade 4 Pass and Distinction.   We’ve combined the different pass categories into one pass category.
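
As a rough illustration of this kind of recoding, here is a minimal Python sketch. It is not the project's actual process: the mapping name, the function names and the sample records are all made up for illustration, and the only detail taken from the text is that the graded pass categories are merged into a single Pass category.

```python
from collections import defaultdict

# Hypothetical mapping: the graded pass categories collapse into "Pass";
# any other result (e.g. Distinction, Fail) is left unchanged.
COMBINED_RESULT = {
    "Grade 2 Pass": "Pass",
    "Grade 3 Pass": "Pass",
    "Grade 4 Pass": "Pass",
}


def combine_result(result):
    """Return the simplified result category for one student result."""
    return COMBINED_RESULT.get(result, result)


def mean_accesses_by_result(records):
    """records: iterable of (result, access_count) pairs, one per student."""
    grouped = defaultdict(list)
    for result, access_count in records:
        grouped[combine_result(result)].append(access_count)
    return {result: sum(counts) / len(counts) for result, counts in grouped.items()}


# Made-up records, purely to show the shape of the calculation.
sample_records = [
    ("Grade 2 Pass", 6),
    ("Grade 4 Pass", 4),
    ("Pass", 5),
    ("Distinction", 10),
    ("Fail", 2),
]
means = mean_accesses_by_result(sample_records)
print(means)
```

Keeping the mapping in one place makes it easy to see, and to change, which categories have been merged before any averages are compared.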

If we look across the whole range of undergraduate students (around 300,000 students across the two years, as students will probably study more than one module in that period), we see the same sort of patterns we saw with the original three pilot modules.  Students who fail access around a third of the number of online library resources accessed by students who pass.  Students gaining a distinction access nearly twice the number of library resources as students who pass.

Now that we can look in more detail at the different modules, Faculties, levels of study and presentation dates, we can start to see that there are differences between them.  In some cases we will know that there are modules that don’t make so much use of library resources, but it’s very useful data for our liaison librarians to discuss with their Faculty colleagues.

Further studies
We’ve followed up Research Study 3 with a piece of work to look at whether we could follow the approach used by the University of Wollongong in their work

Covered in

and also at

Cox, B. L. and Jantti, M. (2012) Capturing business intelligence required for targeted marketing, demonstrating value, and driving process improvement, Library & Information Science Research, 34(4), pp. 308-316. doi:10.1016/j.lisr.2012.06.002

Although we are both using EZproxy data, one difference is that Wollongong have used a count of the amount of time students accessed resources, whereas we are using a count of the number of EZproxy accesses.

This has proved to be a really useful exercise as we’ve been able to follow most of the steps and have been fortunate to be able to correspond with one of the authors (Brian Cox) on some of the details, which has helped to clarify some of the steps and decisions.

One insight that this has made clear for us is the high percentage of students who don’t access the library (we know that not all modules require library use as their module materials can be quite comprehensive).    But the levels of non-use decrease as students study at higher levels and also seem to be decreasing over time as we’ve started to compare modules starting in autumn 2014 with those starting in autumn 2015 and are seeing more students accessing library resources.

What’s next
We’re still aiming to write up the research for publication and will also be turning our attention to the relationship between library use and student retention.  We also have a small cohort qualitative study starting in the autumn.


Posted in Data analysis, Data sources, Research study 3, Update | Leave a comment

OU Library Data project update May 2016

We’re continuing to work on the Library Data project and have made some good progress over the last two months in several areas.

Research Study 2
We used the same small sample dataset (three level 1 modules, n=11,501) that we had used to look at the relationship between library use and student attainment to analyse library use against demographic data to see if age, gender or previous educational attainment showed different levels of library resource access.  The picture that emerged was:

  • Older students (56 and older) (n=990) averaged more than 11 resource accesses, against a mean of just over 3 for the under 25 age group (n=3,382).  The mean number of library resources used increases steadily through the age groups.
  • Male students (n=5,344)  access a mean of 5.7 resources, female students (n=6,157) 4.7 resources.
  • Students with No formal qualifications, Less than A levels or A Levels or equivalent all accessed a mean of between 4.58 and 4.72 resources.  Students with HE qualifications a mean of 6.7 and those with a Postgraduate qualification a mean of 9.7 resources accessed.
  • On the face of it, you might conclude that the older students are, and the higher their previous educational attainment, the more library resources they access.  But interestingly, when you combine age and previous education you get a more complex story (see graph below).
Graph: mean number of library resources accessed, by age and education

Designing Research Study 3
We’re designing a detailed study to compare library resource accesses and student success. Colleagues in IET are working with us to guide the analysis.  This should give us a robust view of whether the early indications we had in Research Study 2 are borne out with further analysis.  We intend to publish the results.

Qualitative study
We’re talking to colleagues in a Faculty about a qualitative study with a level 1 and a level 2 module. The plan is to see whether there is a relationship between student participation in library skills activities and attainment. The study will look at qualitative and quantitative data and so will require a different methodology to our first three studies because they only looked at quantitative data.

Getting library data into the main university data warehouse
One of the big aims of the project is to get data about use of library online resources added to the main institutional data warehouse.  This should greatly help with encouraging data users to take library use into account when analyzing data about the student experience.  The good news is that our proposal to add library data has been approved and we are hoping for an Autumn 2016 implementation.


Posted in Update | Leave a comment

Data and analytics skills assessment

Improving data and analytics skills is something that is in scope for the Library Data project.  The aim is to identify the skills that the Library needs and to help to develop those skills.  So our starting point has been to assess our current level of data and analytics ‘readiness’.

The survey
The approach we’ve taken is to design a very short and simple questionnaire of five questions asking library staff to rate their level of confidence on a Likert scale, using the categories Strongly Disagree/Disagree/Neutral/Agree/Strongly Agree.

The five questions are:

  • I am confident that I know what analytics data the library has and I know how to access it
  • I am confident that I know what analytics data the University has and I know how to access it
  • I am confident in my ability to use simple analytics tools
  • I am confident in my ability to analyse data
  • I am confident in my ability to use data to support decision making

The questionnaire was printed out and handed round at the start of a Staff Development Hour session in the library that was going to be giving an update on the work of the Library Data Project.  Library staff were invited to complete the questionnaire before the session started and responses were deliberately kept anonymous as we were interested in an overall  impression of our data readiness.

We collected the questionnaires straight away and had 27 completed responses, from most of the people attending the session.  The approach we took to the analysis was to score each response to each question using a weighting score.  Strongly Disagree was scored as 1, Disagree as 2, Neutral as 3, Agree as 4 and Strongly Agree as 5.  The weighting was multiplied by the number of responses against each level on the Likert scale and then totalled for each question.  Dividing the total by the number of responses gives an average response that can be compared against the weighting scores.  So a score below 3 would indicate disagreement and a score over 3 would indicate agreement with the statement.

For example, in Question 1, if you have the following pattern of responses:
Strongly disagree  1, Disagree 2, Neutral 1, Agree 4, Strongly Agree 2
Adding weighting and carrying out the calculations gives you:
(1×1) + (2×2) + (1×3) + (4×4) + (2×5) = 34, dividing by the number of responses (10) gives an average of 3.4 – a response between neutral and agree.
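
The same calculation can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the function and variable names are our own invention, and the counts are the Question 1 example figures rather than real survey results.

```python
# Weights for each point on the Likert scale, as described above.
WEIGHTS = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def average_score(counts):
    """Return the weighted mean for one question, given {category: count}."""
    total_responses = sum(counts.values())
    if total_responses == 0:
        raise ValueError("no responses to average")
    weighted_total = sum(WEIGHTS[category] * n for category, n in counts.items())
    return weighted_total / total_responses


# The example pattern of responses for Question 1 from the text.
question_1 = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 1,
    "Agree": 4,
    "Strongly Agree": 2,
}
print(average_score(question_1))  # 3.4
```

Running the same function over each question, and then averaging the five results, would give the overall level mentioned below.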

The scores for each question are useful in helping to identify where there is the least confidence and might shape which areas you could concentrate on.  The scores can also be averaged to give an overall level.

Presenting the results
We decided to create a graphic to present the results from the questionnaire in a more visual way.   The idea is that you can see at a glance the level of confidence against each of the questions and for the overall picture.  So the red ring is placed on the scale based on the average of the responses.  One advantage of the approach is that you can use the visualization to track how the readiness level changes over time by adding a different colour ring if you repeat the questionnaire.  And that is something that we plan to do later in the project.  A made up sample of the visualization approach is shown below:

Data skills readiness visualisation

Next steps
We plan to repeat the exercise later in the project, possibly at the end of a future update on the project at a Staff Development session.  We’d expect that we’d then be able to compare the scores to provide us with a measurement of change in confidence levels and to help with shaping future priorities in the project.

Posted in Data sources, Skills, Update | Leave a comment

Research Study 2: post #2

Pilot data study

We now have OU student data for our three pilot modules (one Arts, one Technology and one Law) across seven 2014-15 presentations. The data includes variables relating to attainment e.g. module pass/fail and assignment scores, and demographics e.g. age and gender and includes students who are studying more than one module.

We have started to process and analyse this data alongside the Library EZproxy usage data. This work is helping us to define and refine the processes that we will use to analyse the data for all modules with 2014-15 presentations. For the time being our focus will be on attainment, so we are looking at two measures: the Overall Continuous Assessment Score (a percentage) and the final outcome (graded as Distinction, Pass, Fail, Deferred or Withdrawn).

The underlying process for preparing Library EZproxy student data has continued to be to create and run MySQL queries using MySQL Workbench and then export the data to Excel for basic analysis.  We process the data in Excel to give a count for the number of resources accessed. Then we combine this data with the results and scores using a combination of pivot tables and lookup tables to end up with a column of OCAS scores, a column of results and a column with the resource use count.  Each row is a separate student and at this stage once the data has been joined together we anonymise the data by removing the student ID we used to join the two data sets together. 
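
For anyone wanting to avoid the manual steps, the join-then-anonymise idea could be sketched in Python along the following lines. This is a hedged illustration rather than the process we actually ran (which used MySQL Workbench and Excel): the function name, field names and sample data are hypothetical, and it assumes that the resource use count is simply the number of logged accesses per student.

```python
from collections import Counter


def build_analysis_rows(access_log, results):
    """Join access counts to results by student ID, then drop the ID.

    access_log: iterable of (student_id, resource) pairs, one per logged access.
    results: dict mapping student_id to (ocas_score, outcome).
    """
    counts = Counter(student_id for student_id, _resource in access_log)
    rows = []
    for student_id, (ocas, outcome) in results.items():
        rows.append({
            "ocas": ocas,
            "result": outcome,
            # Students with no logged accesses get a count of zero.
            "resource_count": counts.get(student_id, 0),
        })
    # The student ID is used only for the join and is never written to a row.
    return rows


# Made-up data, purely for illustration.
access_log = [("s1", "db_a"), ("s1", "db_b"), ("s2", "db_a")]
results = {"s1": (72, "Pass"), "s2": (45, "Fail"), "s3": (88, "Distinction")}
rows = build_analysis_rows(access_log, results)
print(rows)
```

Each output row is one student, with an OCAS score, a result and a resource use count, matching the three columns described above.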

Copying the resulting data into SPSS, we have started to carry out a set of statistical tests on the pilot data. The tests include Spearman’s and Pearson’s correlations and one-way ANOVA multivariate analysis. Our literature review (which is ongoing) is informing our choice of statistical tests.
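
To make the two correlation measures concrete, here is a minimal pure-Python sketch. It is not the SPSS procedure we used: the function names and the sample figures are made up, and Spearman’s correlation is computed here as Pearson’s correlation on average ranks, which is one standard way of handling ties.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up figures: resource use counts and OCAS scores for five students.
resource_counts = [0, 2, 3, 5, 12]
ocas_scores = [40, 55, 50, 70, 85]
print(pearson(resource_counts, ocas_scores))
print(spearman(resource_counts, ocas_scores))
```

Spearman’s measure only looks at rank order, so it is less affected than Pearson’s by a few students with very high access counts.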

The pilot dataset is very small but nevertheless, our (very) draft findings from just three modules are really interesting:

  • Students gaining a distinction seem to be using twice as many library resources as those who pass.
  • Students who fail use less than a quarter of the number of library resources used by those students who gain a distinction.
  • There looks to be a positive correlation between the number of library resources used and the student’s OCAS (Overall Continuous Assessment Score).

Graph of use of library resources by result

We’ve a lot more work to do, to validate what we’ve found so far, and to extend it to other modules to see if they have similar patterns of activity, as we might just have hit upon modules that are atypical.

As mentioned in a previous post, processing data in Excel is very time consuming and there is a high risk of generating errors due to manual processing and the multiple steps required. To combine and process a complete set of OU and Library data for 2014-15 presentations efficiently and effectively we need a more automated method.  We now have access to SAS Enterprise Guide and are investigating SAS as a possible tool for querying and processing the OU and Library data, producing statistical analyses and presenting the findings.  So we’re looking at suitable SAS training for the Project team and starting to think about the roles that may be required for a Business as Usual Library Data Service.


Posted in Research study 2 | Leave a comment

Research Study 2: post #1

The Library Data project team has been carrying out preliminary work towards Research Study 2 (which will investigate whether there is a correlation between library resource activity data and student attainment) throughout January.

We have had a couple of meetings with the OU’s Institute of Educational Technology (IET) to discuss the OU data that we will need from them and for advice on appropriate statistical analysis techniques and methodologies. Using an appropriate methodology or process for preparing the data for statistical analysis is important. We are defining our data preparation processes at the start of each investigation and refining as we go along. This approach feels very similar to the ‘agile’ methodology that the team used in their previous project, OUDA.

We have started to carry out statistical analyses on our data, producing frequency distributions and frequency groupings for particular modules. We have been teaching ourselves how to use SPSS statistical analysis software and learning more about the statistical analysis functionality within Excel, including ‘binning’ to create intervals. We have (unfortunately and not without considerable frustration!) discovered that the results output from the frequency analysis ‘wizard’ within Excel 2013 are incorrect. It seems that one row of data is always missed; strangely, it’s not always the same row. We plan to use SPSS to create the bins instead.
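
For readers unfamiliar with ‘binning’, the idea can be sketched in a few lines of Python. This is only an illustration with made-up access counts, not the Excel or SPSS procedure; the function name and the choice of equal-width, half-open intervals are our own assumptions. Checking that the bin counts add up to the number of rows is a simple way of spotting the kind of missing-row problem described above.

```python
from collections import Counter


def frequency_bins(values, width):
    """Count values in half-open intervals [lower, lower + width).

    Returns a sorted list of (lower, upper, count); empty bins are omitted.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    counts = Counter((value // width) * width for value in values)
    return [(lower, lower + width, counts[lower]) for lower in sorted(counts)]


# Made-up numbers of resource accesses for nine students.
accesses = [0, 1, 3, 4, 5, 9, 10, 12, 27]
bins = frequency_bins(accesses, 5)
for lower, upper, count in bins:
    print(f"{lower}-{upper - 1}: {count}")

# Sanity check: every row should land in exactly one bin.
assert sum(count for _lower, _upper, count in bins) == len(accesses)
```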

Incidentally, we also discovered an issue with SPSS: for any particular module the SPSS z-scores were slightly different to (higher than) the Excel z-scores when using the same number of decimal places. Our research suggests that SPSS automatically assumes (and so calculates standard deviations on the basis of) a ‘sample’, whereas the formula we’d used in Excel was that for a ‘population’.  Standard deviation is higher for the ‘sample’ because you divide by n-1 instead of n.  From what we have read there is no simple way in SPSS to change the z-score formula/calculation. Presumably this is something that the majority of SPSS users ‘just live with’? It is only going to make a significant difference where populations are very small. It seems strange that the statistics course the project team studied last year (Statistics in Education for Mere Mortals) and statistics text books go to the trouble of explaining the difference between populations and samples and give the different formulas for each, but a well-used tool for stats analysis (SPSS) doesn’t give you the population option. Indeed, the SPSS text books don’t even mention that it’s not an option!
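
The effect of the two divisors can be seen with a small Python sketch using the standard library’s statistics module, where pstdev divides by n (population) and stdev divides by n-1 (sample). The data values and the z_scores helper are made up for illustration; this is not a reproduction of either the SPSS or the Excel calculation.

```python
import math
import statistics


def z_scores(values, sample=False):
    """Return z-scores using the sample (n-1) or population (n) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values) if sample else statistics.pstdev(values)
    return [(value - mean) / sd for value in values]


# Made-up data: mean is 5 and the population standard deviation is exactly 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]
z_population = z_scores(values)
z_sample = z_scores(values, sample=True)

print(z_population[-1])  # z-score for 9 using the population formula
print(z_sample[-1])      # z-score for 9 using the sample formula

# The two sets of z-scores differ by a constant factor of sqrt((n-1)/n).
n = len(values)
print(math.sqrt((n - 1) / n))
```

Because the sample standard deviation is the larger of the two, the sample-based z-scores in this sketch are slightly closer to zero. The factor sqrt((n-1)/n) approaches 1 as n grows, which is why the difference only matters for very small groups.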

We have continued to build on our SQL knowledge to create new queries as required for our investigations.  The analysis of ATHENS data is ongoing. This study is aiming to find out about library e-resource access at each study level (1, 2, 3 and post-graduate) within each OU Faculty and whether the levels of e-resource access by Faculty and study level in ATHENS and EZproxy data follow similar patterns.

Findings so far

From looking at the EZproxy data for all OU students we see a pattern that shows that their use of library resources increases by level, e.g. there is more usage at level 2 than level 1 and more at level 3 than level 2.  [The levels essentially equate to years 1, 2, 3 of a degree course, but an OU student will do several level 1, 2 and 3 modules over several years to build towards their degree.]  Post-graduate usage is higher than at level 3. Students are expected to develop their independent learning skills and make more use of library resources as their studies progress and so these results are unsurprising.


The next steps for the project are to start to combine library use data with data on student success and retention.  Initially this will be for a small sample of modules to allow us to test and develop suitable methods, but the aim will be to do work that goes across the whole range of levels and faculties.

Posted in Research study 2 | Leave a comment

Research Study 1: Basic e-resource access data

We are nearing the end of our first study, which was to produce some basic information about Library e-resource access by Open University (OU) students. We have been using data from the EZproxy Starting Point URLs (SPU) logs to find out about library e-resource access at each study level (1, 2, 3 and post-graduate) within each OU Faculty. We have recently started to analyse ATHENS data too.

OU e-resource access via ATHENS is a much smaller set of data than EZproxy; we estimate that EZproxy access accounts for 89% of OU library e-resource access and ATHENS 11%. We are investigating whether the levels of e-resource access by Faculty and study level in ATHENS and EZproxy data follow similar patterns, although we know that the resources being accessed will generally differ.

We are starting to identify patterns of use, and it may be useful to compare this data with existing studies on library use by discipline. Will our data for a distance learning institution match data from campus-based universities? We may then want to consider developing benchmarks of library use per faculty/level. This work is a recommendation from a previous OU Library study by Neil Dixon who explored whether library use correlates with module satisfaction. These benchmarks would enable Liaison Librarians to plan library interventions, for example for courses where low-level use of resources has been identified.  One of the areas for us to follow up is looking at the patterns of use comparing modules that have different levels of library resources being recommended.

The process has been to create and run a MySQL query for each Faculty using MySQL Workbench and then export the data to Excel for basic analysis. We have also queried an OU Tableau workbook for numbers of students at module start, part-way into the module and at the completion date and exported this data to Excel.

The preparation of the data (contained within fairly substantial datasets, each with several hundred thousand rows) within Excel has been time consuming. The creation of pivot and look-up tables is relatively quick but the semi-automated counting of rows and merging of EZproxy data with OU data takes time especially for Faculties with large numbers of modules.

There have been some advantages to using Excel: we have increased our knowledge of the data, plus our understanding of how to manipulate data effectively and we have identified (and resolved) a couple of important database-related issues. Manually processing the data in Excel has also enabled us to identify gaps in the data. However it may be more efficient to have access to all of the data within MySQL Workbench (or another ‘tool’) and to query it there rather than create pivot and look-up tables etc. within Excel.  We are also exploring doing some of the counts as part of the SQL query to save Excel processing time.  Additionally, we have realized that subsequent studies are likely to require the use of SQL queries to analyse the data due to the size and complexity of the data required. We shall soon start investigating possible tools (as listed in our previous blog post).
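
As an illustration of doing the counts in the query itself, here is a minimal sketch. It uses Python’s built-in sqlite3 module with an in-memory database in place of MySQL so that it is self-contained; the table name, column names and sample rows are all hypothetical and do not reflect our actual schema. The GROUP BY and COUNT(*) clauses are standard SQL and would look much the same in MySQL.

```python
import sqlite3

# In-memory stand-in for the real database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ezproxy_access (student_id TEXT, module_code TEXT, resource TEXT)"
)
conn.executemany(
    "INSERT INTO ezproxy_access VALUES (?, ?, ?)",
    [
        ("s1", "M100", "db_a"),
        ("s1", "M100", "db_b"),
        ("s2", "M100", "db_a"),
        ("s3", "M200", "db_c"),
        ("s3", "M200", "db_c"),
    ],
)

# Let the database count accesses per student per module,
# rather than counting exported rows in a spreadsheet.
query = """
    SELECT module_code, student_id, COUNT(*) AS access_count
    FROM ezproxy_access
    GROUP BY module_code, student_id
    ORDER BY module_code, student_id
"""
access_counts = conn.execute(query).fetchall()
conn.close()

for module_code, student_id, access_count in access_counts:
    print(module_code, student_id, access_count)
```

The exported result then has one row per student per module, so the pivot-table counting step is no longer needed.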

Data analysis
We have used the EZproxy data to produce the following figures for each OU Faculty by level of study:

  1. The average number of e-resource accesses per student (who has accessed e-resources)
  2. Number of students at the part way point of the course (PWP) compared with the number accessing e-resources
  3. Number of students who completed the module (MCP) compared with the number of students who accessed e-resources


The underlying data for each Faculty was based on module-level data from EZproxy logs and from the OU’s Tableau workbook. The full set of data required for the analysis was not available for all modules. Modules with missing data were not included in the study. Data from students studying resits was also omitted.

Where students are studying more than one module the EZproxy system cannot track which module their e-resource access was related to.  Therefore the datafiles include an entry for each module studied by the student at the date of the resource access.  The effect of these duplicate entries on the figures has not been analysed. Future studies could also analyse and compare e-resource accesses for students who are only studying one module with students who are studying more than one module.

PWP: The number of students at the PWP is used across the OU to represent the number of students who started the module. This number is likely to be more accurate than the number of students registered at the start of the module. The number of students registered at the start can include students who didn’t actually start studying the module; these students may (for example) have changed their module choice or paused their study.

Comparisons between the PWP student numbers and the MCP student numbers are simple comparisons with the number of students recorded as having accessed library resources.



Posted in Research study 1 | Leave a comment

Up and running with the OU Library Data Project

The Project Team have  been working on the Library Data project for about three weeks now. Here are some of the key questions we’ve been researching.

What research has been done using Library usage data?

We’ve made good progress with a literature review of research that’s been carried out using library data within libraries. Research studies have explored student achievement, retention, use by discipline, and demographics. We’ve decided that student achievement is a good starting point for our research. Can we show that there is a correlation between student achievement and use of our Library e-resources, as other HEI libraries have done (e.g. Cox & Jantti, 2012 1; Stone & Ramsden, 2013 2)?

Research of this kind often focuses on use of the physical library, or a combination of physical library use and e-resource data. At the OU Library we’re going to need to focus solely on use of e-resources and perhaps other measures of digital library use. Possibilities include:

  • queries to the Library helpdesk
  • number of information literacy skills activities accessed
  • remote attendance at online Library training events

We’ll continue to build on our literature review, but we have enough information now on research into student achievement to begin the first phase of our analysis work.

What data on library use is available to us?

We’ve started to explore the data that we have within the library on use of our electronic resources.

Data source | Description
EZproxy raw logs | All user activity during a session, including requests for images, scripts etc.
EZproxy starting point URL logs | Details of the database or article that users first click into, but no activity after that is recorded.
Athens Data | Details of user sessions from resources authenticated using Athens.
LibLink data | LibLink is our in-house developed resource system. Log files would include any resources that were recommended to students within their modules.

Our main source of Library usage data is going to be the EZproxy starting point URLs log, and we will begin by analysing that data. We’ll consider analysing the other data logs alongside the EZproxy data to provide additional measures of library use for comparative purposes.

What additional data will we need?

We’ll need to access additional data held within other OU systems to join up with our Library data. For our initial research we will need to include:

  • OU Qualification (e.g. English Literature)
  • Module
  • Level of study (Level 1, 2, 3, or postgraduate)
  • Faculty
  • Degree result

We are talking to our institutional data experts about what data we will need to access and about statistical expertise.

Which tools should we use to query and analyse the data?

Initially we asked our Library IT team to query the raw data on our behalf and they created MS Access files for us to work with. This did not work well for us for a variety of reasons, e.g. the volume of data made it slow to manipulate, and we needed to go back to our IT colleagues each time we needed to refine queries. We are currently using MySQL Workbench to run our own queries on the raw data, and exporting to Excel for initial analysis.

We’ll need to evaluate further tools, and some that we have identified so far include:

  • SPSS
  • Tableau
  • Jaspersoft business intelligence
  • QlikView

What should we focus on initially?

We are currently working up plans for our first two research studies:

Research Study 1 – Basic usage data

Our first task will be to produce some basic information about Library use by OU Faculty and level. We will be able to identify patterns of use, and it may be useful to compare this data with existing studies on library use by discipline. Will our data for a distance learning institution match data from campus-based universities? We may then want to consider developing benchmarks of library use per faculty/level. This work is a recommendation from a previous OU Library study by Neil Dixon who explored whether library use correlates with module satisfaction. These benchmarks would enable Learning and Teaching Librarians to plan library interventions, for example for courses where low-level use of resources has been identified.

Research Study 2

Our second study will focus on library use and student achievement to test the hypothesis that:

There is a statistically significant correlation between library resource activity data and student attainment.

At this stage in the project there are many different strands to explore and many questions to answer. We will explore some of these areas in more detail in future posts.

1. Cox, B.L. and Jantti, M. (2012) ‘Capturing Business Intelligence Required for Targeted Marketing, Demonstrating Value, and Driving Process Improvement’, Library & Information Science Research, Elsevier BV, 34(4), pp. 308-316, [online].

2. Stone, G. and Ramsden, B. (2013) ‘Library Impact Data Project: Looking for the Link between Library Usage and Student Attainment’, College & Research Libraries, American Library Association, 74(6), pp. 546-559, [online].

Posted in Update | Leave a comment

Welcome to the OU Library Data Project!

At a time when the University has a focus on improving student retention it is important that the role that use of library services and content plays in student retention and success is articulated and understood. In Library Services we collect large amounts of data and use some of it to report on, but we have not systematically investigated these data or carried out statistical analyses to derive insights about the relationship between library use and student performance or retention. The impact that OU library services have on student achievement and retention (including progression) is not currently understood. The Library Data Project aims to shed light on this aspect and ultimately aims for its findings to improve the student experience.

OU Library Data Project Powerpoint slide

The University of Huddersfield’s Library Impact Data Project analysed library usage data of 33,074 undergraduate students across eight UK universities. They found a statistically significant relationship between student attainment and two library indicators: e-resources use and book borrowing statistics. The University of Minnesota 1 found library usage in the first year is associated with higher performance and retention. We are aiming to find out if this is also the case at the Open University, a distance learning institution.

The two projects are keen to make it clear that their findings do not show a causal relationship between library usage and student attainment or retention. Other factors will have an influence on students’ achievements.

Library Analytics roadmap
We’ve designed an OU Library Analytics roadmap to cover a package of activities aimed at achieving three specific outcomes over the next couple of years:

  • Have the evidence to demonstrate how library engagement enhances student achievement and retention – with quantitative and qualitative studies
  • Be able to provide library data more readily for library staff and other university staff to make use of
  • Build up the skills and confidence of library staff in being able to use data to show how library use contributes to student success and retention

Key early tasks for the OU Library Data Project Team are to establish some processes, places and tools to analyse library data, to join up library data with other data from the university and to start producing some case studies and evidence as early as possible around what the data tells us about student engagement with the library. In the medium and longer term it is also planned to do some qualitative work in the form of a longitudinal study with students.

1. Soria, K. M., Fransen, J. and Nackerud, S. (2014) ‘Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention’, The Journal of Academic Librarianship, Elsevier BV, 40(1), pp. 84–91, [online] Available from:

Posted in Update | Leave a comment