Project dissemination activity – short update January 2018

A write-up of some of the work of the Library Data project will be appearing in Information and Learning Science during 2018.  The paper “Library resources, student success and the distance-learning university” covers the module-by-module correlation analysis undertaken by the team.

The paper is available as a pre-print at:

Richard Nurse, Kirsty Baker, Anne Gambles (2018) “Library resources, student success and the distance-learning university”, Information and Learning Science,

The team will also be talking about some of the work and how it has been used at the Sherif event “What does your eResources data really tell you?” in February.

Posted in Update

Project wrap-up and closedown

The Library Data project has now closed and this final blog post provides a summary of the main activities of the project and identifies some of the lessons learned from the work.

Project activities

The core project activities comprised several streams:

  • Taking library resource access data (derived from EZproxy and OpenAthens systems) and combining this with student results and completion data to produce a number of analyses and visualisations.
  • A detailed cohort study with two Social Work practitioner modules.  This combined some data analysis with data from a reflective quiz, an analysis of an assignment and interviews with students using the directed storytelling methodology from ethnographic research.
  • Improving the infrastructure used to manage and visualise library data and introducing Elasticsearch and Kibana.
  • Skills activities to help library staff build capabilities in analysing and using data.
  • Work to continue to push for library data to be included within the institutional data warehouse.
  • Communications around the project activities.

Lessons learned

  • Data cleaning – a large amount of effort is taken up in extracting and processing data before any analysis can be undertaken.
  • Skills – vitally important for the team to have data manipulation and analysis skills and to have skills in a range of tools such as Excel and SPSS but also some capabilities around SQL and SAS.
  • Communication – valuable to get messages out quickly to key stakeholders about initial findings.
  • Statistical capabilities – important to have access to advice from statisticians.
  • Improving skills and confidence – hands-on sessions with data helped to improve staff confidence, but some staff were more inclined than others to engage with data activities.
  • Value of data in engagement – being able to provide detailed data to faculty colleagues is very valuable in engaging them in conversations about how students are using the library.
  • Flexibility – need to be prepared to adjust the project – particularly important with an exploratory project.

Project outcomes

Data analyses produced by the project have been used as part of a new set of library reports provided to Schools about engagement with the library.

The project has shown that library resource accesses increase at the OU as students study at higher levels.  Students who gain higher grades are accessing more library resources (a pattern seen in many other institutions).   A forthcoming paper, ‘Library resources, student success and the distance-learning university’ (2017), Information and Learning Science, covers this part of the work of the project.

The small cohort study has started to show the value of using UX techniques to gain a better understanding of why students are using the library, what their motivations are and what their perceptions are of the value of the materials.

Some progress has been made with getting library data into the institutional data warehouse and other stakeholders now have library data inclusion on their agenda.

The Library has some improved capability to undertake data analysis as well as some better tools to manage internal data.

Posted in Update

May update

Our library data work is coming to an end in the next few weeks so we’re working on finishing off some of the current work and have started to pull together lessons learned and end-of-project documents.

The main activity over the past month or so has been focused on the social work study.  This is looking at students studying one level 1 and one level 3 module and is analysing a range of qualitative and quantitative data.   This includes library resource access data and results from a reflective quiz that students completed during their module.   One of the team also interviewed two students, one from each module, to get them to talk about their experiences of using library resources.  Using an approach that has its roots in ethnography called ‘directed storytelling’, the interviews gave the students the chance to talk about their motivations at some length and provided a chance to explore in detail their engagement with resources and information literacy skills.  Even in a very small-scale exercise like this there look to be some interesting insights arising and we should be able to produce a useful case study.

Some of the library data analyses have now been used in our new annual reports to the academic Schools and seem to have been well-received.  We’ve also been spending some time talking to various people in the University about library data and how it might be useful, particularly when combined with other university data.   It’s helpful to see that other people recognise data about student engagement with library materials as a valuable source of information about what students are doing away from the formal learning management systems.

We’ve also been finishing off and submitting an article for publication on some of the work with library data from the autumn 2015 cohort of students, investigating the relationship between library resource accesses and module results.   Finally we’ve been able to use some visualisation tools to make some of the library data available internally in a way that allows library staff to manipulate it.

Posted in Update

March update

The main focus for the project over the last few months has been to work on a small-scale cohort study, exploring several aspects related to library data and trying to get a better understanding that goes beyond the quantitative data.

Working with two academics who lead on two modules in one of the University’s programmes means that we’re able to draw not only on the library data, but can also set it alongside results from a reflective quiz and then carry out some follow-up interviews to try to understand more about what students are doing and what their motivations might be.

We’ve completed some interviews and are now in the data analysis stage, so we are coding interview transcripts, looking at survey results and pulling together relevant library data.

We’ve also been working on providing some data for a set of reports that are being provided to each School about their engagement with the library.  We’ve carried out a few briefing sessions for some key stakeholders and put out a few internal communications messages.

The team also ran a session for library staff to get them to explore the library data using a scenario of reporting on library engagement with a course team and using data about library resource accesses, student results and completion figures.  This seemed to be a really useful exercise in getting people to think about how they might use the data in practice and it identified some of the challenges in building up data capabilities.

On the data analysis side we’ve carried out some work to look into whether students who complete modules are more likely to be accessing library resources.   We’ve also been writing up some of our early work for publication in a journal.

Posted in Update

Library Data project – December update

The Library Data team have been out and about over the last three months with presentations at the Northern Collaboration Conference 2016 in September in Liverpool (Presentation slideshare) and at the UKSG Forum in London in November (Presentation PDF 123kb).   The presentations summarised early work that found a distinct pattern of increasing resource access as students study at higher levels and also a pattern where students who get better results in their modules are accessing more library resources.  But we’re also finding very great variation between different modules.

Since then the project team have been working on some correlation studies using two measures of student success, an overall continuous assessment score and an overall examination score.  Both of these scores are presented as percentages.   As with the initial work there is a lot of variation between different modules, so while some modules show positive correlations, others have no significant correlation.

We’ve now started to look at retention, focusing initially on module completion rates and investigating whether completion differs between students who have accessed library resources and those who haven’t.  One of the approaches we’ve looked at is to investigate how the percentage of students completing a module correlates with the average number of library resources accessed by students on that module, and also with the percentage of students on the module who have accessed library resources.  Again, we’ve got wide variations between modules.

Attention has also turned to some student interviews to try to understand more about the ways students use information resources within their module, and the motivations behind this. We are interested to find out how this qualitative data might shed light on our understanding and interpretation of the quantitative data about e-resource accesses.

Some progress has been made with discussions with corporate IT about putting e-resource access data into the institutional data warehouse.    Internally within the library we’ve been exploring the use of Elasticsearch and Kibana as a method of handling and visualising the library data.  This shows some promise as a way of allowing library staff to quickly see patterns in the data.

Finally we’ve been working with colleagues across the library to provide data that can be used in a new series of library reports aimed at the Faculties and Schools.

Posted in Update

OU Library data project update September 2016

Over the summer we’ve continued working on our library data project and have managed to build on some of the early pilot work with further analyses using a larger pool of data.  We’re now able to run our own queries against the main institutional data warehouse and so can look at wider trends.

Research Study 3
This has progressed quite a long way and we’ve been able to look at data from modules starting in 2014 and in 2015.  We’ve combined data on library resource accesses from both EZproxy and OpenAthens with student results data.

We’ve combined some of the results categories together to slightly simplify the interpretation.  So while level 1 modules generally have Pass and Distinction categories, level 2 and 3 modules tend to have Grade 2 Pass, Grade 3 Pass, Grade 4 Pass and Distinction.   We’ve combined the different pass categories into one pass category.

If we look across the whole range of undergraduate students (around 300,000 students across the two years – as students will probably study more than one module in that period), we see the same sort of patterns we saw with the original three pilot modules.  Students who fail access around a third of the number of online library resources accessed by students who pass.  Students gaining a distinction access nearly twice the number of library resources accessed by students who pass.
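The grouping step described above (folding the graded passes into a single pass category, then comparing average accesses by result) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `COLLAPSE` mapping, the function name and the sample rows are all made up for the example and are not the project's own code or data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping: every graded pass collapses into one "Pass" category.
COLLAPSE = {
    "Grade 2 Pass": "Pass",
    "Grade 3 Pass": "Pass",
    "Grade 4 Pass": "Pass",
}


def mean_accesses_by_result(records):
    """Return the mean access count per (collapsed) result category.

    records: iterable of (result, accesses) pairs, one per student-module.
    """
    grouped = defaultdict(list)
    for result, accesses in records:
        # Results not listed in COLLAPSE (Fail, Pass, Distinction) pass through unchanged.
        grouped[COLLAPSE.get(result, result)].append(accesses)
    return {result: mean(values) for result, values in grouped.items()}


# Made-up illustrative rows, not project data.
sample = [
    ("Fail", 2), ("Fail", 4),
    ("Pass", 8), ("Grade 2 Pass", 10), ("Grade 3 Pass", 9),
    ("Distinction", 18),
]
summary = mean_accesses_by_result(sample)
print(summary)
```

Keeping the mapping in one dictionary makes it easy to rerun the comparison with the original, uncollapsed categories if the finer detail is needed.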

Now we can look in more detail at the different modules, Faculties, levels of study and presentation dates, we can start to see that there are differences between them.  In some cases we will know that there are modules that don’t make so much use of library resources, but it’s very useful data for our liaison librarians to discuss with their Faculty colleagues.

Further studies
We’ve followed up Research Study 3 with a piece of work to look at whether we could follow the approach used by the University of Wollongong in their work

Covered in

and also at

Cox, B. L. and Jantti, M. (2012) Capturing business intelligence required for targeted marketing, demonstrating value, and driving process improvement, Library & Information Science Research, 34, pp. 308-316. doi:10.1016/j.lisr.2012.06.002

Although we are using EZproxy data, one difference is that Wollongong have used a count of the amount of time students accessed resources whereas we are using a count of the number of EZproxy accesses.

This has proved to be a really useful exercise as we’ve been able to follow most of the steps and have been fortunate to be able to correspond with one of the authors (Brian Cox) on some of the details, which has helped to clarify some of the steps and decisions.

One insight that this has made clear for us is the high percentage of students who don’t access the library (we know that not all modules require library use as their module materials can be quite comprehensive).    But the levels of non-use decrease as students study at higher levels and also seem to be decreasing over time as we’ve started to compare modules starting in autumn 2014 with those starting in autumn 2015 and are seeing more students accessing library resources.

What’s next
We’re still aiming to write up the research for publication and will also be turning our attention to looking at the relationship between library use and student retention.  Plus, we also have a small cohort qualitative study starting in the autumn.


Posted in Data analysis, Data sources, Research study 3, Update

OU Library Data project update May 2016

We’re continuing to work on the Library Data project and have made some good progress over the last two months in several areas.

Research Study 2
We used the same small sample dataset (three level 1 modules, n=11,501) that we had used to look at the relationship between library use and student attainment to analyse library use against demographic data to see if age, gender or previous educational attainment showed different levels of library resource access.  The picture that emerged was:

  • Older students (56 and older) (n=990) averaged more than 11 resource accesses, against a mean of just over 3 for the under 25 age group (n=3,382).  The mean number of library resources used increases steadily through the age groups.
  • Male students (n=5,344)  access a mean of 5.7 resources, female students (n=6,157) 4.7 resources.
  • Students with No formal qualifications, Less than A levels or A Levels or equivalent all accessed a mean of between 4.58 and 4.72 resources.  Students with HE qualifications a mean of 6.7 and those with a Postgraduate qualification a mean of 9.7 resources accessed.
  • On the face of it, you might draw the conclusion that the older students are, and the higher their previous educational attainment, the more library resources they access.  But interestingly, when you look at combining age and previous education you get a more complex story (see graph below).
[Graph: mean number of library resources accessed, by age and education]
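Combining age and previous education, as described above, amounts to building a two-way table of means. The sketch below shows one way to do that in plain Python. It is an assumption-laden illustration: the function name, the category labels and the rows are invented for the example and are not the project's data.

```python
from collections import defaultdict
from statistics import mean


def mean_by_two_factors(rows):
    """Return mean accesses for each (age_group, education) combination.

    rows: iterable of (age_group, education, accesses) tuples, one per student.
    """
    cells = defaultdict(list)
    for age_group, education, accesses in rows:
        cells[(age_group, education)].append(accesses)
    return {key: mean(values) for key, values in cells.items()}


# Made-up illustrative rows, not project data.
rows = [
    ("Under 25", "A Levels or equivalent", 2),
    ("Under 25", "A Levels or equivalent", 4),
    ("Under 25", "HE qualification", 6),
    ("56 and over", "A Levels or equivalent", 12),
    ("56 and over", "HE qualification", 8),
    ("56 and over", "HE qualification", 10),
]
table = mean_by_two_factors(rows)
for (age_group, education), value in sorted(table.items()):
    print(f"{age_group:12} {education:25} {value}")
```

Looking at each cell separately is what lets a pattern that holds for age alone, or education alone, turn out to be more complicated once the two are combined.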








Designing Research Study 3
We’re designing a detailed study to compare library resource accesses and student success. Colleagues in IET are working with us to guide the analysis.  This should give us a robust view of whether the early indications we had in Research Study 2 are borne out with further analysis.  We intend to publish the results.

Qualitative study
We’re talking to colleagues in a Faculty about a qualitative study with a level 1 and a level 2 module. The plan is to see whether there is a relationship between student participation in library skills activities and attainment. The study will look at qualitative and quantitative data and so will require a different methodology to our first three studies because they only looked at quantitative data.

Getting library data into the main university data warehouse
One of the big aims of the project is to get data about use of library online resources added to the main institutional data warehouse.  This should greatly help with encouraging data users to take library use into account when analysing data about the student experience.  The good news is that our proposal to add library data has been approved and we are hoping for an Autumn 2016 implementation.


Posted in Update

Data and analytics skills assessment

Improving data and analytics skills is something that is in scope for the Library Data project.  The aim is to identify the skills that the Library needs and to help to develop those skills.  So our starting point has been to assess our current level of data and analytics ‘readiness’.

The survey
The approach we’ve taken is to design a very short and simple questionnaire of five questions asking library staff to rate their level of confidence using a Likert scale, with the categories Strongly Disagree/Disagree/Neutral/Agree/Strongly Agree.

The five questions are:

  • I am confident that I know what analytics data the library has and I know how to access it
  • I am confident that I know what analytics data the University has and I know how to access it
  • I am confident in my ability to use simple analytics tools
  • I am confident in my ability to analyse data
  • I am confident in my ability to use data to support decision making

The questionnaire was printed out and handed round at the start of a Staff Development Hour session in the library that was going to be giving an update on the work of the Library Data Project.  Library staff were invited to complete the questionnaire before the session started and responses were deliberately kept anonymous as we were interested in an overall  impression of our data readiness.

We collected the questionnaires straightaway and had 27 completed responses, from most of the people attending the session.  The approach we took to the analysis was to score each response to each question using a weighting score.  Strongly Disagree was scored as 1, Disagree as 2, Neutral as 3, Agree as 4 and Strongly Agree as 5.  The weighting was multiplied by the number of responses against each level on the Likert scale and then totalled for each question.   Dividing the total by the number of responses gives an average response that can be compared against the weighting scores.  So a score below 3 would indicate disagreement and a score over 3 would indicate agreement with the question.

For example, in Question 1, if you have the following pattern of responses:
Strongly disagree  1, Disagree 2, Neutral 1, Agree 4, Strongly Agree 2
Adding weighting and carrying out the calculations gives you:
(1×1) + (2×2) + (1×3) + (4×4) + (2×5) = 34, dividing by the number of responses (10) gives an average of 3.4 – a response between neutral and agree.
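The same calculation can be written as a short Python function, which is handy if the questionnaire is repeated and the scores need recomputing. This is a minimal sketch: the `WEIGHTS` table follows the 1 to 5 scoring described above, while the function name and the dictionary layout are assumptions made for the example.

```python
# Weighting for each level of the Likert scale, as described in the post.
WEIGHTS = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}


def likert_average(counts):
    """Return the weighted average for one question.

    counts: dict mapping a Likert level to the number of responses at that level.
    """
    total_responses = sum(counts.values())
    if total_responses == 0:
        raise ValueError("no responses to average")
    weighted_total = sum(WEIGHTS[level] * n for level, n in counts.items())
    return weighted_total / total_responses


# The worked example for Question 1 from the text above.
question_1 = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 1,
    "Agree": 4,
    "Strongly Agree": 2,
}
score = likert_average(question_1)
print(score)  # 34 / 10 = 3.4
```

Averaging the per-question scores in the same way gives the overall readiness level.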

The scores for each question are useful in helping to identify where there is the least confidence and might shape which areas you could concentrate on.  The scores can also be averaged to give an overall level.

Presenting the results
We decided to create a graphic to present the results from the questionnaire in a more visual way.   The idea is that you can see at a glance the level of confidence against each of the questions and for the overall picture.  So the red ring is placed on the scale based on the average of the responses.  One advantage of the approach is that you can use the visualisation to track how the readiness level changes over time by adding a different colour ring if you repeat the questionnaire.  And that is something that we plan to do later in the project.  A made-up sample of the visualisation approach is shown below:

[Image: data skills readiness visualisation]

Next steps
We plan to repeat the exercise later in the project, possibly at the end of a future update on the project at a Staff Development session.  We’d expect that we’d then be able to compare the scores to provide us with a measurement of change in confidence levels and to help with shaping future priorities in the project.

Posted in Data sources, Skills, Update

Research Study 2: post #2

Pilot data study

We now have OU student data for our three pilot modules (one Arts, one Technology and one Law) across seven 2014-15 presentations. The data includes variables relating to attainment e.g. module pass/fail and assignment scores, and demographics e.g. age and gender and includes students who are studying more than one module.

We have started to process and analyse this data alongside the Library EZproxy usage data. This work is helping us to define and refine the processes that we will use to analyse the data for all modules with 2014-15 presentations. For the time being our focus will be on attainment, so we are looking at two measures – the Overall Continuous Assessment Score (a percentage) and the final outcome (graded as Distinction, Pass, Fail, Deferred or Withdrawn).

The underlying process for preparing Library EZproxy student data has continued to be to create and run MySQL queries using MySQL Workbench and then export the data to Excel for basic analysis.  We process the data in Excel to give a count for the number of resources accessed. Then we combine this data with the results and scores using a combination of pivot tables and lookup tables to end up with a column of OCAS scores, a column of results and a column with the resource use count.  Each row is a separate student and at this stage once the data has been joined together we anonymise the data by removing the student ID we used to join the two data sets together. 
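For readers who want to see the shape of that preparation step in code, here is a minimal Python sketch of the same three stages: count accesses per student, join the count to the results, then drop the student ID. It is not the MySQL and Excel workflow the team actually used. The function name, the field layout and the sample values are hypothetical, and it assumes each logged access counts once (the post does not say whether repeat visits to the same resource were counted separately).

```python
from collections import Counter


def build_analysis_rows(access_log, results):
    """Join access counts to results and return anonymised rows.

    access_log: iterable of (student_id, resource) access events.
    results: dict mapping student_id -> (ocas_score, result).
    Returns a list of (ocas_score, result, access_count) tuples with no student ID.
    """
    # Assumption: every logged event counts as one access.
    counts = Counter(student_id for student_id, _resource in access_log)
    rows = []
    for student_id, (ocas_score, result) in results.items():
        # Students with no logged accesses get a count of 0 rather than being dropped.
        rows.append((ocas_score, result, counts.get(student_id, 0)))
    return rows


# Made-up illustrative data, not project data.
access_log = [("s1", "db-a"), ("s1", "db-b"), ("s2", "db-a")]
results = {"s1": (72, "Pass"), "s2": (85, "Distinction"), "s3": (35, "Fail")}
rows = build_analysis_rows(access_log, results)
print(rows)
```

The student ID is used only as the join key inside the function, so nothing identifying survives in the rows that go forward for analysis.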

Copying the resulting data into SPSS we have started to carry out a set of statistical tests on the pilot data. The tests include Spearman’s and Pearson’s correlations and one-way ANOVA. Our literature review (which is ongoing) is informing our choice of statistical tests.
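To make the difference between the two correlation measures concrete, the sketch below implements both in plain Python. It is an illustration rather than a replacement for SPSS: the function names and the sample values are invented, and Spearman is computed here as Pearson applied to average ranks, which is the standard definition when ties are present.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up illustrative values, not project data.
accesses = [0, 1, 2, 5, 20]
ocas = [40, 55, 60, 70, 75]
r = pearson(accesses, ocas)
rho = spearman(accesses, ocas)
print(round(r, 3), round(rho, 3))
```

Because access counts tend to be skewed by a few very heavy users, the rank-based Spearman measure is less affected by those extreme values than Pearson, which is one reason for running both.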

The pilot dataset is very small but nevertheless, our (very) draft findings from just three modules are really interesting:

  • Students gaining a distinction seem to be using twice as many library resources as those who pass.
  • Students who fail use fewer than a quarter of the number of library resources used by those students who gain a distinction.
  • There looks to be a positive correlation between the number of library resources used and the student’s OCAS (Overall Continuous Assessment Score).

[Graph: use of library resources by result]

We’ve a lot more work to do to validate what we’ve found so far, and to extend it to other modules to see if they have similar patterns of activity, as we might just have hit upon modules that are atypical.

As mentioned in a previous post, processing data in Excel is very time consuming and there is a high risk of generating errors due to manual processing and the multiple steps required. To combine and process a complete set of OU and Library data for 2014-15 presentations efficiently and effectively we need a more automated method.  We now have access to SAS Enterprise Guide and are investigating SAS as a possible tool for querying and processing the OU and Library data, producing statistical analyses and presenting the findings.  So we’re looking at suitable SAS training for the Project team and starting to think about the roles that may be required for a Business as Usual Library Data Service.


Posted in Research study 2

Research Study 2: post #1

The Library Data project team has been carrying out preliminary work towards Research Study 2 (which will investigate whether there is a correlation between library resource activity data and student attainment) throughout January.

We have had a couple of meetings with the OU’s Institute of Educational Technology (IET) to discuss the OU data that we will need from them and for advice on appropriate statistical analysis techniques and methodologies. Using an appropriate methodology or process for preparing the data for statistical analysis is important. We are defining our data preparation processes at the start of each investigation and refining as we go along. This approach feels very similar to the ‘agile’ methodology that the team used in their previous project, OUDA.

We have started to carry out statistical analyses on our data, producing frequency distributions and frequency groupings for particular modules. We have been teaching ourselves how to use SPSS statistical analysis software and learning more about the statistical analysis functionality within Excel, including ‘binning’ to create intervals. We have (unfortunately, and not without considerable frustration!) discovered that the results output from the frequency analysis ‘wizard’ within Excel 2013 are incorrect. It seems that one row of data is always missed; strangely, it’s not always the same row. We plan to use SPSS to create the bins instead.
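Binning itself is simple enough to check independently of either tool. The sketch below groups access counts into fixed-width intervals in plain Python and confirms that no row has gone missing. The function name, the bin width of 5 and the sample values are assumptions for illustration only.

```python
from collections import Counter


def bin_counts(values, width):
    """Count how many values fall in each interval [start, start + width)."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Floor division maps each value to the lower edge of its bin.
    counts = Counter((value // width) * width for value in values)
    return {start: counts[start] for start in sorted(counts)}


# Made-up illustrative access counts, not project data.
accesses = [0, 0, 3, 4, 5, 9, 10, 12, 27]
frequency = bin_counts(accesses, 5)
print(frequency)

# Sanity check: the bin totals must add up to the number of rows.
assert sum(frequency.values()) == len(accesses)
```

Comparing the sum of the bin totals with the row count is a quick way to spot the kind of dropped-row problem described above, whichever tool produced the bins.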

Incidentally, we also discovered an issue with SPSS: for any particular module SPSS z-scores were slightly different to (higher than) the Excel z-scores when using the same number of decimal places. Our research suggests that SPSS automatically assumes a ‘sample’ (and so calculates standard deviations on that basis), whereas the formula we’d used in Excel was that for a ‘population’.  Standard deviation is higher for the ‘sample’ because you divide by n-1 instead of n.  From what we have read there is no simple way in SPSS to change the z-score formula/calculation. Presumably this is something that the majority of SPSS users ‘just live with’? It is only going to make a significant difference where populations are very small. It seems strange that the statistics course the project team studied last year (Statistics in Education for Mere Mortals) and statistics textbooks go to the trouble of explaining the difference between populations and samples and give the different formulas for each, but a well-used tool for stats analysis (SPSS) doesn’t give you the population option. Indeed SPSS textbooks don’t even mention that it’s not an option!
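The size of that discrepancy can be worked out directly. Dividing by n-1 makes the sample standard deviation the larger of the two, so every sample-based z-score sits slightly closer to zero than its population-based counterpart, by a constant factor of sqrt((n-1)/n). A value below the mean therefore gets a slightly higher (less negative) z-score. The Python sketch below demonstrates this with the standard library's `pstdev` (population) and `stdev` (sample). The function name and the sample values are made up for the example.

```python
from math import sqrt
from statistics import mean, pstdev, stdev


def z_scores(values, population=True):
    """Return z-scores using the population (n) or sample (n-1) standard deviation."""
    centre = mean(values)
    spread = pstdev(values) if population else stdev(values)
    return [(value - centre) / spread for value in values]


# Made-up illustrative access counts, not project data.
accesses = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(accesses)

z_pop = z_scores(accesses, population=True)      # divide by n
z_sample = z_scores(accesses, population=False)  # divide by n-1

# Every sample-based z-score is the population-based one scaled by sqrt((n-1)/n).
ratio = z_sample[0] / z_pop[0]
print(round(ratio, 4), round(sqrt((n - 1) / n), 4))
```

With n = 8 the factor is about 0.935, but for a module with 1,000 students it is about 0.9995, which is why the difference only matters for very small groups.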

We have continued to build on our SQL knowledge to create new queries as required for our investigations.  The analysis of ATHENS data is ongoing.  This study is aiming to find out about library e-resource access at each study level (1, 2, 3 and post-graduate) within each OU Faculty and whether the levels of e-resource access by Faculty and study level in ATHENS and EZproxy data follow similar patterns.

Findings so far

From looking at the EZproxy data for all OU students we see a pattern that shows that their use of library resources increases by level, e.g. there is more usage at level 2 than level 1 and more at level 3 than level 2.  [The levels essentially equate to years 1, 2, 3 of a degree course, but an OU student will do several level 1, 2 and 3 modules over several years to build towards their degree.]  Post-graduate usage is higher than at level 3. Students are expected to develop their independent learning skills and make more use of library resources as their studies progress and so these results are unsurprising.


The next steps for the project are to start to combine library use data with data on student success and retention.  Initially this will be for a small sample of modules to allow us to test and develop suitable methods, but the aim will be to do work that goes across the whole range of levels and faculties.

Posted in Research study 2