What has RISE produced?
- A database based on EZProxy activity data, enhanced with bibliographic data from Crossref, and containing terms used to search via the RISE interface.
- A web search interface (http://library.open.ac.uk/rise) that shows recommendations to users searching the Ebsco Discovery Solution.
- A Google Gadget version of the search interface (http://www.open.ac.uk/blogs/RISE/search-interfaces/).
- The project code (http://code.google.com/p/rise-project/), released as open source (GNU GPL) to allow others to parse logfile data into a database and show recommendations.
A major barrier to the open release of data from RISE has been the lack of open article-level metadata. One future area of work could be to encourage providers to open up their article-level metadata to allow others to build services that use it. It isn’t always clear what can be done with data that can be obtained through APIs and web services. It would be helpful to have a resource that recorded what data is out there, how it can be accessed (i.e. which record keys you need), what data can be retrieved, and what you can or cannot do with the data.
Opening up activity data to the extent that it could be aggregated is clearly in the very early stages. It would be useful if some standards and formats were established and agreed that could then be applied across systems. Part way through the RISE project the OpenURL project at EDINA released the first batch of OpenURL data, and we compared the data stored by RISE with the EDINA OpenURL data. If you start with EZProxy data then there is very little cross-over with the OpenURL standard.
What can other institutions do to benefit from the work of RISE?
To help people replicate the proof of concept work that RISE has undertaken we’ve put together a few resources to help get people started:
- a technical resources page outlining how RISE approached the work
- a proxy logfile flowchart to show how the data in the logfile can be used for recommendations
- a code repository for the RISE code
- Blog posts here covering topics such as data privacy, anonymisation and data reuse.
- The project email address email@example.com will continue to be monitored so if there are questions about building recommendations from EZProxy data then by all means get in touch.
Most significant lessons
Lesson 1: If you want to make use of activity data then you need to make sure that you retain it for an appropriate period of time. Our EZProxy logfiles were routinely destroyed after a few months because they were not being used. Using the data to provide recommendations provides justification for keeping the data (but you still need to ensure you think about when you delete that data from the recommendations system).
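As a rough illustration of the retention point in Lesson 1, the sketch below drops activity records older than a chosen cut-off. The record layout (a dict with a `seen_at` timestamp), the function name `purge_old_events` and the 365-day period are all assumptions made for this example. They are not taken from the RISE code, and the right retention period is a policy decision for each institution.

```python
from datetime import datetime, timedelta

def purge_old_events(events, now, keep_days):
    """Return only the events that are newer than the retention cut-off.

    `events` is a list of dicts with a 'seen_at' datetime (hypothetical layout).
    """
    cutoff = now - timedelta(days=keep_days)
    return [event for event in events if event["seen_at"] >= cutoff]

# Hypothetical sample data: one recent record and one old record.
sample_events = [
    {"resource": "A", "seen_at": datetime(2011, 7, 20)},
    {"resource": "B", "seen_at": datetime(2010, 1, 1)},
]

# Keep one year of activity data, measured from an assumed "today".
retained = purge_old_events(sample_events, now=datetime(2011, 8, 1), keep_days=365)
```

In a real system the same rule would more likely run as a scheduled delete against the recommendations database than over an in-memory list.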
Lesson 2: You can make recommendations out of proxy logfiles (see ‘Activity Data from proxy logfiles’), but it isn’t particularly straightforward. All the logfiles give you as a recommendation is a relationship that says ‘a user looked at this resource and then looked at that resource’. To make other types of recommendations you need to treat the logfile as the first link in the chain of data. So you use the user logon to find out which course they are on (to make the recommendation ‘Students on your course are looking at these resources’) and save the search terms they use in your systems (to make the recommendation ‘People who searched for this subject looked at these resources’).
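The two relationships described in Lesson 2 can be sketched in a few lines of Python. This is a minimal illustration and not the RISE implementation: it assumes the logfile has already been parsed into `(user_id, timestamp, resource_id)` tuples and that a separate lookup from user logon to course code exists. The sample data and the names `events`, `user_course`, `build_followed_by` and `build_course_views` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical, already-parsed log events: (user_id, timestamp, resource_id).
events = [
    ("u1", 1, "A"), ("u1", 2, "B"), ("u1", 3, "C"),
    ("u2", 1, "A"), ("u2", 2, "B"),
    ("u3", 1, "B"), ("u3", 2, "C"),
]

# Hypothetical lookup from user logon to course code.
user_course = {"u1": "X100", "u2": "X100", "u3": "Y200"}

def build_followed_by(events):
    """Count 'looked at this resource, then looked at that resource' pairs."""
    by_user = defaultdict(list)
    # Order each user's views by timestamp before pairing them up.
    for user, _timestamp, resource in sorted(events, key=lambda e: (e[0], e[1])):
        by_user[user].append(resource)
    pairs = defaultdict(Counter)
    for resources in by_user.values():
        for first, second in zip(resources, resources[1:]):
            if first != second:  # ignore repeat views of the same resource
                pairs[first][second] += 1
    return pairs

def build_course_views(events, user_course):
    """Count resource views per course, for course-based recommendations."""
    views = defaultdict(Counter)
    for user, _timestamp, resource in events:
        course = user_course.get(user)
        if course is not None:  # skip users with no known course
            views[course][resource] += 1
    return views

followed_by = build_followed_by(events)
course_views = build_course_views(events, user_course)

# Resources most often viewed straight after resource "A".
after_a = [resource for resource, _count in followed_by["A"].most_common(3)]
```

Search-term recommendations would follow the same counting pattern, keyed on the saved search term instead of the course code.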
Lesson 3: You need some bibliographic data, and it isn’t always easy to get from the logfiles or from the systems you use. And when you get it, you can’t always store it locally due to licensing restrictions. But you need article titles, journal titles and dates, for example, so that you can show users a sensible recommendation. Users need enough information to be able to judge the potential value of the recommendation.
If we were allowed a Lesson 4, it would be that users say they like the idea of recommendations and would be happy to see them in our systems.
Addressing the big issues
Privacy, anonymisation and data processing have very much been at the heart of the work that RISE has been undertaking and several blog posts cover these aspects in detail.
RISE has benefited from work carried out before it, particularly by the EDINA OpenURL project and the OU LUCERO project, which paved the way on some of the privacy and licensing aspects.
RISE final thoughts
RISE has been a really enjoyable project, investigating an area that has been a bit different to things that we have tackled before. It has shown that recommendations can be made from EZProxy data, that users like recommendations and that overall there is a value in showing recommendations to users of new generation discovery solutions.
Thanks go to everyone who has contributed to making the project work so well: to Liz and Paul, the Project Manager and Developer; to the other JISC projects who have provided us with information or spoken at our events; to the Synthesis project team (David, Tom, Mark and Helen); and to Andy, the Programme Manager. They have all helped to make the six months of the RISE project go past so quickly and to make it so interesting. And a big thanks to JISC for funding the project!