The project is now into its second month and we’ve covered quite a lot of ground already. Apart from setting up the usual project processes and blogs and getting everything running, we’ve pushed on with the technical build work with the RISE developer, Paul.
So far we’ve designed the recommendations database to store the EZProxy log files. We are processing the full log files and now have all our log files back to December ingested into the database. We’ve got a test feed of data that lets us add course codes to the database so we can relate searches to courses. That will let us show searchers what people on their courses are searching for. We’ve identified a tweak needed to the feed file, as it currently gives us only the first course that a student is studying. It might not be an issue for other institutions, but OU students might be studying several courses at a time. [To clarify: OU students study separate modules e.g. http://www3.open.ac.uk/study/undergraduate/course/aa100.htm which may form part of their degree course, or may be studied separately]
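As a rough illustration of the ingest step described above (not the RISE code itself), here is a minimal Python sketch. It assumes the EZProxy log uses a common-log-style layout (host, ident, user, timestamp, request, status, size); the real layout depends on how EZProxy's log format is configured. The function names, the `courses_by_user` lookup and the sample course codes other than AA100 are hypothetical.

```python
import re
from datetime import datetime

# Assumed common-log-style layout; adjust to the site's actual log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    return record


def attach_courses(record, courses_by_user):
    """Return a copy of the record carrying every course code for its user.

    Keeping a list (rather than a single code) allows for students who are
    studying several courses at once.
    """
    enriched = dict(record)
    enriched["courses"] = list(courses_by_user.get(record["user"], []))
    return enriched


# Made-up sample data for illustration only.
sample_line = (
    '10.0.0.1 - u123 [14/Mar/2011:10:15:32 +0000] '
    '"GET http://example.org/article?id=1 HTTP/1.1" 200 5120'
)
sample_courses = {"u123": ["AA100", "XX999"]}  # XX999 is a made-up code
sample_record = attach_courses(parse_log_line(sample_line), sample_courses)
```

Storing the course codes as a list per user is one way to handle the "first course only" limitation of the current feed; a separate user-to-course table in the database would do the same job.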
Our EZProxy log files contain records coming from both our EBSCO Discovery Service (EDS) and from SFX. So we’ve been investigating how to get more data about the entries in the log files. The approach that we have been taking is to use the log files as the base layer of data and then query other systems to pull in further information to add to the database to help the recommendations. So we are using the EDS API to draw in subject data and looking at the SFX API to do something similar. Through a combination of ISSNs, DOIs and other techniques we are able to add in data such as journal titles and article titles.
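To give a flavour of the identifier step, the sketch below pulls a DOI and any ISSNs out of a logged URL so they can be used as keys when querying other systems. It is an assumption-laden example rather than the project's method: the regular expressions are simplified, the function names are hypothetical, the sample URL is made up, and no EDS or SFX API call is shown. ISSNs are checked against the standard ISSN check digit to weed out look-alike numbers.

```python
import re
from urllib.parse import unquote

# Simplified patterns; real-world DOIs and URLs can be messier than this.
DOI_PATTERN = re.compile(r'10\.\d{4,9}/[^\s&?#"]+')
ISSN_PATTERN = re.compile(r'(?<!\d)(\d{4})-(\d{3}[\dXx])(?!\d)')


def issn_is_valid(issn):
    """Check the ISSN check digit (weights 8 down to 2, modulo 11)."""
    digits = issn.replace("-", "").upper()
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def extract_identifiers(url):
    """Return the first DOI and all valid ISSNs found in a (decoded) URL."""
    decoded = unquote(url)
    doi = DOI_PATTERN.search(decoded)
    issns = [f"{a}-{b.upper()}" for a, b in ISSN_PATTERN.findall(decoded)]
    return {
        "doi": doi.group(0) if doi else None,
        "issns": [i for i in issns if issn_is_valid(i)],
    }


# Made-up URL for illustration; the host and DOI are not real records.
sample_url = (
    "http://sfx.example.org/sfx?issn=0028-0836"
    "&id=doi:10.1000%2Fexample.123&genre=article"
)
sample_ids = extract_identifiers(sample_url)
```

The identifiers found this way would then be the lookup keys for whichever external service supplies the journal and article titles.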
For recommendations we’ve settled initially on three levels:
- Level one provides recommendations based on a course you are associated with. e.g. ‘people on your course are searching for these articles’.
- Level two recommendations are based on association: a connection is assumed between documents that a user searches for close together in a session.
- Level three makes connections based on subject data about the article by comparing subject terms and making recommendations on the basis of best matches between articles.
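
The three levels above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the RISE implementation: the `(user, course, session, article)` tuple layout and the function names are hypothetical, level two treats any two articles in the same session as associated (ignoring how close together in time they were), and level three uses Jaccard overlap of subject terms as a stand-in for "best matches".

```python
from collections import Counter, defaultdict


def course_recommendations(views, course, top_n=5):
    """Level one: the articles most often seen by people on a course."""
    counts = Counter(article for _, c, _, article in views if c == course)
    return [article for article, _ in counts.most_common(top_n)]


def session_recommendations(views, article, top_n=5):
    """Level two: articles that share a session with the given article."""
    sessions = defaultdict(set)
    for _, _, session, seen in views:
        sessions[session].add(seen)
    counts = Counter()
    for articles in sessions.values():
        if article in articles:
            counts.update(articles - {article})
    return [other for other, _ in counts.most_common(top_n)]


def subject_recommendations(subjects, article, top_n=5):
    """Level three: rank other articles by Jaccard overlap of subject terms."""
    target = subjects[article]
    scores = {}
    for other, terms in subjects.items():
        if other == article:
            continue
        union = target | terms
        if union:
            score = len(target & terms) / len(union)
            if score > 0:
                scores[other] = score
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]


# Made-up sample data: (user, course, session, article).
sample_views = [
    ("u1", "AA100", "s1", "a1"),
    ("u1", "AA100", "s1", "a2"),
    ("u2", "AA100", "s2", "a1"),
    ("u2", "AA100", "s2", "a3"),
    ("u3", "XX999", "s3", "a1"),
    ("u3", "XX999", "s3", "a2"),
]
sample_subjects = {
    "a1": {"history", "art"},
    "a2": {"history", "art", "music"},
    "a3": {"history"},
    "a4": {"physics"},
}
```

With this sample data, `session_recommendations(sample_views, "a1")` ranks `a2` ahead of `a3`, because `a2` shares two sessions with `a1` and `a3` shares only one.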
Recommendations will be relevance ranked and reinforced by a rating system that asks users to rate how useful they are to them. We’ve been working on technical diagrams and documentation that will get added to the blog as part of the main technical post.
Code design is taking account of the need to build code that can be released publicly. So we are setting it up so users can easily add in the details of which APIs they want to use and configure details of settings such as their authentication system. We’ve also agreed the wireframe for the user interface and started some discussions with colleagues about the Google Gadget version of search. On the evaluation side we’ve agreed with colleagues carrying out focus groups on the EDS system that they will ask for views about the value of recommendations.
Finally this month, Liz and Richard went to the JISC Activity Data Programme Startup meeting in Birmingham. Apart from the chance to hear about the Programme, the other projects and the synthesis project that will be working to draw together the wider aspects of the projects, there were several useful sessions. Particularly helpful was the technical discussion which looked at challenges, issues and solutions covering not only technical aspects but also IPR and anonymity, data issues and user interfaces. As well as uncovering some of the issues it also helped to pull out some common ground between the projects and areas where we might be able to collaborate. So we’ve a number of areas to follow up with several projects and have already started to do so.
The day was also a chance to talk about the hypothesis that is at the heart of the Activity Data projects. Evaluation of the hypothesis is very much seen as key, and building up evidence to support or disprove it is critical. Interestingly, the hypothesis approach makes it much more evident that the project is experimental. It isn’t expected to build a fully sustainable service in a six-month period. JISC see this period as phase one, and a key part of the synthesis project is to help to tease out what comes next. All in all, a really useful day.