We thought it might be useful to set out, step by step, how we get from an EZProxy logfile entry to a set of bibliographic data that we can use when displaying our recommendations. At a minimum, users who are shown a recommendation will want to see the article title so they can judge whether it is relevant. The parser we use to process the daily EZProxy logfiles carries out the following steps:
- Extract the Ebsco accession number from the EZProxy URL.
We are able to do this because we route access to our Ebsco Discovery Service (EDS) through EZProxy. Consequently most of the records in the EZProxy logfile will be Ebsco URLs.
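As an illustration of this step, here is a minimal Python sketch rather than the RISE parser itself. It assumes that the accession number appears in the logged URL as an `AN` query parameter alongside a `db` parameter, and that Ebsco URLs can be recognised by the `ebscohost.com` host name. The function name `extract_accession` and the sample log line are hypothetical.

```python
import re
from urllib.parse import unquote

# Assumption: Ebsco links carry the accession number as "AN=..." and the
# database code as "db=..." in the query string.
AN_RE = re.compile(r'[?&]AN=([^&\s"]+)', re.IGNORECASE)
DB_RE = re.compile(r'[?&]db=([^&\s"]+)', re.IGNORECASE)


def extract_accession(log_line):
    """Return (database code, accession number) or None for non-Ebsco lines."""
    # Decode percent-escapes in case the Ebsco URL was logged in encoded form.
    line = unquote(log_line)
    if "ebscohost.com" not in line:
        return None
    an_match = AN_RE.search(line)
    if an_match is None:
        return None
    db_match = DB_RE.search(line)
    db_code = db_match.group(1) if db_match else None
    return (db_code, an_match.group(1))


# Hypothetical log line, for illustration only.
sample = ('10.0.0.1 - user1 [12/May/2011:10:15:32 +0100] '
          '"GET http://search.ebscohost.com/login.aspx?direct=true'
          '&db=a9h&AN=12345678&site=eds-live HTTP/1.1" 200 5120')
print(extract_accession(sample))
```

Lines that are not Ebsco requests, or that carry no accession number, return `None` and can simply be skipped.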
- Use the Ebsco accession number as a key to obtain bibliographic data from the EDS API.
We query the EDS API with the Ebsco accession number and look for a DOI, ISSN, volume number, issue number and start page. We aren’t allowed to store Ebsco’s data in the RISE database, so we then have to obtain article-level metadata from a source whose terms do allow us to store it.
- Use the Ebsco data to query Crossref: ideally the DOI, but if there is no DOI then the ISSN, volume number, issue number and start page.
At this stage we are trying to obtain a match for an article from the Crossref database so we can retrieve some bibliographic metadata that we can store in the RISE database. Ideally we want to match against the DOI but if we can’t then we look for other combinations of data.
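The DOI-first matching logic might be sketched as follows. This is an assumption-laden illustration and not our actual code: the function name `crossref_query` is hypothetical, the parameter names (`id`, `issn`, `volume`, `issue`, `spage`) are modelled on OpenURL-style keys and should be checked against Crossref's current documentation, and the sketch only falls back when all four non-DOI fields are present, whereas other combinations could also be tried.

```python
FALLBACK_FIELDS = ("issn", "volume", "issue", "spage")


def crossref_query(record):
    """Choose the query parameters for a Crossref lookup.

    `record` is a dict of values taken from the Ebsco data. Returns a dict of
    query parameters, or None if there is not enough data to attempt a match.
    """
    doi = record.get("doi")
    if doi:
        # Preferred route: a DOI identifies the article unambiguously.
        return {"id": "doi:" + doi}
    # Fallback route: ISSN, volume, issue and start page together.
    # Assumption: all four are required; looser combinations are possible.
    if all(record.get(field) for field in FALLBACK_FIELDS):
        return {field: str(record[field]) for field in FALLBACK_FIELDS}
    return None


# Placeholder values, for illustration only.
print(crossref_query({"doi": "10.1000/example"}))
print(crossref_query({"issn": "1234-5678", "volume": "12",
                      "issue": "3", "spage": "45"}))
```

Keeping the choice of match key separate from the network call makes it easy to log which route was taken for each article and to see how often the fallback is needed.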
- From Crossref retrieve the article title, journal title, ISSN, volume and issue details and start page.
Once we have some relevant bibliographic data, we store it in the RISE database along with the DOI, if present.
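To make the storage step concrete, here is a small sketch using SQLite from the Python standard library. The table name `articles`, its columns and the function `store_article` are all hypothetical and are not the real RISE schema; they simply mirror the fields listed above, with the DOI left optional.

```python
import sqlite3

# Hypothetical table; the real RISE database schema is not shown here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS articles (
    id INTEGER PRIMARY KEY,
    doi TEXT,
    article_title TEXT NOT NULL,
    journal_title TEXT,
    issn TEXT,
    volume TEXT,
    issue TEXT,
    start_page TEXT
)
"""

FIELDS = ("doi", "article_title", "journal_title",
          "issn", "volume", "issue", "start_page")


def store_article(conn, record):
    """Insert one article record and return its row id."""
    conn.execute(SCHEMA)
    # Missing fields (for example, no DOI) are stored as NULL.
    values = [record.get(field) for field in FIELDS]
    placeholders = ", ".join("?" * len(FIELDS))
    cursor = conn.execute(
        "INSERT INTO articles (%s) VALUES (%s)" % (", ".join(FIELDS), placeholders),
        values,
    )
    conn.commit()
    return cursor.lastrowid


# Placeholder record in an in-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
row_id = store_article(conn, {
    "article_title": "An example article",
    "journal_title": "Journal of Examples",
    "issn": "1234-5678",
    "volume": "12",
    "issue": "3",
    "start_page": "45",
})
print(row_id)
```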
Why are we using Crossref for the bibliographic data?
Crossref’s terms and conditions allow libraries to store the data locally: ‘the Library may cache the DOIs and metadata and incorporate DOIs and metadata into their content and library systems’ (http://www.crossref.org/03libraries/33library_agreement.html). Unfortunately, our understanding of Crossref’s terms is that they would prevent data derived from Crossref from being openly released.
What other approaches could be adopted?
It may well be possible to adopt other approaches. Two spring to mind. EDINA have recently openly released a set of OpenURL data, and it would be interesting to try to match RISE content against that dataset. Another alternative would be to use the Mendeley API to do a similar exercise. It would be interesting to see which gives the better result. In both cases the bibliographic data is openly available, which would mean that a RISE dataset could be released with some bibliographic data included.