(More) Innovations in Reference Management
Today is the second ‘Innovations in Reference Management’ event, which the TELSTAR project has organised as part of its dissemination and ‘Benefits Realisation’ activity.
The day is starting off with an introduction from Nicky Whitsed, Director of Library Services at the Open University. She reflects that it was 22 years ago that she was involved in implementing the ‘Reference Manager’ software (interestingly in medicine), and highlights the various trends that are pushing innovations in the area today – Information Literacy, Linked Data, and the need to cite datasets as opposed to more traditional publications.
Now Martin Fenner is going to talk about Trends in Reference Management. He also goes back to ‘Reference Manager’ in 1985 – a personal store of knowledge and an index for a (print) offprint collection. Soon after this it became possible to import references using standards like Z39.50. By 1992 an article about EndNote Plus said “It is hard to imagine a reprint file management and bibliography generation program that does more than EndNote Plus”; “it automatically assemble bibliographies from inserted in-text citations”. Martin says we shouldn’t forget that for most researchers this is still the main use of reference management packages – and that things have not really changed much on this front since 1992.
However, then we have the web. Where previously we had Index Medicus, now we have PubMed freely available online. In 2000 the DOI was introduced. The web and online activity prompted questions of how we could share references with others. Some reference management software is completely online – there is only one copy of each reference, which is stored online; other packages synchronize local copies of data with online storage (EndNote and Zotero take this approach). While there are many reasons to share references, Martin brings us back to writing for publication – the fact that you may be writing collaboratively and need to share references – and also to the new online authoring environments such as Google Docs, Office Live, Buzzword etc. However, so far we haven’t seen good integrations of reference managers into these online writing tools. Martin suspects this is because of the niche nature of reference management.
Another idea that is perhaps obvious (says Martin) but took a while to be developed is storage of electronic copies of papers (usually pdf). We are now seeing software which does this: ‘Papers’ – new software for Mac that manages references and pdfs (looks very much like iTunes). Mendeley, recently launched, also manages pdfs. Many other packages allow you to attach pdfs to references, but they are not as tightly integrated as Papers and Mendeley.
However, once you have sharing, and you have attachment of full-text, you immediately have copyright questions raised. Even where there are more permissive licenses – such as Creative Commons – it may be that terms such as ‘Non commercial’ can cause complications – as this is about how the copy is used, not whether you make a copy.
By 2009 there is a wide range of reference management tools – Martin shows a list of 15, but notes this is only a small subset of available software/services. Martin says while they all tend to do the ‘basic’ tasks, there is a wide variety of additional features, and they also vary in price (starting at ‘free’ for Zotero and Mendeley). But as an institution you won’t have the resource to support them all, so you have to make a decision.
Martin now highlighting a few of the more recent developments in reference management software:
Mobile interfaces – iPhone apps (notes Nature iPhone app delivers papers in ePub format, not pdf). All references in Nature iPhone app are linked to online copies etc. Also iPhone app from Scopus – includes alerts etc. iPad – makes web based services like citeulike usable on portable device; Cell has been experimenting with different formats for articles online – not just pdf, and also additional services linked to document – but requires flash, so doesn’t work on iPad! PLoS has iPad app.
Challenge – does every journal get its own interface via an app? ‘Papers’ for Mac has an iPad version – can sync between desktop and iPad – gives single interface to all your pdfs
So Martin highlights:
- variety of mobile formats: PDF; ePub; HTML; Flash
- different types of mobile service: alerts; readers etc.
Martin now highlighting attempts to introduce unique identifiers for authors – mentioning ORCID (Open Researcher and Contributor ID, http://www.orcid.org/). Previous schemes have been limited – by geography, discipline or publisher. ORCID is meant to be universal. The number of ways an author name can be expressed in a ‘reference’ is extremely large – even for relatively unique names. Also specific challenges dealing with names which are expressed in a different script to native language – e.g. Chinese names written in English script.
Idea is that when you submit a manuscript for publication, you have to submit your ORCID. Easier to do this for all papers going forward – challenge of going back and doing for all historical publications probably too big a job.
ORCID could be used not just for authors, but for other roles – e.g. reviewers, compilers (of data), programmers (of software).
Now over 100 organisations participating in ORCID initiative – but still much work to be done and things to be worked out. Has been agreed that the software developed by Thomson Reuters for their ‘ResearcherID’ will be reused to provide infrastructure/software.
Martin hopes to see reference management software adopting use of ORCID in 1-2 year timescale.
Will start to see new services based on ORCID – e.g. like biomedexperts – which can summarise an author’s expertise, and also show connections between authors (e.g. who has co-published with whom).
Martin mentions use of BibApp which allows collection of publications information for all researchers within an institution (open source software developed at University of Wisconsin and University of Illinois)
Martin mentions ‘CRIS’ (Current Research Information Systems) – good identifiers such as DOI and ORCID really help with these.
Martin suggests that using ORCID could make it easier to reference new types of ‘publication’ – e.g. blog posts, and see links between papers and blog posts written by same author.
Martin mentioning ‘DataCite’ for citing datasets – we will hear more about this later today from Kevin Ashley I expect.
Finally Martin saying – references now appear everywhere – published on the web – need ways of finding them and capturing them. Also look at ways of assessing ‘importance’ – e.g. citation counts is traditional way of doing this. Now PLoS looks at page views and pdf downloads as well as citation counts – what they are calling ‘article level metrics’ – while this is a common concept in social media, it isn’t commonplace in scientific literature.
Also, not just about metrics but quality. Services like ‘Research Blogging’ and ‘Faculty 1000’. Twitter also growing in usage – can be a good way of discovering references, but how to get them into your reference manager? (I’ll mention something about this later today in my session)
Innovations in Reference Management 2
Following on from our popular January event, I’m now very pleased to announce a second Innovations in Reference Management event, which is taking place on 21st June 2010, at the studio in Birmingham.
This free event will include talks looking at trends in reference management, different approaches taken to managing lists of references (such as ‘reading lists’), the latest developments with the freely available Zotero reference management tool, where and how references appear on the web, and looking at the emerging requirements to reference or cite datasets.
You can visit the event page for more information, full programme details, and to register.
The tag for the event will once again be #irm10 on twitter, or simply irm10 elsewhere.
TELSTAR – the next phase
The TELSTAR project was originally scheduled to finish at the end of February 2010. However, I’m really pleased to say that with the support of the Open University and JISC, we’ve got agreement to extend the project through to July 2010.
At this point, the project has essentially completed its initial objectives, and the deliverables listed in the original project plan (http://www.open.ac.uk/telstar/Deliverables) are more or less complete (with a few exceptions). I’ll post some more structured links to the various deliverables over the next couple of weeks, but many are contained within the ReMIT (Reference Management Integration Toolkit) that I’ve been posting on this site over the last couple of weeks.
The focus of the project will now change somewhat, and the project team will be changing as well, so I want to say a huge thank you to Jason, Jes (the developers) and Sarah (our project support officer up to the end of this week) for their hard work over the last 6-12 months. They’ve all been key in getting us to this point, and I can’t quite believe what we’ve been able to achieve – I’m so impressed with what we’ve done. There are too many other people to mention who’ve made it possible for us to get to this point, but I should mention Richard (the Digital Libraries Programme Manager at the Open University library, and my boss) who has provided support and advice throughout.
The next phase of the project will focus on three areas:
- more piloting at the Open University
- working with other Moodle/RefWorks sites to get them up and running with the TELSTAR developments
- running one or more community events around reference management and related topics
I hope to be announcing some dates for one or two events in the near future. If you are interested in making use of the TELSTAR developments, especially if your institution is already using Moodle and RefWorks, get in touch. Also, if you have comments on the work we’ve done so far, such as ReMIT, then let us know, as these pieces of work can continue to develop and be refined between now and July.
Once again, a big thank you to the project team, and everyone else who has contributed to the project over the last 18 months (and longer).
ReMIT: Reference Management Integration Toolkit
I’m currently re-formatting (and partially re-writing) my approach to the project deliverables. Rather than focussing on the individual documents listed in our deliverables, I’m focussing on the idea of a ‘toolkit’ which is intended to help others considering issues around integrating ‘Reference Management’ into their learning environment(s).
I’m aiming for something pretty practical, but want to ensure that I don’t simply list “how we did it for TELSTAR”, as I’m aware the decisions we made in many areas will be related to institution-specific policies and practices.
I’m going to start posting sections as I write them, but I’m posting the table of contents here for comment. If you were approaching this, what would you like to see here – have I missed anything?
ReMIT (Reference Management Integration Toolkit)
- Introduction
- What is Reference Management?
- Introduction to Referencing
- Reference Management software
- Why Integrate Reference Management?
- Information Literacy and Referencing
- Integrating Reference Management into a Technology Enhanced Learning environment
- Stakeholders
- User Requirements
- Linking to online resources
- Workflows
- Supporting good practice in Reference Management
- Information literacy skills
- Referencing styles
- Copyright and Reference Management
The Open University approach
- Background
- Business Case
- RefWorks and course production
- MyReferences
- Linking
- Authentication
Sharing References
A lot of the TELSTAR project has focussed on how authors or tutors put references into course materials, and how students can then take copies of those references and manage them. However something we’ve always wanted to look at is how references might be shared by students between themselves.
We originally had thought of this functionality as perhaps allowing a student to ‘publish’ a reference (or set of references) to a public area – this is similar to how many Reference Management packages enable sharing – for example RefShare functionality in RefWorks, and concepts of publishing your library, and using Groups in Zotero.
However, the more we talked about the possible ways of sharing, and particularly talked to students about what they would find useful, we realised that this idea of ‘publishing’ a set of references probably wasn’t what was needed. It was also clear that students didn’t particularly want to share ‘references’ – they wanted to share ‘resources’ (i.e. the thing that the reference is pointing at). We came to the conclusion that within the Open University’s learning environment, the most likely place students might want to share a resource/reference was on a forum.
So, we have developed a way of enabling references to be inserted in Moodle forum posts, as well as Moodle wiki pages and Moodle blog posts. To see how this works watch the video below:
The problem we have with this mechanism is that the ‘cut and paste’ of a horrid chunk of ‘escaped’ xml is as ugly as it gets in terms of a user interface.
However, we are really pleased with the result in terms of how the references appear in the forums/wikis/blogs – we automatically add links to online version where possible and students (or staff) can take copies to MyReferences or other Reference Management software.
So, we are left with a quandary – is this functionality worth releasing to students? Could we release it to tutors only? Does the end result justify the clunky cut and paste mechanism?
Any comments welcome!
Linking and Persistence
I’m currently working on a deliverable which relates to the provision of ‘persistent links’ to resources. This is part of that report and I’d be interested in feedback. As well as the text I’ve included a specific question at the end – I’d be very interested in responses:
When providing links to online resources it is clearly desirable that the links will work over long periods of time. However, it is common for resources to be identified and located by multiple URLs over time. This creates a challenge when forming a reference to an online resource.
This report will not attempt to cover all aspects of persistent identifiers, which are well covered elsewhere, particularly by Emma Tonkin’s 2008 article on the topic in Ariadne. However, it will consider approaches discussed within the TELSTAR project.
Digital Object Identifiers (DOIs)
A DOI name “provides a means of persistently identifying a piece of intellectual property on a digital network and associating it with related current data in a structured extensible way.” (from http://www.doi.org/faq.html#1)
On the web, a given DOI can be ‘resolved’ via a DOI System proxy server – the most commonly used being http://dx.doi.org. A DOI can be resolved by appending the DOI to the proxy server URL. For example:
DOI Name: doi:10.10.123/456
URL for resolution: http://dx.doi.org/10.10.123/456
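As a minimal sketch of the appending step described above (the function name `doi_to_url` is made up for illustration, and the example DOI is the one used in the text, not a real identifier), the resolution URL could be built like this:

```python
from urllib.parse import quote

# Proxy server named in the text above.
DOI_PROXY = "http://dx.doi.org/"


def doi_to_url(doi_name: str, proxy: str = DOI_PROXY) -> str:
    """Build a resolution URL by appending the DOI to the proxy server URL."""
    doi = doi_name.strip()
    # Drop an optional "doi:" label so that only the DOI itself remains.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return proxy + quote(doi, safe="/")


print(doi_to_url("doi:10.10.123/456"))
```

The percent-encoding step is an assumption on my part: it matters only for DOIs that contain characters such as `<`, `>` or `#`, which cannot appear unescaped in a URL.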
In the majority of cases such a URL will resolve to the full text of the resource on a publisher’s website. However, there are examples of a DOI resolving to other services – such as a page listing a number of different URLs for the identified resource when it is available through multiple routes.
DOIs are being widely adopted to identify journal articles with a smaller amount of use to identify books, book chapters and other types of resource (see http://www.crossref.org/06members/53status.html for a breakdown of the different resources being identified by DOIs). The DOI has become part of some commonly used Citation styles such as APA.
Linking to online versions of articles using the DOI has a major drawback. Because the standard behaviour of DOI resolution services is to link to the ‘publisher’ version of the paper, it does not take into account the ‘appropriate copy’ problem. In brief the ‘appropriate copy’ problem is the issue that there may exist a number of different routes to a resource, but typically members of an institution will only be able to use a subset of the overall routes, depending on institutional subscriptions and services. It was the ‘appropriate copy’ problem that led to the development of the OpenURL standard.
PURLs (Persistent URLs)
A PURL is “an address on the World Wide Web that points to other Web resources. If a Web resource changes location (and hence URL), a PURL pointing to it can be updated.” (from http://purl.oclc.org/docs/faq.html#toc1.5)
PURLs were created in recognition that web resources can change location (and so URL). A PURL can be assigned to a web resource and if the web resource changes location the PURL can be updated to point to the new location (URL) for the resource.
PURLs can be created through the use of appropriate software, either by hosting the software or by using a public PURL server such as that hosted by OCLC.
OpenURLs
Unlike DOIs and PURLs, OpenURLs are not specifically persistent identifiers for a resource. The OpenURL framework standard (ANSI/NISO Z39.88) enables the creation of applications that transfer packages of information over a network. The only significant implementation of the standard is to transfer metadata related to bibliographic resources.
OpenURL has seen widespread adoption by University Libraries in combination with ‘OpenURL resolver’ software. This ‘resolver’ software typically uses the metadata available from an OpenURL (transported over http) and provides a link to the ‘appropriate copy’ based on the library’s subscription information.
OpenURLs are also commonly used by ejournal platforms to enable inbound links to specific resources (typically journal articles).
As the metadata related to a publication tends to be persistent over time OpenURLs can be seen as ‘persistent’ in one sense. However, OpenURLs in themselves simply provide a transport mechanism for metadata, and how they are ‘resolved’ and what they resolve to depends on the resolver software and the information available to that resolver. This means the result of resolving an OpenURL can change over time.
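To make the ‘transport mechanism for metadata’ point concrete, here is a rough sketch of how an OpenURL for a journal article might be assembled using the key/encoded-value form of Z39.88-2004. The resolver addresses and the article details are invented for illustration, and `build_openurl` is a hypothetical helper rather than part of any resolver product.

```python
from urllib.parse import urlencode


def build_openurl(resolver_base: str, metadata: dict) -> str:
    """Attach bibliographic metadata to a resolver address as an OpenURL."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        # Declares that the referent is described using the journal format.
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    # Keys describing the referent (the thing being cited) take an "rft." prefix.
    for key, value in metadata.items():
        params["rft." + key] = value
    return resolver_base + "?" + urlencode(params)


# Invented article metadata, for illustration only.
article = {
    "genre": "article",
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "volume": "12",
    "spage": "34",
    "date": "2009",
}

# The same metadata sent to two hypothetical institutional resolvers.
print(build_openurl("http://resolver.example.ac.uk/openurl", article))
print(build_openurl("http://resolver.example.edu/openurl", article))
```

The two URLs carry identical metadata, but where each one leads depends entirely on the resolver and the subscription information it holds, which is why an OpenURL is only ‘persistent’ in the limited sense described above.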
Managed URLs
It is possible to enable ‘persistence’ of links to online resources by introducing and managing a level of redirection. Using a ‘managed’ URL which in turn redirects to the location of the resource it is possible to then use the managed URL in place of the current location of the resource. If the resource is moved the managed URL can be updated to point at the new location of the resource.
The Open University currently uses a number of different types of Managed URLs depending on the type of resource being linked to. These mechanisms are described below in the section on the “Current Linking strategy at the Open University”.
[the following paragraphs are not part of the report, but conclude with some questions which I'm looking for answers to, so comments would be welcome]
An example of a ‘managed URL’ at the Open University is the use of a system called ROUTES. ROUTES is an implementation of the Index+ software from System Simulation.
This is used to give a ‘managed URL’ to freely available web resources. When a resource is added to ROUTES, its URL is recorded in the record. For example see the ROUTES record for the BBC Homepage.
Once a resource has been added to ROUTES, a ROUTES URL is used in place of the resource’s primary URL in Open University course material. This ROUTES URL results in an HTTP status 302 (i.e. a redirect) being returned, pointing to the resource’s primary URL as recorded in ROUTES. Then, if the resource moves in the future, the ROUTES record can be updated, but the ROUTES URL being used in OU course material does not change. For example:
- Resource: BBC Homepage
- Primary URL: http://www.bbc.co.uk
- ROUTES Record: http://routes.open.ac.uk/ixbin/hixclient.exe?_IXDB_=routes&_IXSPFX_=f&submit-button=summary&$+with+res_id+is+res9377
- ROUTES URL: http://routes.open.ac.uk/ixbin/hixclient.exe?_IXDB_=routes&_IXSPFX_=g&submit-button=summary&$+with+res_id+is+res9377
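The redirect behaviour described above can be sketched in a few lines. This is only an illustration of the general ‘managed URL’ idea, not how ROUTES or Index+ is actually implemented; the class name `ManagedUrlRegistry` and the ‘moved’ address are assumptions made up for the example.

```python
class ManagedUrlRegistry:
    """Toy registry mapping a stable record id to a resource's current URL."""

    def __init__(self):
        self._records = {}

    def add(self, res_id: str, primary_url: str) -> None:
        """Record the primary URL for a newly added resource."""
        self._records[res_id] = primary_url

    def update(self, res_id: str, new_url: str) -> None:
        """Point an existing record at the resource's new location."""
        if res_id not in self._records:
            raise KeyError(res_id)
        self._records[res_id] = new_url

    def resolve(self, res_id: str):
        """Return (status, headers) as a redirecting web server might."""
        url = self._records.get(res_id)
        if url is None:
            return 404, {}
        return 302, {"Location": url}


registry = ManagedUrlRegistry()
registry.add("res9377", "http://www.bbc.co.uk")
print(registry.resolve("res9377"))

# Hypothetical move: only the record changes, the managed id stays the same.
registry.update("res9377", "http://www.bbc.co.uk/moved-example")
print(registry.resolve("res9377"))
```

The link published in course material only ever contains the record id, so a move is handled by a single update to the record.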
So, my questions are
- Can we talk about ROUTES URLs as PURLs, or are there important differences between what the PURL software is doing and what ROUTES does?
- If so, what are these differences?
- Does the more generic term ‘managed URL’ fit the bill?
Service Usage Model (SUM) for Citation Management
One of the workpackages in the TELSTAR project involves working towards development of a Service Usage Model (SUM) that will be offered as a contribution to the e-Framework.
The e-Framework for Education and Research is “an international initiative that provides information to institutions on investing in and using information technology infrastructure. It advocates service-oriented approaches to facilitate technical interoperability of core infrastructure as well as effective use of available funding. …The e-Framework maintains the content to assist other international education and research communities in planning, prioritising and implementing their IT infrastructure in a better way.”
We feel that it is quite important to attempt to model the work that is being done in the TELSTAR project by describing it in a controlled and systems-neutral way in order that other F/HEIs that have a similar business need have the opportunity to adopt similar methodologies regardless of the technical systems they may have available.
We are using the templates provided by the e-Framework to describe the business-level capabilities, the business processes or workflows, the technical functionality, the structure and arrangement of the functions, applicable standards, design decisions, data sources and services used.
We have started with a ‘top-level’ SUM which is a broad view of the whole area of what we have called “Citation Management”. We aim to follow up with 6 more detailed SUMs that represent the 6 business processes that the project is addressing. These are:
- Add references
- Aggregate references
- Import/export references
- Create bibliography
- Manage bibliography
- Recommend resources
We would welcome any comments on the top-level SUM over the next few weeks, and will add drafts of the detailed SUMs as they are developed. You can read and comment on the Citation Management SUM at https://e-framework.usq.edu.au/users/wiki/CitationManagement.
IRM10 – from reference management to real-time impact metrics
Victor Henning gives the last presentation of the day (we close with a panel session). Victor says research is inherently social. Mendeley is built on this concept. Mendeley both helped and hindered by lack of library background. In fact there is a strong music background to those involved in Mendeley.
The Last.fm model – you have a ‘scrobbler’ which monitors everything you listen to and uploads details to your last.fm account. This means you can build recommendations and links based on your listening habits. Mendeley makes research data social – mashing up research libraries, researchers, papers and disciplines (as opposed to music libraries, artists, genres etc.)
Mendeley offers a free desktop interface, which interacts with your web-based account – you can also login to your account on the website. The desktop interface extracts metadata from pdfs which are uploaded, and then uses that to get the correct metadata (e.g. if there is a DOI). You can read and annotate papers within the desktop application. Allows you to turn existing pdf collection into a structured database.
Mendeley includes ‘cite as you write’ functions – plugins for Word and Google Docs – you can drag and drop from Mendeley to a Google Doc. Also supports ‘shared collections’ – synchronises across desktop applications – including annotations etc. On Mendeley everything is private by default (contrasts with CiteULike). Mendeley is a tool for collaboration – and more functionality is coming around this. Mendeley can sync with both CiteULike and Zotero. Also supports a bookmarklet and COinS.
Mendeley allows you to see ‘real time impact metrics’ – most read papers, most used tags etc. Mendeley looking at recommendations not just on collaborative filtering, but also on analysis of content – extracting keywords etc.
What could it mean for Impact Factor? There are lots of criticisms levelled against citation-based metrics – skewed distribution, wrong incentives to researchers (target only high-impact journals, write with a view to citation), studies find only 20% of papers cited have actually been read. Mendeley can measure ‘usage’ of document by each user – annotations, how often opened etc. It can also measure reading time and repeat readings per paper. Since user data is recorded as well, Mendeley can break down statistics by academic discipline, geographic region, academic status (students, researchers etc.)
Some data – e.g. ‘most read’ already on Mendeley website – and being picked up by researchers. Mendeley are not bibliometricians – so they are going to open up the data via an API so that libraries, publishers, bibliometricians can do analysis.
Coming in the future – better collaboration tools – Group/Lab management online, document-centric discussion feeds – all accessible via API. Full-text search in Mendeley and other databases, statistics queries and library systems integration also coming soon. Will be able to do queries like “what is the most read paper for this tag in this geographic region”.
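To give a feel for what a query like the one quoted above involves, here is a small sketch over in-memory data. It does not use the Mendeley API (which wasn’t shown in the talk); the `ReadEvent` record and its fields are assumptions invented for the example.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class ReadEvent(NamedTuple):
    """One (hypothetical) record of a user reading a paper."""
    paper: str
    tag: str
    region: str


def most_read(events: Iterable[ReadEvent], tag: str, region: str) -> Optional[str]:
    """Return the paper read most often for a given tag and region, if any."""
    counts = Counter(
        event.paper
        for event in events
        if event.tag == tag and event.region == region
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Invented sample data.
events = [
    ReadEvent("Paper A", "metrics", "Europe"),
    ReadEvent("Paper B", "metrics", "Europe"),
    ReadEvent("Paper B", "metrics", "Europe"),
    ReadEvent("Paper A", "metrics", "Asia"),
]
print(most_read(events, "metrics", "Europe"))
```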
IRM10 – Social bookmarking and ‘referencing’
Kevin Emamy is going to talk about social bookmarking. “The problem with the Internet is that there is just too much of it”. Social tools can help map where the ‘pearls’ are.
CiteULike have a bookmarklet – something you can add to your browser bookmarks. When you are viewing a web page you can click the bookmarklet, and it sends the metadata to CiteULike. Importantly the default setting is to make this public (although you can mark citations as private if you want). When you bookmark from a wide variety of ‘academic’ sources (databases, e-journal platforms etc.) CiteULike knows how to retrieve metadata on the item (these bits of code are open source and CiteULike users contribute to this – as they often break when sites change).
Kevin using example of hi-res pictures of Neil Gaiman’s library/bookshelves being posted on the web (at Shelfari) – people are immediately interested both in what books they had in common, and what Neil had that they hadn’t read. We all know this experience – getting ‘personal recommendations’ is powerful. CiteULike allows you to follow users so you can see what they are bookmarking. Also when you bookmark a resource you can see who else has bookmarked it.
PLoS now show how many times a paper has been bookmarked on Connotea and CiteULike. CiteULike supports an API – you can supply a DOI and get details of the CiteULike data out. CiteULike also provide a complete export of their data – for non-commercial use only. Being used for research projects – such as this PhD thesis http://ilk.uvt.nl/~toine/phd-thesis/index.html (which became basis of CiteULike recommender system). CiteULike recommendations have a 17.73% ‘acceptance’ rate (that is, the user copies a recommendation into their own account)
IRM10 – Help me Igor
Euan Adie from the Nature Publishing Group is taking the difficult ‘post-lunch’ session (sorry @stew). He’s talking about taking referencing into ‘non-traditional’ environments – looking at Google Wave, Blogs and Mobile.
First up, Google Wave. Nature have written a Wave plugin called ‘Help me Igor’. You invite Igor to your wave, and then you can type a command that looks like ‘cite xxx’ where ‘xxx’ is a search term or phrase. Igor finds this command and searches some sources (currently PubMed and Connotea can be used) for references that match the search terms. If it finds a result, it inserts the reference into the Wave as a numbered footnote.
Igor is proof-of-concept – but was relatively easy to code because it is using existing APIs which are documented and supply easy to parse responses (e.g. XML). (Much easier to parse XML/RDF/MODS than RIS or BibTeX.)
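The ‘cite xxx’ idea is simple enough to sketch outside of Wave. What follows is not Igor’s code: it assumes the command runs to the end of a line, and it takes the search function as a parameter (here a made-up `fake_search` standing in for a PubMed or Connotea lookup).

```python
import re
from typing import Callable, Optional

# Assumption: a command is the word "cite" followed by a query that runs
# to the end of the line.
CITE_COMMAND = re.compile(r"\bcite (?P<query>.+)$", re.MULTILINE)


def insert_citations(text: str, search: Callable[[str], Optional[str]]) -> str:
    """Replace each 'cite xxx' command with a numbered footnote marker."""
    footnotes = []

    def replace(match):
        reference = search(match.group("query").strip())
        if reference is None:
            # No result found: leave the command exactly as typed.
            return match.group(0)
        footnotes.append(reference)
        return "[%d]" % len(footnotes)

    body = CITE_COMMAND.sub(replace, text)
    if footnotes:
        notes = "\n".join(
            "[%d] %s" % (number, reference)
            for number, reference in enumerate(footnotes, 1)
        )
        body = body + "\n\n" + notes
    return body


def fake_search(query: str) -> Optional[str]:
    """Stand-in for a real bibliographic search; the reference is invented."""
    catalogue = {
        "appropriate copy": "Example, A. (2009). A made-up reference.",
    }
    return catalogue.get(query.lower())


print(insert_citations("OpenURL was designed for this cite appropriate copy", fake_search))
```

Passing the search in as a function keeps the text handling separate from whichever source is queried, so trying a different source means swapping one function.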
Now Euan talking about a project to collect information from blog posts etc. that link to Nature / NPG Journals. Enter a DOI and see all the posts related to that digital object. Nature Blogs support an API documented at http://blogs.nature.com/help
Finally Euan talking about Mobile devices. Use cases for mobile different to those for desktop. (Sorry, missed this bit – was still thinking about Igor!)