Archive → January, 2010

Linking and Persistence

I’m currently working on a deliverable which relates to the provision of ‘persistent links’ to resources. This is part of that report and I’d be interested in feedback. As well as the text I’ve included a specific question at the end – I’d be very interested in responses:

When providing links to online resources it is clearly desirable that the links will work over long periods of time. However, it is common for resources to be identified and located by multiple URLs over time. This creates a challenge when forming a reference to an online resource.

This report will not attempt to cover all aspects of persistent identifiers, which are well covered elsewhere, particularly by Emma Tonkin’s 2008 article on the topic in Ariadne. However, it will consider approaches discussed within the TELSTAR project.

Digital Object Identifiers (DOIs)

A DOI name “provides a means of persistently identifying a piece of intellectual property on a digital network and associating it with related current data in a structured extensible way.” (from http://www.doi.org/faq.html#1)

On the web, a given DOI can be ‘resolved’ via a DOI System proxy server – the most commonly used being http://dx.doi.org. A DOI can be resolved by appending the DOI to the proxy server URL. For example:

DOI Name: doi:10.10.123/456
URL for resolution: http://dx.doi.org/10.10.123/456
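
As a minimal sketch of the "append the DOI to the proxy server URL" step described above, the following Python function builds a resolution URL from a DOI name. The function name and the handling of the optional `doi:` label are illustrative assumptions, not part of the DOI System itself.

```python
from urllib.parse import quote

DOI_PROXY = "http://dx.doi.org/"  # the proxy server named in the text


def doi_to_url(doi_name: str, proxy: str = DOI_PROXY) -> str:
    """Turn a DOI name such as 'doi:10.10.123/456' into a URL for resolution."""
    doi = doi_name.strip()
    # Drop the optional 'doi:' label; it is not part of the identifier itself.
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return proxy + quote(doi, safe="/")


print(doi_to_url("doi:10.10.123/456"))  # http://dx.doi.org/10.10.123/456
```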

In the majority of cases such a URL will resolve to the full text of the resource on a publisher’s website. However, there are examples of a DOI resolving to other services – such as a page listing a number of different URLs for the identified resource when it is available through multiple routes.

DOIs are being widely adopted to identify journal articles, with a smaller amount of use to identify books, book chapters and other types of resource (see http://www.crossref.org/06members/53status.html for a breakdown of the different resources being identified by DOIs). The DOI has become part of some commonly used citation styles such as APA.

Linking to online versions of articles using the DOI has a major drawback. Because the standard behaviour of DOI resolution services is to link to the ‘publisher’ version of the paper, it does not take into account the ‘appropriate copy’ problem. In brief, the ‘appropriate copy’ problem is the issue that there may exist a number of different routes to a resource, but typically members of an institution will only be able to use a subset of the overall routes, depending on institutional subscriptions and services. It was the ‘appropriate copy’ problem that led to the development of the OpenURL standard.

PURLs (Persistent URLs)

A PURL is “an address on the World Wide Web that points to other Web resources. If a Web resource changes location (and hence URL), a PURL pointing to it can be updated.” (from http://purl.oclc.org/docs/faq.html#toc1.5)

PURLs were created in recognition that web resources can change location (and so URL). A PURL can be assigned to a web resource and if the web resource changes location the PURL can be updated to point to the new location (URL) for the resource.

PURLs can be created through the use of appropriate software, either by hosting the software or by using a public PURL server such as that hosted by OCLC.

OpenURLs

Unlike DOIs and PURLs, OpenURLs are not specifically persistent identifiers for a resource. The OpenURL framework standard (ANSI/NISO Z39.88) enables the creation of applications that transfer packages of information over a network. The only significant implementation of the standard is to transfer metadata related to bibliographic resources.

OpenURL has seen widespread adoption by University Libraries in combination with ‘OpenURL resolver’ software. This ‘resolver’ software typically uses the metadata available from an OpenURL (transported over http) and provides a link to the ‘appropriate copy’ based on the library’s subscription information.

OpenURLs are also commonly used by ejournal platforms to enable inbound links to specific resources (typically journal articles).

As the metadata related to a publication tends to be persistent over time OpenURLs can be seen as ‘persistent’ in one sense. However, OpenURLs in themselves simply provide a transport mechanism for metadata, and how they are ‘resolved’ and what they resolve to depends on the resolver software and the information available to that resolver. This means the result of resolving an OpenURL can change over time.
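
To illustrate how an OpenURL carries metadata rather than a location, here is a rough sketch that encodes journal article metadata as a query string in the key/encoded-value style of OpenURL 1.0. The resolver address and the article details are invented for the example; what the link resolves to would depend entirely on the resolver that receives it.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; each library runs its own resolver.
RESOLVER_BASE = "http://resolver.example.ac.uk/openurl"


def build_openurl(metadata: dict, base: str = RESOLVER_BASE) -> str:
    """Encode journal article metadata as an OpenURL 1.0 query string."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    # Keys prefixed 'rft.' describe the referent, i.e. the item being cited.
    for key, value in metadata.items():
        params["rft." + key] = value
    return base + "?" + urlencode(params)


# Made-up article details, purely for illustration.
article = {
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "volume": "12",
    "spage": "34",
    "date": "2009",
}
print(build_openurl(article))
```

The same metadata sent to two different resolvers may lead to two different copies, which is the sense in which the result of resolving an OpenURL can change.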

Managed URLs

It is possible to enable ‘persistence’ of links to online resources by introducing and managing a level of redirection. Using a ‘managed’ URL which in turn redirects to the location of the resource it is possible to then use the managed URL in place of the current location of the resource. If the resource is moved the managed URL can be updated to point at the new location of the resource.
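
The level of redirection described above can be sketched as a lookup table plus a function that answers with an HTTP redirect. This is an illustration under assumptions only: the identifier, the example.org addresses and the function name are hypothetical, and a real service would sit behind a web server.

```python
from typing import Dict, Tuple

# Hypothetical table: managed identifier -> current location of the resource.
MANAGED_URLS: Dict[str, str] = {
    "example-resource": "http://example.org/old-location",
}


def resolve(managed_id: str, table: Dict[str, str]) -> Tuple[int, Dict[str, str]]:
    """Return the (status, headers) a redirect service might send back."""
    target = table.get(managed_id)
    if target is None:
        return 404, {}
    # A 302 redirect sends the browser on to the resource's current location.
    return 302, {"Location": target}


print(resolve("example-resource", MANAGED_URLS))

# The resource moves: only the table entry changes, the managed URL does not.
MANAGED_URLS["example-resource"] = "http://example.org/new-location"
print(resolve("example-resource", MANAGED_URLS))
```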

The Open University currently uses a number of different types of Managed URLs depending on the type of resource being linked to. These mechanisms are described below in the section on the “Current Linking strategy at the Open University”.

[the following paragraphs are not part of the report, but conclude with some questions which I'm looking for answers to, so comments would be welcome]

An example of a ‘managed URL’ at the Open University is the use of a system called ROUTES. ROUTES is an implementation of the Index+ software from System Simulation.

This is used to give a ‘managed URL’ to freely available web resources. When a resource is added to ROUTES, its URL is recorded in the record. For example see the ROUTES record for the BBC Homepage.

Once a resource has been added to ROUTES, a ROUTES URL is used in place of the resource’s primary URL in Open University course material. This ROUTES URL results in an HTTP status 302 being returned (i.e. a redirect) to the resource’s primary URL as recorded in ROUTES. Then, if the resource moves in the future, the ROUTES record can be updated, but the ROUTES URL being used in OU course material does not change. For example:

So, my questions are

  • Can we talk about ROUTES URLs as PURLs, or are there important differences between what the PURL software is doing and what ROUTES does?
  • If so, what are these differences?
  • Does the more generic term ‘managed URL’ fit the bill?

Service Usage Model (SUM) for Citation Management

One of the workpackages in the TELSTAR project involves working towards development of a Service Usage Model (SUM) that will be offered as a contribution to the e-Framework.

The e-Framework for Education and Research is “an international initiative that provides information to institutions on investing in and using information technology infrastructure. It advocates service-oriented approaches to facilitate technical interoperability of core infrastructure as well as effective use of available funding. …The e-Framework maintains the content to assist other international education and research communities in planning, prioritising and implementing their IT infrastructure in a better way.”

We feel that it is quite important to attempt to model the work that is being done in the TELSTAR project by describing it in a controlled and systems-neutral way in order that other F/HEIs that have a similar business need have the opportunity to adopt similar methodologies regardless of the technical systems they may have available.

We are using the templates provided by the e-Framework to describe the business-level capabilities, the business processes or workflows, the technical functionality, the structure and arrangement of the functions, applicable standards, design decisions, data sources and services used.

We have started with a ‘top-level’ SUM which is a broad view of the whole area of what we have called “Citation Management”. We aim to follow up with 6 more detailed SUMs that represent the 6 business processes that the project is addressing. These are:

  • Add references
  • Aggregate references
  • Import/export references
  • Create bibliography
  • Manage bibliography
  • Recommend resources

We would welcome any comments on the top-level SUM over the next few weeks, and will add drafts of the detailed SUMs as they are developed. You can read and comment on the Citation Management SUM at https://e-framework.usq.edu.au/users/wiki/CitationManagement.

IRM10 – from reference management to real-time impact metrics

Victor Henning gives the last presentation of the day (we close with a panel session). Victor says research is inherently social. Mendeley is built on this concept. Mendeley has been both helped and hindered by a lack of library background. In fact there is a strong music background to those involved in Mendeley.

The Last.fm model – you have a ‘scrobbler’ which monitors everything you listen to and uploads details to your last.fm account. This means you can build recommendations and links based on your listening habits. Mendeley makes research data social – mashing up research libraries, researchers, papers and disciplines (as opposed to music libraries, artists, genres etc.)

Mendeley offers a free desktop interface, which interacts with your web-based account – you can also log in to your account on the website. The desktop interface extracts metadata from PDFs which are uploaded, and then uses that to get the correct metadata (e.g. if there is a DOI). You can read and annotate papers within the desktop application. It allows you to turn an existing PDF collection into a structured database.

Mendeley includes ‘cite as you write’ functions – plugins for Word and Google Docs – you can drag and drop from Mendeley to a Google Doc. Also supports ‘shared collections’ – synchronises across desktop applications – including annotations etc. On Mendeley everything is private by default (contrasts with CiteULike). Mendeley is a tool for collaboration – and more functionality is coming around this. Mendeley can sync with both CiteULike and Zotero. Also supports a bookmarklet and COinS.

Mendeley allows you to see ‘real time impact metrics’ – most read papers, most used tags etc. Mendeley looking at recommendations not just on collaborative filtering, but also on analysis of content – extracting keywords etc.

What could it mean for Impact Factor? There are lots of criticisms levelled against citation-based metrics – skewed distribution, wrong incentives to researchers (target only high-impact journals, write with a view to citation), studies find only 20% of papers cited have actually been read. Mendeley can measure ‘usage’ of a document by each user – annotations, how often opened etc. It can also measure reading time and repeat readings per paper. Since user data is recorded as well, Mendeley can break down statistics by academic discipline, geographic region, academic status (students, researchers etc.).

Some data – e.g. ‘most read’ already on Mendeley website – and being picked up by researchers. Mendeley are not bibliometricians – so they are going to open up the data via an API so that libraries, publishers, bibliometricians can do analysis.

Coming in the future – better collaboration tools – Group/Lab management online, document-centric discussion feeds – all accessible via API. Full-text search in Mendeley and other databases, statistics queries and library systems integration also coming soon. Will be able to do queries like “what is the most read paper for this tag in this geographic region”.
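
To make the kind of statistics query mentioned above concrete, here is a toy sketch over invented reading events. It does not use the Mendeley API (which had not been released at the time of the talk); the data, field order and function name are all assumptions for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Invented reading events as (paper, tag, region). Not real Mendeley data.
EVENTS = [
    ("Paper A", "bibliometrics", "Europe"),
    ("Paper B", "bibliometrics", "Europe"),
    ("Paper A", "bibliometrics", "Europe"),
    ("Paper B", "bibliometrics", "Asia"),
    ("Paper C", "genomics", "Europe"),
]


def most_read(events: Iterable[Tuple[str, str, str]], tag: str, region: str) -> Optional[str]:
    """Return the paper read most often for a given tag within a region."""
    counts = Counter(paper for paper, t, r in events if t == tag and r == region)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(most_read(EVENTS, "bibliometrics", "Europe"))  # Paper A
```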

IRM10 – Social bookmarking and ‘referencing’

Kevin Emamy is going to talk about social bookmarking. “The problem with the Internet is that there is just too much of it”. Social tools can help map where the ‘pearls’ are.

CiteULike have a bookmarklet – something you can add to your browser bookmarks. When you are viewing a web page you can click the bookmarklet, and it sends the metadata to CiteULike. Importantly the default setting is to make this public (although you can mark citations as private if you want). When you bookmark from a wide variety of ‘academic’ sources (databases, e-journal platforms etc.) CiteULike knows how to retrieve metadata on the item (these bits of code are open source and CiteULike users contribute to this – as they often break when sites change).

Kevin using example of hi-res pictures of Neil Gaiman’s library/bookshelves being posted on the web (at Shelfari) – people are immediately interested both in what books they had in common, and what Neil had that they hadn’t read. We all know this experience – getting ‘personal recommendations’ is powerful. CiteULike allows you to follow users so you can see what they are bookmarking. Also when you bookmark a resource you can see who else has bookmarked it.

PLoS now show how many times a paper has been bookmarked on Connotea and CiteULike. CiteULike supports an API – you can supply a DOI and get details of the CiteULike data out. CiteULike also provide a complete export of their data – for non-commercial use only. Being used for research projects – such as this PhD thesis http://ilk.uvt.nl/~toine/phd-thesis/index.html (which became basis of CiteULike recommender system). CiteULike recommendations have a 17.73% ‘acceptance’ rate (that is, the user copies the recommendation into their own account).

IRM10 – Help me Igor

Euan Adie from the Nature Publishing Group is taking the difficult ‘post-lunch’ session (sorry @stew). He’s talking about taking referencing into ‘non-traditional’ environments – looking at Google Wave, Blogs and Mobile.

First up, Google Wave. Nature have written a Wave plugin called ‘Help me Igor’. You invite Igor to your wave, and then you can type a command that looks like ‘cite xxx’ where ‘xxx’ is a search term or phrase. Igor finds this command and searches some sources (currently PubMed and Connotea can be used) for references that match the search terms. If it finds a result, it inserts the reference into the Wave as a numbered footnote.
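
The ‘cite xxx’ behaviour can be sketched roughly as below. This is not Igor’s actual code: the exact command syntax (here assumed to run to the end of the line), the `fake_search` stand-in for the PubMed/Connotea lookups, and the footnote marker format are all assumptions made for illustration.

```python
import re
from typing import Callable, List, Optional

# Assumed syntax: the word 'cite' followed by a phrase running to the end of the line.
CITE_COMMAND = re.compile(r"\bcite\s+(.+)$")


def apply_cite(line: str, search: Callable[[str], Optional[str]], footnotes: List[str]) -> str:
    """Replace a 'cite xxx' command with a numbered footnote marker if a match is found."""
    match = CITE_COMMAND.search(line)
    if match is None:
        return line
    reference = search(match.group(1).strip())
    if reference is None:
        return line  # nothing found, so leave the command in place
    footnotes.append(reference)
    marker = "[%d]" % len(footnotes)
    return (line[:match.start()].rstrip() + " " + marker).strip()


def fake_search(phrase: str) -> Optional[str]:
    """Stand-in for a real search service; returns a made-up reference."""
    catalogue = {"reference management": "Author, A. (2010) A made-up article on reference management."}
    return catalogue.get(phrase.lower())


footnotes: List[str] = []
print(apply_cite("Tools shape how we write cite reference management", fake_search, footnotes))
print(footnotes)
```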

Igor is proof-of-concept – but was relatively easy to code because it is using existing APIs which are documented and supply easy to parse responses (e.g. XML). (Much easier to parse XML/RDF/MODS than RIS or BibTeX.)

Now Euan talking about a project to collect information from blog posts etc. that link to Nature / NPG Journals. Enter a DOI and see all the posts related to that digital object. Nature Blogs support an API documented at http://blogs.nature.com/help

Finally Euan talking about Mobile devices. Use cases for mobile are different to those for desktop. (Sorry, missed this bit – was still thinking about Igor!)

IRM10 – Moving Targets

This session from Richard Davis and Kevin Ashley from the University of London Computing Centre.

When you reference something you are expecting to give the user a fighting chance of being able to discover the material you have referenced. Traditionally physical material will be preserved somewhere, but when looking at web resources we have to look at the areas of digital and web preservation. Looking at Web preservation – examples like Wayback machine and the UK Web archive show some ‘good practice’ in this area.

When Richard cited a blog in a recent piece of work he cited the copy of the blog post on the UK Web Archive (http://www.webarchive.org.uk/ukwa/) instead of the initial blog post. But he questions whether others would ever do the same. Does this need to be part of information literacy training?

Quote from Peter Murray Rust – “I tend to work out my half-baked ideas in public” – academics may spend as much time on blog posts as they do on an academic paper. Michael Nielsen says in comparison to blogs “journals are standing still”. Heather Morrison highlights the shift from discrete items to the connected collection – both internal and external to the library.

The ‘ArchivePress’ project (http://archivepress.ulcc.ac.uk/) is looking at harvesting blog content from RSS feeds – idea is to make it easy to collect blog content – e.g. by an institutional library – and provide a persistent record of the work – an institutional record? Some rights issues may be simpler as the academic will already have contract with the institution that covers how their work can be used.

ArchivePress display of blog posts adds in a ‘cite this post’ link – with different styles offered – allows the citation of a persistent version of the content. Richard envisages a ‘repository’ type idea – showing mocked up examples that look like DSpace and e-prints :)

At the Universities of Leiden and Heidelberg there is a ‘citation repository’ specifically for Chinese web content (which is particularly volatile). The citation repository stores the original URL for the content – but most of these no longer work – proving the value of the repository.

New kinds of institutional collection – preserving content for the research of the future.

Now Kevin Ashley taking over – going to convince us that blogs need preserving. At a recent conference at Georgetown University – “Blogging has indeed transformed legal scholarship. Now it’s time to move the dialogue forward” – this from a discipline that regards itself as conservative.

Henry Rzepa (at Imperial) says “how long should/can a blog last. The last question leads on to whether blogs should be archived or curated” (http://www.ch.ic.ac.uk/rzepa/blog/?p=1459)

In the past you achieved ‘citability’ by publishing in the right place. Traditionally citations are location independent – you don’t say where you got the resource, you simply describe it. We need something as simple for blogs.

Kevin says:

  • Institutions can provide mechanisms
  • Authors need to use them
  • Blogs need to automatically expose citable links
  • “Permalinks” with a bit more “perm”

IRM10 – RefWorks to deliver ‘new titles’

This session from Paul Stainthorp (University of Lincoln) is looking at how he has used RefWorks to power ‘new titles’ lists for the library website.

‘New title’ lists are lists of new resources that have been purchased by the library – a way of informing staff and students what is being bought, and what the latest acquisitions are. Demand for the service from faculties, and also a way of the library being accountable. It has been done in the past, but using paper lists, which was time consuming.

Unfortunately the library system used at Lincoln doesn’t have this functionality out of the box, and the library doesn’t have the resources to add on the functionality. So looked at using existing tools to do it instead. Subject librarians and students already familiar with RefWorks – Paul has been promoting it for the last 3 years!

Paul working within the constraints of current systems and current working practices. So, he set out to design something using RefWorks, Yahoo Pipes, Google Feedburner, Feed2JS.

When a new book is received it is added to a RefWorks account – organised by subject – e.g. add all books on journalism to a single folder. Then use the ‘RefShare’ functionality to publish the folder as a RSS feed. Paul is happy at this point – everything from this point on is window dressing :)

Paul then uses Yahoo Pipes (it’s good, free and powerful, but perhaps not for everyone). Yahoo Pipes extracts the ISBN from the RefWorks RSS feed, then checks for information on Amazon.co.uk – if it finds anything it enhances the reference, formats it nicely, creates a link from the item to the OPAC (using ISBN) and strips out any extraneous information. Typical enhancement from Amazon is cover image. Also adds links to other services – e.g. RefWorks, Google Books, LibraryThing, Amazon.
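
Outside Yahoo Pipes, the "extract ISBN, then link to the OPAC" part of this pipeline might look something like the sketch below. It is only an approximation of the idea: the sample feed, the OPAC address and the ISBN pattern are assumptions, and the Amazon lookup is left out entirely.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical OPAC search address; a real catalogue has its own URL pattern.
OPAC_ISBN_SEARCH = "http://opac.example.ac.uk/search?isbn="

# Matches a 13-digit or 10-digit ISBN written without hyphens.
ISBN_PATTERN = re.compile(r"\b(97[89]\d{10}|\d{9}[\dXx])\b")

# Made-up feed standing in for a RefShare RSS feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>New titles: Journalism</title>
<item><title>An Example Book</title>
<description>Author, A. (2010). ISBN 9780000000002</description></item>
<item><title>No identifier here</title>
<description>Author, B. (2009).</description></item>
</channel></rss>"""


def enrich(feed_xml: str) -> List[Dict[str, str]]:
    """Pull an ISBN out of each feed item and, where found, add an OPAC link."""
    entries = []
    for item in ET.fromstring(feed_xml).iter("item"):
        entry = {"title": item.findtext("title", default="")}
        match = ISBN_PATTERN.search(item.findtext("description", default=""))
        if match:
            entry["isbn"] = match.group(1)
            entry["opac_link"] = OPAC_ISBN_SEARCH + match.group(1)
        entries.append(entry)
    return entries


for entry in enrich(SAMPLE_FEED):
    print(entry)
```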

Feedburner used to shorten the URL for the RSS feed, allows email subscriptions to RSS feed. Feed2JS used to enable the display of the lists within web pages – specifically Blackboard.

See slides at http://www.slideshare.net/pstainthorp/feed-me-weird-things-using-refworks-rss-feeds-for-new-title-lists

Innovations in Reference Management 2010

Today the TELSTAR project is running an event on ‘Innovations in Reference Management’. As we’ve been working on the project we’ve found that there seems to be a lack of ‘community’ to discuss and collaborate around the practice of Reference Management. Even in terms of products there doesn’t seem to be a strong ‘user group’ (in an organised fashion at any rate) for any of the major Reference Management packages such as RefWorks, EndNote, Zotero etc.

As we talked to others across the HE community about the project, we found that there was a lot of interest in what we were doing, and that there was quite a lot of innovation going on around the practice of Reference Management and the use of the relevant software. We felt there would be real value in running an event to highlight some of this work.

So, IRM10 (follow #irm10 on Twitter, or see an archive of all tweets at http://www.twapperkeeper.com/irm10/) is happening today – we’ve got a great programme (see http://www.open.ac.uk/telstar/event/programme) and I hope it will be an interesting day. I’ll be posting throughout the day on this blog, and we are recording all the sessions so we’ll be posting these later.

User Requirements

The TELSTAR project has a range of deliverables, many of them are in the form of documentation or reports. As we approach the end of the project, I will be posting drafts of various deliverables to this blog for public comment. This is the first of these posts.

Although this document is aimed at the requirements for software developed for the Open University, I hope it will prove useful to others, and as such I’d like to make sure it is understandable and possibly applicable to other situations. With this in mind I’d like to invite comments on the document, whether there are any parts that don’t make sense to you (especially those outside the Open University) and also if there is anything missing from these requirements that you would have expected to see there.

Introduction

Purpose of this document

This document formally describes the functional user requirements for the TELSTAR project at the Open University. It is part of the Prototype Technical Documentation which is a key deliverable of the project.

The TELSTAR project is looking at how to integrate references to resources into a VLE, making it as easy as possible for students to access the referenced resources, while encouraging students (and teachers?) to adopt good practice in referencing and citations (e.g. using an appropriate reference/citation style).

Scope of this document

This document is specifically focussed on the requirements of the TELSTAR project which is in turn based on the needs of stakeholders at the Open University.

Definitions

Course Website: A protected website accessible only to those taking the course. Course Websites at the Open University are created using the Moodle VLE software.
DOI: “A DOI® (Digital Object Identifier) is a name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks.” (http://www.doi.org/overview/sys_overview_021601.html)
ePortfolio: “A purposeful collection of information and digital artifacts that demonstrates development or evidences learning outcomes, skills or competencies” (http://www.eportfolios.ac.uk/?pid=174)
ISBN: International Standard Book Number – “a … number that uniquely identifies book and book-like products published internationally” (http://www.isbn.org/standards/home/isbn/us/isbnqa.asp#Q1)
Structured Content (SC): An XML schema used by the Open University to author course material for publication in both print and online formats.

General Description

Stakeholders

The Stakeholders for these user requirements are:

  • Course teams (authors, administrators and clerical staff)
  • Editorial staff
  • Librarians
  • Students

Common tasks

Course teams:

  • Add a new reference to a piece of structured content
  • Add new references to a Resource page on the Course website
  • Create an RSS feed of references and add this to a Resource page on the Course website
  • Add a new reference to an existing RSS feed of references
  • Request ‘persistent’ URLs for resources from the library

Editors:

  • Manually adjust the display format of a reference in structured content
  • Check links from References to online resources work
  • Request ‘persistent’ URLs for resources from the library
  • Correct incorrect links from References to online resources

Librarians:

  • Add a new reference to a Resource page on the Course website
  • Create an RSS feed of references and add this to a Resource page on the Course website
  • Add a new reference to an existing RSS feed of references
  • Check links from References to online resources work
  • Correct incorrect links from References to online resources
  • Supply ‘persistent’ URLs for resources to Course teams and Editors

Students:

  • Follow a link from a reference on the Course Website to a resource
  • Add reference to their personal environment, from the Course Website and other sources
  • Organise references by a variety of criteria
  • Bookmark useful resources found online (via the Course Website or through other routes)
  • Add references or citations to assessed pieces of work
  • Create a bibliography for assessed pieces of work

Methodology

The requirements listed here were compiled through a number of routes:

  • Interviews with individual stakeholders
  • Focus groups with stakeholders, specifically library staff and students
  • Usage statistics from existing systems that provide relevant functionality
  • Anecdotal feedback provided to the library over time

Functional Requirements

1. References within Structured Content

Number | Requirement | Mandatory (M)/Desirable (D)/Future Enhancement (E)
1.1 | It should be possible to enter new reference details via a simple form | M
1.2 | It should be possible to capture new reference details by supplying minimal metadata such as a DOI or ISBN | D
1.3 | It should be possible to format the references within SC (Structured Content) to follow the OU Style Guide specification for references | M
1.4 | It should be possible to format the references within SC (Structured Content) in a variety of other citation/referencing styles | D
1.5 | It should be possible to manually adjust the reference text as displayed (i.e. the style) while maintaining the link to structured reference data | M
1.6 | It should be possible to indicate if reading/viewing a resource is required or optional | D
1.7 | Links to online versions of printed references should be created automatically where possible | M
1.8 | It should be possible to manually insert a link into a reference and use this in preference to the automatically created link | M
1.9 | It should be possible to suppress the display of a link from a reference to an online resource | D
1.10 | Any links to online versions of resources should be persistent over time | M
1.11 | Any links to online versions of resources should be accompanied by a full reference to the resource (in the style agreed for the course) | M
1.12 | Students should be able to export all references, or a subset of references, to their ePortfolio | M
1.13 | Students should be able to export all references, or a subset of references, to other formats of their choice | D
1.14 | It should be possible to check all links to online resources from structured content are valid, preferably by some automated process | M

2. References to Resources on Course Websites (excluding structured content)

Number | Requirement | Mandatory (M)/Desirable (D)
2.1 | It should be possible to enter new reference details via a simple form | M
2.2 | It should be possible to capture new reference details by supplying minimal metadata such as a DOI or ISBN | D
2.3 | Links to online versions of printed references should be created automatically where possible | M
2.4 | It should be possible to manually insert a link into a reference and use this in preference to the automatically created link | M
2.5 | It should be possible to suppress the display of a link from a reference to an online resource | D
2.6 | Any links to online versions of resources should be persistent over time | M
2.7 | Any links to online versions of resources should be accompanied by a full reference to the resource (in the style agreed for the course) | M
2.8 | It should be possible to link references to the relevant courses | M
2.9 | It should be possible to display all the references linked to a specific course, within the relevant course context | M
2.10 | It should be possible to set the order in which references appear when displayed within a course context | M
2.11 | It should be possible to group references under different headings when displayed within a course context | M
2.12 | It should be possible for a student to export a subset of, or all, references to their ePortfolio | M
2.13 | Students should be able to export all references, or a subset of references, to their ePortfolio | M
2.14 | Students should be able to export all references, or a subset of references, to other formats of their choice | D
2.15 | It should be possible to manually adjust the reference text as displayed (i.e. the style) while maintaining the link to structured reference data | M
2.16 | It should be possible to check all links to online resources from resource pages are valid, preferably by some automated process | M

3. References in ePortfolios

Number | Requirement | Mandatory (M)/Desirable (D)/Future Enhancement (E)
3.1 | It should be possible to enter new reference details via a simple form | M
3.2 | It should be possible to capture new reference details by supplying minimal metadata such as a DOI or ISBN | D
3.3 | It should be possible to capture reference details from Resource pages on Course Websites | M
3.4 | It should be possible to capture reference details from Structured Content on Course Websites | M
3.5 | It should be possible to edit references previously added to the ePortfolio | M
3.6 | It should be possible to organize references into groups (e.g. by tagging or organization into folders) | M
3.7 | It should be possible to add references to existing organizational groups | M
3.8 | It should be possible to remove references from existing organizational groups | M
3.9 | Where a link is automatically generated to an online version of a resource that is not available to the student, they should be directed to a page explaining the problem and offering further support | D
3.10 | It should be possible to create a list of references from the ePortfolio in a variety of citation/referencing styles | M
3.11 | It should be possible to export a set of references from an ePortfolio to an appropriately styled bibliography in a Word compatible file format | D
3.12 | It should be possible to export a set of references from an ePortfolio to a file that could be imported into other bibliographic management systems | M
3.13 | It should be possible to add comments to a reference within an ePortfolio | M
3.14 | It should be possible to share a reference from an ePortfolio with other students | M
3.15 | It should be possible to add comments to shared references | D
3.16 | It should be possible to automatically import references relevant to the student’s current course registrations | E
3.17 | It should be possible to add files as attachments (e.g. PDFs) to references | E
3.18 | It should be possible to see local library service options for references that are not available online (e.g. an option to see if a local public or University library has a copy of the resource) | E