
Archive → June, 2010

Data citation and referencing

Last talk of the day from Kevin Ashley (from the Digital Curation Centre) – he says if you don’t know what data citation is now, he hopes he will be able to tell you why you should care about it and why it will be important in the future.

Kevin mentioning the DCC Curation Lifecycle Model – but today’s talk is focussing only on one aspect – Access, Use and Reuse.

So – why should we care about data citation? Kevin giving the example of a paper on LIDAR and RADAR images of ice clouds – the paper contains only the images, not the data used to create them. Kevin showing how data can be misrepresented – graphs that don’t start at zero on one scale can lead to misleading conclusions.

So – data behind graphs can be very important. Kevin says that data used to support statements in publication should be as accessible as the publication – so statements and findings can be examined and challenged.

Kevin showing how you can misrepresent data – e.g. by taking a subset of results (that happens to favour a particular conclusion) – the data published is not always (all of) the data collected. Kevin mentioning a few texts on this – my favourite, which I was googling as he spoke, is ‘How to Lie with Statistics’ by Darrell Huff

Kevin giving example of studying Biodiversity – requires many different data sources, some of which won’t be published, some of which won’t have been compiled through academic research…

All of these issues mean we really ought to care about data citation.

‘Data is Different’. Traditional bibliographic resources basically come from a ‘print’ paradigm – i.e. ‘published’. We’ve moved online with many of these resources, but fundamentally it is still the same model – you ‘publish’ something and then you cite it.

However, a data set may be being added to on a continuing basis – a telescope may be collecting more and more data all the time. What you cite now may be different by tomorrow (Kevin draws a parallel to citing web resources like blogs)

So – approaches to dealing with this:

  • Giving data digital object identifiers (e.g. datacite)
  • Capturing data subsets at a point of publication
  • Freezing those subsets somewhere
  • Publication led

These work well in certain areas.

Approaches 2:

  • Dataverse (thedata.org) – submit your data, get a checksum (so you can check if it has changed since publication) and a citation, and publish
  • Ebank/ecrystals – harvest, store, cite
  • DataCite – working at national level with libraries and data centres

However, data changes and can be very, very big – changing by the second, and petabytes in size. If you take a ‘publication’ approach, it may not be apparent that four different references to subsets of data are actually all part of the same dataset.

One way of dealing with the ‘big data’ issue – rather than making copies, keep change records and create reference mechanisms that allow reference to a specific change point – Kevin mentioning Memento as a possible model for this.
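The change-record idea can be sketched as a toy append-only store where a citation names a change point rather than a copy of the data (purely illustrative – Memento itself is an HTTP framework for datetime negotiation, and a real system would also log edits and deletions, not just appends):

```python
class VersionedDataset:
    """Toy append-only dataset: keep a change log and let a citation
    reference a change point instead of a frozen copy of the data."""

    def __init__(self):
        self._log = []  # ordered record of appends

    def append(self, record):
        """Add a record; return the version number a citation can use."""
        self._log.append(record)
        return len(self._log)

    def as_of(self, version):
        """Reconstruct the dataset exactly as it stood at a cited version."""
        return list(self._log[:version])
```

The storage cost is one copy of the data plus the log, however many change points get cited.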

Another alternative is using ‘annotation’ rather than citations. When data sources have many (thousands of) contributors, instead of citing data sources in publications, annotate data sources with publications. Example of a ‘Mondrian’ approach where blocks of colour are assigned based on what types of annotation there are for different parts of the dataset. Turns the data set into something that can be challenged in itself…

Kevin mentioning Buneman’s desiderata (see http://homepages.inf.ed.ac.uk/opb/homepagefiles/harmarnew.pdf)

Kevin concerned that the tools we have now aren’t quite ready for the challenges of data citation.

Reference Management and Digital Literacy

The first presentation after lunch was me, so you’ll have to wait for a blog post on ‘References on the Web’ until I get to write it up!

Now Helen Curtis going to talk about the links between Digital (and Information) literacy and Reference Management. [apologies to Helen I didn’t capture this as well as I would have liked – she covered much more than I’ve got here] – her slides are at http://www.slideshare.net/helencurtis/

At the University of Wolverhampton, long history of using EndNote, used mainly by staff – researchers and postgraduates.

Few drivers to change this – in 2006 University introduced ‘Blended Learning Strategy’; seeing increased use of online resources; development of graduate attributes – including digital literacy. Also other drivers – impact of web technologies and growing concerns around academic conduct and plagiarism.

Role for reference management:

  • significant for digital literacy
  • use tools to develop information management skills
  • less emphasis on learning particular s/w – more on behaviour and application
  • Become much more relevant to undergraduate use
  • new and emerging tools are web-based

Seeing move from personal list of references, aimed at researchers with large lists of references, to more flexible tools – sharing and collaboration becoming more significant.

New approaches:

  • Teach principles of information and reference management
  • Involvement in curriculum design/team teaching
  • Linking use to assessment
  • Using the tools to aid understanding of referencing and constructing a reference
  • Using the tools as evidence of engagement with scholarly resources
  • Exploiting the sharing collaboration features

Introduced group of students to using EndNote web – got v positive feedback – (paraphrased) ‘this was the first assignment where I didn’t lose marks on referencing’

Reading Lists and References

Most University courses offer some lists of ‘recommended reading’ to their students, in this session we’ve got three presentations on ‘reading list’ systems from librarians.

University of Plymouth: Aspire (Jayne Moss)

Wanted reading list system to help improve service to students, and manage stock better in library. Decided to work with Talis – felt they could work with the company.

The key features they were looking for were:

  • had to be a tool for the academic
  • had to be easy to use – intuitive
  • designed to match the academic workflow

Worked with Talis, ran focus groups with academics, found out about the academic workflow – found a huge variety of practice. Boiled down to:

  • Locate resource
  • Capture details/Bookmark
  • Create list
  • Publish list

Integrated with DOI/Crossref lookup. Encourage academics to give access to librarians so they can check details etc.

Once you have list of ‘bookmarks’ can just drag them into reading list.

Student experience

  • v positive feedback
  • easy to use
  • links to lists embedded in their teaching site
  • liked ability to add notes (which are private to them)

Students can also tag items – although Jayne not convinced this is used much

Library experience

  • Displays availability taken from the university’s library catalogue – much easier than in the catalogue interface!
  • Great way of engaging faculty
  • Getting accurate reading lists
  • Developed good relationship with Talis
  • Get to influence ongoing development (e.g. link from adding item to reading list to creating order in library acquisitions system)

Future developments

  • Aspire built on semantic tech
  • enable academics to build ‘better’ lists
  • enable students to collaborate and connect lists – e.g. create an annotated bibliography
  • smarter workflows – e.g. link to library acquisitions

University of Lincoln: LearnBuild LibraryLink (Paul Stainthorp)

Paul reflecting that Lincoln only partially successful in implementing ‘reading lists’.

University of Lincoln – bought a reading list system, but funds were only available for a short period, so had limited time to assess full requirements and how far the chosen product met them.

Nevertheless, there were some successes:
  • filled a void
  • improved consistency
  • gave library an ‘in’ on launch of new VLE (Blackboard)
  • hundreds of modules linked in by 2000
  • students are using them – have usage stats from both LearnBuild and Blackboard
  • some simple stock-demand prediction

Unfortunately there were quite a few areas not so successful:

  • not intuitive; time-consuming
  • software not being developed
  • no community of users
  • competing developments (EPrints, digitisation, OPAC, RefWorks)
  • too closely linked to Blackboard module system
  • Subject librarians don’t like it, but lack of uptake from academics means that it is the subject librarians who end up doing the work.

However, unless the library can demonstrate success, it is unlikely to get money to buy a better system… So the library is putting more effort into making it work.

Paul saying because they are in this situation, they have been thinking laterally, and going to come at it from a different angle. Library has an opportunity to do some ‘free’ development work – funding with no strings attached.

Created “Jerome” (patron saint of libraries) – a library unproject.

Taking some inspiration from the TELSTAR project (yay) – hope to use RefWorks webservices and regain some control for the library

The Open University: TELSTAR (Anna Hvass)

Anna talking about traditional course production mechanism at the OU – printed materials written and sent out to students. Although more delivery online now, still a huge team of people involved in writing and delivering an OU course – from writers to editors to media producers to librarians. Can take anything up to 2 years to produce a course.

Currently when creating resource lists there is a huge variation of practice – every course, faculty and librarian can have a different approach! Until TELSTAR there were several tools that could be used – but not integrated together, and not used consistently.

TELSTAR developed ‘MyReferences’ – a place where you can collect references, create bibliographies etc. You can also run ‘reference reports’ which give you previews of what references will look like in the course website.

You can create ‘shared accounts’ in MyReferences which you can use to share a single account between whole course team. Also include librarian, editors, etc. in shared accounts.

Can create and edit references. Once finished, can pull list through into course website. When references display in course website get links to online resources. Students can also export references back from lists in course website – can add references to blogs, forum posts etc. using ‘Collaborative activities’ export. Can export it to their own ‘MyReferences’ account. Can export it to other packages via RIS files.

Once student has collected references in MyReferences they can create bibliographies etc.

Main benefits:

  • Makes it easier for course teams to work together – and gives them control which they like
  • Once you have course teams working on lists together, many other aspects of library integration into courses come more easily
  • Students don’t have to go to another system or another login to use it

Positive feedback from students and staff so far. Now looking at further developments – and to keep selling it to course teams!

IRM2: Zotero – latest developments

Next session this morning is from Rintze Zelle – who has become part of the Zotero community and has been part of the core team developing CSL 1.0

Zotero (http://www.zotero.org/) – free, open source, and has been developed as a Firefox extension. Rintze starting off with a demo of Zotero.

Zotero ‘translators’ – custom pieces of code that ‘scrape’ bibliographic details from specific webpages – e.g. PubMed Central – will create a Zotero record for the item, and include a link back to the original web page. Zotero can also capture pdfs at the same time where available. There are ‘translators’ available for a wide variety of journals and publisher platforms (and library catalogues) etc. Rintze also showing how translator for Google Scholar offers to download all items in a search result, or you can pick the ones you want to import to Zotero.

Zotero also allows you to add items by identifier – e.g. DOI, ISBN etc.; Also can extract metadata from pdfs if you import them into Zotero.
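The ‘add by identifier’ step relies on recognising what kind of identifier you have been given. A rough sketch of that detection in Python (not Zotero’s actual code – the DOI pattern is a simplification, and the function names are made up):

```python
import re

# Simplified DOI shape: '10.' + registrant digits + '/' + suffix
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(s):
    """Rough syntactic check for a DOI, e.g. '10.1000/182'."""
    return bool(DOI_RE.match(s.strip()))

def valid_isbn13(s):
    """Validate an ISBN-13 using its weighted mod-10 check digit
    (digits weighted 1, 3, 1, 3, ... must sum to a multiple of 10)."""
    digits = [int(c) for c in s if c.isdigit()]
    if len(digits) != 13:
        return False
    total = sum(d * (1 if i % 2 == 0 else 3) for i, d in enumerate(digits))
    return total % 10 == 0
```

A tool would route a recognised DOI to a metadata lookup service and a valid ISBN to a book catalogue, falling back to a plain search otherwise.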

Zotero supports wide range of material types – books, articles, audio/video recordings (e.g. import data from Amazon page for DVD), blog posts, etc. etc.

Can import files – e.g. RIS files
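RIS is a simple line-oriented format (tagged lines like ‘TY  - JOUR’, with each record ending at ‘ER’), so a minimal importer is short. A toy parser (illustrative only – a real importer would also handle continuation lines and tag-specific field mapping):

```python
def parse_ris(text):
    """Parse RIS records into dicts. A record runs from its TY line
    to its ER line; repeated tags (e.g. multiple AU authors)
    accumulate into lists."""
    records, current = [], None
    for line in text.splitlines():
        # RIS lines are: 2-char tag, two spaces, '- ', then the value
        if len(line) < 6 or line[2:6] != "  - ":
            continue  # skip blank or malformed lines in this toy parser
        tag, value = line[:2], line[6:].strip()
        if tag == "TY":
            current = {"TY": value}
        elif current is None:
            continue  # value outside any record
        elif tag == "ER":
            records.append(current)
            current = None
        else:
            current.setdefault(tag, []).append(value)
    return records
```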

Can organise your Zotero library – create folders, use tags

Can create a bibliography – just select references from your Zotero library and drag them into a text editor – it will paste styled references (your choice of styling) into the editor (if you keep the shift key pressed when you drag and drop, you will get in-text citation style instead). Zotero also has plugins for Word and Open Office.

Zotero sits somewhere between a full desktop client and an online service. All references in your Zotero library are stored locally on your computer, but you can sync to an online store (for free). Can sync just references, or pdfs/full-text as well – but limited to 100MB (free). You can pay for more space, or use your own WebDAV-compliant storage.

Zotero supports ‘Groups’ online – you can join groups and share references with others, or collaborate on bibliographies/research etc. Groups have ‘online libraries’ where you can view all the references in the group library, and you can access an RSS feed from the library. However you cannot currently edit the references online – you have to do this via the Firefox extension.

Zotero forums are quite active, and good place to go for support.

Rintze now going to introduce some new features coming to Zotero.

Zotero Commons

This project started in 2007, but is still in development. Zotero Commons is a collaboration with the Internet Archive. Takes sharing references much further than the current ‘groups’. Zotero Commons will offer permanent storage for open materials at the Internet Archive – will assign permanent, unique archive URLs. [looks like basically an alternative to current Open Archiving solutions?]


Already there is easy access to the Client API – easy way of extending the client. For example there is an add-on that plots locations from publications on to a map [I guess particularly good for conference papers]

There is a Web API, currently ‘read-only’, but read-write access is coming.

Standalone Client

This will be a version of Zotero that is independent of Firefox – you won’t need to install and run Firefox. Will give better use of screen real estate (e.g. on netbooks), and provide better integration with other browsers via APIs

Citation Style Language (CSL) 1.0

CSL is a free and open XML language for describing citation styles. Zotero 2.0 and Mendeley both support CSL 0.8, and there are over 1,000 styles available.

CSL 1.0 allows for localization – e.g. of punctuation, dates and terms. Rintze showing some differences between US and Dutch formats – e.g. use of ‘accessed’ vs ‘bezocht’ to show the date an online version of a resource was accessed.

Name Particles – e.g. the ‘van’ in Ludwig van Beethoven. Styles differ in how they handle these. CSL 1.0 allows for different practices. Rintze mentions example of a paper he submitted, he was told references not correctly sorted, because publisher handled these name fragments differently.
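The two particle conventions amount to different sort keys. A hedged illustration in Python (the particle list and function are made up for this example – this is the idea, not CSL’s actual mechanism):

```python
# Illustrative subset of name particles; real lists are much longer
PARTICLES = {"van", "von", "de", "der", "den", "la", "le"}

def sort_key(name, demote_particles=True):
    """Build a sort key for a 'Given [particles] Family' name string.
    With demote_particles=True, 'Ludwig van Beethoven' sorts under
    'Beethoven'; with False, under 'van Beethoven' -- the two
    conventions styles differ on."""
    words = name.split()
    family_start = len(words) - 1
    if not demote_particles:
        # absorb any run of particles immediately before the family name
        while family_start > 0 and words[family_start - 1].lower() in PARTICLES:
            family_start -= 1
    family = " ".join(words[family_start:])
    given = " ".join(words[:family_start])
    return (family.lower(), given.lower())
```

A bibliography sorted with one convention will look wrongly ordered to a publisher that expects the other – exactly the problem Rintze described with his submitted paper.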

CSL 1.0 allows the use of rich text in formatting – so allows for things such as sub- and super-scripts.

CSL 1.0 more mature than previous versions. Increasing support from other developers – and development of CSL processors. citeproc-js will be integrated into Zotero 2.1 release – so this will be first Zotero release to support new features.

Q & A

Couple of interesting questions to highlight:

Q: Why isn’t everyone using Zotero?

A: Still some problems – e.g. things solved by CSL 1.0 like rich-text in references. Wouldn’t necessarily recommend to non-technical users quite yet

Q: When will standalone client be available, because not allowed to use Firefox in NHS in UK

A: No date; small development team so new developments take time

Presentation online at http://www.slideshare.net/rintzezelle/zotero-innovations-in-reference-management

(More) Innovations in Reference Management

Today is the second ‘Innovations in Reference Management’ event, which the TELSTAR project has organised as part of its dissemination and ‘Benefits Realisation’ activity.

The day is starting off with an introduction from Nicky Whitsed, Director of Library Services at the Open University. She reflects that it was 22 years ago that she was involved in implementing the ‘Reference Manager’ software (interestingly in medicine) – and highlighting the various trends that are pushing innovations in the area today – Information Literacy, Linked Data, the need to cite datasets as opposed to more traditional publications.

Now Martin Fenner going to talk about Trends in Reference Management. Also going back to ‘Reference Manager’ in 1985 – a personal store of knowledge and an index for a (print) offprint collection. Soon after this it became possible to import references using standards like Z39.50. By 1992 an article about EndNote Plus said “It is hard to imagine a reprint file management and bibliography generation program that does more than EndNote Plus”; “it automatically assembles bibliographies from inserted in-text citations”. Martin says we shouldn’t forget that for most researchers this is still the main use of reference management packages – and that things have not really changed much on this front since 1992.

However, then we have the web. Where previously we had Index Medicus, now we have PubMed freely available online. In 2000 the DOI was introduced. The web and online activity prompted questions of how we could share references with others. Some reference management software is completely online – there is only one copy of the reference, stored online; other packages synchronize local copies of data with online storage (EndNote and Zotero take this approach). While there are many reasons to share references, Martin bringing us back to writing for publication – the fact that you may be writing collaboratively and need to share references – and also the new online authoring environments such as Google Docs, Office Live, Buzzword etc. However, so far we haven’t seen good integrations of reference managers into these online writing tools. Martin suspects this is because of the niche nature of reference management.

Another idea that is perhaps obvious (says Martin) but took a while to be developed is storage of electronic copies of papers (usually pdf). Now seeing software which does this: ‘Papers’ – new software for Mac that manages references and pdfs (looks very much like iTunes). Also Mendeley, recently launched, which also manages pdfs. Many other packages allow you to attach pdfs to references, but not as tightly integrated as Papers and Mendeley.

However, once you have sharing, and you have attachment of full-text, you immediately have copyright questions raised. Even where there are more permissive licenses – such as Creative Commons – it may be that terms such as ‘Non commercial’ can cause complications – as this is about how the copy is used, not whether you make a copy.

By 2009 there are a wide range of reference management tools – Martin shows a list of 15, but notes this is only a small subset of the available software/services. Martin says while they all tend to do the ‘basic’ tasks, there is a wide variety of additional features, and prices also vary (starting at ‘free’ for Zotero and Mendeley). But as an institution you won’t have the resource to support them all, so have to make a decision.

Martin now highlighting a few of the more recent developments in reference management software:

Mobile interfaces – iPhone apps (notes the Nature iPhone app delivers papers in ePub format, not pdf). All references in the Nature iPhone app are linked to online copies etc. Also an iPhone app from Scopus – includes alerts etc. The iPad makes web-based services like citeulike usable on a portable device; Cell has been experimenting with different formats for articles online – not just pdf, and also additional services linked to the document – but requires Flash, so doesn’t work on the iPad! PLoS has an iPad app.

Challenge – does every journal get its own interface via an app? ‘Papers’ for Mac has an iPad version – can sync between desktop and iPad – gives a single interface to all your pdfs

So Martin highlights:

  • variety of mobile formats: PDF; ePub; HTML; Flash
  • different types of mobile service: alerts; readers etc.

Martin now highlighting attempts to introduce unique identifiers for authors – mentioning ORCID (Open Researcher and Contributor ID, http://www.orcid.org/). Previous schemes have been limited – by geography, discipline or publisher. ORCID is meant to be universal. The number of ways an author name can be expressed in a ‘reference’ is extremely large – even for relatively unique names. Also specific challenges dealing with names which are expressed in a different script to native language – e.g. Chinese names written in English script.

Idea is that when you submit a manuscript for publication, you have to submit your ORCID. Easier to do this for all papers going forward – the challenge of going back and doing it for all historical publications is probably too big a job.

ORCID could be used not just for authors, but for other roles – e.g. reviewers, compilers (of data), programmers (of software).

Now over 100 organisations participating in ORCID initiative – but still much work to be done and things to be worked out. Has been agreed that the software developed by Thomson Reuters for their ‘ResearcherID’ will be reused to provide infrastructure/software.

Martin hopes to see reference management software adopting use of ORCID in 1-2 year timescale.

Will start to see new services based on ORCID – e.g. like biomedexperts – can summarise an author’s expertise, and also see connections between authors (e.g. who has co-published with whom).

Martin mentions use of BibApp which allows collection of publications information for all researchers within an institution (open source software developed at the University of Wisconsin and the University of Illinois)

Martin mentions ‘CRIS’ (Current Research Information Systems) – good identifiers such as DOI and ORCID really help with these.

Martin suggests that using ORCID could make it easier to reference new types of ‘publication’ – e.g. blog posts, and see links between papers and blog posts written by same author.

Martin mentioning ‘DataCite’ for citing datasets – we will hear more about this later today from Kevin Ashley I expect.

Finally Martin saying – references now appear everywhere – published on the web – we need ways of finding and capturing them. Also look at ways of assessing ‘importance’ – citation counting is the traditional way of doing this. Now PLoS looks at page views and pdf downloads as well as citation counts – what they are calling ‘article level metrics’ – while this is a common concept in social media, it isn’t commonplace in scientific literature.

Also, it’s not just about metrics but quality. Services like ‘Research Blogging’ and ‘Faculty of 1000’. Twitter also growing in usage – can be a good way of discovering references, but how do you get them into your reference manager? (I’ll mention something about this later today in my session)