Data citation and referencing
Last talk of the day from Kevin Ashley (from the Digital Curation Centre) – he says if you don’t know what data citation is now, he hopes he will be able to tell you why you should care about it and why it will be important in the future.
Kevin mentioning the DCC Curation Lifecycle Model – but today’s talk is focussing only on one aspect – Access, Use and Reuse.
So – why should we care about data citation? Kevin giving example of paper on LIDAR and RADAR images of ice clouds – in paper, only images – not the data used to create those images. Kevin showing how data can be misrepresented – showing graphs that don’t start at zero on one scale can lead to misleading conclusions.
So – data behind graphs can be very important. Kevin says that data used to support statements in publication should be as accessible as the publication – so statements and findings can be examined and challenged.
Kevin showing how you can misrepresent data – e.g. by taking a subset of results (that happen to favour a particular conclusion) – the data published is not always (all of) the data collected. Kevin mentioning a few texts on this – and my favourite, which I was googling as he spoke: ‘How to Lie with Statistics’ by Darrell Huff.
Kevin giving example of studying Biodiversity – requires many different data sources, some of which won’t be published, some of which won’t have been compiled through academic research…
All of these issues mean we really ought to care about data citation.
‘Data is Different’. With traditional bibliographic resources it has basically come from a ‘print’ paradigm – i.e. ‘published’ – we’ve moved online with many of these resources, but still fundamentally the same – you ‘publish’ something and then you cite it.
However, a data set may be added to on a continuing basis – a telescope may be collecting more and more data all the time. What you cite now may be different by tomorrow (Kevin draws a parallel to citing web resources like blogs).
So – approaches to dealing with this:
- Giving data digital object identifiers (e.g. datacite)
- Capturing data subsets at a point of publication
- Freezing those subsets somewhere
- Publication led
These work well in certain areas
Approaches 2:
- Dataverse (thedata.org) – submit your data, get a checksum (so you can check if it has changed since publication) and citation and publish
- Ebank/ecrystals – harvest, store, cite
- DataCite – working at national level with libraries and data centers
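The checksum idea in the Dataverse item above can be sketched in a few lines. This is only an illustration of the principle: the use of SHA-256 and the function names are my assumptions, not a description of what Dataverse actually computes.

```python
import hashlib


def dataset_checksum(data: bytes) -> str:
    """Return a SHA-256 hex digest of the dataset bytes (algorithm is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def is_unchanged(data: bytes, published_checksum: str) -> bool:
    """True if the data still matches the checksum recorded at publication."""
    return dataset_checksum(data) == published_checksum


# Made-up sample data, purely for illustration.
published = b"site,temp_c\nA,1.5\nB,2.0\n"
checksum_at_publication = dataset_checksum(published)

# A later copy in which one value has been edited.
later_copy = b"site,temp_c\nA,1.5\nB,2.1\n"

print(is_unchanged(published, checksum_at_publication))
print(is_unchanged(later_copy, checksum_at_publication))
```

Storing the digest alongside the citation lets a reader confirm that the data they downloaded is the data that was cited.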
However, data changes and can be very, very big – it can be changing by the second, and be petabytes in size. If you take a ‘publication’ approach – it may not be apparent that four different references to subsets of data are actually all part of the same dataset.
One way of dealing with ‘big data’ issue – rather than making copies – keep change records – create reference mechanisms that allow reference to a specific change point – Kevin mentioning Memento as a possible model for this.
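A minimal sketch of the ‘change records’ idea, assuming a simple keyed dataset and an append-only log. The `ChangeLog` class and its integer change points are hypothetical simplifications of my own (Memento itself works with datetimes over HTTP, which is not modelled here).

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Append-only record of changes to a keyed dataset (hypothetical structure)."""

    changes: list = field(default_factory=list)  # (key, value) tuples, in order applied

    def record(self, key, value):
        """Append a change and return its change point (1-based position in the log)."""
        self.changes.append((key, value))
        return len(self.changes)

    def state_at(self, change_point):
        """Replay the log up to and including change_point to rebuild that version."""
        state = {}
        for key, value in self.changes[:change_point]:
            state[key] = value
        return state


log = ChangeLog()
log.record("obs-001", 4.2)
cited = log.record("obs-002", 3.9)  # a paper cites the dataset at this change point
log.record("obs-001", 4.3)          # the dataset keeps changing afterwards

print(log.state_at(cited))
print(log.state_at(len(log.changes)))
```

No copy of the data is frozen anywhere: the citation only needs to carry the change point, and the cited version is rebuilt on demand.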
Another alternative is using ‘annotation’ rather than citations. When data sources have many (thousands of) contributors, instead of citing data sources in publications, annotate data sources with publications. Example of ‘Mondrian’ approach where blocks of colour are assigned based on what types of annotation there are for different parts of the dataset. Turns data set into something that can be challenged in itself…
Kevin mentioning Buneman’s desiderata (see http://homepages.inf.ed.ac.uk/opb/homepagefiles/harmarnew.pdf)
Kevin concerned that the tools we have now aren’t quite ready for the challenges of data citation.
Reference Management and Digital Literacy
The first presentation after lunch was me, so you’ll have to wait for a blog post on ‘References on the Web’ until I get to write it up!
Now Helen Curtis going to talk about the links between Digital (and Information) literacy and Reference Management. [apologies to Helen I didn't capture this as well as I would have liked - she covered much more than I've got here] – her slides are at http://www.slideshare.net/helencurtis/
At the University of Wolverhampton, long history of using EndNote, used mainly by staff – researchers and postgraduates.
Few drivers to change this – in 2006 University introduced ‘Blended Learning Strategy’; seeing increased use of online resources; development of graduate attributes – including digital literacy. Also other drivers – impact of web technologies and growing concerns around academic conduct and plagiarism.
Role for reference management:
- significant for digital literacy
- use tools to develop information management skills
- less emphasis on learning particular s/w – more on behaviour and application
- Become much more relevant to undergraduate use
- new and emerging tools are web-based
Seeing move from personal list of references, aimed at researchers with large lists of references, to more flexible tools – sharing and collaboration becoming more significant.
New approaches:
- Teach principles of information and reference management
- Involvement in curriculum design/team teaching
- Linking use to assessment
- Using the tools to aid understanding of referencing and constructing a reference
- Using the tools as evidence of engagement with scholarly resources
- Exploiting the sharing and collaboration features
Introduced group of students to using EndNote web – got v positive feedback – (paraphrased) ‘this was the first assignment where I didn’t lose marks on referencing’
Reading Lists and References
Most University courses offer some lists of ‘recommended reading’ to their students, in this session we’ve got three presentations on ‘reading list’ systems from librarians.
University of Plymouth: Aspire (Jayne Moss)
Wanted reading list system to help improve service to students, and manage stock better in library. Decided to work with Talis – felt they could work with the company.
The key features they were looking for were:
- had to be a tool for the academic
- had to be easy to use – intuitive
- designed to match the academic workflow
Worked with Talis, ran focus groups with academics, found out about the academic workflow – found a huge variety of practice. Boiled down to:
- Locate resource
- Capture details/Bookmark
- Create list
- Publish list
Integrated with DOI/Crossref lookup. Encourage academics to give access to librarians so they can check details etc.
Once you have list of ‘bookmarks’ can just drag them into reading list.
Student experience
- v positive feedback
- easy to use
- links to lists embedded in their teaching site
- liked ability to add notes (which are private to them)
Students can also tag items – although Jayne not convinced this is used much
Library experience
- Displays availability taken from university’s library catalogue – much easier than in catalogue interface!
- Great way of engaging faculty
- Getting accurate reading lists
- Developed good relationship with Talis
- Get to influence ongoing development (e.g. link from adding item to reading list to creating order in library acquisitions system)
Future developments
- Aspire built on semantic tech
- enable academics to build ‘better’ lists
- enable students to collaborate and connect lists – e.g. create an annotated bibliography
- smarter workflows – e.g. link to library acquisitions
University of Lincoln: LearnBuild LibraryLink (Paul Stainthorp)
Paul reflecting that Lincoln only partially successful in implementing ‘reading lists’.
University of Lincoln – bought reading list system, funds were only available for short period, so had limited time to assess full requirements and how far chosen product met their requirements.
Successes:
- filled a void
- improved consistency
- gave library an ‘in’ on launch of new VLE (Blackboard)
- hundreds of modules linked in by 2000
- students are using them – have usage stats from both LearnBuild and Blackboard
- some simple stock-demand prediction
Unfortunately there were quite a few areas not so successful:
- not intuitive; time-consuming
- software not being developed
- no community of users
- competing developments (EPrints, digitisation, OPAC, RefWorks)
- too closely linked to Blackboard module system
- Subject librarians don’t like it, but lack of uptake from academics means that it is the subject librarians who end up doing the work.
However, unless library can demonstrate success, unlikely to get money to buy better system… So library putting more effort into making it work.
Paul saying because they are in this situation, they have been thinking laterally, and going to come at it from a different angle. Library has an opportunity to do some ‘free’ development work – funding with no strings attached.
Created “Jerome” (patron saint of libraries) – a library unproject.
Taking some inspiration from the TELSTAR project (yay) – hope to use RefWorks webservices and regain some control for the library
The Open University: TELSTAR (Anna Hvass)
Anna talking about traditional course production mechanism at the OU – printed materials written and sent out to students. Although more delivery online now, still a huge team of people involved in writing and delivering an OU course – from writers to editors to media producers to librarians. Can take anything up to 2 years to produce a course.
Currently when creating resource lists there is a huge variation of practice – every course, faculty and librarian can have a different approach! Until TELSTAR there were several tools that could be used – but not integrated together, and not used consistently.
TELSTAR developed ‘MyReferences’ – a place where you can collect references, create bibliographies etc. You can also run ‘reference reports’, which give you previews of what references will look like in the course website.
You can create ‘shared accounts’ in MyReferences which you can use to share a single account between whole course team. Also include librarian, editors, etc. in shared accounts.
Can create and edit references. Once finished, can pull list through into course website. When references display in course website get links to online resources. Students can also export references back from lists in course website – can add references to blogs, forum posts etc. using ‘Collaborative activities’ export. Can export it to their own ‘MyReferences’ account. Can export it to other packages via RIS files.
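For anyone unfamiliar with RIS, it is a simple tagged text format: one field per line, each record starting with `TY` and ending with `ER`. The sketch below shows roughly what an export might produce. The `to_ris` function and the dictionary layout are my own illustration, not TELSTAR’s code, and only a handful of RIS tags are covered.

```python
def to_ris(ref: dict) -> str:
    """Serialise one reference as an RIS record (small subset of tags only)."""
    lines = [f"TY  - {ref['type']}"]          # record type comes first
    for author in ref.get("authors", []):
        lines.append(f"AU  - {author}")       # one AU line per author
    for tag, key in (("TI", "title"), ("PY", "year"), ("UR", "url")):
        if key in ref:
            lines.append(f"{tag}  - {ref[key]}")
    lines.append("ER  - ")                    # end-of-record marker
    return "\n".join(lines)


# Example record, using the book mentioned earlier in these notes.
book = {
    "type": "BOOK",
    "authors": ["Huff, Darrell"],
    "title": "How to Lie with Statistics",
}

ris = to_ris(book)
print(ris)
```

Because the format is plain text, most reference managers can import a file like this without any special integration.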
Once student has collected references in MyReferences they can create bibliographies etc.
Main benefits:
- Makes it easier for course teams to work together – and gives them control which they like
- Once you have course teams working on lists together, many other aspects of library integration into courses come more easily
- Students don’t have to go to another system or another login to use it
Positive feedback from students and staff so far. Now looking at further developments – and to keep selling it to course teams!
IRM2: Zotero – latest developments
Next session this morning is from Rintze Zelle – who has become part of the Zotero community and has been part of the core team developing CSL 1.0
Zotero (http://www.zotero.org/) - free, open source, and has been developed as a Firefox extension. Rintze starting off with a demo of Zotero.
Zotero ‘translators’ – custom pieces of code that ‘scrape’ bibliographic details from specific webpages – e.g. PubMed Central – will create a Zotero record for the item, and include a link back to the original web page. Zotero can also capture pdfs at the same time where available. There are ‘translators’ available for a wide variety of journals and publisher platforms (and library catalogues) etc. Rintze also showing how translator for Google Scholar offers to download all items in a search result, or you can pick the ones you want to import to Zotero.
Zotero also allows you to add items by identifier – e.g. DOI, ISBN etc.; Also can extract metadata from pdfs if you import them into Zotero.
Zotero supports wide range of material types – books, articles, audio/video recordings (e.g. import data from Amazon page for DVD), blog posts, etc. etc.
Can import files – e.g. RIS files
Can organise your Zotero library – create folders, use tags
Can create a bibliography – just select references from your Zotero library and drag them into a text editor, and it will paste styled references (your choice of styling) into the editor (if you keep the shift key pressed when you drag and drop, you will get in-text citation style instead). Zotero also has plugins for Word and Open Office.
Zotero sits somewhere between a full desktop client and an online service. All references in your Zotero library are stored locally on your computer, but you can sync to an online store (for free). Can sync just references, or you can sync pdfs/full-text as well – but limited to 100MB (free). You can pay for more space, or use your own WebDAV-compliant storage.
Zotero supports ‘Groups’ online – you can join groups and share references with others, or collaborate on bibliographies/research etc. Groups have ‘online libraries’ where you can view all the references in the group library, and you can access an RSS feed from the library. However you cannot currently edit the references online – you have to do this via the Firefox extension.
Zotero forums are quite active, and good place to go for support.
Rintze now going to introduce some new features coming to Zotero.
Zotero Commons
This project started in 2007, but is still in development. Zotero Commons is a collaboration with the Internet Archive. Takes sharing references much further than current ‘groups’. Zotero Commons will offer permanent storage for open materials at the Internet Archive – will assign permanent, unique archive URLs. [looks like basically an alternative to current Open Archiving solutions?]
APIs
Already there is easy access to the Client API – easy way of extending the client. For example there is an add-on that plots locations from publications on to a map [I guess particularly good for conference papers]
There is a Web API; it is currently ‘read-only’, but read-write is coming.
Standalone Client
This will be a version of Zotero that is independent of Firefox – you don’t need to install and run Firefox. Will give better use of screen estate (e.g. on netbooks), and provide better integration with other browsers via APIs
Citation Style Language (CSL) 1.0
CSL is a free and open XML language to describe citation styles. Zotero 2.0 and Mendeley both support CSL 0.8, and there are over 1000 styles available.
CSL 1.0 allows for localization, e.g. of punctuation, dates and terms – Rintze showing some differences between US and Dutch formats – e.g. use of ‘accessed’ vs ‘bezocht’ to show the date an online version of a resource was accessed.
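To make the localization point concrete, here is a rough sketch of the idea: the style says ‘print the accessed term and a date’, and the locale supplies the word and the date layout. The `LOCALES` table and `accessed_note` function are hypothetical and do not reflect how CSL locale files or citeproc-js are actually structured.

```python
from datetime import date

# Hypothetical locale data: the term for 'accessed', month names, and date layout.
LOCALES = {
    "en-US": {
        "accessed": "accessed",
        "months": ["January", "February", "March", "April", "May", "June",
                   "July", "August", "September", "October", "November", "December"],
        "date_format": "{month} {day}, {year}",
    },
    "nl-NL": {
        "accessed": "bezocht",
        "months": ["januari", "februari", "maart", "april", "mei", "juni",
                   "juli", "augustus", "september", "oktober", "november", "december"],
        "date_format": "{day} {month} {year}",
    },
}


def accessed_note(when: date, locale: str) -> str:
    """Render the 'accessed on' note for a reference in the given locale."""
    loc = LOCALES[locale]
    month = loc["months"][when.month - 1]
    formatted = loc["date_format"].format(day=when.day, month=month, year=when.year)
    return f"{loc['accessed']} {formatted}"


print(accessed_note(date(2010, 3, 5), "en-US"))  # accessed March 5, 2010
print(accessed_note(date(2010, 3, 5), "nl-NL"))  # bezocht 5 maart 2010
```

The same style can then be reused across languages, with only the locale data changing.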
Name Particles – e.g. the ‘van’ in Ludwig van Beethoven. Styles differ in how they handle these. CSL 1.0 allows for different practices. Rintze mentions example of a paper he submitted, he was told references not correctly sorted, because publisher handled these name fragments differently.
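The sorting problem Rintze describes is easy to see in a small sketch. The `demote_particle` flag below is a simplified stand-in of my own for a style-level setting, and the name list is just an illustration.

```python
def sort_key(family: str, particle: str, demote_particle: bool) -> str:
    """Build a sort key for a name, with or without its particle at the front."""
    if demote_particle or not particle:
        return family.lower()                  # 'van Beethoven' sorts under B
    return f"{particle} {family}".lower()      # 'van Beethoven' sorts under V


# (family name, particle) pairs; illustrative names only.
names = [("Zelle", ""), ("Beethoven", "van"), ("Mozart", "")]

demoted = sorted(names, key=lambda n: sort_key(n[0], n[1], demote_particle=True))
kept = sorted(names, key=lambda n: sort_key(n[0], n[1], demote_particle=False))

print([family for family, _ in demoted])  # ['Beethoven', 'Mozart', 'Zelle']
print([family for family, _ in kept])     # ['Mozart', 'Beethoven', 'Zelle']
```

Both orderings are ‘correct’ under some house style, which is why a publisher and an author can disagree about whether a reference list is sorted properly.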
CSL 1.0 allows the use of rich-text in formatting – so allows for use of things such as sub- and super-scripts.
CSL 1.0 more mature than previous versions. Increasing support from other developers – and development of CSL processors. citeproc-js will be integrated into Zotero 2.1 release – so this will be first Zotero release to support new features.
Q & A
Couple of interesting questions to highlight:
Q: Why isn’t everyone using Zotero?
A: Still some problems – e.g. things solved by CSL 1.0 like rich-text in references. Wouldn’t necessarily recommend to non-technical users quite yet
Q: When will standalone client be available? Asking because not allowed to use Firefox in NHS in UK
A: No date; small development team so new developments take time
Presentation online at http://www.slideshare.net/rintzezelle/zotero-innovations-in-reference-management