How useful are standard RSS feeds for research repositories?
There is no doubt that a very good way to embed an institutional repository across campus is to encourage people to create publication feeds from it: for example, in individual staff pages, research group pages, or (as we are doing here at the OU at the moment) in an online research degrees prospectus. Not only does this help demonstrate to Faculty that the repository has uses, it also encourages academics to keep depositing their publications, so that the feeds built from the repository stay complete and current.
The most commonly used feed system is of course RSS, and all repository software comes RSS-ready, out-of-the-box. But exactly how useful are RSS feeds for the type of content a research repository contains? I would argue not very, and this has long been a concern of mine. The reason is, like RSS, really simple: standard RSS feeds do not deliver repository content in an order which is useful for Faculty pages, i.e. by date of publication.
I mentioned above that we are embedding publication feeds from ORO in our soon-to-be-launched online research degrees prospectus. When I was approached about this I explained that RSS feeds would be very easy to implement, but that they would deliver the most recent content added to the repository, and not necessarily the most recently published items. Nevertheless, it was decided to go ahead, mainly due to tight time schedules. I suspected that when the prospectus went out for approval to Faculties this decision would come back to bite, and I was right.
RSS feeds provide a reflection of recent activity in the repository, and not necessarily recently-published research. We are in the process of uploading a selection of exemplar (but old) PhD theses at the moment, so naturally these appeared in the RSS feeds for the prospectus. Also, in another area, one particular person had been spending some time depositing a large number of his publications, and so the RSS feed consisted only of that person’s work.
Of course, there are solutions. It is quite easy to rewrite an RSS feed so that it is delivered in a different order, and this is indeed what we are doing for the research degrees prospectus. However, RSS is a standard, and so we cannot really change it for the whole site. Just because someone wants their RSS feeds delivered like this, does that mean the next person will? But I return to the original question of this post: exactly how useful are standard RSS feeds for research repository content? Although “recently added” probably has a use, I think “recently published” has more.
December 1st, 2009 at 11:44 am
I think there are a few different issues to untangle here. RSS was originally designed for ‘latest’ news to be published, and this is how it generally functions – perhaps most specifically for blog posts.
However, some people realised that RSS delivered a lightweight way of sharing information in a format-independent way, and started to push the standard in a slightly different direction. This met with resistance from a community for whom the simplicity of RSS was one of its major strengths.
This has resulted in two divergent forms of RSS:
RSS 0.9x and RSS 2.x are the simpler form, which essentially contain headlines, descriptions and links
RSS 1.x is the more sophisticated form that can contain (I think) arbitrary amounts of data in RDF
Alongside this was the development of the Atom format – which had some similarities with RSS for sharing information, but also came with a ‘publication’ side. The Atom Publishing Protocol is used for the SWORD API being adopted by some repositories http://www.ukoln.ac.uk/repositories/digirep/index/SWORD_APP_evaluation
ORO supports (as far as I can see) RSS 1.0, RSS 2.0 and Atom. Two of these (1.0 and Atom) will support sophisticated metadata, and there is no reason this shouldn’t contain a ‘publication date’ for the items in the repository – although as far as I can see this information isn’t in any of the feeds currently? (perhaps you could use dct:date or dct:dateAccepted)
Given this information, there would further be no problem in re-ordering the items in the feed to be in publication date order.
It’s probably worth noting that most RSS clients/scripts are designed to read items in ‘added’ (to feed) order (or reverse order), and so it isn’t necessarily easy to guarantee that others consuming your feed would get the items in the ‘intended’ order. However, I think it would be a trivial task to re-write an RSS feed to substitute the ‘date published’ for the ‘date added’ – using a script, or a tool such as Yahoo Pipes.
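As a rough illustration of the script route, here is a minimal Python sketch that re-orders the items of an RSS 2.0 feed by publication date, newest first. It assumes each item carries its publication date in a Dublin Core `dc:date` element written as an ISO 8601 string (so that plain string comparison sorts correctly); that is an assumption for the example, not something the ORO feeds are confirmed to provide. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Dublin Core elements namespace; assumed to hold the publication date.
DC = "http://purl.org/dc/elements/1.1/"


def reorder_by_publication_date(rss_xml: str) -> str:
    """Return the feed with <item> elements sorted by dc:date, newest first.

    Items without a dc:date are kept and placed at the end.
    """
    ET.register_namespace("dc", DC)  # keep the 'dc' prefix on output
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")
    items = channel.findall("item")

    def publication_date(item):
        # ISO 8601 strings of the same shape sort chronologically as text.
        return item.findtext(f"{{{DC}}}date", default="")

    # Detach every item, then re-attach them in the new order.
    for item in items:
        channel.remove(item)
    for item in sorted(items, key=publication_date, reverse=True):
        channel.append(item)

    return ET.tostring(root, encoding="unicode")
```

A client that ignores the dates and simply displays items in document order would then show the most recently published items first.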
I would argue the key thing about RSS feeds is that they end up degrading nicely. If you are using a ‘vanilla’ client that only really understands the basic data, including the rich data in Atom or RSS 1.0 won’t break this. However, once that data is there more sophisticated clients can make use of this.
An example is that within the TELSTAR project (http://www.open.ac.uk/blogs/telstar/) we are using RSS feeds from RefWorks ‘RefShare’ folders – which allow RefWorks users to publish lists of references as RSS. Although you could subscribe to these feeds with any RSS reader, and it would work, by writing some special routines within Moodle, we can ensure that when the items are displayed in the VLE we use all of the metadata, and display the full reference in an appropriate reference style.
One final thing (sorry to go on), what would be a real step forward would be a standard way of expressing sophisticated bibliographic data in RSS 1.0 feeds. I’d suggest that the Bibliontology might be worth a look for this (http://bibliontology.com/) – at the moment there is no standard for this and this means that it is difficult to write clients/scripts which can take more sophisticated metadata from RSS feeds for bibliographic material in a reliable way.
December 1st, 2009 at 12:11 pm
I’m not sure how useful vanilla RSS (or atom) feeds are for repositories… we generate them too on our site (http://publicationslist.org – the atom feeds are in most-recently-published order) in case people want to subscribe to see someone’s latest publications.
We’ve found that JSON feeds of publications lists are much more powerful – e.g. for embedding subsets of publications in a remote website. View source on: http://publicationslist.org/embed-sample.php and you’ll see that the publications data is sourced on the fly from a ‘publist.js’ feed. Another example is on: http://nog.eps.hw.ac.uk/publications.html
December 1st, 2009 at 12:34 pm
@fred I’m not against using JSON, but I’d argue RSS gives a common entry point – you can use a vanilla reader/script if you want.
I know that there are some arguments in favour of JSON as opposed to XML in general, but personally I’m not quite clear what you think JSON offers over the metadata encoded in XML within an RSS feed? (unless it is simply that you feel that RDF/XML is overly complex and makes it harder to deal with the data)
December 2nd, 2009 at 11:21 am
We’ve just added an ORO RSS to: http://www.open.ac.uk/blogs/ORO and that’s looking fine to me at the moment. We used our Feed2JS parser at http://mcs.open.ac.uk/feed2js/build.php .
But we are also interested in getting an XML feed based on parameters which can build up in different ways obviously. RSS is crude, but effective. See also some of our pro-forma staff pages such as: http://www.computing.open.ac.uk/People/m.petre where the top list is an ORO RSS feed and the bottom section comes from our in-faculty data store which preceded ORO by some years. I’d like to get the ORO data presented like that eventually, but it’s a big project for us and we have too much to do at the moment.
December 2nd, 2009 at 11:22 am
Above – for first URL read:
http://mctintranet.open.ac.uk/Intranet.nsf/httpPages/DDEM+Research
(Copy and paste muddle – sorry!)
David Clover
December 2nd, 2009 at 12:50 pm
@david – you might be interested in the work we’ve been doing as part of the TELSTAR project – we take an RSS feed of References (from the RefWorks bibliographic management software) and apply the correct formatting, so you can display the reference in a variety of styles – the OU in-house Harvard style, or alternatives such as MLA/MHRA/etc.
If we could apply this to an RSS feed from ORO then I think we’d be on the way to replicating the lower ‘publications list’ on your staff pages?
December 2nd, 2009 at 1:02 pm
The modification of the feeds to include additional metadata is something that I’ll add to ORO as I’m sure it will be useful to many. The problem (and there has been an example of this today) is that many of the clients reading the RSS still only order items by ‘added’ date, as Owen suggests.
Hence, we’ve had to provide RSS feeds ordered by ‘publication date’ at source. Ultimately, I’d prefer the RSS manipulation to be handled by the client, not the provider. I think it provides interesting debate though.
December 2nd, 2009 at 1:41 pm
Like David, we’re using our EPrints feeds with feed2js so that staff can easily copy and paste code into their web profiles. i.e. http://www.lincoln.ac.uk/cerd/09site/Staff09_Athody.htm Similarly, we’re also displaying feeds by faculty, too: http://www.lincoln.ac.uk/home/research/repository.htm
I agree that recently published EPrints would be better than recently added EPrints, but hey ho, I’ll take what I can get.
I’ve just asked EPrints Services, who maintain our repo, about creating an RSS feed for each EPrint, so that each EPrint item is an RSS , including MediaRSS info, too. I’d use that for an OER project I’m running. it would allow me to dump the entire course as one EPrint and then grab a feed with all the resources from that course. I could then use it to pull into WordPress, via FeedWordPress or publish to iTunes (in this case, it’s video).
December 2nd, 2009 at 1:44 pm
@chris if you supply the metadata in the feed I’d be happy to write a Yahoo pipe to take this and output a reordered RSS feed by publication date
December 2nd, 2009 at 1:48 pm
ah, WordPress stripped my comment. I meant to say:
“I’ve just asked EPrints Services, who maintain our repo, about creating an RSS feed for each EPrint, so that each EPrint item is an RSS item, including MediaRSS info, too.”
December 2nd, 2009 at 2:10 pm
@owen That’s very true: a Yahoo pipe would be a solution rather than changing the feed. I’ll add the additional metadata to some of the EPrint feeds and we’ll see what happens.
December 4th, 2009 at 8:19 pm
“standard RSS feeds do not deliver repository content in an order which is useful for Faculty pages, i.e. by date of publication.”/”RSS feeds provide a reflection of recent activity in the repository, and not necessarily recently-published research.”
RSS feeds are a transport mechanism. The way many people see them used is for the syndication of recent content on a site but that’s not the only way of using RSS.
eg I use “static RSS” feeds all over the place simply to transport links around. The content never changes, the RSS is just used to move the content from one place to another.
And i don’t think there is anything in the RSS/atom spec that determines what order items in a feed appear in? Although as @ostephens points out, many feed reader clients implement user directed features such as showing unread items in bold title font, hiding read items etc etc
“I would argue not very, and this has long been a concern of mine. The reason is, like RSS, really simple: standard RSS feeds do not deliver repository content in an order which is useful for Faculty pages, i.e. by date of publication.”
So order them a different way?
There is no reason why the whole of an individual’s publications can’t be delivered as RSS in whatever order you like – chronological, reverse chronological, date of submission, alphabetical, grouped by journal etc etc.
IMHO, RSS as a consumer technology is now a minimal play. RSS is/will be used more widely as a lightweight transport mechanism (although here it faces competition from JSON, which can get round cross domain browser security issues)
WRT additional bib data in the feed, Owen is right on it again; there’s no reason not to extend Atom eg using a Dublin Core namespace, although the lack of standards about using feeds to syndicate bibliographic data means that you’ll need a custom handler on the other end.
I thought that chris gutteridge had enabled all sorts of rss’n'json outputs on eprints anyway? Maybe not…?
December 6th, 2009 at 9:34 pm
It is possible to have an RSS feed in publication date order on ORO by selecting the RSS 1.0 (Publication Date) option from the export menu.
Unfortunately, many EPrints in ORO only contain the published year, and the current RSS feeds in EPrints will only embed the pubDate into the item element if a fully RFC compliant date can be constructed (although this can be modified easily if required)
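To make the year-only problem concrete, here is a small Python sketch of one possible workaround: padding a partial date out to the first of January (or the first of the month) so that an RFC 822 style `pubDate` string can be built. This is not how EPrints behaves, and the padding invents precision the record does not have, so it is only suitable for ordering purposes. The function name and the accepted input shapes (`YYYY`, `YYYY-MM`, `YYYY-MM-DD`) are assumptions for the example.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Optional


def to_pub_date(value: str) -> Optional[str]:
    """Build an RFC 822 style date string from YYYY, YYYY-MM or YYYY-MM-DD.

    Missing month or day defaults to 1 (an assumption made purely so the
    item can be ordered). Returns None if the value cannot be interpreted.
    """
    match = re.fullmatch(r"(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?", value.strip())
    if match is None:
        return None
    year = int(match.group(1))
    month = int(match.group(2) or 1)
    day = int(match.group(3) or 1)
    try:
        moment = datetime(year, month, day, tzinfo=timezone.utc)
    except ValueError:
        # e.g. month 13 or 30 February
        return None
    return format_datetime(moment, usegmt=True)
```

For a record holding only the year 2009, `to_pub_date("2009")` gives a date at midnight on 1 January 2009, which is enough to place the item within a feed ordered by publication date.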
I’d be open to extending the bibliographic metadata to include additional elements or indeed write additional export plugins if there is a need to support other users/systems that we don’t already provide for. All suggestions welcome.
December 9th, 2009 at 10:59 am
Like Tony, I see RSS as two things: (1) a standardised ‘recently-added’ service, (2) a standardised metadata delivery format.
I would run a standard RSS ‘recently added’-ordered feed with an RSS icon button on the site. But also, on some sort of ‘developers’ page, expose a number of other RSS-looking feeds that are really just APIs disguised as RSS.
Returning RSS XML instead of custom XML in APIs gives developers a familiar base that leads to fast hackability. Eventually they would read the documentation that says the publication date is in dc:date not rss:updated and it’s ordered by dc:date, etc., but because it’s RSS they already have half the tools they need.
This is one of OpenSearch’s strengths – that the results are just RSS plus extensions.
December 10th, 2009 at 2:14 pm
I’m ignorant about all these technicalities, and I prefer to use ready-made solutions (e.g. I use bibsonomy.org which uses JabRef to produce all kinds of outputs) but I will just point out a functionality that I think would be useful for the research prospectus and maybe departmental sites: to produce a feed with at most one publication by each author. Otherwise, as said in this post, it’s very easy to get lots of publications from the same person, which is slightly embarrassing for a site that wants to showcase its diverse research.
Probably other people will come up with other constraints for their sites. How to technically accommodate all this, I don’t know. Maybe a single JavaScript script is better than individually tweaking several feeds. Sites like delicious.com allow users to check some boxes to generate a widget that calls the script with the right parameters. Just a thought, probably not too helpful…
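One way the ‘at most one publication by each author’ idea could be sketched in Python is as a simple filter over a list that is already in the desired order. The record shape (a dict with `title` and `authors` keys) and the function name are assumptions for the example, and the rule used here (skip an item if any of its authors has already appeared) is just one reading of the suggestion.

```python
def one_per_author(publications, limit=None):
    """Walk an ordered list and keep an item only if none of its
    authors has already appeared in a previously kept item.

    `publications` is a list of dicts with 'title' and 'authors' keys.
    `limit` optionally caps the number of items returned.
    """
    seen_authors = set()
    selected = []
    for publication in publications:
        authors = set(publication["authors"])
        if authors & seen_authors:
            continue  # someone on this item is already represented
        selected.append(publication)
        seen_authors |= authors
        if limit is not None and len(selected) >= limit:
            break
    return selected
```

Because the filter is greedy, the result depends on the input order, so the list would need sorting (for example by publication date) before filtering.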
March 3rd, 2010 at 3:09 pm
For the record – here’s my hack for re-ordering EPrints’ RSS feed to display by date of publication, in reverse order. Thanks Owen for alerting me to the existence of this post!
March 3rd, 2010 at 9:40 pm
I really need to settle on a rule to identify the “best” front page N items to show. Recently modified but year >= current year?