Monthly Archives: April 2016

Using Search Engine Optimization (SEO) for research publications

The Library Research team recently delivered some training on Increasing Citations. As part of that session I spoke about using SEO for research publications. What follows is a summary of that content. If anyone is interested in us running that session for a particular department/school, please get in touch – Chris

Why SEO?

In a world where researchers increasingly use the web to search for literature, and where that web is increasingly crowded, researchers need to do whatever they can to get their research discovered.  SEO techniques can help to get your research outputs discovered.

A research output may be full text indexed by a search engine. However, different sections of it (e.g. title, abstract and keywords) may be weighted differently by a search engine. Judicious wording and selection of key phrases and concepts in these sections may render your article higher up the returned results in a web or database search.

Title

Titles should give the reader a clear idea of what the paper is about. That might sound obvious, but here are some (real) article titles:

  • “A message from Titanic”
  • “From lemonade stands to 2065”
  • “Hot potato endgame”

Any ideas what they might be about? No, me neither. If you do stumble across a creative title for a research paper, don’t dismiss it: maybe you can use it in social media, but it’s probably best not to use it as the formal title.

The title is likely to be the first thing a potential reader will see about your article so it’s crucial to let them know what it’s about.  Patrick Dunleavy offers some great advice in his blog post Why do academics choose useless titles for articles and chapters? :

  1. The title should be relevant
  2. The title should be consistent with named concepts in the abstract and sub-headings
  3. Consider using a full narrative title e.g. ‘New Public Management is Dead — Long Live Digital Era Governance’.  This has 2 specific topics, memorable non-academic language and lends itself to citation e.g. “Some commentators think Public Management is dead (Dunleavy et al, 2006)”
  4. If you cannot do 3 above, at least provide some narrative clues

Keywords

Keywords are terms used to describe the key concepts articulated in the research output. They can appear in the title, abstract or body of the work, but do look for terms that may not appear in the full text. These terms may be synonyms or acronyms, or possibly larger conceptual terms that are not specifically named in the body of the text. (There is a story of an article that appeared in the Washington Post about Arnold Palmer scoring successive holes in one. Unfortunately the article omitted the term “golf”, so a search on golf didn’t return this article!)

  • Think of keywords as potential search terms
  • Use keywords that are common terminology in your research field
  • Include relevant synonyms as keywords
  • Include keywords in your abstract and body text
Keywords by Heather Gold (CC BY-NC-ND 2.0)

But be wary of cramming in keywords or repeating them too much: search engines may exclude items they consider to be keyword stuffed and, ultimately, the reader of your paper is a human, not a search engine. Don’t repeat keywords to the point that they detract from the flow of the text.
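
One rough way to check for over-repetition before submitting is to measure how much of the text each keyword takes up. The sketch below is illustrative only: the `keyword_density` and `flag_overused` helpers and the 5% threshold are assumptions made for this example, not figures published by any search engine.

```python
import re

WORD_PATTERN = re.compile(r"[a-z0-9']+")


def keyword_density(text, keywords):
    """Return, for each keyword or phrase, the share of all words it accounts for."""
    words = WORD_PATTERN.findall(text.lower())
    total = len(words)
    densities = {}
    for keyword in keywords:
        phrase = WORD_PATTERN.findall(keyword.lower())
        size = len(phrase)
        if total == 0 or size == 0:
            densities[keyword] = 0.0
            continue
        # Slide a window over the text and count exact phrase matches.
        hits = sum(
            1 for start in range(total - size + 1)
            if words[start:start + size] == phrase
        )
        densities[keyword] = hits * size / total
    return densities


def flag_overused(text, keywords, threshold=0.05):
    """List keywords whose density exceeds the threshold (an arbitrary, assumed value)."""
    densities = keyword_density(text, keywords)
    return [kw for kw, share in densities.items() if share > threshold]


# Invented sample text, deliberately repetitive.
abstract = "SEO helps research get found. SEO is not magic."
print(keyword_density(abstract, ["SEO", "research outputs"]))
print(flag_overused(abstract, ["SEO", "research outputs"]))
```

A flagged keyword is only a prompt to reread the passage; a short abstract will naturally show higher densities than a full paper.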

 

For example, useful keywords for this blogpost might include SEO; Search Engine Optimization; journal articles; research outputs.

Abstract

An abstract is a piece of text that should convince a potential reader to read the whole article: its function is to aid selection. Always provide one when you can.

However, search engines are also likely to weight the abstract more heavily than the body of the article, so in an online world of full text indexing its function is also to aid return in search.

A 2015 study by Weinberger, Evans and Allesina in PLOS Computational Biology looked at how longer abstracts containing superlatives and terms that signaled novelty or importance were more likely to be cited than shorter, simpler abstracts. They conclude: “Longer, more detailed, prolix prose is simply more available for search.”

So think about how the abstract is not just a representation of the paper but also text that will be indexed by search engines.

Citations

Don’t be afraid to cite your previous work – but don’t overdo it and make sure all works cited are relevant.  According to this Wiley guide, search engines factor citations into how they rank your work.

Graphics

Graphics used in a paper should be vector (e.g. *.svg, *.ai, *.eps, *.ps) rather than raster (e.g. *.bmp, *.png, *.tif and *.jpg), as text in raster images cannot be read by search engines and is therefore not used to aid the ranking of the work in search results.
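
To see what machine-readable text a vector figure actually carries, you can pull the text elements out of an SVG file. This is a minimal sketch using Python's standard library; the `svg_text` helper and the sample figure are invented for illustration, and it says nothing about how any particular search engine treats SVG content.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def svg_text(svg_source):
    """Return the text content of every <text> element in an SVG document."""
    root = ET.fromstring(svg_source)
    labels = []
    for element in root.iter(SVG_NS + "text"):
        # itertext() also collects text held in nested <tspan> elements.
        label = " ".join("".join(element.itertext()).split())
        if label:
            labels.append(label)
    return labels


# Invented sample figure with two labels and one shape.
sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text x="0" y="10">Citations per year</text>'
    '<text><tspan>Open</tspan> <tspan>access</tspan></text>'
    '<rect width="5" height="5"/>'
    '</svg>'
)
print(svg_text(sample))
```

If a label you expect is missing, the graphics software may have converted the text to outlines (paths), in which case it is no more readable than text in a raster image.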

Names

Be consistent with your name across publications and with names of authors you cite in any particular paper. Advice from the publisher Sage is to use your full name including middle names, and to try to make sure it is distinguishable from the names of other researchers. Even better, get an ORCiD, associate all your research with that persistent identifier and use the ORCiD when submitting work to publishers.
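
Because an ORCiD is a structured identifier, a mistyped one can often be caught before it goes into a submission form. The final character of an ORCID iD is a check character calculated with the ISO 7064 MOD 11-2 algorithm, and the sketch below applies that calculation. The function names are illustrative, and a passing check only shows the identifier is well formed, not that it is registered or that it is yours.

```python
import re


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


print(is_well_formed_orcid("0000-0002-1825-0097"))  # a widely used sample iD
print(is_well_formed_orcid("0000-0002-1825-0098"))  # last digit mistyped
```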

Platforms

The work you do on SEO is contingent on the platform your research output sits on: some sites are better indexed by Google than others. In Google search, ORO items consistently come up alongside or above the same output on the publisher’s website. ORO is a great discoverability platform, and so are other repositories and academic social networking sites like ResearchGate. So make sure the article is on a platform that is well indexed by search engines; that platform might not necessarily be the publisher’s site.

 

Finally, timely reminders from Witold Kieńć on Open Science: “[SEO] will only work if the publication itself is good and interesting enough. Academic SEO does not substitute but supports the quality of content”

and from seanrnicholson…

Writing Blog Content for SEO by seanrnicholson (CC BY-ND 2.0)

 

Some Further Reading:

The Wiley guide: Search Engine Optimization: For Authors

The Elsevier guide: Get Found – optimize your research articles for search engines

Excellent commentary on the publisher SEO guides from Wouter Gerritsma: Academic search engine optimization: for publishers

A good recent overview of SEO and research from a workshop at British Ecological Annual Meeting in 2015: Maximising the Exposure of Your Research: Search Engine Optimisation and why it matters

Witold Kieńć on Open Science, Why and how should you optimize academic articles for search engines.

 

Research Data: Working with Social Media

Is your research based around the measurement of public opinion? Are you interested in changing social attitudes? If you’re thinking of using content from social media platforms like Facebook, Twitter, or LinkedIn as key sources of research data then you may want to read a recently published Technology Watch Report from the Digital Preservation Coalition on “Preserving Social Media”.

Published in February this year, the report throws interesting light onto issues of archiving and preservation of social media content for social research, and shows how research is helping unpick the technical and legal difficulties associated with this very new area of study. I summarise some of the key points below, but if your research might use content from social media, it’s worth reading the original.

Ethical Issues

Traditional social research involving human participants takes great care over obtaining their consent, but most users of social media platforms tick away the ownership of their personal data without much thought. Social media archives use data owned by corporations, but created by end users with little power in the social media ecosystem.

Accidental disclosure of personal information is made more likely by the interlinked big datasets of modern social media platforms. Researchers will have to work even harder to protect “the right to be forgotten”.

Commerce vs Public Good

Most social research is conducted for the public good. Social media runs on a commercial model and therefore treats data as a commercial asset rather than a public good. Social media platforms sell data to businesses to measure current trends and behaviours; they are not interested in the long term value of their data, a key area of interest to social research. This difference in approach affects the ways in which they make their data available and the controls they place on its further use; researchers are prohibited from sharing raw data, or publishing it except in small non-machine readable datasets. Some large archives store the raw data, and provide access to a few researchers whilst negotiating with data owners for future relaxation of controls.

Worryingly, many platforms do not have an internal preservation policy; of all the major social media platforms, only Twitter has allowed the Library of Congress to archive its entire collection of tweets. It has not yet allowed free access to that archive.

Transient Big Data

The multi-platformed, linked nature of social media data makes it hard to select those data for preservation or storage. A tweet for instance contains up to 140 characters with images, shortened URLs and embedded links to other social media content. In order for a researcher to derive meaning from that content at a later date, there has to be some context stored with the data. Geolocation data, hashtags, keywords, timestamps, can all help preserve context and give meaning to a specific collection.

The huge volumes of data generated by social media mean storage can be a problem, especially as current EU legislation restricts the use of cloud storage to EU locations. Meaningful access by future researchers to vast data collections depends upon the development of robust database architectures that can cope with natural language queries like “Donald Trump” or “2013 Bundestag”, without taking a year to run the query. Early database designs in this area use pre-filtering by timestamp or hashtag to improve responsiveness.
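
The pre-filtering idea can be sketched in a few lines: narrow the collection using cheap structured fields (hashtag, timestamp) before running the more expensive free-text match. The `PostArchive` class, its fields and the sample posts are all hypothetical; they are not taken from the report or from any real archive's design.

```python
from collections import defaultdict
from datetime import datetime


class PostArchive:
    """Toy in-memory archive with a hashtag index used for pre-filtering."""

    def __init__(self):
        self.posts = []
        self.by_hashtag = defaultdict(list)  # hashtag -> positions in self.posts

    def add(self, text, created_at, hashtags=()):
        position = len(self.posts)
        self.posts.append({"text": text, "created_at": created_at})
        for tag in hashtags:
            self.by_hashtag[tag.lower()].append(position)

    def search(self, phrase, hashtag=None, start=None, end=None):
        # Step 1: pre-filter by hashtag so only a small candidate set is scanned.
        if hashtag is not None:
            candidates = self.by_hashtag.get(hashtag.lower(), [])
        else:
            candidates = range(len(self.posts))
        needle = phrase.lower()
        results = []
        for position in candidates:
            post = self.posts[position]
            # Step 2: pre-filter by timestamp (start inclusive, end exclusive).
            if start is not None and post["created_at"] < start:
                continue
            if end is not None and post["created_at"] >= end:
                continue
            # Step 3: the comparatively expensive text match runs last.
            if needle in post["text"].lower():
                results.append(post["text"])
        return results


# Invented sample data.
archive = PostArchive()
archive.add("Polls open for the Bundestag election", datetime(2013, 9, 22, 8, 0), ["btw13"])
archive.add("Bundestag coalition talks continue", datetime(2013, 10, 4, 12, 0), ["btw13"])
archive.add("Nice weather in Berlin today", datetime(2013, 9, 22, 9, 0), ["berlin"])

print(archive.search("bundestag", hashtag="btw13", end=datetime(2013, 10, 1)))
```

A real archive would keep these indexes in a database rather than in memory, but the ordering of the steps is the point: the text comparison only touches posts that survive the hashtag and timestamp filters.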

To preserve meaning and context within social media, data need to be prepared for archiving by linking back to the full versions of shortened URLs and to archived versions of sites mentioned in social media. One case study mentioned by the author has successfully automated those two parts of data preparation to reduce costs.

Data Management Solutions

The case studies referenced within this report show that there are many tools and technologies already developed, or under development, to help deal with both the archiving of, and the managed access to, the huge datasets that can be created by data harvesting. In a very new and rapidly evolving area of research, it is heartening to read of the progress that many public research organisations have already made, not just in terms of technology, but in the management of data and the management of access to data.

The report advocates the creation of centralised storage under the auspices of a specialist national agency to deal with issues of quality and long term access, and calls for greater collaboration between agencies working in this area.

Doing Open Access and Being Open Access

As we deal with the increased deposit in the institutional repository precipitated by the HEFCE Open Access Policy, it rather strikes me that our increased activities meeting the policy can be seen as doing Open Access, but is it really being Open Access?

More researchers are conscientiously depositing Author Accepted Manuscripts into ORO to meet the requirement of the HEFCE policy, and in doing so they might think they are Open Access. But when we have to apply lengthy embargo periods of 12 months, 18 months or 24 months, is this really Open Access? For me it isn’t, and the increased activity in doing Green Open Access (for us depositing in ORO) brings into stark focus the drawbacks of Green Open Access as defined by the HEFCE policy (specifically allowing longer publisher embargoes than other OA policies).
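
To make those embargo periods concrete, this short sketch works out when a deposited manuscript would actually become openly readable. It assumes the embargo clock starts on the publication date, which is an assumption for illustration only (check the publisher's terms for the real start point), and the function names and the sample date are invented.

```python
import calendar
from datetime import date


def add_months(start, months):
    """Return the date `months` calendar months after `start`."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day so that, e.g., 31 August plus 18 months lands on 28 February.
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def open_access_date(published, embargo_months):
    """Date the full text can be released, assuming the embargo runs from publication."""
    return add_months(published, embargo_months)


published = date(2016, 4, 1)  # invented publication date
for embargo in (12, 18, 24):
    print(embargo, "months:", open_access_date(published, embargo).isoformat())
```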

The burden of complying with publisher embargoes is most keenly felt by Institutional Repositories. Academic Social Networking sites like ResearchGate don’t pay any attention to them; they are irrelevant to preprint servers (arXiv); and posting on personal websites moves the risk from an institutional level to an individual level, where people are consciously or unconsciously ignoring publisher restrictions. (In fact, in some instances publishers only apply embargoes to Institutional Repositories and not to personal websites.)

So publisher embargoes affect the efficacy of Institutional Repositories in doing Open Access the most. And we should be keenly aware of developments in scholarly communications that attempt to do open access, or should I say “content sharing”, in different ways to the traditional routes of Green and Gold Open Access.

SciHub is the most disruptive: a pirate bay for scholarly publications, if you like. Academic Social Networking sites like ResearchGate and academia.edu are perhaps the most prevalent. Traditional publishers like Springer/Nature are innovating and developing services that allow content to be read (streaming scholarly publications) but not downloaded or saved, and almost certainly not Text and Data Mined (TDM).

So even though researchers are being expected to use Institutional Repositories more to do Open Access, these places may not be the best place to be Open Access.

And so what happens when a researcher deposits an item in our Institutional Repository only to find a 24 month embargo applied to the item?  What do they then do, and where do they actually go, to be Open Access?

the data are mine

Workshop: Open Science / open data: what’s it all about?

Open Science is all about maximising benefit for all from academic research by increasing access to both published papers and supporting data. But researchers worried about protecting their intellectual property or long term career prospects can take heart from knowing that Open Science approaches to sharing their work can increase their citations and impact.

Want to know more? Come along to the rescheduled FOSTER1 sponsored training session on 13th April to find out about Open Science; what it is, how it can benefit you, and how to work with your data in ways that support open access.

 

Open Science: Applications and Benefits

Wednesday 13th April 2-4 pm

Library Research Meeting Room, 2nd Floor

Email library-training@open.ac.uk to register for the session, with a brief description of your research area.

 

1 EU-funded project FOSTER https://www.fosteropenscience.eu/

Cartoon CC-BY www.msgerry.com/www.aukherrema.nl

Training Opportunity: Open Access Publishing

Getting to Grips with Open Access Publishing

07/04/2016 9.30-11.00

Library Research Meeting Room, 2nd Floor, Library

Increasingly, researchers are expected to publish their outputs by open access to satisfy the requirements of the research funders in the UK and abroad. This session will introduce participants to the benefits of open access publishing, and to UK and international funder policies for open access. The session will also address the new HEFCE policy on open access for the post-2014 REF.

Facilitators: Chris Biggs, Research Support Librarian &  Nadine Lewykcy, Senior Manager, Research Strategy and Planning

This session is open to all. Booking is required in advance: please email library-training@open.ac.uk. Places will be allocated on a first come, first served basis.