One of the aspirations of the RISE project is to release the data in our recommendations database openly, so we’ve recently been thinking about how we might go about that. A critical step will be to anonymise the data robustly before we make any of it openly available, and we will post about those steps at a later date.
Once we have a suitably anonymised dataset, our current thinking is to make it available in two ways:
- as an XML file; and,
- as a prepopulated MySQL database.
The idea is that people who are already working with activity data are most likely to find an XML file useful. For people who haven’t been using activity data and want to start using the code that we are going to release for RISE, a database prepopulated with a base level of data may be a useful starting point. We’d be interested in thoughts from people working with this type of data about which formats and structures would be most useful.
For the XML format we’ve taken as a starting point the work done by Mark van Harmelen for the MOSAIC project, and we were fortunate to be able to talk to him about the format when he visited to do the Synthesis project ‘Recipes’ work. We’ve kept as close to that original format as possible, but we need to include some entirely new elements, such as search terms. The output in this format assumes that re-users of the data will be able to make their own subject, relationship and search recommendations by using the relationships between users, resources and search terms within the XML data.
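As a rough illustration of the kind of re-use described above, the sketch below derives simple "people who used this also used" relationships by counting how often two resources are used by the same anonymised user. It is only a sketch under assumptions: the `<useRecords>` wrapper, the flat layout of each `<useRecord>`, and the function names `load_pairs`, `co_occurrence` and `recommend` are hypothetical and are not part of the proposed format or of the RISE code.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data. The <useRecords> wrapper and the flat record
# layout are assumptions made for this illustration only.
SAMPLE = """
<useRecords>
  <useRecord><user>u1</user><globalID type="EDSN">111</globalID></useRecord>
  <useRecord><user>u1</user><globalID type="EDSN">222</globalID></useRecord>
  <useRecord><user>u2</user><globalID type="EDSN">111</globalID></useRecord>
  <useRecord><user>u2</user><globalID type="EDSN">222</globalID></useRecord>
  <useRecord><user>u2</user><globalID type="EDSN">333</globalID></useRecord>
  <useRecord><user>u3</user><globalID type="EDSN">222</globalID></useRecord>
</useRecords>
"""


def load_pairs(xml_text):
    """Return (user, resource) pairs found in the XML text."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iter("useRecord"):
        user = record.findtext("user")
        resource = record.findtext("globalID")
        if user and resource:
            pairs.append((user.strip(), resource.strip()))
    return pairs


def co_occurrence(pairs):
    """Count how many users used each pair of resources together."""
    resources_by_user = defaultdict(set)
    for user, resource in pairs:
        resources_by_user[user].add(resource)

    counts = defaultdict(lambda: defaultdict(int))
    for resources in resources_by_user.values():
        for a, b in combinations(sorted(resources), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, resource, limit=3):
    """Return related resources, most frequently co-used first."""
    related = counts.get(resource, {})
    return sorted(related, key=lambda r: (-related[r], r))[:limit]


if __name__ == "__main__":
    counts = co_occurrence(load_pairs(SAMPLE))
    print(recommend(counts, "111"))
```

The same counting approach could in principle be applied to search term and resource pairs once the search term element is settled; a real recommender would probably also want to weight by recency or by course, which this sketch ignores.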
Proposed RISE record XML format
Basic data: Institution, year and dates
<globalID type="EDSN">12345678 [Ebsco Accession number]</globalID>
<title>AI-SIMCOG: a simulator for spiking neurons and multiple animats’ behaviours</title>
<volume>12</volume> <number>3</number> <month>6</month>
User context data
<user> anonymised UserID
<sequenceNumber>1 [Note: sequence number already stored within database]</sequenceNumber>
For students: [propose to map to a subject ]
<progression>UG2 [F, UG1, UG2, UG3, UG4, M, PhD1, PhD2, PhD3+ (F is for foundation year)]</progression>
End record, more records
<!-- more useRecords here if need be -->
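Because a released file could hold a large number of records, a re-user might prefer to read it one record at a time and check values as they go. The sketch below does that with Python's standard `xml.etree.ElementTree.iterparse` and flags any `progression` value outside the list given in the format. The `<useRecords>` wrapper, the flat record layout, and the helper names `iter_use_records` and `invalid_progressions` are assumptions for illustration only.

```python
import io
import xml.etree.ElementTree as ET

# Progression codes as listed in the proposed format (F = foundation year).
PROGRESSION_CODES = ("F", "UG1", "UG2", "UG3", "UG4", "M", "PhD1", "PhD2", "PhD3+")

# Hypothetical sample data; the wrapper element and the flat layout of each
# record are assumptions, not the agreed format.
SAMPLE = b"""<useRecords>
  <useRecord><user>u1</user><sequenceNumber>1</sequenceNumber><progression>UG2</progression></useRecord>
  <useRecord><user>u2</user><sequenceNumber>1</sequenceNumber><progression>UG7</progression></useRecord>
  <useRecord><user>u3</user><sequenceNumber>2</sequenceNumber></useRecord>
</useRecords>"""


def iter_use_records(source):
    """Yield one dict per <useRecord>, clearing each element once it is read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "useRecord":
            yield {child.tag: (child.text or "").strip() for child in elem}
            elem.clear()  # release the record's children to keep memory use low


def invalid_progressions(records):
    """Return records whose progression is present but not a listed code."""
    allowed = (None, *PROGRESSION_CODES)  # None: record has no progression
    return [r for r in records if r.get("progression") not in allowed]


if __name__ == "__main__":
    records = list(iter_use_records(io.BytesIO(SAMPLE)))
    for bad in invalid_progressions(records):
        print("Unexpected progression:", bad)
```

Records without a `progression` element are treated as valid here, on the assumption that the element only applies to students.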
We are interested in any feedback on whether this format makes sense and would be useful, and on any changes you think we should make. You can either leave a comment on the blog or email us at Rise-project