Site Migration – New Home

I have decided to move openmind.ed from server space provided by The Open University and onto a general WordPress.com site.  This will make the site more independent and give me a bit more control over how it works – but the content is essentially the same.  I will only be updating the new site from now on although some of the stuff in the sidebar will continue to update automatically.

You can find the new version of this site at:
philosopher1978.wordpress.com


#oerrhub on the LSE Impact of Social Science blog

This is a duplicate of my article from the LSE Impact of Social Science blog which was published today.  You can find the original here.

Much sharing and use of open educational resources (OER) is relatively informal, difficult to observe, and part of a wider pattern of open activity. What the open education movement needs is a way to draw together disparate fragments of evidence into a coherent analytic framework. Rob Farrow provides background on a project devoted to consolidating efforts of OER practitioners by inviting the open community to contribute directly and submit impact narratives. Through the mapping of these contributions, the data can continue to grow iteratively and support the decisions made by educators, students, policymakers and advocates.

The Open Education movement is now around ten or twelve years old and has started to make a significant difference to education practices around the world. Open educational resources (OER) are resources (articles, textbooks, lesson plans, videos, tests, etc.) that might be used in teaching or learning. They are considered ‘open’ when they are openly licensed in ways that [permit] no-cost access, use, adaptation and redistribution by others with no or limited restrictions or, more simply, their free use and re-purposing by others.

This distinction might seem rather subtle and legalistic at first. But the whole of the open education movement is predicated on the idea that open licensing leads to far reaching and beneficial change. By providing an alternative to traditional copyright, open licenses make it possible to share and repurpose materials at marginal cost. It is often stated, for instance, that OER have the potential to increase access to education through lowering the prohibitive cost of textbooks or journal subscriptions. Some claim that OER allows for more innovative teaching and closer bonds between students and learners as a result of a more reflexive syllabus. Others hold the view that open licensing will align existing pedagogies along more collaborative and networked lines.

Image credit: opensource.com via Flickr (CC BY-SA)

When open licensing in conjunction with digital technology can enable duplication and adaptation of materials almost anywhere in the world at next to no cost, it’s easy to see how the implications may be manifold for educational institutions. Perhaps the strongest evidence for this thus far comes from the open access movement, which continues to leverage academic publishers for better value.

Unsurprisingly, much research has gone into ascertaining the evidence that exists in support of these claims. A good portion of earlier OER research focused on establishing the relative quality of open materials and found that they are generally at least as good as equivalent commercial materials (though there are of course variations in quality). But there are reasons why establishing a clear picture of the wider impact of OER adoption is more complex.

Let’s leave aside for now issues around the much discussed and yet nebulous term “impact”. OER adoption is taking place within a world of education undergoing radical change. Where OER does change practices there are often multiple interventions taking place at the same time and so it is hard to isolate the particular influence of openness. Use contexts can vary wildly between countries and education levels, and cultural differences can come into play. Furthermore, much sharing and use of open educational materials (such as Wikipedia) is relatively informal, difficult to observe, and part of a wider pattern of activity. This is not to say that there isn’t good quality OER research out there, but the typical dependence on softer data might sometimes be thought unconvincing. Further complications can arise from inconsistencies in understanding what ‘open’ means to different groups.

Nonetheless, there remains a need for evidence that would support (or discount) the key claims expressed in the rhetoric around OER, as well as an overall picture of global activity. What the open education movement needs is a way to draw together disparate fragments of evidence into a coherent analytic framework that can support judgments about OER impact for a range of use cases.

OER Research Hub (OERRH) is a research project in IET at The Open University which addresses these issues through an open and collaborative approach. Our project aspires to be open in both its focus and the methods we use to gather and share data. We’ve taken a mixed-methods approach to research depending on the context, and we’ve also undertaken some of the largest surveys about OER use and attitudes from a range of stakeholders. By using a survey template that is consistent across the different samples it becomes possible to see patterns across countries and sectors. Our research instruments and data are released under open licenses and we have an open access publication policy. By encouraging a culture of open sharing we have been able to consolidate the efforts of OER practitioners and help to build a shared understanding.

We work openly with a range of collaborators around the world to gather data and share practical experience and also have a fellowship scheme that helps to foster a worldwide network of experts. By focusing on collecting data around ‘impact’ in situ we are able to build up an evolving picture of changing practices.

The analytic framework for pulling together the data includes a set of research hypotheses which reflect some of the main claims that are made about OER. These help to provide focus, but further structure is provided by the use of geospatial coordinates (which are of course universal), allowing disparate data types to be plotted on a map across a shared geographical base.

Image credit: OER Impact Map (OER Research Hub)

Mapping has become popular within the OER world, and there is a lot of interest in maps for strengthening communities and as tools for building a shared understanding of the world. Accordingly, OERRH’s OER Impact Map acts as both research tool and dissemination channel. By using a simple metadata structure for different data types it becomes possible to visualize (as well as simply ‘map’) information. For instance, real-time reporting of the evidence gathered across each hypothesis or visualising the sum of evidence gathered help us to understand the data. Soon it will be possible to browse the project survey data directly as well as interact with more detailed, structured narratives about OER impact. The map itself will continue to help us to see patterns in the data and cross-reference evidence gathered.

Image credit: OER Impact Map (OER Research Hub)

By no means is OER Impact Map complete; by its nature the data set continues to evolve. But openness is the key to the sustainability of a service like this: by inviting the open community to contribute directly and submit their impact narratives to OERRH the data can continue to grow iteratively and support the decisions made by educators, students, policymakers and advocates. Furthermore, open licensing of evidence records allows us to close citation loops and archive data more easily, and the relative ease with which open access research can be found helps it find its way into the evidence base.

It is worth noting that the combination of mapping and curation can be flexibly applied to other research questions in educational and social science. The code for OER Impact Map is available openly on GitHub, meaning others can use it to build their own impact maps or adapt the code to their own needs. The impact map is based on a JSON information architecture which supports multiple programming languages and flexible use of the data (like combining it with other datasets).
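To make the idea of a simple, JSON-based metadata structure with geospatial coordinates more concrete, here is a minimal Python sketch. The field names (`hypothesis`, `polarity`, `lat`, `lon`) and all of the sample values are assumptions invented for illustration; they are not taken from the actual OER Impact Map schema or data. The sketch wraps a few hypothetical evidence records as GeoJSON points and counts the evidence gathered per hypothesis.

```python
import json
from collections import Counter

# Hypothetical evidence records. Every field name and value is invented for
# illustration and is NOT the real OER Impact Map schema or data.
records = [
    {"title": "Example narrative A", "hypothesis": "performance",
     "polarity": "positive", "lat": 52.02, "lon": -0.71},
    {"title": "Example narrative B", "hypothesis": "performance",
     "polarity": "positive", "lat": 40.71, "lon": -74.01},
    {"title": "Example narrative C", "hypothesis": "access",
     "polarity": "negative", "lat": -33.92, "lon": 18.42},
]


def to_geojson(items):
    """Wrap records as a GeoJSON FeatureCollection (coordinates are lon, lat)."""
    features = []
    for item in items:
        properties = {k: v for k, v in item.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point",
                         "coordinates": [item["lon"], item["lat"]]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


def evidence_by_hypothesis(items):
    """Count records for each (hypothesis, polarity) pair."""
    return Counter((item["hypothesis"], item["polarity"]) for item in items)


collection = to_geojson(records)
summary = evidence_by_hypothesis(records)
print(json.dumps(collection, indent=2))
print(dict(summary))
```

Because the output is plain JSON, the same records could be read by a mapping library, a spreadsheet import, or a script in another language, which is the kind of flexibility described above.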

What our project illustrates is that the use of openness to solve challenges in the project can lead to innovation in approaches in understanding impact. The combination of mixed-methods research into hypotheses with mapping and data visualization techniques can be flexibly applied in support of traditional research activity.

OER Research Hub is funded by The William and Flora Hewlett Foundation

Rob Farrow is a philosopher and educational technologist who researches open education at The Institute of Educational Technology, The Open University (UK). He blogs at openmind.ed and tweets as @philosopher1978.


OCWC 2014 Recording Available

The video recording from my research presentation at OCWC 2014 is now available at http://videolectures.net/ocwc2014_farrow_oer_impact/.  It’s not possible to embed here but they have a nice player on their site.

This presentation gives an overview of the OER Research Hub project, some of the methodological and epistemological issues we encounter, and how we propose to ameliorate these through the technologies we use to investigate key questions facing the OER movement.


OER Impact: Collaboration, Evidence, Synthesis
Robert Farrow


Open Research into Open Education #calrg14

Here are my slides from today’s presentation: feedback welcome as always.

The project website is http://oerresearchhub.org and the OER Impact Map is available at http://oermap.org.


JiME Reviews June 2014

This is the current list of books for review in the Journal of Interactive Media in Education (JiME) – if you’re interested in reviewing any of the following then get in touch with me through Twitter or via rob.farrow [at] open.ac.uk to let me know which volume you are interested in and some of your reviewer credentials.

Wanda Hurren & Erika Hasebe-Ludt (eds.) (2014). Contemplating Curriculum – Genealogies, Times, Places. Routledge: London and New York.  link

Phyllis Jones (ed.) (2014).  Bringing Insider Perspectives into Inclusive Learner Teaching – Potentials and challenges for educational professionals. Routledge: London and New York. link

Marilyn Leask & Norbert Pachler (eds.) (2014).  Learning to Teach Using ICT in the Secondary School – A companion to school experience.  Routledge: London and New York. link

Steven Warburton & Stylianos Hatzipanagos (eds.) (2013). Digital Identity and Social Media.  IGI Global: Hershey, PA.  link

Simone White & Michael Corbett (eds.) (2014). Doing Educational Research in Rural Settings. Methodological issues, international perspectives and practical solutions. Routledge: Abingdon.  link


JiME Reviews April 2014

This is the current list of books for review in the Journal of Interactive Media in Education (JiME) – if you’re interested in reviewing any of the following then get in touch with me through Twitter or via rob.farrow [at] open.ac.uk to let me know which volume you are interested in and some of your reviewer credentials.

Sue Crowley (ed.) (2014). Challenging Professional Learning. Routledge: London and New York.  link

Andrew S. Gibbons (2014).  An Architectural Approach to Instructional Design.  Routledge: London and New York. link

Wanda Hurren & Erika Hasebe-Ludt (eds.) (2014). Contemplating Curriculum – Genealogies, Times, Places. Routledge: London and New York.  link

Phyllis Jones (ed.) (2014).  Bringing Insider Perspectives into Inclusive Learner Teaching – Potentials and challenges for educational professionals. Routledge: London and New York. link

Marilyn Leask & Norbert Pachler (eds.) (2014).  Learning to Teach Using ICT in the Secondary School – A companion to school experience.  Routledge: London and New York. link

Ka Ho Mok & Kar Ming Yu (eds.) (2014).  Internationalization of Higher Education in East Asia – Trends of student mobility and impact on education governance. Routledge: London and New York.  link

Peter Newby (2014). Research Methods for Education (2nd ed.). Routledge: Abingdon. link


Thinking Learning Analytics

I’m back in the Ambient Labs again, this time for a workshop on learning analytics for staff here at The Open University.


Challenges for Learning Analytics: Visualisation for Feedback

Denise Whitelock described the SaFeSEA project which is based around trying to give students meaningful feedback on their activities.  SaFeSEA was a response to high student dropout rates: 33% of new OU students don’t submit their first TMA.  Feedback on submitted writing prompts ‘advice for action’; a self-reflective discourse with a computer.  Visualizations of these interactions can open a discourse between tutor and student.

Students can worry a lot about the feedback they receive.  Computers can offer non-judgmental, objective feedback without any extra tuition costs.  OpenEssayist analyses the structure of an essay; identifies key words and phrases; and picks out key sentences (i.e. those that are most representative of the overall content of the piece).  This analysis can be used to generate visual feedback, some forms of which are more easily understood than others.
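As a rough illustration of what picking out key words and ‘most representative’ sentences can involve, here is a minimal Python sketch based on simple word frequencies. This is an assumption-laden stand-in rather than OpenEssayist’s actual method: the stopword list, the scoring rule and the sample essay are all invented for the example.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "it",
             "that", "this", "for", "on", "as", "with", "be", "can"}


def split_sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lowercased words in a sentence, minus stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def word_counts(text):
    """Frequency of each content word across the whole text."""
    return Counter(w for s in split_sentences(text) for w in content_words(s))


def key_words(text, n=5):
    """The n most frequent content words."""
    return [w for w, _ in word_counts(text).most_common(n)]


def key_sentences(text, n=1):
    """The n sentences whose words are, on average, most frequent overall."""
    counts = word_counts(text)

    def score(sentence):
        words = content_words(sentence)
        return sum(counts[w] for w in words) / len(words) if words else 0.0

    return sorted(split_sentences(text), key=score, reverse=True)[:n]


essay = ("Open licences allow teachers to adapt learning materials. "
         "Adapting materials saves teachers time. "
         "The weather was pleasant yesterday.")
print(key_words(essay, 2))
print(key_sentences(essay, 1))
```

A sentence scores highly here when it reuses the vocabulary that dominates the essay as a whole, so an off-topic sentence falls to the bottom of the ranking. Scores like these are the kind of raw material that a visualization of essay structure could draw on.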

Bertin (1977/81) provides a model for the visualization of data.   Methods can include diagrams which show how well connected different passages are to the whole, or which generate different patterns that highlight different types of essay. These can be integrated with social network analysis & discourse analytics.

Can students understand this kind of feedback? Might they need special training?  Are these tools that could be used primarily by educators?  Would they also need special training?  In both cases, it’s not entirely clear what kind of training this might be (information literacy?).  Can one generic tool be used to support writing across all disciplines, or should such tools be discipline-specific?

The Wrangler’s relationship with the Science Faculty

Doug Clow then presented on ‘data wrangling’ in the science faculty at The Open University.  IET collects information on student performance and presents this back to faculties in a ‘wrangler report’ able to feed back into future course delivery / learning design.

What can faculty do with these reports?  Data is arguably better at highlighting problems or potential problems than it is at solving them.  This process can perhaps get better at identifying key data points or performance indicators, but faculty still need to decide how to act based on this information.  If we move towards the provision of more specific guidance then the role of faculty could arguably be diminished over time.

The relation between learning analytics and learning design in IET work with the faculties

Robin Goodfellow picked up these themes from a module team perspective.  Data can be understood as a way of closing the loop on learning design, creating a virtuous circle between the two.  In practice, there can be significant time delays in terms of processing the data in time for it to feed in.  But the information can still be useful to module teams in terms of thinking about course:

  • Communication
  • Experience
  • Assessment
  • Information Management
  • Productivity
  • Learning Experience

This can give rise to quite specific expectations about the balance of different activities and learning outcomes.  Different indicators can be identified and combined to standardize metrics for student engagement, communication, etc.
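One common way to combine indicators measured on different scales is to standardize each one (for example as z-scores) and then average them. The Python sketch below shows that general technique only; the indicator names, the module names and the numbers are all hypothetical and do not describe the OU’s actual metrics.

```python
from statistics import mean, pstdev

# Hypothetical per-module indicators (invented numbers, not OU data).
modules = {
    "Module A": {"forum_posts": 120, "vle_visits": 1500},
    "Module B": {"forum_posts": 60, "vle_visits": 900},
    "Module C": {"forum_posts": 90, "vle_visits": 1200},
}


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Every module has the same value, so the indicator carries no signal.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def engagement_index(data, indicators):
    """Average the standardized indicators into one score per module."""
    names = list(data)
    columns = [z_scores([data[name][ind] for name in names])
               for ind in indicators]
    return {name: mean(col[i] for col in columns)
            for i, name in enumerate(names)}


index = engagement_index(modules, ["forum_posts", "vle_visits"])
for name, value in sorted(index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:+.2f}")
```

An index like this only reflects the indicators that were chosen and collected, so the choice of what goes into it already carries a view of what a ‘good’ module looks like.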

In this way, a normative notion of what a module should be can be said to be emerging.  (This is perhaps a good thing in terms of supporting course designers but may have worrying implications in terms of promoting homogeneity.)

Another selective element arises from the fact that it’s usually only possible to collect data from a selection of indicators:  this means that we might come to place too much emphasis on data we do have instead of thinking about the significance of data that has not been collected.

The key questions:

  • Can underlying learning design models be identified in data?
  • If so, what do these patterns correlate with?
  • How can all this be bundled up to faculty as something useful?
  • Are there implications for general elements of course delivery (e.g. forums, VLE, assessment)?
  • If we only permit certain kinds of data for consideration, does this lead to a kind of psychological shift where these are the only things considered to be ‘real’ or of value?
  • Is there a special kind of interpretative skill that we need in order to make sense of learning analytics?

Learning Design at the OU

Annie Bryan drilled a little deeper into the integration of learning design into the picture.   Learning design is now a required element of course design at The Open University.  There are a number of justifications given for this:

  • Quality enhancement
  • Informed decision making
  • Sharing good practice
  • Improving cost-effectiveness
  • Speeding up decision making
  • Improving online pedagogy
  • Explicitly representing pedagogical activity
  • Effective management of student workload

A number of (beta) tools for Learning Design have been produced.  These are focused on module information; learning outcomes; activity planning, and mapping modules and resources.  These are intended to support constructive engagement over the life of the course.   Future developments will also embrace a qualification level perspective which will map activities against qualification routes.

These tools are intended to help course teams think critically about and discuss the purpose of tools and resources chosen in the context of the course as a whole and student learning experiences.  A design perspective can also help to identify imbalances in course structure or problematic parts of a course.


Guerrilla Research #elesig

http://upload.wikimedia.org/wikipedia/commons/thumb/6/69/Afrikaner_Commandos2.JPG/459px-Afrikaner_Commandos2.JPG

We don't need no stinking permissions....

Today I’m in the research laboratories in the Jennie Lee Building at The Institute of Educational Technology (aka work) for the ELESIG Guerrilla Research Event.  Martin Weller began the session with an outline of the kind of work that goes into preparing unsuccessful research proposals.  Using figures from the UK research councils he estimates that the ESRC alone attracts bids (which it does not fund) equivalent to 65 work years every year (2000 failed bids x 12 days per bid).   This work is not made public in any way and can be considered lost.
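The arithmetic behind the estimate can be checked in a few lines of Python. The inputs (2000 failed bids, 12 days per bid) come from the talk as reported above; the divisors are assumptions. Dividing by 365 calendar days reproduces a figure of roughly 65 years, while dividing by a hypothetical 220 working days per year would give a larger number.

```python
failed_bids = 2000    # figure reported in the talk
days_per_bid = 12     # figure reported in the talk

total_days = failed_bids * days_per_bid

# Assumption: the 65-year figure treats a year as 365 calendar days.
years_calendar = total_days / 365

# Assumption for comparison only: a working year of 220 days.
years_working = total_days / 220

print(f"Total effort: {total_days} person-days")
print(f"In 365-day years: {years_calendar:.1f}")
print(f"In 220-day working years: {years_working:.1f}")
```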

He then went on to discuss some different digital scholarship initiatives – like a meta educational technology journal based on aggregation of open articles; MOOC research by Katy Jordan; an app built at the OU; DS106 Digital Storytelling – these have elements of what is being termed ‘guerrilla research’.  These include:

  • No permissions (open access, open licensing, open data)
  • Quick set up
  • No business case required
  • Allows for interdisciplinarity unconstrained by tradition
  • Using free tools
  • Building open scholarship identity
  • Kickstarter / enterprise funding

Such initiatives can lead to more traditional forms of funding and publication; and the two at least certainly co-exist.  But these kinds of activities are not always institutionally recognised, giving rise to a number of issues:

  • Intellectual property – will someone steal my work?
  • Can I get institutional recognition?
  • Do I need technical skills?
  • What is the right balance between traditional and digital scholarship?
  • Ethical concerns about the use of open data – can consent be assumed?  Even when dealing with personal or intimate information?

Tony Hirst then took the floor to speak about his understanding of ‘guerrilla research’.  He divided his talk into the means, opportunity and motive for this kind of work.

First he spoke about the use of the CommentPress WordPress theme to disaggregate the Digital Britain report so that people could comment online.  The idea came out of a tweet but within 3 months was being funded by the Cabinet Office.

In 2009 Tony produced a map of MP expense claims which was used by The Guardian.  This was produced quickly using open technologies and led to further maps and other ways of exploring data stories.  Google Ngrams is a tool that was used to check for anachronistic use of language in Downton Abbey.

Another approach, in addition to pulling together recipes using open tools and open data, is to use innovative coding schemes. Mat Morrison (@mediaczar) used this to produce an accession plot graph of the London riots.  Tony has reused this approach – so another way of approaching ‘guerrilla research’ is to try to re-appropriate existing tools.

Another approach is to use data to drive a macroscopic understanding of data patterns, producing maps or other visualizations from very large data sets, helping sensemaking and interpretation.  One important consideration here is ‘glanceability’ – whether the information has been filtered and presented so that the most important data are highlighted and the visual representation conveys meaning successfully to the viewer.

Data.gov.uk is a good source of data:  the UK government publishes large amounts of information under an open licence.  Access to data sets like this can save a lot of research money, and combining different data sets can provide unexpected results.  Publishing data sets openly supports this method and also allows others to look for patterns that original researchers might have missed.

Google supports custom searches which can concentrate on results from a specific domain (or domains) and this can support more targeted searches for data.  Freedom of information requests can also be a good source of data; publicly funded bodies like universities, hospitals and local government all make data available in this way (though there will be exceptions). FOI requests can be made through whatdotheyknow.com.  Google spreadsheets support quick tools for exploring data such as sliding filters and graphs.

OpenRefine is another tool which Tony has found useful.  It can cluster open text responses in data sets according to algorithms and so replace manual coding of manuscripts.   The tool can also be used to compare with linked data on the web.
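To give a flavour of how clustering free-text responses can work, here is a minimal Python sketch of a ‘key collision’ approach: each response is reduced to a normalized fingerprint, and responses that share a fingerprint are grouped together. It is loosely modelled on the fingerprint method described in OpenRefine’s documentation but is not OpenRefine’s code, and the sample responses are invented.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Reduce a string to a normalized key so near-duplicates collide."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII equivalents.
    text = (unicodedata.normalize("NFKD", text)
            .encode("ascii", "ignore").decode("ascii"))
    # Drop punctuation, then sort and de-duplicate the remaining tokens.
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values by fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Invented survey responses for illustration.
responses = [
    "Open University",
    "open university ",
    "University, Open",
    "The Open University",
    "Université Ouverte",
    "universite ouverte",
]

clusters = cluster(responses)
for group in clusters:
    print(group)
```

Note that "The Open University" is left out of the first cluster because the extra token changes its fingerprint. Cases like this still need a human decision, or a fuzzier matching method, so this kind of clustering reduces manual coding rather than removing it entirely.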

Tony concluded his presentation with a comparison of ‘guerrilla research’ and ‘recreational research’. Research can be more creative and playful and approaching it in this way can lead to experimental and exploratory forms of research.  However, assessing the impact of this kind of work might be problematic.  Furthermore, going through the process of trying to get funding for research like this can impede the playfulness of the endeavour.

A workflow for getting started with this kind of thing:

  • Download openly available data: use open data, hashtags, domain searches, RSS
  • DBpedia can be used to extract information from Wikipedia
  • Clean data using OpenRefine
  • Upload to Google Fusion Tables
  • From here data can be mapped, filtered and graphed
  • Use Gephi for data visualization and creating interactive widgets
  • StackOverflow can help with coding/programming

(I have a fuller list of data visualization tools on the Resources page of OER Impact Map.)


Ethical Use of New Technology in Education

Today Beck Pitt and I travelled up to Birmingham in the midlands of the UK to attend a BERA/Wiley workshop on technologies and ethics in educational research.  I’m mainly here to focus on the redraft of the Ethics Manual for OER Research Hub and to give some time over to thinking about the ethical challenges that can be raised by openness.  The first draft of the ethics manual was primarily to guide us at the start of the project but now we need to redraft it to reflect some of the issues we have encountered in practice.

Things kicked off with an outline of what BERA does and the suggestion that consciousness about new technologies in education often doesn’t filter down to practitioners.  The rationale behind the seminar seems to be to raise awareness in light of the fact that these issues are especially prevalent at the moment.

This blog post may be in direct contravention of the Chatham convention


We were first told that these meetings would be taken under the ‘Chatham House Rule’ which suggests that participants are free to use information received but without identifying speakers or their affiliation… this seems to be straight into the meat of some of the issues provoked by openness:  I’m in the middle of live-blogging this as this suggestion is made.  (The session is being filmed but apparently they will edit out anything ‘contentious’.)

Anyway, on to the first speaker:


Jill Jameson, Prof. of Education and Co-Chair of the University of Greenwich
‘Ethical Leadership of Educational Technologies Research:  Primum non nocere’

The Latin part of the title of this presentation means ‘first, do no harm’ and is a recognised ethical principle that goes back to antiquity.  Jameson wants to suggest that this is a sound principle for ethical leadership in educational technology.

After outlining a case from medical care Jameson identified a number of features of good practice for involving patients in their own therapy and feeding the whole process back into training and pedagogy.

  • No harm
  • Informed consent
  • Data-informed consultation on treatment
  • Anonymity, confidentiality
  • Sensitivity re: privacy
  • No coercion
  • ‘Worthwhileness’
  • Research-linked: treatment & PG teaching

This was contrasted with a problematic case from the NHS concerning the public release of patient data.  Arguably very few people have given informed consent to this procedure.  But at the same time the potential benefits of aggregating data are being impeded by concerns about sharing of identifiable information and the commercial use of such information.

In educational technology the prevalence of ‘big data’ has raised new possibilities in the field of learning analytics.  This raises the possibility of data-driven decision making and evidence-based practice.  It may also lead to more homogenous forms of data collection as we seek to aggregate data sets over time.

The global expansion of web-enabled data presents many opportunities for innovation in educational technology research.  But there are also concerns and threats:

  • Privacy vs surveillance
  • Commercialisation of research data
  • Techno-centrism
  • Limits of big data
  • Learning analytics acts as a push against anonymity in education
  • Predictive modelling could become deterministic
  • Transparency of performance replaces ‘learning’
  • Audit culture
  • Learning analytics as models, not reality
  • Datasets ≠ information: they stand in need of analysis and interpretation

Simon Buckingham-Shum has put this in terms of a utopian/dystopian vision of big data.

Leadership is thus needed in ethical research regarding the use of new technologies to develop and refine urgently needed digital research ethics principles and codes of practice.  Students entrust institutions with their data and institutions need to act as caretakers.

I made the point that the principle of ‘do no harm’ is fundamentally incompatible with any leap into the unknown as far as practices are concerned.  Any consistent application of the principle leads to a risk-averse application of the precautionary principle with respect to innovation.  How can this be made compatible with experimental work on learning analytics and sharing of personal data?  Must we reconfigure the principle of ‘do no harm’ so that it becomes ‘minimise harm’?  It seems that way from this presentation… but it is worth noting that this is significantly different to the original maxim with which we were presented… different enough to undermine the basic position?


Ralf Klamma, Technical University Aachen
‘Do Mechanical Turks Dream of Big Data?’

Klamma started in earnest by showing us some slides:  Einstein sticking his tongue out; stills from Dr. Strangelove; Alan Turing; a knowledge network (citation) visualization which could be interpreted as a ‘citation cartel’.  The Cold War image of scientists working in isolation behind geopolitical boundaries has been superseded by building of new communities.  This process can be demonstrated through data mining, networking and visualization.

Historical figures the likes of Einstein and Turing are now more like nodes on a network diagram – at least, this is an increasingly natural perspective.  The ‘iron curtain’ around research communities has dropped:

  • Research communities have long tails
  • Many research communities are under public scrutiny (e.g. climate science)
  • Funding cuts may exacerbate the problem
  • Open access threatens the integrity of the academy (?!)

Klamma argues that social network analysis and machine learning can support big data research in education.  He highlights the US Department of Homeland Security, Science and Technology, Cyber Security Division publication The Menlo Report: Ethical Principles Guiding Information and Communication Technology Research as a useful resource for the ethical debates in computer science.  In the case of learning analytics there have been many examples of data leaks.

One way to approach the issue of leaks comes from the TellNET project.  By encouraging students to learn about network data and network visualisations they can be put in better control of their own (transparent) data.  Other solutions used in this project:

  • Protection of data platform: fragmentation prevents ‘leaks’
  • Non-identification of participants at workshops
  • Only teachers had access to learning analytics tools
  • Acknowledgement that no systems are 100% secure

In conclusion we were introduced to the concept of ‘datability’ as the ethical use of big data:

  • Clear risk assessment before data collection
  • Ethical guidelines and sharing best practice
  • Transparency and accountability without loss of privacy
  • Academic freedom

Fiona Murphy, Earth and Environmental Science (Wiley Publishing)
‘Getting to grips with research data: a publisher perspective’

From a publisher perspective, there is much interest in the ways that research data is shared.  They are moving towards a model with greater transparency.  There are some services under development that will use DOIs to link datasets and archives to improve the findability of research data.  For instance, the Geoscience Data Journal includes bi-directional linking to original data sets.  Ethical issues from a publisher point of view include how to record citations and accreditation, manage peer review, and maintain security protocols.

Data sharing models may be open, restricted (e.g. dependent on permissions set by data owner) or linked (where the original data is not released but access can be managed centrally).

[Discussion of open licensing was conspicuously absent from this though this is perhaps to be expected from commercial publishers.]


Luciano Floridi, Prof. of Philosophy & Ethics of Information at The University of Oxford
‘Big Data, Small Patterns, and Huge Ethical Issues’

Data can be defined by three Vs: variety, velocity, and volume. (Options for a fourth have been suggested.)  Data has seen a massive explosion since 2009 and the cost of storage is consistently falling.  The only limits to this process are thermodynamics, intelligence and memory.

This process is to some extent restricted by legal and ethical issues.

Epistemological Problems with Big Data: ‘big data’ has been with us for a while and should generally be seen as a set of possibilities (prediction, simulation, decision-making, tailoring, deciding) rather than a problem per se.  The problem is rather that data sets have become so large and complex that they are difficult to process by hand or with standard software.

Ethical Problems with Big Data: the challenge is actually to understand the small patterns that exist within data sets.  This means that many data points are needed as ways into a particular data set so that meaning can become emergent.  Small patterns may be insignificant so working out which patterns have significance is half the battle.  Sometimes significance emerges through the combining of smaller patterns.

Thus small patterns may become significant when correlated.  To further complicate things:  small patterns may be significant through their absence (e.g. the curious incident of the dog in the night-time in Sherlock Holmes).

A specific ethical problem with big data: looking for these small patterns can require thorough and invasive exploration of large data sets.  These procedures may not respect the sensitivity of the subjects of that data.  The ethical problem with big data is sensitive patterns: this includes traditional data-related problems such as privacy, ownership and usability but now also includes the extraction and handling of these ‘patterns’.  The new issues that arise include:

  • Re-purposing of data and consent
  • Treating people not only as means, resources, types, targets, consumers, etc. (deontological)

It isn’t possible for a computer to calculate every variable around the education of an individual so we must use proxies:  indicators of type and frequency which render the uniqueness of the individual lost in order to make sense of the data.  However this results in the following:

  1. The profile becomes the profiled
  2. The profile becomes predictable
  3. The predictable becomes exploitable

Floridi advances the claim that the ethical value of data should not be higher than the ethical value of the entity it describes: the data demands at most the same degree of respect.

Putting all this together:  how can privacy be protected while taking advantage of the potential of ‘big data’?  This is an ethical tension between competing principles or ethical demands: the duties to be reconciled are 1) safeguarding individual rights and 2) improving human welfare.

  • This can be understood as a result of polarisation of a moral framework – we focus on the two duties to the individual and society and miss the privacy of groups in the middle
  • Ironically, it is the ‘social group’ level that is served by technology

Five related problems:

  • Can groups hold rights? (it seems so – e.g. national self-determination)
  • If yes, can groups hold a right to privacy?
  • When might a group qualify as a privacy holder? (corporate agency is often like this, isn’t it?)
  • How does group privacy relate to individual privacy?
  • Does respect for individual privacy require respect for the privacy of the group to which the individual belongs? (big data tends to address groups (‘types’) rather than individuals (‘tokens’))

The risks of releasing anonymised large data sets might need some unpacking:  the example given was that during the civil war in Cote d’Ivoire (2010-2011) Orange released a large metadata set which gave away strategic information about the position of groups involved in the conflict even though no individuals were identifiable.  There is a risk of overlooking group interests by focusing on the privacy of the individual.
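To make the point concrete, here is a minimal sketch of how a data set can hide every individual and still give away information about a group.  The records, the region names and the `pseudonymise` helper are all invented for illustration; this is not the Orange data or any method described in the talk.

```python
from collections import Counter
import hashlib


def pseudonymise(user_id: str, salt: str = "example-salt") -> str:
    """Replace an identifier with a truncated salted hash (hypothetical scheme)."""
    return hashlib.sha256((salt + user_id).encode()).hexdigest()[:12]


# Invented records: (user identifier, region where activity was logged).
raw_records = [
    ("user-01", "north"),
    ("user-02", "north"),
    ("user-03", "north"),
    ("user-04", "south"),
    ("user-05", "north"),
]

# After this step no record carries an original identifier.
anonymised = [(pseudonymise(uid), region) for uid, region in raw_records]


def activity_by_region(records):
    """Count records per region; no individual identity is needed for this."""
    return Counter(region for _, region in records)


counts = activity_by_region(anonymised)
print(counts.most_common(1))
```

The aggregate shows where the group is concentrated even though the identifiers have been hashed away, which is the kind of group-level disclosure that individual-focused anonymisation does not address.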

There are legal or technological instruments which can be employed to mitigate the possibility of the misuse of big data, but there is no one clear solution at present.  Most of the discussion centred upon collective identity and the rights that might be afforded an individual according to groups they have autonomously chosen and those within which they have been categorised.  What happens, for example, if a group can take a legal action but one has to prove membership of that group in order to qualify?  The risk here is that we move into terra incognita when it comes to the preservation of privacy.


Summary of Discussion

Generally speaking, it’s not enough to simply get institutional ethical approval at the start of a project.  Institutional approvals typically focus on protection of individuals rather than groups and research activities can change significantly over the course of a project.

In addition to anonymising data there is a case for making it difficult to reconstruct the entire data set so as to stop others from misuse.  Increasingly we don’t even know who learners are (e.g. MOOC) so it’s hard to reasonably predict the potential outcomes of an intervention.

The BERA guidelines for ethical research are up for review by the sounds of it – and a working group is going to be formed to look at this ahead of a possible meeting at the BERA annual conference.

Posted in ethics, events, liveblog, technology

My ORO report

I’ve just had a quick look at my author report from the ORO repository of research published by members of The Open University.  I’m quite surprised to learn that I’ve accrued almost 1,300 downloads of materials I have archived here!

An up to date account of my ORO analytics can be found at http://oro.open.ac.uk/cgi/stats/report/authors/31087069bed3e4363443db857ead0546/. I suppose a 50% strike rate for open access publication ain’t bad… but there is probably room for improvement…

Posted in big data, housekeeping