EAMS 2016

Back in September I was delighted to be a keynote speaker at EAMS: the first ever conference on E-Assessment in the Mathematical Sciences. For more information about the conference in general click here; for a recording of my keynote click here.

The conference was a pleasant surprise in many ways. Firstly, it was truly international, with attendees from Australia, Finland, Ireland, Japan, Norway, the Netherlands, South Africa and the US as well as from all over the UK. Secondly, it was not as “techy” as I’d feared it might have been, nor did the maths go right over my head (well, not very often…). Thirdly, and most importantly, I had to tone down my keynote, which had been written to be rather critical of much e-assessment practice, because there was some fantastic stuff reported at the conference! I was particularly pleased to be a keynote speaker alongside Christian Lawson-Perfect, Chris Sangwin and Michael Gage. Thus we heard about some of the very good e-assessment systems and question types for mathematical sciences: NUMBAS, STACK and WeBWorK. I added in some detail on Pattern Match.

As I’ve said before in this blog, I am particularly impressed by STACK, and its author Chris Sangwin gave a particularly thoughtful talk on “the interplay between calculation and reasoning”, which fed very neatly into my discussion of “how far is it appropriate to go” in assessing automatically.

Of course there were talks that were less good; the one point that I’d still want to emphasise is the need to monitor student use of questions very closely, and not to assume that they are behaving in the way that you think they are. However, it was a joy to be at a meeting with such lovely people, most of whom seemed driven by a desire to improve their students’ learning. It was also a great pleasure that Cliff Beevers, who was already an expert in this area when I was just setting out in the very early 2000s, was also at the meeting.

Posted in conferences

Failure or success?

I can be a bit of a glass half empty sort of person, but recent events have made me reflect on society’s capacity to look for failure in success.

I guess my thinking comes in part from the events surrounding the ExoMars 2016 mission, and the excitement and disappointment felt in the School of Physical Sciences at the Open University, of which I am Head. On 19th October, two things were supposed to happen, about 225 million kilometres away, in the vicinity of Mars. First of all, the Schiaparelli lander (the shiny bit on top of the Trace Gas Orbiter, of which a quarter-sized model is shown in the photograph below), which had separated from the TGO previously, was supposed to land on Mars. Well it did, after a fashion; trouble is it came in far too fast and appears to have exploded on impact. Hmmm, so that was a failure then?


Well, actually no. Schiaparelli was an “entry, descent and landing demonstrator module” – it was meant to test and demonstrate the technology. The majority of that technology worked according to plan. The heat shielding operated as anticipated during atmospheric entry, the parachute deployed correctly and further slowed the lander, the front heat shield jettisoned as planned and the landing radar was activated and began returning data. Unfortunately, Schiaparelli then released its parachute earlier than anticipated and only fired its landing thrusters for a few seconds rather than the predicted 30-second burn; shortly afterwards communication was lost.

It is important to note that most of the data collected during Schiaparelli’s descent has been recovered. The mission’s AMELIA team, of which my colleague Stephen Lewis is Co-Principal Investigator, is aiming to use this descent data to reconstruct the lander’s trajectory from an altitude of around 120 km down to the point at which it lost contact. The team can then build up vertical profiles of the Martian atmosphere – e.g. temperature, pressure, wind speed – which will be compared with existing atmospheric models and simulations. This is the first landing attempted during the planet’s annual dust storm season, meaning that the results will offer unprecedented insight into the atmosphere at this time of year.

Most significantly, the majority of the science planned for this part of the ExoMars programme will be undertaken by the Trace Gas Orbiter (TGO). And whilst we were all worrying about the fate of the Lander on 19th October, the TGO had successfully entered Martian orbit. That’s a real success. The OU is closely involved with two of the orbiter’s instruments: Manish Patel is Co-Principal Investigator on NOMAD, an infrared-to-UV spectrometer investigating the composition of the Martian atmosphere, and Matt Balme is a Guest Investigator on the TGO mission, working with data from the CaSSIS instrument, which will be returning high resolution, colour, stereo images of the planet. These stereo images can be used to generate 3D elevation data of the Martian surface.

So, we have lots to celebrate. But the public sees the failure, not the success. This is the point at which I return from my description of events around Mars to my usual standpoint as a commentator on assessment-related matters. We need to remember how much damage a single “failure” (e.g. a poor exam result) can do to an individual, despite previous successes. Even when failure is real, we need to learn to move on and to build success out of failure rather than the other way round.


[With thanks to PhD student Rhian Chapman, whose description of the work of the TGO and Lander I have adapted from the School of Physical Sciences’ website (http://www.open.ac.uk/science/physical-science/news/exomars-update)]


Posted in ExoMars, Success and failure

Keys to transforming assessment at institutional level: selected debates from AHE2016

Hot on the heels of my post about Sue Bloxham’s keynote at the Assessment in Higher Education workshop in June, this post is about the follow-up Transforming Assessment webinar “Keys to transforming assessment at institutional level: selected debates from AHE2016”.

Three talks from the AHE workshop had been selected for the webinar because they really did focus on change at the institutional level, and I thoroughly enjoyed chairing the session. If you want to watch the whole thing, click here for more information and the recordings.

The first of the three talks that we’d selected was “Changing feedback practice at an institutional level” in which Sally Brown talked about work at the University of Northumbria, Leeds Met (now Beckett) University and Anglia Ruskin University. Kay Sambell had given this talk at the earlier workshop and their conclusions were that

  • Slow transformative development has more impact than attempts at quick fixes;
  • Having money to support activities and personnel is important, but large amounts of cash don’t necessarily lead to major long-term impact;
  • Long-term ownership by senior managers is essential for sustainability;
  • To have credibility, activities need to be based on evidence-based scholarship;
  • Committed, passionate and convincing change agents achieve more than top-down directives.

The third of the three talks was “Changing colours: what happens when you make enhancement an imperative?” in which Juliet Williams talked about the impact of the TESTA (Transforming the Experience of Students through Assessment) Project at the University of Winchester.

However, from the conversations that I had at the workshop in June, it was the middle talk (given at the Webinar by Dave Morrison of the University of Plymouth because Amanda Sykes from the University of Glasgow was unavailable) that had inspired many of the attendees – bearing in mind that these were largely assessment practitioners not experts. The title was “Half as much but twice as good” and the important points I picked up were that

  • Timely feedback is more important than detailed feedback
  • [students are as busy as we are so] Less feedback can be more effective. If a student only reads your feedback for 30 seconds, what do you want them to take?


We ended with a good discussion of how to bring true institutional change.

Posted in conferences, institutional change, timeliness of feedback

Central Challenges in Transforming Assessment at the Departmental and Institutional Level

Back on 30th June, Assessment in Higher Education (AHE) held a seminar in Manchester with the theme of “Transforming assessment and feedback in Higher Education on a wider scale: the challenge of change at institutional level”. The idea behind the seminar was partly to hold a smaller-scale event between our increasingly large biennial conferences, though we had well over 100 attendees.

AHE are now working in collaboration with Transforming Assessment, and we live-streamed Sue Bloxham’s keynote to a further twenty or so people around the world. Then on 13th July we had a dedicated webinar to which three selected presenters from the seminar contributed. My involvement in both of these events meant that I had a double ‘bite at the cherry’. I heard one set of presentations at the seminar itself, then I heard the three presentations on 13th July, and chaired a discussion of overlapping themes. There was some fantastic stuff.

As I try to catch up with this blog, I’ll start by describing my take on Sue’s keynote. She started from the precept that assessment remains unfit for purpose – and change is slow. She went on to outline what she described as key barriers to assessment enhancement. The two barriers that resonate most with my own experience are:


  • centrally imposed change, which produces resistance. Sue’s proposed solution is that we should put the focus for change on small low-level workgroups.
  • the need for assessment literacy for staff. Here the focus must be on adequate professional development.

Sue went on to describe a framework which might be drawn upon to create the conditions for transformation at institutional or departmental level, based around

  • key principles
  • infrastructure
  • strategy
  • assessment literacy.


It was inspirational; now we just need to make the change happen. Don’t take my word for it though; you can watch the recording of Sue’s keynote here.


Posted in conferences, institutional change

Fishy feedback

I don’t think that I am ever going to win prizes for my artwork. However, this post is about my picture of a fish, shown below:


A month or so ago, I was invited to take part in a live event about assessment and feedback, and for some reason the three of us involved decided to base this around an ‘assessed task’ which was to draw a fish.

There were a few problems – the first one was that we hadn’t been told the criteria against which our drawings would be judged. So my colleague John drew fish and chips. Even though I did better than John did, I was unaware of the detailed criteria.

So, when our work was ‘marked’ none of us did very well. The feedback given related to efforts as marked against the criteria – which we hadn’t seen. So I was praised for my colourful fish (I just happened to have a red pen in my hand) but downgraded because I had not shown any water. So, despite the fact that my marks were better than those of my two colleagues, I was angry to start with – I hadn’t been told to draw a fish in water! My emotions stopped me from listening to the rest of the feedback, which may or may not have been justified.

Finally, the mark scheme gave credit for the presence of different sorts of fins and gills. I have no idea whether my drawing, or the other colleague’s one that I was marking, included these points or not. My knowledge of fish anatomy is somewhat limited!

So, very quickly (and not as planned by the activity) we have a list of important points for assessing and giving feedback.

1. Make sure your assessment criteria are clear – and shared with students.

2. Receiving feedback is an emotional experience; bear that in mind.

3. Make sure that students understand the language that you use in your feedback.


Posted in assessment criteria, emotional reaction, feedback

The importance of neutral results

This is the third posting in this morning’s trilogy about research methods, and this one was prompted by an article in this month’s issue of Physics World:

Ball, P. (May 2016), No result, no problem? Physics World, 29(5), 38-41.

Ball (quoting from others, in particular Brian Nosek of the University of Virginia) points out that ‘positive’ results are far more likely to be published than neutral or negative results. He starts by reminding us of Michelson and Morley’s famous ‘null’ result of 1887, in which there was no discernible difference in the speed of light passing in different directions through “the ether”. The failure to observe the expected result went unexplained for nearly two decades until Einstein’s special theory of relativity showed that the ether was not required in order to understand the properties of light.

Coming back to the more mundane, who wants to see endless papers that report on results that didn’t happen? The literature is already sufficiently obese. Ball points out that in some fields there are specific null-result journals. Or surely, such results could just be published in the blogosphere or on preprint servers. Another possibility is linked to the suggestion that objectives of experiments should be declared in registered reports before the data are collected – see https://osf.io/8mpji/wiki/home/. This would “improve research design, whilst also focusing research initiatives on conducting the best possible experiments rather than getting the most beautiful possible results.”

Whatever, the results do need to be out there. Not everyone is going to have a result as significant as Michelson and Morley’s, but plain honesty – and a wish to stop others wasting their time in carrying out the same experiment, believing it to be new – means that all results should be shared. This should not be seen as a waste of time, but rather an example of what Ball describes as “good, efficient scientific method”.

I’d like to take this slightly further. I have encountered educational researchers who refuse to publish a result unless it is statistically significant. To return to my starting point of this morning, I’m a numerate scientist, I like statistically significant results…but I have seen some most unfortunate consequences of this insistence on ‘significance’ including (and no, I’m not joking) a self-justification of claiming that results are ‘significant’ at some arbitrary level e.g. 7%… PLEASE, just give your results as they are. Don’t tweak your methodology to make the results fit. Don’t claim what you shouldn’t. Recognise that appropriate qualitative research methodologies have a place alongside appropriate quantitative research methodologies – and be honest.

Posted in research methods, statistics

The unscientific method

The title of this post is copied from another New Scientist article, this time by Sonia van Gilder Cooke, and published in Issue number 3069 (16th April 2016) on pages 39-41. The article starts “Listening to When I’m Sixty-Four by The Beatles can make you younger. This miraculous effect, dubbed ‘chronological rejuvenation’ was revealed in the journal Psychological Science in 2011. It wasn’t a hoax, but you’d be right to be suspicious. The aim was to show how easy it is to generate statistical evidence for pretty much anything, simply by picking and choosing methods and data in ways that researchers do every day.”

The article is wider ranging than the one that I’ve just posted about here. However, what is most worrying is that it goes on to point out that dubious results are alarmingly common in many fields of science. The summary of causes of bias includes some things that I suspect I have been guilty of:

  • Wishful thinking – unconsciously biasing methods to confirm your hypothesis
  • Sneaky stats – using the statistical analysis that best supports your hypothesis
  • Burying evidence – not sharing research data so that results can be scrutinised
  • Rewriting history – inventing a new hypothesis in order to explain unexpected results
  • Tidying up – ignoring inconvenient data points and analyses in the write-up
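The “sneaky stats” item in particular is easy to demonstrate for yourself. What follows is only a rough sketch of my own, not the analysis from the article: it simulates studies in which there is no real effect at all, and asks how often at least one comparison comes out “significant” at the 5% level if you allow yourself to try several outcome measures. The function names, the group size of 30, the 20 outcome measures and the use of a simple z-test with a known standard deviation of 1 are all assumptions made for illustration.

```python
import math
import random


def two_sided_p(sample_a, sample_b):
    """Two-sided p-value for a difference in means.

    Assumes both samples come from normal populations with a known
    standard deviation of 1, so a simple z-test is enough.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    diff = sum(sample_a) / n_a - sum(sample_b) / n_b
    z = diff / math.sqrt(1 / n_a + 1 / n_b)
    # P(|Z| > |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))


def study_finds_something(rng, n_outcomes, n_per_group=30, alpha=0.05):
    """Simulate one study with NO real effect and report whether any of
    its n_outcomes comparisons comes out 'significant'."""
    for _ in range(n_outcomes):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        if two_sided_p(group_a, group_b) < alpha:
            return True
    return False


def false_positive_rate(n_outcomes, n_studies=1000, seed=1):
    """Fraction of simulated null studies reporting at least one 'finding'."""
    rng = random.Random(seed)
    hits = sum(study_finds_something(rng, n_outcomes) for _ in range(n_studies))
    return hits / n_studies


# One pre-specified analysis per study: theory says about 5% false positives.
print(false_positive_rate(1))
# Twenty analyses to pick from: theory says about 1 - 0.95**20, i.e. roughly 64%.
print(false_positive_rate(20))
```

The point of the sketch is simply that nothing dishonest has to happen inside any single test; the inflation comes entirely from being free to choose which test to report.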

I will discuss one cause that isn’t explicitly mentioned in the summary, namely our wish to only publish ‘positive’ results, in my next post in this morning’s trilogy.

The article goes on to suggest a number of fixes:

  • Pre-registration – publicly declaring procedures before doing a study
  • Blindfolding – deciding on a data analysis method before the data are collected
  • Sharing – making methods and data transparent and available to others
  • Collaboration – working with others to increase the rigour of experiments
  • Statistical education – acquiring the tools required to assess data meaningfully



Posted in research methods, statistics

Simpson’s paradox

Back in November, I posted about the fact that I was going to be more bullish about the fact that I am a physicist but that I do educational research. As I try to build my confidence to say some of the things that follow from that in my own voice, I’ll start by quoting some more articles I have read in the past few months.

To start with, there was a piece in New Scientist back in February (Issue number 3062, 27th February 2016, pg 35-37), by Michael Brooks and entitled “Thinking 2.0”. This article starts by pointing out Newton’s genius in recognising the hidden variable (gravity) that connects a falling apple and the rising sun. Brooks goes on to explain that “we know that correlation does not equal causation, but we don’t grasp the depth of it” – and to point out that our sloppy understanding of statistics can lead us into deep water.

Brooks gives a powerful hypothetical example of Simpson’s paradox, defined by Wikipedia as a paradox “in which a trend appears in different groups of data but disappears or reverses when these groups are combined” (the Wikipedia article gives some more examples and is worth reading). The example in the New Scientist article is about a clinical trial involving 400 men and 400 women that, taking the 800 participants as a whole, apparently shows that a new drug is effective in treating an illness. However, if you look at the men and the women separately, it becomes apparent that in each group a higher proportion of those who were NOT given the drug recovered than of those who received it. How so? Well, although the sample was nicely balanced between men and women, and half of the participants received the drug whilst half didn’t, it turns out that far more men were given the drug in this particular study, and men are much more likely to recover, whether or not they receive the drug. The men’s higher overall recovery rate masked the drug’s negative effect. This is a hypothetical example, and in a structured environment such as a clinical trial, such potential pitfalls can generally be circumvented. But medical – and educational – research often operates in what Brooks rightly describes as muddy waters. Controls may not be possible and we can be led astray by irrelevant, confusing or missing data.
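It helped me to put some numbers on this. The counts in the sketch below are invented purely for illustration and are not taken from Brooks’s article; I have chosen them to fit the explanation above, so that far more men than women receive the drug, men recover more often whether or not they receive it, and the drug lowers the recovery rate within each sex. The names `TRIAL` and `recovery_rate` are my own.

```python
# Invented counts for illustration only (not taken from the article):
# (sex, arm) -> (number recovered, number of participants in that cell)
TRIAL = {
    ("men", "drug"): (180, 300),
    ("men", "no drug"): (70, 100),
    ("women", "drug"): (20, 100),
    ("women", "no drug"): (90, 300),
}


def recovery_rate(arm, sex=None):
    """Proportion recovered in one arm, for one sex or (sex=None) for everyone."""
    cells = [counts for (s, a), counts in TRIAL.items()
             if a == arm and (sex is None or s == sex)]
    recovered = sum(r for r, _ in cells)
    total = sum(n for _, n in cells)
    return recovered / total


for sex in ("men", "women", None):
    label = sex or "everyone"
    print(f"{label:9s} drug: {recovery_rate('drug', sex):.0%}   "
          f"no drug: {recovery_rate('no drug', sex):.0%}")
```

With these counts the drug arm does worse for men (60% against 70%) and worse for women (20% against 30%), yet better for the 800 participants taken together (50% against 40%). The reversal comes entirely from the lopsided allocation: three quarters of the drug arm are men, who tend to recover anyway.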

Although I was aware of Simpson’s paradox and thought I had a reasonable understanding of ‘lies, damned lies and statistics’ it took me some time to get my head around what is going on here. We need to be really careful.

Posted in research methods, Simpson's paradox, statistics

Do we need assessment at all?

I’m surprised I haven’t posted on this before, but it looks as if I haven’t, and I am reminded to do so now by another New Scientist piece, this time from back in January:

Rutkin, A. (2nd Jan 2016) Robotutor is a class act. New Scientist, 3054, p. 22.

The article talks about an algorithm developed by researchers at Stanford University and Google in California which analyses students’ performance on past problems, identifies where they tend to go wrong and forms a picture of their overall knowledge.

Chris Piech from Stanford goes on to say “Our intuition tells us if you pay enough attention to what a student did as they were learning, you wouldn’t need to have them sit down and do a test.”

The first paper I heard suggesting that we might assess students by analysing their engagement with an online learning environment (rather than adding a separate test) was Redecker et al. (2012) and it blew me away.

Redecker, C., Punie, Y., & Ferrari, A. (2012). eAssessment for 21st Century Learning and Skills. In A. Ravenscroft, S. Lindstaedt, C.D. Kloos & D. Hernandez-Leo (Eds.), 21st Century Learning for 21st Century Skills (pp. 292-305). Berlin: Springer.

In reality of course, and as much discussed in this blog, I would never want to do away with interaction with humans, and there are things (e.g. essays, problem solving) where I think marking should be done by human markers. However, if we can do away with separate tests that are just tests, I’d be delighted.

Posted in learning analytics

Positive discrimination?

This isn’t really about assessment, or perhaps it is. First of all, some background. Because of a change in the dates used to establish school years where I lived when I was small, I missed a year at primary school. So, in a sense, I was disadvantaged. But I understand that, up to a certain age, they then gave those of us affected extra marks in exams. I’ve no idea whether that was actually the case. What I do know is that if I felt I’d been given unfair advantage over others in my more recent career (in particular as a female physicist) I would not be happy.

My definition of equality of opportunity has to do with leveling the playing field. I once arrived at a tutorial venue to give a tutorial, having requested a ground floor room because I knew someone in a wheelchair would be there. The problem was that the venue had given me a room in a portacabin up three steps. Only three steps but the effect was the same – the student couldn’t access the tutorial (well, not until I got angry and got us moved to another room). Sometimes apparently small things can get in the way of learning, for some students not for others, and promoting equal opportunity is to do with ensuring that these “small things” are removed. In my book, equality of opportunity is not the same as positive discrimination; I’d give a student extra time in an exam if a medical condition suggested it was necessary; I would not give a student extra marks just by virtue of the medical condition. I’m happy to argue my case for that…or at least I was…

At the Open University we have found that female students do less well on one of our physics modules, and we continue to investigate the causes for this and to seek to put it right. Start here to learn more about this work. However, I’d never have thought of increasing marks just for women or others in minority groups. After all, these are averages; some women and some black students do well, even if their average attainment is lower.

Then, in my catch-up reading of old copies of New Scientist I came across an opinion piece from Joshua Sokol entitled “Mix it up”. This points out, as I know from other sources, that there can be a mismatch between scores in tests and future performance. So if women and black students do less well in a test, and we use that test to determine entry onto a subsequent programme (in this case an Astronomy PhD) we are both disadvantaging racial minorities and women, and failing to get the best students on the subsequent programme.

By coincidence, I have been trying to come to terms with all of this in the week when my Department at the Open University has been awarded Institute of Physics Juno Champion Status for our commitment to gender equality. It’s great news, but it doesn’t mean we have arrived! More thought needed, and I think my conclusion to the conundrum described in this post is probably to be careful not to judge ANYONE on a single measure.

Sokol, Joshua (9th January 2016). Mix it up. New Scientist, number 3055, p. 24.

Posted in gender, positive discrimination