ViCE/PHEC 2014

Posted on September 5th, 2014 at 7:15 pm by Sally Jordan

The ‘interesting’ title of this post relates to the joint Variety in Chemistry Education/Physics Higher Education Conference that I was on my way home from a week ago. Apologies for my delay in posting, but since then I have celebrated my birthday, visited my elderly in-laws, moved into new Mon-Fri accommodation, joined a new choir, celebrated Richard’s and my 33rd wedding anniversary – and passed the viva for my PhD by publication, with two typos to correct and one minor ‘point of clarification’. It has been an amazing week!

The conference was pretty good too. It was held at the University of Durham whose Physics Department (and, obviously, Cathedral) is much as it was when I graduated more than 36 years ago. However most of the sessions were held in the new and shiny Calman Learning Centre (with the unnervingly named Arnold Wolfendale Lecture Theatre, since I remember Professor Wolfendale very well from undergraduate days). There were lots more chemists than physicists, I don’t really know why, and lots of young enthusiastic university teaching fellows. Great!

Sessions that stood out for me include the two inspirational keynotes and both of the workshops that I attended, plus many conversations with old and new friends. The first keynote was given by Simon Lancaster from UEA and its title was ‘Questioning the lecture’. He started by telling us not to take notes on paper, but instead to get onto social media. I did, though I find it difficult to listen and to post meaningful tweets at the same time. Is that my age? However I agree with a huge amount of what Simon said, in particular that we should cut out lots of the content that we currently teach.

Antje Kohnle’s keynote on the second day had a very different style. Antje is from the University of St Andrews and she was talking about the development of simulations to make it easier for students to visualise some of the counterintuitive concepts in quantum mechanics. The resource that has been developed is excellent, but the important point that Antje emphasised is the need to develop resources such as this iteratively, making use of feedback from students. Absolutely!

The two workshops that I so much enjoyed were (1) ‘Fostering learning improvements in physics’, a thoughtful reflection, led by Judy Hardy and Ross Galloway from the University of Edinburgh, on the implications of the FLIP Project; and (2) the interestingly named (from a student comment) ‘I don’t know much about physics, but I do know buses’, led by Peter Sneddon at the University of Glasgow, looking at questions designed to test students’ estimation skills and their confidence in estimation.

The quality of the presentations was excellent, bearing in mind that some people were essentially enthusiastic teachers whilst others were further advanced in their understanding of educational research. I raised the issue of correlation not implying causality at one stage, but immediately wished that I hadn’t. I think that, by and large, the interventions that were being described are ‘good things’ and of course it is almost impossibly difficult to prove that it is your intervention that has resulted in the improvement that you see.

In sessions and informal discussions with colleagues, the topics that kept striking me were (1) the importance of student confidence; and (2) reasons for the underperformance (by several measures) of female students. We are already planning a workshop for next year!

Oh yes, and Durham’s hills have got hillier…

Implications of evaluation of Formative Thresholded Assessment 1

Posted on August 10th, 2014 at 5:36 pm by Sally Jordan

As I said in my last post, the most powerful finding from this evaluation is the fact that many of our students, and also many of our staff, have a very poor understanding of our assessment strategies. And it is not just the ‘new’ assessment strategies that they don’t understand; they also have poor understanding of the more conventional assessment strategies that we have been using for years and years. So what are the issues? I believe that whilst the first of these may be particular to the Open University, the others are of more general applicability.

1. Our students (and a worryingly large number of our staff) assume that when a module has summative continuous assessment, this contributes to their overall score. The reality is… it does and it doesn’t. In order to pass a module, or to achieve a particular grade of pass, a student needs to get above a certain threshold in both the continuous assessment and the examinable component separately. So even if you do exceptionally well in the continuous assessment, you will still fail the module if you don’t do sufficiently well in the examinable component. I just don’t think we make this point sufficiently clear to our students. For example, I still come across text that talks about a 50:50 weighting of continuous assessment and examinable component. Given that you have to pass separate thresholds for the two, the weighting is a complete red herring!
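To make the dual-threshold point concrete, here is a little sketch in code. The function name and the threshold values are mine, purely for illustration – they are not the actual regulations:

```python
def passes_module(continuous_score, exam_score,
                  continuous_threshold=40, exam_threshold=40):
    """A student must clear BOTH thresholds separately; the nominal
    50:50 weighting never comes into play for pass/fail."""
    return (continuous_score >= continuous_threshold
            and exam_score >= exam_threshold)

# 90% on the continuous assessment cannot rescue a failed exam:
print(passes_module(90, 35))  # False
print(passes_module(45, 45))  # True
```

Notice that averaging the two scores with any weighting would give the first student a comfortable “pass” – which is exactly why talk of a 50:50 weighting misleads.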

2. My second point is a general conclusion from my first. We need to make our assessment strategies clear.

3. We need to avoid unnecessary complication. Some of our assessment strategies are incredibly complex. The complexity has usually arisen because someone wanted to do something innovative and creative, with the best interests of our students in mind. But sometimes it is just too much.

4. We need to adopt practice that is consistent or at least coherent across a qualification, rather than having different strategies on each module. More on that to follow. I have no wish to stymie creativity, but when you stand back and look at the variations in practice from module to module, it is not at all surprising that students get confused.

Formative thresholded assessment – some evaluation findings

Posted on May 30th, 2014 at 6:00 pm by Sally Jordan

I haven’t said much about the Open University Science Faculty’s move to formative thresholded assessment (first introduced here), or our evaluation of it, or our next steps. So much to catch up on… There is more on all aspects in the poster shown below, but you won’t be able to read the detail, so I’ll explain what we have found in a series of posts.

First of all a reminder of what we mean by formative thresholded assessment: Students are required to demonstrate engagement by getting over a threshold of some sort in their continuous assessment but their final module grade is determined by the module’s examinable component alone.

Two models of formative thresholded assessment are being trialled:
(a)  Students are required to demonstrate engagement by reaching a threshold (usually 30%) in, say, 5 out of 7 assignments;
(b)  Assignments are weighted and students are required to reach a threshold (usually 40%) overall.
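As a concrete (and entirely hypothetical) sketch, the two models might be expressed like this. The 30% and 40% thresholds come from the description above; the scores, weights and function names are invented:

```python
def model_a_engaged(scores, per_assignment_threshold=30, required=5):
    """Model (a): reach the threshold in at least `required` of the assignments."""
    return sum(s >= per_assignment_threshold for s in scores) >= required

def model_b_engaged(scores, weights, overall_threshold=40):
    """Model (b): the weighted overall score must reach the threshold."""
    overall = sum(s * w for s, w in zip(scores, weights)) / sum(weights)
    return overall >= overall_threshold

scores = [55, 0, 70, 28, 45, 60, 35]   # seven assignments, one not submitted
print(model_a_engaged(scores))          # five scores reach 30, so True
weights = [1, 1, 1, 1, 1, 1, 2]         # final assignment double-weighted
print(model_b_engaged(scores, weights))
```

The same set of scores can pass one model and fail the other, which is one reason students (and tutors) need the rules spelled out clearly.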

And what of the findings? Here are some of the headlines:

  • Assignment submission rates are slightly lower than with summative continuous assessment, but there appear to be no substantial changes as a result of the change in assessment practice; 
  • Students who submit all assignments do better in the examinable component (unlikely to be a causal effect); however for some students, choosing to omit continuous assessment components in order to concentrate on revision appears to have been a sensible strategy;
  • There were some problems when students were taking a module with summative continuous assessment and a module with formative continuous assessment concurrently, especially when assignments were due on the same date.

However, to my mind, our most significant finding has been that many students and tutors have a poor understanding of our assessment strategies, including conventional summative continuous assessment. This is in line with a frequently found result that students have poor understanding of the nature and function of assessment – and we need to do something about it.

Hello again…and take care

Posted on May 29th, 2014 at 10:27 am by Sally Jordan

Apologies for my lack of activity on this site in the past couple of months; paradoxically this is because I have been so busy with assessment work. So expect various thoughts in the next couple of months, as I write up our evaluation of formative thresholded assessment and complete the covering paper for my PhD by publication “E-assessment for learning? Exploring the potential of computer-marked assessment and computer-generated feedback, from short-answer questions to assessment analytics”.

For now, just another warning about the need to interpret things carefully. It relates to the figure in my previous post. It would be easy to look at this figure and assume that most students attempt formative questions twice; this is not true, as the figure below shows. Most students attempt most questions once, but a few people attempt them many times (resulting in a mean number of attempts that is about twice the number of users for each question). You’ll also note that, for the particular example shown on the right-hand side below, a higher than average number of students attempted the question four times – I’ll explain the reason for this in a future post.
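A toy calculation shows how a long tail of repeat attempts can produce a mean of about two attempts per user even when most users attempt a question exactly once. All the numbers here are invented for illustration:

```python
# 100 users: 80 attempt once, a few attempt many times
attempts = [1] * 80 + [2] * 10 + [5] * 5 + [15] * 5

mean_attempts = sum(attempts) / len(attempts)
single_attempt_fraction = attempts.count(1) / len(attempts)

print(mean_attempts)             # 2.0 attempts per user on average
print(single_attempt_fraction)   # yet 0.8 of users attempted exactly once
```

This is exactly the trap in reading the earlier figure: the mean says “two attempts”, while the typical student made one.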

Why learning analytics have to be clever

Posted on March 28th, 2014 at 5:32 pm by Sally Jordan


I am surprised that I haven’t posted the figure below previously, but I don’t think I have.

It shows the number of independent users (dark blue) and usages (light blue) on a question by question basis. So the light blue bars include repeating of whole questions.

This is for a purely formative interactive computer-marked assignment (iCMA) and the usage drops off in exactly the same way for any purely formative quiz. Before you tell me that this iCMA is too long, I think I’d agree, but note that you get exactly the same attrition – both within and between iCMAs – if you split the questions up into separate iCMAs. And in fact the signposting in this iCMA was so good that sometimes the usage INCREASES from one question to the next. This happens for sections (in this case chemistry!) that the students find hard.

The point of this post though is to highlight the danger of just saying that a student clicked on one question and therefore engaged with the iCMA. Just how many questions did that student attempt? How deeply did they engage? The situation becomes even more complicated if you consider the fact that there are around another 100 students who clicked on this iCMA but didn’t complete any questions (and again, this is common for all purely formative quizzes). So be careful, just using clicks (on a resource of any sort) does not tell you much about engagement.

What gets published and what people read

Posted on March 23rd, 2014 at 6:24 pm by Sally Jordan

I doubt this will be my final ‘rant of the day’ for all time, but it will exhaust the stock of things I’m itching to say at the current time. This one relates not to the use and misuse of statistics but rather to rituals surrounding the publication of papers. I’ll stay clear of the debate surrounding open access; this is more about what gets published and what doesn’t!

I have tried to get papers published in Assessment & Evaluation in Higher Education (AEHE). They SAY they are “an established international peer-reviewed journal which publishes papers and reports on all aspects of assessment and evaluation within higher education. Its purpose is to advance understanding of assessment and evaluation practices and processes, particularly the contribution that these make to student learning and to course, staff and institutional development”  but my data-rich submission (later published in amended form as Jordan (2012) “Student engagement with assessment and feedback: Some lessons from short-answer free-text e-assessment questions”) didn’t even get past the editor. Ah well, whilst I think it was technically in scope, I have to admit that it was quite different from most of what is published in AEHE. I should have been more careful.

My gripe on this is two-fold: firstly, if you happen to research student engagement with e-assessment, as I do, you’re left with precious few places to publish. Computers & Education, the British Journal of Educational Technology and Open Learning have come to my rescue, but I’d prefer to be publishing in an assessment journal, and my research has implications that go way beyond e-assessment (and before anyone mentions it, whilst I read the CAA Conference papers, I’m not convinced that many others do, and the International Journal of e-Assessment (IJEA) seems to have folded). Secondly, whilst I read AEHE regularly, and think that there is some excellent stuff there, I also think there are too many unsubstantiated opinion pieces and (even more irritating) so-called research papers that draw wide-ranging conclusions from, for example, the self-reported behaviour of small numbers of students. OK, sour grapes over.

In drafting my covering paper for my PhD by publication, one of the things I’ve done is look at the things that are said by the people who cite my publications. Thank you, lovely people, for your kind words. But I have been amused by the number of people who have cited my papers for things that weren’t really the primary focus of that paper. In particular, Jordan & Mitchell (2009) was primarily about the introduction of short-answer free-text questions, using Intelligent Assessment Technologies (IAT) answer matching. But lots of people cite this paper for its reference to OpenMark’s use of feedback. Ross, Jordan & Butcher (2006) or Jordan (2011) say a lot more about our use of feedback. Meanwhile, whilst Butcher & Jordan (2010) was primarily about the introduction of pattern matching software for answer matching (instead of software that uses the NLP techniques of information extraction), you’ve guessed it, lots of people cite Butcher & Jordan (2010) rather than Jordan & Mitchell (2009) when talking about the use of NLP and information extraction.

Again, in a sense this is my own fault. In particular, I’ve realised that the abstract of Jordan & Mitchell (2009) says a lot about our use of feedback and I’m guessing that people are drawn by that. They may or may not be reading the paper in great depth before citing it. Fair enough.

However I am learning that publishing papers is a game with unwritten rules that I’m only just beginning to understand. I always knew I was a late developer.

Evaluation, evaluation, evaluation

Posted on February 23rd, 2014 at 4:08 pm by Sally Jordan

Despite my recent ‘rants of the day’, I think it is vitally important that we try our best to evaluate our assessment practice. There is some good, innovative practice out there, but it can still be very tempting to confuse practice that we consider to be “good” with practice that we know to be good, because it has been properly – and honestly – evaluated. And, at the risk of appearing a stick in the mud, innovation does not necessarily lead to improvement.

My quote for today is from the (fictional) Chief Inspector Morse:
“In the pub, with Lewis, he’d felt convinced he could see a cause, a sequence, a structure, to the crime… It was the same old tantalizing challenge to puzzles that had faced him ever since he was a boy. It was the certain knowledge that something had happened in the past – happened in an ordered, logical, very specific way. And the challenge had been, and still was, to gather the disparate elements of the puzzle together and to try to reconstruct that ‘very specific way’.” (from Colin Dexter’s “The remorseful day”, Chapter 22)

Honest evaluation will sometimes ‘prove’ what you expected; but sometimes there will be surprises. Sometimes good ideas don’t work and we need to reconsider. Sometimes a ‘control’ group does better than the experimental group and we need to think why. Sometimes students don’t engage with an assessment task in the way that we expect; sometimes students don’t make the mistakes that we think they will make; sometimes they make mistakes that we don’t expect.

Actually, in the long run, it is often the surprises that provide the real insights. And sometimes they can even save time and money. We would have gone on using linguistically-based software for our short-answer free-text questions had we not discovered that pattern matching software was just as effective.

But whatever, we must find out…Chief Inspector Morse always got it right in the end!

A Darwinian view of feedback effectiveness

Posted on February 8th, 2014 at 9:19 pm by Sally Jordan

Please don’t treat this too seriously – but please do stop and think about what I am trying to say, in the light of the fact that the effectiveness of feedback on assessment tasks is, despite the huge amount that’s been written on the subject, poorly understood.

Many people talk about the issues that arise when the grade awarded for an assignment ‘gets in the way’ of the feedback – and this is something I have seen evidence of myself. Authors also talk in quite damnatory terms about extrinsic motivation and surface learning. However, we have to face the fact that many of our students probably have no aspiration to submit perfect work – they just want to do OK, to pass, not to fall too far behind their peers.

Now sidestep to the theory of natural selection and evolution. Individuals with advantageous characteristics have a greater probability of survival, and therefore of reproducing. Provided that these characteristics are inherited by offspring, individuals possessing the characteristics will become more common in the population. If something like an environmental change (a common example is a decrease in soot in the atmosphere) means that there is a change in what is advantageous (so, in the example, dark coloured moths – which were well camouflaged from their predators when the atmosphere was sooty – become less well camouflaged and so more likely to be eaten) then relatively rapid evolution will be seen (in the example, light coloured moths will become more common). When there is no change in the environment, natural selection will still be taking place, but you won’t see a lot of evolution.

Now, think feedback. If a student only wants to pass and is getting pass grades and feedback that says they are doing OK, then [in their view] is there any need for them to do anything differently? Perhaps there isn’t really a ‘gap’ (Ramaprasad, 1983; Sadler, 1989) to close. Perhaps this is just the natural way of things.

More lies, damned lies and statistics

Posted on February 8th, 2014 at 8:19 pm by Sally Jordan

This second ‘rant of the day’ focuses on practice which, I think, arises from the fact that most people are not as fortunate(?) as me in having data from hundreds or thousands of students on each module each year. It also stems from a very human wish for our data to show what we want them to show.

The first problem that arises is akin to that shown in the photograph (which, I hasten to add, has nothing to do with students, at the Open University or elsewhere – it is just an image I’ve found on XPert, which seems to be about the number of people who have looked at a particular photo). Wow yes, there has been a marked increase, of …ah yes…three! (probably the photographer’s mum, sister and wife…) – and look at all those months when the photo was not viewed (I suspect because it had not been uploaded then…). This example may relate to photographs, but I have seen similarly small data sets used to ‘prove’ changes in student behaviour.

The second type of problem is slightly more complicated to explain – but I saw it in a published peer-reviewed paper that I read last week. Basically, you are likely to need a reasonably large data set in order to show a statistically significant difference between the different behaviour of different groups of students. So if your numbers are on the small side and no significant difference is shown, you can’t conclude that there isn’t a difference, just that you don’t have evidence of one.
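To illustrate with invented numbers: the same difference in pass rates (60% versus 75%) gives no statistically significant result with 20 students per group, but is highly significant with 400 per group. Here is a quick sketch using a standard two-proportion z-test – my own illustration, not taken from the paper in question:

```python
from math import sqrt, erf

def two_proportion_p(success_a, n_a, success_b, n_b):
    """Two-sided p-value for a two-proportion z-test."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p_a - p_b) / se
    # Two-sided tail probability from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

print(two_proportion_p(12, 20, 15, 20))      # small groups: p ≈ 0.31
print(two_proportion_p(240, 400, 300, 400))  # same rates, big groups: p < 0.001
```

The small-sample result does not show that the two groups behave the same; it shows only that 20 students per group is far too few to detect a difference of this size.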

Victorian clergymen

Posted on January 31st, 2014 at 7:14 am by Sally Jordan

This is more ‘rant of the day’ than ‘quote of the day’ but I’d like to start with a quote from my own ‘Maths for Science’ (though I’m indebted to my co-author Pat Murphy who actually wrote this bit):

” It is extremely important to appreciate that even a statistically significant correlation between two variables does not prove that changes in one variable cause changes in the other variable.

Correlation does not imply causality.

A time-honoured, but probably apocryphal, example often cited to illustrate this point is the statistically significant positive correlation reported for the late 19th Century between the number of clergymen in England and the consumption of alcoholic spirits. Both the increased number of clergymen and the increased consumption of spirits can presumably be attributed to population growth (which is therefore regarded as a ‘confounding variable’) rather than the increase in the number of clergymen being the cause of the increased consumption of spirits, or vice versa.”

Jordan, S., Ross, S. and Murphy, P. (2013) Maths for Science. Oxford: Oxford University Press. p. 302.
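The clergymen example is easy to simulate: make both variables depend on population size, add some noise, and a strong correlation appears between them despite there being no causal link at all. Every number below is invented purely for illustration:

```python
import random

random.seed(1)

# Population grows steadily over 30 years; both variables track it.
population = [100_000 + 5_000 * year for year in range(30)]
clergymen = [p / 1000 + random.gauss(0, 5) for p in population]
spirits = [p / 50 + random.gauss(0, 200) for p in population]

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

print(pearson_r(clergymen, spirits))  # close to 1, yet neither causes the other
```

The confounder (population) does all the work; clergymen and spirits never “see” each other in the simulation, yet the correlation is near-perfect.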

Now, my fellow educational researchers, have you understood that point? Correlation does not imply causality. In the past week I have read two papers, both published in respectable peer-reviewed journals and one much cited (including, I’m sad to say, by one publication on which I’m a co-author), which make the mistake of assuming that an intervention has been the cause of an effect.

In particular, if you offer students some sort of non-compulsory practice quiz, those who do the quiz will do better on the module’s summative assessment. We hope that the quiz has helped them, and maybe it has – but we can’t prove this just from the fact that they have done better in a piece of assessed work. What we mustn’t forget is that it is the keen, well-motivated students who do the non-compulsory activities – and these students are more likely to do well in the summative assessment, for all sorts of reasons (they may actually have studied the course materials, for a start…).

One of the papers I’ve just mentioned tried to justify the causal claim by saying that the quiz was particularly effective for “weaker” students. The trouble is that a little investigation showed me that this claim made the logical inconsistency even worse! Firstly it assumed that weaker students are less well motivated. That may be true, but no proof was offered. Secondly, I was puzzled about where the data came from and discovered that the author was using score on the first quiz that a student did, be that formative or summative, as an indicator of student ability. But students try harder when the mark counts and their mark on a summative assignment is very likely to be higher for that reason alone. The whole argument is flawed. Oh dear…

I am deliberately not naming the papers I’m referring to, partly because I’m not brave enough and partly because I fear there are many similar cases out there. Please can we all try a little harder not to make claims unless we are sure that we can justify them.