Dé Onderwijsdagen Pre Conference : ‘Digital testing’

Posted on November 15th, 2014 at 4:11 pm by Sally Jordan

It has been quite a week. On Wednesday I sat on the edge of my seat in the Berrill Lecture theatre at the UK Open University waiting to see if Rosetta’s lander Philae, complete with the Ptolemy instrumentation developed by, amongst others, colleagues in our Department of Physical Sciences, would make it to the surface of Comet 67P/Churyumov–Gerasimenko. I’m sure that everyone knows by now that it did, and despite the fact that the lander bounced and came to rest in a non-optimal position, some incredible scientific data has been received; so there is lots more for my colleagues to do! Incidentally the photo shows a model of the lander near the entrance to our Robert Hooke building.

Then on Friday, we marked the retirement of Dr John Bolton, who has worked for the Open University for a long time and made huge contributions. In particular, John is one of the few who has really analysed student engagement with interactive computer-marked assessment questions. More on that to follow in a later posting; John has been granted visitor status and we are hoping to continue to work together.

However, a week ago I was just reaching Schiphol airport prior to a day in Amsterdam on Sunday and then delivering the keynote presentation at Dé Onderwijsdagen (‘Education Days’) Pre Conference : ‘Digital testing’ at the Beurs World Trade Center in Rotterdam. It was wonderful to be speaking to an audience of about 250 people, all of whom had chosen to come to a meeting about computer-based assessment and its impact on learning. Even more amazing if you consider that the main conference language was Dutch, so these people were all from The Netherlands, a country with a total population about a quarter the size of the UK.

There is some extremely exciting work going on in the Netherlands, with a programme on ‘Testing and Test-Driven Learning’ run by SURF. Before my keynote we heard about the testing of students’ interpretation of radiological images – it was lovely to see the questions close to the images (one of the things I went on to talk about was the importance of good assessment design) – and about ‘the Statistics Factory’, running an adaptive test in a gaming environment. This linked nicely to my finding that students find quiz questions ‘fun’ and that even simple question types can lead to deep learning. Most exciting is the emphasis on learning rather than on the use of technology for the sake of doing so.

I would like to finish this post by highlighting some of the visions/conclusions from my keynote:

1. To assess MOOCs and other large online courses, why don’t we start off by using peer assessment to mark short answer questions? Because of the large student numbers this would lead to accurate marking of a large number of responses, with only minimal input from an expert marker. Then we could use these marked responses and machine learning to develop Pattern Match type answer matching, to allow automatic marking for subsequent cohorts of students.

2. Instead of sharing completed questions, let’s share the code behind the questions so that users can edit as appropriate. In other words, let’s be completely open.

3. It is vitally important to evaluate the impact of what we do and to develop questions iteratively. And whilst the large student numbers at the UK Open University mean that the use of computer-marked assessment has saved us money, developing high-quality questions does not come cheap.

4. Computer-marked assessment has a huge amount to commend it, but I still don’t see it as a panacea. I still think that there are things (e.g. the marking of essays) that are better done by humans. I still think it is best to use computers to mark and provide instantaneous feedback on relatively simple question types, freeing up human time to help students in the light of improved knowledge of their misunderstandings (from the simple questions) and to mark more sophisticated tasks.
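As a rough illustration of the first suggestion above, here is a minimal sketch of how peer marks might be pooled into a consensus mark, with only the doubtful responses passed to an expert. The function name `consensus_mark`, the 80% agreement cut-off and the sample marks are all hypothetical; nothing here describes an existing system, and the later machine-learning step is not attempted.

```python
from collections import Counter


def consensus_mark(peer_marks, min_agreement=0.8):
    """Return (mark, needs_expert) for one response.

    peer_marks: list of marks given by peers (e.g. 1 = correct, 0 = incorrect).
    min_agreement: hypothetical cut-off; below it the response is flagged
    for an expert marker rather than trusted.
    """
    if not peer_marks:
        return None, True  # nobody marked it, so an expert has to
    mark, votes = Counter(peer_marks).most_common(1)[0]
    agreement = votes / len(peer_marks)
    return mark, agreement < min_agreement


# Hypothetical peer marks for three short-answer responses.
peer_marks = {
    "r1": [1, 1, 1, 1, 1],  # unanimous
    "r2": [0, 0, 1, 0, 0],  # strong majority
    "r3": [1, 0, 1, 0, 1],  # peers disagree
}

results = {rid: consensus_mark(marks) for rid, marks in peer_marks.items()}
for rid, (mark, needs_expert) in results.items():
    print(rid, mark, "expert check" if needs_expert else "accepted")
```

With large cohorts most responses would be expected to fall into the "accepted" group, and those marked responses are what could then be used to develop answer matching for later cohorts.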

The videos from my keynote and the other presentations are at http://www.deonderwijsdagen.nl/videos-2014/

MOOCs: same or different?

Posted on November 2nd, 2014 at 8:35 am by Sally Jordan

Last week’s EDEN Research Workshop was thought-provoking in many ways. Incidentally, I think that was largely because of the format that discouraged long presentations and encouraged discussion and reflection. I thought this would irritate me but it didn’t.

One of the questions that the workshop prompted for me (and, if the ‘fishbowl’ discussion at the end is to be believed, for others too) is the extent to which our wealth of previous research into student engagement with open and distance learning (especially when online) is relevant to MOOCs. Coincidentally, my [paper!] copy of the November issue of Physics World arrived yesterday, and a little piece entitled “Study reveals value of online learning” leapt out and hit me. It’s about work at MIT that has pre- and post-tested participants on a mechanics MOOC. The details are at:

Colvin, K. F., Champaign, J., Liu, A., Zhou, Q., Fredericks, C., & Pritchard, D. E. (2014). Learning in an introductory physics MOOC: All cohorts learn equally, including an on-campus class. The International Review of Research in Open and Distance Learning, 15(4), 263-282.

They found that students, whether well or poorly prepared, learnt well. The Physics World article comments that David Pritchard, the leader of the MIT researchers, “finds it ‘rather dismaying’ that no-one else has published about whether learning takes place, given that there are thousands of MOOCs available”. I agree with Pritchard that we need more robust research into the effectiveness of MOOCs. However, I come back to the same point: To what extent does everything we know about open and distance learning apply?

I used to get really annoyed that people talked about MOOCs as if what they are doing is entirely new, and when the Physics World article goes on to compare MOOCs with traditional classroom learning, as if nothing else has existed, I feel that annoyance surfacing. However, at EDEN, I suddenly realised that there are some fundamental differences. Most people studying MOOCs are already well qualified; that is increasingly not the case for our typical Open University students. I accept that the MIT work has looked at “less well prepared” MOOC-studiers, and that is very encouraging, but I wonder if it is appropriate to generalise or to attempt to support such a wide spectrum of different learners in the same way. Secondly, most work on the impact of educational interventions considers students who are retained, and the MIT study is no exception; they only considered students who had engaged with 50% or more of the tasks; if my maths is right that was about 6% of those initially registered. Much current work at the Open University rightly focuses on retaining our students; all our students. Then of course there are differences in length of module and typical study intensity, and so on.

I suppose an appropriate conclusion is that MOOC developers should both learn from and inform developers of more conventional open and distance learning modules. And I note that the issue of The International Review of Research in Open and Distance Learning that follows the one in which the MIT work is reported, is a special, looking at research into MOOCs. That’s good.

The Unassessors

Posted on October 31st, 2014 at 8:07 am by Sally Jordan

Following a small group discussion at the 8th EDEN (European Distance and E-learning Network) Research Workshop in Oxford earlier in the week, I accepted the task of standing up to represent our group in saying that our radical change would be to do away with assessment. It was, of course, somewhat tongue in cheek, and we didn’t really mean that we would do away with assessment entirely, rather that we would radically alter its current form. Assessment is so often seen as “the problem” in education, “the tail wagging the dog” and we spend a huge amount of money and time on it, so a radical appraisal is perhaps overdue; as others who are wiser than me have said before. We should, at the very least, stop and think what we really want from our assessment; despite the longstanding assessment for learning/assessment of learning debate, I still don’t think we really know.

You’ll note that I am using the word “we” in the previous paragraph. That’s deliberate, because I am including the whole assessment community in this (researchers and practitioners); I am certainly not just talking about my own University. I feel the need to explain that point because the rapporteur at the EDEN Research Workshop managed to rather misunderstand my paper and so to criticise the Open University’s current assessment practice as being the same as it was 25 years ago. It is my fault entirely for not making it clearer who I am and what I was trying to say; because I am basically a practitioner, and proud of it, I suffer quite a lot from people not appreciating the amount of reading and thinking that I have done.  The rapporteur was absolutely right to be critical; that’s what the role is about and I am very grateful to him for making me review my standpoint. It is also true – as I say frequently – that we all, including those of us at the Open University, should learn from others. However, I’d ask whether any distance learning provider does much better.

There is a related point, relating to the extent to which change should be evolutionary or revolutionary. It is simply not true that Open University assessment practice is the same as it was 25 years ago: 25 years ago, our tuition was all face to face (we now make extensive use of synchronous and asynchronous online tools); our tutor-marked assignments were submitted through the post; our use of computer-marked assessment was limited to multiple-choice questions with responses recorded with a pencil on machine-readable forms (no instantaneous, graduated and targeted feedback; no constructed response questions and certainly no short-answer free text questions); we made considerably less use of end-of-module assignments, oral assessment, assessment of collaborative activity, peer assessment. Things have changed quite a lot! However, the fundamental structures and many of the policies remain the same. Our students seem happy with what we do, but nevertheless perhaps it is time for change. Perhaps that’s true of other universities too!

Tails wagging dogs

Posted on October 30th, 2014 at 7:21 am by Sally Jordan

For those readers who are not native English speakers, I need to explain the title of this post. It refers to something that should be a minor factor becoming the important factor in decision making. In this case, I am exploring whether we are sometimes driven by a desire to use technology for the sake of doing so. Although I think that technology has a huge amount to offer (in my context, on student learning), surely the most important thing is doing the best we can for our students.

The example I’d like to give is the electronic management of assessment (i.e. the electronic submission, marking and return of tutor-marked assignments). It is something on which the Open University is ahead of others, and we have already faced up to some of the issues (e.g. staff resistance) that others are just encountering. However I am anxious that a requirement to submit and mark work electronically sometimes affects the way students have to produce their work, and the way tutors are required to mark it – not necessarily for the better.

I’m a physicist and so our assignments frequently require students to make extensive use of symbolic notation and graphs etc. It is important that students learn to lay out their answers correctly, and for the less technically savvy, it can be easier to do this in handwritten work. I accept that producing high quality work electronically is perhaps a skill that our students should have, and I don’t want to be a dinosaur, so let me concentrate on marking and commenting.

On handwritten work, our tutors are used to commenting on and correcting work at exactly the place where an error is made e.g. correcting an equation, showing what is wrong with layout, redrawing a graph. Tablet devices enable a similar approach when marking electronically. But not all tutors have these devices and they are sometimes fiddly to use, and some tutors are left attempting to mark using just a word-processing package. This can lead to a rather different style of commenting, making rather more general comments. It is possible that this different style is ‘better’. However my point is that the way in which we give feedback is being driven by technology. The tail is wagging the dog.

ViCE/PHEC 2014

Posted on September 5th, 2014 at 7:15 pm by Sally Jordan

The ‘interesting’ title of this post relates to the joint Variety in Chemistry Education/Physics Higher Education Conference that I was on my way home from a week ago. Apologies for my delay in posting, but since then I have celebrated my birthday, visited my elderly in-laws, moved into new Mon-Fri accommodation, joined a new choir, celebrated Richard’s and my 33rd wedding anniversary – and passed the viva for my PhD by publication, with two typos to correct and one minor ‘point of clarification’. It has been an amazing week!

The conference was pretty good too. It was held at the University of Durham whose Physics Department (and, obviously, Cathedral) is much as it was when I graduated more than 36 years ago. However most of the sessions were held in the new and shiny Calman Learning Centre (with the unnervingly named Arnold Wolfendale Lecture Theatre, since I remember Professor Wolfendale very well from undergraduate days). There were lots more chemists than physicists, I don’t really know why, and lots of young enthusiastic university teaching fellows. Great!

Sessions that stood out for me include the two inspirational keynotes and both of the workshops that I attended, plus many conversations with old and new friends. The first keynote was given by Simon Lancaster from UEA and its title was ‘Questioning the lecture’. He started by telling us not to take notes on paper, but instead to get onto social media. I did, though I find it difficult  to listen and to post meaningful tweets at the same time. Is that my age? However I agree with a huge amount of what Simon said, in particular that we should cut out lots of the content that we currently teach.

Antje Kohnle’s keynote on the second day had a very different style. Antje is from the University of St Andrews and she was talking about the development of simulations to make it easier for students to visualise some of the counterintuitive concepts in quantum mechanics. The resource that has been developed is excellent, but the important point that Antje emphasised is the need to develop resources such as this iteratively, making use of feedback from students. Absolutely!

The two workshops that I so much enjoyed were (1) ‘Fostering learning improvements in physics’, a thoughtful reflection, led by Judy Hardy and Ross Galloway from the University of Edinburgh, on the implications of the FLIP Project; and (2) the interestingly named (from a student comment) ‘I don’t know much about physics, but I do know buses’ led by Peter Sneddon at the University of Glasgow, looking at questions designed to test students’ estimation skills and their confidence in estimation.

The quality of the presentations was excellent, bearing in mind that some people were essentially enthusiastic teachers whilst others were further advanced in their understanding of educational research. I raised the issue of correlation not implying causality at one stage, but immediately wished that I hadn’t. I think that, by and large, the interventions that were being described are ‘good things’ and of course it is almost impossibly difficult to prove that it is your intervention that has resulted in the improvement that you see.

In sessions and informal discussion with colleagues, the topics that kept striking me were (1) the importance of student confidence; (2) reasons for underperformance (by several measures) of female students. We are already planning a workshop for next year!

Oh yes, and Durham’s hills have got hillier…

Implications of evaluation of Formative Thresholded Assessment 1

Posted on August 10th, 2014 at 5:36 pm by Sally Jordan

As I said in my last post, the most powerful finding from this evaluation is the fact that many of our students, and also many of our staff, have a very poor understanding of our assessment strategies. And it is not just the ‘new’ assessment strategies that they don’t understand; they also have poor understanding of the more conventional assessment strategies that we have been using for years and years. So what are the issues? I believe that whilst the first of these may be particular to the Open University, the others are of more general applicability.

1. Our students (and a worryingly large number of our staff) assume that when a module has summative continuous assessment, this contributes to their overall score. The reality is…. it does and it doesn’t. In order to pass a module, or to achieve a particular grade of pass, a student needs to get above a certain threshold in both the continuous assessment and the examinable component separately. So even if you do exceptionally well in the continuous assessment, you will still fail the module if you don’t do sufficiently well in the examinable component. I just don’t think we make this point sufficiently clear to our students. For example, I still come across text that talks about a 50:50 weighting of continuous assessment and examinable component. Given that you have to pass separate thresholds for the two, the weighting is a complete red herring!

2. My second point is a general conclusion from my first. We need to make our assessment strategies clear.

3. We need to avoid unnecessary complication. Some of our assessment strategies are incredibly complex. The complexity has usually arisen because of someone wanting to do something innovative, creative, and with the best interests of our students in mind. But sometimes it is just too much.

4. We need to adopt practice that is consistent or at least coherent across a qualification, rather than having different strategies on each module. More on that to follow. I have no wish to stymie creativity, but when you stand back and look at the variations in practice from module to module, it is not at all surprising that students get confused.
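The first point above (separate thresholds making the weighting a red herring) can be shown with a small sketch. The 40% threshold, the function names and the example scores are assumptions for illustration only, not the Open University’s actual rules.

```python
def weighted_score(continuous, exam, weight=0.5):
    """The '50:50' figure students tend to expect to matter."""
    return weight * continuous + (1 - weight) * exam


def module_result(continuous, exam, threshold=40.0):
    """Pass only if BOTH components reach the threshold separately.

    The 40% threshold is a hypothetical value chosen for the example.
    """
    if continuous >= threshold and exam >= threshold:
        return "pass"
    return "fail"


# A hypothetical student: excellent continuous assessment, weak exam.
continuous, exam = 95.0, 35.0
print(weighted_score(continuous, exam))  # a comfortable-looking 65.0
print(module_result(continuous, exam))   # but still a fail
```

The weighted figure never enters the pass/fail decision here, which is why quoting a weighting alongside separate thresholds is likely to mislead.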

Formative thresholded assessment – some evaluation findings

Posted on May 30th, 2014 at 6:00 pm by Sally Jordan

I haven’t said much about the Open University Science Faculty’s move to formative thresholded assessment (first introduced here), or our evaluation of it, or our next steps. So much to catch up on… There is more on all aspects in the poster shown below, but you won’t be able to read the detail, so I’ll explain what we have found in a series of posts.

First of all a reminder of what we mean by formative thresholded assessment: Students are required to demonstrate engagement by getting over a threshold of some sort in their continuous assessment but their final module grade is determined by the module’s examinable component alone.

Two models of formative thresholded assessment are being trialled:
(a)  Students are required to demonstrate engagement by reaching a threshold (usually 30%) in, say, 5 out of 7 assignments;
(b)  Assignments are weighted and students are required to reach a threshold (usually 40%) overall.
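To make the difference between the two models concrete, here is a minimal sketch. The thresholds (30% and 40%) and the ‘5 out of 7’ rule come from the description above; the function names, the equal weights, the sample scores and the decision to count an unsubmitted assignment as zero in model (b) are assumptions made for the example.

```python
def passes_model_a(scores, threshold=30.0, required=5):
    """Model (a): reach the threshold in at least `required` assignments.

    An unsubmitted assignment is represented by None.
    """
    reached = sum(1 for s in scores if s is not None and s >= threshold)
    return reached >= required


def passes_model_b(scores, weights, threshold=40.0):
    """Model (b): the weighted overall score must reach the threshold.

    Assumption: an unsubmitted assignment (None) counts as zero.
    """
    overall = sum(w * (s or 0.0) for s, w in zip(scores, weights)) / sum(weights)
    return overall >= threshold


# A hypothetical student who skips two of seven assignments.
scores = [55, 62, None, 48, 35, None, 70]
weights = [1, 1, 1, 1, 1, 1, 1]  # equal weights, purely for illustration

print(passes_model_a(scores))           # True: five assignments at 30% or more
print(passes_model_b(scores, weights))  # False: overall is about 38.6%
```

Under these assumptions the same pattern of submissions clears one model and not the other, which is one reason students may need the rules for their particular module spelled out.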

And what of the findings? Here are some of the headlines:

  • Assignment submission rates are slightly lower than with summative continuous assessment, but there appear to be no substantial changes as a result of the change in assessment practice; 
  • Students who submit all assignments do better in the examinable component (unlikely to be a causal effect); however for some students, choosing to omit continuous assessment components in order to concentrate on revision appears to have been a sensible strategy;
  • There were some problems when students were taking a module with summative continuous assessment and a module with formative continuous assessment concurrently, especially when assignments were due on the same date.

However, to my mind, our most significant finding has been that many students and tutors have a poor understanding of our assessment strategies, including conventional summative continuous assessment. This is in line with a frequently found result that students have poor understanding of the nature and function of assessment – and we need to do something about it.

Hello again…and take care

Posted on May 29th, 2014 at 10:27 am by Sally Jordan

Apologies for my lack of activity on this site in the past couple of months; paradoxically this is because I have been so busy with assessment work. So expect various thoughts in the next couple of months, as I write up our evaluation of formative thresholded assessment and complete the covering paper for my PhD by publication “E-assessment for learning? Exploring the potential of computer-marked assessment and computer-generated feedback, from short-answer questions to assessment analytics”.

For now, just another warning about the need to interpret things carefully. It relates to the figure in my previous post. It would be easy to look at this figure and assume that most students attempt formative questions twice; this is not true, as the figure below shows. Most students attempt most questions once, but a few people attempt them many times (resulting in a total number of attempts that is about twice the number of users for each question, i.e. a mean of about two attempts per user). You’ll also note that, for the particular example shown on the right-hand side below, a higher than average number of students attempted the question four times – I’ll explain the reason for this in a future post.
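A small sketch shows how a mean of about two attempts per user can sit alongside most users attempting a question only once. The attempt counts below are invented for illustration and are not the data behind the figure.

```python
from statistics import mean, median

# Hypothetical attempts per user for one question: most people try once,
# a few come back many times.
attempts_per_user = [1] * 14 + [2] * 3 + [4, 6, 10]

users = len(attempts_per_user)
usages = sum(attempts_per_user)
share_once = attempts_per_user.count(1) / users

print(users, usages)              # usages are twice the number of users
print(mean(attempts_per_user))    # mean attempts per user
print(median(attempts_per_user))  # the typical user
print(share_once)                 # fraction attempting exactly once
```

The mean is pulled up by the handful of heavy repeaters, so the median (or the full distribution) is the safer thing to read when asking what most students do.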

Why learning analytics have to be clever

Posted on March 28th, 2014 at 5:32 pm by Sally Jordan


I am surprised that I haven’t posted the figure below previously, but I don’t think I have.

It shows the number of independent users (dark blue) and usages (light blue) on a question by question basis. So the light blue bars include repeating of whole questions.

This is for a purely formative interactive computer-marked assignment (iCMA) and the usage drops off in exactly the same way for any purely formative quiz. Before you tell me that this iCMA is too long, I think I’d agree, but note that you get exactly the same attrition – both within and between iCMAs – if you split the questions up into separate iCMAs. And in fact the signposting in this iCMA was so good that sometimes the usage INCREASES from one question to the next. This is for sections (in this case chemistry!) that the students find hard.

The point of this post though is to highlight the danger of just saying that a student clicked on one question and therefore engaged with the iCMA. Just how many questions did that student attempt? How deeply did they engage? The situation becomes even more complicated if you consider the fact that there are around another 100 students who clicked on this iCMA but didn’t complete any questions (and again, this is common for all purely formative quizzes). So be careful, just using clicks (on a resource of any sort) does not tell you much about engagement.
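As a minimal sketch of the distinction being drawn here, the snippet below separates independent users, usages and students who only clicked. The log format (a student id paired with a question number, or `None` when the quiz was opened without attempting anything) and the function name `summarise` are hypothetical; real iCMA logs will look different.

```python
from collections import defaultdict

# Hypothetical event log: (student_id, question_number).
# question_number is None when the student opened the quiz but attempted nothing.
events = [
    ("s1", 1), ("s1", 1), ("s1", 2),  # s1 repeats question 1, then tries question 2
    ("s2", 1),                        # s2 tries question 1 only
    ("s3", None),                     # s3 clicks on the quiz and leaves
]


def summarise(events):
    """Return ({question: (users, usages)}, students who only clicked)."""
    usages = defaultdict(int)
    users = defaultdict(set)
    opened = set()
    for student, question in events:
        opened.add(student)
        if question is not None:
            usages[question] += 1
            users[question].add(student)
    attempted = set().union(*users.values()) if users else set()
    per_question = {q: (len(users[q]), usages[q]) for q in sorted(usages)}
    return per_question, opened - attempted


per_question, clicked_only = summarise(events)
print(per_question)  # users and usages for each question
print(clicked_only)  # students a click count alone would label as 'engaged'
```

Counting clicks would report three engaged students here; only two attempted anything, and only one got as far as the second question.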

What gets published and what people read

Posted on March 23rd, 2014 at 6:24 pm by Sally Jordan

I doubt this will be my final ‘rant of the day’ for all time, but it will exhaust the stock of things I’m itching to say at the current time. This one relates not to the use and misuse of statistics but rather to rituals surrounding the publication of papers. I’ll stay clear of the debate surrounding open access; this is more about what gets published and what doesn’t!

I have tried to get papers published in Assessment & Evaluation in Higher Education (AEHE). They SAY they are “an established international peer-reviewed journal which publishes papers and reports on all aspects of assessment and evaluation within higher education. Its purpose is to advance understanding of assessment and evaluation practices and processes, particularly the contribution that these make to student learning and to course, staff and institutional development”  but my data-rich submission (later published in amended form as Jordan (2012) “Student engagement with assessment and feedback: Some lessons from short-answer free-text e-assessment questions”) didn’t even get past the editor. Ah well, whilst I think it was technically in scope, I have to admit that it was quite different from most of what is published in AEHE. I should have been more careful.

My gripe on this is two-fold: firstly if you happen to research student engagement with e-assessment, as I do, you’re left with precious few places to publish. Computers & Education, the British Journal of Educational Technology and Open Learning have come to my rescue, but I’d prefer to be publishing in an assessment journal, and my research has implications that go way beyond e-assessment (and before anyone mentions it, whilst I read the CAA Conference papers, I’m not convinced that many others do, and the International Journal of e-Assessment (IJEA) seems to have folded). Secondly, whilst I read AEHE regularly, and think that there is some excellent stuff there, I also think there are too many unsubstantiated opinion pieces and (even more irritating) so-called research papers that draw wide-ranging conclusions from, for example, self-reported behaviour of small numbers of students. OK, sour grapes over.

In drafting my covering paper for my PhD by publication, one of the things I’ve done is look at the things that are said by the people who cite my publications. Thank you, lovely people, for your kind words. But I have been amused by the number of people who have cited my papers for things that weren’t really the primary focus of that paper. In particular, Jordan & Mitchell (2009) was primarily about the introduction of short-answer free-text questions, using Intelligent Assessment Technologies (IAT) answer matching. But lots of people cite this paper for its reference to OpenMark’s use of feedback. Ross, Jordan & Butcher (2006) or Jordan (2011) say a lot more about our use of feedback. Meanwhile, whilst Butcher & Jordan (2010) was primarily about the introduction of pattern matching software for answer matching (instead of software that uses the NLP techniques of information extraction), you’ve guessed it, lots of people cite Butcher & Jordan (2010) rather than Jordan & Mitchell (2009) when talking about the use of NLP and information extraction.

Again, in a sense this is my own fault. In particular, I’ve realised that the abstract of Jordan & Mitchell (2009) says a lot about our use of feedback and I’m guessing that people are drawn by that. They may or may not be reading the paper in great depth before citing it. Fair enough.

However I am learning that publishing papers is a game with unwritten rules that I’m only just beginning to understand. I always knew I was a late developer.