If you are interested in assessment in a distance learning context, you may like to know about a special edition of the journal ‘Open Learning’ on the theme of assessment. Click on the link to see the call for papers.
We have become aware of a significant difference in outcome for male and female students on our level 2 physics module; around 25% of the students on the module are women, and they are both less likely to complete the module and less likely to pass if they survive to the end. This effect is not present in our level 1 Science Module or for other scientific disciplines apart from astronomy at Level 2; and the women who get through to Level 3 do better than men.
Many theories have been proposed as to the reason for the effect, which may be related to persistent gender differences in performance on the force concept inventory – see for example Bates et al. (2013). I proposed that the effect might have been related to the assessment we use; there is evidence (e.g. Gipps & Murphy, 1994; Hazel et al., 1997) that girls are less happy with multiple-choice questions.
One of the things we have done is to look at performance differences on each interactive computer-marked assignment (iCMA) question. The results are summarised in the figure below.
Points to note are as follows:
Women score less well than men on most questions, but the effect is no greater than for tutor-marked assignment questions.
The gender difference is much greater for some questions than others, but the questions with large differences are not all of one type. So multiple-choice is not to blame! It appears more likely that the issue is with what the questions are assessing; there is some indication that our female students are less good at complex, abstract, problem-solving type questions.
The gender difference is much less for the final iCMA. My hypothesis is that the usual reasons for women doing less well are counter-balanced by the fact that women are more persistent; they are more likely to attempt this iCMA whilst men are more likely to reckon they have reached the required threshold on 5 out of 7 iCMAs, and so not to bother.
More work is required before I can be confident of this analysis; it is an interesting and extremely important investigation.
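For anyone wanting to try a similar per-question comparison on their own data, here is a minimal sketch. It is not the analysis we actually ran: the function name, the data layout and the numbers are all made up for illustration, and a real investigation would also need to consider sample sizes and statistical significance.

```python
from statistics import mean

def question_gaps(scores):
    """Return {question: mean(men) - mean(women)} for each question.

    `scores` maps a question id to {"F": [...], "M": [...]}, where each
    list holds per-student scores on a common scale (hypothetical layout).
    """
    return {
        q: mean(groups["M"]) - mean(groups["F"])
        for q, groups in scores.items()
    }

# Made-up illustrative numbers, NOT data from the module.
example = {
    "Q1 (multiple choice)": {"F": [1.0, 0.5, 1.0], "M": [1.0, 1.0, 0.5]},
    "Q2 (numeric problem)": {"F": [0.5, 0.0, 0.5], "M": [1.0, 0.5, 1.0]},
}

gaps = question_gaps(example)
# List the questions with the largest gap first.
for q, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{q}: gap = {gap:+.2f}")
```

Sorting by the size of the gap makes it easy to see whether the questions at the top of the list share a question type or, as seems more likely in our case, share something about what they assess.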
Bates, S., Donnelly, R., MacPhee, C., Sands, D., Birch, M., & Walet, N. R. (2013). Gender differences in conceptual understanding of Newtonian mechanics: a UK cross-institution comparison. European Journal of Physics, 34(2), 421-434.
Gipps, C. V. & Murphy, P. (1994). A fair test? Assessment, achievement and equity. Buckingham: Open University Press.
Hazel, E., Logan, P., & Gallagher, P. (1997). Equitable assessment of students in physics: importance of gender and language background. International Journal of Science Education, 19(4), 381-392.
I’m indebted to the colleague who told me about Cees van der Vleuten’s keynote at the EARLI Assessment SIG Conference in Madrid last August (http://www.earli-sig1-conference.org/cees-van-der-vleuten.php). I should perhaps point out that I am reporting third hand, so I may have got it all wrong. All that I can claim is that I am reporting on my own reaction to what I think my colleague said about what she thinks Professor van der Vleuten said…
I understand that he was talking about the assessment of professional competence, which is very important. The point that really grabbed my attention though was that since we need professionals to be able to do their job day after day, in a reliable fashion, ‘one off’ assessment, at the end of a programme of study isn’t really appropriate. Of course, one off assessment is always open to challenge – you will do less well if you have a headache on the day of the exam; you will do better if you happen to have revised the ‘right’ things. But there has been something of a backlash against continuous assessment recently, most obviously in the renewed emphasis placed on exams at the expense of coursework in UK schools (courtesy of governmental policy). Perhaps with more justification, some argue that you should assess outcomes at the end of a module rather than progress towards those outcomes and I have argued (e.g. here) that summative continuous assessment can lead to confusion over its purpose (is it formative or summative; is it for learning or of learning?).
Professor van der Vleuten’s keynote suggested that we should use ‘little and often’ continuous assessment that is very low stakes, perhaps with the stakes increasing as the module progresses – so that a student’s overall assessment record builds up slowly, in the same way that pixels build up to make a picture. Pixelated assessment. Nice!
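To make the idea concrete, here is one way the arithmetic might work. This is purely my own sketch: the linear increase in weights, the function name and the parameter values are assumptions for illustration, not anything taken from the keynote.

```python
def pixelated_total(scores, first_weight=1.0, step=0.5):
    """Weighted average of many small assessment scores (each between 0 and 1).

    Weights grow linearly (first_weight, first_weight + step, ...), so that
    later tasks carry higher stakes. The linear scheme is an assumption.
    """
    if not scores:
        raise ValueError("need at least one score")
    weights = [first_weight + step * i for i in range(len(scores))]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# A hypothetical student who improves steadily over four small tasks.
improving = [0.4, 0.6, 0.8, 1.0]
print(round(pixelated_total(improving), 3))
```

Because no single task dominates, one bad day (a headache, the ‘wrong’ revision) shifts the overall result only slightly, while the rising weights still reward where the student ends up.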
This is a post on which I would particularly welcome comments. I am aware of the issues but distinctly lacking in solutions.
A couple of years ago I posted (here) about the fact that there are a range of skills (e.g. practical work) that are difficult to assess authentically by examination. So, in general terms, the answer is easy; we should assess these skills in better, more authentic ways. So we should be making less use of examinations…
But in our distance-learning environment, we have a problem. At some stage in a qualification, we really ought to check that the student we think we are assessing is actually the person doing the work. Examinations provide us with a mechanism for doing this; student identity can be checked in the good old-fashioned way (by photo ID etc.). In conventional environments, student identity can be verified for a range of other assessed tasks too, but that is much more difficult when we simply do not meet our students. At the Open University, exams are just about the only occasion when our students are required to be physically present in a particular place (and for students for whom this is not possible, the invigilator goes to them). So we should be making more use of examinations…
As in so many of the topics I post about in this Blog, there is a tension. What’s the way forward?
Here are a few of my thoughts, and those from some colleagues. We could:
1. review what we do in “examinations” to make the assessed tasks more authentic;
2. make greater use of open book exams;
3. tell students the questions in advance, and allow notes into an examination hall;
4. is there a technical solution? If we truly crack the issue of secure home exams at scale, then the assessed tasks could perhaps be longer and more open ended, with a remote invigilator just looking in from time to time;
5. Are there any other technical solutions?
6. moving away from examinations in the conventional sense, our Masters programmes sometimes require students to turn up for an assessment ‘Poster Day’. We have had some success in replicating this in a synchronous online environment.
7. we could have an examinable component that requires a student to reflect on collaborative work in forums. The student’s tutor could then check that the student has posted what they say they have posted throughout the presentation of the module.
8. Option (6) is essentially a viva. We could extend this approach by requiring every student (or a certain percentage) to have a conversation with their tutor or a module team member (by phone or Skype etc.) about their progress through the module/qualification.
We would be extremely grateful for comments and other ideas.
“As long as we hold this image of feedback as being something that one person gives to another person to educate that person, we’ve missed the ultimate point of the feedback system…”
Sound familiar? How about
“Feedback as a concept (or the thing that happens when you talk into your microphone too close to the speaker) is simply information that goes into a system (and comes back at you with a high-pitched squeal). What happens next is where things get interesting – the postfeedback learning, which is the point of feedback in the first place.”
However it may surprise you to hear that these quotes are not from a book on assessment, but rather from “Changing on the job: Developing leaders for a complex world” by Jennifer Garvey Berger. I’ve been on a leadership course at work for much of 2014 and I’ve been thinking a lot about the concepts, especially the challenging issue of academic leadership. Just how do you get the best out of clever people? The quotes highlight some extremely interesting similarities with what I have been banging on about for years, in this blog and elsewhere.
The first point of similarity is that it is not the feedback intervention itself that is significant but rather the way in which the person receiving the feedback intervention responds to it. And if the person receiving the feedback intervention is in charge of their own response, so much the better.
However, feedback, purely as information, still needs to happen. In the staff management situation, sadly sometimes people don’t appreciate when there are issues that need to be addressed. So there is a need for a very clear exchange of information. In the case of feedback on assessed tasks, this is one area where e-assessment has huge potential. Computers can give information in a non-judgemental and impersonal way, leaving the interpretation of this information to people.
It has been quite a week. On Wednesday I sat on the edge of my seat in the Berrill Lecture Theatre at the UK Open University waiting to see if Rosetta’s lander Philae, complete with the Ptolemy instrumentation developed by, amongst others, colleagues in our Department of Physical Sciences, would make it to the surface of Comet 67P/Churyumov–Gerasimenko. I’m sure that everyone knows by now that it did, and despite the fact that the lander bounced and came to rest in a non-optimal position, some incredible scientific data has been received; so there is lots more for my colleagues to do! Incidentally the photo shows a model of the lander near the entrance to our Robert Hooke building.
Then on Friday, we marked the retirement of Dr John Bolton, who has worked for the Open University for a long time and made huge contributions. In particular, John is one of the few who has really analysed student engagement with interactive computer-marked assessment questions. More on that to follow in a later posting; John has been granted visitor status and we are hoping to continue to work together.
However, a week ago I was just reaching Schiphol airport prior to a day in Amsterdam on Sunday and then delivering the keynote presentation at Dé Onderwijsdagen (‘Education Days’) Pre Conference: ‘Digital testing’ at the Beurs World Trade Center in Rotterdam. It was wonderful to be speaking to an audience of about 250 people, all of whom had chosen to come to a meeting about computer-based assessment and its impact on learning. Even more amazing if you consider that the main conference language was Dutch, so these people were all from The Netherlands, a country with a total population about a quarter that of the UK.
There is some extremely exciting work going on in the Netherlands, with a programme on ‘Testing and Test-Driven Learning’ run by SURF. Before my keynote we heard about the testing of students’ interpretation of radiological images – it was lovely to see the questions close to the images (one of the things I went on to talk about was the importance of good assessment design) – and about ‘the Statistics Factory’, running an adaptive test in a gaming environment. This linked nicely to my finding that students find quiz questions ‘fun’ and that even simple question types can lead to deep learning. Most exciting is the emphasis on learning rather than on the use of technology for the sake of doing so.
I would like to finish this post by highlighting some of the visions/conclusions from my keynote:
1. To assess MOOCs and other large online courses, why don’t we start off by using peer assessment to mark short answer questions? Because of the large student numbers this would lead to accurate marking of a large number of responses, with only minimal input from an expert marker. Then we could use these marked responses and machine learning to develop Pattern Match type answer matching, to allow automatic marking for subsequent cohorts of students.
2. Instead of sharing completed questions, let’s share the code behind the questions so that users can edit as appropriate. In other words, let’s be completely open.
3. It is vitally important to evaluate the impact of what we do and to develop questions iteratively. And whilst the large student numbers at the UK Open University mean that the use of computer-marked assessment has saved us money, developing high-quality questions does not come cheap.
4. Computer-marked assessment has a huge amount to commend it, but I still don’t see it as a panacea. I still think that there are things (e.g. the marking of essays) that are better done by humans. I still think it is best to use computers to mark and provide instantaneous feedback on relatively simple question types, freeing up human time to help students in the light of improved knowledge of their misunderstandings (from the simple questions) and to mark more sophisticated tasks.
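The first step of vision 1 (combining peer marks, and passing only the contested responses to an expert) could be sketched along these lines. This is an illustration under my own assumptions: the function name, the 75% agreement threshold and the sample responses are all hypothetical, and the later machine-learning step that would build Pattern Match rules from the agreed marks is not shown.

```python
from collections import Counter

def aggregate_peer_marks(peer_marks, min_agreement=0.75):
    """Split responses into confidently marked ones and ones needing an expert.

    `peer_marks` maps a response text to the list of marks (here True/False
    for correct/incorrect) given by different peers. A response is accepted
    when the most common mark reaches `min_agreement` of all marks given.
    """
    accepted, needs_expert = {}, []
    for response, marks in peer_marks.items():
        if not marks:
            # Nobody marked this response, so an expert has to look at it.
            needs_expert.append(response)
            continue
        mark, votes = Counter(marks).most_common(1)[0]
        if votes / len(marks) >= min_agreement:
            accepted[response] = mark
        else:
            needs_expert.append(response)
    return accepted, needs_expert

# Hypothetical responses to a short-answer question, each marked by four peers.
peer_marks = {
    "the forces are balanced": [True, True, True, True],
    "there is no force": [False, False, True, False],
    "gravity": [True, False, True, False],
}
accepted, needs_expert = aggregate_peer_marks(peer_marks)
print(accepted)
print(needs_expert)
```

With large cohorts, most responses should end up in the accepted set, which could then serve as the marked examples from which answer-matching rules are developed, leaving the expert marker to deal only with the responses on which peers disagree.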
The videos from my keynote and the other presentations are at http://www.deonderwijsdagen.nl/videos-2014/
Last week’s EDEN Research Workshop was thought-provoking in many ways. Incidentally, I think that was largely because of the format that discouraged long presentations and encouraged discussion and reflection. I thought this would irritate me but it didn’t.
One of the questions that the workshop prompted for me (and, if the ‘fishbowl’ discussion at the end is to be believed, for others too) is the extent to which our wealth of previous research into student engagement with open and distance learning (especially when online) is relevant to MOOCs. Coincidentally, my [paper!] copy of the November issue of Physics World arrived yesterday, and a little piece entitled “Study reveals value of online learning” leapt out and hit me. It’s about work at MIT that has pre- and post-tested participants on a mechanics MOOC. The details are at:
Following a small group discussion at the 8th EDEN (European Distance and E-learning Network) Research Workshop in Oxford earlier in the week, I accepted the task of standing up to represent our group in saying that our radical change would be to do away with assessment. It was, of course, somewhat tongue in cheek, and we didn’t really mean that we would do away with assessment entirely, rather that we would radically alter its current form. Assessment is so often seen as “the problem” in education, “the tail wagging the dog” and we spend a huge amount of money and time on it, so a radical appraisal is perhaps overdue; as others who are wiser than me have said before. We should, at the very least, stop and think what we really want from our assessment; despite the longstanding assessment for learning/assessment of learning debate, I still don’t think we really know.
You’ll note that I am using the word “we” in the previous paragraph. That’s deliberate, because I am including the whole assessment community in this (researchers and practitioners); I am certainly not just talking about my own University. I feel the need to explain that point because the rapporteur at the EDEN Research Workshop managed to rather misunderstand my paper and so to criticise the Open University’s current assessment practice as being the same as it was 25 years ago. It is my fault entirely for not making it clearer who I am and what I was trying to say; because I am basically a practitioner, and proud of it, I suffer quite a lot from people not appreciating the amount of reading and thinking that I have done. The rapporteur was absolutely right to be critical; that’s what the role is about and I am very grateful to him for making me review my standpoint. It is also true – as I say frequently – that we all, including those of us at the Open University, should learn from others. However, I’d ask whether any distance learning provider does much better.
There is a related point, relating to the extent to which change should be evolutionary or revolutionary. It is simply not true that Open University assessment practice is the same as it was 25 years ago: 25 years ago, our tuition was all face to face (we now make extensive use of synchronous and asynchronous online tools); our tutor-marked assignments were submitted through the post; our use of computer-marked assessment was limited to multiple-choice questions with responses recorded with a pencil on machine-readable forms (no instantaneous, graduated and targeted feedback; no constructed response questions and certainly no short-answer free text questions); we made considerably less use of end-of-module assignments, oral assessment, assessment of collaborative activity, peer assessment. Things have changed quite a lot! However, the fundamental structures and many of the policies remain the same. Our students seem happy with what we do, but nevertheless perhaps it is time for change. Perhaps that’s true of other universities too!
For those readers who are not native English speakers, I need to explain the title of this post. It refers to something that should be a minor factor becoming the important factor in decision making. In this case, I am exploring whether we are sometimes driven by a desire to use technology for the sake of doing so. Although I think that technology has a huge amount to offer (in my context, to student learning), surely the most important thing is doing the best we can for our students.
The example I’d like to give is the electronic management of assessment (i.e. the electronic submission, marking and return of tutor-marked assignments). It is something on which the Open University is ahead of others, and we have already faced up to some of the issues (e.g. staff resistance) that others are just encountering. However I am anxious that a requirement to submit and mark work electronically sometimes affects the way students have to produce their work, and the way tutors are required to mark it – not necessarily for the better.
I’m a physicist and so our assignments frequently require students to make extensive use of symbolic notation and graphs etc. It is important that students learn to lay out their answers correctly, and for the less technically savvy, it can be easier to do this in handwritten work. I accept that producing high quality work electronically is perhaps a skill that our students should have, and I don’t want to be a dinosaur, so let me concentrate on marking and commenting.
On handwritten work, our tutors are used to commenting on and correcting work at exactly the place where an error is made e.g. correcting an equation, showing what is wrong with layout, redrawing a graph. Tablet devices enable a similar approach when marking electronically. But not all tutors have these devices and they are sometimes fiddly to use, and some tutors are left attempting to mark using just a word-processing package. This can lead to a rather different style of commenting, making rather more general comments. It is possible that this different style is ‘better’. However my point is that the way in which we give feedback is being driven by technology. The tail is wagging the dog.
The ‘interesting’ title of this post relates to the joint Variety in Chemistry Education/Physics Higher Education Conference that I was on my way home from a week ago. Apologies for my delay in posting, but since then I have celebrated my birthday, visited my elderly in-laws, moved into new Mon-Fri accommodation, joined a new choir, celebrated Richard’s and my 33rd wedding anniversary – and passed the viva for my PhD by publication, with two typos to correct and one minor ‘point of clarification’. It has been an amazing week!
The conference was pretty good too. It was held at the University of Durham whose Physics Department (and, obviously, Cathedral) is much as it was when I graduated more than 36 years ago. However most of the sessions were held in the new and shiny Calman Learning Centre (with the unnervingly named Arnold Wolfendale Lecture Theatre, since I remember Professor Wolfendale very well from undergraduate days). There were lots more chemists than physicists, I don’t really know why, and lots of young enthusiastic university teaching fellows. Great!
Sessions that stood out for me include the two inspirational keynotes and both of the workshops that I attended, plus many conversations with old and new friends. The first keynote was given by Simon Lancaster from UEA and its title was ‘Questioning the lecture’. He started by telling us not to take notes on paper, but instead to get onto social media. I did, though I find it difficult to listen and to post meaningful tweets at the same time. Is that my age? However I agree with a huge amount of what Simon said, in particular that we should cut out lots of the content that we currently teach.
Antje Kohnle’s keynote on the second day had a very different style. Antje is from the University of St Andrews and she was talking about the development of simulations to make it easier for students to visualise some of the counterintuitive concepts in quantum mechanics. The resource that has been developed is excellent, but the important point that Antje emphasised is the need to develop resources such as this iteratively, making use of feedback from students. Absolutely!
The two workshops that I so much enjoyed were (1) ‘Fostering learning improvements in physics’, a thoughtful reflection, led by Judy Hardy and Ross Galloway from the University of Edinburgh, on the implications of the FLIP Project; and (2) the interestingly named (from a student comment) ‘I don’t know much about physics, but I do know buses’ led by Peter Sneddon at the University of Glasgow, looking at questions designed to test students’ estimation skills and their confidence in estimation.
The quality of the presentations was excellent, bearing in mind that some people were essentially enthusiastic teachers whilst others were further advanced in their understanding of educational research. I raised the issue of correlation not implying causality at one stage, but immediately wished that I hadn’t. I think that, by and large, the interventions that were being described are ‘good things’ and of course it is almost impossibly difficult to prove that it is your intervention that has resulted in the improvement that you see.
In sessions and informal discussion with colleagues, the topics that kept striking me were (1) the importance of student confidence and (2) reasons for underperformance (by several measures) of female students. We are already planning a workshop for next year!
Oh yes, and Durham’s hills have got hillier…