Back to short-answer free-text questions. One of the startling findings of my work in this area was that computerised marking (whether provided by Intelligent Assessment Technologies’ FreeText Author or OpenMark PMatch) was consistently more accurate and reliable than human markers. At the time, I was genuinely surprised by this finding, and so were the human markers concerned (one of whom had commented to me that he didn’t see how a computer could possibly pick up on some of the subtleties of marking). However, others, especially those who know more about issues of reliability in the human marking of exams, were not at all surprised. Reflecting on my own marking of batches of responses (where I can start by marking something one way and find myself marking it differently by the end), and on the fact that I make mistakes in my work however hard I try (not just when marking – think of the typos in this blog), I can see that human markers have weaknesses!
Human markers will always be less consistent than a computer, both in terms of inter-rater reliability and internal consistency. Different people may mark a borderline response differently for legitimate reasons (one may decide to give the student the benefit of the doubt while another may not), and the provision of partial credit for partially correct responses can help here. However, there are responses that one marker considers completely correct and another considers completely incorrect. This may be because one marker has more subject knowledge than the other (so can spot either that a student has answered in a way that differs from the mark scheme but is still completely right, or that a student has used the right words but their answer is complete nonsense). But, however great their subject knowledge, human markers sometimes make mistakes, marking a correct response as incorrect and vice versa. In our initial ‘human-computer marking comparison’ I remember a case where the human marker had read the word ‘distance’ as ‘direction’. I do that sort of thing all the time.
Don’t get me wrong – I don’t want to do away with human markers. Like many others in the UK, I am uneasy with the idea of using a computer to mark essays (using just keywords for content and various proxies for style). I may be old-fashioned in this; I may change my mind in the future. However, I feel that longer pieces of writing are better marked by someone who actually understands the words. I also feel that human markers are best placed to give helpful feedback in these circumstances. So I advocate using computers to mark relatively straightforward responses accurately and reliably (which – as I have shown – they can do better than human markers), freeing up human marking time for more sophisticated marking and for supporting students in other ways.