I am grateful to Carol Bailey (see previous post) who, following a discussion over lunch, sent me a link to an extremely interesting paper:
Vojak, C., Kline, S., Cope, B., McCarthey, S. and Kalantzis, M. (2011) New Spaces and Old Places: An Analysis of Writing Assessment Software. Computers and Composition, 28, 97-111.
This is about computer-based marking of essays. I should start by pointing out that this is not an area that I know a huge amount about – the issues are surprisingly different from those in the computer-based marking of short-answer questions. ‘E-rater’ is perhaps the best-known essay-marking software, but there are lots of others, some of which are discussed in the following review articles:
Dikli, S. (2006) An overview of automated scoring of essays. Journal of Technology, Learning and Assessment, 5(1).
Valenti, S., Neri, F. and Cucchiarelli, A. (2003) An overview of current research on automated essay grading. Journal of Information Technology Education, 2, 319-330.
Some of the systems mark for content, some mark for essay-writing style, and some mark for both. As I understand it, marking for content is in some senses easier for an essay than for a shorter answer. In a short answer, things like word order and negation can be extremely important (see my previous post on this here). However, in longer answers, if all of the (probably large number of) keywords or their synonyms are there, then the essay-writer is likely to have the right idea.
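To make the point about keywords concrete, here is a minimal sketch of keyword-coverage marking in Python. It is purely illustrative and is not how E-rater or any of the systems in the reviews above actually work; the function name `keyword_coverage` and the keyword groups are invented for the example. It shows why a bare keyword check is risky for short answers: negation and word order are invisible to it.

```python
import re


def keyword_coverage(text, keyword_groups):
    """Return the fraction of keyword groups that have at least one member in text.

    Each group is a set of interchangeable words (a keyword and its synonyms).
    This is a hypothetical helper for illustration, not a real marking engine.
    """
    words = set(re.findall(r"[a-z']+", text.lower()))
    hits = sum(1 for group in keyword_groups if words & set(group))
    return hits / len(keyword_groups)


# Invented keyword groups for a question about balanced forces.
groups = [{"force", "forces"}, {"balanced", "equal"}]

correct = "The forces are balanced"
negated = "The forces are not balanced"
scrambled = "balanced are forces the"
partial = "The forces are large"

for answer in (correct, negated, scrambled, partial):
    print(f"{answer!r}: {keyword_coverage(answer, groups):.2f}")
```

The correct, negated and scrambled answers all receive the same full score, which is the short-answer problem in miniature. Across a long essay with many keyword groups, a high coverage score is harder to reach by accident, which may be why this kind of check is more forgiving there.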
I don’t pretend to begin to understand the technologies used to mark writing style. I’m sure that they are very clever and they appear to have good success rates, i.e. the things that they measure are good proxies for the things that human markers of essays look for.
However, Vojak et al. (2011) raise an extremely important question: should we be asking our students to write for a computer rather than for a human marker? This question has philosophical overtones, and again I rapidly get out of my depth. It is an interesting point, given that writing is essentially a social activity.
More pragmatically, there are cases where people have succeeded in fooling an essay-marking system by submitting a gibberish essay and getting a good mark. Now, one of the things I learnt early in my e-assessment career (from my ‘mentor’ Phil Butcher) was that students don’t try to fool the system – this is something that academics do! Over the years I have found Phil’s advice on this point to be absolutely right. However, if word were to get out that a class’s essays were being marked on the strength of, say, the number of commas they contained, I, guess, I, wouldn’t, be, very, happy.
Returning to the more philosophical, I’d also feel uneasy if I felt that my carefully crafted words were being ‘read’ by a computer, not a human. Despite the fact that I blog primarily to keep my own thoughts in order, as I write I do feel that I am trying to convince some flesh-and-bones reader of what I am trying to say, and to engage in debate. Is it odd that I feel like that, given that I am very happy to mark our students’ sentence-long answers automatically, and to give them computer-generated feedback? To get out of the logical hole I’m digging myself into, I’d argue that the marking and feedback on our short-answer questions is actually done by me, the question author. I set up the rules and wrote the feedback interventions iteratively, in the light of careful inspection of hundreds and sometimes thousands of real student responses.