Short-answer questions : how far can you go?

Finally for today, I’d like to talk about where I believe the limits currently sit in the use of short-answer free-text questions.

I have written questions where the correct response requires three separate concepts. For example, I have written a question which asked how the rock in the photograph shown was formed. (Incidentally, this is granite, photographed near Land’s End in Cornwall, but I’d never say that in a question, otherwise students just Google the answer.) A correct answer would explain that the rock formed from magma (first concept), which has cooled and crystallised (second concept) slowly, because it cooled within the Earth rather than at the surface (third concept). I haven’t managed to write a successful question with a correct answer that includes more than three separate ideas, but that is not to say it can’t be done.
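To make the three-concept idea concrete, here is a minimal sketch, in Python, of rule-based answer matching that requires all three concepts to be present. It is only an illustration: the keyword patterns are my own assumptions, and the real answer matching was built with Intelligent Assessment Technologies’ FreeText Author and, later, OpenMark PMatch, both of which are far more sophisticated than this.

    import re

    # Illustrative concept patterns for the granite question (assumed, not the real matching rules).
    CONCEPTS = {
        "from magma":          r"\bmagma\b|\bmolten rock\b",
        "cooled/crystallised": r"\bcool(ed|ing)?\b|\bcrystalli[sz](ed|es|ation)?\b",
        "slowly":              r"\bslow(ly)?\b",
    }

    def mark_response(response: str) -> bool:
        """Accept a response only if all three concepts appear somewhere in it.
        (A real matcher would also check that 'slowly' actually qualifies the cooling.)"""
        return all(re.search(pattern, response, re.IGNORECASE)
                   for pattern in CONCEPTS.values())

    print(mark_response("It formed when magma cooled and crystallised slowly underground"))  # True
    print(mark_response("Magma cooled and crystallised quickly at the surface"))             # False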

A second consideration in authoring short-answer questions is the number of discrete correct and incorrect responses. Here I think the limit came in another question based on one of my photographs, with thanks to the colleagues shown. We used this photograph in training people to write answer-matching, and the question was simply ‘Who is taller?’ That might sound like a very straightforward question (until you get the bright sparks who say ‘The prettier one’), but writing a complete set of answer-matching for this question was a time-consuming and non-trivial task. Think about the correct answers: ‘The one on the right’, ‘The one with longer hair’, ‘The one carrying a brochure’, ‘The one wearing glasses’ [but not ‘The one holding a glass’] … and so on, and so on.
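A sketch shows why a complete set of answer matching for ‘Who is taller?’ takes so long to write: every acceptable description has to be anticipated, along with near-misses that must not be credited. The phrase lists below are only a fragment and are my own illustrative assumptions, not the matching we actually built.

    ACCEPT = [
        "the one on the right",
        "the one with longer hair",
        "the one carrying a brochure",
        "the one wearing glasses",
        # ...and so on, and so on
    ]
    REJECT = [
        "the one holding a glass",   # a near-miss that must NOT be credited
        "the one on the left",
    ]

    def mark_taller(response: str) -> bool:
        """Credit a response that matches an accepted phrasing and no rejected one."""
        text = response.lower().strip().rstrip(".!")
        if any(phrase in text for phrase in REJECT):
            return False
        return any(phrase in text for phrase in ACCEPT)

    print(mark_taller("The one wearing glasses."))   # True
    print(mark_taller("The one holding a glass"))    # False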

The third limit is the serious one. Developing answer matching for the questions I’ve talked about was time-consuming, but we got there in the end, and the correct answers are unambiguous – it is clear that the woman on the right-hand side of the photograph is taller than the one on the left. However, in some subject areas, what is ‘correct’ is less well defined. I think that’s the real limit for questions of this type.


Short-answer questions : when humans mark more accurately than computers

Hot on the heels of my previous post, I’d like to make it clear that human markers sometimes do better than computers in marking short-answer (fewer than 20 words) free-text questions.

I have found this to be the case in two situations in particular:

  1. where a response includes a correct answer but also an incorrect answer (a simple illustration of how automated matching can be caught out by this is sketched below);
  2. where a human marker can ‘read into’ the words used and see that a response shows good understanding, even if it doesn’t use the actual words that you were looking for.
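As a minimal illustration of the first situation, here is a naive keyword matcher; the patterns and responses are invented for the purpose, not taken from real answer matching. Because it only checks that the required ideas are present, it happily credits a response that also contains a contradictory, incorrect idea – something a human marker would spot at once.

    import re

    def naive_marker(response: str) -> bool:
        """Credit any response mentioning cooling and 'slow' - with no notion of contradiction."""
        return bool(re.search(r"\bcool", response, re.IGNORECASE)
                    and re.search(r"\bslow", response, re.IGNORECASE))

    # Contains the right idea AND a contradictory one; keyword matching still credits it.
    print(naive_marker("It cooled slowly, or possibly it cooled very quickly at the surface"))  # True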

Short-answer questions : when computers mark more accurately than humans

Back to short-answer free-text questions. One of the startling findings of my work in this area was that computerised marking (whether provided by Intelligent Assessment Technologies’ FreeText Author or OpenMark PMatch) was consistently more accurate and reliable than human marking. At the time, I was genuinely surprised by this finding, and so were the human markers concerned (one of whom had commented to me that he didn’t see how a computer could possibly pick up on some of the subtleties of marking). However, others, especially those who know more about issues of reliability in the human marking of exams and the like, were not at all surprised. And reflecting on my own marking of batches of responses (where I can start by marking something in one way and find myself marking it differently by the end), and on the fact that, however hard I try, I make mistakes in my work (not just when marking – think about the typos in this blog), I can see that human markers have weaknesses!


Two more talks

We’ve now had two more talks as part of the OU Institute for Educational Technology’s ‘Refreshing Assessment’ series. First we had Lester Gilbert from the University of Southampton on ‘Understanding how to make interactive computer-marked assessment questions more reliable and valid: an introduction to test psychometrics’. Then, yesterday, Don Mackenzie from Professional e-Assessment Services (which I think is a University of Derby spin-off) spoke on ‘From trivial pursuit to serious e-assessment: authoring and monitoring quality questions for online delivery’.


Making the grades

I’ve been lent a copy of Todd Farley’s book ‘Making the grades: my misadventures in the standardized testing industry’ (published by PoliPointPress in 2009). The blurb on the back of the book says ‘Just as American educators, parents and policymakers reconsider the No Child Left Behind Act and its heavy em…


Bad questions

As part of a ‘Refreshing Assessment’ project, the Institute for Educational Technology at the Open University is hosting three talks during June. The first of these, last Wednesday, was from Helen Ashton, head of eAssessment for the SCHOLAR programme at Heriot-Watt University, on ‘Exploring Assessment Design’. It was a good talk, highlighting many points that I bang on about myself, but sometimes we need to hear things from a different perspective (in this case, from Helen’s experience of authoring questions for use by a wide range of schoolchildren).


Random guess scores

As an extension to my daughter Helen’s iCMA statistics project, random guess scores were calculated for multiple-choice, multiple-response and drag-and-drop questions in a number of different situations (e.g. different numbers of attempts, different scoring algorithms, different numbers of options to select from, different numbers of correct options, and students being told how many options were correct, or not).

The random guess score for a question is essentially the score that you would expect from someone who is completely logical in working through the question but knows absolutely nothing about the subject matter.
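To make this concrete, here is a minimal sketch of such a calculation in Python. The scoring schemes (reduced marks at later attempts for multiple choice; all-or-nothing marking for multiple response when students are told how many options to select) are assumptions for illustration, not necessarily those examined in Helen’s report.

    from fractions import Fraction
    from math import comb

    def mcq_random_guess_score(n_options, attempt_marks=(1,)):
        """Expected score for guessing at a multiple-choice question with one correct
        option, never repeating a guess; attempt_marks[i] is the mark awarded for
        getting it right at attempt i (an illustrative scheme)."""
        expected = Fraction(0)
        p_all_wrong_so_far = Fraction(1)
        for attempt, mark in enumerate(attempt_marks):
            remaining = n_options - attempt               # options not yet eliminated
            p_right_now = p_all_wrong_so_far * Fraction(1, remaining)
            expected += p_right_now * mark
            p_all_wrong_so_far *= Fraction(remaining - 1, remaining)
        return expected

    def mr_random_guess_score(n_options, n_correct):
        """Expected score for an all-or-nothing multiple-response question when the
        student is told how many options to select and picks that many at random."""
        return Fraction(1, comb(n_options, n_correct))

    print(mcq_random_guess_score(4))                                                 # 1/4
    print(mcq_random_guess_score(4, (Fraction(1), Fraction(2, 3), Fraction(1, 3))))  # 1/2
    print(mr_random_guess_score(5, 2))                                               # 1/10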

Helen’s report is here.



Fair or equal?

This post returns to ideas from Gipps and Murphy’s book ‘A fair test?’. We use words like ‘equality’, ‘equity’ and ‘equal opportunities’ frequently, in the context of assessment and elsewhere. Gipps and Murphy deliberately talk about ‘equity’ rather than ‘equal opportunities’, and the UK Government talks about ‘equality’ (the 2010 Equality Act came fully into force in April 2011) – all in an attempt to make their meaning clearer. I used to think I was really clued up on all of this (as a line manager in the UK Open University, I ask a lot of interview questions relating to equal opportunities – and I was once told that the answer I gave to an interview question of this ilk was the best that the interviewer had ever heard). However, especially in the context of assessment, I’ve come to realise that things aren’t as simple as they might appear…


Not like Moses

One of the joys of trying to catch up with others who have been working in the field of assessment for much longer than me is finding books and articles that were written some time ago but which still seem pertinent today. I’d definitely put the following book into this category (and more thoughts from it will follow):

Gipps, C. and Murphy, P. (1994) A fair test? Assessment, achievement and equity. Buckingham: Open University Press.

For now, I’d like to highlight a particularly memorable quote from Gipps and Murphy, originally from the Times Educational Supplement back in November 1988, expressing scepticism about the ‘Code of Fair Testing Practices in Education’ in the USA. As a former Assistant Secretary of Education put it:

If all the maxims are followed I have no doubt the overall quotient of goodness and virtue should be raised. Like Moses, the test makers have laid down ten commandments they hope everyone will obey. This doesn’t work very well in religion – adultery continues.

So I’d like to emphasise that my ‘top tips’ in the previous post are not commandments – apart, perhaps, from my final tip (monitoring the questions when in use), which I think ought to be made compulsory.

In general, though, although the ‘top tips’ have worked well for me, and I hope others might find them useful, perhaps it is more important that question authors take responsibility for the quality of their own work than that they mindlessly follow ‘rules’ written by others. This reflects most of my practice, in writing e-assessment questions and in everything else: for example, I far prefer helping people to write questions in workshops (when they are writing questions ‘for real’) to providing rules for them to follow. Sadly, I think a wish to improve the quality of our e-assessment may be leading to a more dictatorial approach – I’m not convinced it will work.


Top tips

I’ve recently been asked for my ‘top tips’ for writing interactive computer-marked assignment (iCMA) questions. I thought I might as well nail my colours to the mast and post them here too:

•  Before you start writing iCMA questions, think about what it is appropriate to assess in this way – and what it isn’t.

•  Think about what types of iCMA question are most appropriate for what you want to assess. Don’t assume that multiple-choice and multiple-response questions are more reliable than free-text entry questions – they aren’t!

•  Write multiple variants of your questions – this enables you to use the same basic template in writing questions for multiple purposes, reduces opportunities for plagiarism and gives students extra opportunities for practice. However, in summative use, make the variants of similar difficulty.

•  For multiple response questions (where students have to select a number of correct options), tell students how many options are required (otherwise students get very frustrated).

•  Check carefully that each question is unambiguous. Does it use language that all students should understand? If you want an answer in its simplest possible form, is this clear?

•  Think carefully about what you will accept as a correct answer. Do you want to accept variant or incorrect spellings (e.g. ‘sulphur’ instead of ‘sulfur’), surplus text, etc.? If in doubt, have surplus text at the end of a response removed before the response is checked (a sketch of this sort of pre-processing follows this list). Students are not happy if their response is marked wrong because, for example, they have indicated the precision of an answer by typing ‘to 3 significant figures’ at the end of a perfectly correct numerical answer.

•  Wherever possible, give feedback that is tailored to the error that a student has made. (Students get very annoyed when they are given general feedback that assumes they don’t know where to start, when to their mind they have made a ‘small’ error late in the process, e.g. giving incorrect units.)

•  If a response is partially correct, tell the student that this is the case (preferably telling them what is right and what is wrong).

•  Check your questions carefully at each stage, and – especially important – get someone else to check them.

•  Monitor ‘real’ use of your questions and look at student responses to them. Check that your variants are of sufficiently similar difficulty. Be prepared to make improvements at this stage.
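As an example of the kind of pre-processing suggested in the tip about deciding what to accept, here is a minimal sketch in Python. It is not the implementation used in OpenMark or any other system – the spelling-variant list and the rule for discarding surplus trailing text are illustrative assumptions.

    import re

    SPELLING_VARIANTS = {"sulphur": "sulfur"}   # accept both spellings (assumed list)

    def normalise(response: str) -> str:
        """Lower-case the response, map known spelling variants, and keep only the
        leading number (if there is one), discarding surplus trailing text such as
        'to 3 significant figures'."""
        text = response.strip().lower()
        for variant, canonical in SPELLING_VARIANTS.items():
            text = text.replace(variant, canonical)
        match = re.match(r"\s*([-+]?\d+(\.\d+)?([eE][-+]?\d+)?)", text)
        return match.group(1) if match else text

    print(normalise("2.45 to 3 significant figures"))   # '2.45'
    print(normalise("Sulphur"))                          # 'sulfur'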
