Positive discrimination?

Posted on February 7th, 2016 at 10:54 am by Sally Jordan

This isn’t really about assessment, or perhaps it is. First of all, some background. Because of a change in the dates used to establish school years where I lived when I was small, I missed a year at primary school. So, in a sense, I was disadvantaged. But I understood that, up to a certain age, those of us affected were then given extra marks in exams. I’ve no idea whether that was actually the case. What I do know is that if I felt I’d been given an unfair advantage over others in my more recent career (in particular as a female physicist) I would not be happy.

My definition of equality of opportunity has to do with levelling the playing field. I once arrived at a venue to give a tutorial, having requested a ground floor room because I knew someone in a wheelchair would be there. The problem was that the venue had given me a room in a portacabin up three steps. Only three steps, but the effect was the same – the student couldn’t access the tutorial (well, not until I got angry and got us moved to another room). Sometimes apparently small things can get in the way of learning, for some students and not for others, and promoting equal opportunity is to do with ensuring that these “small things” are removed. In my book, equality of opportunity is not the same as positive discrimination; I’d give a student extra time in an exam if a medical condition suggested it was necessary; I would not give a student extra marks just by virtue of the medical condition. I’m happy to argue my case for that…or at least I was…

At the Open University we have found that female students do less well on one of our physics modules, and we continue to investigate the causes of this and to seek to put it right. Start here to learn more about this work. However, I’d never have thought of increasing marks just for women or others in minority groups. After all, these are averages: some women and some black students do well, even if the group’s average attainment is lower.

Then, in my catch-up reading of old copies of New Scientist, I came across an opinion piece by Joshua Sokol entitled “Mix it up”. This points out, as I know from other sources, that there can be a mismatch between scores in tests and future performance. So if women and black students do less well in a test, and we use that test to determine entry onto a subsequent programme (in this case an astronomy PhD), we are both disadvantaging women and racial minorities and failing to get the best students onto the subsequent programme.

By coincidence, I’ve been trying to come to terms with all of this in the week in which my Department at the Open University was awarded Institute of Physics Juno Champion status for our commitment to gender equality. It’s great news, but it doesn’t mean we have arrived! More thought is needed, and I think my conclusion to the conundrum described in this post is probably to be careful not to judge ANYONE on a single measure.

Sokol, J. (9 January 2016). Mix it up. New Scientist, no. 3055, p. 24.

Feedback from a computer

Posted on February 7th, 2016 at 10:12 am by Sally Jordan

Back in February 2011 – gosh, that’s five years ago – I was blogging about some contradictory results on how people respond to feedback from a computer. The “computers as social actors” hypothesis contends that people react to feedback from a computer as if it were from a human. In my own work, I found some evidence of that, though I also found evidence that when people don’t agree with the feedback, or perhaps just when they don’t understand it, they are quick to blame the computer as having “got it wrong”.

The other side to this is that computers are objective, and – in theory at least – there is less emotional baggage in dealing with feedback from a computer than in dealing with feedback from a person; you don’t have to deal with the feeling that “my tutor thinks I’m stupid” or, perhaps even worse for peer feedback, “my peers think I’m stupid”.

I was reminded of this when reading an interesting little piece in this week’s New Scientist. The article is about practising public performance to a virtual audience, and describes a system developed by Charles Hughes at the University of Central Florida. The audience are avatars, deliberately designed to look like cartoon characters. A user who has tried the system says “We all know that’s fake but when you start interacting with it you feel like it’s real” – that’s “computers as social actors”. However, Charles Hughes goes on to comment: “Even if we give feedback from a computer and it actually came from a human, people buy into it more because they view it as objective”.

Wong, S. (6 February 2016). Virtual confidence. New Scientist, no. 3059, p. 20.

Tails wagging dogs

Posted on January 22nd, 2016 at 10:05 pm by Sally Jordan

Earlier in the week I gave a workshop at another university. I’m not going to say where I was, because it might sound as if I’m criticising their practice. Actually I’m not criticising them particularly, and indeed the honest, reflective conversation we had was amazing. I think that many of us would be well advised to stop and think in the way we all did on Wednesday.

I was running a workshop on the electronic handling of human-marked assignments. I should perhaps mention that I was in a physics department – and I am a physicist too. This is significant, because if we expect students to submit assignments electronically, then they have to produce them electronically – complete with all the symbolic notation, graphs, etc. that we use. This can be a real challenge. At the Open University we encourage students to write their answers by hand and scan them, but then the quality can be mixed – and plagiarism-checking software doesn’t work. Many OU students choose to input their maths in Word Equation Editor or LaTeX, but it takes them time and they make mistakes – and frequently they don’t show as much working as they should.
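To give a flavour of what we’re asking of students, here is an illustrative scrap of LaTeX source for two lines of very routine working (my own made-up example, not taken from any actual assignment):

```latex
% Two lines of routine working, as a student might have to type them.
% Even this much needs the amsmath package, maths mode and careful markup.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
Using $v^2 = u^2 + 2as$ with $u = 0$:
\begin{align}
  v &= \sqrt{2as} = \sqrt{2 \times 9.8\,\mathrm{m\,s^{-2}} \times 1.5\,\mathrm{m}} \\
    &\approx 5.4\,\mathrm{m\,s^{-1}}
\end{align}
\end{document}
```

It compiles to something a tutor can read easily, but it is easy to see why a student under time pressure might skip steps or stop showing working altogether.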

Then there’s the problem of marking: how do we put comments on the scripts in a way that’s helpful without it taking an unreasonable amount of time? At the OU, we comment in Word or using a PDF annotator, on various hardware such as the iPad Pro and the Microsoft Surface. We can make it work reasonably well, and some of our tutors now get on quite well with the technology. As a distance-learning university, we can at least argue that electronic handling of assignments speeds the process up and saves postage costs – and trees!

I’d been asked to run a workshop about what we do at the OU; I think I failed in that regard – they knew as much as I did. However, halfway through, someone commented to the effect that if the best we can do is to mimic handwritten submission and marking, why are we doing this? They’ve been told they have to, so that there’s an audit trail, but is that a good enough reason? They are allowed to make a special case, and the mathematicians have done so; but isn’t it time to stop and think about the policy?

We then started thinking about feedback – there is evidence that audio/video feedback can be more useful than written feedback. So why are we giving written feedback? Indeed, is the feedback we give really worth the time we spend on it? We’re driven to give more and more feedback because we want high scores on the NSS, and students tell us they want more feedback. But is it really useful? I’ve blogged on that before and I expect that I will again, but my general point in this post is that we should stop and think about our practice rather than just looking for solutions.

On a related point, note that JISC have run a big project on the “Electronic Management of Assessment” – there is more on this here.

 

What do I really think about learning analytics?

Posted on January 6th, 2016 at 9:07 pm by Sally Jordan

There have been two very good webinars on learning analytics recently in the Transforming Assessment series. On 9th September 2015, Cath Ellis from the University of New South Wales and Rachel Forsyth from Manchester Metropolitan University spoke on “What can we do with assessment analytics?”. On 9th December 2015, Gregor Kennedy, Linda Corrin and Paula de Barba from the University of Melbourne spoke on “Providing meaningful learning analytics to teachers: a tool to complete the loop”. I would heartily recommend both recordings to you.

Having said that, talk of learning analytics is suddenly everywhere. If we take Doug Clow’s (2013, p. 683) short definition of learning analytics as “the analysis and representation of data about learners in order to improve learning” then it is beyond argument that this is something we should be doing. And I agree that student engagement in assessment is something that must be included in the analysis, hence “assessment analytics”. It could be argued that I’ve been using assessment analytics since before the term was invented, though my approach has been a bit of a cottage industry; one of the recent changes is that learning analytics now (rightly) tends to be available at the whole-institution level.

There seems to be some dispute about (i) whether the terms learning and assessment analytics apply to analysis at the cohort level as well as at the individual student level and (ii) whether it is legitimate to include analysis done retrospectively to learn more about learning. I would include all of these; all are important if we are to improve the student experience and students’ chances of success.

So far so good. However, learning analytics has become fashionable; that in itself is perhaps a cause for some anxiety. It is all too easy to leap on board the bandwagon without giving the matter sufficient thought. I know that many mainstream academics (i.e. those who probably don’t read blogs like this one) are deeply uneasy about the approach. This is partly because the information given to academics is sometimes blindingly obvious…e.g. telling us that students who do not engage at all with module materials are not very likely to pass. Some of us have been banging on about this for years. I am also anxious that the analytics are sometimes simplistic, equating clicking on an activity with engagement with it, something I’ve posted about before.
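To make the “clicks aren’t engagement” worry concrete, here is a minimal sketch (Python, with entirely made-up column names and thresholds – not how any real VLE dashboard works) contrasting a click-based measure with one that at least looks at time spent:

```python
import pandas as pd

# Hypothetical VLE activity log: one row per visit to an online activity.
log = pd.DataFrame({
    "student":  ["A", "A", "A", "B", "B"],
    "activity": ["quiz1", "quiz1", "notes2", "quiz1", "notes2"],
    "minutes":  [0.2, 15.0, 8.0, 0.5, 0.3],   # time spent before moving on
})

# Naive analytic: any click on the activity counts as engagement.
clicks = log.groupby(["student", "activity"]).size().rename("clicks")

# Slightly less naive: total time spent and number of return visits.
engagement = log.groupby(["student", "activity"]).agg(
    total_minutes=("minutes", "sum"),
    visits=("minutes", "size"),
)
engagement["engaged"] = engagement["total_minutes"] >= 5   # arbitrary threshold

print(pd.concat([clicks, engagement], axis=1))
# Student B has clicked on both activities but spent under a minute on each;
# a click-based dashboard would still report B as having engaged.
```

Even the “less naive” measure is crude, of course; the point is only that the choice of proxy matters.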

So, what’s the way forward? I think learning analytics has to cease being the preserve of the few and become something that ordinary lecturers use in their teaching as a matter of routine; if this is to happen they need to trust and understand the data and to take ownership of it. Furthermore, the emphasis needs to change from the giving of data to the real use of data. When learning analytics (at the individual or the cohort level, and in real time or retrospectively) reveals a problem, let’s do something about it.

Clow, D. (2013). An overview of learning analytics. Teaching in Higher Education, 18(6), 683-695.

Can multiple-choice questions be used to give useful feedback?

Posted on January 3rd, 2016 at 1:56 pm by Sally Jordan

I was asked the answer to this question recently, and I thought it was worth a blog post. My simple answer to the question in the title, I’m afraid to say, is “no”. Perhaps that’s a bit unfair, but I think that relying on MCQs to provide meaningful feedback is somewhat short-sighted; surely we can do better.

My argument goes thus: a question author provides feedback which they believe to be meaningful on each of the distractors, but that assumes that the student used the same logic as the question author in reaching that distractor. In reality, students may guess the answer, or work backwards from the distractors, or do something in between, e.g. rule out distractors they know to be wrong and then guess from amongst the rest. Feedback is only effective when the student encounters it in a receptive frame of mind (timing and concepts such as response certitude come into play here); if the student has given a response for one reason and the feedback assumes a different logic, then the feedback is, at best, of dubious value. There is also growing evidence that, when given the option to respond without the ‘hint’ provided by MCQs, students give answers that are not amongst those offered as distractors.

It is no secret that I am not a fan of selected-response questions, though my views have mellowed slightly over the years. My biggest problem with them is the lack of authenticity. However, if that is not an issue for the use being made of the questions, and the questions are well written, based on responses that students are known to give (rather than those that ‘experts’ assume students will give), then perhaps MCQs are OK. Even relatively simple multiple-choice questions can create “moments of contingency” (Black & Wiliam, 2009; Dermo & Carpenter, 2011), and Draper’s (2009) concept of catalytic assessment is based on the use of selected-response questions to trigger subsequent deep learning without direct teacher involvement. However, I think the usefulness here is in making students think, not in the direct provision of feedback.

There are other things that can be done to improve the usefulness of multiple-choice questions, e.g. certainty-based marking (Gardner-Medwin, 2006). However, when there are so many better question types, why not use them? For example, there are the free-text questions – with feedback – used by the free language courses at https://www.duolingo.com/. I’m not sure what technology they are using, but I think it is linked to crowd-sourcing, which I definitely see as the way ahead for developing automatic marking of, and feedback on, short-answer constructed-response questions.
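For anyone unfamiliar with certainty-based marking, the sketch below shows one commonly quoted version of the mark scheme (the exact values are illustrative; see Gardner-Medwin, 2006 for the real thing):

```python
# Certainty-based marking: the student states how confident they are (1-3)
# as well as giving an answer. High confidence is rewarded when correct and
# penalised when wrong, so blind guessing at high confidence is a bad strategy.
CBM_MARKS = {
    1: (1, 0),    # low confidence:  +1 if correct,  0 if wrong
    2: (2, -2),   # mid confidence:  +2 if correct, -2 if wrong
    3: (3, -6),   # high confidence: +3 if correct, -6 if wrong
}

def cbm_score(correct: bool, certainty: int) -> int:
    """Return the mark for one response under the scheme above."""
    reward, penalty = CBM_MARKS[certainty]
    return reward if correct else penalty

print(cbm_score(correct=True, certainty=3))   # 3
print(cbm_score(correct=False, certainty=3))  # -6
print(cbm_score(correct=False, certainty=1))  # 0
```

The asymmetry is the point: students gain nothing by claiming confidence they don’t have, so the marks carry information about certainty as well as correctness.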

Let’s make 2016 the year in which we really look at the evidence and improve the quality of what we do in the name of computer-marked assessment and computer-generated feedback. Please.

References

Black, P. & Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability, 21(1), 5-31.

Dermo, J. & Carpenter, L. (2011). e-Assessment for learning: Can online selected response questions really provide useful formative feedback? In Proceedings of the 2011 International Computer Assisted Assessment (CAA) Conference, Southampton, 5th-6th July 2011.

Draper, S. (2009). Catalytic assessment: Understanding how MCQs and EVS can foster deep learning. British Journal of Educational Technology, 40(2), 285-293.

Gardner-Medwin, A. R. (2006). Confidence-based marking: Towards deeper learning and better exams. In C. Bryan & K. Clegg (Eds.), Innovative Assessment in Higher Education (pp. 141-149). London: Routledge.

 

Researching engagement with assessment, as a physicist

Posted on November 21st, 2015 at 1:35 pm by Sally Jordan

I have not posted as much as I might have wished recently, and when I have, I’ve tended to start with a grovelling apology on the grounds of lack of time because of my head of department duties. I sometimes also hesitate to post because of a lack of confidence: I’m not really an expert, so what grounds do I have to be so opinionated? However, following my seminar in our own Department of Physical Sciences’ seminar series at the Open University on Thursday (for which the slides – I hope – are at SEJ DPS Seminar Nov 2015), I have decided that it is time to take a more robust attitude. OK, it’s unusual for a physicist, let alone the head of a Department of Physical Sciences, to be doing pedagogic research. But that’s what I am; that’s who I am. The point is that I am researching learning, but I am doing so as a numerate scientist. I’m going to stop apologising for the fact, and I might even stop moaning about the resultant difficulty that I sometimes have in getting papers published. I am not a social scientist; I’m a physicist.

So what does that mean? It means that I try to use scientific methodology: I listen to student opinion because it is important, but I also look for hard data. I don’t say that one thing causes another unless there is evidence that it does. Furthermore – and scientists sometimes fall down here too – I report my findings even when they don’t show what I was expecting. Well, that’s my aspiration. As frequently happens, I was slightly worried by some of the comments following my talk on Thursday – people saying “ah yes, we have found such and such”. Have they REALLY found this, or is it what they think might be happening? Hypotheses are important but they need testing. Even more worryingly, I’m writing a paper at the moment and it is very tempting to ignore findings that don’t support the story I want to tell. Please don’t let me do that. Please give me the courage to stand my ground and to report the truth, the whole truth and nothing but the truth.

I have just realised that I don’t seem to have posted about the talk that Tim Hunt and I gave at the Assessment in Higher Education Conference in the summer on “I wish I could believe you: the frustrating unreliability of some assessment research”. I will rectify that as soon as possible (…remember, I’m a head of department…) but in the meantime, our slides are on slideshare here.

The multiple limitations of assessment criteria

Posted on November 8th, 2015 at 6:31 pm by Sally Jordan

Sadly, I don’t get as much time as I used to in which to think about assessment. So last Wednesday was a particular joy. First thing in the morning I participated in a fantastic webinar that marked the start of a brand new collaboration between two initiatives that are close to my heart – Transforming Assessment (who run a webinar series that I have been following for a long time) and Assessment in Higher Education (whose International Conferences I have helped to organise for 4 years or so). Then I spent most of the afternoon in a workshop discussing peer review. The workshop was good too, and I will post about it when time permits. For now, I’d like to talk about that webinar.

The speaker was Sue Bloxham, Emeritus Professor at the University of Cumbria and the founding Chair of the Assessment in Higher Education Conference. It was thus entirely fitting that Sue gave this webinar and, despite never having used the technology before, she did a brilliant job – lots of good ideas but also lots of discussion. Well done Sue!

Assessment criteria are designed to make the processes and judgement of assessment more transparent to staff and students and to reduce the arbitrariness of staff decisions. The aim of the webinar was to draw on research to explore the use of assessment criteria by experienced markers and discuss the implications for fairness, standards and guidance to students.

Sue talked about the evidence of poor reliability and consistency of standards amongst those assessing complex performance at higher education level, and suggested some reasons for this, including different understandings and interpretations of the criteria, ‘marking habits’, and ignoring or choosing not to use the criteria.

Sue then described a study, joint with colleagues from the ASKe Pedagogical Research Centre at Oxford Brookes University, which sought to investigate the consistency of standards between examiners within and between disciplines. Twenty-four experienced examiners from four disciplines and 20 diverse UK universities took part, and each considered five borderline (2i/2ii, or B/C) examples of typical assignments for their discipline.

The headline finding was that overall agreement on a mark by assessors appears to mask considerable variability in judgements on individual criteria. The differences in the historians’ appraisals of individual constructs were investigated further, and five potential reasons were identified that link judgements about specific elements of assignments to variation in grading:

  • Assessors use different criteria from those published
  • Assessors have different understandings of shared criteria
  • Assessors have a different sense of the appropriate standard for each criterion
  • The constructs/criteria are complex in themselves, sometimes comprising various sub-criteria that are hidden from view
  • Assessors value and weight criteria differently in their judgements
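
To see how agreement on an overall mark can coexist with disagreement criterion by criterion, here is a toy numerical illustration (invented marks, nothing to do with the study’s actual data):

```python
import statistics

# Two markers' criterion marks (out of 20 each) for the same assignment.
# Invented numbers: both arrive at 58/100, but for quite different reasons.
criteria = ["argument", "evidence", "structure", "style", "referencing"]
marker_1 = [16, 10, 12, 11, 9]
marker_2 = [10, 15, 9, 12, 12]

print(sum(marker_1), sum(marker_2))          # 58 58 -> perfect agreement overall

per_criterion_gap = [abs(a - b) for a, b in zip(marker_1, marker_2)]
print(per_criterion_gap)                     # [6, 5, 3, 1, 3]
print(statistics.mean(per_criterion_gap))    # 3.6 marks apart per criterion, on average
```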

Sue led us into a discussion of the implications of all of this. Should we recognise the impossibility of giving a “right” mark for complex assessments? (For what it’s worth, my personal response to this question is “yes” – but we should still do everything in our power to be as consistent as possible.) Sue also discussed the possibility of ‘flipping’ the assessment cycle, with much more discussion pre-assessment and sharing the nature of professional judgement with students. Yes, yes, yes!

If I have a complaint about the webinar it is purely that some of the participants took a slightly holier-than-thou approach, assuming that the results from the study Sue described were the result of poor assessment tasks, or insufficiently detailed criteria (Sue explained that she didn’t think more detailed criteria would help, and I agree), or examiners who were below par in some sense. Oh dear, oh dear, how I wanted to tell those people to carry out a study like this in their own context. Moderation helps, but those who assume a high level of consistency are only deluding themselves.

While we are on the subject of the subjective nature of assessment, don’t take my word for the high quality of this webinar; watch it yourself at http://ta.vu/4N2015.

So what is assessment really about?

Posted on October 4th, 2015 at 5:06 pm by Sally Jordan

I’ve just returned home from Barcelona, where I was visiting the eLearn Center at the Universitat Oberta de Catalunya (UOC), the Open University of Catalonia. UOC has an “educational model” which is similar to that used at the UK Open University, though they are not “open” in the same sense (they have entry qualifications) and they are an entirely online university. Overall I was extremely impressed (and Barcelona was quite nice too…).

Partly as a result of my discussions in Barcelona and partly just as a result of taking a break from the usual routine, I have been reflecting on what we do in the name of assessment. It’s so easy to make assessment an “add-on” at the end of a module (and if you have an exam, I guess it is exactly that). But even if you are then using that assessment as assessment of learning, are you really assessing what you hope your students have learnt (i.e. your learning outcomes), or are you assessing something altogether different? And if assessment is assessment for learning, is it really driving learning in the way you hope?

At least some courses at UOC make good use of collaborative assessment and surely, in principle at least, a solution is to assess all of the actual activities that you expect your students to do, i.e. to put assessment at the centre. In an online environment it should be possible to assess the way in which students actually engage with the materials and collaborate with their peers. However, in my experience, practice is often somewhat different. At the very least, if you have an activity where students work together to produce an output of some sort, it makes sense to assess that output, not a different one, even if you then have to enter the murky world of assessing an individual contribution to a group project.

So where does that leave all my work on sophisticated online computer-marked assessment? I still think it can be very useful as a means of delivering instantaneous, unbiased and targeted feedback interventions, and as a way of motivating students and helping them to pace their studies. But that’s about learning, not assessment… I need to think about this some more. Perhaps assessment is a bit like quantum mechanics: the more you think you understand it, the more problematic it becomes…

 

Gender differences on the force concept inventory

Posted on July 4th, 2015 at 10:59 am by Sally Jordan

Hot on the heels of my last post, reporting on work which did not provide support for the previous finding that men and women perform differentially on different types of assessed tasks, I bring you a very interesting finding from work done at the University of Hull (with Ross Galloway from the University of Edinburgh, and me). David Sands from Hull and two of his students came to the OU on Thursday and gave a presentation which is to be repeated at the GIREP-EPEC conference in Poland next week.

We are seeking to investigate whether findings from use of the well-established force concept inventory (FCI) (Hestenes et al., 1992) are replicated when the questions are asked as free-text rather than multiple-choice questions. Free-text versions of the questions have been trialled at Hull and Edinburgh, and the next step is to attempt to write automatically marked versions of these.

However, the interesting finding for now is that whilst in general students perform in a similar way on the free-text and multiple-choice versions of the FCI, there are some variations in the detail. In particular, whilst men outperform women on the MCQ version of the FCI (Bates et al., 2013), it seems that the gender difference may be reduced or even reversed with the free-text version. We don’t have enough responses yet to be sure, but watch this space!

Bates, S., Donnelly, R., MacPhee, C., Sands, D., Birch, M., & Walet, N. R. (2013). Gender differences in conceptual understanding of Newtonian mechanics: a UK cross-institution comparison. European Journal of Physics, 34(2), 421-434.

Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30(3), 141-158.

More on the gender differences on our level 2 physics module

Posted on June 27th, 2015 at 2:28 pm by Sally Jordan

I’m returning to the topic raised here. To summarise, significantly fewer women than men study our level 2 (FHEQ Level 5) physics module S207; more worryingly, those who do are significantly less likely to complete it, and those who complete it are less likely to pass… It’s a depressing situation, and we have been trying to find out more about what is going on. We don’t have definite answers yet, but we do have some pointers – and we are hoping that if we can begin to address the issues, we will be able to improve the outcomes for all students on S207 (both men and women).

In my previous post I explained that women do less well on almost all interactive computer-marked assessment (iCMA) questions, but the amount by which they do less well varies from question to question. This does not appear to depend on question type.
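For anyone wanting to do something similar, the per-question comparison itself is simple to set up; here is a minimal sketch in Python (with made-up data and column names, not our actual records system):

```python
import pandas as pd

# Hypothetical table of iCMA scores: one row per student per question.
responses = pd.DataFrame({
    "question": ["Q1", "Q1", "Q1", "Q2", "Q2", "Q2"],
    "gender":   ["F",  "M",  "M",  "F",  "M",  "F"],
    "score":    [0.6,  0.9,  0.7,  0.4,  0.8,  0.5],
})

# Mean score per question, split by gender, and the gap between the two.
by_question = responses.pivot_table(index="question", columns="gender",
                                    values="score", aggfunc="mean")
by_question["gap"] = by_question["M"] - by_question["F"]
print(by_question.sort_values("gap", ascending=False))
# The interesting quantity is how the gap varies from question to question,
# not just its overall size.
```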

Next, let’s consider the S207 exam. The exam has three parts, with (a) multiple-choice questions, (b) short-answer questions and (c) longer questions. Students are expected to attempt all questions in parts (a) and (b), whilst in part (c) they should attempt three questions from a choice of seven (one on each of the main books in the module).

Let’s start by considering performance on each of the three parts of the exam (all the data are for the 2013-14 presentation). The average scores for men and women on each of the three parts are shown in the figure below (blue = men; pink = women, with my apologies for any offence caused by my sexist colour stereotyping, but I’m sticking with it because it is obvious).

[Figure: s207-data-13j-at-exam-median-a-b-c – scores on exam parts (a), (b) and (c), by gender]

So, women do less well on multiple-choice questions, as you would expect if you’ve read the literature… but they also do less well on short-answer and long-answer questions (though do note that the error bars overlap)… Hmmm.

Things get much more interesting if we consider how many men and women choose to answer each of the longer questions in part (c):

[Figure: s207-data-13j-at-exam-number-attempted – number of men and women attempting each part (c) question]

So relatively fewer women are choosing to answer the first two questions, and relatively more are choosing to answer the others. And how well do they do on each question? See below:

[Figure: s207-data-13j-at-exam-median-q20-q26 – scores on part (c) questions Q20 to Q26, by gender]

So, all questions are not equal. Men and women appear to prefer different questions and to perform differently on different questions. We have also seen that we are more likely to lose students when they are studying the materials that are assessed in the first two exam questions. So it looks as if women do even worse on some parts of the module than on others. What we don’t know yet is whether this is a result of the topic in question (in this case Newtonian mechanics), or whether women are less good at problem solving, less familiar with the abstract types of question that we are asking, or put off by less structured long questions (cf. questions where they are given more tips). We also need to do more work to test some hypotheses that might explain some of these factors, e.g. that whilst women are as likely as men to have A levels, they may be less likely to have A level physics or maths. Our investigation is far from over.
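
As an example of the sort of check we will need to run, here is a sketch of a simple chi-squared test of the prior-qualification hypothesis (the counts are invented; the real analysis would use actual student records):

```python
from scipy.stats import chi2_contingency

# Made-up counts: students with / without A level physics or maths, by gender.
#          with  without
counts = [[120,  80],    # men
          [ 40,  60]]    # women

chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
# A small p-value would suggest that prior physics/maths background differs by
# gender, which we would then want to control for when comparing S207 outcomes.
```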