Gender differences on force concept inventory

Posted on July 4th, 2015 at 10:59 am by Sally Jordan

Hot on the heels of my last post, reporting on work which did not provide support for the previous finding that men and women perform differentially on different types of assessed task, I bring you a very interesting finding from work done at the University of Hull (with Ross Galloway from the University of Edinburgh, and me). David Sands from Hull and two of his students came to the OU on Thursday and gave a presentation which is to be repeated at the GIREP-EPEC conference in Poland next week.

We are seeking to investigate whether findings from use of the well-established force concept inventory (FCI) (Hestenes et al., 1992) are replicated when the questions are asked as free-text rather than multiple-choice questions. Free-text versions of the questions have been trialled at Hull and Edinburgh, and the next step is to attempt to write automatically marked versions of these.

However, the interesting finding for now is that whilst in general students perform in a similar way on the free-text and multiple-choice versions of the FCI, there are some variations in the detail. In particular, whilst men outperform women on the multiple-choice version of the FCI (Bates et al., 2013), it seems that the gender difference may be reduced or even reversed with the free-text version. We don’t have enough responses yet to be sure, but watch this space!

Bates, S., Donnelly, R., MacPhee, C., Sands, D., Birch, M., & Walet, N. R. (2013). Gender differences in conceptual understanding of Newtonian mechanics: a UK cross-institution comparison. European Journal of Physics, 34(2), 421-434.

Hestenes, D., Wells, M., & Swackhamer, G. (1992). Force concept inventory. The Physics Teacher, 30(3), 141-158.

More on the gender differences on our level 2 physics module

Posted on June 27th, 2015 at 2:28 pm by Sally Jordan

I’m returning to the topic raised here. To summarise, significantly fewer women than men study our level 2 (FHEQ L5) physics module S207 and, more worryingly, those who do are significantly less likely to complete it, and those who complete it are less likely to pass…It’s a depressing situation and we have been trying to find out more about what is going on. We don’t have definite answers yet, but we do have some pointers – and we are hoping that if we can begin to address the issues we will be able to improve the outcomes for all students on S207 (both men and women).

In my previous post I explained that women do less well on almost all interactive computer-marked assessment (iCMA) questions, but the amount by which they do less well varies from question to question. This does not appear to depend on question type.

Next, let’s consider the S207 exam. The exam has three parts: (a) multiple-choice questions; (b) short-answer questions; (c) longer questions. Students are expected to attempt all questions in parts (a) and (b), whilst in part (c) they should attempt three questions from a choice of seven (one on each of the main books in the module).

Let’s start by considering performance on each of the three parts of the exam (all the data are for the 2013-14 presentation). The average scores for men and women on each of the three parts are shown in the figure below (blue = men; pink = women, with my apologies for any offence caused by my sexist stereotyping on colour, but I’m sticking with it because it is obvious).

[Figure: median scores for men and women on exam parts (a), (b) and (c)]

So, women do less well on multiple-choice questions, as you would have been expecting if you’ve read the literature…but they also do less well on short-answer and long-answer questions (though do note that the error bars overlap)…Hmmm.
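For readers who like to see the arithmetic, here is a minimal sketch of how this sort of comparison might be computed. It assumes a hypothetical anonymised table with one row per student and columns gender, part_a, part_b and part_c; the file name, column names and gender coding are illustrative, not the actual S207 data format.

```python
# Minimal sketch: mean exam-part scores by gender, with standard errors.
# Assumes a hypothetical CSV with columns: gender, part_a, part_b, part_c
# and gender coded "M"/"F" (all illustrative, not the real S207 data format).
import pandas as pd

df = pd.read_csv("s207_exam_scores.csv")   # hypothetical anonymised data file

parts = ["part_a", "part_b", "part_c"]
summary = df.groupby("gender")[parts].agg(["mean", "sem", "count"])
print(summary.round(2))

# A crude check of whether the error bars overlap for each part.
for part in parts:
    means = df.groupby("gender")[part].mean()
    sems = df.groupby("gender")[part].sem()
    gap = abs(means["M"] - means["F"])
    overlap = gap < (sems["M"] + sems["F"])
    print(f"{part}: difference = {gap:.2f}, error bars overlap: {overlap}")
```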

Things get much more interesting if we consider how many men and women choose to answer each of the longer questions in part (c):

[Figure: number of men and women attempting each of the part (c) questions]

So relatively fewer women are choosing to answer the first two questions; relatively more are choosing to answer the others. And how well do they do on each question? See below:

[Figure: median scores for men and women on each part (c) question (Q20 to Q26)]

So, all questions are not equal. Men and women appear to prefer different questions and to perform differently on different questions. And we have also seen that we are more likely to lose students when they are studying the materials that are assessed in the first two exam questions. So it looks as if women do even worse on some parts of the module than others. What we don’t know yet is whether this is a result of the topic in question (in this case Newtonian mechanics), or whether women are less good at problem solving, less familiar with the abstract types of question that we are asking, or put off by less structured long questions (cf. questions where they are given more tips). We also need to do more work to test some hypotheses that might explain some of these factors, e.g. that whilst women are as likely as men to have A levels, they may be less likely to have A level physics or maths. Our investigation is far from over.
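As an illustration of how that A-level hypothesis might eventually be checked, here is a minimal sketch of a two-proportion z-test. The counts are entirely invented; a real analysis would of course use our actual student records.

```python
# Minimal sketch: are women on the module less likely than men to hold
# A level physics/maths? Two-proportion z-test with invented counts.
from math import sqrt
from scipy.stats import norm

men_with_alevel, men_total = 300, 600        # hypothetical counts
women_with_alevel, women_total = 80, 200     # hypothetical counts

p_men = men_with_alevel / men_total
p_women = women_with_alevel / women_total
p_pooled = (men_with_alevel + women_with_alevel) / (men_total + women_total)

se = sqrt(p_pooled * (1 - p_pooled) * (1 / men_total + 1 / women_total))
z = (p_men - p_women) / se
p_value = 2 * norm.sf(abs(z))                # two-sided p-value

print(f"proportion with A level (men) = {p_men:.2f}, (women) = {p_women:.2f}")
print(f"z = {z:.2f}, p = {p_value:.3f}")
```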

Reflections on AHEC 2: Assessment transparency

Posted on June 27th, 2015 at 1:17 pm by Sally Jordan

I should start by saying that Tim Hunt’s summary of last week’s Assessment in Higher Education Conference is excellent, so I don’t know why I’m bothering! Seriously, we went to some different sessions; in particular, Tim went to many more sessions on feedback than I did, so do take a look at his blog posting.

Moving on to “Assessment transparency”. I’m picking up here on one of the themes that Tim also alludes to: the extent to which our students do, or don’t, understand what is required of them in assessed tasks. The fact that students don’t understand what we expect them to do is one of the findings I reported on in my presentation “Formative thresholded evaluation: Reflections on the evaluation of a faculty-wide change in assessment practice”, which is on Slideshare here. Similar issues were raised in the presentation I attended immediately beforehand (by Anke Buttner, entitled “Charting the assessment landscape: Preliminary evaluations of an assessment map”). This is not complicated stuff we’re talking about – nothing as sophisticated as having a shared understanding of the purpose of assessment (though that would be nice!).

It might seem obvious that we want students to know what they have to do in assessment tasks, but there is actually a paradox in all of this. To quote Tim Hunt’s description of a point in Jo-Anne Baird’s final keynote: “if assessment is too transparent it encourages pathological teaching to the test. This is probably where most school assessment is right now, and it is exacerbated by the excessive ways school exams are made high stakes, for the student, the teacher and the school. Too much transparency (and risk averseness) in setting assessment can lead to exams that are too predictable, hence students can get a good mark by studying just those things that are likely to be on the exam. This damages validity, and more importantly damages education.” Suddenly things don’t seem quite so straightforward.

Reflections on AHEC 1: remembering that students are individuals

Posted on June 25th, 2015 at 9:05 pm by Sally Jordan

I’ve been at the 5th Assessment in Higher Education Conference, now truly international, and a superb conference. As in 2013, the conference was at Maple House in Birmingham. With 200 delegates we filled the venue completely, but it was a deliberate decision to use the same venue and to keep the conference relatively small. As the conference goes from strength to strength we will need to review that decision again for 2017, but a small and friendly conference has a lot to commend it. We had some ‘big names’, with Masterclasses from David Boud, David Carless, Tansy Jessop and Margaret Price, and keynotes from Maddalena Taras and Jo-Anne Baird. There were also practitioners from a variety of backgrounds and with varying knowledge of assessment literature.

For various reasons I attended some talks that only attracted small audiences, but I learnt a lot from these. One talk that had a lot of resonance with my own experience was Robert Prince’s presentation on “Placement for Access and a fair chance of success in South African Higher Education institutions”. Robert talked about the differing educational success of students of different ethnicities, both at school and at South African HE institutions. The differences are really shocking. They are seeking to address the situation at school level, but Robert rightly recognises that universities also need to be able to respond appropriately to students from different backgrounds, perhaps allowing the qualification to be completed over a longer period of time.

Robert went on to talk about the ‘National Benchmark Tests (NBT)’ Project, which has produced tests of academic literacy, quantitative literacy and mathematics. The really scary, though sadly predictable, finding is that the National Benchmark Tests are extremely good at predicting outcome. But the hope is that the tests can be used to direct students to extended or flexible programmes of study.

In my mind, Robert’s talk sits alongside Gwyneth Hughes’s talk on ipsative assessment, i.e. assessing the progress of an individual student (looking for ‘value added’). Gwyneth talked about ways in which ipsative assessment (with a grade for progress) might be combined with conventional summative assessment, but that for me is the problem area. If we are assessing someone’s progress and they have just not progressed far enough, I’m not convinced it is helpful to use the same assessment as for students who are flying high.

But the important thing is that we are looking at the needs of individual students rather than teaching and assessing a phantom average student.

Special edition of Open Learning

Posted on May 27th, 2015 at 9:42 pm by Sally Jordan

If you are interested in assessment in a distance learning context, you may be interested to know about a special edition of the journal ‘Open Learning’, with the theme of assessment. Click on the link to see the call for papers.

Open Learning: The Journal of Open, Distance and e-Learning – Call for Papers for the Assessment Special Issue (March 2015)

Performance on interactive computer-marked questions – gender differences

Posted on March 15th, 2015 at 8:47 pm by Sally Jordan

We have become aware of a significant difference in outcome for male and female students on our level 2 physics module; around 25% of the students on the module are women, and they are both less likely to complete the module and less likely to pass if they survive to the end. This effect is not present in our level 1 science module, or in other scientific disciplines at level 2 apart from astronomy; and the women who get through to level 3 do better than the men.

Many theories have been proposed as to the reason for the effect, which may be related to persistent gender differences in performance on the force concept inventory – see for example Bates et al. (2013). I proposed that the effect might have been related to the assessment we use; there is evidence (e.g. Gipps & Murphy, 1994; Hazel et al., 1997) that girls are less happy with multiple-choice questions.

One of the things we have done is to look at performance differences on each interactive computer-marked assignment (iCMA) question. The results are summarised in the figure below (click on it to see the detail).

[Figure: mean first-attempt score (with standard error) for men and women on each S207 2013J iCMA question]

Points to note are as follows:

Women score less well than men on most questions, but the effect is no greater than for tutor-marked assignment questions.

The gender difference is much greater for some questions than for others, but the questions with large differences are not all of one type. So multiple choice is not to blame! It appears more likely that the issue is with what the questions are assessing; there is some indication that our female students are less good at complex, abstract, problem-solving questions. (A sketch of how this per-question comparison might be done follows these points.)

The gender difference is much less for the final iCMA. My hypothesis is that the usual reasons for women doing less well are counter-balanced by the fact that women are more persistent; they are more likely to attempt this iCMA whilst men are more likely to reckon they have reached the required threshold on 5 out of 7 iCMAs, and so not to bother.
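Here is a minimal sketch of the per-question comparison referred to above: the gap in mean first-attempt scores between men and women for each iCMA question, together with the standard error of that difference. The file name, column names and gender coding are assumptions for illustration, not the actual S207 data format.

```python
# Minimal sketch: per-question gender gap in mean first-attempt iCMA scores,
# with the standard error of the difference. Assumes a hypothetical long-format
# table with columns: gender, question, score, and gender coded "M"/"F".
import pandas as pd

df = pd.read_csv("s207_icma_first_attempts.csv")   # hypothetical anonymised data

stats = df.groupby(["question", "gender"])["score"].agg(["mean", "sem"]).unstack("gender")

# Difference in means (men minus women) and the standard error of that difference.
diff = stats[("mean", "M")] - stats[("mean", "F")]
diff_se = (stats[("sem", "M")] ** 2 + stats[("sem", "F")] ** 2) ** 0.5

gap = pd.DataFrame({"difference": diff, "std_error": diff_se})
print(gap.sort_values("difference", ascending=False).round(2))
```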

More work is required before I can be confident of this analysis; it is an interesting and extremely important investigation.

References

Bates, S., Donnelly, R., MacPhee, C., Sands, D., Birch, M., & Walet, N. R. (2013). Gender differences in conceptual understanding of Newtonian mechanics: a UK cross-institution comparison. European Journal of Physics, 34(2), 421-434.

Gipps, C. V. & Murphy, P. (1994). A fair test? Assessment, achievement and equity. Buckingham: Open University Press.

Hazel, E., Logan, P., & Gallagher, P. (1997). Equitable assessment of students in physics: importance of gender and language background. International Journal of Science Education, 19(4), 381-392.


Pixelated assessment

Posted on January 17th, 2015 at 5:27 pm by Sally Jordan

I’m indebted to the colleague who told me about Cees van der Vleuten’s keynote at the EARLI Assessment SIG Conference in Madrid last August (http://www.earli-sig1-conference.org/cees-van-der-vleuten.php). I should perhaps point out that I am reporting third hand, so I may have got it all wrong. All that I can claim is that I am reporting on my own reaction to what I think my colleague said about what she thinks Professor van der Vleuten said…

I understand that he was talking about the assessment of professional competence, which is very important. The point that really grabbed my attention, though, was that since we need professionals to be able to do their job day after day, in a reliable fashion, ‘one-off’ assessment at the end of a programme of study isn’t really appropriate. Of course, one-off assessment is always open to challenge – you will do less well if you have a headache on the day of the exam; you will do better if you happen to have revised the ‘right’ things. But there has been something of a backlash against continuous assessment recently, most obviously in the renewed emphasis placed on exams at the expense of coursework in UK schools (courtesy of governmental policy). Perhaps with more justification, some argue that you should assess outcomes at the end of a module rather than progress towards those outcomes, and I have argued (e.g. here) that summative continuous assessment can lead to confusion over its purpose (is it formative or summative; is it for learning or of learning?).

Professor van der Vleuten’s keynote suggested that we should use ‘little and often’ continuous assessment that is very low stakes, perhaps with the stakes increasing as the module progresses – so that a student’s overall assessment record builds up slowly, in the same way that pixels build up to make a picture. Pixelated assessment. Nice!
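As a toy illustration of the ‘stakes increasing as the module progresses’ idea, here is a minimal sketch in which later assessments simply carry linearly increasing weight. The marks and the weighting scheme are arbitrary inventions, not a recommendation.

```python
# Minimal sketch: an overall grade built from many small assessments whose
# weight (stakes) increases as the module progresses. The scores and the
# linearly increasing weights are arbitrary illustrations.
scores = [55, 62, 70, 58, 66, 74, 80, 71]      # hypothetical marks, in order taken

weights = [i + 1 for i in range(len(scores))]  # stakes rise: 1, 2, 3, ...
overall = sum(w * s for w, s in zip(weights, scores)) / sum(weights)

print(f"Unweighted mean: {sum(scores) / len(scores):.1f}")
print(f"Stakes-weighted mean (later assessments count more): {overall:.1f}")
```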

[Image: pixelated Mona Lisa]

Authentic assessment vs authentication of identity

Posted on December 2nd, 2014 at 5:43 pm by Sally Jordan

This is a post on which I would particularly welcome comments. I am aware of the issues but distinctly lacking in solutions.

A couple of years ago I posted (here) about the fact that there are a range of skills (e.g. practical work) that are difficult to assess authentically by examination. So, in general terms, the answer is easy: we should assess these skills in better, more authentic ways. So we should be making less use of examinations…

[Image: exam hall]

But in our distance-learning environment, we have a problem. At some stage in a qualification, we really ought to check that the student we think we are assessing is actually the person doing the work. Examinations provide us with a mechanism for doing this; student identity can be checked in the good old-fashioned way (by photo ID etc.). In conventional environments, student identity can be verified for a range of other assessed tasks too, but that is much more difficult when we simply do not meet our students. At the Open University, exams are just about the only occasion when our students are required to be physically present in a particular place (and for students for whom this is not possible, the invigilator goes to them). So we should be making more use of examinations…

As in so many of the topics I post about in this blog, there is a tension. What’s the way forward?

Here are a few of my thoughts, and those from some colleagues. We could:

1. review what we do in “examinations” to make the assessed tasks more authentic;

2. make greater use of open-book exams;

3. tell students the questions in advance, and allow notes into the examination hall;

4. look for a technical solution – if we could truly crack the issue of secure home exams at scale, then the assessed tasks could perhaps be longer and more open-ended, with a remote invigilator just looking in from time to time;

5. look for other technical solutions (are there any?);

6. move away from examinations in the conventional sense – our Masters programmes sometimes require students to turn up for an assessment ‘Poster Day’, and we have had some success in replicating this in a synchronous online environment;

7. have an examinable component that requires a student to reflect on collaborative work in forums – the student’s tutor could then check that the student has posted what they say they have posted throughout the presentation of the module;

8. extend option (6), which is essentially a viva, by requiring every student (or a certain percentage) to have a conversation with their tutor or a module team member (by phone or Skype etc.) about their progress through the module/qualification.

We would be extremely grateful for comments and other ideas.

Effective feedback

Posted on November 26th, 2014 at 10:18 pm by Sally Jordan

“As long as we hold this image of feedback as being something that one person gives to another person to educate that person, we’ve missed the ultimate point of the feedback system…”

Sound familiar? How about

“Feedback as a concept (or the thing that happens when you talk into your microphone too close to the speaker) is simply information that goes into a system (and comes back at you with a high-pitched squeal). What happens next is where things get interesting – the postfeedback learning, which is the point of feedback in the first place.”

However, it may surprise you to hear that these quotes are not from a book on assessment, but rather from “Changing on the job: Developing leaders for a complex world” by Jennifer Garvey Berger. I’ve been on a leadership course at work for much of 2014 and I’ve been thinking a lot about the concepts, especially the challenging issue of academic leadership. Just how do you get the best out of clever people? The quotes highlight some extremely interesting similarities with what I have been banging on about for years, in this blog and elsewhere.

 The first point of similarity is that it is not the feedback intervention itself that is significant but rather the way in which the person receiving the feedback intervention responds to it. And if the person receiving the feedback intervention is in charge of their own response, so much the better.

However, feedback, purely as information, still needs to happen. In the staff management situation, sadly sometimes people don’t appreciate when there are issues that need to be addressed. So there is a need for a very clear exchange of information. In the case of feedback on assessed tasks, this is one area where e-assessment has huge potential. Computers can give information in a non-judgemental and impersonal way, leaving the interpretation of this information to people.

Dé Onderwijsdagen Pre-Conference: ‘Digital testing’

Posted on November 15th, 2014 at 4:11 pm by Sally Jordan

It has been quite a week. On Wednesday I sat on the edge of my seat in the Berrill Lecture Theatre at the UK Open University, waiting to see if Rosetta’s lander Philae, complete with the Ptolemy instrumentation developed by, amongst others, colleagues in our Department of Physical Sciences, would make it to the surface of Comet 67P/Churyumov–Gerasimenko. I’m sure that everyone knows by now that it did, and despite the fact that the lander bounced and came to rest in a non-optimal position, some incredible scientific data has been received; so there is lots more for my colleagues to do! Incidentally, the photo shows a model of the lander near the entrance to our Robert Hooke building.

Then on Friday, we marked the retirement of Dr John Bolton, who has worked for the Open University for a long time and made huge contributions. In particular, John is one of the few who has really analysed student engagement with interactive computer-marked assessment questions. More on that to follow in a later posting; John has been granted visitor status and we are hoping to continue to work together.

[Photo: Beurs World Trade Center, Rotterdam]

However, a week ago I was just reaching Schiphol airport, prior to a day in Amsterdam on Sunday and then delivering the keynote presentation at the Dé Onderwijsdagen (‘Education Days’) Pre-Conference: ‘Digital testing’ at the Beurs World Trade Center in Rotterdam. It was wonderful to be speaking to an audience of about 250 people, all of whom had chosen to come to a meeting about computer-based assessment and its impact on learning. Even more amazing if you consider that the main conference language was Dutch, so these people were all from The Netherlands, a country with a total population about a quarter the size of the UK.

There is some extremely exciting work going on in the Netherlands, with a programme on ‘Testing and Test-Driven Learning’ run by SURF. Before my keynote we heard about the testing of students’ interpretation of radiological images – it was lovely to see the questions close to the images (one of the things I went on to talk about was the importance of good assessment design) – and about ‘the Statistics Factory’, running an adaptive test in a gaming environment. This linked nicely to my finding that students find quiz questions ‘fun’ and that even simple question types can lead to deep learning. Most exciting is the emphasis on learning rather than on the use of technology for the sake of doing so.

I would like to finish this post by highlighting some of the visions/conclusions from my keynote:

1. To assess MOOCs and other large online courses, why don’t we start off by using peer assessment to mark short-answer questions? Because of the large student numbers, this would lead to accurate marking of a large number of responses with only minimal input from an expert marker. We could then use these marked responses and machine learning to develop Pattern Match type answer matching, allowing automatic marking for subsequent cohorts of students (a rough sketch of this idea follows the list).

2. Instead of sharing completed questions, let’s share the code behind the questions so that users can edit as appropriate. In other words, let’s be completely open.

3. It is vitally important to evaluate the impact of what we do and to develop questions iteratively. And whilst the large student numbers at the UK Open University mean that the use of computer-marked assessment has saved us money, developing high-quality questions does not come cheap.

4. Computer-marked assessment has a huge amount to commend it, but I still don’t see it as a panacea. I still think that there are things (e.g. the marking of essays) that are better done by humans. I still think it is best to use computers to mark and provide instantaneous feedback on relatively simple question types, freeing up human time to help students in the light of improved knowledge of their misunderstandings (from the simple questions) and to mark more sophisticated tasks.
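To make point (1) a little more concrete, here is a minimal sketch using generic off-the-shelf tools (scikit-learn) rather than the OU’s actual Pattern Match answer matching: a simple text classifier is trained on short-answer responses that have already been peer-marked, and then used to mark responses from a later cohort. The example responses and peer marks are invented.

```python
# Minimal sketch of the idea: train a simple text classifier on short-answer
# responses that have already been peer-marked, then use it to mark responses
# from a subsequent cohort. A generic scikit-learn illustration, not the OU's
# Pattern Match answer matching; the responses and marks below are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Responses from the first cohort with peer-assigned marks (1 = correct, 0 = incorrect).
responses = [
    "the forces are equal and opposite",
    "the truck exerts the bigger force",
    "both exert the same magnitude of force on each other",
    "the heavier vehicle pushes harder",
]
peer_marks = [1, 0, 1, 0]

marker = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
marker.fit(responses, peer_marks)

# Automatic marking of responses from a later cohort.
new_responses = ["each exerts an equal and opposite force on the other"]
print(marker.predict(new_responses))   # predicted mark(s) for the new responses
```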

The videos from my keynote and the other presentations are at http://www.deonderwijsdagen.nl/videos-2014/