Keys to transforming assessment at institutional level: selected debates from AHE2016

Posted on September 11th, 2016 at 9:07 am by Sally Jordan

Hot on the heels of my post about Sue Bloxham’s keynote at the Assessment in Higher Education workshop in June, this post is about the follow-up Transforming Assessment webinar “Keys to transforming assessment at institutional level: selected debates from AHE2016”.

Three talks from the AHE workshop had been selected for the webinar because they focused squarely on change at the institutional level, and I thoroughly enjoyed chairing the session. If you want to watch the whole thing, click here for more information and the recordings.

The first of the three talks that we’d selected was “Changing feedback practice at an institutional level” in which Sally Brown talked about work at the University of Northumbria, Leeds Met (now Beckett) University and Anglia Ruskin University. Kay Sambell had given this talk at the earlier workshop and their conclusions were that

  • Slow transformative development has more impact than attempts at quick fixes;
  • Having money to support activities and personnel is important, but large amounts of cash don’t necessarily lead to major long-term impact;
  • Long-term ownership by senior managers is essential for sustainability;
  • To have credibility, activities need to be based on evidence-based scholarship;
  • Committed, passionate and convincing change agents achieve more than top-down directives.

The third of the three talks was “Changing colours: what happens when you make enhancement an imperative?” in which Juliet Williams talked about the impact of the TESTA (Transforming the Experience of Students through Assessment) Project at the University of Winchester.

However, from the conversations that I had at the workshop in June, it was the middle talk (given at the webinar by Dave Morrison of the University of Plymouth because Amanda Sykes from the University of Glasgow was unavailable) that had inspired many of the attendees – bearing in mind that these were largely assessment practitioners, not experts. The title was “Half as much but twice as good”, and the important points I picked up were that

  • Timely feedback is more important than detailed feedback
  • [students are as busy as we are so] Less feedback can be more effective. If a student only reads your feedback for 30 seconds, what do you want them to take away?

We ended with a good discussion of how to bring true institutional change.

Central Challenges in Transforming Assessment at the Departmental and Institutional Level

Posted on September 10th, 2016 at 9:28 pm by Sally Jordan

Back on 30th June, Assessment in Higher Education (AHE) held a seminar in Manchester with the theme of “Transforming assessment and feedback in Higher Education on a wider scale: the challenge of change at institutional level”. The idea behind the seminar was partly to hold a smaller-scale event between our increasingly large biennial conferences, though we had well over 100 attendees.

AHE are now working in collaboration with Transforming Assessment, and we live-streamed Sue Bloxham’s keynote to a further twenty or so people around the world. Then on 13th July we had a dedicated webinar to which three selected presenters from the seminar contributed. My involvement in both of these events meant that I had a double ‘bite at the cherry’. I heard one set of presentations at the seminar itself, then I heard the three presentations on 13th July, and chaired a discussion of overlapping themes. There was some fantastic stuff.

As I try to catch up with this blog, I’ll start by describing my take on Sue’s keynote. She started from the precept that assessment remains unfit for purpose – and that change is slow. She went on to outline what she described as the key barriers to assessment enhancement, of which the two that resonate most with my own experience are:

  • centrally imposed change, which produces resistance. Sue’s proposed solution is that we should put the focus for change on small low-level workgroups.
  • the need for assessment literacy for staff. Here the focus must be on adequate professional development.

Sue went on to describe a framework which might be drawn upon to create the conditions for transformation at institutional or departmental level, based around

  • key principles
  • infrastructure
  • strategy
  • assessment literacy.

It was inspirational; now we just need to make the change happen. Don’t take my word for it though; you can watch the recording of Sue’s keynote here.

Fishy feedback

Posted on May 1st, 2016 at 1:32 pm by Sally Jordan

I don’t think that I am ever going to win prizes for my artwork. However, this post is about my picture of a fish, shown below:

[Image: my drawing of a fish]

A month or so ago, I was invited to take part in a live event about assessment and feedback, and for some reason the three of us involved decided to base this around an ‘assessed task’ which was to draw a fish.

There were a few problems – the first one was that we hadn’t been told the criteria against which our drawings would be judged. So my colleague John drew fish and chips. Even though I did rather better than John, I was still unaware of the detailed criteria.

So, when our work was ‘marked’, none of us did very well. The feedback related to our efforts as marked against the criteria – which we hadn’t seen. So I was praised for my colourful fish (I just happened to have a red pen in my hand) but downgraded because I had not shown any water. So, despite the fact that my marks were better than those of my two colleagues, I was angry to start with – I hadn’t been told to draw a fish in water! My emotions stopped me from listening to the rest of the feedback, which may or may not have been justified.

Finally, the mark scheme gave credit for the presence of different sorts of fins and gills. I have no idea whether my drawing, or the drawing by my other colleague that I was marking, included these features or not. My knowledge of fish anatomy is somewhat limited!

So, very quickly (and not as planned by the activity) we have a list of important points for assessing and giving feedback.

1. Make sure your assessment criteria are clear – and shared with students.

2. Receiving feedback is an emotional experience; bear that in mind.

3. Make sure that students understand the language that you use in your feedback.

The importance of neutral results

Posted on May 1st, 2016 at 11:28 am by Sally Jordan

This is the third posting in this morning’s trilogy about research methods, and this one was prompted by an article in this month’s issue of Physics World:

Ball, P. (May 2016). No result, no problem? Physics World, 29(5), 38-41.

Ball (quoting from others, in particular Brian Nosek of the University of Virginia) points out that ‘positive’ results are far more likely to be published than neutral or negative results. He starts by reminding us of Michelson and Morley’s famous ‘null’ result of 1887, in which there was no discernible difference in the speed of light passing in different directions through “the ether”. The failure to observe the expected result went unexplained for nearly two decades, until Einstein’s special theory of relativity showed that the ether was not required in order to understand the properties of light.

Coming back to the more mundane, who wants to see endless papers that report on results that didn’t happen? The literature is already sufficiently obese. Ball points out that in some fields there are specific null-result journals. Or surely, such results could just be published in the blogosphere or on preprint servers. Another possibility is linked to the suggestion that objectives of experiments should be declared in registered reports before the data are collected – see https://osf.io/8mpji/wiki/home/. This would “improve research design, whilst also focusing research initiatives on conducting the best possible experiments rather than getting the most beautiful possible results.”

Whatever, the results do need to be out there. Not everyone is going to have a result as significant as Michelson and Morley’s, but plain honesty – and a wish to stop others wasting their time in carrying out the same experiment, believing it to be new – means that all results should be shared. This should not be seen as a waste of time, but rather an example of what Ball describes as “good, efficient scientific method”.

I’d like to take this slightly further. I have encountered educational researchers who refuse to publish a result unless it is statistically significant. To return to my starting point of this morning: I’m a numerate scientist and I like statistically significant results…but I have seen some most unfortunate consequences of this insistence on ‘significance’, including (and no, I’m not joking) researchers justifying the claim that their results are ‘significant’ at some arbitrary level, e.g. 7%. PLEASE, just give your results as they are. Don’t tweak your methodology to make the results fit. Don’t claim what you shouldn’t. Recognise that appropriate qualitative research methodologies have a place alongside appropriate quantitative research methodologies – and be honest.

The unscientific method

Posted on May 1st, 2016 at 10:39 am by Sally Jordan

The title of this post is copied from another New Scientist article, this time by Sonia van Gilder Cooke, and published in Issue number 3069 (16th April 2016) on pages 39-41. The article starts “Listening to When I’m Sixty-Four by The Beatles can make you younger. This miraculous effect, dubbed ‘chronological rejuvenation’, was revealed in the journal Psychological Science in 2011. It wasn’t a hoax, but you’d be right to be suspicious. The aim was to show how easy it is to generate statistical evidence for pretty much anything, simply by picking and choosing methods and data in ways that researchers do every day.”

The article is wider ranging than the one that I’ve just posted about here. However, what is most worrying is that it goes on to point out that dubious results are alarmingly common in many fields of science. The summary of causes of bias includes some things that I suspect I have been guilty of:

  • Wishful thinking – unconsciously biasing methods to confirm your hypothesis
  • Sneaky stats – using the statistical analysis that best supports your hypothesis
  • Burying evidence – not sharing research data so that results can be scrutinised
  • Rewriting history – inventing a new hypothesis in order to explain unexpected results
  • Tidying up – ignoring inconvenient data points and analyses in the write-up

I will discuss one cause that isn’t explicitly mentioned in the summary, namely our wish to publish only ‘positive’ results, in my next post in this morning’s trilogy.

The article goes on to suggest a number of fixes:

  • Pre-registration – publicly declaring procedures before doing a study
  • Blindfolding – deciding on a data analysis method before the data are collected
  • Sharing – making methods and data transparent and available to others
  • Collaboration – working with others to increase the rigour of experiments
  • Statistical education – acquiring the tools required to assess data meaningfully

Simpson’s paradox

Posted on May 1st, 2016 at 10:10 am by Sally Jordan

Back in November, I posted about the fact that I was going to be more bullish about the fact that I am a physicist but do educational research. As I try to build my confidence to say some of the things that follow from that in my own voice, I’ll start by quoting some more articles I have read in the past few months.

To start with, there was a piece in New Scientist back in February (Issue number 3062, 27th February 2016, pp. 35-37), by Michael Brooks and entitled “Thinking 2.0”. This article starts by pointing out Newton’s genius in recognising the hidden variable (gravity) that connects a falling apple and the rising sun. He goes on to explain that “we know that correlation does not equal causation, but we don’t grasp the depth of it” – and to point out that our sloppy understanding of statistics can lead us into deep water.

Brooks gives a powerful hypothetical example of Simpson’s paradox, defined by Wikipedia as a paradox “in which a trend appears in different groups of data but disappears or reverses when these groups are combined” (the Wikipedia article gives some more examples and is worth reading). The example in the New Scientist article is about a clinical trial involving 400 men and 400 women that apparently shows that a new drug is effective in treating an illness – for both the men and the women. However, if you look at the 800 participants as a whole, it becomes apparent that more of those who were NOT given the drug recovered than those who received the drug. How so? Well, although the sample was nicely balanced between men and women, and half of the participants received the drug whilst half didn’t, it turns out that far more of the men ended up in the group that did not receive the drug, and men are much more likely to recover, whether or not they receive the drug. The men’s higher recovery rate inflated the untreated group’s overall figure, masking the drug’s benefit. This is a hypothetical example, and in a structured environment such as a clinical trial, such potential pitfalls can generally be avoided. But medical – and educational – research often operates in what Brooks rightly describes as muddy waters. Controls may not be possible and we can be led astray by irrelevant, confusing or missing data.
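
To make the arithmetic concrete, here is a minimal sketch in Python using made-up numbers (not the figures from the New Scientist article): within each sex the drug group recovers at a higher rate, yet the untreated group does better overall, because the men, who recover more readily whether treated or not, are concentrated in the untreated group.

```python
# Simpson's paradox with invented figures: 400 men, 400 women,
# 400 treated, 400 untreated, but the men are mostly untreated.
counts = {
    # (sex, received_drug): (recovered, total)
    ("men", True): (90, 100),
    ("men", False): (240, 300),
    ("women", True): (120, 300),
    ("women", False): (30, 100),
}

def rate(recovered: int, total: int) -> float:
    return recovered / total

# Within each sex, the drug group has the higher recovery rate.
for sex in ("men", "women"):
    print(f"{sex}: drug {rate(*counts[(sex, True)]):.1%} "
          f"vs no drug {rate(*counts[(sex, False)]):.1%}")

# Pool all 800 participants and the trend reverses.
for received_drug, label in ((True, "drug"), (False, "no drug")):
    recovered = sum(counts[(sex, received_drug)][0] for sex in ("men", "women"))
    total = sum(counts[(sex, received_drug)][1] for sex in ("men", "women"))
    print(f"overall {label}: {recovered / total:.1%}")

# Output:
# men: drug 90.0% vs no drug 80.0%
# women: drug 40.0% vs no drug 30.0%
# overall drug: 52.5%
# overall no drug: 67.5%
```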

Although I was aware of Simpson’s paradox and thought I had a reasonable understanding of ‘lies, damned lies and statistics’, it took me some time to get my head around what is going on here. We need to be really careful.

Do we need assessment at all?

Posted on February 14th, 2016 at 7:58 am by Sally Jordan

I’m surprised I haven’t posted on this before, but it looks as if I haven’t, and I am reminded to do so now by another New Scientist piece, this time from back in January:

Rutkin, A. (2nd Jan 2016). Robotutor is a class act. New Scientist, 3054, p. 22.

The article talks about an algorithm developed by researchers at Stanford University and Google in California, which analyses students’ performance on past problems, identifies where they tend to go wrong and forms a picture of their overall knowledge.

Chris Piech from Stanford goes on to say “Our intuition tells us if you pay enough attention to what a student did as they were learning, you wouldn’t need to have them sit down and do a test.”
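
The article doesn’t give the details of the Stanford/Google model, so as a rough illustration of the general idea (keeping a running estimate of what a student knows, updated from their answer history) here is a sketch of classic Bayesian Knowledge Tracing in Python. This is a much simpler, older relative of that kind of model, and the parameter values below are invented purely for illustration.

```python
# A sketch of Bayesian Knowledge Tracing (BKT): maintain the probability
# that a student has mastered a skill, and update it after each answer.
# Parameter values are made up for illustration.

P_INIT = 0.2    # probability the skill is known before any practice
P_LEARN = 0.15  # probability of learning the skill at each opportunity
P_SLIP = 0.1    # probability of a wrong answer despite knowing the skill
P_GUESS = 0.25  # probability of a right answer without knowing the skill

def update(p_known: float, correct: bool) -> float:
    """Update the mastery estimate after one observed answer."""
    if correct:
        evidence = p_known * (1 - P_SLIP)
        posterior = evidence / (evidence + (1 - p_known) * P_GUESS)
    else:
        evidence = p_known * P_SLIP
        posterior = evidence / (evidence + (1 - p_known) * (1 - P_GUESS))
    # Allow for learning between this question and the next.
    return posterior + (1 - posterior) * P_LEARN

p = P_INIT
for answer in [False, True, True, True]:  # one wrong answer, then three right
    p = update(p, answer)
    print(f"answer correct: {answer}, estimated mastery: {p:.2f}")
```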

The first paper I heard suggesting that we might assess students by analysing their engagement with an online learning environment (rather than adding a separate test) was Redecker et al. (2012) and it blew me away.

Redecker, C., Punie, Y., & Ferrari, A. (2012). eAssessment for 21st Century Learning and Skills. In A. Ravenscroft, S. Lindstaedt, C.D. Kloos & D. Hernandez-Leo (Eds.), 21st Century Learning for 21st Century Skills (pp. 292-305). Berlin: Springer.

In reality of course, and as much discussed in this blog, I would never want to do away with interaction with humans, and there are things (e.g. essays, problem solving) where I think marking should be done by human markers. However, if we can do away with separate tests that are just tests, I’d be delighted.

Positive discrimination?

Posted on February 7th, 2016 at 10:54 am by Sally Jordan

This isn’t really about assessment, or perhaps it is. First of all, some background. Because of a change in the dates used to establish school years where I lived when I was small, I missed a year at primary school. So, in a sense, I was disadvantaged. But I understand that, up to a certain age, they then gave those of us affected extra marks in exams. I’ve no idea whether that was actually the case. What I do know is that if I felt I’d been given unfair advantage over others in my more recent career (in particular as a female physicist) I would not be happy.

My definition of equality of opportunity has to do with levelling the playing field. I once arrived at a venue to give a tutorial, having requested a ground-floor room because I knew someone in a wheelchair would be there. The problem was that the venue had given me a room in a portacabin up three steps. Only three steps, but the effect was the same – the student couldn’t access the tutorial (well, not until I got angry and got us moved to another room). Sometimes apparently small things can get in the way of learning, for some students and not for others, and promoting equal opportunity is to do with ensuring that these “small things” are removed. In my book, equality of opportunity is not the same as positive discrimination; I’d give a student extra time in an exam if a medical condition suggested it was necessary; I would not give a student extra marks just by virtue of the medical condition. I’m happy to argue my case for that…or at least I was…

At the Open University we have found that female students do less well on one of our physics modules, and we continue to investigate the causes of this and to seek to put it right. Start here to learn more about this work. However, I’d never have thought of increasing marks just for women or for others in minority groups. After all, these are averages: some women and some black students do well, even if the average attainment of the group is lower.

Then, in my catch-up reading of old copies of New Scientist I came across an opinion piece from Joshua Sokol entitled “Mix it up”. This points out, as I know from other sources, that there can be a mismatch between scores in tests and future performance. So if women and black students do less well in a test, and we use that test to determine entry onto a subsequent programme (in this case an Astronomy PhD), we are both disadvantaging women and racial minorities, and failing to get the best students onto the subsequent programme.

By coincidence, I have been trying to come to terms with all of this in the week when my Department at the Open University has been awarded Institute of Physics Juno Champion Status for our commitment to gender equality. It’s great news, but it doesn’t mean we have arrived! More thought needed, and I think my conclusion to the conundrum described in this post is probably to be careful not to judge ANYONE on a single measure.

Sokol, Joshua (9th January 2016). Mix it up. New Scientist, number 3055, p. 24.

Feedback from a computer

Posted on February 7th, 2016 at 10:12 am by Sally Jordan

Back in February 2011 – gosh, that’s five years ago – I was blogging about some contradictory results on how people respond to feedback from a computer. The “computers as social actors” hypothesis contends that people react to feedback from a computer as if it were from a human. In my own work, I found some evidence of that, though I also found evidence that when people don’t agree with the feedback, or perhaps just when they don’t understand it, they are quick to blame the computer as having “got it wrong”.

The other side to this is that computers are objective, and – in theory at least – there is less emotional baggage in dealing with feedback from a computer than in dealing with feedback from a person; you don’t have to deal with the aspect that “my tutor thinks I’m stupid” or, perhaps even worse for peer feedback, “my peers think I’m stupid”.

I was reminded of this in reading an interesting little piece in this week’s New Scientist. The article is about practising public performance to a virtual audience, and describes a system developed by Charles Hughes at the University of Central Florida. The audience are avatars, deliberately designed to look like cartoon characters. A user who has tried the system says “We all know that’s fake but when you start interacting with it you feel like it’s real” – that’s Computers as Social Actors. However, Charles Hughes goes on to comment “Even if we give feedback from a computer and it actually came from a human, people buy into it more because they view it as objective”.

Wong, S. (6th Feb 2016). Virtual confidence. New Scientist, number 3059, p. 20.

Tails wagging dogs

Posted on January 22nd, 2016 at 10:05 pm by Sally Jordan

Earlier in the week I gave a workshop at another University. I’m not going to say where I was, because it might sound as if I’m criticising their practice. Actually I’m not criticising them particularly, and indeed the honest, reflective conversation we had was amazing. I think that many of us would be well advised to stop and think in the way we all did on Wednesday.

I was running a workshop on the electronic handling of human-marked assignments. I should perhaps mention that I was in a physics department – and I am a physicist too. This is significant, because if we expect students to submit assignments electronically, then they have to produce them electronically – complete with all the symbolic notation, graphs etc. that we use. This can be a real challenge. At the Open University we encourage students to write their answers by hand and scan them, but then the quality can be mixed – and plagiarism-checking software doesn’t work. Many OU students choose to input their maths in Word Equation Editor or LaTeX, but it takes them time and they make mistakes – and frequently they don’t show as much working as they should.

Then there’s the problem of marking: how do we put comments on the scripts in a way that’s helpful, without it taking an unreasonable amount of time? At the OU, we comment in Word or with a PDF annotator, on hardware such as the iPad Pro and the Microsoft Surface. We can make it work reasonably well, and actually some of our tutors now get on quite well with the technology. As a distance-learning university, we can at least argue that electronic handling of assignments speeds the process up and saves postage costs – and trees!

I’d been asked to run a workshop about what we do at the OU; I think I failed in that regard – they knew as much as I do. However, halfway through, someone commented to the effect that if the best we can do is to mimic handwritten submission and marking, why are we doing this at all? They’ve been told they have to, so that there’s an audit trail, but is that a good enough reason? They are allowed to make a special case, and the mathematicians have done this; but isn’t it time to stop and think about the policy?

We then started thinking about feedback – there is evidence that audio/video feedback can be more useful than written feedback. So why are we giving written feedback? Indeed, is the feedback we give really worth the time we spend on it? We’re driven to give more and more feedback because we want high scores on the NSS, and students tell us they want more feedback. But is it really useful? I’ve blogged on that before and I expect that I will again, but my general point in this post is that we should stop and think about our practice rather than just looking for solutions.

On a related point, note that JISC have run a big project on the “Electronic Management of Assessment” – there is more on this here.