
Recent blogs from IET staff

Textbook Heroes at CNX2014

Dr Beck Pitt's blog - Tue, 15/04/2014 - 14:22

Originally posted on :

“Textbook Hero” (Photo Credit: Beck Pitt)

Two weeks ago I was participating in CNX2014 at Rice University, Houston. Although I was mainly tweeting from the conference, I did take a few notes which I’ve reproduced below. These notes focus only on Monday’s sessions so, for a more detailed and varied account of what happened, check out the conference hashtag (#cnx2014) for lots of interesting participant reflections.

Connexions/OpenStax College Updates 

Connexions (CNX) is now 15 years old! And whilst the conference did look back at the conception of CNX and some of its achievements over the past decade or so, there were exciting announcements to come … First, there was the update from Daniel Williamson on last year’s OpenStax College (OSC) statistics on adoption and cost savings (see the great video from 2013 here, which was showcased at that event; Daniel had produced a cool updated version for this year.) OSC now has…


Hits & thoughts ain't evidence

Professor Martin Weller's Blog - Wed, 09/04/2014 - 10:28

This week we've been populating the impact map for the OER Research Hub. The impact map has been developed largely by Rob Farrow and Martin Hawksey, and features lots of Hawksey-goodness.

So, as well as putting our own evidence in there, we have been trying to add in the research of others that really demonstrates evidence for one of the hypotheses. And this has been an interesting exercise. I have been working through Rory McGreal's excellent resource OER Knowledge Cloud, going through papers and trying to add them in. The problem is that very few OER papers actually give anything approaching proper evidence or research. Try it yourself: pick a few papers from the knowledge cloud at random. What you get are project reports about releasing OERs, lots of "lessons learned", a lot of beliefs stated as evidence (e.g. "this will improve retention"), quite a lot of download stats, but very little hard evidence that you could point at and say to someone "this supports (or negates) this hypothesis".

In some ways this is understandable - OERs had to be developed in order to do research on OERs. So the early phase of the field will always be partly driven by evangelism and implementation. But we've moved beyond that phase now, after more than 10 years of OERs. The field really needs to up its game in terms of research now and demonstrating impact and evidence. I think all OER projects should have a research strand built in that asks questions such as "what are the expected benefits of this work?", "how will we measure that?", "what happens if these aren't realised?" etc. (Our 11 hypotheses would be a good start for anyone).

I really believe in OERs, and I think in the early stage of their development you just needed to take a leap of faith and develop them. But they have reached a level of maturity now when we can ask tough questions of them, without fear of undermining the whole enterprise. Indeed, I think having such solid research to point to is essential for OERs to make that next push through into mainstream practice.

So if you've got any of this evidence lying around (and I do mean evidence, not something a bloke down the pub told you), please let us have it.

You don't get openness for nothing

Professor Martin Weller's Blog - Wed, 09/04/2014 - 10:27

(Warning: this post may be a bit preachy.)

This isn't a post about the financial cost of open education, but rather the reciprocal, moral cost. As I mentioned in my last post, I've been working through a lot of OER publications for the OER Impact map. I've also been reading a lot of MOOC, open access & open scholarship publications for my Battle for Open book.

One thing that surprises and irritates me is the number of such publications that aren't published under an open access licence. It is a tad ironic to say the least when you encounter an article along the lines of "How OERs will transform education" - please pay $24.00 to access the article.

I'm not usually one for the Open Stalinist approach of outing people for not being open enough and dictating exactly how people should be open; I think it's counter-productive, unimaginative and not very pleasant. But on this subject I am a hard-liner.

Now, I think all articles should be open access anyway, but if you are doing any research in the field of open education (MOOCs, OA, OER, open data, etc.), then as soon as you start doing that research you are morally obliged to publish the results open access. I don't care which method (although if you take the Green route, make it easy to find). You only get to do that research (even if you are critical of it) because others have been open. You are therefore beholden to reciprocate in a like manner. If you don't want to, or feel that the journal you are targeting isn't OA, then choose another subject area. Openness is the route that allows you to do that research, and it also has value: people will want to read your work because it is about openness. And you don't get that for free. Open access is the price of admission.

Risky Business

Will Woods's blog - Mon, 07/04/2014 - 09:39

In March I attended a visioning workshop held by the recently appointed Pro-Vice-Chancellor of Learning and Teaching, Prof. Belinda Tynan, and attended by 60 of my colleagues. The 60 were recruited through a competition for ideas, and the best ideas won the day, so the event had people from all levels and areas of the Open University, which was a refreshing way to bring bright minds together. The workshop discussed where the Open University should be by 2025. The approach we took was designed by a group who work on Future Studies, and involved starting at the global level and gradually working down to our own turf, losing the baggage of the here and now along the way, and forming a consensus through cross-fertilised discussions on topics to do with educational futures.

It’s fair to say that I found the workshop empowering and inspiring; it had everything from contemporary performance art to RSA-style animation. I’m currently working on the area of “Innovation to Impact”, which is very close to my heart and something I’ve been working to strengthen within the Open University over the past few years, alongside Prof. Josie Taylor, the previous Director of IET, who has recently retired, and David Matthewman, the Chief Information Officer at the Open University.

Another supporter of this work has been the Director of Learning and Teaching, Niall Sclater, who has recently left the Open University to pursue new ventures. I raise my cap to Niall for the work he has done in the relatively short time he’s been at the Open University, including the introduction of the Moodle VLE (along with Ross MacKenzie) and the Roadmap Acceleration Programme, and most recently leading the Tuition Strategy work for the OU. I wish him all the best on his latest adventure! – I’m starting to feel like the last man standing in the TEL area.

Coming back to innovation: Ann Kirschner wrote a piece about innovation in higher education a couple of years ago, and many similar articles have since followed; however, I still enjoy reading her article, as it appears well researched and is still a good compass to where innovations are heading. Tony Bates also covered these areas recently in a blog post around a Vision for Learning and Teaching in 2020. We covered many of these and other aspects at the workshop, but sticking to the topic of innovation and risk, the main thing that rang true for me from the workshop was that we have become very “risk averse” (complacent) at the Open University, and there was, among the 60 delegates, a very strong sense that we needed to feel able to take some risks and to be more agile (a very overused word) to survive and thrive by 2025.

The “innovation pipeline” is a concept we’ve been considering (how to improve the flow between incubators and central areas, i.e. the journey from prototype to large-scale mainstreaming). We want to improve this at the Open University, and last year I gave a short presentation to the Learning Systems Advisory Group on that topic. I love the quote that I took from Ron Tolido, CTO at Capgemini: ”@rtolido At Amazon, you must write a business case to stop an innovation proposal, rather than to start one. Silences 90% of nay-sayers”. The Open University is no Amazon, of course; however, we do need some of that pioneering spirit…


…in the past week I have also attended an “executive away day” for the Institute of Educational Technology at the OU, organised by the new Director of IET, Patrick McAndrew. Patrick has always been a keen early adopter of technologies and new ideas, and he wants to make some organisational transformations with IET showing the way. For example, at the away day we went through a micro version of an agile project: we had a scrum, a sprint, another scrum and a velocity check, all within one hour in the afternoon of the away day. The project was to develop an induction for new starters, and we all took on tasks and worked through them, helping each other out. We have now taken the first step towards becoming an agile unit.

I have been using an agile approach in some recent developments, in particular for iSpot, where I had hoped to start using an agile or lean approach back in 2012 (see my magile post) but only actually achieved any form of agile methodology last year, when we started running into trouble and found that we needed to resolve issues within a much tighter timeframe. We resorted to frequent (not daily, but every other day) scrums and short sprints of three weeks. This worked very well; we were transparent with the project team, which kept things ticking over, and very quickly (within nine weeks) we turned the project around and got it back on track.

I believe that Patrick wants IET to be a leading light for the Open University to become an agile organisation. I fully support him in this and I will be doing my utmost to ensure that we embrace this and to prove that adopting an agile approach does not compromise on the quality of output.

There will be more from me on the L&T vision workshop outputs once they are officially synthesised, endorsed and made available in the public domain.

JiME Reviews April 2014

openmind.ed - Dr Rob Farrow's blog - Tue, 01/04/2014 - 11:12

This is the current list of books for review in the Journal of Interactive Media in Education (JiME) at the moment – if you’re interested in reviewing any of the following then get in touch with me through Twitter or via rob.farrow [at] to let me know which volume you are interested in and some of your reviewer credentials.

Sue Crowley (ed.) (2014). Challenging Professional Learning. Routledge: London and New York.  link

Andrew S. Gibbons (2014).  An Architectural Approach to Instructional Design.  Routledge: London and New York. link

Wanda Hurren & Erika Hasebe-Ludt (eds.) (2014). Contemplating Curriculum – Genealogies, Times, Places. Routledge: London and New York.  link

Phyllis Jones (ed.) (2014).  Bringing Insider Perspectives into Inclusive Learner Teaching – Potentials and challenges for educational professionals. Routledge: London and New York. link

Marilyn Leask & Norbert Pachler (eds.) (2014).  Learning to Teach Using ICT in the Secondary School – A companion to school experience.  Routledge: London and New York. link

Ka Ho Mok & Kar Ming Yu (eds.) (2014).  Internationalization of Higher Education in East Asia – Trends of student mobility and impact on education governance. Routledge: London and New York.  link

Peter Newby (2014). Research Methods for Education (2nd ed.). Routledge: Abingdon. link

OpenStax College Survey Results (Part I)

Dr Beck Pitt's blog - Mon, 31/03/2014 - 01:20

Originally posted on :

CNX2013 at Rice University (Picture Credit: Beck Pitt CC-BY)

Exciting times! I’m currently in Houston and about to head on over to Rice University for CNX2014 tomorrow. I was lucky enough to attend last year’s conference and am returning this year to present some of our research findings on OpenStax College (OSC) textbooks as part of 1 April’s Rapid Fire panel session Efficacy: Are they Learning? with Denise Domizi (UGA) and John Hilton III (BYU). For more info on the conference check out the schedule here.

Over the past year or so I’ve been working with Daniel Williamson of OSC/Connexions to conduct research into the impact of OSC textbooks on both educators and students. To date, we’ve run questionnaires with both user groups and I’ve also interviewed a small number of educators about their use of the textbooks. Work is ongoing and I’m currently focused on creating case studies and looking for…


The AccessForAll 3.0 PNP Specification and its potential in Learning Analytics for Disabled Students

Martyn Cooper's blog - Fri, 28/03/2014 - 17:57


The development of a matched set of resource metadata and student functional profiles was recognised in the late 1990s as the best way of achieving automated personalisation for accessibility in any Web-based system, but particularly a Virtual Learning Environment (VLE) or Learning Management System (LMS).  In 2000 an accessibility working group was established within IMS Global Inc. to work on this.  I was a member of that group and have continued to be associated with this work.  In September 2013 IMS published a public draft of the AccessForAll 3.0 specification, stating:

“The Accessibility project group has released a public draft of IMS Access for All v3.0. The public draft is provided so that implementors have the opportunity to begin work and provide comments before production of the final specification.”

This blog post outlines part of this specification: the part that relates to user profiles, known in IMS as the PNP (Personal Needs and Preferences).  Note: no details are given here of the corresponding resource metadata specification, known as the DRD (Digital Resource Description).  For authoritative details please refer to the IMS publications available at:


Overview of the IMS AccessForAll 3.0 PNP Specification

To quote from the primer to the specification:

Guiding Themes

The themes that guided the development of AfA 3.0 are described here:

  • Simplicity and ease of understanding are crucial to adoption;
  •  Easy modifiability will support changing requirements and the needs of organizations that require some parts of the model but not all, or require some parts of the model but use a different set of properties for other parts of it. Here, the work takes a knowledge-oriented approach, which will in future versions support Semantic Web technologies, as further explained in the Best Practices and Implementation Guide;
  • Easy integration with other metadata should be possible;
  • Integration with standards for device properties will, in future versions, permit matching to device properties as well as user requirements;
  • The standard should properly relate user agents, accessibility APIs and producer-oriented accessibility standards;
  • Widespread adoption within accepted accessibility frameworks and tools will promote wider impact.


The long-term goal of IMS AfA3.0 is to produce a sustainable, general and effective model integrated with the other standards efforts. It has to work across environments, device contexts and organizations; it is not limited to the educational domain. It aims to support accessibility through an “architecture” for personalization. It needs to be recognized, however, that at this point, the work is prototypical and intended to support progression towards such a model.


The Data Elements

The specification defines a “core”, but this is designed to be extensible where needed for specific implementations.  Using extensions to the core will impact interoperability with other system components or external systems; it is thus an important design decision.

The Core Profile of the IMS Global AfA PNP v3.0 specification is defined in Table 4.1. [Taken from IMS Global Access for All (AfA) Core Profiles, Version 3.0 Specification, Public Draft 1.0]


Table 4.1 Core Profile for an AfA PNP v3.0 instance

ID     Element/Attribute Name               Spec M'plicity  Profile M'plicity  Profile Commentary
Root   accessForAllUser                     1                                  Each AfA PNP v3.0 instance file will contain one and only one accessForAllUser.
1        accessModeRequired                 0..*
1.1        existingAccessMode               1                                  The following vocabulary terms are not permitted: itemSize, olfactory, orientation and position.
1.2        adaptationRequest                1
2        adaptationTypeRequired             0..*
2.1        existingAccessMode               1                                  The following vocabulary terms are not permitted: itemSize, olfactory, orientation and position.
2.2        adaptationRequest                1
3        atInteroperable                    0..1
4        educationalComplexityOfAdaptation  0..1
5        hazardAvoidance                    0..*
6        inputRequirements                  0..1
7        languageOfAdaptation               0..*
8        languageOfInterface                0..*
9        adaptationDetailRequired           0..*            Prohibited
9.1        existingAccessMode               1               Prohibited
9.2        adaptationRequest                1               Prohibited
10       adaptationMediaRequired            0..*            Prohibited
10.1       existingAccessMode               1               Prohibited
10.2       adaptationRequest                1               Prohibited
11       educationalLevelOfAdaptation       0..*            Prohibited
12       extensions                         0..*            Prohibited         Extensions not supported.

Figure 1: data elements in AccessForAll 3.0 PNP Core Profile
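The core elements in Figure 1 can be pictured as a simple data structure. The sketch below is illustrative only: the element names come from the Core Profile above, but the class layout, field types and the vocabulary strings ("visual", "textual") are my own assumptions, not part of the IMS specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class AccessModeRequirement:
    # e.g. existingAccessMode "visual" with adaptationRequest "textual"
    existing_access_mode: str
    adaptation_request: str

@dataclass
class AccessForAllUser:
    """Illustrative mapping of the AfA PNP v3.0 Core Profile elements."""
    access_mode_required: List[AccessModeRequirement] = field(default_factory=list)      # 0..*
    adaptation_type_required: List[AccessModeRequirement] = field(default_factory=list)  # 0..*
    at_interoperable: Optional[bool] = None                      # 0..1
    educational_complexity_of_adaptation: Optional[str] = None   # 0..1
    hazard_avoidance: List[str] = field(default_factory=list)    # 0..*
    input_requirements: Optional[str] = None                     # 0..1
    language_of_adaptation: List[str] = field(default_factory=list)  # 0..*
    language_of_interface: List[str] = field(default_factory=list)   # 0..*

# A profile stating "needs textual alternatives to visual content":
profile = AccessForAllUser(
    access_mode_required=[AccessModeRequirement("visual", "textual")]
)
```

The one-instance-per-file rule for accessForAllUser and the prohibited elements from the Core Profile would be enforced at serialisation time, which is omitted here.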


AfA3.0 Statements – a functional model of disability

Now, in the existing data at the OU that we can use in Learning Analytics, we have a binary flag that just indicates whether the student has, or has not, declared a disability.  Then there is a disability category field.  There are 12 categories of disability used to collect data for national reporting to the Higher Education Statistical Agency (HESA), which constitute the values of that field; these are:

  1. Sight
  2. Hearing
  3. Mobility
  4. Manual skills
  5. Speech
  6. Specific learning difficulty e.g. dyslexia
  7. Mental health
  8. Personal care
  9. Fatigue/pain
  10. Other
  11. Unseen disability e.g. diabetes, epilepsy, asthma
  12. Autistic Spectrum Disorder


The AccessForAll 3.0 specification by contrast allows machine-readable statements like the following to be stored in a user profile database:

  • Student ‘x’ has an Access Mode Requirement of textual alternatives to graphics;
    [cf. “has a sight impairment”, as would be stated in the above categorisation]


In other words AccessForAll 3.0 allows a functional model rather than a medical model of disabilities to be constructed.  This is very important for the Accessibility use case for Learning Analytics and will enable refinements in the Support use case.


Applications in Learning Analytics

Previous blog posts on Learning Analytics form a useful introduction to this section – see:


What is desirable in the envisaged Learning Analytics applications is a user profile that can describe how each individual user interacts with their computer (or other device) and the Web environment.  This will enable the Accessibility use case to be implemented meaningfully.  This follows from the fact that if users with similar profiles encounter access problems with the same learning activity (object) then there is a high degree of probability that it has accessibility deficits.

It will also be of benefit in the Support use case as users with similar needs can be targeted with specific support for them.  E.g. before screen reader users undertake a particular activity they could be directed towards a help page.
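The reasoning above can be sketched as a simple aggregation: if a high proportion of users sharing a functional profile report problems with the same learning object, flag that object as a likely accessibility deficit. The record format, threshold values and function name below are hypothetical illustrations, not an existing OU system.

```python
from collections import defaultdict

def flag_accessibility_deficits(interactions, min_users=5, problem_rate=0.5):
    """interactions: iterable of (profile_group, learning_object_id, had_problem).
    Returns (object, group) pairs where many users sharing a functional profile
    struggled, suggesting the object has an accessibility deficit."""
    stats = defaultdict(lambda: [0, 0])  # (group, object) -> [problems, attempts]
    for group, obj, had_problem in interactions:
        entry = stats[(group, obj)]
        entry[1] += 1
        if had_problem:
            entry[0] += 1
    return [
        (obj, group)
        for (group, obj), (problems, attempts) in stats.items()
        if attempts >= min_users and problems / attempts >= problem_rate
    ]

# Five screen-reader users fail on the same activity; other users do not.
log = ([("screen-reader", "activity-7", True)] * 5
       + [("default", "activity-7", False)] * 20)
```

Running the sketch over this log flags activity-7 for the screen-reader group only, which is exactly the signal the Accessibility use case needs.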

Now, is AccessForAll 3.0, in its core specification, sufficient for this?  Probably not: extensions will be required to give the level of detailed information sought.  However, the specification allows for such extensions, and although their devising would be application specific, they could be fed back into later versions of AccessForAll as recognised extensions that other applications could use, and thus be interoperable.

 UML Diagram of a Learning Analytics System Deploying AccessForAll

Figure 2: UML diagram of a future e-learning system incorporating Learning Analytics and Personalisation.


The above UML diagram shows a future system integrating a Virtual Learning Environment (VLE), Analytics Engine, Personalisation Engine.  The functional user database at the bottom of the diagram could be based on a data model itself based on AccessForAll 3.0 (with or without extensions).

In summary, when a student requests a learning resource from the VLE the Personalisation Engine determines what format of the resource best suits that user based on the data stored in the functional user database. The Analytics Engine records all the student’s interactions with the VLE and the resultant system responses.  In fact the only elements on the above systems diagram not already in place are the Personalisation Engine and the Functional User Database towards the bottom of the diagram.
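A minimal sketch of the Personalisation Engine's decision step, assuming the functional user database exposes the user's adaptationRequest values; the format names, access-mode labels and function are hypothetical.

```python
def choose_format(available_formats, adaptation_requests):
    """available_formats: ordered mapping of format name -> access mode it offers,
    e.g. {"video": "visual", "transcript": "textual"}.
    adaptation_requests: adaptationRequest values from the user's PNP profile.
    Returns the first format whose access mode matches a requested adaptation,
    falling back to the default (first-listed) format."""
    for fmt, mode in available_formats.items():
        if mode in adaptation_requests:
            return fmt
    return next(iter(available_formats))

formats = {"video": "visual", "transcript": "textual", "audio": "auditory"}
```

A student whose profile requests textual adaptations would be served the transcript; a student with no stated requirements gets the default video.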


So AccessForAll 3.0 holds enormous promise for implementing Learning Analytics approaches, for both the Support and Accessibility use cases, that incorporate detailed user profiles, and it has many advantages over the medical-model-based classification.  Now it is time to do some implementation and validate the approach with end users and other stakeholders.


— end —


LAK14 Fri am (9): Keynote and other things

Dr Doug Clow's blog - Fri, 28/03/2014 - 16:17

Liveblog from the second full day of LAK14 - Friday morning session.

Keynote: Scott Klemmer  - Design at Large

Abelardo introduces Scott.

Scott thanks everyone for inviting him. Interested in learning analytics and online learning, is a johnny-come-lately here. Has been following my liveblog! (Like all the best people.) His field is design. Really exciting time. Most powerful dynamic is the large number of people excited about making stuff – Arduinos, robots, programming – so many people interested in design.

History of design. Short primer on C20th industrial design. Came from within a few miles of here. Henry Dreyfuss – locomotive design, John Deere tractor, Honeywell thermostat, Bell telephone. When inventing the future, prototyping is key.

A contemporary, worked with Boeing, on a mockup prototype of the interior of an airplane. Had passengers board, with real luggage, experience it for the duration of the flight, staff coming through, and so on. That let him debug a bunch of stuff cheaply before they went live.

Dreyfuss’ book, Designing for People. Classic cycle – envision, make, evaluate – iterate the cycle. Any tinkering process is like this. Lots of recent advances in design thinking are about better envisioning, prototyping tools. A lot of the work at LAK, analytics, is on the evaluation side, improving our ability to learn from the prototypes we make. [Again nice clean slides with big photos.]

This led Dreyfuss to anthropometrics: the physical size of the dial, handle, steering wheel is dictated about human ergonomics. So can predict to a degree. With telephone, could be important for kids to be able to use; tractor usable with gloves. Helps us make better first guesses.

With online education, opportunity for this community to impact, in a lot of engineering research we invent the future. We are pioneers. Research is like a trip report from the future. Figure out what’s good there, good restaurants, how you live, and you send a postcard back to people who don’t live there yet. Research papers are classic postcards from the future; but also videos, prototypes. Analytics gives you real power when you move from the lab to the wild.

Moved to San Diego, started swimming. Like moving from swimming in a pool. things are regulated, to swimming in the ocean. Thrill, wonder of swimming in the ocean is similar to opportunities in analytics. Some examples of moving to the wild. Then broader implications of this shift generally. There’ll be a second session afterwards for a more in-depth conversation.

Looking to the future, draw ideas from the past. First lesson, physical space. Co-located cluttered studios are hallmarks of design education. Introduced in 1819 in Paris, endured for 200 years. First is the power of shoulder-to-shoulder learning. As a CS undergraduate at Brown, had one lab where all the computers were. All the programming done on Unix workstations. Got a huge cohort effect from people being colocated. Friend, when they lost a computing cluster, had a huge effect on the cohort experience.

At Stanford, 2005, first class, brought studio model to design class. It’s been great on the whole. One notable aspect: course evaluations strong on the whole, one element not. How fair is the grading – was in the 13th percentile. Traditional arts school background, great artist who gives you the grade they think you deserve, you take it. The engineering students were accustomed to exams with right answers, you could compare and agree the grade is fair. Really helpful in learning. Led me over years to long experiment with self assessment and then peer assessment.

Two punchlines: 1. Peer learning approaches work when they’re integrated well. Baking pedagogy into software is powerful. We do this when we move online; when you hand it to someone else, they get the pedagogical strategies you put in there, so good strategy for technology transfer. 2. Scaffolding structure is critical and overlooked, for both teachers and students. If you ask do these work well, answer is nearly always, they work great when you do this. Innovative things flop because we forget about the scaffolding.

Other universities using these materials. In Fall 2011, taught on a large online class. How are we going to scale projects and peer critique in to online learning? Multi-year project. Lead student Chinmay Kulkarni. System used in 100+ MOO classes. Guinea pig class was my design class. Others have taken it up – Python, philosophy, management, recipes, arguments, music.

Calibrated peer review process. Step 1, practice assessment on a training assignment, get feedback on that. Brings expectations in to alignment with the class, increases inter-rater reliability. Step 2, assess 5 peers, one of which is staff graded, gives ground truth. Then at the end, Step 3, students self-assess.

Extremely powerful to use open-ended assignments, pedagogically valuable. Embrace real world with many paths to success, important to teach. But really challenging, takes huge amount of grading time. Can’t scale. Machine grading advances are good, but can’t do a lot of things you want to, yet.

When administrators or the press think of peer assessment, they think of it as a tool for scaling grading and making it more efficient. It’s also important for peer learning. Care more about that than the number-generating equation.

An intrinsic paradox of peer processes: asking novices to do things that definitionally require expertise. They are there to learn things they don’t know about yet. This is where scaffolding comes in. After the assignment, they have micro-expertise on that area, and can do a great job.

Peer grades correlate well with staff grade. Peers, with help, can provide great constructive criticism. Does this scale to global online classes?

Stanford HCI class on Coursera. Learner composition, has a rubric, with scores, and open feedback. Students make it their own, sharing cool things, interfaces. Amazon curated lists, Twitter chat, LinkedIn group for alumni with certificates. Often students come up with better ways of explaining stuff than I could. One benefit of online, see students doing projects far outside the walls. Example of a classroom learning system adopted by the UN. Really cool stuff.

Staff agree with each other within a range of 6.7%. About 36% are within 5%, 59% are within 10%, then a broad tail. This is adequate for a pass/fail class. There are errors on individual assignments, but they tend to cancel out. Simulations: did students get certificates, and should they have? A small number got lucky, and 24/1000 didn’t get a cert due to grading error. On the numerical side, that’s the big problem, the improvement that really matters.

Online classes, students report how much seeing different people tackled the assignments gives them ideas for their own work. Like in a design studio. Realise there are lots of ways of doing things, not just what you thought.

Assessing yourself at the end – when we do it ourselves, we’re in a maker mindset. Evaluating others is a different mindset. When we submit a paper, we think we’re geniuses; when we put on our reviewer hat, we’re total curmudgeons. That is probably a good asymmetry. But good to bring that perspective to your own work. Closing that loop teaches you to be critical of your own work, and to be more forgiving of the work of others.

Found, consistent with the literature – adding more peers gives diminishing returns quickly. Wisdom of crowds works when errors are randomly distributed. [Actually particular random distributions - some distributions don't converge.] If you want to do better, need to improve the feedback loop.
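[A toy simulation of my own, not Scott's data, showing why extra peers give diminishing returns when grading errors are independent and zero-mean: the error of a k-peer average shrinks roughly like 1/sqrt(k), so the first few peers help far more than the next few.]

```python
import random
import statistics

random.seed(0)
TRUE_GRADE = 80.0

def mean_abs_error(k, trials=2000, sd=10.0):
    """Average error of the mean of k independent peer grades with
    Gaussian noise of standard deviation sd around the true grade."""
    errs = []
    for _ in range(trials):
        grades = [TRUE_GRADE + random.gauss(0, sd) for _ in range(k)]
        errs.append(abs(statistics.mean(grades) - TRUE_GRADE))
    return statistics.mean(errs)

# Error drops sharply from 1 to 3 peers, then flattens out.
errors = {k: mean_abs_error(k) for k in (1, 3, 5, 9)}
```

[With correlated (non-independent) errors the averaging benefit is weaker still, which is Scott's point about the wisdom of crowds needing independent errors.]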

So you can let people know how good their grading is. Adding this feedback reduces this error in subsequent assessments, but they tend to overshoot. So now we give numerical score, not just too high/too low.

More important to focus on the qualitative feedback. Challenging, with diversity of language background. Some feedback is minimal or superficial – ‘great idea’, ‘I can’t read the words in the pics clearly’, ‘solution requirement is vague here but I’m excited to see where you take this in the storyboards’ (??).

Return of novices-as-experts paradox. Going to scaffold that, using fortune cookies. Broadly applicable useful advice. We know from HCI, recognition works better than recall. Reason why graphical interface works, all the things you can do are visible. Show nouns and verbs, you pick what you want. This is the same strategy. So we list a bunch of common errors – like you get as an expert teacher of a class. Common failings, or success patterns. Encode these as fortune cookies. This gets you much more detailed and actionable feedback.

Improving assessment – take a common rubric, but rare to systematically think about whether it’s working. Improvement is ad hoc. We plot variance of the rubric items. Tends to be the case that items yielding high variance in grading are not well written. But ‘element 5c is wonky’ is simple.
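[My own sketch of the variance check Scott describes, plotting aside; the scores and threshold are invented for illustration. High-variance rubric items are the "wonky" ones worth rewriting.]

```python
import statistics

def wonky_rubric_items(grades_by_item, threshold=2.0):
    """grades_by_item: {rubric_item_id: [scores from many peer raters]}.
    Items with high grading variance are candidates for rewriting."""
    return [
        item for item, scores in grades_by_item.items()
        if statistics.pvariance(scores) >= threshold
    ]

peer_scores = {
    "5a": [4, 4, 5, 4, 4],  # raters broadly agree
    "5c": [1, 5, 2, 5, 3],  # raters disagree: the item is probably ambiguous
}
```

[Run over real grading data, this picks out the "element 5c is wonky" cases automatically.]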

Separating orthogonal questions improves agreement. Often orthogonal attributes are combined – e.g. did they upload interesting photos – better to separate in to did they upload, were they interesting. Parallelising helps reduce misalignment – low performance ‘hard to follow’, high performance ‘easy to follow’. Also improved inter-rater reliability. Inch up the quality. The Gaussian overall tightens when you add rubric revisions.

7 habits of highly successful peer assessment: assignment-specific rubrics; iterate before release (pre and during); assignment-specific training; self-assessment at the end; staff grades as ground truth (to find how well the system is working); show a single, adaptively-aggregated grade (not all peer grades are good, so don’t show all grades – if students see they have one wingnut grader, that reduces confidence); provide feedback to graders.

Peer learning strategies can give deep feedback, and improve motivation and learning. Biggest challenge is the labour requirement of doing it. In the blogosphere around peer assessment in MOOCs, there are two genres of critique. One is: I took a class and it didn’t work. It’ll be because one of those 7 principles was violated. Even if you follow all 7, the labour involved is a lot. Part of me thinks, welcome to life as a staff member, you have to do the work. Gardening takes time. I do think there are big opportunities to make this more efficient. Want to talk about algorithms to do the busy-work.

Facebook group for the class – didn't know it existed for a year. One community TA follows it; the content is awesome. At the end they shared their design work, portfolios. One did a poll on how long it took to complete the assessment.

Machine grading – NYT article about machine-scoring SAT essays. You can game a common machine grading algorithm (Perelman 2013): the grammar is good, but the content isn't true. Combine machine and peers for accurate assessment?

Short answer questions are versatile open-ended assessment. Recognition better than recall. However, often in learning we want that deeper, recall-based learning, for real life situations where there aren’t four options hanging in front of you. Also more immune to test-taking strategies.

Machine grading. Auto grading of short answers, using etcML. Not specifically made for us, general framework. Trained with 500 submissions per question. Deep-learning based strategy, fair amount of training data.
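etcML's actual deep-learning pipeline isn't described in the talk; as a loose, hypothetical stand-in for "train a classifier on labeled submissions per question", here is a toy bag-of-words naive Bayes grader (all data and names invented):

```python
import math
from collections import Counter

class TinyTextGrader:
    """Toy bag-of-words naive Bayes; a crude stand-in for a trained short-answer grader."""

    def fit(self, answers, labels):
        self.classes = set(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.class_counts = Counter(labels)
        self.vocab = set()
        for text, label in zip(answers, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        best, best_lp = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # log prior + log likelihood with Laplace smoothing
            lp = math.log(self.class_counts[c] / sum(self.class_counts.values()))
            for w in words:
                lp += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

# Hypothetical training data (the talk used ~500 real submissions per question)
answers = ["affordances guide action", "colour is nice",
           "visibility of affordances", "it looks pretty"]
labels = ["correct", "incorrect", "correct", "incorrect"]
grader = TinyTextGrader().fit(answers, labels)
print(grader.predict("affordances and visibility"))  # prints "correct"
```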

Pilot: replace one peer with a machine. Previously took a median of the peers, now include the machine grade. Peers were lenient, 14% higher. Also swayed by eloquence over accuracy. Saw that a lot at Stanford. Unless you know the content, you would think it was a good answer.

Peers assess ambiguous answers better. If the grammar or type of solution isn’t in your training sample, you’re out of luck for machine learning.

Confidence score – in high confidence cases, machine score predicted staff agreement well. So use machines to allocate peer effort, interfaces guide it. Crowd wisdom only works for independent errors, so use this to mitigate. Predict, then asks students not to score assignments, but label attributes. More likely to agree about attributes than evaluation. [Not totally following this]

If the system is 90% confident, assign to 1 rater; if 75–90% confident, to 2 raters; if <75%, to 3.
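The allocation rule translates directly into code; a sketch with the thresholds from the talk (boundary handling at exactly 90% and 75% is my assumption):

```python
def raters_needed(confidence):
    """Allocate peer raters by machine confidence: high confidence needs fewer peers."""
    if confidence >= 0.90:
        return 1
    if confidence >= 0.75:
        return 2
    return 3

# Example: pair each submission's machine-confidence score with a rater budget
for conf in (0.95, 0.80, 0.60):
    print(conf, raters_needed(conf))
```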

First ask, what are the correct or incorrect attributes. [I need to see the example longer to grok this, flashing by far too fast to read]

At the end, students see grades, and what they missed, from feedback from the peers. This more structured job helps create micro-expertise through interface. I really like project-based learning, authentic projects, real challenges. At the same time, a danger is, a bad course has attribute ‘do stuff and show us in 12 weeks’, danger is students don’t build the particular skills they need for a project. Good to combine openness with educational muscle training. Students do what they’re interested in, but tutor helps them work on particular skills. Less available in the online world. With labelling approach, bringing some of that back in.

How does this compare to the other system? Peer-scoring UI, give a score 0-10. Accuracy vs peer effort in seconds. Machine only 28%, but identify-verify up to 45%, at a big cost of time. [Still missing a lot here - also slides hard to read because dim projector and low-contrast colours]

How many people do you need? Verification more useful with more peers.

Does a good job of overcoming biases, especially peer lenience. Identify-verify pattern does better, more targeted. Marginal increase in quality of the grades.

Blueprint for peer learning, using machines to amplify the benefits. After a while, looking at your peers’ work becomes drill-and-kill.

Photo, design school at Stanford, everyone co-located. But online, alone together? Real studio, just know what people are up to. Lose that online. A lot in the news about retention in online classes. A lot of it is silly. Don’t know what the denominator ought to be. It’s like if Amazon made all books free all of a sudden, I’d download 100 books, I’d probably never open half of them, and I’d get 10 words into another half, and get a curve like the retention curve you get in online classes. Don’t think we should assume that free books is a disaster as a learning medium because the completion rate is 6%. Better question is how much is someone reading, or learning. So look not at the fraction of books completed; analysis not by course but by individual. Media gets this wrong over and over. [I'm not sure I've seen any research or MOOC puff pieces doing it that way either, to be fair.]

At a residential college, social fabric designed to help you flow along. You get up on Tuesday morning, people are going to class, in the evening people are doing homework, in week 10 everyone’s stressed about exams. But online, the exam time is uncorrelated with anything anyone else is doing. You may have a final, but online, people just have their life, so when that collides, you not only have absence of positive reinforcement, you have negative reinforcement of everything else. Use studio as a muse to not replicate those features, but create stuff that plays a similar role. Can also ask, what’s difficult in a studio that’s easier or more powerful online. Teaching to 200 20yo, they do great work, they’re living similar lives. Every year I see a dozen apps about how my dorm mates and I get groceries, find parking, or where’s the cool party on Saturday night. That’s 40% of student design projects. Online, you get things like, project picked up by the UN. You have this diversity, huge opportunity.

Harnessing diversity at scale. Biodiversity, not monoculture. When harness this, algorithms have ideology baked in. Use it to leverage diversity, not stamp it out.

One example, Talkabout. Simple system. A front end to Google Hangouts. Real easy to batch up groups of 4-6 students, encourage them to talk about the class online. Used in 5 classes, results vary extremely widely. Developed it for social psych class, had materials for small group discussions, cultivated over years. So amplify that, shared with the world, worked awesomely. But class on HCI where they do these as an afterthought, not deeply integrated, they were essentially unused. That was me! I should’ve known better.

Re-showing the slide – peer learning requires integration and scaffolding to work well. The quality of that is the biggest predictor of quality.

Where it worked well, it worked really well. In e.g. Iran, Hangouts banned. Said you can also do it in person, and people did – 2000 in person, 2000 online. Went for more than an hour, in more than 100 countries. It was like a mini-UN. Different from the in-person comment – ‘lack of tension and active opposite person’. Worries – echo chamber online – small town America people go to the same town hall, online we all find our niche. But here we’re getting the opposite: more diversity than online.

More diverse groups do mildly better on final performance. I’m not convinced [nor am I on that data - also looks like single-country is best].

Looked at Dreyfuss, human-centred design. Marriage of prototyping (to try out things that are hard to predict) and theory (to make a good first guess, and learn better and more efficiently).

Don Norman and Scott Klemmer – need more theory and analytics techniques to expand design beyond chance success. See this in spades in online learning. At Learning @ Scale, energy great, smart folks, working on cool stuff. But appalled at the amount of completely post-hoc analysis. That’s a fine starting point, so long as we’re more serious and rigorous next time. A great benefit of online education led by CS: they are good at building stuff. Unfortunately, computer scientists have no training in social sciences, and so lack theory to make good design decisions. That’s where we can come in. This community, through its expertise in using analytics to drive theory, has a big teaching opportunity for online education.

Let’s create and share practical theory – Stokes, Pasteur’s Quadrant. 2×2 – seeking fundamental understanding (or not) vs inspired by use (or not). Huge opportunity to marry social science theory with analytics and the huge society-changing opportunities we see in the learning area.

Kurt Lewin – nothing is as practical as a good theory, best way to understand something is to try to change it.


Alyssa: Was worth getting up at 8am again. The diminishing returns for peer raters, in terms of accuracy of rating. Other side, value of doing the rating, getting a lot out of seeing peer work. Where’s the diminishing returns, how many ratings do I need to do?

Yeah, great question. When we designed the system used by Coursera, made conscious choice to randomly assign peers to assignments, to get a baseline in how well it works. Both from a score-generating and student-learning perspective, random is not the way to go. If goal is get a better score, use crowdsourcing algorithms to intelligently assign raters. But if goal is experiencing studio wall, interesting solution, my hunch is there are two things to share. One is, diversity is good. Don’t want to randomly give 10 of the same thing, probably. Some exceptions. Second, not worth showing people crummy assignments. Probably not fair to allocate a lot of grading resources to that. Some experimental work, if you see your assignment, show something 1 JND better. My design, worth seeing Jony Ive’s design, but also to see what a similar student did that’s a little bit better, that’s useful. Some evidence that’s helpful. Peer Studio.

Josh: Traditional distance education, Marist College, a few MOOCs. Seeing people taking that peer grading into more traditional, asynchronous distance education, applying it there?

The Open University has done a great job on this for many years. Dan Russel and I ran an event at the HICSS conference. They’re kinda grumpy: what’s new in this besides hype and the scale? What’s new is the hype and the scale! Scale enables all sorts of things you can’t do otherwise. The motivator for the system we used came from a friend at UCSD of many years. That was the role model that inspired the online version. This year, took the newest version of Peer Studio, used that class as the guinea pig for a newer version. It’s not distance education, we’re really excited about preliminary feedback. Doing flipped classroom stuff. Can have software to mediate preliminary discussion.

Q: Intrigued by graph of benefit of diversity. Wondering, in institutional research, struggle to illustrate benefit of diverse environment in college, how it impacts folks. Using LA to help us with that?

I do think, clearly most modern admissions offices see benefits in all kinds of diversity. That was what led us to run that analysis. The other thing was, there’s a technique called the jigsaw classroom, half a century old. Assign one student to do early years, some middle, some later, each teach their peers about their domain of expertise. Closest to a magic bullet in pedagogy. Everyone does better. Inspired by that rare success, looking at this in TalkAbout to leverage that online. Some of it depends on content. Social psychology is a great fit for diverse discussion group, e.g. Dubai and Indiana is interesting. Benefits will accrue for e.g. a linear algebra class. Kumbaya experience is intrinsically valuable, but content benefit is even greater.

9B. Discourse and Argumentation (Grand 3) Statistical Discourse Analysis of Online Discussions: Informal Cognition, Social Metacognition and Knowledge Creation. Ming Chiu, Nobuko Fujita (Full Paper)

Nobuko speaking.  Ming Chiu is at Buffalo, she’s at U Windsor, but also a small business called Problemshift.


We have data from online courses, online forums. Analyses tend to be summary stats, not time, order or sequences. So using Statistical Discourse Analysis, which Ming Chiu invented. Not at the individual note level, but sequences of notes. Group attributes, recency effects.

Also informal cognition leading to formal cognition over time. Informal opinions are easier to access, intuitions rather than abstract theoretical knowledge. Fostering that development over time. Also how experiences ignite theoretical knowledge.

Knowledge-building theory: how students work with new information and make it into good explanations or theories. Linking ideas from anecdotes into theories, and elaborating to theorise more.

Corollary is social metacognition. Group members monitoring and control of one another’s knowledge and actions. Most individuals are bad at metacognition, so social is good to take control at group level. Questions indicate knowledge gaps. Disagreement always provokes higher, critical thinking. (?!)

Interested in new information or facts, and how we theorise about them, pushing process of students working with explanations, not just information. And express different opinion, more substantially.

Hypotheses – explanatory variables – cognition, social metacognition vs several other bits.

Data description

Design-based research. Online grad education course using Knowledge Forum (KF). Designed to support knowledge building – radically constructivist approach. Creation and continual improvement of ideas of value to a community; a group-level thing.

KF designed to support this. Focus on idea improvement, not just knowledge management or accumulation. Client and web versions, years old (80s) and now sophisticated. Lot of analytics features.

Demo. Students log in, screen with icons to the left. Biography view, establish welcoming community before they do work. Set of folders, or Views, one for each week of the course. Week 1 or 2, instructor spends time facilitating and moderating discussion. (Looks like standard threaded web forum things.) Model things like writing a summary note, with hyperlinks to other notes and contributions. And you can see a citation list of it. Can see who read which note how many times, how many times edited.

N=17, grad students, 20-40yo,  most working, PT, in Ed program. Survey course, different topics each week. After 2w, students took over moderation. Particular theme set, emphasising not just sharing but questioning, knowledge-building discourse. Readings, discourse for inquiry cards, and KF scaffolds.

Cards are on e.g. problem solving, set of prompts aligned to commitments to progressive discourse. Notes contain KF scaffolds, tells you what the writer was intending the readers to get.

1330 notes. 1012 notes in weekly discussion (not procedural). 907 by students, 105 by instructor and researcher.

Method – Statistical Discourse Analysis

Some hypothesis, some dataset. 4 types of analytics difficulties – dataset, time, outcomes, explanatory variables.

Data difficulties – missing data, tree structure, robustness. So to deal with it, Markov multiple imputation, and for tree structure store preceding message to capture tree structure. Online discussion is asynchronous, get nested structure. SDA deals with that. For robustness, run separate outcome models on imputed data. (?)

Multilevel analysis (MLn, Hierarchical Linear Modeling), test with I2 index of Q-statistics, model with lag outcomes.

Outcomes – discrete outcomes (e.g. does this have a justification y/n), also multiple outcomes (detail being skimmed over here).

Model different people, e.g. men vs women, add another level to multilevel analysis, three level analysis. (She’s talking to the presenter view rather than what we’re seeing? Really hard to follow what she’s talking about here. Possibly because it’s not her area of expertise.)


Look at sequence of messages. Asking about use of an idea precedes new information. Informal opinions leads to new information too. Male participants theorised more. Anecdote, 3 messages before, ends in theorising, as did asking about use, opinion, different opinion.

Looking at informal cognition moving to formal cognition. Opinion sharing led to new information as a reply. Also opinion led to theorising. Anecdotes, got a lot of those, they were practising teachers and they talk about that, also led to theories. As did elaboration.

Social metacognition moving to cognition. Ask about use led to new information. Ask about use led to theorise, and so did different opinion and ‘ask for explanation’.

Educational Implications

Want to encourage students to use one another’s ideas to create formal knowledge. Also want to encourage students to create subgoals with questions and wonderment; they take on more of the cognitive responsibility you’d expect teachers to hold. They motivate themselves and build knowledge over time. Handing over collective cognitive responsibility. Consistent with Design mode teaching. (All Bereiter & Scardamalia stuff.) Doing it via prompts to aid discussions.


Participants coded their own messages themselves – we didn’t need to do content analysis. Scale that up, might be applicable to a larger dataset like a MOOC. Talking about extracting that data from e.g. Moodle and Sakai.


(Phil Long says he’s doing the Oprah Winfrey thing with the microphone.)

Q: Interesting. I’m responsible for making research into applied tools at Purdue. What artefacts does your system generate that could be useful for students? We have an early warning system, looking to move to more of a v2.0, next-generation system that isn’t just early warning but guidance. How could this apply in that domain?

Signals is for EWS. This is more at the process level, in higher-level courses: guide the students further along rather than just ‘don’t fail out’. This data came from Knowledge Forum. Takes a few seconds to extract, into Excel for Statistical Discourse Analysis. Many posts had the coding applied by the participants themselves. We can extract data out of Moodle, and Sakai. If we identify something we want to look at, we can run different kinds of analysis. Intensive analysis on this dataset, including Nancy Law too, and UK researchers. SNA, LSA, all sorts. Extract in a format we can analyse.

Q2: Analytical tour de force. 2 part question. 1, sample size at each of the three levels, how much variance to be explained? Use an imputation level at the first level, building in structure there?

Terrific question, only Ming can answer. (laughter) I’m not a statistician. I know this dataset really well. Gives me confidence this analysis works. For SDA only need a small dataset, like 91 messages.

Phil Winne: Imagine a 2×2 table, rows are presence/absence of messages that lead to something else, columns are the presence/absence of the thing they may lead to. Statistical test tells you there’s a relationship between those. [This seems a lot simpler - and more robust - than what they've done. But I haven't been able to understand it from the talk - need to read the actual paper!] Looked at relationship between other cells?

I’m sure Ming has a good complicated response. I was most interested in how students work with new information. Looked at the self-coding; can’t say caused, so much as preceded.

Uncovering what matters: Analyzing sequential relations among contribution types in knowledge-building discourse. Bodong Chen, Monica Resendes (Short Paper)

Bodong talking, from U Toronto.

First, talking about cooking. If you cook, and buy cooking books, you have to buy good ingredients, cook for right time, and put them in the right order. Temporality is important in cooking, and in learning and teaching.

Neil Mercer – temporal aspects of T&L are extremely important. Also in LA research. Irregularity, critical moments, also in presentations at this LAK14, lots about temporality.

However, Barbera et al (2014) – time is found playing almost no role as a variable in ed research; Dan Suthers critique of falling in to coding-and-counting. So challenge in taking temporality in to account. Learning theories tend not to take it in to consideration. Little guidance, and few tools.

Knowledge building – main context. Also suffer from this challenge. Scardamalia & Bereiter again. Continual idea improvement, emergent communal discourse, community responsibility.

Knowledge Forum again, but different version – posts in 2D space so students can see the relation between them. Used in primary schools. Metadiscourse in knowledge building. Engage young students to take a meta perspective, metacognition of their own work. Two aspects: first developing tools, a scaffold tracker, extracts log information about who used scaffolds, and present a bar chart to serve as a medium of discussion. And design pedagogical interventions, here for grade 2 students, what’s the problem for their discussion, to engage them – e.g. where are we stuck, how can we move forward.

What do good dialogues look like? Focus on ways of contributing to explanation-building dialogues. Thousands of discussions, grounded theory approach, Six different categories. [Is this like Neil Mercer's stuff?]

To make sense of lots of data. Lay out in a temporal, linear form, how different kinds of contribution. Compared effective dialogues and improvable dialogue where they didn’t make much progress.

Can we go further than coding and counting? What really matters in productive discourse?

Lag-sequential analysis. ID behavioural contingencies (Sackett 1980!). Tracks lagged effects too. Many tools: Multiple Episode Protocol Analysis (MEPA), GSEQ analysis of interaction sequences, and old tools in Fortran, Basic, SAS, SPSS. A challenge for a researcher to do this.

So wrote some code in R to do Lag-sequential Analysis. Easy to do, and is one line of code to run. (Is it available as an R package?)

Participants and data – Grade 1-6 students, 1 science unit, 1101 KF notes in total, about 200 for each grade.

Primary data analysis, coded as contribution types, inquiry threads, and productivity threads. About 10 threads in each dataset, some productive, some improvable – fewer improvable. (We’re N=2-9 here.)

Secondary data analysis – compare basic contribution measures. And lag-sequential analysis to look at transitional ones.

NSD in basic contribution measures between productive and improvable dialogue.

LsA. Simple data table (in R). Feed in to R program, computes what’s happening. Takes one thread, computes the transitional matrix for that thread – e.g. if 1 happens, what’s the frequency of e.g. 5 happening. Base rate problem though. Try to deal via adjusted residual, or Yule’s Q, gives good measurement. Like a correlation score. “The code calculates that, which is just … magical.”
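The speaker's implementation is in R and isn't shown; as a rough pure-Python illustration of the Yule's Q transition measure mentioned above, here is a sketch over an invented coded thread (the codes and data are hypothetical):

```python
def yules_q(sequence, a_code, b_code):
    """Yule's Q for the lag-1 transition a_code -> b_code in a coded note sequence.

    Builds the 2x2 contingency over consecutive pairs and returns
    (ad - bc) / (ad + bc), a correlation-like score in [-1, 1].
    """
    pairs = list(zip(sequence, sequence[1:]))
    a = sum(1 for x, y in pairs if x == a_code and y == b_code)
    b = sum(1 for x, y in pairs if x == a_code and y != b_code)
    c = sum(1 for x, y in pairs if x != a_code and y == b_code)
    d = sum(1 for x, y in pairs if x != a_code and y != b_code)
    num, den = a * d - b * c, a * d + b * c
    return num / den if den else 0.0

# Hypothetical coded thread: Q=question, T=theorise, E=evidence, O=opinion
thread = ["Q", "E", "T", "E", "T", "O", "Q", "E"]
print(yules_q(thread, "Q", "E"))  # → 1.0: questioning always followed by evidence here
```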

Merge in to one big table, 50×38. Simple t-test between types of dialogue and whether they differ in each transition. Run over all data.

Found – in effective dialogues, after questioning and theorising, tend to get lots of obtaining evidence. Also when working with evidence, they tend to propose new theories. For not very effective dialogues, students respond by simply giving opinions.


Temporality matters. Temporal patterns distinguish productive from less productive dialogues.

Focus on community level, not individual or group level. Also, an R implementation of LsA, addressing the practice gap. Contact him to get it.

Limitations – LsA overestimates significant results, misses individual level. Data manipulation converted it into linear format. Other actions, like reading, are not considered.

So for future, design tools to engage students in better discourse. Connect levels with richer action, and refine the algorithm to be more effective.


Q: Agree that methods matter in LA. Useful to see these two presentations, employing different methods. Statistical discourse analysis is new. What would a comparison look like? They both hit on sequential analysis. Would be great, come from same lab – considered a head-to-head methodological treatment? (laughter)

Ming Chiu’s work is more advanced. A lot of the work in SDA is different. Big difference here: I compare two kinds of dialogues; their work doesn’t distinguish between effective and less effective ones.

Nobuko: Focus on productive discussions, not non-productive. We looked at everything, but focused on things that led to provision of information and theories. But for you productive ones lead to theories.

I’m not trying to advance the methodology, I want to design tools for students. I’m trying to use a tool to explore possibilities.

Q: LsA is done, understood quite well, useful baseline for understanding new creature, SDA, it’s complicated. Before we can put faith in it, have to have some understanding.

Q2 (?Phil Winne): SDA can address the kind of question you did, like do discussion vary, like an upper level in a multilevel model.

10B. Who we are and who we want to be (Grand 3) Current State and Future Trends: a Citation Network Analysis of the Learning Analytics Field. Shane Dawson, Dragan Gasevic, George Siemens, Srecko Joksimovic (Full Paper, Best paper award candidate)

(While Shane was out of the room, George stuck a photo of a dog into Shane’s presentation.)

Shane talking. Thanks everyone for stamina. Thanks co-authors, except George. (I contributed! Says George.) I had the lowest impact, so I am up here.

The slide comes up, and Shane looked straight at George. Yes, you did contribute. (Manages to recover quickly.)

Goal – citation analysis and structural mapping to gain insight into influence and impact. Through LAK conferences and special issues – but missing a broad scope of literature.

Context – much potential and excitement: LA has served to identify a condition, but not yet dealt with more nuanced and integrated challenges.

Aim – to ID influential trends and hierarchies, a commencement point in Leah’s term. To bring in other voices, foundation for future work.

LA has emerged as a field. (Strong claim!) Often mis-represented and poorly understood, confused with others.

Using bibliometrics – Garfield (1955), Ding (2011). Dataset: LAK11, 12, 13, ETS, JALN, ABS special issues. Straight citation count, author/citation network analysis, contribution type, author disciplinary background (shaky data).

Many criticisms – buddy networks, self-citations, rich-get-richer. Gives us some great references (i.e. theirs). Real impact factor – cartoon from PhDcomics. But broadly accepted.

Highly cited papers are predominantly conceptual and opinion papers, esp Educause papers. Methods – Wasserman and Faust SNA book. There were empirical studies mentioned, but few.

Citation/author networks. Count any link only once, not multiple times. Lovely SNA-type pictures. A few big ones. Moderate clustering – 0 is no connections, 1 is all connected; got about 0.4/0.5. Some papers were strong connection points, but degrees surprisingly low. We’re drawing on diverse literature sets. Degrees were increasing from LAK11 to LAK13.
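The clustering figure quoted above can be illustrated with a toy local clustering coefficient computation; a pure-Python sketch over a hypothetical undirected graph (the paper itself used standard SNA tooling):

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = adj[node]
    pairs = list(combinations(nbrs, 2))
    if not pairs:
        return 0.0
    closed = sum(1 for u, v in pairs if v in adj[u])
    return closed / len(pairs)

# Hypothetical undirected co-citation graph (symmetric adjacency sets)
adj = {
    "p1": {"p2", "p3", "p4"},
    "p2": {"p1", "p3"},
    "p3": {"p1", "p2"},
    "p4": {"p1"},
}
avg = sum(local_clustering(adj, n) for n in adj) / len(adj)
print(round(avg, 2))  # → 0.58, in the same moderate range as the talk's 0.4-0.5
```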

Author networks – a few strong nodes, but generally similar disciplines clustering.  Small cliques, few highly connected; not a problem really. For an interdisciplinary field, still largely disciplinary clustered.

Paper classification – schema from info systems, 6 categories, plus an added 7th: evaluation research, validation research, solution proposal, conceptual proposal, opinion, experience, panel/workshop. Lots of solution proposals. A lot of evaluation research in the journals – the empirical studies are mostly there. LAK dominated by CS. More educational researchers in the journals – they prefer publishing in journals rather than conferences, but CS researchers will use conferences. Largely conceptual/opinion. Methods – “other” was by far the most common, quant not far behind.

Field early, but maturing. Lots of opinion and definitional work. Need to grow empirical studies, more validation research, and critiques of studies. Would be great to see more arguments. Computer scientists dominate LAK proceedings; education research dominates journals.

By understanding our field, we can better advance it. Or do all fields go through this process? Working at other techniques too.


Matt: We’ve noted this for a while, it’s maturing. Is there another field that we can look at, in its first 5-10 y, to see how our research compares.

That was a comment from reviewers, can we normalise this. I’m not sure. How do you do that?

George: One field we could look at, the EDM community, there is some overlap. Talked about that, talking to Ryan. Point at the end: the location of a citation is more important than its existence.

Shane: Love to do the EDM work. Still argue it’s not as interdisciplinary as LA, so direct comparison very difficult.

Adam: Irony, that for analytics topic, not much quant. At history, from early days of ed tech, could we use that as a benchmark?

Shane: Yes. Look at where authors have come from, go out multiple steps. Largely they’re from ed tech, that brings in other literature.

Q: How can we use this insight to develop? Look at what communities are being cited but not well represented at the conference, approach for next LAK.

Shane: Great idea, thanks.

Hendrik: LAK data challenge, visualised the dataset of EDM and LAK, with ACM full proceedings. 12 submissions analysing differences between LAK and EDM. How could we team up for that for next LAK. Focused track, with focused tasks, where people have specific questions, compare how questions work on the datasets.

Shane: We did, Dragan chatted to Chris Brooks about the data challenge, would be great to get involved more.

Bodong: Analysing tweets since 2012, this is my first LAK but have been tracking it for a long time. And attendees who did not have papers. So Twitter could augment this. Data challenge next year, include tweets? Another idea.

Shane: Really interested in that – who’s tweeting, what’s being picked up and what their backgrounds are. The comment about the group we are missing: that’d be another area of interest, people who aren’t publishing.

Q: Not just conference, but alternative mappings, where published work is mentioned and by whom. Lots of different communities, educators, administrators. Social media may reveal some of those trends.

Maria: Discussing on Twitter the idea of an Edupedia. We can do things, a LAKopedia, brief summaries on how research builds on itself. Every article gets summary, bottom line, strengths and weaknesses, leads to things that build on them. Have a taxonomy mapped out – field is now so don’t have to go back a long way. I’m not volunteering!

Establishing an Ethical Literacy for Learning Analytics. Jenni Swenson (Short Paper)

Dean of Business and Industry at Lake Superior College, MN. I’m not a data scientist, I’m a soil scientist. Might be the only discipline nerdier than data science. Then to dept of rhetoric, looking at ethical literacies. Interested since 2008, via Al Essa. And Matt Pistilli at Purdue.

Rhetorical theory, concerned with ethics. Crisis of credibility, focus on ethics really helped. So have a lot of ethical frameworks. Our focus is to teach ethical communication in the classroom, raise awareness, to give skills to make ethical choices. Best message possible, analyse artifacts – recurrent pieces, written, spoken, visual. Always outcomes of a group’s work. Better understand consequences, intended and otherwise.

So taken these frameworks, three so far, and applied to LA. Looking for questions around purpose from an ethical framework.

Data visualisations – Laura Gurak, Speed/reach/anonymity/interactivity. Early internet stuff. Ease of posting online, get ethical issues for visualisation artefacts become apparent. We have software that allows anyone – NodeXL and SNAPP intended to be simple to use. Realtime analysis can be posted by people with no expertise, can be put out with any context. When viewed as a fact, we get questions such as, is there a lack of context, no history. Who’s accountable for predictions, accuracy, completeness? Target snafu, private data can become public quickly. What if inadvertently made public? E.g. through implied relationships. Interactivity and feedback, there isn’t as much with people who might be part of the vis.

Dashboards – Paul Dombrowski. Three characteristics of ethical technology – design, use and shape of technology. Think about first impressions. Things like Signals. We all judge people within 8-9 seconds, and it sticks with you. Visual elements could be expressive. Meaning created beyond the dashboard label of at-risk, and how the student responds without the context. Finally, how does it function as an ethical document. Many questions! Is there a way to elevate the student over data, rather than viewing the student as data?

Then Effects, of the process. Stuart Selber’s framework, institutional regularisation – required use leading to social inequity. We have an economic reality, Pell-eligible students, no access to computer, transportation, have jobs. Different from 4y schools. Need to be sure we’re not doing harm. At any point, can have 5% of our student population homeless (!). Crazy what these students are doing to get the 2y degree. Ethics of these projects, could be different between two schools. Transparency about benefits. In rhetoric, if uncover ulterior motive, your message has failed and you’ve lost credibility. So transparency needed that school will benefit. Do intervention strategies favour on-campus, vs online? We want to have available to all. Bernadette Longo, power marginalising voices, often unintentional. Who makes decisions about LA model and data elements? Legitimising some knowledge and not other. If we do give them a voice, are they all given that consideration? Bowker and Leigh Star – most concerning. We are really trained to accept classification systems as facts. We know there are problems, and we’re stereotyping to fit in to categories. There could be existing categories we’re covering up. Making sure that conversation is there, again transparency. Real kicker – at risk, “people will bend and twist reality to fit a more desirable category”. But people will also socialise to their category, so dangers that it may feel like defeat.

How does institutional assignment of at-risk match the student? Do they reinforce status? Can LA be harmful if categories wrong? We know it is, we have to have the conversation.

Return to Selber. Three literacies – functional, critical and rhetorical literacy. They are non-practitioners put to this test. Understand, analyse, and produce knowledge. To reach ethical literacy. Under rhetorical side, four areas – persuasion, deliberation, reflection, social action. Who has control and power of the conversation, who doesn’t, and why? Are we following a code of ethics?

Central question: Who generates these artifacts, how, and for what purpose, and are these artifacts produced and presented ethically?

We could be a big step up, took tech comm 10-15y. Questions are a jumping off point.


Q: Thought about fact that many of us get funding from agencies, foundations, and how this compromises ethical principles, about empirical findings with another agenda in the background.

Any time you go after money, there’s ethical implications. For me, in rhetoric, as long as you’re having the conversation, being open and transparent, that’s all you can do.

James: In ethical models with Aristotelian ethics, utilitarianism – possibility of a new ethical model? Because dealing with technology?

I do think there is time. There is a lot of different codes of ethics, different models. This was just one discipline. There might be parallels or best fits. I’m hoping for that. People have papers in the hopper on ethics of analytics. Broader conversation, the Wikipedia idea was great, discuss what this code of ethics is.

Q2: Real barrier to LA is the ‘creepy factor’, people don’t realise you’re doing it to them. More mature ethical future could overcome that affective feeling?

It is creepy! (laughter) I think young people don’t care about how much information is being collected about them, and older people have no idea. Everything is behind the scenes and we aren’t being told. We have to have trust about collection and use. Thinking more of industry. There isn’t a transparency. We feel betrayed. There’s no transparency right now and that gives us the creepy feeling. The opt in/out thing contributes to the creepy feeling.

Teaching the Unteachable: On The Compatibility of Learning Analytics and Humane Education. Timothy D. Harfield (Short paper)

Cicero – eloquence as copious speech. Will try to be as ineloquent as possible.

Timothy is at Emory U. A little bit challenging, involves unfamiliar language. Paper has philosophical language. Best way to approach it is with stories, motivation for thinking about what is teachable and what isn’t.


Driving concerns – getting faculty buy-in, especially from humanities. STEM, it’s an easy sell. In humanities, unfortunate responses. Profhacker blog, LA space is interesting, but is a little icky. Generate a reason to engage in this space. Secondly, understanding student success. Problem at Emory, we already have high retention and student success. How do you improve? And opportunity – think about what success looks like as a learner. So changed conceptions of student success. Retention and grade performance is easy; others more challenging. And thirdly, understanding learning as a knowledge domain. More often than not, learning is not defined or if it is, it’s as basically experience, changes your behaviour or pattern of thought [what else could it be?]. Have concerns about language of optimisation as constraining what learning might mean. Any time you see ‘optimisation’, think of a technique. Some types of learning don’t lend themselves to enhancement through techniques.

Teachable and unteachable

Going back to Aristotle. Five forms of knowledge, fifth – wisdom – is combo. Episteme, nous, techne, phronesis. What is the case, nous, how do I do it, what should I do. Episteme and techne are training, phronesis is education – practical wisdom. We cannot teach this (!), includes moral behaviour. Cannot teach them the right way to act with the right motivation, only acquired through experience (?!). Training vs education from another philosopher.

Humane education (Renaissance humanism)

Approaches we see – three features. Ingenium, eloquence, self-knowledge. Ingenium, essence of human knowledge – deliberation, disparate situations drawn together. Not technical competence, imaginative, creative, can’t be machine-learned. (!). Eloquence – copious speech. Educators mediate between students and knowledge. Students come from different places, values. In LA, not all students respond the same way. Example at Emory, dashboards like Signals, expect everyone will respond the same, aim to perform at a higher level. For some it’s effective, others not. Already high-performing students end up pursuing proxies. So wrt eloquence, meet students where they are. Self-knowledge, cultivating ingenious behaviours, making them responsible for learning, developing capacity.

Humane analytics

Real concerns. Potential for analytics to not recognise the responsibility of the learner, undermine their capacity.

Really fruitful if reflective approaches, like in Alyssa Wise’s work, Dragan, Dawson. At Georgia State, predictive models, part of the funding for this project is not just the model, but also purchasing a second monitor for every student adviser. At others, it’s an algorithm where students are automatically jettisoned and have to choose a different path. Here, confronted with their data, asked to reflect on it in light of their personal goals, troubles, where they want to be. These reflective opportunities are powerful: encourage responsibility, and also have the ability to make prudential decisions about how they are going to act in this context.

Another space, fruitful, social network analysis. Humanists will be frustrated by online education because they miss the f2f contact with each individual student, to eloquently meet each student where they’re at. End up as sage on the stage; humanists end up that way too but like to think of themselves otherwise. SNA has possibility to get the face of the student back. Can ID nodes of interest, opportunities to intervene, to lead and guide in a particular direction.


Thinking carefully about what learning is, and the types of learning we want to pursue. Some of our techniques may contribute to some types and distract from others. Need to be sensitive to needs of all students, and not all learning is teachable.


Q: Slide referring to higher performers, seeing their dashboard, undesired outcome?

We’ve had dashboards piloted in two classes. Told students to stop looking, because they’re seeing decrease in performance. The competitive nature is coming out to the detriment of learning outcomes. Really frustrated, students really anxious. So instructors told them to stop.

Alyssa: It’s not always necessarily positive. How students act on analytics. Goal not to just predict, but to positively impact and change that. Students optimising to the metrics at the expense of the broader goal. How do we support students in the action taken based on the analytics?

We’re learning you can’t just throw a tool at an instructor and students. Responsibility in the use of these tools. If instructors not careful in framing the dashboard, not engaging in discussion in ongoing way, whether it is accurate reflection. Levels of engagement measured in clicks or minutes is not correlated with performance at all. Bring that up in class. When you do ID students at risk, or in a class determine these metrics are not useful – say best use of analytics here is to not use them. Or not use as measures, but as a text itself, and opp to reflect on the nature of their data and performance – digital citizenship, etc. Cultivate practical wisdom in the digital age.

Maria: We have to be careful. Just because we have dashboards, not the first time they got negative feedback, or negative actions. It’s a bit more transparent now. One thing is cautionary note, this stuff has been happening all the time, this is just a new way for it to happen.

Hadn’t thought about it that way. Discussions about responsibility might end up leading to larger conversation about instructor responsibility in their pedagogy.

Maria: Tends to be case in tech, it creates new problems, we create tech to solve those. Phones where we text, people drive while they text, now cars that drive themselves. We’ve created dashboards, new unintended effects, need to iterate to find ways to have intended effects.

Situate in a sound pedagogy. A talk like this is really preaching to the choir. Wonderful to enter conversation about these issues.

Caroline: Combining this with earlier talk, apply it to our development processes within the community. Technology is a theory to be tested. Ideas about reflection applied to development of the tools so don’t get in to those issues.

Q2: Interested in effectiveness of high-level things vs just showing the raw activities that make up the score.

Significant chunk of the paper discusses relationship between what’s teachable and measurable. If it’s teachable, optimisable, performance compared to a standard: skill mastery. These things are teachable and measurable. But things like creativity, imagination, even engagement, they don’t have a standard for comparison, we’re forced to produce an artificial one, which is maybe to lose what it is in the first place. Is more effort required to make distinctions about the types of learning? LA applied differently.

Closing Plenary

Kim gathers people together. Glad you joined us here, looking forward to connections.

Few final things.

  • Best paper (Xavier Ochoa), poster (Cindy Chen), demo (Duygu Simsek).
  • If you presented, get it up at – or view it there!
  • Twitter archive will be there too. Conference evaluation to be sent next week, please complete.

Hands the torch over to Josh Baron, from Marist, for 2015. A bit frustrated, you’ve set the bar too high. (laughter) Had an awesome time. (applause) Advanced welcome from Dr Dennis Murray (sp?) who’s looking forward to it. Not everyone knows where Marist College is, it’s Poughkeepsie, NY, an hour up the Hudson from NY City. Focused on technology. $3m cloud computing and analytics center. Many faculty excited.

Safe travels. See you next year!

This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Why management speak makes us evil and ill

Dr Gill Kirkup's blog - Thu, 27/03/2014 - 21:18

I should not listen to the morning news as I eat breakfast. I turn into one of those people who shout at the radio presenters and leave the table annoyed with the way the world turns. This morning I heard something that brought home to me the evil of management speak.
The news item was a report on the shortcomings of some police forces in dealing with victims of domestic violence. Many years ago I was a volunteer in a refuge for victims of domestic violence so I do not have a naïve notion about how easy it is to deal with this area of gender violence. I know that many police forces are trying, with diminishing resources, to protect women and prosecute offenders, but that sometimes they fail. I did not begin listening to the news item with a hostile position on the police. But by the end of the interview I had one, at least with respect to the senior police officer who was interviewed.
My attention was grabbed early in the interview when he referred to domestic violence as a ‘business area’ of the police and then followed this by describing women victims as ‘service users’. I felt that I was listening to someone who could describe anything, no matter how awful, tragic or wrong, in similar terms. This is the language of commercial business. I expect my broadband provider to talk about me like this, and I probably think about them in the same language: i.e. they are my service provider, I am their customer. We have become used to hearing ourselves described in these terms in the health service. But I think most of us still describe our doctors as doctors – not as service providers, and that when we go to an accident and emergency department we don’t see ourselves as entering a business area of the health service.
Management speak deprives every area of human life of its particular ethical and human dimensions. It is wrong to talk about a wide variety of different human activities in language that suggests they are all of the same ethical value and human importance.   People become units of consumption or units of production or delivery, and relationships between people are described as systems and protocols (which is where the senior police officer described the failure, and the solution, as lying this morning). Everyone is dehumanised, and deprived of moral responsibility.
You can imagine management speak as the kind of language that allows people to build concentration and torture camps, ‘parking’ their ethics and empathy somewhere else while they do so: seeing themselves as simply working in a business area of some Government. On the other hand you cannot imagine it as the language used by those who are incarcerated in those camps, or by the people who eventually liberate them. If  these groups used the same language to talk about themselves we would see this as showing a ‘lack of affect’ that indicated mental illness such as depression or post-traumatic stress disorder.
I think my proposition this morning is that management speak is not only evil, it makes us all ill.

LAK14 Thu pm (8): Institutional perspectives, learning design

Dr Doug Clow's blog - Thu, 27/03/2014 - 20:34

Liveblog from the second full day of LAK14 – Thursday afternoon session.

Session 7B: Institutional Perspectives

Techniques for Data-Driven Curriculum Analysis. Gonzalo Mendez, Xavier Ochoa, Katherine Chiluiza (Full Paper, Best paper award candidate).

Xavier talking.

Siemens & Long (2011) Educause classification of learning and academic analytics. Focus at the Departmental level, aimed at learners and faculty.

Government says have to redesign curriculum. Questions – which are the hardest courses? How are they related? Is workload adequate? What makes students fail? Courses to eliminate. How can learning analytics help here? And help curriculum redesigners.

Wanted to design techniques. Using available data, no time to collect data, do surveys, etc. Grades are always available, so focus on those. Want metrics that are go/stay, not using them as decision-makers, but as discussion starters. Also easy to apply and understand. Want to ‘eat own dog food’, apply to our grades on CS program.

Finding which courses are difficult. Could be good students who do well, or an easy course. There are two estimation metrics, alpha and beta: difficulty and stringency of grading. Three scenarios: (1) course grade > GPA – easy course for that student; (2) course grade = GPA; (3) course grade < GPA – the course was bad for them. Aggregate across students, come up with a distribution, take the average deviation of course grade from the students’ GPA. This should factor out the ability of the students.

Real examples, but the distributions are not normal! So don’t tell the full story. Most difficult course, algorithms analysis, most is on the bad side. The easy course, it’s more on the left side. So new metric, the skewness of the distribution.
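The deviation and skewness metrics described above can be sketched roughly as below. Everything here is invented for illustration (the students, grades, GPAs and course names are not the paper’s data), but it shows the shape of the grade-only calculation:

```python
import statistics

# Invented example data: (student, course, grade) plus each student's GPA.
records = [
    ("s1", "algorithms", 55), ("s1", "physics", 70),
    ("s2", "algorithms", 65), ("s2", "physics", 85),
    ("s3", "algorithms", 50), ("s3", "physics", 90),
]
gpa = {"s1": 75, "s2": 80, "s3": 70}

def deviations(course):
    """Per-student deviation of the course grade from that student's GPA."""
    return [g - gpa[s] for s, c, g in records if c == course]

def difficulty(course):
    """Mean deviation: negative means students score below their GPA (hard)."""
    return statistics.mean(deviations(course))

def skewness(course):
    """Fisher-Pearson skewness of the deviation distribution."""
    d = deviations(course)
    m, sd = statistics.mean(d), statistics.pstdev(d)
    return sum(((x - m) / sd) ** 3 for x in d) / len(d)
```

On this toy data `difficulty("algorithms")` comes out negative (students score below their own GPA) and `difficulty("physics")` positive, matching the scenarios above; the skewness then captures where the mass of the deviation distribution sits when it is not normal.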

Also compared to asking the students and teachers (perceived) vs the estimated ones. Some on both lists, but not all. Perception is not the same as estimation. Why? They think the courses are difficult, but data say no. Example, physics doesn’t appear in perception, but is very hard according to the data.

Dependence estimation – if they do well on this course, how likely are they to do on another one. Map of the CS curriculum, which are prerequisites, maps, and which belong to which one. Simple Pearson correlation – but a lot of them! Many very low correlation, like 0.3. But ‘computing and society’, isn’t linked to much, but correlates with HCI course, and discrete mathematics (0.6ish) – surprising.

Maybe we should rethink prerequisites? [I'm not sure that follows from things being uncorrelated.] Why ‘Programming Fundamentals’ doesn’t correlate with other programming courses?
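The dependence estimates are plain Pearson correlations over students who took both courses; a minimal version, on made-up paired grades, looks like this:

```python
from math import sqrt

# Invented paired grades for students who took both courses.
hci_grades     = [60, 72, 81, 55, 90]
society_grades = [58, 70, 85, 52, 88]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length grade lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A coefficient near 0.6, as for the ‘computing and society’/discrete mathematics pair, would count as a notably strong link against the many correlations around 0.3.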

So ran Exploratory Factor Analysis of the different courses. There is lots of difference in the professional part. Five factors – basic training, one is advanced CS stuff, one on programming, one around client interaction, and one they couldn’t make sense of – electrical networks and differential equations.

The grouping is off (from their expectations) – Programming Fundamentals isn’t in the programming factor. (Makes sense since it doesn’t correlate.)

Wanted to look at dropout and enrolling paths. Expect they all start happy, and they drop out over time. Sequential Pattern Discovery using (something) – SPADE.  60% of students that drop out fail Physics. Hard course, and is a main problem. Only Programming Fundamentals is in the top list for dropouts of the CS courses. So we’re dropping them out from basic courses. So – start with CS topics?
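The paper uses SPADE for proper sequential pattern mining; a much simpler frequency tally (not SPADE, and on invented records) still illustrates the kind of finding, such as the share of dropouts who failed Physics:

```python
from collections import Counter

# Invented records: the set of courses each dropped-out student failed.
dropouts = [
    {"physics", "calculus"},
    {"physics"},
    {"physics", "prog_fundamentals"},
    {"calculus"},
    {"physics", "prog_fundamentals"},
]

# Fraction of dropouts who failed each course.
fail_counts = Counter(c for failed in dropouts for c in failed)
fail_rate = {c: n / len(dropouts) for c, n in fail_counts.items()}
```

Sequential mining goes further than this tally by ordering the failures over semesters, which is what supports the ‘dropping them out from basic courses’ reading.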

Finally, load/performance graph. What they think they can manage from what they do. Present a graph of courses they have to take – it’s quite a big complex graph (7 x 10 matrix, not very sparse). Simple viz – density plot of difficulty taken vs pursued difficulty. Most of the density is on the straight line. (Pass all the courses you take.) But big blob below – most are taking 5 or 4 courses, and failing one of them. Maybe suggested load is unrealistic. Can we present the curriculum graph in a better way? Or recommend the right amount of load for each student?

Very simple techniques and methods. Used only grade data – so can apply in your university. Discussion starter questions – and now have lots more questions, but they are more actionable and specific.

Redesigning courses based on new questions in April. More ambitious goal – to create techniques that can be transferred to practitioners.


Q: As faculty member, difficult of course is difficulty of material and grading scale. Could be easy material but you’re a tough grader. Can you confront that problem? Normalised distributions over time? Teasing out difficult material or grades?

We have courses taught by different professors over time. In the process of redoing it by course/professor. Bit politically … yeah. (laughter)

Q: Interesting to apply this in Law schools where normal distribution is imposed on the class. [Only in the US, I think?]

Q2: Because GPA an average, how do you take account of the standard deviation, which must get smaller over time?

Move these metrics to students, yes, in first year you have variation of the GPA. But this is of the whole program. It’s the GPA of people who’ve left the university. If want to do it with 1st year students, yes have problems.

Grace: We’re finding GPA is not a good measure, esp in 1st year because there’s a higher dropout rate. The 0s pull down the average. Remove withdrawers?

We use the information from students who take the course. We don’t use GPA for dropout analysis.

Q3: Faculty have engaged, maybe adopted analytics techniques. Do you get traction, or is it a rabbit hole where you do more complex action?

Faculty have been very open. When presented as, just look at this data, without conclusions. When you draw conclusions, then you have friction, if you say ‘this is a hard course’. If you say, ‘this metric says this’, no value judgement, it’s Ok. These are not (good enough for this). You need human insight to get some judgement.

Stephanie: It is very difficult to get faculty to realise there are problems, and how they may fit together and cascade through. Appreciate this work, showing you what the results show. Responsive to data about how things are going.

Q4: Talked about understanding course load, what should be recommended course load. A lot of what you might want to include might have to do with responsibilities outside class – a PT job, that stuff. That data’s probably not available. What of that other info would be useful, and how might you collect it?

These metrics are being done in a system where professor asks them about working outside, regular activities, so we can take that in to account. We know there is a problem of load. The cause, we have to explore that.

Hendrik: Also busy with playing with study load. Rule of thumb from experts at the moment. How do you measure the study load so far? Is it just the professor saying? Studies with more accurate or transparent measure?

Easy. If 4h, have 4 credits, 5h, 5 credits. Aware of studies of workload from (someone) at U Ghent, ask students to log their workload. Maybe they have a paper about that.

The Impact of Learning Analytics on the Dutch Education System. Hendrik Drachsler, Slavi Stoyanov, Marcus Specht (Short Paper). The Impact of Learning Analytics on the Dutch Education System from Hendrik Drachsler

Hendrik presenting, from OUNL. Totally different study, normally I do recommender systems. We took the opportunity of a local LASI in Amsterdam, only Dutch people, from companies, HEIs, K12. Some analytics at that event to get a measure or idea of how the LA concept is appreciated by that community.

LASI-Amsterdam in 2013. SURF – umbrella organisation for Dutch HEIs. Have an LA SIG. Also Kennisnet, for K12 in Netherlands.

Group Concept Mapping. Identifying common understanding of group of experts – brainstorming, sorting, rating. Post-its, write them up, cluster on the wall, then voting. It’s same approach, but computer supported. Then applies robust analysis (MDS, HCA) then presented in conceptual maps, which are rated. Many viz possible.

LASI did brainstorm at LASI. Then sorted online afterwards. Then rated on two criteria – how important, is it feasible.

For one participant, they have a sorting arrangement, generate a square binary similarity matrix. Do that for all of them, get a total square similarity matrix – multidimensional scaling. Input square matrix, get an N-dimensional mapping – often 2-dimensional map. LAK data challenge paper did this for LAK papers.
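The aggregation step described above can be sketched in a few lines. The statements and groupings here are invented; the point is how individual binary sortings sum into the total square similarity matrix:

```python
# Invented example: each participant's sorting is a list of groups
# (sets of statement indices); four statements, three participants.
sortings = [
    [{0, 1}, {2, 3}],     # participant A
    [{0, 1, 2}, {3}],     # participant B
    [{0}, {1, 2, 3}],     # participant C
]

def total_similarity(sortings, n):
    """Cell (i, j) counts how many participants sorted i and j into one group."""
    total = [[0] * n for _ in range(n)]
    for groups in sortings:
        for group in groups:
            for i in group:
                for j in group:
                    total[i][j] += 1
    return total

matrix = total_similarity(sortings, 4)
```

The diagonal equals the number of sorters; converting each cell to a dissimilarity (e.g. `len(sortings) - matrix[i][j]`) gives the input that multidimensional scaling turns into the 2-dimensional point map, and that hierarchical cluster analysis then groups.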

Hierarchical cluster analysis, neighbourhood of statements and how similar. Use this as a hub for a semantic analysis. Need to decide what hierarchical approach makes best sense – have to explore the data.

‘One specific change that LA will trigger in Dutch education is …’ – the prompt for brainstorming. Then sort the responses, put them together. Each does that themselves. Then rate from 1-5 for each response, how important, how feasible.

Two hypotheses – H1 the most important will be less feasible to implement. (seen this in previous work). H2 sig diff btw novices and experts in importance and feasibility (see this too).

Descriptive stats – N=32 people entered statements (60 people total). Sorting phase, 63 started, 38 finished. This takes a lot of time – 108 statements, takes 2h. For GCM, that’s Ok. Importance and feasibility ratings similar.

Also asked participants to self-rate expertise. Most – >50% – said they were novice. 44% advanced, 2% expert. Most were established professionals in their field (>10y), but not in LA. Most teaching-skewed, some managers. Nice spread of organisational context.

Point map – nice scattergram of points. Cluster maps suggested, and decided 7 clusters. System suggests topics, but manual oversight too. Had several – students empowerment, personalisation (close to each other). Risks way away from everything else, but close to management & economics. Teacher empowerment closer to feedback & performance and research & learning design. [Nice]

Then add the cluster rating map – each block is higher depending on how important it was seen. But teacher empowerment top. Feasibility very similar, actually. Risks not a big issue.

H1, pattern match importance vs feasibility. Surprisingly, people think teacher empowerment is important, and feasible. Personalisation is important, but less feasible. Research & LD in between; student empowerment important, but less feasible. Management/economics and risks right at the bottom.

H2, novice vs experts different ratings. There is not! Really very close. Took novice in one group, and intermediate and expert, basically agree about most things. They speak the same language. Same for feasibility – although experts do tend to rate things as less feasible than novices. (Looks probably significant to me, if they had large enough N.) Consensus that teacher empowerment are important.

Need to reject H1 & H2. Dutch community (at LASI) highly agrees on topics that are important to influence the educational system with LA.

Final message: the Netherlands is ready to roll out LA.

But want to explore data, e.g. H3 sig diff in sectors (HE/K12, business/educ).

Partners to run a GCM on LA in their countries – contact us!

LACE project is running a study on quality indicators for learning analytics. One further step, we will build an evidence hub, bring studies in, get an overview that provide insights in to how LA affects certain topics.


Grace: Halfway with this in Australia?

Did not manage this with a LASI. Have to select the people to contribute, can do it as many as you want to brainstorm. Sorting and rating takes 2h, you need experts, who are considering it. Then run whole thing online. Used it at Amsterdam to push it there and have fast result.

An Exercise in Institutional Reflection: The Learning Analytics Readiness Instrument (LARI). Kimberly E. Arnold, Steven Lonn, Matthew D. Pistilli (Short Paper)

Kim talking. Working on a similar concept. New instrument: LARI. Looking for feedback, collaborators.

Facing a challenge. Lots of practitioners want to get in to it. Researchers go by choice, practitioners are mandated. Without much direction or understanding. 2.5-3y ago realised this was an issue. Important consideration is to know if you’re ready or not as an institution.

Readiness definition: willing, prepared, immediacy. In LARI, LA requires time, resources, money. Intensive.

Use the same ‘Ready Set Go’ picture as Hendrik on the slide!

Successful implementation won’t happen accidentally. Logistics can be daunting. Institutional reflection is critical. So how can we facilitate reflection? Needs to be comprehensive. Sometimes from IT depts, with educ researchers, but perhaps not all depts. Needs to be cross-disciplinary, diverse experience and skills. So more realistic understanding of resources.

How do we get to assessing readiness? Especially in a large institution, but even in a small one there’s a lot to include. A national view – Horizon Report. And Innovating Pedagogy report from the Open University. (yay!)

Educause ‘Analytics Maturity Index’. Great tool, focuses on analytics at large, aimed at individuals. But wanted something down from the landscape, but up from the individual. Institutional profile, that’s a prescriptive diagnostic tool; situated in literature, and formative, based on parsimony/practicality/proactivity.

Matt takes over.

The instrument, developed items based on the literature (readiness broadly, and analytics – meta pieces). ECAR Maturity Index. Practitioners’ experience too. Also some original factors – ability, data, culture, process, governance/infrastructure.

Convenience sample, N=33 over 9 institutions (8 US, 1 Canada). Focused on R1s, where have a foothold in learning analytics. Survey distributed.

Started with 139 items. Exploratory Factor Analysis eliminated 42 survey items. Then a second factor analysis eliminated another 7. Third, no noticeable change, so left it.

The factors changed – went to new ones. Ability, data, governance/infrastructure stayed. Culture and process came together as a single one. And another one was ‘overall readiness perceptions’. Surprised by that.

Final 90 survey items, Exploratory Factor Analysis, Cronbach alpha 0.9464, explained 55.7% of variation.
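For reference, Cronbach’s alpha is straightforward to compute from item responses; this sketch uses invented survey data (four respondents, four items), not the LARI sample:

```python
import statistics

# Invented responses: rows are respondents, columns are survey items.
responses = [
    [4, 5, 4, 4],
    [3, 4, 3, 3],
    [5, 5, 5, 4],
    [2, 3, 2, 2],
]

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_vars = sum(statistics.variance(col) for col in zip(*rows))
    total_var = statistics.variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - item_vars / total_var)
```

As the questioner notes below, a high alpha on its own is partly an artefact of having many items, which is why the small N matters more than the 0.9464 figure.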

Steve takes over. Institutional differences – plotting each factor for institutions with at least 2 respondents. There were sig diff for ability, and for overall readiness factors. Very mixed picture, but that may not be a bad thing. Helping to reflect on data, making it actionable for student success. Institutions can wildly differ, yet get to similar places in terms of effective changes.

Limitations – sample of convenience, only one non-US institution. May not be applicable elsewhere. Policy implication in international settings – looking for international partnerships.

What’s the future? Iterating the instrument. Done the factor analysis, it’s leaner, though still dense. Maybe split up by different people filling it in. Want to create tailored feedback and automate. Low/medium/high readiness, feedback about that. Prompts for exploration – how do you stay high, etc.  Want more activities, and international partners.

Want beta partners. Lead individual at American institution. 10-15 individuals in various roles representing the diversity of job roles associated with analytics. Honest & constructive feedback. Participate in follow-up study on use and usefulness of report. for contact.


Q: Applaud the instrument. In factor analysis, have <50? (yes) Caution you. 50 is considered poor on factor analysis. Want 250, 300. Solutions converge as you add more things. It needs a lot, a much larger sample to get the instrument. Do have high Cronbach alpha, but that’s because you have so many items. Really try to increase, make it more parsimonious, less than 10 questions per factor. N<50 is insufficient.

Matt: Yes. This was a starting place, even with small N is enough to talk about. We are at a beta realm. Looking for more. Comments well taken, will heed them.

John Whitmer: A dependent variable – indexed to the accomplishments of institution, or …

Steve: We’ve heard LA is great, but 2d in to the conference, it’s complex, lots of work to implement it at scale. It’s those 5 factors.

Matt: The measure of dependent variable is in those 5 factors. It’s a reflective tool. The constructs around implementation are broad enough, but institution applies them in their context, rather than an arbitrary context.

JW: Think this is great. I would love to have this indexed to institutions that are accomplishing a lot. An important weighting of significance.

Steve: Working towards that. Institutions like you … So get more, other colleges have found … Put examples forward for folks. Contextualise it for institutions.

Bodong: Who should fill this survey?

Kim: We don’t know yet. We say, should have at least 10 people, in 10 different areas take it. The instrument isn’t robust enough for us to say what we need. We have some guidelines on people we should approach, e.g. data governance, IT, education dept. We’re at the beginning of this, building scaffolds.

Competency Map: Visualizing Student Learning to Promote Student Success. Jeff Grann, Deborah Bushway (Short Paper)

Jeff speaking. Capella University. Deb had conflicting meeting, Chief Learning Officer, executive sponsor of the work.

Context: Capella is fully online, based in Minneapolis. For-profit, 35k students at a distance. One of the first online institutions to be regionally accredited. Recognised by the Dept of Ed for direct assessment, a way students can proceed without seat-time requirements, moving as fast as they can learn. So built around competency-based education.

Students – 75% female. 40y average age. Primarily graduate students, 25% undergrads. Mostly PT, gaining something in their career. Relevancy is a prime topic.

Career advancement – two populations. Employers, interested in filling positions with skills, talents, making sure they’ve had experience and want to see success. Universities, structured around academic programs, containing lots of courses, with activities on which students are graded. But how do these two relate to each other?

At Capella, middle layer – look at outcomes people’ll need in jobs in the future. Define the competencies you’d need, and criteria to measure those, based on faculty judgements (can they be reliable?). Then define that, that’s the hard work, straightforward to connect to the program offerings. So get assurance that we’re aligned to industry.

Competency-based education from the National Postsecondary Education Cooperative (2001). Four-layer model, with fixed traits at the bottom, demonstrations at the top. Assessment is working in the top bit of the pyramid. How do we measure those competencies?

Have a fully-embedded assessment model. All of these are aligned – as metadata – in course authoring environment. To make a course you have to do all of that mapping. Not had a big obvious effect on students.

Another is the Scoring Guide Tool, to gather data. This is the workhorse, to evaluate students’ work in courses. Criteria defined against competencies – nonperformance, basic, proficient, distinguished. Also feedback/comments. Autocalculates a suggested grade – can change it. Most find that very helpful. Can weight criteria too.
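The suggested-grade calculation described above might look something like this minimal sketch (the level scores, weights and grade cut-offs are invented for illustration, not Capella's actual scheme):

```python
# Hypothetical sketch of auto-calculating a suggested grade from weighted
# rubric criteria, as the Scoring Guide Tool is described. Level names match
# the talk; the numeric scores and letter cut-offs are assumptions.
LEVELS = {"nonperformance": 0, "basic": 1, "proficient": 2, "distinguished": 3}

def suggested_grade(judgements):
    """judgements: list of (level_name, weight) tuples for one assignment."""
    total_weight = sum(w for _, w in judgements)
    score = sum(LEVELS[level] * w for level, w in judgements) / total_weight
    # Map the 0-3 weighted average onto a letter band (illustrative cut-offs).
    if score >= 2.5:
        return "A"
    if score >= 1.5:
        return "B"
    if score >= 0.75:
        return "C"
    return "F"

print(suggested_grade([("distinguished", 2), ("proficient", 1), ("basic", 1)]))
```

An instructor could then accept the suggestion or override it, as the talk notes.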

That tool generates lots of data for learners, university. Many reports internally and to accreditors. Have a gap – haven’t shown the data to learners in aggregate way. Combine with assessment data. Over 1m judgements made, not a lot of reporting.

Committee-based idea: many data displayed, with drill-downs and indicators that are not clear. Design time focus on learner display. Not same display for all; customise. Conceptual map for learners. Web design. [Slides too small to read for my weak eyes.] Took it out to learners, focus groups. Many confused, misconceptions. So more simplification, display for just one course. Bar to indicate two things for each – status for how they're scoring, and how many they've been marked on, to see tracking over time. Hard to do conceptually, and to bring the data together.

So planned a pilot in Q1 2013. Several courses. Used data to produce an email, descriptively told them the same info, automatically sent when tutor finished the grading. Students very invested in demonstrating each competency to finish the course – can finish early. Flexible schedule. 100% of the emails were opened – that’s an amazing open rate.

A final round of design, launched for graduate business programs. So shows course level progress, and how many you've been marked on, how many assignments out of how many. Then circles, in colour to indicate status, and amount of colouring is about how many they've been marked on so far. Then also access to previous courses, print button. And tutorial, FAQs, and a link to the instructor. The circles pop up with particular details of judgements on each assignment, and which ones are coming up next.

Took to prospective learners – and they liked it on first impression. Typically, learner’s first reaction to competency-based education isn’t like this. Visual brings this home for students.

Usage stats – it’s voluntary. 12k learners accessed (out of ?35k); peaks at end and beginning of term. Saturdays are not good days.

Unprompted qual comments – pretty positive and what they'd like. Negative feedback is that they can't go back in time to see old ones (a data availability issue).

Not much effect on ability. Those who used it had slightly more distinguished, less non-performance. Good news is it wasn’t really harmful to students. (laughter).

Re-registration rates for summer courses – those who used it more likely to enrol the following term. Multiple regression analysis – competency map usage in to their predictive model – significant effect even after adjusting for powerful covariate.
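A minimal sketch of that kind of covariate-adjusted comparison – re-registration rates for users vs non-users within strata of a prior-achievement covariate. All data and field names here are invented for illustration; the actual analysis was a multiple regression:

```python
# Minimal sketch of checking whether competency-map usage predicts
# re-enrolment after adjusting for a covariate (here, a GPA band).
# The student rows are invented toy data.
from collections import defaultdict

students = [
    # (gpa_band, used_map, re_enrolled)
    ("high", True, True), ("high", True, True), ("high", False, True),
    ("high", False, False), ("low", True, True), ("low", True, False),
    ("low", False, False), ("low", False, False),
]

def rates_by_stratum(rows):
    """Re-enrolment rate per (gpa_band, used_map) cell."""
    counts = defaultdict(lambda: [0, 0])          # cell -> [re-enrolled, total]
    for band, used, re_enrolled in rows:
        cell = counts[(band, used)]
        cell[1] += 1
        cell[0] += int(re_enrolled)
    return {cell: enrolled / total for cell, (enrolled, total) in counts.items()}

rates = rates_by_stratum(students)
# Within each GPA band, compare map users against non-users:
for band in ("high", "low"):
    print(band, rates[(band, True)], rates[(band, False)])
```

If usage is associated with higher re-enrolment within every band, the effect survives the covariate, which is the shape of the result reported in the talk.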

Get a good high-level vision, but be flexible about implementing it.


Q: Comment. Worked as web designer. Great case – simple as it looks, but can tell it’s highly refined. First good cut product, they didn’t realise it, then refined based on comments. Blows my mind. It’s good stuff. Anyone, take a look at those slides. That dashboard is brilliant.

Thank you.

Q2: From ?Ashford University, same profile. Echo previous congratulations. Sometimes you present LA to students, there’s concern about student trust of the data, and intrusion on privacy. Do you only summarise at personal level, or show cohort data?

In our first draft, expected curiosity about comparing to class, normative comparison expected to be main use case. But students only cared about meeting faculty expectations. Great news, easier to build and easier ethically for small courses. Some have two learners in. Instructor has access to every learner’s map for their courses; academic adviser has access to their advisees.

Session 8B: Learning Analytics and Learning Design. Educational Data Sciences – Framing Emergent Practices for Analytics of Learning, Organizations, and Systems. Philip Piety, Daniel Hickey, MJ Bishop (Full Paper)

Daniel presenting. Thanks to MacArthur, Google, Indiana U for funding.

Really in the middle of a big paradigm shift. Notion of four or five distinct communities dealing with educational data. Value if they interact more. Some common features, and offer a framework for Educational Data Sciences.

What is a sociotechnical paradigm shift – more like Tapscott than Kuhn. Context, internal, emergent. First, digital tools create vast quantities of data. Second, qualitative shift from institution to individual, e.g. badges. Third, expansion of academic knowledge – Common Core, there are unmeasured standards. They're not skills, they're dispositions. Fourth, disruption in traditional evidentiary practice. Who controls data, who gets access. Happening across four communities.

Educational data. Finance data 1980s-2000. Manufacturing similarly; retail 90s to 2000s; healthcare from the 90s. Education late to the game. Lot of pressure to make stuff happen really fast, coming from the technical sector, Gates. Educational data is fundamentally different, needs to be handled differently.

[We are special snowflakes! No, really.]

Book – Phil Piety Assessing the Educational Data Movement. [Note to self: get hold of this.] Design science and learning science.

Landscape on two dimensions – level (age, basically) vs scale – individuals to systems.

Academic/institutional analytics – post secondary/organisations. Institutional analysis, early warning systems. Stodgy offices in every university cranking out data. Office of Institutional Research. Open Academic Analytics Initiative. Hathi Trust. Lots to learn from.

Systemic/instructional improvement – No Child Left Behind. Data-driven decision making. Should be called test-driven decision making. High stakes tests. K12 organisations/systems. A lot of people, every K12 institution has a data person. Some synergy there. Practice of data use is ahead of research. Only a handful of people in the room here.

Learning analytics/ed data mining. They are about as similar as one society – CSCL, J. Learning Sciences. HE/continuing, individual/cohort level. He wonders how distinct these communities are.

Learner analytics/personalisation, all by the individual. Driven by Gates, Dept of Educ, Khan Academy.

Lots of work in lots of places. Early warning system, brings together several people.

There are common features across these four communities. Rapid change. Boundary issues. Disruption in evidentiary practices. Visualisation, interpretation, culture – dashboards, APIs. Ethics, privacy, governance

Four factors that make all educ data unique: Human/social creation; measurement imprecision (reliability issues); comparability challenges (validity=wicked problems); fragmentation, systems can't talk to each other. Storm likely this Fall when every State tries to use short-term gains on high-stakes tests to assess teachers, will see recapitulation of what happened in the 90s, seeing rapid moves of teachers in rank. The existence of Sharepoint proves that systems don't talk to each other.

Common principles to unite. Nod to Roy Pea. It's interdisciplinary, draws on six areas, but all of those draw from computer science, which shapes all four of these communities.

“Computer science is the root of all [pause] Evil!” – audience jumping in.

Learning occurs over different timescales, helps resolve appropriate purposes.

Digital fluidity – same artefacts serve different purposes. Historically, institutions said need some data, use it for one purpose. But people are using them, the data, and figuring something else. Marist using cellphone records (!!) – we know how much time students are spending in the Library; if >5h we know they'll do better.

Any artifacts are adopted and adapted. Take NCLB. We know now, it originated in the Clinton White House, a Democrat initiative. Bush saw a huge opportunity to break confidence in public schools by setting impossibly high standards, and it worked.

The data is a flashlight, shine light. Really, it’s a lens, and an imperfect one. Helps to get the right lens.

Summer of Learning, from badges work. Data from Chicago, huge insights to be had. Scaling up to 10 cities. Let’s think more broadly.


How many think it’d be nice if these met every two years?

(Almost nobody.)

Alyssa: I don’t know that I agree with the way you’ve categorised them. Compared to ICLS and CSCL. There’s a difference in what people are topically interested in. Differences we see in EDM and LA are in the approaches, ways they’re looking at problems. Comparable to educational technology, and learning sciences. Interested in similar things but different tools. Overlap, but conversations are not the same. I’d be hesitant to go to a 2y model, and lots of changes, so that’s too long.

I did blow through that part.

Art: One downside in trying to combine and grow bigger is conferences get so big you can’t find people you want to interact with. Once upon a time, a sociologist studied how big a community can be before you have a society. Estimate was 200 active participants before it works. (stops working?). ITS alternates with AIED. But each year there’s an event. Always ask, how are they different, have to scratch your head to find a reason.

More argument is about finding ways to find commonalities, synergies.

Stephanie: The LA world, other people are here who don’t come from CS – unless claims physics, biology, etc. Lots of really interesting people coming out of the disciplines, especially ones who’ve dealt with data for a long time.

If I've made people mad, Phil wrote the first draft of the paper.

Phil Winne: Data often unreliable, in a big way. But all data is like that.

Does that truly characterise it all? In finance, there’s not much slop there. Educational data, yeah, it’s about the messiest data there is. No, that’s not true.

Phil: If all educ data has reliability issues, implication of working with EDM, or LA?

Should be careful. Nobody is saying let’s stop analysing that data. The train isn’t going to stop. We need to be careful, thoughtful.

Q: Data are not to blame, it’s what we expect from it. We imbue too much meaning in to the data. Asked for data, but they don’t want data, they want an answer to the question. Big steps between them.

Q2: Data so dirty, teachers are said to be teaching stuff they’re not, students in classes they’re not. Right now it’s so poorly collected.

Hendrik: From data competition. It’s not just panel discussion that shows the difference. EDM focused on model that predicts how successful they’ll be. Variety of topics, the topics increased a lot, divergence in the domains. LAK is more a hub to reach to educational people, EDM is more tech, data-mining driven. They can both exist.

I feel safe in arguing EDM and LA have more in common than there is with the other communities.

H: Yes, I agree.

Designing Pedagogical Interventions to Support Student Use of Learning Analytics. Alyssa Wise (Full Paper)

Same issue: how do we make information actionable, working with students.

From Simon Fraser University.

What do you picture when I say ‘learning analytics’? Some images – data; some about scripts and calculation; some visualisations. I think about the teacher, late at night (photo of a bearded guy with beanie on his head). What is going to affect things next day. And students, figuring out what it means, how are they doing, what changes do they need to make.

If LA is to truly make a difference, have to design for impact on larger activity patterns of teachers and students. Links to Nancy's talk on disruption, and what we're disrupting. We could disrupt it for the positive.

If we don’t, many technologies never made a big impact. Pile of junk in the corner. [Nice, clean graphical slides - whole-screen photo with no overlay.]

Focus on day-to-day use of learning analytics. Because – best thing is to put data in the hands of students (shoutout to my talk yesterday). They are the most enabled to make immediate, local changes. Chance to activate metacognition. Empowerment not enslavement. Brings up democratisation of access to data. Cost of doing interpretation, student with data is only chance of one-to-one analysis.

Challenges and opportunities. Need to know the purpose, the learning design, to understand it. But students don't necessarily understand that – and that's critical for interpretation. But sharing why we're doing what we're doing will be good. More likely to engage in the ways we're hoping. Not just why engage, but what does good engagement look like. Also transparency, rigidity of interpretation. Danger to optimise to what we measure.

Moving from learning analytics, to learning analytics interventions. Not how to calculate and produce the visualisations, but how we support people using them. We won’t understand if what we’re producing is good until people use it.

Many locally-contextualised questions: When to consult; Why; What do they mean, and what should they do; How does it all fit together with everything else.

The goal is an initial model for designing pedagogical interventions to support student use of LA. Two foundations, 3 processes, 4 principles.

Focus on Learning analytics, and learning activities – as an integrated unit. Not just reflecting afterwards. Integration. Use of analytics should be part of the design. Provides local context for sense-making.

2 conceptual questions – what metrics, and what do productive/unproductive patterns look like. Better to look ahead with this, rather than afterwards look at what data you have.

Practical questions – link learning goals, actions, analytics and make that clear.

Grounding – making this happen in a classroom. Example from e-Listening Research Project. Tracking data in online forum. What they do when they post, and how they attend to peers’ posts. Take that data and ground it in what’s important about discussions.

Set out clear guidelines and discussion about the purpose of online discussion. And about instructor's expectations for productive process of engaging online. And then how the analytics provide indicators of those processes. So exposed to ideas of others; attend to range of others' ideas; percent of posts read. Takes junk scan data out (drops data where students just open a post and move on).

Guidelines for students. One is broad listening, show them analytics. Forum interface is interesting. Also % posts read; better to view a portion in depth than everything superficially.
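The percent-of-posts-read metric with junk-scan filtering could be computed along these lines (the dwell-time threshold, log format and function names are assumptions, not the e-Listening project's actual code):

```python
# Hypothetical computation of a "percent of posts read" listening metric
# that discards junk scans (opening a post and moving straight on).
MIN_SECONDS = 5  # assumed threshold: shorter views count as scans, not reads

def percent_read(view_log, total_posts):
    """view_log: list of (post_id, seconds_open); counts distinct real reads."""
    read = {post for post, secs in view_log if secs >= MIN_SECONDS}
    return 100.0 * len(read) / total_posts

# Toy log: p1 read twice, p2 scanned then read, p3 only scanned.
log = [("p1", 40), ("p2", 2), ("p2", 12), ("p3", 1), ("p1", 8)]
print(percent_read(log, 10))
```

Counting distinct posts (rather than raw view events) keeps re-reads from inflating the breadth measure.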

Agency. The creepy test. Are we being honest? Would I feel creepy if someone did that to me? Working with them, not done to them. To avoid that, work with student agency. Establish personal goals, maybe microgoals or not adopt big ones. Have some authority in interpretation, provide human context, and decide actions.

Goal setting/reflection is part of analytics process too. Want individual goal-setting, no one way of success. Guidelines are nuanced, starting point for thinking what to do. Goal setting as explicit part of the activity. Online reflection journal in the LMS.

Then reflection – data informed. Have reference points to think what the data means. Personal goal, class one. Structured in. If the analytics are always there, consult too much, or ignore. So timeline is important. Weekly here. Assess goals themselves too.

Two other points – dialogue, and reference frame. The reference frame is what does it mean wrt theory, peers (easy and available, but not always the best), and vs individual goals. Dialogue – big problem for scalability, but important. Space for negotiation around interpretation. Takes us time to figure it out.

[battery ran out on laptop, lost slides]

Dialogue powerful, the interpretation isn’t unproblematic. Can have powerful dialogue. One teacher to many students isn’t scalable, but can do student:student – and that helps the agency too. How do we bring it in as part of the learning process.

[battery backup, back on very quickly]

Can get dialogic comments that illuminate the data – e.g. had to renew visa so was out of town, hence low metrics.

Embed it as part of learning practices for students. Think about it now, not making good tools and applying them later. It'll help us build better tools.


Simon: This should become a reference point for work going forward. Clearly priority on real time feedback to the students, to monitor themselves. Ethical issues, students feeling exposed by the visualisation? Or not providing ones you feel run the risk of that?

That’s implementation-specific. With the combination of the agency and all that, students didn’t have issues. But it wasn’t all the time, they got it on a weekly basis. First day of a month-long forum isn’t indicative. Work on the time window gives different results. Worried about real time, so had it go, here, after a time period. Didn’t have issues of exposure but that may be elsewhere.

Phil Winne: Bring analytics in to student practice should be at the centre, helps them improve. Think about what data can do, people have biased memories, they remember recent, not middle, etc. Psychology of memory, might shape types of data. When they have traces of things they’ve done, activities. (I mangled this, it was clearer when he said it!)

In Phil’s project, they review data, and set goals. Also project at Michigan, students look at activities and assess how they are likely to do, take action. Think before, and after, and become active processors.

Q: More about the visualisations and differences in interpretations. You gave an example, need for dialogue. What lines of thinking have come up?

You can have lots of different ones. The picture here is the interface. But we ask students to think about this in two perspectives. Available all the time. Think how are they doing. Their posts are lighter colour, can see if they are in one place or all over. Also take responsibility for the discussion as a whole, not just teacher.

Q: Different comparisons, peer comparisons used because easiest. Speaking to what students are seeing, what metrics.

The students who weren’t doing as well, some already know, but all felt badly and didn’t know how to change it. Understand how they’re not doing as well.

Q: Comparing to their own performance.

Yes. Everyone can’t be above average.

Daniel: Continue debate with Phil. Alyssa is a graduate of program I’m the chair of. I’m in to communally-regulated learning. I want them to think, talk about the visualisations. How much discourse about the learning is taking place?

One of the principles are dialogue. They’re both important. Don’t want talk without thinking. Need to think about individual, group and link. Helps see data in different ways. We’re asking students to have more and more, have dialogue, and dialogue about the dialogue. But it’s not either/or.

A Cognitive Processing Framework for Learning Analytics. Andrew Gibson, Kirsty Kitto, Jill Willis (Short Paper)

Kirsty talking. Emphasises the nature of the co-authorship: Andrew is a PhD student, comp sci. My background is physics. Jill is in the Faculty of Ed. Trying to go to the educational practitioners and find out what they want to know.

New to LA. A lot of dilemmas – aim to understand learners, but focuses on students poised to fail. Metrics to judge learning, but are they good proxies for learning? Organisational analytics easy (!) to implement, but not aimed at individual learner. Learner focussed ‘the holy grail’ – but hard/disruptive to implement.

For me, looking at LA as quality information that’s going to improve learning outcomes. Lots of educational theories about that. How can we improve our understanding of learning? Enjoying these presentations.

Trying to start from a dumb but established theory – Bloom's taxonomy – and use that to inform data capture. Cognitive OPeration framework for Analytics (COPA). At many levels – outcomes, assessment items, learner completion of assessment tasks. Can we do this in a unified way?

Bloom’s Revised Taxonomy – set of verbs – Create, Evaluate, Analyse, Apply, Understand, Remember. And subcategory verbs. Lots of verbs!

New API, xAPI, has Subject Verb Object statements – so could have a direct map to Bloom with a suitable ontology.
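A sketch of such a mapping, assuming a toy verb ontology and a simplified statement shape (the verb-to-level table is an invented fragment, and real xAPI verbs carry IRIs and full language maps):

```python
# Sketch: mapping xAPI (actor-verb-object) statements onto Bloom's revised
# taxonomy via a verb ontology. The verb table is a tiny invented fragment,
# not an established mapping; the statement shape is simplified.
BLOOM = {
    "defined": "Remember", "summarised": "Understand", "applied": "Apply",
    "compared": "Analyse", "critiqued": "Evaluate", "designed": "Create",
}

def bloom_level(statement):
    """statement: dict shaped like an xAPI statement; returns Bloom level or None."""
    verb = statement["verb"]["display"]["en"]
    return BLOOM.get(verb)

stmt = {"actor": "student42",
        "verb": {"display": {"en": "critiqued"}},
        "object": "essay-draft-1"}
print(bloom_level(stmt))
```

With a fuller ontology, each logged statement could be tagged with a Bloom level at capture time, giving the unified view across outcomes, assessment items and learner activity that COPA is after.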

Looking at one course, new. Australian sector changing. Australian Qualifications Framework (AQF) – all new courses have to be compliant. Set of learner competencies, guaranteed at each level of learning. Each unit has Course Learning Outcomes (CLOs), aggregated CLOs meet specific course outcomes at that level. So massive curriculum redesign effort to comply with this by 2015. Opportunity here – if get data capture within it, can go a long way.

This is a unit, compulsory science unit on science in context. Holistic introduction, where science is in social context. But for some reason that equates with learning skills for most academics – about writing, giving talks. So tensions. Had a lot of arguments about teaching skills or ideas.

The CLOs, huge documentation about where it fits in to the AQF structure. Short bits of text – which include verbs. So counted the verbs – e.g. how many times ‘understanding’ occurs. Map to Bloom. The CLOs are meant to map to the assessment tasks too. So can do the same thing there too. This unit has 3 major assessment tasks, aimed at different CLOs.

Once you do that, we find there’s a disconnect between course and the assessment – assessment has 20%, but course 4% for Creating. Evaluating is 33% for course, but 11% for assessment. Not even consistent in the documentation! Problem, at least from the QA level. We need to get more coherent, so that’s useful.
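The verb-counting analysis could be sketched like this (the verb table and sample texts are invented; the percentages produced are illustrative, not the unit's actual figures):

```python
# Sketch of the verb-counting analysis: tally Bloom-level verbs in the CLO
# text vs the assessment-task text and compare the percentage profiles.
from collections import Counter

VERB_LEVEL = {"remember": "Remember", "describe": "Understand",
              "apply": "Apply", "analyse": "Analyse",
              "evaluate": "Evaluate", "create": "Create"}

def level_profile(text):
    """Percentage of taxonomy verbs in `text` falling at each Bloom level."""
    hits = Counter(VERB_LEVEL[w] for w in text.lower().split() if w in VERB_LEVEL)
    total = sum(hits.values())
    return {level: 100.0 * n / total for level, n in hits.items()}

clo_text = "evaluate evaluate create describe analyse evaluate"
assessment_text = "describe describe apply evaluate create create"
print(level_profile(clo_text))
print(level_profile(assessment_text))
```

Comparing the two profiles surfaces exactly the kind of course/assessment disconnect reported in the talk, without any manual reading.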

Nice thing here. This was small scale, not automated. Bloom is well respected educationally. Data easy to understand and communicate. It’s the kind of analytics that educational professionals (are likely to) want. Can map consistently across domains – CLOs, assessment, individual learners, both formatively and summatively. Metacognitive aid.

Future plans – our documentation is coming out of our eyeballs. When we mark, have descriptors of what they’ve done, mapping that to the learner is possible. Could do it automatically for all units. Other cognitive models? This is just tagging, but could do e.g. distributional semantics to extract theme verbs rather than straight tagging. Really want to give the data back to the learner.


Q: Had conversations at NC State who wish for this, using Bloom. Lots of educators do want to use it, for their objectives.

Really surprised people hadn’t done it. Thought wow, this is really dumb.

Q: Straightforward, but actionable.

Hendrik: European qualification framework, have that already 5 years. Mandatory all courses should apply it. Also PhD thesis applying LA to map back to the EQF things, created a dashboard to show teachers. Recommend it. Did not apply Bloom.

Sounds useful.

Simon: Obvious next step, interesting to know what the course designer makes of it.

I can tell you, I’m the course designer for this unit, which is why I had access. It meant, last year, we were teaching a ‘workshop on writing’ – students hated that. So I could redesign it, very much in development. Now have more correlation. We only just started writing the unit this week (?). I know what I was thinking, did I put that in to documentation. I can do that! Haven’t done it yet.

Simon: A next step would be to try it with other people. Worry with any approach, it’s not a proxy for what it looks like. Take it to an educator who says, I’m just using different language. Simple is good. Seems promising approach.

The AQF has simple documentation. They are using those verbs, because they have to in the way the reporting is set up. At the assessment mapping, that’s more plausible. Would like to look at more distributional semantics approach there.

This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

LAK14 Thu am (7): Keynote, MOOCs, at-risk students

Dr Doug Clow's blog - Thu, 27/03/2014 - 15:59

Liveblog from the second full day of LAK14 – Thursday morning session.

Stephanie welcomes everyone to the second day. Introduces Mike Sharkey, founder of Blue Canary, a sponsor. Former director of academic analytics at the University of Phoenix. At LAK12 he and his partner talked about founding a company, and now has Blue Canary.

Mike was talking to George at another LAK: there's focus on the research and theory side, but what about the application and implementation side? Was very focused on the implementation side, saw a gap in people that had the ability and resources to implement the exciting ideas. Started a software-focused company a year ago, going to institutions to help them help their students out. Happy to be a sponsor here, first major sponsorship from their company.

Here to introduce Nancy Law.

Keynote: Nancy Law: Is Learning Analytics a Disruptive Innovation?

Thanks organisers for the invitation. Unsure whether she wants the keynote to be disruptive or not. So what's the answer? The problem is, I don't have an answer. If it is to be disruptive, not clear whether it's good or not. Only have questions, not answers.

What's disruptive? From Clayton Christensen's book (Innovator's Dilemma) – two kinds of innovation: sustaining and disruptive. Sustaining are iterative development. Disruptive are those that may not look like an improvement. They tend to serve other purposes. They may lead to some drastic overthrows of existing things. Good examples: digital cameras. Asks who used a digital camera 10y ago; they weren't so good. [I think she's 10y off here - I have a 10 year-old digital camera in my bag right now, because it's better than my smartphone camera. *20y* ago they were rubbish.] Who uses film-based cameras? None. Those of you who take photos: of your last 200, how many were taken using a dedicated camera? Not many. Most use phone cameras. Who invented the first digital camera? Kodak, but they're nowhere now. Should have had first-mover advantage. Why are they not there now? They are real camera people, thought the digital camera was too poor.

Two senses of disruptive: disruptive of beliefs, routines, practices, relationships; innovations at the early stages are often clumsy and not as refined as established approaches. Disruptive innovations are potentially transformative.

Whole concept of taking images – why use phone? Because can share it right away. Now not just for ourselves, but to share. Change fundamentally how we do things.

Is LA a disruptive innovation? What's it competing with? Who are the target clientele – learners, teachers, leaders, learning scientists? How have they been involved? What are the conditions for LA to be transformative?

Key challenge to disruptive innovations – these are foreign species. Several possibilities. Endangered species, or invasive species. What’s the likelihood for a foreign species to be invasive? Quite low. But they can be disastrous.

Skeptics' views – technology cannot transform formal school education. Oversold & Underused, Larry Cuban. Rethinking Education in the Age of Technology, Allan Collins and Richard Halverson. How far have learning technologies taken root in US schools? Also looking ahead. They were saying technology. The number of gadgets we have around ourselves, and how schools are using them – they haven't taken much root. Many children have smartphones, tablets, laptops. But usually the teachers say no, you cannot use them in my class.

Innovations – “Adding wings to caterpillars does not create butterflies” – Stephanie Marshall (1996), you need transformation.

Systematic model of change: Initiate, innovate, implement, institutionalise, scale up by adoption of refined model. No longer works! Example of the Nuffield physics approach – a whole package, slowly implemented. Worked well then. Now things are changing too fast, especially in TEL. Can't do any prototyping.

Simple analogy – ecological model. Educational systems are complex. Gardening analogy. She’s an urban gardener. The crop plants we depend on do not survive well with climate change and increased pollution. Find a new variety of the plant, is much more hardy in hotter temperatures, harsher conditions. Unfortunately, the flowers bloom very early, when most plants have not sprouted theirs. Blooming early, butterflies that used to pollinate that plant are not around that early. So even if we plant a whole plot with the new seeds, won’t have many fruits. Worse than the older variety. So can’t just look for what is the best solution, it’s only good (in context). Have to change the environment, change the other parts.

How can ICT-supported transformation become an education epidemic (David Hargreaves, 2003, Education Epidemic book)? Why so difficult to cause transformation? Since the 90s, so many education reform movements. Are they successful? Hong Kong started major curriculum reform in 2000. If you ask people what they think about the reform, will get mixed feedback. Why haven't they spread like an epidemic? Granted, we might not like an epidemic. But nobody can forget SARS in 2003. And in fact SARS is a good case in point – where is it now? We have it under close surveillance. Because it was so invasive, we tried to keep it down. Normal flu is not as threatening, so is more successful.

Back to digital cameras. Disruptive innovation. Why has it been successful? As a gadget, it was no competition to the traditional cameras. Why successful? (Several audience answers.) My sense is, the first people who were real users were not people like me who just want to take a good photo, they're people like journalists (?!) – they don't need high resolution images (?!), they just want it quick. Resolution doesn't matter too much in print or on TV. Now journalists have to work as a single person: being reporter, writing up, transmitting it back to the HQ. It's actually serving a different purpose. Not just by itself. It was successful because there was a need for low resolution pictures that can be transmitted very quickly. If the invention had not come at a time when the Internet was around – access to the Internet becoming popular, social media – it wouldn't have prospered.

Who wants to carry a camera with them all the time? But we have the phone, so there is no need for a camera. Now when there’s a ‘no photography’ sign – a camera image with a cross – I think: how can you prevent people taking photos? I may just be talking on the phone. The success of the digital camera is in the fact that it basically disappears.

Appropriation by different technologies/appliances. Integrated/morphed into other devices. In empowering, connecting and democratizing. I hate the complicated cameras, large, you have to know the stop number. But now even young children can take good photos. It’s digital so you can easily send, it’s connecting people, and democratising. We can do our own reporting.

The personal computer. Was it a disruptive innovation? We don’t think of our cellphone as a computer, but it’s more powerful than computers we had a few years ago. It has been successful because it’s empowering, connecting, democratising.

Is LA sustaining or disruptive? Will it achieve transformative, sustainable impacts at scale? Look at TEL for ideas.

International comparative project: IEA (International Association for the Evaluation of Educational Achievement), Second Information Technology in Education Study: Module 2 (SITES: M2), 2000–2011. Studying innovative pedagogical practices using technology. Trying to think about how we assess students’ ability to use technology, but we don’t have the assessment technology to be confident about what we’re trying to measure. Asked about innovations: have they been sustained, spread? For all the cases from Asia, sustainability and scalability were much lower; the European cases were much more sustainable and scalable. Cross-country comparison HK vs Finland. Connectedness was important.

Another study, Pedagogy and ICT Use. Overall condition, not much change. Even when using the technology, often for traditional means. Book, Educational Innovations Beyond Technology, Nancy Law et al. Looked for principles for sustainable innovation. Five principles:

  • Diversity
  • Connectivity
  • Interdependence
  • Self-organisation – mechanisms
  • Emergence

Much more grassroots innovation.

Another example, ICT-enabled innovation for learning in Europe and Asia. Problem: what are the conditions for ICT-enabled innovation for learning to be sustained and scaled.  Three European cases, four Asian. eTwinning case – teacher network in Europe, encourage higher awareness about other cultures and languages. Not pedagogically motivated, but secure platform to connect every school, classroom in Europe.

The more successful cases have multiple pathways to innovate and scale, but a clear vision. Ecological diversity fosters scalability. Some dimensions are interdependent and mutually constraining, and have to be accommodated. Evolutionary, and scale is dynamic. Leadership for strategic alignment. Multilevel, system-wide, strategic partnership.

Will LA achieve transformative impact at scale, sustainably?

Design-based research working with teachers. Case study, teacher in secondary school. CSCL. Using analytics, simple measures.

Diana Laurillard project on learning design. Shared learning design by 5 teachers. What constitutes evidence of learning outcomes?

Learning analytics, the way we’re doing it is connected to our own learning design. They’re not portable at all. [Eh? I think some are.] Using the camera example, make them more appropriatable, integrated, empowering.

LA should be more closely aligned, to become a part of the ecology – to make learning technology an invasive species, make it really spread.

I’m not sure this is the sort of effect we want learning analytics to have on educational systems.

MOOCs, cMOOCs and xMOOCs. Take a complex system model of learning.

Change at level of brain, individual, institution, group/community, education, society level. Change is learning. Pace slower (ms to hours to centuries), ways to study different (neuroscience to organisational learning to anthropology [?]).


Charles/Giles?: Push to teach teachers to code, programming – does that provide some connectivity?

Do you mean that will help them to understand analytics more.

C/G: So they’ll be able to teach their students to code. Could there be a virtuous cycle? The British Government wants to train 16,000 teachers; they’ve brought basic programming into their curriculum as a requirement. has 35k teachers to push into learning to code.

I’m not sure it’ll be empowering.

Hendrik: Elaborate why we need quality indicators for analytics, to make it more comparable, for sustaining innovation? Quality indicators, on the different levels. What kind of indicators do you have in mind, from your studies?

The question I have is not so much which specific LA indicators are the most important. It’s that the teachers – K12 teachers in Hong Kong – most of them are responsible, but they tend to think about activity design. That’s not learning design, which is different. When you ask them what they want the students to learn, they don’t have a very specific cognitive or metacognitive outcome. When you ask what they want students to achieve, that’s something new already. So then if you say, I want them to communicate, to collaborate: what counts as indicators of learning outcomes for those? That needs to be developed. The challenge for us in LA as a community, for it to make impact – the panel on sustainability raised this – is that there’s a lack of language, of conceptual understanding, about what counts as assessment, outcomes, indicators. We need conversations about that. Are there ways LA can have a few simpler analytics that people can talk around? SNA is more a tool for researchers – is it meaningful for teachers?

5B. MOOCs (Grand 3) Visualizing patterns of student engagement and performance in MOOCs. Carleton Coffrin, Linda Corrin, Paula de Barba, Gregor Kennedy (Full Paper)

Linda talking, from Centre for Study of HE at Melbourne. Carleton is the computer scientist who did the visualisations. Paula is a PhD student studying ed psych and did all the stats. Gregor is the Director.

MOOC research – to develop a greater understanding of students’ online learning experiences, processes, and outcomes. Context: lots of big reports from initial adopters of MOOCs – Edinburgh, Stanford. Looking at what we can do with LA and MOOCs, and what the scale of the data can tell us.

Very exploratory research. Having a look to see – they didn’t have defined RQs. Looking for patterns and trends. But two main purposes: 1. More refined LA techniques for MOOC data. 2. Visualisations that are meaningful to end users.

In 2012, the University of Melbourne signed up with Coursera. In 2013, delivered 7 MOOCs. This looks at the first two – partly for convenience, because the data were available, but they’re also significant.

First one – Discrete Optimisation. CS. Solving problems in more efficient ways. Other – Principles of Macroeconomics, introductory style.

Why these two? They differ in structure. Prerequisite knowledge – you can’t mandate it – but Macro had few prerequisites, while Disc Opt required a strong comp sci background to participate fully. 8w/9w long. Curriculum design different too: Macro is very linear; Disc Opt released all materials and assessment at the beginning – up to the student to decide order and pace, with marks only recorded at the end. Macro – 8 quizzes, 3 summative, peer review essay. Disc Opt – prelim assignment, 5 primary, 1 extra-credit. Disc Opt also had unlimited attempts at the assessment – some students, being CS types, wrote code to change and resubmit their assessment. One student had 8,000 tries! That’s surely worth marks in itself. Also a leaderboard.

Completion rates – 4.2%, 3.5%, from 50k ish started down to 1,000 ish completed. Macro is xMOOC, Disc Opt almost a cMOOC [I don't think so!]

Data analysis approach, used by Kennedy and Judd (2004) – audit trail data, start high, use those observations to inform iterative analysis, refining. Look at clustering in to subpopulations.

RQs: How can we use analytics to classify student types? [And another one]

Participation fell off over time, performance is a steeply-falling-off graph – most 0, small numbers got lots. More students view videos than complete assessments. Constant decline of participation. Of students attempting assignment, most get very few marks. Despite difference in curriculum structures, they looked similar for both courses – for both participation by time, and for distribution of marks.

Enrollment/completion stuff – about 4%. Lots of people didn’t turn up in the first place, or didn’t participate, or didn’t do the assessment. So picked active participants – not just watching videos but submitting assignments. Ups completion rate to 18% for Macro, 13% for Disc Opt.

Student performance diagram: as a bar chart, the size of the bars makes a big difference, so they changed to a cumulative distribution instead. Saw a marked decline after two weeks. Hypothesis: marks in the first 2 weeks are a good predictor of final grade. Linear regression – Disc Opt R² 52.7%, p < 0.001; Macro only R² 20.6%. But Macro only has its first summative assessment in week 3; if you include the 3rd week for Macro, R² goes up to 51.5%.
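The regression step above can be sketched in a few lines – a minimal pure-Python illustration with invented numbers, not the paper’s actual data or code:

```python
# Minimal sketch of regressing final grade on early marks and reporting R^2,
# as in the predictor analysis above. All data here are invented.
def linear_r2(x, y):
    """Fit y = a + b*x by least squares and return the R^2 of the fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

early_marks = [10, 20, 30, 40, 50, 60]   # hypothetical marks, first two weeks
final_grade = [15, 22, 38, 41, 55, 58]   # hypothetical final course grades
print(round(linear_r2(early_marks, final_grade), 3))
```

The reported 52.7% and 20.6% are just this R² statistic computed over their real course data.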

Students most likely to succeed were those who performed well in the first two assessments – let’s look at those above the 60th percentile. Called that the qualified group: if you did that well, you probably already knew some of this stuff. Add in qualified-group completions: now 42.1% and 27.4%. They’re the group getting through better than the rest, vs auditing, or active but without the prerequisites.

Subgroups – auditors, active, qualified. Look at participation by subgroups. Relative proportion wise, qualified group is consistent within and between the MOOCs; auditors came and went. The active students also declined significantly.

So? Student activity and success in first couple of weeks signif assoc w outcomes. Prior knowledge. Value of informal diagnostic tests at the start. Could be used to support adaptive work.

Do they take advantage of the open model in Disc Opt course?

State Transition Diagrams. Entry point, circle area is number of students in a state, line thickness is number of transitions.

Student video views diagrams – really cool. [Note to self: see if you can draw these easily in R, they're awesome.] Qualified students do a lot more moving around than the non-qualified ones. Diagrams have thicker lines, and more lines, when you compare qualified vs non-qualified ones. And the Disc Opt qualified group are the thickest, most dynamic.
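A rough sketch of the counting behind such diagrams (not the authors’ code; the student logs are invented): each state’s circle area comes from an occupancy count, and each line’s thickness from a transition count.

```python
# Count state occupancy (circle area) and state-to-state transitions
# (line thickness) from per-student ordered event logs. Data invented.
from collections import Counter

def transition_counts(sequences):
    """Return (occupancy Counter, transition Counter) over ordered state lists."""
    occupancy, transitions = Counter(), Counter()
    for seq in sequences:
        occupancy.update(seq)
        transitions.update(zip(seq, seq[1:]))  # consecutive state pairs
    return occupancy, transitions

# Each list is one (hypothetical) student's ordered sequence of video views
logs = [
    ["v1", "v2", "v3"],        # linear viewer
    ["v1", "v3", "v1", "v2"],  # moves back and forth, "qualified"-style
]
occupancy, transitions = transition_counts(logs)
print(occupancy["v1"], transitions[("v1", "v2")])
```

Feeding these counts to a graph-layout tool would give diagrams like the ones shown: more and thicker lines for students who move around more.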

Student assessment – for Macro, non-qualified students skip the formative ones, but the qualified groups don’t. In Disc Opt, lots more going back among the qualified groups.

Repeated viewing could indicate it needs more clarification, support. Or that the sequencing is wrong.

Future plans: Test subgroup classifications across a range of MOOCs. More detailed analysis of transitions students make from active to auditors. Link state transition diagrams with performance data – patterns that indicate success. And look at motivation.


Stephanie: Completion – received certificate? People who failed – took every test but not over the threshold to earn the certificate? Maybe more in the macro course? Less failure for those active in assessment.

Can’t answer the second, don’t know the difference. In terms of overall completion, only count students who passed, who were eligible for cert.

Q: So what question. At NorthWestern Univ, evaluating MOOCs. Are we trying to improve MOOCs? Tease out learning for other contexts?

Trying to improve MOOCs. What in Disc Opt running again? Used this data to make tweaks, talk to students about the structure. Are trying to improve. But identifying likely successes early, give instructors chance to make changes while course is still live. Target students with interventions.

Q: MOOCs important part of the future?

Yes, at Melbourne. But only small part of our online learning, so hope to inform other initiatives.

Bodong: Enjoyed the visualisation. Troubled by classification, especially qualified learners. Maybe the dropouts have even more advanced knowledge?

At a basic level, only data from MOOCs. Stanford research looked at bringing in data from surveys, will look at in the future. Need more research in validity, relation to motivation. This is basic at the moment. Need to really test the quality.

Small to Big Before Massive: Scaling up Participatory Learning Analytics. Daniel Hickey, Tara Kelly, Xinyi Shen (Short Paper)

Daniel from Indiana, Center for Research on Learning and Technology.

Was a full paper but accepted as short. Grant from Google to support massive participation. How they scaled up to a BOOC – Big Open Online Course. Cut from this talk: how to automate the features to teach it again in the summer. Tara is a doctoral student, Xinyi an intern.

2013 year of the anti-xMOOC. Disaster in Georgia. Said, that’s not going to happen here. Aim to have them mention BOOC. Now, DOCC, Distributed Open Collaborative Course, cMOOCs, mentors, crowdsourcing. Project based courses are very intensive. People who have all the expertise aren’t going to do it. Not going to work when there’s a large body of disciplinary knowledge, a textbook we want people to know.

This is fostering large engagement when there’s a large body. We don’t want them to reconstruct validity, we want them to enact.

Using participatory assessment design research. Five general principles – Let contexts give meaning to knowledge tools, reward disciplinary engagement, and others.

Thirteen features. 3,000 lines of code on top of 5,000 LOC in Google CourseBuilder.

Situated theorist. Psychometrician. Really about contextualising disciplinary knowledge. When you register, have to say your discipline and role. 460 registrants, put them in to 17 networking groups. Also required to define a personal learning context, a curricular aim. Telegraphs that learning here is different, contextualised. Scared off some who weren’t serious about taking the course. Also part of the first assignment. Each week, assignment, first thing is restate their aim.

Graph of number of words changed in context and aim across units – but jump at one point when they come across concepts that matter to them.

Emergent groups, made it possible to – people added to their name to project their identity into this space.

Main feature – public course artifact, ‘Wikifolio’. Everything is public. All discussion not in forums but in wiki. Refined version of Sakai wiki. It gets a lot of writing – about 1500 words per week, dense, contextualised disciplinary stuff. People enrolled for credit are writing most.

Another feature – rank relative relevance, of the concepts you want them to learn. Evaluating instruction, more relevant to use classroom assessments, or evidence from accountability tests. Most states are going to use high-stakes tests to evaluate teachers in the Fall. Good way of engaging. One weak student example, still got it.

Feature 6 – access personalised content. External resources, rank them for relevance. Now you know a bit, why not find some external resources and share them with classmates.

All these features refined in an intimate context, and then automated to run with a big group.

Promote disciplinary engagement – Randi Engle’s idea. So feature 7 – peer commenting and instruction. Ask to post a question each week, and comment. Got about 3 comments each, with reasonable number of words each. 1/3 students had no interaction (2 universities in Saudi Arabia). 2/3 engaging. About 25% didn’t comment each week. Hand-coding of comments and how related. But all interaction was quite disciplinary.

I hate peer assessment. It never works. Peer endorsement – click here if it’s complete, that works. Peer promotion – can promote one and only one as exemplary. Nobody responded to one student’s question, but two people promoted him. Can see who promoted someone, and why.

160 completed the first assignment, 460 to 160 to 60. Proportion of people who promoted each week, was quite high. See exactly the opposite of MOOC pattern of abandoning the forums.

Feature 10 – public individualised feedback. Early posters tend to be the best, give lots of comments on e.g. Wednesday, ahead of Sunday deadline.

Feature 11 – aggregate how people did. Much wisdom and knowledge there – people in each group that chose options as most relevant.

Assessment important – Appropriate accountability (12), 3h complete, randomly sampled.

13 – Digital badges. These were awesome; have a blog post about it. A badge for each 3 sections, so long as the work was endorsed as complete and you took the exam. And one for being an ‘assessment expert’. A badge contains a claim, and evidence: you can click through to see the work they did. Sharing badges.

Conclusions – don’t go straight to massive, scale up incrementally. Design-based research. Learner defines context of use. Contextual and consequential knowledge. Embrace situative approaches to assessment.

Success, activity and drop-outs in MOOCs. An explorative study on the UNED COMA courses. Jose Luis Santos, Joris Klerkx, Erik Duval, David Gago, Luis Rodriguez (Short Paper)

José Luis talking.

We call success completion. Success in MOOCs is very relative, people who don’t follow it because they want certification. So take it as a relative concept.

Collaboration between KU Leuven and UNED COMA. Exploratory study. We are mainly a group that works on learning dashboards. Studying what are the human-computer interactions in this context. MOOCs are cool, so think about building a dashboard, and dig in to the data to see what we can build. So what we found in our case studies.

The platform is UNED’s MOOC platform, Cursos Online Masivos y Abiertos (OpenMOOC). Two courses – German and English – both with high registration. Similar – both language learning – but with some differences: German was for beginners; English was for advanced students – professional English, so it needed background. German 6w, English 12w. Both had videos (YouTube) and questionnaires. German had peer review, but English was assessed only with questionnaires. German: 34k start, 3% complete (1,000). English: 23k, 6% complete (1,500).

Don’t expect an easy path to reach your goal – messy data. Wanted to see if they were capturing enough information to perform studies. There are only 3 RQs:

  1. How does activity evolve?
  2. Are all learning activities relevant? Inc forum.
  3. Does use of target language in forum influence outcomes?

RQ1  How does it evolve?

Looked at the number of accesses to questionnaires, and activities. German has a monotonic decline in questionnaires; English is more jumpy (possibly because of the structure of the course?). Already dropped students who didn’t start. By week 4, 75% gone. Also saw that the first activity of every unit gets more accesses than the last activity of the previous unit.

Activity – participation over time. Also, if focus on the people who reach the end of the MOOC, what’s the completion? 40% pass. Like some 1st year bachelor courses.

RQ2 Are all activities relevant?

120/105 activities. Binned in 10s, for each bin, % who passed course. [Trouble with slides - reinforces my belief that you should never have animations in your slides] If you miss just 10 activities, drop 25% in completion (I think.)
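The binning step can be sketched like this (records invented; the ~25% drop comes from the talk’s real data, not from this toy example):

```python
# Bin students by number of completed activities (bins of 10) and compute
# the pass rate per bin, as in the analysis above. Records are invented.
from collections import defaultdict

def pass_rate_by_bin(records, bin_size=10):
    """records: iterable of (activities_completed, passed) pairs."""
    bins = defaultdict(lambda: [0, 0])  # bin index -> [passed, total]
    for completed, passed in records:
        b = completed // bin_size
        bins[b][0] += int(passed)
        bins[b][1] += 1
    return {b * bin_size: passed / total
            for b, (passed, total) in sorted(bins.items())}

records = [(95, True), (98, True), (92, False),
           (85, True), (82, False), (45, False)]
print(pass_rate_by_bin(records))  # pass rate keyed by bin lower bound
```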

Compare with activity in the fora. Posting activity – peak of pass rate at 30 posts, but drops if you posted lots more than that (on German course).

What were the most active/useful threads about?

RQ3 Use of target language

UClassify – – probability of every student using the target language (TL). Then compared passing or not against TL use. Doesn’t look like a big difference for German – but very few were using German; it was a very beginner course. In English, 10% used English with higher probability, >90%. It inflects around 60% probability – above that, more success; below, less. Quite low English use overall. Only 13% passed the course. But no real clear finding here.

It was an exploratory study. It was a success, largely, but room to improve.

Recommendations: pay attention to the first activities. Increase the usefulness of the fora. Keep an eye on the use of the target language.

Job pitch at the end! He’s finishing his PhD, interested in new opportunities. Got a round of applause from the audience.


Daniel: Discussion forums and MOOCs. Evidence says they suck. No, it’s not working! Anyone found them helpful?

John Whitmer: Yes, we found good evidence. Forum was kinda icing on the cake, but looked at levels of engagement. The people who do tests are in the forums. Don’t know causal link. But more active on forum is correlated with tests.

Jose: They were publishing the solutions of the questions in English. In German, used forum for sharing resources for learning German. They are useful because it’s necessary, cannot push back. Teaching in another online university, sometimes my students complain I don’t know how to engage them in the conversation. Can’t blame the system.

Q: Do you think forums in a MOOC should be redesigned? I hear, the charts of participation drop off. A student posts, another post post post, they are (overwhelmed). What if the first 20 people who get there block that forum off, they come back to that chunk. Then attrition factor. If you could create smallness in the strategy.

Yes, I agree. We need to redesign them, make easier to avoid information overload. Lots of threads, messages, have to spend a lot of time to read.

Ulrich: Missed one detail. Forum participation related to success. What was exactly success? Present yourself somewhere, or not present yourself for an exam? I read in your paper, the f2f exam and online exam. Not clear what not passing the exam means in the interpretation of your contingencies there, between e.g. forum participation and the ultimate success.

Completion means only passing the online exam.

U: Explicitly, or based on your activities?

Awarding with a badge, everything is automatic.

U: You don’t have to apply for it?

No. Afterward, can apply for the official certification, you have to go there f2f, additional step if you want to pay.

U: Your criterion was online?


Daniel: I ran out of time. Anyone working on EdX, interested in digital badges, I want to talk to you. Question for audience, does anyone have a forum where participation increases over time? We see over and over it gets so messy, brilliant study from MIT tech review, they just explode and become intractable, almost inevitably.

John someone: We’ve used a participation portfolio instead of forum. Rubric of a quality post, ask students to do Q&A, they submit a self-graded portfolio of their best work. They do hunting and finding of quality, not the instructor. I’ll tweet out the URL. The participation has shot through the roof. Regular courses in the LMS, online course.

John Whitmer: No activity in a MOOC increases over time, they all decrease. The percentage, enrolment, the interesting question is relative to prior activity for an individual student. Need to relativise to that. Look at what leads to higher levels of engagement. Entropy! In MOOCs without that incentive structure.

Daniel: I haven’t see it. My data is only one where people participating in an assignment where the activity level goes up.

John W: It could be a rounding error on other enrolments, it’s small compared to MOOC. I’ve not seen it.

Chris Brooks: Beginning, heavily used for socialisation – hey I’m from Germany, etc. We have to think about what we mean by engagement – blathering, or making deep critical comments. That’s important. Inside MOOCs, forum is one piece. Some Michigan MOOCs, students self organise on Facebook.

Daniel: Stephanie in her video said, this is where it gets ugly.

Q2: Twitter conversation, we know a lot about supporting good online discussions, take Chris’ point, what are we expecting. Many different models. Heartened to see your work, Dan, where they choose affiliations and you use that to group. A few thousand people in a meaningful conversation, that doesn’t make sense. It’s about giving purpose for discussion, finding the right thing to help them discuss. Yeah, we can say we haven’t see it.

Daniel: I do have a forum in my class, but for asking practical questions.

6B. Learning Analytics for “at risk” students (Grand 3) Engagement vs Performance: Using Electronic Portfolios to Predict First Semester Engineering Student Retention. Everaldo Aguiar, Nitesh Chawla, Jay Brockman, George Alex Ambrose, Victoria Goodrich (Full Paper, Best paper award candidate)

Everaldo talking.  Univ Notre Dame Coll of Eng.

Motivation – lots of study on student retention and attrition. Nearly half of students who drop out do so in the first year. Early identification of at-risk students is crucial but not simple. What data is needed? Is low academic performance the same as being at risk, or not?

Eng class – 450–500 incoming students, aged 18–22, 75% male. First Year of Studies program initially; intended major stated at the start, declared at the end of 1st year. Standard curriculum, including Intro to Engineering.

STEM Interest and Proficiency Framework – x/y chart, interest on the y axis, performance on the x axis. Interest displayed in terms of their ?study choices. Four possible quadrants: 1. High interest, low proficiency – need academic support. 2. High interest, high proficiency – optimal. 3. Low interest, low proficiency – need most attention. 4. High proficiency, low interest. I fell into that place, and was given support to develop excitement about being an engineer – need to develop that engagement. Aim to move them all to quadrant 2.

Overlay stayers vs dropouts – high proficiency tend to stay, interest has little effect. Low/low has most of the dropouts; still some stayers in high interest, low proficiency. Can do predictions on this – but should we focus on the ones we can’t predict? Interested in low proficiency but high interest: some don’t drop out. So the interest axis is important. Also high proficiency, low interest is relevant.

Can we build predictive models that take into account both of these aspects? [I guess yes. Adding features is not hard.]

They have electronic portfolios. Creative space, record, collect artifacts to show that they have done. Assignments developed around the tool. Highly customisable. Assignments designed to entice students to reflect. Used this as data source for interest.

Previous, related work. Several models predicting student retention, and some comprehensive studies understanding dropout.

Data features used – three disjoint sets of features. Academic data (e.g. SAT, grades on courses), demographic data (incl dormitory!), engagement data (eportfolio logins, submissions, hits received).

Analyse each feature individually, to see which best describes the outcome. Several different approaches. (ePortfolio hits was important). Semester 1 GPA was not correlated with outcome for most approaches except one. (?!) Grades for intro course were more important. SAT scores not very important – SAT verbal scores were negatively correlated (!! – laughter in the room). ePortfolio hits was very important to all of the models, #1 for all but one, #2 on that one. (Pleasingly, ethnicity is a blank too.)

Created subsets of the data features. All academic, top academic, all engagement, top academic+engagement. Tested lots of classifiers.

Dataset very imbalanced, so measuring accuracy alone is very deceiving: 88.5% were retained anyway, so if you just predict everyone as retained, you get 88.5% accuracy. So looked instead at Acc–, prediction accuracy for dropout students; Acc+ is the same for retained students.

10-fold cross-validation structure: 10 stratified samples, ran the same experiment 10 times, averaged to get the final result. Academic data features gave very low accuracy on dropouts – no better than 25% – but engagement was better. Very good on Acc– was the all-engagement feature set: 77%/83%. And academic plus engagement was the very best – NB (Naive Bayes) at 87.5%. Small tradeoff – incorrectly labelled some who didn’t drop out; that’s probably less important than missing people who did drop out.
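Why plain accuracy misleads here can be shown in a few lines – with invented labels at the stated 88.5% retention rate, a baseline that predicts “retained” for everyone looks accurate while catching zero dropouts:

```python
# Per-class accuracy (Acc- for dropouts, Acc+ for retained) vs overall
# accuracy on an imbalanced dataset. Labels are invented for illustration.
def per_class_accuracy(y_true, y_pred, cls):
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t == cls]
    return sum(t == p for t, p in pairs) / len(pairs)

# 0 = dropout, 1 = retained; 88.5% retained, as in the talk
y_true = [1] * 177 + [0] * 23
y_pred = [1] * 200          # majority-class baseline: everyone "retained"

overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(round(overall, 3))                      # overall accuracy looks fine...
print(per_class_accuracy(y_true, y_pred, 0))  # ...but Acc- is zero
```

This is exactly why the paper reports Acc– and Acc+ separately rather than a single accuracy figure.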

Evaluating these results – confusion matrix true/false positive/negative.

ROC curves – best curves are the engagement features and top academic+engagement, by a long way.

Results: had 48 students eventually dropped out. Picked 47 to train model, got the other one to test. (And round and round, presumably.) On only performance, academic data, only spotted 11. When used engagement features, saw 42 students.

Lessons learned: portfolios were a good source of data; engagement data was good. In future, look at how early we can ID students who are disengaging, and at what we can learn from analysing the content of ePortfolios.


Q: Four quadrants – high interest but not prepared. Interesting to identify them. Curious what thoughts you have about a mechanism to prepare them. I spent a lot of time trying to nurture interest, metacognitive stuff. But the content stuff is like bang for your buck. Yeah, huge potential in that corner. What ideas are worth pursuing?

Great question. Sometimes we take for granted that it’s easy to find students struggling academically and help them move up. But if they’re disengaged, it’s much more difficult to assess that than to assess them academically. Working on peer mentors – oftentimes they feel more approachable than professors. Additionally, inviting students to attend engineering events – career fairs, graduate events.

Alex (coauthor): One step further, STEM ambassador communities. Using best examples in ePortfolios, to help inspire and reach out. At midterms, when ID good engagement, send advisors to the disengaging groups.

Q2: In ROC curve, engagement only worked as well as engagement+academic, which doesn’t work?

Engagement+academic was a tiny bit better. But without academic, still predict retention very accurately. Most of the students performed homogeneously well. So looking at just their performance, hard to discern dropout vs retain. Engagement features much larger role. Academic performance played a minor role.

Q2: Ok, it’s real. The four quadrants, that was illustrative data, not real?

Yeah. We don’t have that many dropouts.

Q3: What you ended with, text-based analysis – interested in anyone else doing portfolio analysis.

Very new work, more people doing analytics on portfolios. Students doing text analysis, looking for trends among successful students in what they put in there. Find how successful student uses the portfolio. Perhaps train other students to follow those trends.

Q4: Tried with non-engineering students?

No. IRB barriers. Curious to see how this works with other populations.

Perceptions and Use of an Early Warning System During a Higher Education Transition Program. Stephen Aguilar, Steven Lonn, Stephanie Teasley (Short paper)

Stephen Aguilar starts. Outline: Brief history of Student Explorer, design-based research. Methods and setting. Results. Implications.

Steve Lonn takes over. Design-based research: work with academic advisers, iteratively design it based on their needs. This is the third paper about Student Explorer at LAK; you can see the trajectory. In 2011, on v.0: an at-risk program in the College of Engineering. It was in Excel, with manually acquired performance data and a few rules, using LMS logins as the rule-breakers. All manual. It gave data that wasn’t previously available, but was labour-intensive and too infrequent to make changes.

2012, v.1. Work on redesign, to automate the processes so it’s not grad students doing it. Used Business Objects, automatic, weekly report. Good to automate, but cumbersome and unstable platform.

2013, v.2. New partner, another at-risk program, including athletes, a different college. Another redesign. More custom, built on .NET. But now daily processing.

The rules – red/amber/green – engage, explore, encourage – action for academic adviser. Formative data, not predictive model, more actual performance. They’re already at risk. First rule is how are they doing in performance – if >85% they’re green. If middle, find distance for course average – if close, back to green, if more middling, bring in percentile rank of course site logins (25% is their boundary).
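As I understood them, the rules could be sketched like this – a hypothetical reconstruction; only the 85% score threshold and the 25th-percentile login boundary were stated in the talk, and the “close to average” margin here is my own assumption:

```python
# Hypothetical reconstruction of the Student Explorer traffic-light rules.
# Only the 85% score and 25th-percentile login thresholds were stated;
# close_margin is an invented value for illustration.
def alert_status(score, course_avg, login_percentile, close_margin=5):
    """Return 'green' (encourage), 'orange' (explore) or 'red' (engage)."""
    if score > 85:
        return "green"                 # doing well outright
    if score >= course_avg - close_margin:
        return "green"                 # close to the course average
    # middling performance: fall back on engagement (site-login percentile)
    return "orange" if login_percentile >= 25 else "red"

print(alert_status(90, 80, 50))  # strong performance
print(alert_status(70, 80, 10))  # weak performance, low engagement
```

Note this is formative, rule-based triage on actual performance, not a predictive model – which matches what the speakers emphasised.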

When you select a student, get main summary, across all classes that’ve had one graded assignment (pulled from LMS gradebook). Current week, show alert status – red/orange/green. Detailed view for each course, can see class performance over time, and how student performs over time. And then qualitative data – if instructor is giving feedback, can see that too.

Stephen Aguilar takes over again.

Low achieving for engineering, but not very low achieving. But Summer Bridge is for non-trad students, help for transition to college. Community-building living environment. Hang out most of the day. Math, English and soc sci, mainly study skills. Individualised academic counselling. 200 students. 9 academic advisers, N=9. All with higher degrees. This is their main job.

Research interest was around intended vs actual use.

Data sources – user logs, appointment calendar data, pre/post test.

1h training session to introduce the tool (Student Explorer). Also to collect feedback – important point about option to hide student status, so can share them.

‘Please rank courses by how often you think you’ll raise issues related to them’ – stayed constant, math most important.

Usage patterns – consistent. First week, no data to report. Similarly for last week – still meeting. Also blank at lunch for meetings. Student Explorer activity happens in between. Activity spike around the math midterm (was ranked high), because in their experience that was most important predictor for future. Use pattern – overwhelmingly during the meeting, a bit before, almost none after. It was designed for meeting prep, but the advisers used it during the meeting with the student. Unintended!

Design intentions don't always equal use. Some courses are more important – so maybe equal screen real-estate is not the right design choice.

Affordances of design (Don Norman). Confusion matrix – perceptually visible and affords the action: that's the quadrant you want. We talk about data literacy, which puts the onus on the user. But design literacy puts the onus on those of us doing the work.

Design literacy, to understand the affordances, ability to author representations of data. All LA is an act of communication.

Example – paper with grade, simple. Quant test a bit more. Signals/student explore are a bit more complex, need more understanding. As they get more complex, have to think about the design. In theory, have data about motivation, self-regulation, data literacy, engagement.

Future work – student perceptions, how they make sense of it, and how they react to comparative data. Also nimble and clear representation – give new use case, how can we make it better for use during meetings with students.


Q: Your users identified that they were going to use that tool during meetings, they asked for that textbox. It was clear to advisers on seeing the tool that one of the most useful, best uses was concurrent in their interactions with students.

That was accidental. Clear to one adviser in a room of nine, but it spread. Clear … once they’d thought about it.

Q2: Excited about visualisations to learn to use visualisations. Integral part of STEM enquiry. Push hard on using these representations. If students see representations of data about them, enormous opp to increase their literacy in using visualisations and understanding.

In the future, 3-5y, access to their data, but also to tools to make visualisations. We don’t have all the good ideas, they might come up with something useful to them.

Q3: If instructors use the gradebook like they should – you said that. Feedback to students, could give feedback to instructors about how it’s being used. I’d like to see that as well. Advisers now in a key broker role.


Q4: Surprised by how little student access there was outside of advising. (Accessing their own.) Student access?

No, was only adviser access. Students didn’t have access. Only saw their data within a meeting with their advisor. Have been careful, we don’t know how students will interpret the data. Some have trouble with very basic literacy. Especially in early warning systems for at-risk students.

John Whitmer: Any data about the effectiveness of the tool in changing student performance?

Thinking about that for the future. 7w program, student performance hinges on the midterm. Round 2, make changes, show student their own habits, use that as a lever.

[tech problems for next speaker]

JW: Meetings, or not?

Not done much on prediction. It’s just the data we have now. Mostly on purpose, it’s not the direction we’re going in.

Andrew Krumm: Different theories drawing on. E.g. reference group, reinforcing stereotype?

Interested in, motivation framework, self-determination theory. If student already on a mastery track, that’s adaptive, what does that mean if we give them comparative. But could be maladaptive.

AK: Translate from psych to design? Efficient or hard?

Fun problem space. Definitely a struggle to take that and turn it in to a design.

Q5: Last person talking, interesting question about making these at-risk visuals available to students.

This is so new, not a lot of showing stuff to students and then get the outcome. That’s a rich space for research.

Modest analytics: using the index method to identify students at risk of failure. Tim Rogers, Cassandra Colvin, Belinda Chiera (Short Paper)

Tim speaking. Reference to Dillenbourg’s work on modest computing.

Couple of angles to risk – attrition, and performance. UNISA, busy developing grand algorithms to predict attrition and performance. Point to help them not drop out, and do well. Have a guy called Gollum, his Precious is data, very eccentric. It’s important work. But very complex and takes a long time. Taken a year, comes from biology in using mashups of decision trees to pull together data not done before. Thinks can do same with student data.

But! One context is day to day job in aiding academics improving their teaching. They want to predict how students will go in their courses. One project looking at interviewing students who did better or worse than expected – compared to what? Need something to analyse that and give you the data. Need to pull out students we expect not to do so well. Shouldn’t lose sight of trying to assist students. Another angle: in the course, it has a particular context. Large algorithms are fine if applied broadly, more skeptical when applied to student performance at aggregate levels. Dragan looked at Moodle data, just using regression, picked out variables that did indicate students at risk and predict it. When split it down to course level, different variables at work. Question is, how refined do we have to be? What level to aggregate or disaggregate?

Scott Armstrong, forecasting. Unique perspective, serious mathematical chops, very much about the results – can you show the technique works in concrete circumstances. Fast and frugal heuristics. Anyone aware of that? (No.) Use a rule of thumb vs regression. Often show a simple, straightforward approach can do just as well as a more sophisticated one, given constraints. Example: first past the post, where one main variable predicts enough. Don't need more to make an accurate prediction. Also students who've been there for a time – their GPA is quite predictive.

Don’t generally work with students who’ve been here for a while. Mostly first year courses. Many don’t come with any performance data we have – alternative pathways. So need methods to tap into that population. Index method. Childishly simple! The forecasting community have resurrected it. Tabulate variables that are relevant to the question. E.g. show a table of students (each row), columns – have they logged in on time, 1/0; parents did not complete high school, 0/1; failure in a previous course, 1/0. A series of variables: simply tabulate and add up.
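A minimal sketch of the index method as described – binary variables, unit weights, just add up. The indicator names mirror the examples given in the talk (on-time login, parents' schooling, previous failure) but are otherwise illustrative:

```python
# Each indicator contributes 1 when the risk condition holds.
RISK_INDICATORS = ["logged_in_late", "parents_no_high_school",
                   "failed_previous_course"]

def risk_index(student):
    """Unit-weighted sum of 0/1 indicators -- no fitted coefficients."""
    return sum(1 for name in RISK_INDICATORS if student.get(name))

students = [
    {"id": "A", "logged_in_late": 1, "failed_previous_course": 1},
    {"id": "B", "parents_no_high_school": 1},
]
# Rank students by risk, highest first.
ranked = sorted(students, key=risk_index, reverse=True)
```

Because there are no coefficients to estimate, sample size is irrelevant and predictors can be added or dropped on the strength of the literature or expert opinion alone – which is what makes it ownable by educators.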

Interesting history. Index more accurate than various regressions. Resurrected every decade. Used by credible forecasters with strong stats backgrounds. It’s either competitive with more sophisticated approaches – or does better.

Index vs regression – pulled data out of the system. Couldn’t get engagement out of the LMS. Compared to regression with simple elimination (drop the lowest t value). From 2011, 4 courses, 2nd half of 1st year. Two models: the index method (regression on the index total), and a regression model. Tested on novel data for 2012 students.

index R was -0.59; regression -0.70. So not as good.

Incredibly simple technique. No coefficients, so you lose a lot. Can pile on as many predictors as the literature will allow, including expert opinion; can drop and add variables. Sample size is not an issue. Can be owned by educators – it’s simple.

Implications for orchestrating adoption. My major point. It’s not so much about the index method or other fast and frugal heuristics. An academic is given a black box, but they don’t know what’s going into it. The kind of conversations you can have if you’re working with data are important. An academic faced with an institutional risk algorithm presenting a list of at-risk students can’t interact with it if they don’t think it’s representative. Risks alienating people.

Shane Dawson and I had a conversation with academics recently. An academic said 2nd year students are failing because they’re lazy. That’s a strong assumption, but didn’t take it further. Need to have more data-based conversations with the academics. Have to pull them into the data. However sophisticated it gets, have to leave room for them to participate in the questions they want asked. OK, your students are lazy – why do you believe that? Data to illuminate that?

This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Large Scale Collaborative Funding Bids

Martyn Cooper's blog - Thu, 27/03/2014 - 12:07

Today I attended a training workshop on large-scale funding for projects.  I have built my research career on such projects, but it is never too late to learn and gain from others’ experiences.

Anne Adams (IET):

Tensions – Turning Tension
  • Understand the drivers of the partners
  • Manage expectations
  • Managing cultural differences
    • differences can be enablers
  • Working at the OU is like working at a mini-EU (committee structures, scale, layers, etc.)
  • Learn by mistakes
  • Use processes and support: troubleshooting
  • Funders can be a useful connection point – use them
  • Check understanding
  • Impact and dissemination from early on – impact and engagement
  • Balancing different objectives – your partners are doing the same
Small Projects
  • More focused objectives
  • Smaller budgets
  • Tighter timeframes
  • Researchers often have to do project management as well
Large Projects
  • Inverse of above but effectively made up of a series of small projects
Managing Time
  • Clear-out bid writing time
  • Co-ordinate multiple objectives
  • Larger projects may have tighter financial restrictions
Managing People / Stakeholders
  • Project partners
  • Outside the project who are they?
  • Involve in writing the bid
  • Involve users
  • Expectations
  • Highlight how stakeholders have been involved in bid writing
  • Shifting timescales
  • OU Catalyst Project – Research School
  • Breakdown what will be available when
  • Administrators
  • Project manager
  • Researchers
  • Academics as researchers / teachers / communicators
Lines of management
  • Issue of part-time staff
  • Commenting on objectives
  • People are overworked
  • Use e-mail sparingly
KPIs / Partners
  • Co PIs / Partners
    • Institutional differences
    • Cultural differences
    • Tensions from misunderstandings
Manage expectations
  • Competing expectations from a project
    • From partners
    • From funders
    • Your own objectives
Be Brave
  • Proactive in getting key players in the bid who may have a previous funding record with the funders
  • Summary of objectives and ideas early on
  • Leverage OU
    • Scale
    • Contacts
    • Systems and procedures e.g. ethics
    • Management systems
  • Don’t be afraid
    • Change adapt ideas, partners even near deadline
    • Bring in additional partners – mid-project if necessary
  • If lose trust in a partner – deal with it – don’t want a disaster during the project
Turning Tensions Around
  • transferable skills from teaching at the OU
  • Learn to take risks
    • Allow exploration
    • Keep it focused
  • Allow partners to shine
Using people available
  • Colleagues
  • Research school
  • Successful bids
Using Processes
  • Re-use processes
  • Other external projects
  • Get feedback from potential reviewers
  • Funders have resources to support

Share your ideas

  • Share ideas early on – get the project name out there!
    • Leaving it to the end means missing opportunities
  • Share your ideas with your partners
  • Different ways of sharing:
    • Posters
    • Speed dating events
    • Plan to create a video early on
    • Create eBooks
    • Websites / data sharing (in project and public)

John Domingue (KMi):

  • Funding for staff and the latest equipment and travel
  • Good for CV – funding necessary for promotion to Chair
  • Networking
  • It gives you autonomy
  • Need a great idea for a project – the elevator test – can you sell it in 2 mins
  • The larger the funding the more political
  • In Europe saying the US has it is a seller
  • E.g. “turning the web from one of data into one of services”
  • Writing for the reviewers – they’re able to give up a week of their time so tend to be good researchers but not the best.  They have to read a lot of bids – make yours stand out
  • Also have to write for the EC Section who select the funded bids
  • Make it something beyond the state-of-the-art
  • Clear, pertinent, transparent
Official Criteria
  • Excellence (threshold 3/5)
  • Impact (threshold 3/5)
  • Quality and Efficiency
    • Plan has to match the idea … if you are going to change the world in X, do you have the resources to do it?
  • The Consortium – probably counts 50%
    • Do you have the big players?  If not why not?
  • Use the relevant industrial associations if appropriate
  • EU projects is a game – play by the rules
  • Make sure the objective aims of the proposals are aligned with the big partners
  • Make sure every partner is playing a specific role
  • Exploitation partners hardest to find – but most important
  • Academics will always come aboard
  • SMEs / Industrial players
  • Leading Research Institutions
  • Balance by Country, region, and type
  • Year to 9 months ahead of deadline meeting of core partners – set forward the core idea
  • Pre-consortium beauty contest
    • Needs to be handled carefully
  • E-mails, Skype, etc. to develop the bid
  • Set a small team of people who will write the bid
    • The consortium may well change during the bid writing
  • Talk to the funder
Commission Dialogue
  • Go to Brussels for a day, e.g. an info day
  • Get feedback from the Commission after the call is out
  • Be prepared to radically change the bid in response to feedback
  • Study the Workplan early – before the call is published
Proposal Document
  • Template
  • Stick within page limit
  • Coherent
    • Text
    • Workplan (spreadsheets/ Gantt charts etc.)
    • Risk Management
  • Note – unit heads will have their own goals – how does your project fit?
  • Take into account previous EU projects in State-of-the-Art
  • Use strategic reports from the Commission and others to give background information
  • Typically WPs: Management; 3 Technical WPs, Dissemination and Exploitation
  • Get balance of roles between the partners across the WPs right – balance the effort to match the objectives
  • Impact – who are the authoritative sources?  Quote from key reports e.g. Gartner
Writing the Proposal
  • Small team of good writers (native English speakers), separated away from other work, usually in a shared office – use study leave
  • The deadline is final!
  • The difference between a good academic and a good academic with project money is networking
  • Info days
    • Often include networking session to find partners/projects
    • ICT Events (different in different fields)
    • Keynotes invite Commission representative to conferences you organise
Easy way to start
  • Become a project or proposal reviewer
  • The OU is a world leader in Pedagogy – lead training workpackages

Sarah Gray (Research Office)

Research Support


  • Work closely with faculty administrators
  • Review and approve all external bids (to Leverhulme and UKRCs)
  • Sarah is EU co-ordinator
  • Finding funding opportunities
    • Current opportunities page
    • Visit UKRO and register e-mail address
    • National Contact Points in UK.Gov
  • Open calls on WWW page URL: search EU Horizon 2020 Participant Portal


— end —

Learning analytics don’t just measure students’ progress

My article ‘Learning analytics don’t just measure students’ progress – they can shape it‘, appeared online in The Guardian education today, in the ‘extreme learning’ section.

In it, I argue that we should not apply learning analytics to the things we can measure easily, but to those that we value, including the development of crucial skills such as reflection, collaboration, linking ideas and writing clearly.

I also link to the #laceproject – Learning Analytics Community Exchange – a European-funded project on learning analytics.

Setting learning analytics in context

I organised a panel at Learning Analytics and Knowledge 2014 (LAK14) in Indianapolis on ‘Setting learning analytics in context: overcoming the barriers to large-scale adoption’.

Thanks to Shirley Alexander, Shane Dawson, Leah Macfadyen and Doug Clow for making it a great event, and commiserations to Alfred Essa who couldn’t make it at the last minute due to a cancelled flight.


Once learning analytics have been successfully developed and tested, the next step is to implement them at a larger scale – across a faculty, an institution or an educational system. This introduces a new set of challenges, because education is a stable system, resistant to change. Implementing learning analytics at scale involves working with the entire technological complex that exists around technology-enhanced learning (TEL). This includes the different groups of people involved – learners, educators, administrators and support staff – the practices of those groups, their understandings of how teaching and learning take place, the technologies they use and the specific environments within which they operate. Each element of the TEL Complex requires explicit and careful consideration during the process of implementation, in order to avoid failure and maximise the chances of success. In order for learning analytics to be implemented successfully at scale, it is crucial to provide not only the analytics and their associated tools but also appropriate forms of support, training and community building.

The Slideshare below includes my sections of the panel presentation, and not the excellent presentations from the other speakers.

Setting learning analytics in context from Rebecca Ferguson

Discourse-centric learning analytics: DCLA14

I was one of the chairs of the Second International Workshop on Discourse-centric Learning Analytics (DCLA14), which ran as part of the Learning Analytics and Knowledge 2014 (LAK14) conference in Indianapolis.

Workshop notes available on Google Docs.

Programme overview

9:00 Chairs’ Welcome & Participant Lightning Intros

9.30 DCLA14: some questions to ponder… Rebecca Ferguson

9.45 DCLA Meet CIDA: Collective Intelligence Deliberation Analytics [pdf]

Simon Buckingham Shum, Anna De Liddo and Mark Klein

10.45 Automated Linguistic Analysis as a Lens for Analysis of Group Learning [pdf]

Carolyn Penstein Rosé

11.30 Designing and Testing Visual Representations of Draft Essays for Higher Education Students [pdf]

Denise Whitelock, Debora Field, John T. E. Richardson, Nicolas Van Labeke and Stephen Pulman

Discourse-centric learning analytics from Rebecca Ferguson

Future Internet Assembly, Athens

I was invited to attend the Future Internet Assembly in Athens, where I took part in a panel discussion: ‘Beyond MOOCs: The Future of Learning on the Future Internet‘. The FIA website includes video footage of the entire panel.

I spoke on my experience with the FutureLearn platform for massive open online courses (MOOCs). Since running its first course in September, FutureLearn now has more than a quarter of a million registered users and over half a million course sign ups.

I talked about the benefits that massive participation can offer to learners, educators and to society and about some of  the implications of MOOCs for the future of the internet, with a particular focus on authentication, interoperability and accessibility.

MOOC Platforms and the Future Internet from Rebecca Ferguson

LAK14 Weds pm (6): Alternative analytics & posters

Dr Doug Clow's blog - Wed, 26/03/2014 - 18:57

Liveblog from the first full day of LAK14 – Wednesday afternoon session.

Session 3A: Alternative Analytics



Xavier welcomes everyone. A diverse set of presentations ahead.

Sleepers’ lag – study on motion and attention. Mirko Raca, Roland Tormey, Pierre Dillenbourg (Full Paper).

Mirko presenting.

Goal of this research is to give people better idea of the classroom – “with-me-ness”. Signals in the classroom, inferring attention and motion – if I move my hands, am I attentive? Not just immediate attention. Talking about how to exploit the signals.

Comes from eye-tracking studies on MOOCs, looking at with-me-ness. Given just the information, some people are more or less able to identify where to look, but with teacher reference, many more do so. Eye tracking in the classroom: we know the position of the eyes, but it’s vague in the classroom, especially if students are just sitting and listening. What signals do we have here? Signal theory – space where information is travelling between participants. People think of information from teacher to students. But students are not sitting in a vacuum. With 50 people in the room, information travels between students. Distractions outside. Also from a person to themselves.

Signal-oriented view of the classroom. Students made a ‘sleepy FL’ vs ‘EPFL’ on Facebook. Social media content, showing student sleeping in the classroom. It’s clearly a signal! Trying to formalise this in to signal processing.

Started off with motion analysis as the basic form of analysis, before judging pose or gestures. Feature-tracking application, motion intensity measure of students over time. Not clear in and of itself, but testing. Have a huge amount of data – it’s overwhelming for a teacher. Use modern algorithms to get something more meaningful.

Annotated regions for each student, Lucas-Kanade feature tracking, grouped into motion tracks. Then to associate with a person, fit 2D-Gaussian probabilities. The person most likely to generate the whole motion track is IDed as the source. Every motion associated with a single person, not distributed over multiple people – an issue with the overlap of people sitting together.

Analysed two classes, N=38 and N=18. Four cameras, questionnaires, 10-min intervals, on attention etc. Student reports level of attention (nice normal distribution). Percentage of activities reported – six choices, three productive, three counter-productive. And on-task activities, listening/taking notes etc. Off-task activities are observed even at high levels of reported attention, but it does capture rise of on-task activity. (More attentive is correlated with more on-task activity.)

Teacher-centred positioning, amount of motion from students vs distance from teacher. With the distance from the teacher, the motion intensity decreased. Interesting! Mentally less active, and physically?

Then turned to interpersonal view – student-centred positioning. Proxemics theory (E.T. Hall); personal perception of the classroom. Three categories: immediate neighbour, visible neighbourhood, other. Classify each pair, randomly in the room, and analyse behaviour.

Synchronisation – borrowed from eye-tracking. Didn’t want to be overly intrusive: complement, not change how it’s taught. Stuck with cameras, not sensors stuck to people. Dyadic analysis of classroom – whether two people act at the same time or not. As in eye tracking, if two people act at the same time they have high levels of comprehension and agreement. And how that translates to motion. Often one person’s motion happens with another person’s, only slightly shifted timewise. Plot one against the other and get a matrix. Look at the diagonal – it gives you times of motion synchronisation. Try to model the exact and relative synchronisation – is there a lag?
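The lag idea can be sketched as a simple lagged co-occurrence score over two motion-intensity series – a stand-in for scoring the diagonal of the full synchronisation matrix; the toy series and `max_lag` are invented:

```python
def best_lag(a, b, max_lag=5):
    """Offset (in samples) at which series b best co-occurs with
    series a; positive lag means b trails behind a."""
    def overlap_score(lag):
        # align the two series at the given offset and score co-occurrence
        if lag >= 0:
            pairs = zip(a[:len(a) - lag], b[lag:])
        else:
            pairs = zip(a[-lag:], b[:len(b) + lag])
        return sum(x * y for x, y in pairs)
    return max(range(-max_lag, max_lag + 1), key=overlap_score)

cue = [0, 0, 1, 1, 0, 0, 0, 0]        # e.g. a note-taking burst
follower = [0, 0, 0, 0, 1, 1, 0, 0]   # same burst, two samples later
lag = best_lag(cue, follower)         # 2
```

The sleeper's-lag finding is this offset averaged over pairs: smaller average delays went with higher reported attention.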

Tested immediate neighbour influence. Saw significantly higher probability of synchronisation between immediate neighbours than any other pair of students. Reasonable finding. No reason to coordinate; just sitting by someone influences your actions (p<0.05). No significant difference in sync between the visible and non-visible neighbourhood.

Sleeper’s lag, is the synchronisation idea. Dominant signal from teacher, says take notes, people take notes. But if external signal that both people react to, they start taking notes, some realise as a cue from their peers rather than teacher. On high levels of attention, the average delay was smaller – they were quicker to react. Which makes sense. It’s marginally insignificant  (!) and relatively flat. At the resolution of 2s. Maybe need even smaller resolution here. Similar trend in other class but insignificant because smaller numbers.

Signals as a view of the classroom, not going above socio-economic view, sticking just to little box of the classroom. First results of observational system are going in the right direction. Demonstrated an idea for non-obtrusive performance metrics.

Future work – re-test, semester-long experiment with video, questionnaires, interviews. Also focus on face tracking, face orientation, as the social indicator.

And they are looking for a post-doc! In education and machine learning.


Xavier: Movements and lag. How sure are you that the delays are due to attention, not because maybe I started to take notes before.

We start with idea that they are telling us the truth in the questionnaires. We’ll see on the large sample. Different things that can be interpreted. The motion is very vague. Do we want to capture hand position, posture as defensive etc. Get as much from this feature and then move on. We’ll see.

Q: What camera technology do you use? Price, obtrusive?

As low end as it can be and not get crappy. Aiming to produce something applicable anywhere. Not high resolutions. Just moved on to HD webcam. Low end section. Tried using more exotic fields, emotional bracelets, we might use in capturing and assessing, but at the end it’ll be webcams.

Q2: Any theories about interaction with the academic content and how that affects their motion?

Started doing pre/post test. One thing in the questionnaire, it looks complicated, it’s just measuring four parameters. One of them was, the attention, classroom perception, teacher perception, and material importance. Correlate between all of them. If there is a correlation with the material importance, we’ll pick it up. Not sure in the long run with just the camera.

Malcolm: Any plans to experiment with other classroom design types – this is benches in rows. New environments, any plans?

Again, started similarly to webcam tech, started with typical layout. Not sure we can venture with other types, we don’t have many at EPFL. Broader, the standard composition of the classroom, typical layout seems to be more productive, others seem to be inhibitive of discussion, may have a reference.

Malcolm: Would love to see that.

Clustering of Design Decisions in Classroom Visual Displays. Ma. Victoria Almeda, Peter Scupelli, Ryan Baker, Mimi Weber, Anna Fisher (Short Paper)

Victoria presenting.

Why classroom displays? Teachers spend a lot of time thinking about what to put on wall displays. They’re available for hours to the students. Hence important to ask whether they’re visually engaging or distracting.

There are large amounts of sensory displays, but no evidence it helps learning. Know that there are links between off-task behaviour and visual design – visual bombardment, distraction.

Literature tends to focus on visual aspect, and features in isolation. So want to examine the semantic content of visual displays. And go beyond that to investigate the systematic design choices by teachers, to find patterns that best support it.

30 elementary K-4 classrooms in NE USA. Fall semester 2012.

Coding scheme – photos of walls using Classroom Wall Coding System. First code each flat object, then classify as Academic/behaviour (e.g. star charts)/non-academic (school policy, decorative).

Labels – English words, descriptive. Also content-specific stuff related to academic topics. Procedures as well, hierarchy of steps. Decorations included e.g. welcome board, or the Cat in the Hat. Student Art. Finally, other non-academic – calendars, fire escape plans, stuff solely for the teacher, e.g. own personal calendar or picture of their diploma (!).

Analytical method – K-means clustering. Systematically tried different numbers of clusters. Used k=4, more just had more outliers. One n=1 cluster present even with k=2!  So four clusters.
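The clustering step can be sketched with a minimal pure-Python k-means (the authors would presumably use a standard package; the toy feature vectors here are invented per-classroom counts, e.g. of decorations and labels):

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal k-means over feature vectors; returns (centres, clusters)."""
    rng = random.Random(seed)
    centres = [list(c) for c in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each classroom to its nearest centre
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centres[i])))
            clusters[nearest].append(p)
        # move each centre to the mean of its cluster (keep it if empty)
        centres = [[sum(dim) / len(c) for dim in zip(*c)] if c else centres[i]
                   for i, c in enumerate(clusters)]
    return centres, clusters

rooms = [(0, 0), (0, 1), (10, 10), (10, 11)]  # toy (decorations, labels) counts
centres, clusters = kmeans(rooms, k=2)
```

Re-running for k = 2, 3, 4, … and inspecting cluster sizes for persistent outlier singletons mirrors how k=4 was settled on here.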

Then determine distinguishing features for each one. Mean and sd vs overall average of each visual feature. Decorations and content-specific explained most of the variance. Cluster 1: decorations, labels, other non-academic – help navigate. Cluster 2 similar, but in the opposite direction – low decorations, low labels. Primarily private schools. Cluster 3: high content and decorations, the only group with lots of student art. Teachers use student art to motivate learning; most likely they regarded visual displays as tools for learning. Cluster 4, the n=1 singleton, interesting – high content-specific, procedures, other non-academic. Probably a teacher who viewed displays as a tool for recalling information.

Monte Carlo analysis to see relationship between cluster and type of school. Charter schools overrepresented in Cluster 1, private schools in Cluster 2 – they may think visual displays distracting. High amount of decorations came with charter schools, curriculum emphasised literacy development, so promoted print-rich environment.

There is systematicity in how teachers decorate walls. First clustering outcomes on this, better than features in isolation. Private and charter school teachers decorate walls differently – does that impact engagement and learning?

So future work to look at student achievement, off-task behaviour.

[Limited liveblogging as I get ready to present.]

Data Wranglers: Human interpreters to help close the feedback loop. Doug Clow (Short Paper).

[My paper! So no liveblogging.]

Ace question at the end about what’s happening to get stuff in to the hands of students. I said yes, absolutely, that’s where the best benefit is likely to be, and we’re just starting to get working on that.

Toward Unobtrusive Measurement of Reading Comprehension Using Low-Cost EEG. Yueran Yuan, Kai-Min Chang, Jessica Nelson Taylor, Jack Mostow (Short Paper)

Yeuran speaking.

Traditional assessment of reading comprehension asks you questions about what you’ve been reading. Age-old technique, very important. But problematic: the questions have to be written, they take students’ time, and they have limited diagnostic ability, especially if just scored correct/incorrect – we don’t know what went wrong.

So looking to build better. Automated generation of questions, detection of comprehension by reading speed or other unobtrusive work, and diagnostic multiple choice tests. This is something new.

Electroencephalography, EEG, brain signal detection, put electrodes on people’s heads, record brain signal activity which is indicative of mental states. Commonly used to study neurophysiology of reading comprehension (where in the brain is activated when you read), and also can detect semantic and syntactic violations (! wow).

But a problem with this – lab EEG looks useful, but it’s expensive, expert-operated, and requires ~30 channels; takes time and gel and faff to put on. Not suitable for classrooms.

New innovation, inexpensive EEG, a hundred dollars or so. Operated by just about anyone, single channel, easy to put on. But tradeoff for signal quality. What can we do with these devices? Is the important data from lab devices lost?

Research labs looking in to this – early study at AIED13 looked at use of EEG devices in a Reading Tutor, to see what we can and can’t detect. Primary success was text difficulty, significantly above chance. But couldn’t tell comprehension (whether they were getting questions right). Why not? Small dataset – only 9 subjects were children (adults don’t tend to get the questions wrong), with 108 questions. And the devices tend to be noisy.

So improvements – methodological, improved stimuli – questions better – and bigger dataset, to 300 questions. And some algorithmic changes to pipeline – new features.

Cloze questions. Story given, fill-in-the-blank sentences. Multiple choice: right answer, plausible (but wrong) distractor, then implausible distractor (grammatical, but silly), lastly, ungrammatical distractor. “The hat would be easy to ___.” clean / win / eat / car.

Deployed ~900 questions, 300 remain after filtering of poor signals – aggressive about that to maximise performance. 78.7% correct, 13% plausible, 6.1% implausible, 2.2% ungrammatical.

Tried to predict from EEG signals whether they got something correct or not. Second analysis, looking at time reading each of the answer choices, trying to predict which type of violation was there. Alas, no significant distinction.

Original was just the context, then the context and cloze question, then choices, then the lot, then segmented, then 4-second segments.

Machine learning pipeline – generate EEG features (alpha mean, beta variance); cross-validated experiments: balancing (undersample), feature selection, evaluation. Two schemes – one trained the classifier on all but one trial and tested on the remaining trial; the other trained the classifier on all but one subject and tested on the held-out subject.
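(A rough sketch of that evaluation setup. Everything here is a stand-in: synthetic features in place of the EEG band-power features, a toy nearest-centroid classifier rather than whatever the authors actually used, undersampling to balance the ~79% “correct” class, and the leave-one-subject-out “between” scheme:)

```python
import numpy as np

rng = np.random.default_rng(0)

def undersample(X, y, rng):
    """Balance classes by undersampling the majority class."""
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), n, replace=False)
                          for c in classes])
    return X[idx], y[idx]

def centroid_classifier(Xtr, ytr):
    """Fit per-class feature centroids; predict by nearest centroid."""
    cents = {c: Xtr[ytr == c].mean(axis=0) for c in np.unique(ytr)}
    def predict(X):
        cs = list(cents)
        d = np.stack([np.linalg.norm(X - cents[c], axis=1) for c in cs])
        return np.array(cs)[d.argmin(axis=0)]
    return predict

def leave_one_subject_out(X, y, subjects, rng):
    """Between-subject scheme: train on all but one subject, test on them."""
    accs = []
    for s in np.unique(subjects):
        tr, te = subjects != s, subjects == s
        Xb, yb = undersample(X[tr], y[tr], rng)
        predict = centroid_classifier(Xb, yb)
        accs.append((predict(X[te]) == y[te]).mean())
    return float(np.mean(accs))

# Synthetic stand-in: 300 trials, 9 subjects, 6 band-power-style features
n, d = 300, 6
subjects = rng.integers(0, 9, n)
y = (rng.random(n) < 0.787).astype(int)          # ~78.7% "correct", imbalanced
X = rng.normal(size=(n, d)) + 0.5 * y[:, None]   # weak class signal
acc = leave_one_subject_out(X, y, subjects, rng)
print(acc)
```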

Significant results are the segments over the parts where they were reading, but nsd over answering time. Better performance from between-subject classifiers than within – probably because not enough data for within. Only 60% – significant, but not good.

Why poor performance? Not ready to replace assessment. Maybe not enough data, over-fitting/noise. Maybe the low-end devices aren’t sensitive enough, or the pipeline isn’t making good use of the data.  Working on all three of these possibilities.

Conclusion: Significantly ID correct/incorrect above chance – but accuracy not great. But can’t predict the type of answer choice they were reading.

Workshop on EEG in education, at ITS 2014. Toolkit and dataset available!


Xavier: I know you don’t have data, but on instinct, which of the three are the reasons? Devices, data, pipeline?

Honestly, I think it’s a combination of all three. More data would be helpful. We believe part of it is due to the device. We find a lot of reading-relevant EEG information comes from sensors that are not at the front of the head, so different sensor locations, or multiple sensor locations, or bilateral differences.

Q: What’s the use case for this? I have the fear for students. (laughter)

We invested in a lot of taser companies (laughter). We’re investigating how to improve the accuracy here. Whether we can actually grade the students – that’s not directly what we want to do. But we could aid assessment: it’s likely the student needs help right now. So if we get a 60% indication they need help here, we might deploy a question and catch this rather than miss it. Something a bit more friendly.

Q: Maybe more formative assessment.

Related work by colleagues too. Looking at how to use this not as a binary distinction but to help other distinctions.

Q2: Why would you not have the option that the EEG data won’t be that accurate? How would you know you could get better than 60%?

We don’t. We haven’t done the same setup with expensive EEGs. It’s infeasible to collect this amount of data with kids in an environment anything like a classroom with all that crazy scifi stuff going on. Other distinctions are out there. Brain/computer interfaces are making strides, we hope this is lack of development right now. This is picking up among labs – hopefully we’ll have more exciting findings.

Q3: Students metacognitive abilities. Here, predicting accuracy of answer. But how confident are the students in their answers? That could be another outcome measure to predict.

Yeah, that’d be good if we had data for that. Looking in to predicting other tasks, other behavioural tasks and labels beyond correctness/incorrectness, but we don’t have the information right now.

Firehose session: Posters & demos

Chaired by Zach Pardos and Anna de Liddo.

Zach introducing. Intellectual entertainment! Stealing this idea from a neighbouring conference. Hard to get a sense of all the posters, exciting new work, established work, easier to convey with hands-on demos or poster presentations. All presenters have one minute to tell you about their work, then go to the posters and demos you want. 18 presenters. Demos, posters, doctoral consortium. Everyone has a poster. There’ll be an award for a poster, and one for a demo. You’ll see paper on the social tables for the reception, with a pink slip and a neon green slip. Green for demo, pink for vote on anything other than demo.

#7 Ed Tech Approach toward LA: 2nd generation learning analytics. First gen: focus on predictions, how soon we can provide an intervention. 2nd gen: more focus on progress and how we can have students improve learning achievement. Analogy: a state-of-the-art medical examination tells you you will die in one month – what would you do? Prediction alone is not enough to improve your health outcome. Want to see what kind of variables we need to look at.

Patterns of Persistence – John Whitmer. First gen research, in to MOOCs, deconstructing what achievement is, beyond pass/fail. Remedial or basic skills English writing MOOC, seeing if constructs apply. MOOC interaction data, 48k students, entry surveys, exit surveys, see how those inform level of participation. Used methodology of today: cluster analysis for patterns. Looked overall at correlations, use, testing, exam, participation.

Effects of image-based and text-based activities – Anne Greenberg. Poster on investigation of potential differential effects of text and image activities in a functional anatomy course. Text, image, control – participants in image activities had better performance on outcomes. Image questions less mentally taxing.

eGraph tool – Rebecca Cerezo. There’s a graph, nodes and analysis, represents student interacting with Moodle during a week. Tool developed, eGraph, #1, can demo on your computer. Really easy and intuitive.

Peer Evaluation with Tournaments – E2Coach. U Michigan. Engaging students in creating content, evaluating content. Peer evaluations. Poster is studies where students submitted a game to a tournament, videos, solutions to problems they got wrong on the exam. Do they do a good job? Do they learn? Can leverage a lot of students to generate good content?

National Differences in an international classroom – Jennifer DeBoer. Looking at student data using MLM with students within countries, EdX MOOC on ccts. Results were sig diff in performance for different groups of countries.

Visualising semantic space of online discourse: Bodong Chen. Toronto. Online discussion systems. Using topic modelling techniques, time series analysis, to provide alternative representations of online discussions to help make sense of them better. For e.g., see how topics evolve over time. (From his slide, he’s an R ggplot2 user!)


Chris Brooks, demo on Analysing student notes and questions, for Perry Samson. LectureTools. Backchannel for classroom, clicker, voting tool, tons of analytics. Lots of research, opps to use this in your classroom.

HanZi handwriting acquisition with automatic feedback. Learning Chinese is hard for students and teachers. Tool, HanZi, gives feedback, captures sequence and direction errors. Windows version, being developed for Android.

OAAI Early Alert System Overview – open source academic early warning system. SAKAI, release the model in standard ?XML format. Data from student information system, LMS data. Data integration processes, then predictive model, identifies students at risk, deploy interventions.

Learning analytics for academic writing – Duygu Simsek, OU. Write to communicate in the academy, deliver our thinking to the research community. All aim to write in academic way. Challenging! Especially for undergraduates. A tool, dashboard to help us improve our writing. Have your paper analysed by a machine like this! (Xerox Incremental Parser)

Doctoral Consortium

Zin Chen, Purdue, Social Media Analytics framework for understanding students’ learning experiences. Using social media in the classroom, engagement, collaboration. Studying in the wild, how students are talking, emotional wellbeing etc. Are you using social media? Vote for my poster!

Wanli Xing, Interpretable student performance model in CSCL environment. New rule-based method, machine learning. Theory too. Next gen learning analytics!

Data and Design in online education. Michael Marcinkowski. COIL at Penn State. Use of data by designers and instructors in MOOCs. Dialogue between instructors and designers and students. How they use the data from students to design their courses. Qual research, interviews, trace data. MOOCs!

From Texts To Actions – Maren Scheffel, OUNL. Application of methodologies from linguistics to patterns of actions in usage data. Study she’s doing – participate in my study on quality indicators for LA!

Usage based item similarities to enhance recommender systems – Katja Niemann. Usage data in VLEs to enhance recommendations.

Identifying at risk students in a first year engineering course. Farshid Marbouti, Purdue. Predictive modelling, performance data. See which technique is more accurate. Also how accuracy changes during the semesters, how early can we have good accuracy in predicting grades, so they can do something about it.

Ani Aghababayan – Utah State University. Student affect, ITS. Digital games for learning widespread, but very few support systems; this study aims to inform the development of such systems. Frustration and confusion. LA and data mining as the means. No results yet, but time-based sequences of frustration, confusion as indicators.


Sponsors! Marketing manager for their analytics platform. D2L and research foundation. They have free mooses from Canada!

Shady Shehada. Excited to attend.

Crossroads, intersection of learning analytics research, theory and practice. Bridge ML and education. Some people so excited, they rented a car in Canada and drove here to Indiana.

John Baker, CEO, feels privileged to play a role in recognising the LA research community. Foundation to drive important decisions.

Committed to supporting the community, bridging the gap between research and vendor worlds – through support of the research community, and delivery of research-backed solutions to the market. Also to being a good vendor role model for LA research. And to supporting research in a cross-section of education areas, to transform teaching and learning.

Interested in ML, DM, text mining, NLP, social computing, visualisation. BI, analytics, usability, systems architecture, QA, product dev. All integrated.

Research program – Science Behind Success. Two programs.

Research & Field Study Program – kitchen where we bake research until ready to apply.

Early Adopter Program – is where it goes out first. Three products: Insights Student Success System, Degree Compass, LeaP.

ISS system predicts grades, as early as possible, so you can intervene. 2y ago talked about the predictive engine. How about we build it in house, give the model to the client, make it generic and predictive? Worried the assumption is not valid – can’t have one model that covers all courses for one client. So ended up: we provide the chef who cooks a plate for each course. Customised model for each course in each organisation. Challenges and efforts in doing this, especially with the data – cleaning for the model. Outstanding clients helped us bridge this gap.

Degree Compass – recommendation engine, personalised curriculum for students, based on interests and strengths.

Also LeaP, learning path: recommends a learning path and makes sure it’s adaptive by recommending efficient routes to reach their goals. One looks at the whole curriculum, one at each course, one within the course. Trying to combine.

Products based on research, more research going on. ML library to generate analytics, provide feedback about the product and how it’s used. Accessibility – not just standards compliant but also intuitive and easy to use. Research in content analysis, discussion, course content, using text mining/ML. Link this, with social graph, to the learning objectives.

Looking forward to LAK2015!

Zach takes over, shows the cards for voting. Tempting food or drink, until 4.30!


This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

LAK14 Weds am (5): Keynote, Conversation, Barriers panel

Dr Doug Clow's blog - Wed, 26/03/2014 - 17:14

Liveblog from the first full day LAK14 - Wednesday morning session.


Matt Pistilli welcomes everyone. We are international – 25 countries, all 6 inhabited continents. 140 institutions, 237 registrants – largest yet. 13 full papers, 25 short papers, 11 posters and demos, 9 doctoral consortium posters. Thanks to sponsors: blue canary, Purdue, Desire2Learn, Intellify Learning, SoLAR. Supporters: SoLAR, Wisconsin, Gardner Institute, Purdue. Thanks all those who have helped make it happen.

Abelardo Pardo takes over to talk about the submissions. 45 full papers, 35 short. Expanding in posters, demos, doctoral consortium posters.

Stephanie Teasley takes over. This is our fourth conference. Many – most – of the people in the room are first-time LAK attenders. We’re not from the same tribe, but what we can do together is bigger. Introduces Art Graesser, who is bridging boundaries already.

Keynote: Art Graesser
Scaling Up Learning by Communicating with AutoTutor, Trialogs, and Pedagogical Agents

Asks how many people are going to ITS in Hawaii – none. Educational Data Mining in London – a few more.

Thanks for inviting me. Discourse processes, building computer agents to hold conversations with learners in natural language. Apprenticeship learning, between the master and the apprentice. Industrial revolution, put people in the classroom. Information revolution, have conversations with computers. AutoTutor and similar systems.

Building systems where you can have agents that help students learn. They don’t have to be visible, could just be recommendations or suggestions in print, but I’m interested in the visible, embodied ones. How do you build these? AutoTutor has been built for 17y.

Challenge: Design dialogs, trialogs and groups of agents to help students learn that are sensitive to SCAM – social, cognitive, affective and motivational mechanisms. Not just cognition, knowledge and skills.

Functions of agents – help when learners ask, but people aren’t good at self-regulated learning and don’t ask for it. Conditional prob of seek help given they need it is about 3-5%. Also used as a navigational guide – complicated interfaces, agents suggest and help with next step. Pairs of agents modelling action, thought and social interaction – people learn by observing. Also adaptive intelligent conversational dialog participants, that’s AutoTutor, understand what the students say and respond intelligently. They can be peers, tutor, mentor, and others – demagogue, adversarial.

Overview of talk. Examples of agent learning and assessment environments. AutoTutor and other Memphis agents developed in the Inst for Intelligent Systems. Learning gains and emotions. Finally, trialogs in Operation ARIES and ARA – games being commercialised by ?Pearson.

First example. Lewis Johnson, Tactical Language and Culture Training System – learn the language in Iraq, also in French. Have a scenario with many agents that helps you learn a language, so you know how to socially interact. Speech recognition, game environment. 30 scenarios where you interact, learn the culture as well as the language. This won the DARPA 2005 outstanding tech achievement award.

Betty’s Brain – Gautam Biswas and Dan Schwartz. Try to create, as they learn, Betty’s Brain: a causal structure involving physiological systems. You build a concept map. You read in order to build this concept map. You can ask the system, Betty, questions, and see if she answers right; if not, modify the concept map. There’s another agent that can give you guidance if you’re lost. Then you take a test – quizzes; if perfect, you’ve constructed the brain right; if errors, you have to go back and modify it. Impressive learning gains, many layers of agents, activities.

Center for the Study of Adult Literacy study. In the US there are 70m adults not reading at the right level; they can’t get a decent job with a decent salary. Big center to help them learn better. How do you help them read? A portal where they can go and have training to help their reading – aimed at adults reading at the 3rd to 8th grade level. Interface, handheld iPad/Android, with agents around it. A tutor agent, and another student agent, who interact with the person – talking heads trying to guide and interact and help them learn at a different level. Maybe they don’t even know how to use a keyboard. Also practical activities, like reading a drug montage, asking them what the use of a drug is. Examples.

Video: visual hover cues and drag and drop. Who is the protagonist in this story? Click the name. “John, do you think that’s right?” “I don’t think that’s right.” (Quite slow and stilted language, it’s text-to-speech stuff.) Another sample: adaptive interaction, a game – Jeopardy version. (All quite simple yes-no type stuff.) Uses relevant examples, so things like signs, drugs, job application. There’s about 150 of these.

Testing these out across the country. Ready in 15 months.

Also doing K12 work, working with David Shaffer on AutoMentor in Epistemic Games. Human mentors, want to substitute having a computer mentor instead. Land Science is a computer game, urban design planning firm. Virtual site visit. Use a mapping tool, change the zoning of parcels on a map. They create a plan, and make a formal proposal. Live mentor conversation in a chat tool, who model how professional planners work, and push students to reflect on their work. One point, if students interact by themselves they might have fun, but you need a mentor to get them to do more productive stuff and justify their reasoning. Automated mentors will allow this to scale up.

From learning environments to assessment environments. Educational Testing Services, ETS, to make VR environments with the agents – three amigo environments – you the human plus two agents. You interact with each other conversationally. Question is, testing on English language learning, whether they can read signs and take actions reflecting the signs. E.g. carrying a water bottle past a sign saying no food or drink. There’s a doofus agent, and a smartypants agent. Hold the conversation through the world. This is how they’ll assess speaking, listening, writing and reading, all in a 20-30min interaction in the virtual world.

Idea is: a low-ability person has short, inaccurate or irrelevant responses, and violates social norms. Higher ability: lengthier turns, accurate, socially appropriate.

Also doing it in science. Example of volcanoes, placing seismometers, looking at data, making decisions, responding to alerts that come up. That’s science assessment in the future. Interacting with agents as you wander through the world.

VCAEST doing this in the medical field.

This is reaching PISA and PIAAC – Program for International Assessment of Adult Competencies. Assesses countries by literacy. Some countries invest in education based on placement in this list, if go down may put more money in.

PISA – Program for International Student Assessment – 15yos – US low on maths literacy, but average for literacy.

Focus on problem solving in these – theoretical frameworks on how to assess problem solving. One on collaborative problem solving, to assess it, have two or more agents. It’s a process where two or more agents (human or not) solve a problem by sharing understanding and effort to come to a solution.

Case is: agents have permeated the world, starting with 1:1 conversation all the way to assessment at the international level.

Was skepticism – how can you have a computer simulate a person. Computers are different from people; people are different from people.


First system they built, to help students learn by holding a conversation in natural language. Have to comprehend it and adaptively respond. Started out analysing what human tutors do: 10y of research analysing what actual human tutors do, videotaping, analysing in excruciating detail. Don’t want to do it again. Looked at tutors in middle school, college; tutors in math, science; published studies.

Some of the things they don’t do: fancy pedagogical strategies – e.g. Socratic tutoring, modelling-scaffolding-fading, reciprocal teaching, building on prerequisites, sophisticated motivational techniques, scaffolding self-regulated learning strategies. We just came up empty (looking for these).

Tutor communication illusions: they don’t have a deep understanding of what students know; it’s approximate. Tutors don’t ground-reference; feedback accuracy is not good at all – typically positive feedback after major errors rather than negative, because it’s polite. Not aligned in discourse – more that the tutor throws out information. Tutor thinks they articulate something that’s understood, but it often isn’t. Not even very good tutors are effective at this.

So think we can do something with a computer tutor.

AutoTutor – main question, requires 3-5 sentences to answer. Have agent, student inputs the language. Two versions – one speech recognition, one typing, doesn’t matter.

AutoTutor asks questions, presents problems, evaluates meaning, feedback, facial expressions, nods and gestures, hints – very difficult, prompts, adds info, corrects bugs/misconceptions, answers some questions – but it’s not good at that; beyond computer capabilities as yet. But students don’t ask many questions.

Example in physics, chosen in part because hard to find teachers of physics. 3700 teachers in Memphis, 0 who have majored in Physics.

Question – e.g. does the earth pull equally on the sun, and explain why. Not just the answer, but the explanation. Have expectations – a set of sentences you hope they articulate – and also misconceptions you may see. It may take 50 to 100 turns back and forth to get there; while that happens, all the vague, scruffy, fragmented information from the student is collected, compared to the expectations, and responded to appropriately. They do have parsers, but they’re not very useful – over half the utterances are ungrammatical anyway. (!)

Main question, answer; also pump, give hints when missing something, or prompt to get them to say one word. Students often don’t say much. AutoTutor is designed to get the student doing the talking, not the computer doing the telling. Maybe they give misconceptions; correct them.

Need to classify speech acts – asking a question needs different response to expressive evaluations, or ‘I don’t know’.

Managing one turn in a conversation is of interest. Follows a structure: typically just responds with short feedback – yeah, right, okay, uh-huh, no, not quite. Then advance the dialog, then end the turn with a signal that transfers the floor to the student. (Needed to keep it going.) Question, hand gesture, other.
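(That three-part turn structure could be sketched like this. The category names and all wording are invented for illustration – not AutoTutor’s actual rules or output:)

```python
# Sketch of the three-part tutor turn described above: short feedback,
# a dialogue advancer (hint / prompt / assertion), then a floor-transfer cue.
FEEDBACK = {"positive": "Right.", "neutral": "Okay.", "negative": "Not quite."}

def compose_turn(evaluation: str, advancer_text: str) -> str:
    """Assemble one tutor turn: feedback + advancer + floor transfer."""
    feedback = FEEDBACK[evaluation]
    floor_transfer = "What do you think?"  # hands the floor back to the student
    return f"{feedback} {advancer_text} {floor_transfer}"

print(compose_turn("neutral", "Consider the packet's velocity at release."))
# → Okay. Consider the packet's velocity at release. What do you think?
```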

Demo of this – the fishics tutor – Big Mouth Billy Bass doing the talking to a human. Absolutely hilarious seeing the fish saying “What else can you say about the packet’s velocity at the point of release?”. Eliza for learning – easy to give the illusion of comprehension. (This is actually pretty good that way, I’ll totally buy that level of claim for it.)

iSTART as an example, Writing-Pal, iDRIVE – ask good questions (most questions are shallow), Philip K Dick android, HURA advisor on ethics, MetaTutor on self-regulated learning, Guru for biology, and DeepTutor for physics. Also including simulation.

The learning – we’ve assessed these systems in many studies. Meta-analysis: human tutors have an effect size of 0.42 even when unskilled, compared to ?school. AutoTutor 0.80; ITS about 1.00 – more realistically about 0.8. Skilled human tutors: not enough studies. Meta-analysis of skilled human tutors – there are fewer than 20 studies.

Favourite study, in physics: random assignment to four conditions – read nothing, read textbook, AutoTutor, Human Tutor. Learning gains – Human Tutor and AutoTutor well in the lead, human just edging it. Read nothing beat read textbook! (NSD) Test was the force concept inventory, which requires deeper reasoning. Do learn shallow knowledge from the textbook, but not deep.

Replicated in other areas, computer literacy, critical thinking – again, do nothing is same as read textbook, if you assess for deeper learning.

AutoTutor 0.80 sigma compared to pretest, nothing, textbook; 0.22 compared to reading relevant textbook segments, 0.07 compared to succinct answer to question. 0.13 compared to delivering speech acts in print, 0.08 to humans in CMC. No real difference between AutoTutor and a human tutor. -0.2 vs enhanced with 3D simulation (i.e. adding that is better).
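(For reference, those “sigma” figures are standardised mean differences. Cohen’s d from group summaries looks like this – the means and SDs below are invented, chosen only so the result lands at the 0.8 quoted above:)

```python
import math

def cohens_d(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Effect size: standardised mean difference using the pooled SD."""
    pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                       / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled

# Hypothetical gain scores: AutoTutor group vs read-textbook control
print(round(cohens_d(0.48, 0.25, 60, 0.28, 0.25, 60), 2))  # → 0.8
```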

Tracking mastery of a core concept over time and tasks – see whether a student masters e.g. Newton’s 3rd Law reliably over 10h of training. Map the concept over time, see whether correct or not. All-or-none learning is rare – going straight from not learned to learned is a fiction, rarely happens. It’s more variable, contextually influenced.

Emotions and learning

Track those, look at a lot. Looked at a mixture of learning environments – Incredible Machine, others. Big six emotions: boredom, confusion, delight, flow, frustration, surprise. Not much happiness, sadness, rage, fear – these are learner-centred emotions. In assessment environments, also get anxiety. Track those, have AutoTutor be sensitive. Confusion is the best predictor of learning. That’s the point of a teachable moment.

Track these automatically – face, speech, posture, dialogue. Most of these can be done by just 2 channels – discourse channel, and the face – esp at the point of confusion, and dialogue history, that helps you ID the emotion. Most systems have cameras so this is possible.

If you take discriminating boredom vs flow vs neutral, the face is not where it’s at. Face is the same for all of those. Teachers think students are in the flow experience when the student is bored – that happens a lot. Teachers aren’t trained on this.

Half-life of emotions, dynamics – e.g. drone who’s always bored, hopelessly confused, emotional rollercoaster – have tutor respond.

One experiment compared an unemotional AutoTutor to a supportive, empathetic, polite AutoTutor, vs a rude shake-up AutoTutor, confrontational and face-threatening. Many votes for the shake-up tutor – adults like it. Best to time it: go for standard until problems. After 30 min, go for the affect-sensitive one for low domain knowledge; shake-up after 1h for high domain knowledge (maybe).

Operation ARIES and ARA

Trialogs in learning – two agents plus a human. Three types: low ability, vicarious learning; medium ability, tutorial dialogue; high ability, teachable agent.  Want to know which agent is right at what time.


Jerry: Curious about how you gather content to feed this system, the efforts of a real live teacher moving in the direction of training these systems?

Art: Where do you get the content? Build it aligned to standards, Common Core, variety of others. Practical world: teachers aren’t going to use this unless it deals with NCLB etc. Other answer: we have authoring tools; right now spending time perfecting them to where a human teacher can build this content easily. We’re almost there. They build it in English; they don’t have to be programmers. One step requires expertise – building regular expressions, symbolic computational linguistics. Vision is you have people create these materials in English, questions and answers, then a computational linguist comes in to form the regular expressions.

Q: Efficacy of the 3D element, better than AutoTutor. Personal VR headsets – Oculus Rift bought by FB. Presence rather than gaming may be the killer app for VR headsets. With Facebook involved, could AutoTutors merge with humans – a hybrid, tapping the retired demographic combined with AutoTutors?

Art: Yeah. These are like Google Glass – they don’t interpret the environment, they just store it. A lot of what goes on in the real world is uninteresting. Videotaping a party in California with a stationary camera: it was a great party, but watching it, just a bunch of people at a table talking – could only watch it 5 min because it’s boring. A lot of the environment is unexciting. Human tutors can help you interpret it. How a tutor would, as you watch a world, be commenting; could automate it – computer recognises something and suggests on that basis. Computer tutors can help human tutors – if they can interact with a system and see it, good for professional development. 900 tutors analysed; a lot of them – most of them – aren’t well trained. The computer can help for professional development.

Stephanie: Take in the answer, don’t watch the party, be the party!

1A. Discussion with Art Graesser

Q: What’s the end game with this? Closing 2 sigma gap? agent on in every remedial institution, solving the scale problem? Where’s it going?

A: Many directions. Mini Khan Academy thing – look how many people used that. Khan-style clips with agents, each one might last 5 or 10 minutes, out there for anyone to use. Another is more coordinated, a course like MOOC-based courses, or even commercial courses. I like the smaller chunks of stuff, dynamically used, adaptive. Could be free to anyone, or part of a commercial course.

Q: What’s the scale – 1000s of them? Physics, takes 100 units of work, American History, how much work is that? 99 units? Or what?

A: Working with Army Research Lab on scaling these up – Generalised Intelligent Framework for Tutoring, GIFT. First book has appeared. Next one is authoring tools, then assessment, then scaling up. Authoring tools are the key challenge: how to get people in any different field to author these materials. Have a script, licensed by ETS and others, to get people who are not computer scientists but subject matter experts to create these materials in English.

Two major challenges; can get them to do a lot. One is the regular expression problem, the other is rulesets. Examples – you might want introductions. Suppose you want people to introduce each other in a group: the number of rules that guide that is about 40. For four people – say what’s your name, what if someone doesn’t volunteer their name, or gives no answer – it takes about 40 rules for just 4 people. A normal materials developer is not able to unravel those 40 rules. So now we have about 20 conversational modules where they can just drag that in there, with all of those rules, and link them to referents. Maybe that’ll be easier; that’s what we’re developing now. Another: asking a deep question and having an answer. 20 conversational modules they can drag and drop and then tweak.
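(The “regular expression problem” here is matching an expectation against free-text student input. A minimal hedged sketch – the pattern and example sentences are invented for illustration, not AutoTutor’s actual expectations:)

```python
import re

# Hypothetical expectation for "Does the earth pull equally on the sun?":
# match paraphrases of "the forces/pulls are equal".
expectation = re.compile(r"\b(forces?|pulls?)\b.*\b(equal|same)\b", re.IGNORECASE)

def covers_expectation(student_turn: str) -> bool:
    """True if the student's utterance matches the expectation pattern."""
    return bool(expectation.search(student_turn))

print(covers_expectation("I think the two forces are equal and opposite"))  # True
print(covers_expectation("the earth is bigger so it pulls harder"))         # False
```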

Maria: Human tutors aren’t as good as you thought they were. How do you get them to develop the rules that are better than what humans do?

A: If you do have tools, with good exemplars, they can learn from that. We build these systems to help students learn, fascinated by the possibility of teacher training, professional development. Can do this subtly – you’re a teacher, want you to critique our materials. But in the process, they actually learn new expertise. That’s a model we’re thinking about for professional development. Mark, dealt with 900 tutors in Memphis with no training. There’s evidence that some tutors do more harm than good. Question is how you get them trained. This is one way of thinking about it. You have the authoring tools, ask them to advise us how to improve them. They learn from that while they’re doing it.

Q2: One opportunity – talking heads reading from a script but no captioning. Was that a design decision? Make the experience more like a person, rather than accessibility?

A: What would go on the caption?

Q2: Dialog caption, like closed captioning. You could have the text the tutor is saying.

A: We do have histories and captions in some of them. Two answers. One, there is some research that having both printed and spoken versions can create a split attention effect. They don't work well with each other. For learning materials, a spoken voice-over alone is better than having the print up there at the same time. We've done experiments on that. It could turn out the opposite – two different modalities better than one, some like spoken, some like reading, or they reinforce each other. However, some research indicates it's not so good. It's an option. We struggle with this with our adult learners. If it's all spoken, they won't learn to read. Worry that sometimes we put the stuff in print, and the agent says take a look at this sentence – that forces them to read it, but if they can't, they'll articulate it orally. Right now struggling with that issue. It's a big issue.

Q3: How many international students do you have? They have accents. Have to translate their own language. When I used voice machines at the bank, they don’t understand. Oral tutors, so …?

A: We're doing work with Educational Testing Service, which worries a lot about that. It's English language learning assessment, so they have to worry about it. They have all possible languages and dialects and how to accommodate them. Provocative finding in speech recognition: if you talk naturally, recognition is better than if you try to emphasise things. Overcompensation can harm recognition in speech recognition systems. On a different project, we're looking at speech recognition in classrooms between the teacher and students. Tried many forms of speech recognition; Google is the best and most resilient to different dialects and languages. It is an important issue. Google has the best system, and ETS have to worry about it in a high-stakes way to make sure nobody is compromised.

Leah: Language, international issues. Question about slicing the ideas in a different way, thinking about cultural difference. Cultural use of technology. I was aware as you gave examples, the rules for introductions, which vary within even English-speaking cultures. Rules for introductions for NE US students probably wouldn't work well in Scotland. Or styles of feedback. We both thought Billy the Bass said it was right when the students gave the wrong answer, because we misinterpreted the response. Done any work on this?

A: We have. Short feedback and politeness is a big part of it. There’s a tradeoff between accuracy of feedback and politeness. We’ve struggled a lot with this. We’ve found in tutoring that lots of tutors are polite, so when student says something wrong, they say “Okay”, not the verbal form, it’s the emotional reaction. “Yes!” vs “Yeeeaaaah” (drawn out), really that’s negative feedback. Students vary in how responsive they are to that. The intonation, we’ve looked at short feedback in excruciating detail. Cultural differences too. Japanese business transactions, in Japan, when they say yes, that’s being polite.

Leah: Depending how it’s said.

A: US people think they made the deal, the person from Japan was just being polite. These are all issues. In NY city, can tolerate a lot of negative feedback. In parts of the South, you’d never say that. These are all issues. We’re approaching it, in the beginning stages, see how they respond to different types of feedback, and if their emotions change and the seem insulted, shift to more neutral or positive feedback, and give more content. What if you have all neutral feedback, with conceptual fb to guide them better. Do you really need the valenced pos/neg part, could be just ‘Okay, I hear you’ – but many students want the feedback, just tell me right or wrong! Different styles. Haven’t built one that’s perfect yet. Want to be sensitive in the early phases of the conversation, how their emotions respond, and adjust accordingly. Sometimes you want to push the envelope and be abrasive, face-threatning.

Caroline: Mastery of core concept over time. Integrate in to that about where they should be at 1st, 2nd stage; is there commonality?

A: Imagine in physics, 100 core concepts. During course of physics tutoring, we track those.

C: have some concepts before you go to the next stage

A: Yes, common concepts and misconceptions. We have a big list. As you give a new problem, it may lend itself to some core concepts and not others. But if it’s on deck, maybe concepts 12, 42 and 98 are relevant, you see by their acts of articulation/actions whether they’re correct. Can keep that big map over time. Hopefully, statistically they’re going to be honouring those over time. We have a big grid of correct concepts and misconceptions. Over time as they work on different problems, track it. Operation ARRA is on critical thinking in science. We have 21 core concepts – e.g. correlation does not mean causal, you have to operationally define your variables. Tracked over 20h of the game, there’s 6700 measures. Funnel those in on how much students get that accurately, apply it when a concept is relevant, how much they can articulate. How much time it takes to instantiate the right concept. We look at time on task, discrimination, generation, scaffolding required, over a 20h period.

Q4: A new set of data around students and their reaction to their learning experience. Facial recognition really exciting for me. Is that done automatically? Are there tools that would code these human emotions dynamically? Or is it human coders going back in?

A: Early on, we used human coders to train the automated classifier. It homes in only on confusion. That's the most important. We don't have a general one. Looked at assessments on the automated one. There are two emotions worn on the face, confusion and surprise. Surprise you can't do something about – maybe you want to keep them surprised. (laughter) But confusion – there's a zone of optimal confusion. That may vary with their traits – some you might want to keep confused. Others will get frustrated, and then move in to boredom, and they tune out. Want to be dynamic and adaptive. Confusion we have nailed; we don't worry about surprise. Frustration is usually hidden. The prefrontal cortex inhibits it – if you start throwing things around people don't like that, so you hide frustration. There's a smirkiness to frustration. Dialogue gets to frustration quicker, so you rely on the dialogue history and action history for that (e.g. hitting buttons quickly). Then there's boredom and flow; we can detect flow by fluent activities. If your productivity is going, on a roll, you can pick that up. Boredom, we're trying to get a good detector. Timing is relevant. We have a nice boredom detector based on the decoupling of the complexity of the material and the time you spend on it. If you're reading, it takes a certain amount of time. If they're real quick and it's real complicated, they're disengaged; want to do something, ask a question to get them re-engaged. Each one has a special case. The one that needs the face is confusion.
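The "decoupling" boredom signal just described could be sketched as follows; the reading-rate constant, difficulty scale and threshold are invented placeholders, not figures from the talk:

```python
# Illustrative sketch: flag disengagement when time spent is far below what
# the material's complexity predicts. All constants are invented.

EXPECTED_SECONDS_PER_WORD = 0.25  # placeholder reading rate, not from the talk

def looks_disengaged(word_count, difficulty, seconds_spent, threshold=0.3):
    """True if the student spent much less time than the material warrants.

    difficulty: multiplier >= 1.0 for harder material (invented scale).
    threshold: fraction of expected time below which we flag disengagement.
    """
    expected = word_count * EXPECTED_SECONDS_PER_WORD * difficulty
    return seconds_spent < threshold * expected

# A hard 400-word page skimmed in 10 seconds -> flagged as disengaged.
print(looks_disengaged(400, difficulty=2.0, seconds_spent=10))  # True
```

A real detector would estimate the expected time empirically per resource rather than from a fixed rate, but the shape of the signal is the same: complexity and time decoupled.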

Q4: Our courses are async, online, don’t have students f2f, but that’d be another dimension we could just track with a camera.

A: Lots of systems have a camera, not quite ubiquitous. There are elements of the dialogue that you can detect confusion, but the performance is not so good.

Q5: Across cultures, found mapping between facial expressions and emotions to be consistent?

A: Ekman's work on universal emotions – fear, happiness, surprise – some say they're more culturally specific. I'm convinced that some of the emotions are universal. Confusion is one of them. When you're confused, you (wrinkle your brows, eyebrows), like you're looking at things more precisely. Like reading something in small print. Confusion in the face has vestiges of that; maybe there's biological history to it. Surprise is universal. You don't have to train a child to be surprised. Wider eyes. That's universal. Things like how to deal with frustration are socially influenced. As before, people often hide frustrations if they're properly socialised. Maybe in other places it doesn't matter, perhaps in NYC. Boredom – people try to hide their boredom. Some people just (flop, slump). They haven't been socialised in hiding their boredom. Other people fidget. It'll be culturally sensitive.

Q6: Zone of optimal confusion. I know you’ve done work on creating confusion. What about learners who don’t (thrive on that; I missed a bit here).

A: We’ve published on this. One is, it’s good to plant confusion. Contradictions between agents – that act can cause higher learning. There’s a role of confusion, whether it’s a mediator or direct cause is under debate. Creating cognitive disequilibrium is good. Let’s say they’re confused. Question is how long to leave them there. If they’re an academic risk taker, high self-efficacy, you want to prolong the confusion and hope they solve it on their own. For those who may be more impetuous, low self-efficacy, you may want to come in there sooner with a hint. Right now we don’t know. We don’t know the exact parameters and the timing of how to manage the confusion productively. We want to create circumstances for confusion. Impasses, problem solving, want them to self-regulate to handle any situation at any time. But there’s a long route to that point. That’s at the horizon of research right now. We know confusion helps, best predictor of deep learning, not shallow. Manipulating can help that process. Finding traits of their cognitive aptitudes, how that interacts, we’re trying to figure that out right now.

Q7: Can learners tell agents shut up, you’re confusing me?

A: Interesting. Some frozen expressions – like shut up, get out of my face – store those and know how to respond. If you think they like you, you can come back and say, you shut up and listen to me. But if not, you can back away. Need production rules for handling all that, to accommodate the situations. We haven’t had circumstances where the human has actually said that. We have had students who get up and yell at the agent. But they haven’t said shut up. But have said “you’re wrong, you’re wrong!”. If you can get to that level of animation, you’ve succeeded in some sense, but what you do about it is a question. There’s a role of a bantering, playing interaction with the agent. Cultivating that would be great. For engagement, a typical prude tutor, very matter of fact, very factual, just gives the right answer, affect blunt. That gets boring after a while. However, if you have a way of engaging, playing with them that can keep the dialogue alive. How to do that adaptive to their personality is a key part. Different styles of tutor, easy to change with short feedback. Have a prude tutor, a rude tutor – the adults like the rude tutor more. You could have crude tutor. You could have a nude tutor, learn by striptease.  A lewd tutor. (?!) It’s not too hard to modify the system for these styles. Our IRB probably wouldn’t allow the nude and the lewd tutor. You can imagine that tailoring it to their style would be good for keeping the dialogue going.

Q7: Strip poker tutoring!

Q8: When tutor gives feedback, does it take in to account the mastery of concepts, and if it does, how does an educator define those rules when they’re building content?

A: Works fine if the system already has expectations and misconceptions. Novel things generated by the human that are not on that list, it's not handling. Someone has researched human tutoring: human tutors don't pick up on novel things from students either. Poses an interesting question – how does a tutor learn? I'm convinced they learn from experience. What they do: they get a case they haven't figured out, take it home and puzzle it out, realise the student had a misconception; it takes time to deconstruct it, then they've learned from that. Always wanted to build an AutoTutor that learns from experience. There's so many projects we haven't been able to pursue it. I'd like it to learn. E.g. when it didn't do a good job with a student. Find the features it's missing, add them to the tutor. Or even do it automatically. Students may want to articulate Newton's law, F=ma; they can express it in many ways. We have LSA and other analysers to analyse the meaning. If it matches at a high enough threshold, can store it as another exemplar, so over time you get a bigger corpus of ways to express F=ma. Then periodically reorganise its semantic space, so the AutoTutor is learning. If it's a new concept from whole cloth, not clear how it'll get that. Human tutors don't get it; takes a lot of reflection.
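The exemplar-growing idea could be sketched like this. AutoTutor uses LSA vectors; purely for illustration this uses a crude bag-of-words cosine, and the threshold is an invented placeholder:

```python
# Hedged sketch: if a student's phrasing of F = ma scores above a similarity
# threshold against known exemplars, store it as a new exemplar. Bag-of-words
# cosine stands in for LSA here; 0.5 is an arbitrary illustrative threshold.
import math
from collections import Counter

def cosine(a, b):
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)  # Counter returns 0 for missing words
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def maybe_add_exemplar(exemplars, answer, threshold=0.5):
    best = max(cosine(answer, e) for e in exemplars)
    if best >= threshold:
        exemplars.append(answer)  # the corpus of ways to express F = ma grows
    return best >= threshold

exemplars = ["force equals mass times acceleration"]
print(maybe_add_exemplar(exemplars, "force is mass times acceleration"))  # True
```

The "reorganise its semantic space" step would then correspond to periodically retraining the LSA space over the enlarged corpus.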

Q9: More on educational psychology. Lot of work in the animal behavioural world about interspecies different personality traits that may be different evolved survival strategies. Some people more reflective than others. Talked about individual learner differences. I believe more we understand neurological differences, we’re going to do a better job of handling them. Are you familiar with this? Very little overlap between this work, there really are people wired more as hunters than gatherers. Few bringing this to learner realm.

A: The military looked in to learning styles, found they didn't predict learning; it ended up being a dead end. Not clear they looked at the right style – e.g. visual or verbal, that didn't predict much. Might be a style based on biological proclivities, whether you're more an exploratory type – maybe that's a hunter. Some people have higher situational awareness, bigger span of attention; that's different from people who are more focused. Does have a relationship to emotions. When in a positive mood, have broader span of attention; negative affect, it's more narrow. Mood can predict whether you're going to do well on a task. If you need precise analytical concentration, maybe a negative mood is different. Biological analysis of species, similarity – I've listened to talks but not done things directly myself.

Stephanie: Excited about info about learners when they come in to the class. Example where appropriate style depended on higher or lower skill students. Future where tutors ingest data we know about students before the task – the tutor could know someone’s math SAT score, high school GPA, which courses already taken. A lot of data we could feed in to personalise the learning. What level can the tutors handle this information?

A: Great question. Computers will do a lot, lot better than humans. It's the combinatorial problem. Say 10 attributes, high or low – that's 2^10 options. Computers can track that, but humans can't do 30 students times 2^10. They may end up perseverating on just a couple. Personality theory: people see people on only 3-7 dimensions.

Stephanie: Human tutors aren’t the gold standard. Humans aren’t going away. Is the increasing ability to input this data about the learner help to get the tutoring system to learn? More personalised?

A: I do know the army is interested in this, the Dept of Defence. They want to keep history of the learner, personalised profile based on the past. Want to store it, use it to guide them in the GIFT tutor. Also promotion, career development, use that information to recommend the next step. Making use of this information would be tremendously useful. Hasn’t  been enough research to see how that knowledge helps. Have seen what human tutors do. Digital teaching platforms: you have all the information there, would think teacher could adapt, but they’re so bewildered they don’t have time to do it. That’s where the computer can help. The whole history, digital learning portfolio, that’d help. It’s stuff to be mined and minded.

Q11: Combinatorial explosion, only works if you can list the important factors. Humans good at abductive, incorporate new factor on the fly. How do you specify the right factors ahead of time? Either going to invent abductive reasoning in a computer. Or what are the factors going to be?

A: I wanted to say, take humans analysing other human’s personalities. Research finds any one person only uses 3-7 attributes to classify people, you could use 100.

Q11: Humans are good at it!

A: Are they? They agree about it. Have implicit personality theory. Classic example: descriptions of others reflect more the describer than the target. The accuracy of these is suspect. When a teacher evaluates a student on attributes of the student, there's a question of what they're looking at, and what the accuracy is. I'm not convinced their accuracy is spectacular. It's a small set, and the accuracy is suspect. Computer can do better. What we know from research on tutoring: students can express misconceptions, and tutors miss it most of the time because it's not on their radar. People think human tutors can pick up so much more, they can work with them individually. That's not what the data say.

Q11: Human tutors can pick up things that are not there, outside the system.

A: My claim is human tutors do that every once in a while, but they don’t do it frequently. The fact they do it some puts them at an advantage. Takes a very vigilant human tutor to say, I realise now that student has this misconception, I’m going to pay it more attention in the future. I don’t know whether a computer can do that. Maybe, if you apply certain data mining procedures, would it deduce new misconception it didn’t know about that maybe it should. Or maybe it’s not capable.

Q12: What limits do you use for agent tutoring, if any? Helping students write better. Can the system help with those procedural tasks?

A: My colleague Danielle McNamara (?) worked on Writing Pal, which helps them with that. Also looking at emotions that help them write. How many people here really enjoy writing? (A few.) How many find it excruciatingly painful? (Many.) Especially under time pressure. Group activities, summarise in 10 minutes – that is hard. It's a good place to study emotions. Writing Pal helps people. LSA essay graders, very promising. The current essay graders do it as well as expert graders. Anyone skeptical about whether computers can grade essays well: if you don't know they do it very well, you're not reading the right literature. ETS and Pearson (?) have *very* good ones. Students spend 2.5x more time just with simple feedback. Analyses cohesion etc, not just mechanical spellchecks. Used to be, you turn in a paper and by the time you get feedback you've forgotten it. But here it's immediate; can also figure out if people are gaining. That's very motivating.

Caroline: Assessment would move on to a dialogue, to show the best they can do rather than one-off first time?

A: I hope so. In the future, you'll never know you're being assessed. It's always learning and it's adapting; you don't have to go to places where you're Being Tested. Whether it does is an interesting question. Sometimes high stakes is good, people are at their best. But for most of it, no reason not to track, open environment, see how good they are, improve themselves. That'd be good. Are we there yet? As long as there's politicians I don't think so.


2A. Panel. Setting Learning Analytics in Context: Overcoming the Barriers to Large-Scale Adoption. Rebecca Ferguson, Doug Clow, Leah Macfadyen, Shane Dawson, Alfred Essa, Shirley Alexander

[No liveblogging - I was on the panel! But lots of interesting questions.]

This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.

Doug’s blogs from LAK14

Martyn Cooper's blog - Tue, 25/03/2014 - 21:04

I have not made it to LAK14 but am following the conference through Doug Clow’s blog posts.  I commend them to you if you are interested in Learning Analytics.




LAK14 Tuesday pm (4): Machine learning workshop

Dr Doug Clow's blog - Tue, 25/03/2014 - 20:07

Liveblog from the second day of workshops at LAK14 - afternoon session.

Machine Learning & Learning Analytics Workshop – afternoon session


Discussion of papers in previous session – small groups

Three themes:

  • Data driven vs theory driven and role of ML
  • Moving focus from learning outcomes to learning process
  • Incorporating pedagogical and contextual information into ML models

My group: on process.

ML models predict pass/fail; more interested in figuring out which ones I can't predict. Factors you can do something about vs ones you can't. Sensitivity analysis? Changing the pattern of activity. Metacognitive ability. Is directing a student to an effective resource the same thing as a student finding their way to it? Possibly not (underlying variable of learning ability). Exogenous variables too – hard to know what to do with them. Prior knowledge. ML still heavily content-centric; not so much what students' current knowledge graph is; maybe start with knowing the learner better. Need to get in to more detail of the learner's process. What if the process is the outcome? What do you tell students to do when they go to a resource? Not just individual process, but the social one?

Feedback from groups:

We haven’t hit the goldmine yet – what’s the reason? Process vs outcomes – what scope of outcome? Learning outcomes are the same as learning processes, just at a different level of granularity. (!) Behaviourism, ways of perceiving things.

Phil: Learning styles! [was mentioned in feedback] It’s clear people believe they have them, also clear that what they believe has little correspondence to what they do or achieve. So phrase it in terms of what people believe about their learning styles, and how that may affect performance.

Chris: Learning styles for a digital age: clustering activity of student behaviour, say these people are alike.

Doug: Approaches to study, and how they are contextual, not a stable orientation across contexts. But great to do data-driven empirical enquiry.]

Q: It's widely known, very actionable. Does the LA community imagine something that could reach the same level of success, about the wide range of knowledge?

George: This is things to think with, like the digital natives stuff. Once you're done thinking with it, it's a limiting factor. Developing and advancing the learner. Learning styles research: you play to strengths. The ideas of improving instruction, advancing learners – as long as it's data driven and we get feedback on how well we're doing it.

Chris: Will LA and DM lead to theories that are like learning styles in that they are broadly applied and understood. Not necessarily learning styles. But a lot of LA couches things in current educational thought and theories; but could use analytic qualitative methods like grounded theory. Even if implication for research is we’ve shown this works in the MOOC, we build a theory from the data, rather than pulling in a theory from elsewhere and say partially confirms it – maybe it partially doesn’t confirm it.

George: Issue simmering, haven’t got in to it yet. Role of theory and data. Camp that argues that maybe we don’t frame our data exploration through a construct in advance of it, we devote more effort to what emerges. It’s discomfiting. Peter Norvig, power of data – the unreasonable effectiveness of data. Anderson in Wired, maybe the challenges of data/theory need to change to more data.  Still a bit far off from that, we’re still arguing about learning styles.

Q2: Do we need taxonomies of different types of courses. Learning styles, they’re so easy to interpret. If you can find a learning approach, take in to account all the contextual factors, the size of the course, do we have something like that.

Dragan: Lots of research on learning design, some people documenting it. Martin Weller presentation at the MRI conference on coding types of MOOCs. Still not much understanding; capturing the pedagogical context, and intent, is guiding students. On the data-driven side we do have some educational evidence – for learning styles the effect size is so small. Already shown it's not effective. Metacognitive skills, motivational strategies, approaches to learning.

Q3: Pulling up learning styles is a straw man. Data driven is so much better! But the straw man sucks! One of the constraints of a data-driven perspective – seeing a goldfish and a dog, given instruction to climb a tree, see some of them are not doing it. If the goal isn’t effective, the data isn’t helpful about how to change it. There is a level of IDing educational theories not proven to be ineffective, that’d be (useful to deploy).

Automated Cognitive Presence Detection in Online Discussion Transcripts.
Vitomir Kovanovic, Srecko Joksimovic, Dragan Gasevic and Marek Hatala

Vitomir presenting.

Automating analysis of content of discussion transcripts. Specifically, cognitive presence – community of Inquiry framework. Goal to see how learners are doing, could be good to construct interventions, give feedback to learners on their contributions.

Asynchronous online discussions. Community of Inquiry framework – three constructs - social presence, cognitive presence, teaching presence. Cognitive presence is phases of cognitive engagement and knowledge construction. Well established, extensively researched and validated; content analysis (?by hand) for assessment of presences. Cognitive presence – “extent to which participants are able to construct meaning through sustained communication”.

Four phases – triggering event (issue, dilemma, problem); exploration; integration; resolution. Coding scheme for content analysis; requires expertise. Extensive guidance on the coding instrument, need domain knowledge too.

Text mining as an approach; text classification. Work on mining latent constructs. Most commonly used features, e.g. lexical and part-of-speech N-grams, dependency triplets. Commonly Support Vector Machines and K Nearest Neighbours.

1747 messages, 81 students. Manually coded by two human coders, 98.1% agreement, Cohen’s kappa 0.974. (Wow, that’s impressive for any human coding task.)
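For reference, Cohen's kappa corrects observed agreement for the agreement two coders would reach by chance given their label marginals. A minimal sketch (the toy data below is invented, not the paper's 1747 messages):

```python
# Minimal Cohen's kappa for two coders; pairs of labels are toy data.

def cohens_kappa(pairs):
    """pairs: list of (coder1_label, coder2_label) tuples."""
    n = len(pairs)
    observed = sum(a == b for a, b in pairs) / n
    labels = {l for p in pairs for l in p}
    # Chance agreement: sum over labels of the product of each coder's
    # marginal proportion for that label.
    expected = sum(
        (sum(a == l for a, _ in pairs) / n) * (sum(b == l for _, b in pairs) / n)
        for l in labels
    )
    return (observed - expected) / (1 - expected)

# Two coders agreeing on 9 of 10 binary judgements:
pairs = [("x", "x")] * 5 + [("y", "y")] * 4 + [("x", "y")]
print(round(cohens_kappa(pairs), 3))  # 0.8
```

On this toy data 90% raw agreement yields kappa 0.8, which shows why the paper's 98.1% agreement translating to kappa 0.974 is so strong.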

Same corpus. Feature extraction – range of N-gram techniques. Dependency triplets – captures connections across larger ranges of words. Also ‘backoff’, where you move one part of an N-gram to the part of speech – e.g. is to <verb>. Other features too – number of named entities, first in discussion, reply to the first. Linear SVM classifier. Java, using Stanford CoreNLP toolkit. Classification using Weka and LibSVM. Java Statistical Classes.
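A minimal sketch of the word N-gram part of the feature extraction (the paper's actual pipeline is Java with Stanford CoreNLP, and also includes POS backoff and dependency triplets, which are omitted here):

```python
# Illustrative word N-gram feature extraction over a message.
from collections import Counter

def word_ngrams(text, n):
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def features(text, max_n=3):
    """Bag of 1- to max_n-grams, as counts, for one discussion message."""
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(word_ngrams(text, n))
    return feats

print(word_ngrams("what causes the triggering event", 2))
```

The "backoff" variant would replace one token of each N-gram with its part-of-speech tag (e.g. "is to &lt;verb&gt;"), which needs a POS tagger such as CoreNLP and so isn't sketched here.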

Results: got accuracies up to high 50s%; Cohen’s kappa around 0.4. Backoff trigrams were best; but they had lots & lots of features there. But entity count, is first, or is reply to the first were moderately good, and are only a single feature each.

Working on nested cross validation for model parameters. Best accuracy 61.7%, kappa 0.43.

Plans for future work too, to improve the accuracy. Move away from SVMs – not giving any clues to interpretation; good for classification but not interpretation, and can't get probabilities for classification. Logistic regression, boosting, random forests; which features are important.

All the features are surface-based.


Dragan: Why did you remove the resolution phase?

When we removed the resolution phase – the focus of the course was on integration, so removing that one made sense. It was not really important in this context.

Q: Accuracy?

Yes, human coders had coded the messages, so we could compute classification errors and kappa. Prior probabilities of each class as the baseline.

Developing predictive models for early detection of at-risk students on distance learning modules
Annika Wolff, Zdenek Zdrahal, Drahomira Herrmannova, Jakub Kuzilek and Martin Hlosta

Jakub talking.

Identifying students potentially failing, and give timely assistance. Using all available student data (Demographic, VLE clicks), predict result of next milestone or final result.

Problem specification: start with demographic data, during progress of the course, gather data from interaction with VLE, and scores from tutor marked assignments, and then have the final exam.

Found students who fail the first assignment in the fourth week have a high probability of course failure (>95%). Have to start predicting before the first TMA. (Aha! This is what I hadn't understood earlier. Yes, if you fail the first assignment you're likely to drop out – but a lot of people drop out despite passing it, too.)

Demographic data – static – gender, age, index of multiple deprivation, new vs continuing, student workload during the course, number of previous course attempts. VLE data – from virtual learning environment. Currently one-day summary data, updated daily. Forum activity, Resource, view test assignment, web subpage, online learning materials, homepage activity.

Predictive model – need data from previous years to predict this year. Using three models – decision tree, k-nearest neighbours, naive Bayes.

Decision tree – CART – on VLE and demographics. K-nearest neighbours: the 3 closest students from the previous presentation affect the final decision; applied separately on VLE clicks and on demographic data. Also naive Bayes, with discretisation of continuous variables (AMEVA); assumes they are independent.

Results. 10-fold cross validation. Prediction for TMA2 in the week before it (10 weeks of data). Precision 66% and 68%, recall 53% and 55% – good for CART and k-NN on VLE; but k-NN on demographics is bad at this stage. Naive Bayes: 47% precision, 73% recall.

The models vote, and if more than two votes for a student, predict that they’re at risk, if not, say Ok. From the at-risk students, generate a list, student support team makes an intervention.
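The voting step might look like this in outline (student IDs and model votes below are invented placeholders; with k-NN applied separately to clicks and demographics there are four voting model instances):

```python
# Sketch of the voting rule described above: each model instance casts a
# True (at risk) / False vote; more than two votes puts the student on the
# intervention list. All data here is invented.

def at_risk_list(predictions):
    """predictions: dict of student -> list of boolean votes from the models."""
    return [s for s, votes in predictions.items() if sum(votes) > 2]

predictions = {
    "s1": [True, True, True, False],   # 3 votes -> at risk
    "s2": [True, True, False, False],  # 2 votes -> ok
    "s3": [True, True, True, True],    # 4 votes -> at risk
}
print(at_risk_list(predictions))  # ['s1', 's3']
```

Weighting the votes by each model's validated precision (as mentioned in the discussion below) would replace `sum(votes)` with a weighted sum.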

How to evaluate the predictions? How do the interventions affect the models for predictions?

Preliminary results: percentage students who were predicted to not submit who did submit (precision): Average error over each week was 7%±0.4%.

Have dashboard – OU Analyse. Several features.


Ugochi: Four models, but they didn’t all work – could throw away demographic, in the cross-validation. Then determine by voting – why give a weighting for that?

Now giving them equal votes, but Martin working on determining which models should have larger weight.

Martin: This was first step.

Ugochi: Why not drop the demographic?

Student support teams like the idea that you have this. We need to compute it anyway because they want to see students from previous presentation, so why not put it there.

Someone: Validation – have old datasets?

Yes. We have lists of students we predicted, we knew which in each week. We have students who submitted their first TMA. From those lists, can compute the error of our prediction.

Someone: Must have old datasets?

Cross-validation performed on old datasets. For current presentation we only have this so far.

Q2: How do you know after you make the intervention, how do you plan to evaluate it?

That's the big question. We were thinking about some blind study, split students – but how can you decide? If you know that students are at risk, how do you choose which students you don't intervene with? We have data from previous presentations where it performed better.

Q3: How about two sections: one has the system, the other doesn’t, and they don’t know about it. Then see what happens at the end.

Who should decide who will be in which group?

Q4: In for-profit education we do this all the time. We also did red/amber/green – the amber students are the recoverable ones; red and green, ignore. We do controlled studies all the time. We had a real motive to prove this works.

When we speak about this with the teams, they say we’re crazy: why should we not intervene with all of them?

Q4: It’s for the greater good – like with pharmaceuticals. You have to do this.

My hands are tied.

Ugochi: Can also do crossover design.

Q5: There’s a problem with these models. We come up with a model that predicts well – kappa 0.8 – but for the student adviser, if we can’t interpret it to give a meaningful strategy, you’re going to hit a wall.

Modelling student online behaviour in a virtual learning environment
Martin Hlosta, Drahomira Herrmannova, Lucie Vachova, Jakub Kuzilek, Zdenek Zdrahal and Annika Wolff


Martin speaking.

The primary goal is to find at-risk students and advise them of the best learning steps. The paper’s goal is to understand the factors of student behaviour in the VLE that determine failure or success, and to find a model that’s easy for course staff and tutors to interpret.

The data are the VLE activity and student assessments; they didn’t include the demographics.

The 1st TMA is a strong predictor of success/failure in the course: P(FAIL | FailTMA1) > 95%. There is activity in the VLE before the 1st TMA, so they model only the period before the first TMA.
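The conditional probability behind that claim can be estimated directly from (failed TMA1, failed course) pairs; a sketch with invented records:

```python
# Estimate P(fail course | failed TMA1) from toy (failed_tma1, failed_course)
# pairs. The real course data gave > 95%; this toy sample gives 0.75.

records = [
    (True, True), (True, True), (True, True), (True, False),
    (False, False), (False, False), (False, True), (False, False),
]

failed_tma1 = [r for r in records if r[0]]
p_fail_given_fail_tma1 = sum(1 for r in failed_tma1 if r[1]) / len(failed_tma1)
print(f"P(FAIL | FailTMA1) = {p_fail_given_fail_tma1:.2f}")
```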

Secondly, you need to understand each course separately, because each is designed in a different way: some are fully online, some send out books by post.

The analysis process: take a course, identify the important activity/content types on the VLE, then model student behaviour, which gives a course fingerprint.

Important activity types are identified by Bayesian analysis – as time flows, there is a greater difference between success and failure. Forum activity (F), resource activity (R), viewing test assignments (O – OUcontent), web subpage (S – subpage). Two models: GUHA (General Unary Hypotheses Automaton – discovery of new hypotheses from the data) and Markov chain based analysis – stochastic, with the next state based only on the current state; a graphical model representation, giving better interpretation for the target user.

Two types of analysis – activity intensity in the week, and content type in the week.

So four important activities, identified through Bayesian analysis, give sixteen possible states. Map journeys through those states. He showed a graphical form of paths through the states over time.
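A minimal sketch of that state model, assuming each week’s activity is reduced to four binary flags (F, R, O, S) giving one of 16 states, with journeys summarised as a first-order Markov chain (next state depends only on the current one). The weekly flags below are invented:

```python
# Encode weekly activity as one of 16 states and estimate Markov transition
# probabilities from one student's sequence of weeks.
from collections import Counter

ACTIVITIES = ("F", "R", "O", "S")

def state(flags):
    """Map {'F': bool, ...} to a label like 'FR--' (one of 16 states)."""
    return "".join(a if flags[a] else "-" for a in ACTIVITIES)

def transition_probs(weekly_flags):
    """Estimate P(next state | current state) from a sequence of weeks."""
    states = [state(w) for w in weekly_flags]
    pair_counts = Counter(zip(states, states[1:]))
    from_counts = Counter(states[:-1])
    return {(a, b): c / from_counts[a] for (a, b), c in pair_counts.items()}

weeks = [
    {"F": True,  "R": True,  "O": False, "S": False},
    {"F": True,  "R": False, "O": False, "S": False},
    {"F": True,  "R": True,  "O": False, "S": False},
    {"F": True,  "R": False, "O": False, "S": False},
    {"F": False, "R": False, "O": True,  "S": False},
]
probs = transition_probs(weeks)
print(probs[("FR--", "F---")])  # both FR-- weeks were followed by F---
```

Drawing the non-zero transitions as a graph gives the “paths through the states over time” view, which is the form tutors found easier to interpret.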

Two approaches; the graphical one is more intuitive. For the future: cumulative states, and enriching the current ensemble predictor.


Ugochi: Seems like you use a categorical way of looking at VLE data. What if it were more continuous – e.g. amount of resource vs forum activity?

With continuous data, you have to discretise it. In the prediction models we use, this is it. Even the binary information is important.

Jakub: We performed experiments with continuous data; they showed us there is not much more information than in the week summary. Some students only engage on specific days, so you just get too much information.
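The week-summary discretisation described here can be sketched as collapsing daily click counts to one binary “was active” flag per week, discarding within-week timing. The numbers are illustrative:

```python
# Collapse daily clicks to weekly binary activity flags.

def weekly_active(daily_clicks, days_per_week=7):
    """[clicks per day] -> [True if any activity that week]."""
    weeks = [daily_clicks[i:i + days_per_week]
             for i in range(0, len(daily_clicks), days_per_week)]
    return [sum(w) > 0 for w in weeks]

clicks = [0, 0, 12, 0, 0, 3, 0,   # week 1: active
          0, 0, 0, 0, 0, 0, 0,    # week 2: inactive
          5, 0, 0, 0, 0, 0, 1]    # week 3: active
print(weekly_active(clicks))  # [True, False, True]
```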

Tea break

Small group discussions
  • Scaling up qual methods with ML
  • Prediction vs description with ML
  • Evaluating ML results and implications

My group: Evaluating base rates. Some with kappa over 0.8, which is amazing. How well do these models do over human estimates? Some people doing just this. Simon had a paper at LAK11; like these essay grading systems, how close to human raters. Challenge of human-human kappas being lower than human-machine. Follow/not follow advice from recommendations; does someone who didn’t follow advice and did badly follow it next time? What’s the tracking of that, the receptivity? Longitudinal study needed. Hold up gold standards of grades – but these are not so good/reliable. Interpretation of failing first assignment. Shouldn’t be on the course, vs should support. What about a weeding course? What variables would you want to add for the 5% who fail the first assignment but still pass. Interest in the grey area, the false positives and false negatives. What’s the impact of adding a new variable – amplification beyond that. What’s overfitting in this context? Want to affect the practice – prediction is valuable and interesting, but the point is to change the outcome.
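The human-vs-machine agreement check raised in the discussion is usually Cohen’s kappa between two raters – e.g. a model and a human adviser – over the same judgements. A self-contained sketch with invented binary at-risk ratings:

```python
# Cohen's kappa: agreement between two raters, corrected for chance.

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

model = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
human = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
print(f"kappa = {cohens_kappa(model, human):.2f}")  # 0.80 on this toy data
```

The chance-correction is the point: two raters who agree 90% of the time can still have a much lower kappa if one label dominates, which is why kappa rather than raw agreement is the benchmark against human raters.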


Not all participants have the same goals. You can have a lot of data, but if you don’t have the important data, it’ll suck and so will your ML. Do we need predictions only, or explanations too? What is a qualitative method? – putting judgements on things as humans, and using those. We need to make predictive models sharable objects in themselves; then we can build on each other’s work. Prediction vs description – not really ‘versus’, you have to do both. Without description, you don’t know what to do with the prediction. If the result is a probability, what do you do? – depends on the audience. “We’re Ok with qualitative because we’re going to just force it in to quantitative anyway”; comfortable with that, but will it be strong enough?

Wrap up

George: Becoming more aware of machine learning concepts in learning analytics, learning sciences. Based on some of the conversations you’ve had today, spend some time on what you think would be a next step to bridge ML and LA as disciplines. Personal – want to become better aware of ML, or of the LA literature. Or work on pulling together a paper with people you met here. Discussion based on your experiences today: what are actionable next steps you’d be happy to pursue? Being dazed and confused is Ok as a strategy too. (Laughter.)

My group:

Having staff – machine learning people – get money!

One possibility (me!) – write a simple primer on ML techniques and what they can and can’t do, perhaps with an example(s) for each that make sense for educationalists. An overview.

Phil: People take the data that’s available because the software has that feature. Ask instead how the software can be designed to capture the data that could be useful. Also, a confidence slider/radio buttons after questions – how confident are you that you’re right? What should we be measuring? That drives the instrumentation question.

Dragan: More people talking to each other. Repeat workshop like yesterday with Phil and Zach. Publications – tutorial type papers – special issue in the journal? ML primers for LA people.

Like ‘statistics without tears’.

Hands-on activity is good.

Shane: Want to make the data in his university available to the ML people.


Bridge HE and K12. Developing open standards in terms of the data – what metrics, what analysis can be done. Making dashboards and evaluation frameworks available. Education provides questions they want to answer; formulate them in ML terms. People don’t realise they need these techniques. But there has to be genuine dialogue going on.

George: This is where we torture the LA folks, and in Doha we do something similar to the ML folks. So if you feel it’s of relevance, you may want to introduce it to others. But it’s not a one-way street. Here it’s what can LA learn from ML, but the other half is important too. Confusion isn’t a good thing, but going through it is the route to learning.

State of the field type paper, what are the big RQs. Reference implementations and results. Moving beyond learning styles – identifying clusters empirically. List of things each community want from each other.

George: It’s trying to understand the world in a way that isn’t based on metaphor and narrative, like when we first encountered, say, a new coding tool or stats work. As you get deeper, you see the value as a way to understand patterns you can’t capture through metaphor and narrative. We’re in a similar space now for LA and ML. Lots of people are at a similar stage. As you progress, there’s the prospect that you can gain a different level of insight into large-scale methods.

Making the ideas more accessible, e.g. wrapped around a problem statement – it’s easier to grasp an idea that way than as the expression of a process or algorithm; so, problem-oriented. Software redesign, optimised to capture the data you want. Continued interaction between the communities. Accessible concepts – translating the ideas so people can answer ‘What can I do with that?’. Finally, making data more accessible – ML experts haven’t had access to educational data.

This work by Doug Clow is copyright but licenced under a Creative Commons BY Licence.
No further permission needed to reuse or remix (with attribution), but it’s nice to be notified if you do use it.