[image]1 One of
the things I’m interested in is how to use assessment (and especially
peer and self-assessment) as a learning activity. A method I used as a
teacher, and more recently in my research, was a kind of
‘diagnostic and training’ exercise in which students would assess (on a
rubric) sample assignments about which I had prior knowledge (i.e., I
knew their grades), before doing any other assessment. The benefits of
this model are that (1) students gain understanding of the assessment
system, (2) students see assessments other than their own, (3) feedback
can be richer, more frequent, and more collaborative, and (4) you
know the feedback will be of a certain quality because of the
diagnostic/training element. At UTS we have some wonderful practice on
this very thing – [Cathy Gorrie in life sciences]2 and [Andy
Leigh (also life sciences)]3 (and probably others!) have worked
on a model using ‘benchmarking’, with exemplar cases to be marked by
students (as in my diagnostic), alongside flagging common errors in
student responses (things to ‘watch for’ in assessment). In Cathy’s talk
today, she indicated this has largely gone well: students tended to
rate the high-marked texts slightly lower (i.e. they were too harsh)
and the low-marked texts higher (i.e. they were too generous), and there
was a big range of results given across the exemplars, but excellent
written feedback. Markers were notified if they did a poor job, and asked to
re-do the benchmarking. On appeal, students could write half a page to
justify the mark they thought they deserved (3/307 took that opportunity, with 2
appeals awarded). It also looks like students are broadly satisfied with the
model, with few complaints (4 students, who just really don’t like peer marking
and generally thought it was lazy teaching). Two things may help with that
issue: (1) emphasising the pedagogic value of the exercise (rather than
relief of educator burden), and (2) (suggested to me by someone later in
the day) sharing the _range_ of results given on the benchmarking, to flag
how varied people’s interpretations of the marking criteria are.
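The harsh/generous pattern and the re-do trigger described above can be sketched as a simple calibration check on the benchmark marks. This is a minimal sketch under my own assumptions, not the actual UTS implementation; the function names, the example marks, and the tolerance threshold are all illustrative:

```python
def rater_bias(given_marks, known_marks):
    """Mean signed difference between a rater's marks on the exemplars
    and the instructor's known marks.
    Negative = too harsh, positive = too generous."""
    diffs = [g - k for g, k in zip(given_marks, known_marks)]
    return sum(diffs) / len(diffs)


def needs_redo(given_marks, known_marks, tolerance=1.0):
    """Ask the rater to redo the benchmarking when their average
    absolute error on the exemplars exceeds `tolerance` marks
    (the threshold is an assumption, not from the model above)."""
    errors = [abs(g - k) for g, k in zip(given_marks, known_marks)]
    return sum(errors) / len(errors) > tolerance


known = [9, 3, 6]   # instructor marks for three exemplars
harsh = [6, 3, 5]   # this rater marks the strong texts down

print(rater_bias(harsh, known))   # negative: a too-harsh rater
print(needs_redo(harsh, known))   # True: send them back to the benchmark
```

Splitting the check into signed bias and absolute error keeps the two diagnostics separate: a rater can be well calibrated on average (bias near zero) while still being erratic enough to need another pass.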
Data investigations

This area excites me because I think it’s a great teaching tool, and it was something I enjoyed doing with my students (as well as asking them to write sample exams, and asking them to improve poor answers). I also think there’s a lot of rich data from the technique, and there are a lot of small changes that could be explored for their impact, some of which would adapt a generic model to be closer to the kind of calibrated peer assessment at UCLA (but without buying into the particular product). For example:

1. Around the benchmark, it would be interesting to know:
   1. if students who perform poorly on the benchmark also perform poorly in their own assessed work
   2. if students who perform poorly on the benchmark in fact perform poorly as assessors too
2. To reduce the need for moderation and (hopefully) increase quality, it would be interesting to run some reliability analysis of student ratings, and to see whether targeting discrepancies (or tracking students who tended to disagree) could reduce variance in grades, and perhaps also support those students in their assessments and assignments (per ‘1’).
3. Analysis of the written feedback would be interesting. In my work I asked students to suggest 3 improvements that could be made, using a drop-down menu to categorise those improvements according to the rubric facets they were assessed on. Looking at the kinds of suggestions made is much easier with that kind of semantic annotation, but even just taking the written feedback and analysing the topics/themes should provide useful insight. It would also be interesting to explore whether ‘poorer’ raters also gave poorer written feedback (and especially if not, why that is the case).

Update

I’d forgotten, I actually posted a short powerpoint of the model I use, along with some teaching resources, in an [earlier post on this issue!]4
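Point 2 above (reliability analysis and targeting discrepancies) could start from something as simple as flagging raters whose marks systematically deviate from the other raters of the same piece of work. A minimal sketch, assuming marks are held as a dict of assignment → {rater: mark}; the data layout, names, and example values are my own assumptions:

```python
from collections import defaultdict


def discrepancy_by_rater(marks):
    """For each rater, the mean absolute deviation of their mark from
    the mean of the *other* raters of the same assignment.
    High values flag candidates for moderation or extra support."""
    devs = defaultdict(list)
    for assignment, ratings in marks.items():
        for rater, mark in ratings.items():
            others = [m for r, m in ratings.items() if r != rater]
            if others:
                devs[rater].append(abs(mark - sum(others) / len(others)))
    return {r: sum(d) / len(d) for r, d in devs.items()}


marks = {
    "essay_A": {"s1": 7, "s2": 8, "s3": 3},
    "essay_B": {"s1": 5, "s2": 6, "s3": 1},
}
scores = discrepancy_by_rater(marks)
print(max(scores, key=scores.get))   # s3: the rater who most disagrees
```

A leave-one-out comparison like this is a crude stand-in for a proper inter-rater reliability statistic, but it is enough to rank raters by disagreement and to cross-reference that ranking against benchmark performance (per ‘1’).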
Footnotes
- /static/2015/11/0d41b348606fd6e4959ee5aa_150_marking.jpg ↩
- http://sjgknight.com/finding-knowledge/2015/07/diagnostic-peer-assessment/ ↩