> How do you distribute a shared dataset in such a way that quiz answers based on the data can’t just be copied?

One of the subjects I’m involved in:

1. Has a reasonably large and growing enrolment (>50, which will grow to multiple classes of ~40)
2. Has a diverse cohort, with students from across faculties
3. Teaches quantitative skills, with in-class tuition focussed on practical examples, and at-home quizzes building up to a major written report of students’ analyses of their own data
4. Aims for students to both grasp the concepts (e.g. of a z score) and get experience of communicating about them (and critiquing the communication of others)

The subject is continuing to evolve and it’s great to see the payoff from our continual improvements. At the moment our classes focus on applying quantitative/statistical concepts to example cases, but we tend to shift datasets each week. We do emphasise that students must apply their knowledge to the analysis of their own dataset – to be written up as a report – and we have a number of very good scaffolds in place for that, but no direct modelling.

One proposal I’ve been wondering about is whether we could use the same dataset over a number of weeks to teach multiple concepts, and to provide a thread such that an example practice report could be written based on the shared data. In that model, we could:

1. Thread a theme through the sessions, demonstrating how different statistical techniques provide different insights, and how communication of that (and the particular analyses we choose to communicate) matters
2. Possibly increase buy-in or engagement, by working with an authentic dataset
3. More readily provide extension work – students who are ahead can still engage with the same dataset, using additional techniques (the results of which could be shared back for working on how to communicate the numbers)
4. Make assessments more authentic, by asking students to actually do analysis for reporting of the statistics (rather than MCQs) and providing support to understand that analysis and its reporting when they come to their own data
5. Shift more direct tuition to pre-work, with in-class sessions based on problem solving on the particular data, and communicating insights (with a possible role for peer review)

Of course, to do that effectively, you can’t just use a shared dataset – with the best will in the world, there’s no way to avoid answer-copying in that scenario. I’ve been pondering for a while, though, whether we could introduce a shared dataset with each student receiving their own slightly different version, such that while the general shape and insights in the data are identical, the precise data has some random variance. Of course, copying formulae would still be possible – but who cares?
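To make the “slightly different version per student” idea concrete, here is a minimal Python sketch. It is an illustration only: the function name `student_variant`, the `noise_fraction` parameter, the example UIDs and the template values are all made up for this example. It assumes a single numeric column and seeds the noise from the student’s UID, so the same student always gets the same version.

```python
import hashlib
import random
import statistics


def student_variant(template, uid, noise_fraction=0.05):
    """Return a copy of `template` with small, UID-specific random noise.

    `noise_fraction` (hypothetical parameter) sets the noise SD as a
    fraction of the template's own standard deviation.
    """
    # Derive a stable seed from the UID so a student's data is reproducible.
    seed = int(hashlib.sha256(uid.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    # Scale the noise to the spread of the data so the overall shape is kept.
    sd = statistics.stdev(template)
    return [round(x + rng.gauss(0, noise_fraction * sd), 2) for x in template]


# Made-up template values and UIDs, purely for illustration.
template = [12.0, 15.5, 9.8, 20.1, 17.3, 14.2, 11.9, 18.6]
version_a = student_variant(template, "uid001")
version_b = student_variant(template, "uid002")
```

Because the noise is small relative to the spread, each version tells the same story (similar mean, similar ordering of values), but the exact numbers a student must report differ from their neighbour’s.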

Looking to set a quiz in which each student has individualised answers (they all have distinct datasets) – any options? #edtech #tools-and-analyses/blackboard

— Simon Knight (@sjgknight) December 16, 2016

In some of my sessions (ones where we use Google Sheets) I already use conditional formatting to ‘grade’ work (if cell x = y, then green, else ‘check your formulae, this doesn’t look quite right’, or whatever). I’m aware other projects have done some form of plagiarism detection on Excel files (Sydney had an ‘XL app’, I think also for shared data, and there is this [tool from Kent][1]), but nothing for this specific use case. I’m thinking:

1. Use Google Sheets
    1. Cloud based, so no software issues; students can use it on their phones (they do… although I think this is a bit mad)
    2. Access control – to give direct access to someone else would require sharing their UID/password (of course, that could happen – but it could happen with any take-home quiz assessment too)
    3. We can see what’s happened in the sheet and offer support
    4. Easy to give/remove access, and to programmatically access the data
2. For a list of UIDs in a Google Sheet
3. And a template dataset in a separate Google Sheet
    1. Ideally the class would have some say over the source data (pre-work vote)
4. Use [Doctopus][2] to distribute a copy to each student
5. For each copy within the folder, introduce random variance to the data, constrained by the distribution and min-max
    1. Use the ‘googlesheets’ package in R to access all the files, for ‘gs_edit_cells’ use
    2. Exclude the min and max from the values to be changed
    3. Use the jitter function to add noise to the other values
6. Each copy is then available to the student and tutors
7. For each sheet, compute whatever values we need (e.g. M, mode, median, etc.) and paste them into the class-roll sheet per UID
8. Create a quiz that asks for whatever values from the dataset
    1. Either just collect them, and then grade them by checking against the individual dataset values later
    2. Or set the quiz to read individual-level answers from the reference sheet and automatically distribute/feed back that (the auto-marking bit is easy – using Google Forms you can just use formulae within the collection sheet to check, but I’m less sure I know how to show that back to the user)
    3. Of course, another way to do this is to have a ‘you have to get the correct answer in this cell’ assessment within the sheet itself, and auto-grade based off that. I quite like the deliberate nature of needing to complete a quiz though
    4. Possibly grade the formulae as well as results
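Steps 5, 7 and 8.1 above could be sketched roughly as follows. This is a Python stand-in rather than the R/‘googlesheets’ route described in the list, and it works on in-memory lists instead of real sheets. The helper names (`jitter_keep_extremes`, `answer_key`, `grade`), the `amount` and `tol` parameters, the UIDs and the template values are all hypothetical. The noise is uniform, loosely in the spirit of R’s `jitter`, and the answer key uses mean, median and SD only, since a mode is not very meaningful for jittered continuous values.

```python
import random
import statistics


def jitter_keep_extremes(values, rng, amount=0.5):
    """Add uniform noise to every value except the minimum and maximum."""
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        if v == lo or v == hi:
            out.append(v)  # extremes are left untouched, so the range is fixed
        else:
            noisy = v + rng.uniform(-amount, amount)
            # Clamp so a jittered value can never fall outside the original range.
            out.append(round(min(max(noisy, lo), hi), 2))
    return out


def answer_key(values):
    """Reference answers for one student's version of the data (step 7)."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
    }


def grade(submitted, key, tol=0.01):
    """Mark each answer correct if it is within `tol` of the key (step 8.1).

    Missing answers are treated as NaN and therefore marked incorrect.
    """
    return {
        name: abs(submitted.get(name, float("nan")) - key[name]) <= tol
        for name in key
    }


# Made-up class roll and template data, purely for illustration.
template = [12.0, 15.5, 9.8, 20.1, 17.3, 14.2, 11.9, 18.6]
roll = ["uid001", "uid002"]
rng = random.Random(2016)  # fixed seed so the example is reproducible

sheets = {uid: jitter_keep_extremes(template, rng) for uid in roll}
keys = {uid: answer_key(values) for uid, values in sheets.items()}
```

A submission would then be checked with something like `grade({"mean": 14.9, "median": 14.8, "sd": 3.6}, keys["uid001"])`, which returns a per-question dictionary of `True`/`False`. The tolerance matters in practice, because students will round their answers differently.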

Footnotes

  1. https://www.cs.kent.ac.uk/pubs/2009/2950/

  2. http://sjgknight.com/finding-knowledge/2016/06/using-google-drive-and-doctopus-for-whole-class-group-and-individual-tasks/