[This post is a slightly edited version of an internship proposal I submitted…I'll try to update as I hear about that.]

Introduction

With increasing interest in 'big data' and business intelligence, and the rise of Virtual Learning Environments (VLEs) and Content Management Systems (CMSs), has come a new development: analytics for education (Ferguson, 2012). The increasing use of such analytics for accountability purposes has been part of the motivation behind a shift towards their potential as pedagogic tools (Ferguson, 2012). The economic pressure to reduce costs in education is both a further motivator and a risk factor for innovative pedagogy which makes use of 'learning analytics'. In contrast to 'academic analytics' – which tends to explore issues around course completion, drop-outs, attendance and so on – learning analytics tends to position itself as focussing on learning and learners over metrics of success (Siemens & Baker, 2012).

Six Provocations for Learning Analytics

Depending on how the tools and outputs of learning analytics are deployed, they have the potential to support current educational practices, or to challenge them and reshape education. They also have the potential to: marginalise learners (and educators); limit what we talk about as 'learning' to that which we can create analytics for; and exclude alternative ways of engaging in activities, to the detriment of learners. While the use of analytics for collaborative sensemaking (Knight, Buckingham Shum, & Littleton, 2013a), and for formative over summative assessment of learning, may be important (Knight, Buckingham Shum, & Littleton, 2013b), algorithms may both ignore and mask some key elements of the learning process. Whether algorithms can be informants with respect to the discursive properties of learning is (perhaps) an open question.
To what extent should teachers (ever) utilise analytics as the basis of assessment, or of conversations for formative feedback? These are pressing issues given the rise of learning analytics, and increasing interest in mass online education at both pre-university and university level (e.g. the growing interest in MOOCs). boyd and Crawford's (2011) provocations on big data are of relevance to these debates. In that paper, concerns are outlined regarding:

1) Automating research changes the definition of knowledge (change in focus)
2) Claims to objectivity and accuracy are misleading (still bias in data)
3) Bigger data are not always better data (still gaps)
4) Not all data are equivalent (small methods, big data – SNA for big data relevance? What about the personal relationship of teacher and student? Peer groups self-selected v. imposed?)
5) Just because it's accessible doesn't make it ethical
6) Limited access to big data creates new digital divides

Within the learning analytics community a narrative around these concerns can be conceived:

1) Automating LA as assessment (summative or formative) changes the focus of learning. We might be concerned about:
   1. Do we measure what we value, or merely value what is easy to measure (Wells & Claxton, 2002)? (The emphasis here is on very large scale measurement with the goal of comparing outcomes across contexts at regional, national or international scale.)
   2. Under what circumstances might LA empower, or de-professionalise, educators?
   3. What are the epistemic properties of such data – can they act as reliable informants? What could (should) teachers expect from such tools?
2) LA may provide decontextualised data from highly contextualised settings. The notion of 'objectivity' in learning has long been critiqued (see e.g. Dewey (1998)); building an accurate picture of "what is going on" isn't just technically problematic, it relies on a naïve realism which seeks objective truth, and which may be particularly troublesome in education.
3) In addition, there are technical issues with capturing data, and increasing the size of the data set is unlikely to solve these – bigger data will still have gaps, and there will still be non-digital learning which is hard to capture. Systems which ignore this issue will be less rich than those which acknowledge it.
4) Furthermore, many analytics may reduce the complexity of relationships – between actors, between assignments, between a student statement and an assignment of feedback, etc. – in such a way as to suggest equivalence in a network (Social Network Analytics are a key part of LA). While these may be useful in some contexts, they may mask many (of the more interesting) facets of learning: for example, the personal relevance of online and offline relationships between teachers and students, and between peers, or the difference between self-selected and teacher-imposed work groups.
5) One suggestion for dealing with some of these issues is probing student accounts (public or private, with permission) for further information. However, even where students give permission (and access) to do this, there are serious ethical issues in dealing with such data. An additional concern is that, while refusal may ostensibly be an option for students, there is an ethical issue with equitability of support between LA adopters and refusers.
6) This is of course part of the story regarding new digital divides and access to big data. Perhaps as concerning is the potential disparity in access to analytics and/or high quality pedagogy – whether in traditional institutions, through the increasing number of MOOCs, or in some other model of education.
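The fourth concern – that network metrics suggest equivalence between qualitatively different ties – can be made concrete with a minimal sketch. The data, names, and the `tie_type` field below are entirely hypothetical, invented for illustration; this is not code from any real LA system. It simply shows how a naive degree-centrality count, of the kind common in social network analytics, discards exactly the information (self-selected friendship v. teacher-imposed grouping) the argument says matters:

```python
from collections import defaultdict

# Hypothetical interaction log: (student, peer, tie_type).
# All names and the 'tie_type' labels are illustrative only.
interactions = [
    ("alice", "bob",   "self-selected"),    # chosen study partner
    ("alice", "carol", "teacher-imposed"),  # assigned group member
    ("bob",   "carol", "teacher-imposed"),
    ("alice", "dana",  "self-selected"),
]

def degree_centrality(edges):
    """Count ties per node, ignoring tie_type -- as a naive SNA metric would."""
    degree = defaultdict(int)
    for a, b, _kind in edges:  # the qualitative label is silently discarded here
        degree[a] += 1
        degree[b] += 1
    return dict(degree)

print(degree_centrality(interactions))
# {'alice': 3, 'bob': 2, 'carol': 2, 'dana': 1} -- nothing in the output
# distinguishes a chosen friendship from an imposed grouping.
```

The point is not that such metrics are useless, but that whatever the count reveals about 'connectedness', the collapse from labelled relationships to bare edge counts happens before any interpretation begins.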
While the LA community has sought to address these issues – and indeed the above might be argued to encapsulate the concerns – there is further work to engage in. In many cases the discussion is around how to do what we're already doing, better: improve the algorithms, make sure the Terms of Service are clearer, etc. I think we can go further and explore:

* Under what circumstances might learning algorithms marginalise students, or entrench existing divides?
* What can LA tell us about learning? Do algorithms for success provide good (trustworthy, valid, reliable, objective, extensive, complex?) sources of testimonial knowledge regarding learning? These are questions regarding epistemology and the philosophy of information, in particular testimonial knowledge, pragmatic epistemology, and the epistemology of silence (e.g. the epistemic significance of an absence of data).
* What can, and should(?), teachers use such data for? What do we use them for? Discursive, formative, contextualised conversations, or taxonomic, summative, abstracted judgements?
* In big data contexts, working with 'averages' can provide benefits – if we know the average sentiment (and ignore unclassifiable entities), then we can gain market share. What concerns might there be in educational research, and in the deployment of LA, given the 'broad brush' of many such analytics? To what extent should personal knowledge of individuals (through peers, parents, practitioners, and learners themselves) be privileged as a source of testimonial knowledge regarding a learner, over information that algorithmic analytics might provide?

I want to think about how far algorithmic and network perspectives can go, and whether or not we should challenge this view using a pragmatist epistemology which emphasises the shared meaning-making around algorithms, over the pursuit of the "right" answer or "objective truth" revealed by those algorithms.
Thinking about the sorts of questions the community should ask – how, of whom, and about what – is an important consideration for this growing discipline, and may well contribute to other fields as they continue to adopt big data analytic techniques.

boyd, d., & Crawford, K. (2011). Six Provocations for Big Data. SSRN Electronic Journal. doi:10.2139/ssrn.1926431