I’ve recently joined the UTS Data Governance sub-committee, which, alongside our ongoing work across stakeholder levels (from individual students and staff to the institution), has got me thinking a bit more about these issues.
As Elouazizi notes, a key challenge in data governance for learning analytics is the inherently distributed nature of data ownership and, implicitly, the power relations among the owners. For example, LMS managers are key data processors/stewards, and may own part of the data processing system that feeds data to management roles; but despite their key role in the process, instructors may not get access to the same data. This also relates to Elouazizi’s second challenge regarding who has access to data for sensemaking, the hypotheses tested, and interpretations developed, and (the third challenge) the decisions actioned from this.
Happily, there’s now an excellent special section on ‘Ethics and Privacy in Learning Analytics’ in the Journal of Learning Analytics. In ‘Developing a code of practice for learning analytics’, Sclater describes the development of the Jisc code of practice. Two really useful things that emerged from Niall Sclater’s work were a table detailing ethical, legal, and logistical issues for learning analytics (original from Jisc), which I’m reprinting below (it’s under a CC-By license), along with the code itself (also under a CC-By license).
Rodríguez-Triana, Martínez-Monés, and Villagrá-Sobrino flag a concern I’ve had before – that most discussion of learning analytics focusses on institutional level interventions, rather than teacher-led data collection and analysis. They note questions aligned with the framework below, which I’m summarising here, and make some recommendations based on case studies they conducted (reprinted below the code of ethics):
- Responsibility: Can teachers be equipped to take responsibility for learning analytics?
- Transparency: How can teachers present data collection and analysis to students transparently?
- Consent: Under what circumstances should students (or parents) be asked for consent in teacher-led analytics, and how do we give logistical support to exclude ‘opt-out’ students from analysis where appropriate?
- Privacy: Teacher-led analytics still need to maintain privacy. In addition (not noted by the authors), there are interesting issues around whether institutions should want to, and ethically be able to, collect data from smaller-scale teacher-led analytic interventions.
- Validity: How do we ensure teachers have access to the right data, and tools, to ensure high quality (valid) analytics?
- Access: How, and in what ways, should students have access to their data?
- Enabling positive intervention: Similar to institutional level concerns, except that teachers of course always engage in varied pedagogic interventions and evaluation of those interventions.
- Minimizing impact: At smaller scales, this may be more or less manageable by individual teachers.
- Stewardship of data: Teachers may well require help in gathering data (and only the data that is necessary for the purpose), and keeping it in accordance with institutional policies.
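To make the consent and stewardship points a little more concrete, here is a minimal Python sketch of what ‘excluding opt-out students’ and collecting ‘only the data that is necessary’ might look like before a teacher runs any analysis. The record fields, the `prepare_for_analysis` helper, and the sample data are all hypothetical, invented for illustration; none of them come from the authors’ case studies or from any particular LMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    # Hypothetical fields; a real export will look different.
    student_id: str
    quiz_score: float
    forum_posts: int
    home_postcode: str  # collected by the system, but not needed here


def prepare_for_analysis(records, opted_out_ids, needed_fields):
    """Drop opted-out students, then keep only the fields the analysis needs."""
    prepared = []
    for record in records:
        if record.student_id in opted_out_ids:
            continue  # respect the opt-out before anything else happens
        prepared.append({name: getattr(record, name) for name in needed_fields})
    return prepared


records = [
    ActivityRecord("s1", 0.8, 4, "2007"),
    ActivityRecord("s2", 0.5, 0, "2010"),
    ActivityRecord("s3", 0.9, 7, "2000"),
]

prepared = prepare_for_analysis(
    records,
    opted_out_ids={"s2"},
    needed_fields=("quiz_score", "forum_posts"),
)
```

The point of the design is ordering: the opt-out filter and the field selection both happen before the data reaches whatever tool does the analysis, so the teacher never has to remember to exclude anyone later.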
In their paper, Steiner, Kickmeier-Rust and Albert cite the OECD guidelines, saying:
The OECD guidelines are a relevant source of basic principles when seeking guidance on how to deal with privacy issues in analytics technologies and other systems (Spiekermann & Cranor, 2009; Tene & Polonetsky, 2013). In 1980, the OECD (Organisation of Economic Cooperation and Development) provided the first internationally agreed collection of privacy principles, aiming at harmonizing legislation on privacy and facilitating the international flow of data. The set of eight basic guidelines mirrored the principles earlier defined by the European Convention for the Protection of Individuals with Regard to the Automatic Processing of Personal Data (Levin & Nicholson, 2005). The basic OECD (2013b, pp. 14–15) principles are as follows:
- Collection limitation: There should be limits to the collection of personal data. Data should be obtained by lawful and fair means and, where appropriate, with the knowledge or consent of the data subject.
- Data quality: Personal data should be relevant to the purposes for which they are to be used, and to the extent necessary for those purposes. Data should be accurate, complete, and kept up-to-date.
- Purpose specification: The purposes for which personal data are collected should be specified no later than at the time of data collection. Subsequent use should be limited to the fulfilment of those purposes or compatible purposes.
- Use limitation: Personal data should not be disclosed, made available, or used for purposes other than those specified, except with the consent of the data subject or by the authority of the law.
- Security safeguards: Personal data should be protected by reasonable security safeguards against loss or unauthorized access, destruction, use, modification, or disclosure.
- Openness: There should be a general policy of openness about developments, practices, and policies with respect to personal data. Information on the existence and nature of personal data, purpose of their use, and the identity and location of the data controller should be available.
- Individual participation: Individuals should have the right to obtain confirmation of whether or not data relating to them is held and to have communicated to them the data, to be given reasons if a request is denied, to challenge data relating to them, and to have the data erased, rectified, completed, or amended.
- Accountability: The data controller should be accountable for complying with measures that give effect to the above.
In discussing the legal context and issues around consent in big data, Cormack says:
Data protection therefore suggests that an ethical framework should treat learning analytics as two separate stages, using different justifications and their associated ways of protecting individuals:
- the discovery of significant patterns (“analysis”) treated as a legitimate interest of the organization, which must include safeguards for individuals’ interests and rights; and
- the application of those patterns to meet the needs of particular individuals (“intervention”), which requires their informed consent or, perhaps in future, a contractual agreement.
Note that the first still requires that individuals be informed of the data that will be processed, who will process it, and why. Notably, while ‘legitimate interests’ give scope for much analytics work, they don’t give “carte blanche” (p.98); for example, profiling of students could be seen as an interference with the individual rights of those students. Cormack suggests that, for the first stage, there must be a “functional separation” between analysis and intervention, such that there is no possibility that analysis has an impact on individual data subjects (unless their specific consent has been sought). The implication is that raw or processed data must not be shared back to, e.g., tutors who deal directly with students in a form that could lead to direct intervention with individuals (although broader interventions, such as setting module pre-requisites for future cohorts based on analysis, would not be restricted).
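One way I find it helpful to picture this functional separation is as two code paths that expose different things. The Python sketch below is my own illustration, not Cormack’s: the analysis path only ever returns group-level averages (suppressing small groups, as one possible safeguard against singling people out), while the intervention path can only name students who have consented. The record layout, the function names, and the `MIN_GROUP_SIZE` threshold are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

MIN_GROUP_SIZE = 5  # assumed threshold; an institution would set its own


def analyse(records):
    """Analysis stage: return only group-level averages, never individual rows."""
    scores_by_module = defaultdict(list)
    for record in records:
        scores_by_module[record["module"]].append(record["score"])
    return {
        module: round(mean(scores), 2)
        for module, scores in scores_by_module.items()
        if len(scores) >= MIN_GROUP_SIZE  # suppress groups too small to hide in
    }


def students_for_intervention(records, consented_ids, threshold):
    """Intervention stage: only students who consented can be singled out."""
    return sorted(
        {
            record["student_id"]
            for record in records
            if record["student_id"] in consented_ids and record["score"] < threshold
        }
    )


# Hypothetical data: six students on module M1, one on module M2.
records = [
    {"student_id": f"s{i}", "module": "M1", "score": 40.0 + 10 * i}
    for i in range(6)
]
records.append({"student_id": "s9", "module": "M2", "score": 30.0})

module_averages = analyse(records)
to_contact = students_for_intervention(records, consented_ids={"s0", "s9"}, threshold=50.0)
```

A tutor given only `module_averages` could still act on it in the broader sense (say, rethinking pre-requisites for a future cohort), but has nothing that points at an individual student.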
Clearly there are balances to be made here, for example research to:
- Improve pedagogy to improve outcomes
- Improve pedagogy to reduce or maintain costs
- Understand pathways to improving student employability and other life-chance issues
- Understand pathways to transferring students to further study or higher-cost courses
These might all be legitimate business interests, but the last is clearly at odds with individual interests (and thus would, if targeted, require individual consent) or, more likely, would be disregarded as not a legitimate aim (as would ‘change pedagogy to improve test scores’).
What remains under-addressed, as far as I can see, is what might be considered fairly traditional education/learning research, where researchers work with teachers/instructors to conduct an intervention. In these cases responsibility is key: teachers are key stakeholders and should act as gatekeepers, a concept not discussed in the literature I’ve encountered so far. Interventions should not be conducted without the teacher’s knowledge (e.g., manipulation of systems teachers use or teach with). Where blinding is necessary, the teacher’s consent should still be sought.
Key, then, is that proposals are targeted at legitimate quality improvement purposes (i.e., improving provision of learning and teaching), and that key stakeholders are involved as gatekeepers. Where proposals involve direct intervention with students, consent should be sought. Where data (perhaps historic) will be obtained from typical educational activities, for the purposes of normal course-evaluation processes, then direct consent may not be needed.
Taxonomy of ethical, privacy, and logistic issues in learning analytics
By Sclater & Jisc (2015) https://analytics.jiscinvolve.org/ under a CC-By license
| Group | Name | Question | Type | Rank | Responsibility |
|---|---|---|---|---|---|
| Ownership & Control | Overall responsibility | Who in the institution is responsible for the appropriate and effective use of learning analytics? | Logistical | 1 | Senior management |
| | Control of data for analytics | Who in the institution decides what data is collected and used for analytics? | Logistical | 1 | Senior management |
| | Breaking silos | How can silos of data ownership be broken in order to obtain data for analytics? | Logistical | 2 | Analytics Committee |
| | Control of analytics processes | Who in the institution decides how analytics are to be created and used? | Logistical | 1 | Analytics Committee |
| | Ownership of data | How is ownership of data assigned across stakeholders? | Legal | 1 | Analytics Committee |
| Consent | When to seek consent | In which situations should students be asked for consent to collection and use of their data for analytics? | Legal / Ethical | 1 | Analytics Committee |
| | Consent for anonymous use | Should students be asked for consent for collection of data which will only be used in anonymised formats? | Legal / Ethical | 3 | Analytics Committee |
| | Consent for outsourcing | Do students need to give specific consent if the collection and analysis of data is to be outsourced to third parties? | Legal | 3 | Analytics Committee |
| | Clear and meaningful consent processes | How can institutions avoid opaque privacy policies and ensure that students genuinely understand the consent they are asked to give? | Legal / Ethical | 1 | Analytics Committee |
| | Right to opt out | Do students have the right to opt out of data collection and analysis of their learning activities? | Legal / Ethical | 1 | Analytics Committee |
| | Right to withdraw | Do students have the right to withdraw from data collection and analysis after previously giving their consent? | Legal | 3 | Analytics Committee |
| | Right to anonymity | Should students be allowed to disguise their identity in certain circumstances? | Ethical / Logistical | 3 | Analytics Committee |
| | Adverse impact of opting out on individual | If a student is allowed to opt out of data collection and analysis could this have a negative impact on their academic progress? | Ethical | 1 | Analytics Committee |
| | Adverse impact of opting out on group | If individual students opt out will the dataset be incomplete, thus potentially reducing the accuracy and effectiveness of learning analytics for the group? | Ethical / Logistical | 1 | Data scientist |
| | Lack of real choice to opt out | Do students have a genuine choice if pressure is put on them by the institution or they feel their academic success may be impacted by opting out? | Ethical | 3 | Analytics Committee |
| | Student input to analytics process | Should students have a say in what data is collected and how it is used for analytics? | Ethical | 3 | Analytics Committee |
| | Change of purpose | Should institutions request consent again if the data is to be used for purposes for which consent was not originally given? | Legal | 2 | Analytics Committee |
| | Legitimate interest | To what extent can the institution’s “legitimate interests” override privacy controls for individuals? | Legal | 2 | Analytics Committee |
| | Unknown future uses of data | How can consent be requested when potential future uses of the (big) data are not yet known? | Logistical | 3 | Analytics Committee |
| | Consent in open courses | Are open courses (MOOCs etc) different when it comes to obtaining consent? | Legal / Ethical | 2 | Analytics Committee |
| | Use of publicly available data | Can institutions use publicly available data (e.g. tweets) without obtaining consent? | Legal / Ethical | 3 | Analytics Committee |
| Transparency | Student awareness of data collection | What should students be told about the data that is being collected about them? | Legal / Ethical | 1 | Analytics Committee |
| | Student awareness of data use | What should students be told about the uses to which their data is being put? | Legal / Ethical | 1 | Analytics Committee |
| | Student awareness of algorithms and metrics | To what extent should students be given details of the algorithms used for learning analytics and the metrics and labels that are created? | Ethical | 2 | Analytics Committee |
| | Proprietary algorithms and metrics | What should institutions do if vendors do not release details of their algorithms and metrics? | Logistical | 3 | Analytics Committee |
| | Student awareness of potential consequences of opting out | What should students be told about the potential consequences of opting out of data collection and analysis of their learning? | Ethical | 2 | Analytics Committee |
| | Staff awareness of data collection and use | What should teaching staff be told about the data that is being collected about them, their students and what is being done with it? | Ethical | 1 | Analytics Committee |
| Privacy | Out of scope data | Is there any data that should not be used for learning analytics? | Ethical | 2 | Analytics Committee |
| | Tracking location | Under what circumstances is it appropriate to track the location of students? | Ethical | 2 | Analytics Committee |
| | Staff permissions | To what extent should access to students’ data be restricted within an institution? | Ethical / Logistical | 1 | Analytics Committee |
| | Unintentional creation of sensitive data | How do institutions avoid creating “sensitive” data e.g. religion, ethnicity from other data? | Legal / Logistical | 2 | Data scientist |
| | Requests from external agencies | What should institutions do when requests for student data are made by external agencies e.g. educational authorities or security agencies? | Legal / Logistical | 2 | Senior management |
| | Sharing data with other institutions | Under what circumstances is it appropriate to share student data with other institutions? | Legal / Ethical | 2 | Analytics Committee |
| | Access to employers | Under what circumstances is it appropriate to give employers access to analytics on students? | Ethical | 2 | Analytics Committee |
| | Enhancing trust by retaining data internally | If students are told that their data will be kept within the institution will they develop greater trust in and acceptance of analytics? | Ethical | 3 | Analytics Committee |
| | Use of metadata to identify individuals | Can students be identified from metadata even if personal data has been deleted? | Legal / Logistical | 2 | Data scientist |
| | Risk of re-identification | Does anonymisation of data become more difficult as multiple data sources are aggregated, potentially leading to re-identification of an individual? | Legal / Logistical | 1 | Data scientist |
| Validity | Minimisation of inaccurate data | How should an institution minimise inaccuracies in the data? | Logistical | 2 | Data scientist |
| | Minimisation of incomplete data | How should an institution minimise incompleteness of the dataset? | Logistical | 2 | Data scientist |
| | Optimum range of data sources | How many and which data sources are necessary to ensure accuracy in the analytics? | Logistical | 2 | Data scientist |
| | Validation of algorithms and metrics | How should an institution validate its algorithms and metrics? | Ethical / Logistical | 1 | Data scientist |
| | Spurious correlations | How can institutions avoid drawing misleading conclusions from spurious correlations? | Ethical / Logistical | 2 | Data scientist |
| | Evolving nature of students | How accurate can analytics be when students’ identities and actions evolve over time? | Logistical | 3 | Educational researcher |
| | Authentication of public data sources | How can institutions ensure that student data taken from public sites is authenticated to their students? | Logistical | 3 | IT |
| Access | Student access to their data | To what extent should students be able to access the data held about them? | Legal | 1 | Analytics Committee |
| | Student access to their analytics | To what extent should students be able to access the analytics performed on their data? | Legal / Ethical | 1 | Analytics Committee |
| | Data formats | In what formats should students be able to access their data? | Logistical | 2 | Analytics Committee |
| | Metrics and labels | Should students see the metrics and labels attached to them? | Ethical | 2 | Analytics Committee |
| | Right to correct inaccurate data | What data should students be allowed to correct about themselves? | Legal | 1 | Analytics Committee |
| | Data portability | What data about themselves should students be able to take with them? | Legal | 2 | Analytics Committee |
| Action | Institutional obligation to act | What obligation does the institution have to intervene when there is evidence that a student could benefit from additional support? | Legal / Ethical | 1 | Analytics Committee |
| | Student obligation to act | What obligation do students have when analytics suggests actions to improve their academic progress? | Ethical | 2 | Student |
| | Conflict with study goals | What should a student do if the suggestions are in conflict with their study goals? | Ethical | 3 | Student |
| | Obligation to prevent continuation | What obligation does the institution have to prevent students from continuing on a pathway which analytics suggests is not advisable? | Ethical | 2 | Analytics Committee |
| | Type of intervention | How are the appropriate interventions decided on? | Logistical | 1 | Educational researcher |
| | Distribution of interventions | How should interventions be distributed across the institution? | Logistical | 1 | Analytics Committee |
| | Conflicting interventions | How does the institution ensure that it is not carrying out multiple interventions with conflicting purposes? | Logistical | 2 | Educational researcher |
| | Staff incentives for intervention | What incentives are in place for staff to change practices and facilitate intervention? | Logistical | 3 | Analytics Committee |
| | Failure to act | What happens if an institution fails to intervene when analytics suggests that it should? | Logistical | 3 | Analytics Committee |
| | Need for human intermediation | Are some analytics better presented to students via e.g. a tutor than a system? | Ethical | 2 | Educational researcher |
| | Triage | How does an institution allocate resources for learning analytics appropriately for learners with different requirements? | Ethical / Logistical | 1 | Analytics Committee |
| | Triage transparency | How transparent should an institution be in how it allocates resources to different groups? | Ethical | 3 | Analytics Committee |
| | Opportunity cost | How is spending on learning analytics justified in relation to other funding requirements? | Logistical | 2 | Senior management |
| | Favouring one group over another | Could the intervention strategies unfairly favour one group over another? | Ethical / Logistical | 2 | Educational researcher |
| | Consequences of false information | What should institutions do if a student gives false information e.g. to obtain additional support? | Logistical | 3 | Analytics Committee |
| | Audit trails | Should institutions record audit trails of all predictions and interventions? | Logistical | 2 | Analytics Committee |
| | Unexpected findings | How should institutions deal with unexpected findings arising in the data? | Logistical | 3 | Analytics Committee |
| Adverse impact | Labelling bias | Does labelling or profiling of students bias institutional perceptions and behaviours towards them? | Ethical | 1 | Educational researcher |
| | Oversimplification | How can institutions avoid overly simplistic metrics and decision making which ignore personal circumstances? | Ethical | 1 | Educational researcher |
| | Undermining of autonomy | Is student autonomy in decision making undermined by predictive analytics? | Ethical | 2 | Educational researcher |
| | Gaming the system | If students know that data is being collected about them will they alter their behaviour to present themselves more positively, thus distracting them and skewing the analytics? | Ethical | 2 | Educational researcher |
| | Abusing the system | If students understand the algorithms will they manipulate the system to obtain additional support? | Ethical | 3 | Educational researcher |
| | Adverse behavioural impact | If students are presented with data about their performance could this have a negative impact e.g. increased likelihood of dropout? | Ethical | 1 | Educational researcher |
| | Reinforcement of discrimination | Could analytics reinforce discriminatory attitudes and actions by profiling students based on their race or gender? | Ethical | 1 | Educational researcher |
| | Reinforcement of social power differentials | Could analytics reinforce social power differentials and students’ status in relation to each other? | Ethical | 2 | Educational researcher |
| | Infantilisation | Could analytics “infantilise” students by spoon-feeding them with automated suggestions, making the learning process less demanding? | Ethical | 3 | Educational researcher |
| | Echo chambers | Could analytics create “echo chambers” where intelligent software reinforces our own attitudes and beliefs? | Ethical | 3 | Educational researcher |
| | Non-participation | Will knowledge that they are being monitored lead to non-participation by students? | Ethical | 2 | Educational researcher |
| Stewardship | Data minimisation | Is all the data held on an individual necessary in order to carry out the analytics? | Legal | 1 | Data scientist |
| | Data processing location | Is the data being processed in a country permitted by the local data protection laws? | Legal | 1 | IT |
| | Right to be forgotten | Can all data regarding an individual (except that necessary for statutory purposes) be deleted? | Legal | 1 | IT |
| | Unnecessary data retention | How long should data be retained for? | Legal | 1 | Analytics Committee |
| | Unhelpful data deletion | If data is deleted does this restrict the institution’s analytics capabilities e.g. refining its models and tracking performance over multiple cohorts? | Logistical | 2 | Data scientist |
| | Incomplete knowledge of data sources | Can an institution be sure that it knows where all personal data is held? | Legal / Logistical | 1 | IT |
| | Inappropriate data sharing | How can data sharing be prevented with parties who have no legitimate interest in seeing it or who may use it inappropriately? | Legal | 1 | IT |
Jisc Code of Practice for Learning Analytics
By Jisc (2015) https://www.jisc.ac.uk/guides/code-of-practice-for-learning-analytics under a CC-By license.
Learning analytics uses data about students and their activities to help institutions understand and improve educational processes, and provide better support to learners.
It should be for the benefit of students, whether assisting them individually or using aggregated and anonymised data to help other students or to improve the educational experience more generally. It is distinct from assessment, and should be used for formative rather than summative purposes.
The effective use of learning analytics will initially involve the deployment of new systems, and changes to institutional policies and processes. New data may be collected on individuals and their learning activities. Analytics will be performed on this data, and interventions may take place as a result. This presents opportunities for positive engagements and impacts on learning, as well as misunderstandings, misuse of data and adverse impacts on students.
Complete transparency and clear institutional policies are therefore essential regarding the purposes of learning analytics, the data collected, the processes involved, and how they will be used to enhance the educational experience.
This code of practice aims to set out the responsibilities of educational institutions to ensure that learning analytics is carried out responsibly, appropriately and effectively, addressing the key legal, ethical and logistical issues which are likely to arise.
Educational institutions in the UK already have information management practices and procedures in place and have extensive experience of handling sensitive and personal data in accordance with the Data Protection Act (DPA) 1998.
By transferring and adapting this expertise to regulate the processing of data for learning analytics, institutions should establish the practices and procedures necessary to process the data of individuals lawfully and fairly.
Responsibility
Institutions must decide who has overall responsibility for the legal, ethical and effective use of learning analytics. They should allocate specific responsibility within the institution for:
- The collection of data to be used for learning analytics
- The anonymisation of the data where appropriate
- The analytics processes to be performed on the data, and their purposes
- The interventions to be carried out
- The retention and stewardship of data used for and generated by learning analytics
Student representatives and key staff groups at institutions should be consulted around the objectives, design, development, roll-out and monitoring of learning analytics.
Transparency and consent
Institutions will define the objectives for the use of learning analytics, what data is necessary to achieve these objectives, and what is out of scope. The data sources, the purposes of the analytics, the metrics used, who has access to the analytics, the boundaries around usage, and how to interpret the data will be explained clearly to staff and students.
Institutions should also clearly describe the processes involved in producing the analytics to students and staff or make the algorithms transparent to them.
Students will normally be asked for their consent for personal interventions to be taken based on the learning analytics. This may take place during the enrolment process or subsequently. There may however be legal, safeguarding or other circumstances where students are not permitted to opt out of such interventions. If so these must be clearly stated and justified.
New learning analytics projects may not be covered by the institution’s existing arrangements. Collection and use of data for these may require further measures, such as privacy impact assessments and obtaining additional consent.
Options for granting consent must be clear and meaningful, and any potential adverse consequences of opting out must be explained. Students should be able easily to amend their decisions subsequently.
Privacy
Access to student data and analytics should be restricted to those identified by the institution as having a legitimate need to view them.
Where data is to be used anonymously particular care will be taken by institutions to avoid:
- Identification of individuals from metadata
- Re-identification of individuals by aggregating multiple data sources
The use of “sensitive data” (as defined by the DPA), such as religious affiliation and ethnicity, for the purposes of learning analytics requires additional safeguards. Circumstances where data and analytics could be shared externally – eg requests from educational authorities, security agencies or employers – will be made explicit to staff and students, and may require additional consent.
Institutions should ensure that student data is protected when contracting third parties to store data or carry out learning analytics on it.
Institutions may have a legal obligation to intervene, and hence override some privacy restrictions, where data or analytics reveal that a student is at risk. Such circumstances should be clearly specified.
Validity
It is vital that institutions monitor the quality, robustness and validity of their data and analytics processes in order to develop and maintain confidence in learning analytics and ensure it is used to the benefit of students. Institutions should ensure that:
- Inaccuracies in the data are understood and minimised
- The implications of incomplete datasets are understood
- The optimum range of data sources is selected
- Spurious correlations are avoided
All algorithms and metrics used for predictive analytics or interventions should be understood, validated, reviewed and improved by appropriately qualified staff.
Data and analytics may be valid but should also be useful and appropriate; learning analytics should be seen in its wider context and combined with other data and approaches as appropriate.
Access
Students should be able to access all learning analytics performed on their data in meaningful, accessible formats, and to obtain copies of this data in a portable digital format. Students have a legal right under the DPA to be able to correct inaccurate personal data held about themselves.
They should normally also be able to view the metrics and labels attached to them. If an institution considers that the analytics may have a harmful impact on the student’s academic progress or wellbeing it may withhold the analytics from the student, subject to clearly defined and explained policies. However, the student must be shown the data about them if they ask to see it.
Enabling positive interventions
Institutions should specify under which circumstances they believe they should intervene when analytics suggests that a student could benefit from additional support. This may include advising students that they should not continue on a particular pathway. Students may also have obligations to act on the analytics presented to them – if so these should be clearly set out and communicated to the students.
The type and nature of interventions, and who is responsible for carrying them out, should be clearly specified. Some may require human rather than digital intermediation. Predictions and interventions will normally be recorded, and auditable, and their appropriateness and effectiveness reviewed.
The impact of interventions on staff roles, training requirements and workload will be considered and requires support from senior management. Institutions will also be clear about the priority given to learning analytics in relation to other requirements.
Institutions will decide how to allocate resources for learning analytics appropriately for learners with different requirements and ensure that diverse groups and individuals are treated equitably.
Minimising adverse impacts
Institutions recognise that analytics can never give a complete picture of an individual’s learning and may sometimes ignore personal circumstances.
Institutions will take steps to ensure that trends, norms, categorisation or any labelling of students do not bias staff, student or institutional perceptions and behaviours towards them, reinforce discriminatory attitudes or increase social power differentials.
Analytics systems and interventions will be carefully designed and regularly reviewed to ensure that:
- Students maintain appropriate levels of autonomy in decision making relating to their learning, using learning analytics where appropriate to help inform their decisions
- Opportunities for “gaming the system” or any benefit to the student from doing so are minimised
- Knowledge that their activity is being monitored does not lead to non-participation by students or other negative impacts on their academic progress or wellbeing
- Adverse impacts as a result of giving students and staff information about the students’ performance or likelihood of success are minimised
- Staff have a working understanding of legal, ethical and unethical practice
Stewardship of data
Data for learning analytics will comply with existing institutional data policies and the DPA, and will in particular be:
- Kept to the minimum necessary to deliver the purposes of the analytics reliably
- Processed in the European Economic Area or, if elsewhere, only in accordance with the DPA
- Retained only for appropriate and clearly defined periods
On request by students any personal data used for or generated by learning analytics should be destroyed or anonymised, with the exception of certain, clearly specified data fields required for educational or statutory purposes such as grades.
Teacher-led learning analytics recommendations
By MJ Rodríguez-Triana (2016) under a CC-By-ND-NC license.