
Forget Me Not

The recent ‘right to be forgotten’ ([factsheet][1]) has had a lot of attention. I wanted to post a blog about it a while ago but haven’t had enough time to read around the various issues. This post pulls together some questions I have about how the ruling is being implemented, but it is very sketchy at the moment. My major concerns are:

1. relevance is defined solely by complainants, not information seekers, meaning a search query specifying a name and some particular thing (e.g. a crime) may not return target documents, despite their relevance to the query;
2. it is unworkable in the long run given disambiguation issues (name overlap, content mirroring, URL changes, etc.);
3. search engines don’t exist in a vacuum, and their relationship to, e.g., social media content has not been considered properly.

Having said that, I find some of the complaints raised dubious. It is quite obviously the case that if I ask “So, tell me about James”, any good informant will tell me recent, relevant, and largely factually accurate (rather than rumour-mill) information. Given Google’s role in mediating the ways we access data, it is a relevant informant, and where things seem to be going wrong we might need some recourse (which is not to say we should censor).

# RTBF

[This article][2] seems to explain how Google are dealing with the process (i.e. their submission form, etc.): collecting minimal data, confirming the requester is (or has the ID of) the target, and asking for the target URLs and the reasons why they are irrelevant, outdated or otherwise inappropriate. I like the book analogy: the RTBF doesn’t remove pages, it removes index terms. It isn’t tantamount to pulping books, but it is tantamount to burning index cards.
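The index-card analogy, and the first concern listed above, can be made concrete with a toy inverted index. This is a minimal sketch under loudly stated assumptions: the page URLs, the `DELISTED` set and the two policies (`name_anywhere`, `name_only`) are all hypothetical, and nothing here describes how Google actually implements delisting. The pages stay in `PAGES` throughout; only what a given query can reach changes.

```python
"""Toy model of delisting: 'burning index cards', not 'pulping books'.

Everything here is hypothetical (pages, URLs, the delisting request and
both policies); it is not how any real search engine implements RTBF.
"""
from collections import defaultdict

# Hypothetical pages. They are never deleted; only index lookups change.
PAGES = {
    "news.example/1998/auction": "mr x bankruptcy repossession madrid auction",
    "news.example/2014/profile": "mr x opens new madrid bakery",
}

# Hypothetical delisting requests: (requester's name, target URL).
DELISTED = {("mr x", "news.example/1998/auction")}


def build_index(pages):
    """Map each whitespace-separated term to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in text.split():
            index[term].add(url)
    return index


def matches(index, query):
    """Return the URLs containing every query term (boolean AND retrieval)."""
    terms = query.split()
    if not terms:
        return set()
    # set.intersection returns a new set, so the index itself is not mutated.
    return set.intersection(*(index.get(t, set()) for t in terms))


def search(index, query, policy):
    """Retrieve matches, then apply a hypothetical delisting policy.

    "name_anywhere": hide the URL from any query containing the name terms.
    "name_only": hide it only when the query is exactly the name, so more
    specific queries (name plus topic) still reach the page.
    """
    terms = query.split()
    results = matches(index, query)
    for name, url in DELISTED:
        name_terms = name.split()
        if not all(t in terms for t in name_terms):
            continue  # query does not mention the requester at all
        if policy == "name_anywhere" or terms == name_terms:
            results.discard(url)
    return sorted(results)


if __name__ == "__main__":
    idx = build_index(PAGES)
    print(search(idx, "mr x", "name_only"))
    # ['news.example/2014/profile']
    print(search(idx, "mr x bankruptcy repossession", "name_only"))
    # ['news.example/1998/auction']
    print(search(idx, "mr x bankruptcy repossession", "name_anywhere"))
    # []
    print(search(idx, "madrid repossession", "name_anywhere"))
    # ['news.example/1998/auction']
```

Under `name_anywhere`, someone asking specifically for “mr x bankruptcy repossession” gets nothing back; under `name_only`, a bare “mr x” search no longer surfaces the old article, but the specific query still does, and “madrid repossession” is unaffected under either policy. Because `DELISTED` is keyed on a name string, a namesake appearing on the same page would be suppressed too, which is the disambiguation problem in the second concern.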
My concerns with RTBF are largely pragmatic rather than ethical:

* It seems to me entirely appropriate that someone who searches for “Mr x, bankruptcy repossession” should in fact find that information (i.e., that combination of keyterms shouldn’t be “deindexed” from the target document). This would be true of various cases, including victims and perpetrators of crime, corrupt politicians, etc. At the moment the implementation is (I think) removing these index terms too. This seems wrong to me: the relevance of the information in these cases is strong, insofar as the information seeker has defined their need in those terms.
* It seems less appropriate that someone who searches for “Mr x” should see old or irrelevant material as a top result where other results about “Mr x” exist. This is particularly true of victims of crime (and of the accused but not guilty), where links to news articles about them might be ranked above other information about them, even if they are not the main subject of the article (the perpetrator is) and/or the material is outdated. In these cases RTBF may be the only means to ‘downgrade’ that information.
* Someone searching for “Madrid repossession firms” should still be able to find the target document; the index term “Mr x” is peripheral to that query. Such searches should be unaffected by the ruling.

Overall, I don’t buy the claim that “it cannot be right to remove verifiable true facts”: there are cases where we might remove true information (e.g. for safety reasons), or limit its indexing. Search engines aren’t just a mirror on the world; their algorithms target ‘importance’, ‘authority’, ‘diversity’, etc., and these techniques prioritise certain information, particularly from news sources. Just telling individuals to get better at personal SEO isn’t enough. Google does process information when it “decides” which links to present, in what order, and for which queries.
That’s not a trivial matter; it says something about how we prioritise information and judge its authoritativeness.

# Unresolved issues

What I am hesitant about here is simply supporting the ‘free speech’ line entirely. Proportionality, notability, and the treatment of biographies of living persons (including references to peripheral characters) add important nuance, much in line with the Wikipedia stance. Some unresolved issues:

1. It isn’t clear what deindexing is happening: whether target documents are being removed from SERPs entirely or only for certain keyterms (and how requesters are being informed of that).
2. How symmetric/asymmetric requests should be dealt with (e.g. interesting cases where the victim and the perpetrator of a crime might request RTBF on the same or different pages). It isn’t outside the realm of possibility that two individuals with the same name might both appear in a target page, one wishing to be deindexed, the other not. This is more likely with a shared surname (so query specificity really matters). We can also imagine cases where individuals request that information relating to a namesake (which would of course be returned on searching for the name) be removed; while Google would presumably instigate some investigation to avoid such issues, it’s not clear it would always succeed.
3. What the implications are for the integration of services in search engines. Many search engines integrate some social-media features, for example searching posts made by my friends on Google+, Facebook, etc. alongside the ‘open web’. The removal of many links under RTBF parallels the “super injunction” madness, although matters are complicated by material that is about me (if my friend posts a photo of me that I don’t want available, I can’t force Google+ to take it down, but could I stop them indexing that material?).
4. What the implications are for other services, including (but not limited to) Wikimedia projects:
    1. On Wikipedia we can link to target documents, and the title of the article might match a query leading to that document. Will that mean the Wikipedia article is removed? This would undoubtedly lead to censorship of content on Wikipedia, and/or of access to Wikipedia articles (see the [recent interesting case][3] of a Greek Wikipedian accused of defamation).
    2. Could the projects be forced (by the ECJ) to remove certain articles or content despite BLP/notability guidance? We’ve already seen attempts to bring legal action against editors in particular countries (unsuccessfully); does RTBF give another avenue a) for that and b) for legal complaints against the WMF, even though it is based in the US?
    3. Could edit histories/revisions be censored to conceal ‘outdated’ revisions of an article?
5. The weighting Google gives to recency and in/out links isn’t clear, nor is it clear whether a rejigging of the ranking would address some concerns. In the case in point, a lesson in self-SEO, and restrictions on what newspapers can publish and/or what historic information credit agencies can use (if that was an issue), could both address the concern. Neither of those ‘solutions’ would lead to the kind of RTBF deindexing being done. The ways search engines rank information are important for how it is processed and presented.
6. There are serious pragmatic issues around:
    1. the capacity of smaller search engines (and potentially other organisations) to comply;
    2. leaving compliance up to each individual company (as per the [Lords discussion][4]). Note it isn’t clear at the moment whether Google’s decisions have all been appropriate, e.g. [these Guardian pieces][5] (apparently since reinstated), and I think there’s a very legitimate concern regarding leaving decisions to individual search engines.

Footnotes

  1. http://ec.europa.eu/justice/data-protection/files/factsheets/factsheet_data_protection_en.pdf

  2. http://searchengineland.com/google-right-to-be-forgotten-form-192837

  3. http://blog.wikimedia.org/2014/09/23/greek-wikipedia-user-wins-key-hearing-in-defamation-case/

  4. http://www.theguardian.com/technology/2014/jul/30/lords-right-to-be-forgotten-ruling-unworkable

  5. http://www.theguardian.com/commentisfree/2014/jul/02/eu-right-to-be-forgotten-guardian-google