Now that more companies (e.g. Facebook, Google, Twitter) are letting you download at least partial dumps of the data they hold on you, I’m hoping we’ll start to see more desktop-based (or local, browser-based) tools that support people in navigating this data and what can be done with it. In our new UTS Open course ‘What Does Facebook Know About You?‘ we guide people through downloading their Facebook data, exploring the privacy options, and investigating what Facebook knows and what can be done with that data. But there’s clearly so much more potential to help people understand their data now that they’re more empowered to actually access it.
There has been a spate of tools over recent years to help users explore aspects of this data, often relying on scraping or on APIs…but many of these are now dead or blocked (e.g. this great Data Selfie tool). With the data dump, something that’s quite nice is that the user retains ownership of the data, and it should be clearer to them what’s going where.
We can imagine an interface that would allow them to load up e.g. the Facebook directory, and show them what analyses could be run with each folder or set of folders in that directory. E.g., you could send the ‘likes’ files to Apply Magic Sauce to get the Cambridge Psychometrics Centre profile on those. You could send individual status updates to the IBM text analytics services to view sentiment, entity extraction, and whatever else. Of course, some of this would still need a bit of reverse engineering, because Facebook provides human-interpretable data but not its internal data; e.g. the second of these files is a list of like names but not their IDs (which the Apply Magic Sauce API requires), so we’d need to look these up in the Facebook API.
Advertisers who’ve uploaded a contact list with your information: ads/advertisers_who_uploaded_a_contact_list_with_your_information.html
Pages you’ve liked or reacted to: likes_and_reactions/pages.html
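A minimal sketch of that folder-to-analysis mapping, assuming folder names like the ones above (the actual layout of a Facebook export changes over time), and with the analysis labels purely illustrative:

```python
from pathlib import Path

# Hypothetical mapping from folders in a Facebook data export to the
# analyses a tool could offer for them. Folder names here follow the
# HTML export layout referenced above; real exports may differ.
ANALYSES = {
    "likes_and_reactions": ["Apply Magic Sauce psychometric profile"],
    "posts": ["Text analytics: sentiment, entity extraction"],
    "ads": ["Advertisers who uploaded your contact details"],
}

def available_analyses(dump_root):
    """Return {folder_name: [analysis, ...]} for folders actually
    present in the downloaded dump, so the interface only offers
    analyses the user has data for."""
    root = Path(dump_root)
    return {
        folder.name: ANALYSES[folder.name]
        for folder in root.iterdir()
        if folder.is_dir() and folder.name in ANALYSES
    }
```

Each folder the user ticks could then be parsed and routed to the corresponding service, with everything else never leaving their machine.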
Indeed, the nice thing with this kind of tool is that users could select which bits of their data they do or don’t want to send to these services, and receive back the results they request. From a research perspective we lose some data – because it might never pass through our servers – but it would give users more control. We can also imagine how this might show them some of the flaws in those analyses; e.g., the Apply Magic Sauce models were built quite some time ago now – what happens if users send their ‘likes’ from particular windows of time? Which likes are picked up and which aren’t? (And why should we believe that, even if ‘curly fries’ was associated with intelligence back then, the same like profiles still apply?)
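The time-window experiment could be as simple as filtering likes by date before sending them off. A toy sketch, assuming the likes have already been parsed out of the export into (page name, timestamp) pairs – the pairs here are made up for illustration:

```python
from datetime import datetime

def likes_in_window(likes, start, end):
    """Keep only likes whose timestamp falls in [start, end), so a
    user can probe how the inferred profile shifts across eras of
    their Facebook history. `likes` is a list of (name, datetime)
    pairs -- a stand-in for whatever a parser actually extracts."""
    return [name for name, when in likes if start <= when < end]

# Illustrative data only.
likes = [
    ("Curly Fries", datetime(2011, 3, 1)),
    ("Some Band", datetime(2017, 8, 15)),
]
early = likes_in_window(likes, datetime(2010, 1, 1), datetime(2013, 1, 1))
```

Sending `early` versus the full list to the same service, and comparing the profiles that come back, would make the model’s sensitivity to time slices directly visible to the user.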
Now…for someone to build them…