Having just seen a BPS Research Digest post on the Facebook study, I thought I’d jot down my (very rough) thoughts. For those who haven’t seen the study (?!), basically some Facebook researchers manipulated what was shown in users’ news feeds (the bit you see when you log in, showing your friends’ activities) such that they showed proportionally more positive or more negative posts. The emotional content of those posts was detected automatically, and the idea was to see whether individuals in the positive-feed or negative-feed conditions posted more positive or negative things over the course of the week.
They found that:
- When positive expressions were reduced, people produced fewer positive posts and more negative posts; when negative expressions were reduced, the opposite pattern occurred.
- Those exposed to fewer emotional posts (of either kind) posted less
- Omitting positive/negative words correlated with a reduction in the number of words posted, with a larger effect for the omission of positive than negative words
I’ve been paying attention to this, and watching Facebook a bit (can you guess my friends’ moods from this post 😉 ) but not reading a huge amount of the outside coverage or seeking it out. However, from what I can see, some of the commentary lacks a bit of nuance. There are a lot of complaints that no informed consent was given, and about the potential for harm, and there has been an apology from the lead author responding to that. On the other side I’ve seen a few “well, what do you expect?” and “it’s all algorithmically mediated anyway” responses. I have some sympathy with both sides, and it surprises me that people don’t realise they’re never shown everything (and probably wouldn’t want to be, without the ability to filter). This does, though, perhaps raise another case where understanding how things are being personalised is key to trust in a good informant, whether on a news feed or in search engine personalisation. Of course, as danah boyd nicely puts it:
given that Facebook wants to keep people on Facebook, if people came away from Facebook feeling sadder, presumably they’d not want to come back to Facebook again. Thus, it’s in Facebook’s better interest to leave people feeling happier. And this study suggests that the sentiment of the content influences this. This suggests that one applied take-away for product is to downplay negative content. Presumably this is better for users and better for Facebook.
Anyway, let’s take a closer look at some of the issues raised…
Some BPS ethics committee reps had this to say (in a Guardian letter), which I’m bulleting here:
- It infringed the autonomy and dignity of individuals by interfering with the personal decision-making as to the posts that people wished to make to their chosen groups
- and, most importantly, by failing to gain valid informed consent from the participants.
- The scientific value of this study would seem to be low, since there is already a strong body of literature which confirms emotional contagion as a social process.
- The intervention was socially irresponsible, in that it clandestinely meddled in people’s social lives with consequences that are very likely to have had significant negative effects on individuals and groups.
This seems like a pretty good place to start, so let’s go through them:
1. Autonomy and Dignity
The salient issue here is presumably that by down- or up-rating posts of a particular type (positive or negative), the researchers impacted negatively on the posters of that content (note, this issue cannot relate to the receivers of the information). This is actually an interesting point and not one I’ve seen made in other considerations of the ethics. However, it entirely ignores the facts that:
- this is a completely normal facet of how Facebook works. Now, we might think they should be more open with their algorithms and allow for more explicit choices (indeed, I do), but as it stands this is a fairly ordinary part of Facebook use
- there is no reason to think (although it’s hard to know) that this manipulation was any more or less a) sizeable and b) impactful than any other
- the number of posts affected is very low: not all of them, not all of the time, and only over the course of a single week (and only for viewing users who posted at least 1 status update – that’s not very many!). It’s very likely that posts which receive attention from other friends, posts by ‘favourited’ friends, etc. would still be seen
This certainly is an issue – if we post something, we don’t expect it to be artificially hidden – but given the existing news-feed algorithm, it’s hard to say how far the study deviates from the normal experience here. That’s not to say the normal experience isn’t problematic (I know lots of people hate the fact stuff is hidden, and the ‘secrecy’ re: how this is done is a problem too), but it certainly isn’t unique to this study.
2. Consent and Debriefing
The second, and widely cited, concern is that no informed consent was gained from participants. The view here is that, given this was an intervention, participants should have consented to taking part and having their data used in this way. Generally speaking this is right, and I’m less clear on the situation in the US (certainly lots of the rest of the world is more relaxed than the UK re: explicit consent), but I think some posters misunderstand the nature of explicit informed consent. Sometimes we don’t need to gain informed consent up front; there are two clear classes here:
- Where deception is involved – The only way you could run a study like this (with consent) would be by asking for consent to manipulate news feeds in general; you clearly could not ask for consent to manipulate them based on emotion.
- Where consent is assumed – For example, in (freely) completing a questionnaire, often explicit consent may not be sought, particularly for non-identifiable and non-personal data; it is simply assumed that consent is given by completing the survey itself (although it’s often considered good practice to give detail on the purpose of the study, etc. and a box to tick to indicate you’ve read this, etc.).
However, we would generally expect a debrief to take place. Other than some questionnaire studies, the only time I can think of when that isn’t the case is, for example, in some observational studies where:
- A debrief would not be possible because the relevant individuals are not known (they were observed in a public space)
- In particular, where debriefing would violate the privacy of the individual (and this is deemed to do more harm than good)
So, the issue of consent might rest on some points made under social responsibility below.
3. Scientific Validity
So the next issue is with regard to scientific validity. Now what’s important here isn’t what result was found (although certainly if we’d expected some sort of harmful result that would be salient to the 4th and 2nd issues here), but whether the research is “good” research, what quality it was, whether it’s a contribution to knowledge, in short whether it was worth doing. In the case of direct contact, the question I ask myself is “is it worth asking this participant for their time to complete this task?”.
So, what’s the claimed contribution in this case:
- The ability to randomise. Observational studies suffer from confounds – we don’t know whether the social contagion is an effect or cause of the social-network (i.e. do people of a particular mood come together, or do moods spread through a network)
- The fact the emotion is not ‘directed’ at anyone, that exposure alone is enough for social contagion (little explored previously)
- The exploration of textual information alone
Now those are certainly interesting considerations, and if a study could address them, from what little understanding of social contagion I have, it seems it would be a worthy contribution. And indeed, although the article is short and could do with some expansion, it looks like they did some important controls, for example:
- “This is not a simple case of mimicry, either; the cross-emotional encouragement effect (e.g., reducing negative posts led to an increase in positive posts) cannot be explained by mimicry alone, although mimicry may well have been part of the emotion-consistent effect.”
- They also used a control for a ‘response model’ – i.e. that people post negative/positive posts in response to negative/positive posts (but that there is no other effect), making a comparison between effect sizes in the positive/negative condition (where prior research suggests we should expect responses to negative news to be ‘bigger’ than to positive)
However, on the first this really doesn’t tell us enough about what’s going on – particularly given the very small effect sizes. And on the second although prior research might indicate people respond ‘bigger’ to negative emotion, I am not at all convinced we can transfer that finding (from 2001) to the social network context. It’s also never made clear in the paper what exactly is being measured – the indication is “people’s own status updates”, but this excludes all responses (comments) on other posts; this is particularly concerning given the potential for emotional expression in these posts (and indeed, the definition of a ‘like’ as a symbol of positive affect towards a post!) and the potential of this data for strengthening the second analysis. I actually had to go check what the news-feed looked like back in 2012 (the year, by the way, timeline was rolled out to all) & basically it seems the answer is “the same as now” (i.e., one could still comment on posts then).
I also wonder what the relationship is between the count of sentiment words (what was measured) and the sentiment of posts (what was manipulated), and what the length of posts was in both cases. This matters because, for example, negative posts may be longer than positive ones (and indeed, this may account for the difference in incidence); it may also affect the length of time people attend to posts of a particular type (rather than the number of items people are displayed). They do say:
Separate control conditions were necessary as 22.4% of posts contained negative words, whereas 46.8% of posts contained positive words
In total, over 3 million posts were analyzed, containing over 122 million words, 4 million of which were positive (3.6%) and 1.8 million negative (1.6%).
Emotional expression was modeled, on a per-person basis, as the percentage of words produced by that person during the experimental period that were either positive or negative.
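To make the gap between those two measures concrete, here is a minimal sketch. The word list is a tiny made-up stand-in (not the lexicon used in the paper), and `post_has_emotion` and `word_percentage` are hypothetical helper names. It contrasts a post-level flag (does the post contain any positive word?) with a word-level percentage (what share of a person's words are positive?), and shows how post length alone changes the second measure.

```python
import re

# Hypothetical toy word list for illustration; NOT the study's lexicon.
POSITIVE = {"great", "happy", "love"}

def tokenize(text):
    """Lower-case a post and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def post_has_emotion(post, lexicon):
    """Post-level flag: does the post contain at least one lexicon word?"""
    return any(word in lexicon for word in tokenize(post))

def word_percentage(posts, lexicon):
    """Word-level measure: % of all words in a person's posts found in the lexicon."""
    words = [w for post in posts for w in tokenize(post)]
    if not words:
        return 0.0
    return 100.0 * sum(w in lexicon for w in words) / len(words)

# Two imaginary people, each with exactly one "positive" post.
short_poster = ["Happy birthday!"]
long_poster = ["So happy the train finally turned up on time today"]

for posts in (short_poster, long_poster):
    flagged = [post_has_emotion(p, POSITIVE) for p in posts]
    print(flagged, word_percentage(posts, POSITIVE))
```

Both people count as having posted one positive post, yet their word-level percentages differ fivefold simply because one post is longer, which is why some descriptives on post length would help in interpreting the result.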
Another big problem with the study is the operationalisation of ‘positive’ and ‘negative’ posts. The method here is deeply flawed, using a keyword-counting approach which was developed for much longer texts than the average Facebook post. If you have a look at, for example, this post listed in that research digest, you can see that “I’m not having a great day” would be analysed as positive. This is particularly problematic given the other methodological flaws and the tiny effect size, because an effect that small could just be a statistical anomaly.
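As a rough illustration of why keyword counting struggles with short posts, here is a minimal sketch of a context-free counter. The word lists and the `keyword_label` function are hypothetical and far smaller than any real lexicon; the point is only that a counter which ignores negation will mislabel the example above.

```python
import re

# Hypothetical toy word lists for illustration; NOT the study's lexicon.
POSITIVE = {"great", "good", "happy"}
NEGATIVE = {"bad", "sad", "terrible"}

def keyword_label(post):
    """Label a post by counting lexicon hits, ignoring context such as negation."""
    words = re.findall(r"[a-z]+", post.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"

print(keyword_label("I'm not having a great day"))  # positive
print(keyword_label("Not bad at all"))              # negative
```

In both cases the label is the opposite of what a human reader would say. In a long text a few such errors might wash out; in a one-line status update the single matched word is the whole signal.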
None of these things is directly about the results found, but rather about the suitability of the methods taken to address the (very interesting) questions posed. Ethics panels (and journal reviewers) are supposed to care about this sort of thing; doing poor quality research is in and of itself an ethical concern.
4. Social Responsibility
Ignoring the issues raised above, some might still go for the RCT argument – that given we didn’t know which direction the effect would go, such an intervention is appropriate. Now, even if we thought this was the case, such A/B testing in the context of such a power relationship is hugely problematic, particularly given the secrecy and lack of IRB structure.
So let’s have a look again at this issue, using some of the consent issues and a discussion of the US IRB:
Here’s how. Section 46.116(d) of the regulations provides:
An IRB may approve a consent procedure which does not include, or which alters, some or all of the elements of informed consent set forth in this section, or waive the requirements to obtain informed consent provided the IRB finds and documents that:
- The research involves no more than minimal risk to the subjects;
- The waiver or alteration will not adversely affect the rights and welfare of the subjects;
- The research could not practicably be carried out without the waiver or alteration; and
- Whenever appropriate, the subjects will be provided with additional pertinent information after participation.
The Common Rule defines “minimal risk” to mean “that the probability and magnitude of harm or discomfort anticipated in the research are not greater in and of themselves than those ordinarily encountered in daily life . . . .” The IRB might plausibly have decided that since the subjects’ environments, like those of all Facebook users, are constantly being manipulated by Facebook, the study’s risks were no greater than what the subjects experience in daily life as regular Facebook users, and so the study posed no more than “minimal risk” to them.
That strikes me as a winning argument, unless there’s something about this manipulation of users’ News Feeds that was significantly riskier than other Facebook manipulations. It’s hard to say, since we don’t know all the ways the company adjusts its algorithms—or the effects of most of these unpublicized manipulations. We know that one News Feed tweak “directly influenced political self-expression, information seeking and real-world voting behaviour of millions of people” during the 2010 congressional elections. That tweak may have been designed to contribute to generalizable knowledge, so perhaps it shouldn’t count in the risks “ordinarily encountered in daily life” analysis. But another tweak to the Facebook interface designed to affect not only users’ word choice or even mood but their behavior—Mark Zuckerberg’s decision to give users a formal way of telling their friends that they had registered as an organ donor—was motivated by altruism after conversations with liver transplant recipient Steve Jobs, although the dramatic effects of that policy change have been studied by academics.
Evaluating the study
So where does that leave us?
It leaves us with a study claiming to show some novel (and productive) effects, but with absolutely tiny effect sizes (see e.g. here on interpreting Cohen’s d) of .02, .001, .02 and .008 (.2 is usually considered small). Within a large dataset this isn’t negligible, although understanding how that variance is distributed (i.e. within-condition variance) would be interesting. And it leaves plenty of other questions. For example, I’d love to know how many posts people actually made during the study; some descriptives would’ve been great around:
Participants were randomly selected based on their User ID, resulting in a total of ∼155,000 participants per condition who posted at least one status update during the experimental period.
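To see how an effect this small can still clear a significance threshold at this scale, here is a back-of-envelope sketch. It assumes a plain comparison of two independent, equal-sized groups with a normal approximation, which is a simplification and not the analysis used in the paper; `z_from_cohens_d` and `two_sided_p` are hypothetical helper names.

```python
import math

def z_from_cohens_d(d, n_per_group):
    """Approximate z statistic for two equal-sized independent groups: z = d * sqrt(n / 2)."""
    return d * math.sqrt(n_per_group / 2)

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

# Same effect size (d = .02) at three sample sizes; 155,000 is the
# approximate per-condition figure quoted from the paper.
for n in (100, 10_000, 155_000):
    z = z_from_cohens_d(0.02, n)
    print(f"n per group = {n:>7}: z = {z:.2f}, p = {two_sided_p(z):.2g}")
```

Under these assumptions the same d of .02 is nowhere near significant with 100 people per group, but comfortably so with roughly 155,000. Statistical significance here says more about the sample size than about how much any individual's posting changed.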
Under what circumstances could we say this study was ‘ethical’?
One thing to think about here is under what circumstances the study would have been OK.
- If a pilot were conducted indicating productive potential but unclear regarding the direction of effect. We’d still expect a debrief here (news coverage of a publication doesn’t count!). I don’t think this could be compared to a ‘public space’ observation in which, for example, shopping-centre visitors are ‘routed’ into different experimental conditions (or whatever); Facebook might be a public space in some senses, but the intervention here is not of that kind
- If some sort of partial consent (e.g. for generic news-feed improvements) were gained with a debrief given (deception is needed for this sort of study, but a debrief could easily be given).
- If a smaller study with more elements to analyse were conducted (i.e., as run, it isn’t acceptable)
We might also expect certain controls to be taken to exclude potentially vulnerable individuals (including under 18s).
However, the crucial thing for me here is that there was no need for a study like this; academia works incrementally for a reason. Other analysis was perfectly possible. There’s a temptation to “go big” just because you can, but as they note, this is preliminary work in the area; a richer study could’ve given us so much more, and avoided practical complexities around consent.
It is on this basis that I would agree with the BPS writers on the 3rd of their points. It is important to note that the same research in different contexts might be ethical or unethical; in this case, the quality of work given the context of the study and the prior research seems to me to be unfavourable. Other interesting research could (and should) have been done first. Some of the critiques are off the mark. However, particularly given the backlash, Facebook will need to think about how they seek consent, debrief, and run their IRB system (I’m surprised they didn’t have one before and hope they’ve got good people on whatever they’ve apparently set up). It’s great that Facebook is publishing, but if it’s about things beyond just “how do we keep users on our site?” and they’re going to make claims about it, then it should be good research, and it should go through an ethics panel.