Thursday, 26 November 2015

Can empathy foster baseless aggression?


“The dark side of empathy,” intones Paul Bloom: “How caring for one person can foster baseless aggression towards another”. This is the dramatic headline for Bloom’s review of Anneke Buffone and Michael Poulin’s (2014) paper with an equally striking title: “Empathy [interacts with other things] to predict aggression for others – even without provocation”.

It is common knowledge to the point of cliché that people can be a bit belligerent towards others who threaten or harm those they love. But caring fostering baseless aggression? That would be news…

In this post, I summarise and critique Buffone and Poulin (2014). It is not a favourable review. I find no compelling evidence that empathy can foster baseless aggression.

Other than when stated otherwise, all quotes are from the Online Appendix to the article, accessed a week or so before this piece was posted to ‘The Altruism Option’ blog.

Summary of Study 1

Psychology undergraduates were instructed (and rewarded with course credits) to recall an event during the past 12 months in which “someone they cared about [‘Person A’] had a serious conflict with another person [‘Person B’]”. Participants were given some examples of such conflicts, including being “physically assaulted”, “in a bad relationship”, or in a “conflict at work”.

Aggression was measured by adding up the number of ways (from 0 to 3) that participants remembered confronting Person B, i.e., “using physical force”, “verbally”, and/or “in another way”.

Distress apparently experienced by Person A during the conflict was measured by the question, “To what extent do you think this conflict was emotionally harmful to the person you cared about” (1 = “not at all”; 7 = “extremely”).

Empathy was measured by taking an average of participants’ memories of how much they felt each of the following during the recalled event: compassionate, softhearted, sympathetic, moved, tender, and warm (1 = “not at all”; 4 = “somewhat”; 7 = “extremely”). This is a standard measure of what is variously called “empathic concern”, “compassion”, or “sympathy”.

Main result: When the conflict was thought to be relatively emotionally harmless for Person A, higher participant empathy during the conflict was significantly associated with fewer forms of confrontation of Person B (95% CI = [-.18, -.06], β = -.54, p < .001). When the conflict was thought to be relatively emotionally harmful to Person A, participants’ empathy during the conflict was not significantly associated with the number of forms of confrontation participants engaged in (95% CI = [-.01, .14], β = .28, p = .09). Seriously. That was the main finding.

Secondary result: This ‘distress x empathy interactive effect on aggression’ was apparently moderated by two of the three gene variants the researchers investigated (AVPR1a rs3 and OXTR rs53576 but not AVPR1a rs1).

Issues with Study 1

What was found. A ‘distress x empathy interaction’ effect on ‘aggression’ was found in Study 1 but it appears not to have been the one the authors suggest (and that commentators such as Bloom proclaim). Buffone and Poulin predicted and claimed to have found that high “empathy would predict aggression on behalf of a distressed, but not a non-distressed, empathy target” (p. 1408). If we temporarily accept their terminology, what they actually found in Study 1 was that low empathy predicted aggression on behalf of a non-distressed, but not a distressed, empathy target. When Person A was high in ‘distress’, there was no significant effect of empathy on ‘aggression’. To repeat, the effects of empathy on ‘aggression’ occurred only when Person A was low in ‘distress’. Moreover, this effect was that lower empathy predicted higher ‘aggression’.

‘Aggression’. Properly interpreting the findings of Study 1 depends crucially on identifying what the key outcome variable actually measured. In particular, how valid was it as a measure of aggression? That is, to what extent did it measure aggression and nothing but aggression? To me, it looks likely to be very low in validity. Strictly speaking, it measured numbers of forms of confrontation engaged in, with none of those forms necessarily being acts of aggression as most people would understand that phrase.

To recap, in Study 1 aggression was thought to have occurred if participants “confronted” Person B in one or more of three ways, i.e., using physical force and/or verbally and/or “in another way … e.g., through the justice system [or] getting assistance from others”. By such reckoning, a father pinning someone down to end a violent assault on his young daughter would be considered aggressive. So too would a mother politely asking her son to stop being so mean to her cherished daughter-in-law. Ditto a man suggesting to some friends that they should stop habitually using offensive humour in front of a colleague they all liked and respected and who felt upset by it.  Ditto a woman calling the police to report that her husband was being mugged. And so on.

Sensible debate can be had concerning “what aggression is” and how to define and measure it but I would argue that few of the actions just considered are well-described using the label ‘aggression’. The best collective term I can think of for such actions is “intervention”. Crucially, intervention is often motivated by altruistic and moral desires, with aggression being at most a secondary concern and often altogether absent.

In truth, the key outcome variable was not even intervention. It was the number of different types of intervention participants reported engaging in, i.e., physical and/or verbal and/or “another form”. A woman who once held back her husband’s arm and said “Stop scaring our son!” would score “2” for aggression while a man who bit, kicked, scratched, punched, and pummelled his wife every day for a month would score “1”.

How many forms of intervention someone engages in is a very strange measure and is arguably a very imprecise instrument for measuring aggression.
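To make the scoring problem concrete, here is a minimal Python sketch of the measure as I understand it from the description above. The function name and the input format are my own illustrative assumptions, not anything taken from the authors' materials.

```python
# Hypothetical sketch of the Study 1 'aggression' score: a count of how many
# of three forms of confrontation were reported, ignoring frequency and severity.
FORMS = ("physical", "verbal", "other")


def aggression_score(reported_acts):
    """Return the number of distinct forms (0 to 3) among the reported acts.

    `reported_acts` is a list of (form, description) pairs. This input
    format is an illustrative assumption.
    """
    forms_used = {form for form, _ in reported_acts if form in FORMS}
    return len(forms_used)


# The woman who once held back her husband's arm and told him to stop.
intervening_wife = [
    ("physical", "held back arm"),
    ("verbal", "said 'Stop scaring our son!'"),
]

# The man who physically attacked his wife every day for a month.
violent_husband = [("physical", "assault")] * 30

print(aggression_score(intervening_wife))  # 2
print(aggression_score(violent_husband))   # 1
```

Because the score counts forms rather than acts, thirty assaults collapse into a single point while one restraining hand plus one sentence earns two.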

Variability in the dependent variable. A tiny percentage of the sample engaged in more than one form of ‘aggression’. Of the 69 participants in the sample, most (43) engaged in no ‘aggression’ at all; most of the rest (21) engaged in only one form of ‘aggression’ (mostly verbal); only 5 people engaged in two forms of ‘aggression’; and no one (0) engaged in all three forms. This means that the reported behaviour of a very small number of participants will have had a relatively enormous effect on the results found. (This is because the researchers’ analyses sought to account for differences in ‘aggression’ and a very small number of participants provided what little difference there was.)
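The counts just given are enough to show how little variation there was to explain. The following Python sketch simply tabulates them; the variable names are mine.

```python
# Number of participants reporting 0, 1, 2, or 3 forms of 'aggression',
# as given in the text above.
counts = {0: 43, 1: 21, 2: 5, 3: 0}

n = sum(counts.values())  # total sample size
share_none = counts[0] / n
share_more_than_one = (counts[2] + counts[3]) / n
mean_score = sum(score * k for score, k in counts.items()) / n

print(n)                               # 69
print(f"{share_none:.1%}")             # 62.3%
print(f"{share_more_than_one:.1%}")    # 7.2%
print(f"{mean_score:.2f}")             # 0.45
```

On a scale that nominally runs from 0 to 3, the average score is under half a point and only about one participant in fourteen scored above 1.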

‘Distress’. As mentioned above, participants were asked, “To what extent do you think this conflict was emotionally harmful to the person you cared about?” I would argue that distress is not the same thing as “emotional harm”. One can be distressed without being emotionally harmed and one can be emotionally harmed without being distressed. More importantly for current purposes, participants could easily have answered this question in the affirmative if the emotional harm they perceived happened after the serious conflict. It is common to hear people say things like, “I used to think that I enjoyed the arguing but now I understand that I was deeply damaged by our relationship.” Emotional harm as a subsequent result of conflict cannot, of course, be a determinant of any actions (e.g., ‘aggression’) during that conflict.

Empathy. Participants were instructed, “Thinking back on the conflict experienced by the person you cared about, please try to remember your feelings [to] describe how you felt at the time”. Leaving to one side the likely accuracy of such memories, two things can be noted about these instructions. First, they avoid the problem of the ‘distress’ question by specifying that participants are to respond with respect to events at the time of the conflict. Second, unlike with the ‘distress’ question, these instructions do not say that participants should indicate how much they felt empathy specifically with respect to Person A. It seems perfectly possible that some participants may have felt compassion towards both parties in the conflict and that some may have felt compassion specifically towards Person B. I am not joking when I say that I often feel sorry for anyone in conflict with one particular person I care about.

Sample sizes. The main analysis in Study 1 employed a ‘between-subjects’ design. That is, if a participant was in one condition (e.g., the ‘low empathy/low distress’ one) they were not and could not be in any of the other conditions (in the example just given, the ones that involved ‘high empathy’ and/or ‘high distress’). This means that about 17 participants were in each of the main ‘distress x empathy’ conditions. That is a small sample size. Other analyses had eight conditions, where each of the conditions above was split according to which version (or versions) of a particular gene participants had. This is also a ‘between-subjects’ analysis and it means that the overall sample size has to be divided by 8 to work out the maximum number of participants in each condition. Only 51 participants were typed for one particular gene. The analysis examining the moderating effect of this gene on the ‘distress x empathy interaction’ therefore had a maximum of 6 participants in each condition. Actually, some cells must have contained even fewer participants. (For example, only 16 participants had “A” variants of the gene rs53576. Splitting these across the key ‘distress x empathy’ conditions means a maximum sub-sample size of four participants with this gene type per condition.) In short, the sample size for Study 1 was way too small to appropriately use many of the statistical analyses reported. (Apparently, “you need 47 participants per [condition] to detect [even] that people who like eggs eat egg salad more often than those who dislike eggs.”)
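The cell-size arithmetic above is just integer division, as this small Python sketch shows. The function name is mine, and the even split it assumes is the best case: if participants were spread unevenly, at least one cell would be smaller still.

```python
def best_case_smallest_cell(n_participants, n_cells):
    """Largest possible size of the smallest cell.

    Assumes participants are spread as evenly as possible across cells,
    which is the most favourable case for the analysis.
    """
    return n_participants // n_cells


# Full sample across the four 'distress x empathy' conditions.
print(best_case_smallest_cell(69, 4))  # 17

# Participants typed for one gene, across eight gene x distress x empathy cells.
print(best_case_smallest_cell(51, 8))  # 6

# Participants with "A" variants of rs53576, across the four conditions.
print(best_case_smallest_cell(16, 4))  # 4
```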

Numbers of measures. Study 1 included a lot of variables beyond those used to test the key hypotheses. Some of these were identified as “control variables” (p. 1409) and many more can be found in the Online Appendix (which ends with a note mentioning that “several other” measures were also administered). Many of these ‘additional’ variables were not included in any analysis that I could find, not even all the listed “control variables”. Some of these variables might easily have been used in combination with or instead of some of the ones that did receive attention. Empathy relationships, for example, were tested using a standard measure of “empathic concern” but another standard measure of a type of empathy (vicarious personal distress) was administered but not used. Similarly, a question was asked about Person A being “physically endangered by” the conflict but this was also not used, alone or in combination with the question asking about emotional harm (which was used as the sole indicator of ‘distress’). Similarly, variables might easily have been combined in different ways to those reported. For example, instead of compiling their rather odd measure of the number of forms of ‘aggression’ participants engaged in, the researchers could easily have additionally or instead used a measure of whether or not participants engaged in any form of ‘aggression’. The more measures that are included in a study, the greater the likelihood of finding apparently important results ‘by chance’, especially if one explores what happens when certain measures are combined in various ways, others are split to see what happens with each component part, etc. Such an approach is not “scientific” but it is common.
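A rough illustration of the ‘by chance’ point: if each test has a 5% false-positive rate and the tests are independent, the chance of at least one spurious ‘significant’ result grows quickly with the number of tests. Both the 5% threshold and the independence are simplifying assumptions of mine (measures taken from the same participants are rarely independent), so this Python sketch shows the direction of the problem, not its exact size in this study.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive across n_tests.

    Assumes every null hypothesis is true and the tests are independent;
    both are simplifying assumptions for illustration.
    """
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    print(n_tests, f"{chance_of_false_positive(n_tests):.2f}")
# 1 0.05
# 5 0.23
# 10 0.40
# 20 0.64
```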

I could go on but enough already. Maybe Study 2 provides more compelling evidence.

Summary of Study 2

Psychology undergraduates were instructed (and rewarded with course credits) to read a letter allegedly written by another student on the same course and allegedly taking part in the same study (“Person A”). Person A’s letter included the information that he or she had recently bought a new car and “had never been this low on funds”.  Participants in the “low distress” condition also read that this “did not really bother” Person A because he or she was “pretty sure things will get better soon, plus at least I have a new car”. Participants in the “high distress” condition read Person A report that the situation “really scares me, to be honest. What if I need to pay for something I didn’t expect?”

In each distress condition, half the participants were told to read the letter “as objectively as possible and pay attention only to the facts presented. Please try not to get caught up in thoughts about how the person feels”. The other half of participants in each distress condition were told to “try to imagine how the person … feels … and how it has affected his or her life”.

Participants were then told that Person A was to take part in a maths competition with another student on their course (“Person B”) and that the winner of this competition would be awarded $20.

Participants were also told that “THE PURPOSE OF OUR STUDY IS TO EXAMINE THE EFFECTS OF PAIN ON MENTAL PERFORMANCE, AND WE ARE USING HOT SAUCE AS A SOURCE OF PARTICIPANTS’ PAIN. WITH THIS IN MIND, we would like you to assign an amount of hot sauce to [Person] B”.

Aggression was measured by how much hot sauce participants allocated to Person B. This was scored either “on a 6-point scale ranging from no hot sauce to three teaspoons in half-teaspoon increments” (p. 1415) or, more likely, on a 7-point scale from 1 = none to 7 = 3 teaspoons in half-teaspoon increments (Online Appendix).

Main result: “An empathy manipulation increased aggression … against the empathy target’s competitor, but only when the empathy target was described as distressed” (p. 1417). “In this study, the competitor was an innocent [person] about whom the test participant did not have … any cause for provocation” (p. 1418).

Secondary result: As in Study 1, the ‘distress x empathy interactive effect on aggression’ was apparently moderated by two of the three gene variants the researchers investigated (AVPR1a rs3 and OXTR rs53576 but not AVPR1a rs1).

Issues with Study 2

Baseline for “baseless” ‘aggression’. There are at least two complicating factors in considering the amount of hot sauce allocated as a valid measure of aggression (and nothing but aggression). First, participants – all of whom had been told that Person A was short of money – may have been trying to help Person A win the $20 competition prize by making things more challenging for Person B. Such altruism would undoubtedly have been a bias in favour of Person A over Person B but it is far from clear that it would be properly thought of as aggression towards Person B.

Perhaps even more importantly, participants were told – in block capitals – that the purpose of the study was to examine the effects of pain on mental performance and that hot sauce was being used to bring about such pain. They were also told that they could assign up to 3 teaspoons of hot sauce, which they presumably thought was very safe to do. Allocating hot sauce was therefore not in and of itself an indication of aggression. At some non-zero level it was (perhaps also) complying with the apparent requirements of the experiment. How much hot sauce was ‘enough’ to do what the experimenters said they wanted? That is, at what level did allocating hot sauce become clear evidence of aggression, i.e., a deliberate attempt to harm Person B in addition to or instead of wanting to fulfil the experimental protocol (and/or help Person A)? The short answer is that we don’t know. As far as I can tell, participants were not asked if they were trying to be aggressive. In the absence of a control group (in which participants would not have been given distress information about Person A and would also not have been given instructions about how to read their note), all we can do is look at differences in hot sauce allocation across conditions to see if something can serve as a “baseline” level above which aggression might be indicated.

Empathy, distress, and ‘aggression’. Here are the key results for the ‘empathy x distress interaction on aggression’ analysis reported on pp. 1415-6, with the vertical axis indicating how many half-teaspoons of hot sauce were allocated (with 1 = zero and 7 = 3 teaspoons):

The authors focus on the fact that when Person A reported relatively high distress, participants in the perspective taking condition (assumed to have relatively high empathy) allocated more hot sauce than participants in the objective reading condition (assumed to have relatively low empathy). This is the basis of their claim that, in such circumstances, “empathy leads to aggression” (p. 1416). Other interpretations of these results are possible. Most obviously, maybe something about being in the objective-reading condition lowered aggression when Person A was in distress.

It might be thought that the low other-distress conditions provide a baseline from which to evaluate other results. In these conditions, after all, there seems no obvious reason to expect participants to have been aggressive (rather than, say, simply complying with perceived procedural demands, trying to do Person A a favour, or similar). Hot sauce allocation in these conditions fell between M = 3.91 (when participants were perspective taking) and M = 4.64 (when they were reading objectively). When Person A reported distress, participants following perspective taking instructions recommended allocation of hot sauce that fell at about the top end of this potential baseline (M = 4.70), while participants reading objectively recommended allocation of hot sauce at levels that fell a little below its bottom end (M = 3.57).

In other words, the significant distress x empathy interaction seems to be driven at least as much by what happens when empathy is assumed to be low as by what happens when it is assumed to be high. At least some of the key results in Study 2 appear to have come about because telling participants to be objective when Person A was in distress seems to have reduced ‘aggression’ towards Person B. Ignoring this makes the authors’ emphasis at best partial.
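The arithmetic behind that reading can be sketched in a few lines of Python. The four means are the ones quoted above; the variable names are mine, and because only the means are used (no standard errors or sample sizes), this says nothing about statistical significance.

```python
# Mean hot sauce allocation (1 = none, 7 = 3 teaspoons) by condition,
# keyed by (Person A's distress, reading instruction).
means = {
    ("low", "perspective"): 3.91,
    ("low", "objective"): 4.64,
    ("high", "perspective"): 4.70,
    ("high", "objective"): 3.57,
}

# How much each instruction group changed when Person A was distressed.
change_perspective = means[("high", "perspective")] - means[("low", "perspective")]
change_objective = means[("high", "objective")] - means[("low", "objective")]

# The interaction contrast is the difference between those two changes.
interaction_contrast = change_perspective - change_objective

print(f"{change_perspective:+.2f}")    # +0.79
print(f"{change_objective:+.2f}")      # -1.07
print(f"{interaction_contrast:+.2f}")  # +1.86
```

On these figures, more than half of the interaction contrast comes from the drop in the objective-reading condition rather than from any rise in the perspective-taking condition.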

Empathy and Distress. Any interpretation of the results in Study 2 relies heavily on the intended manipulations having worked. That is, participants in the “high distress” conditions need to have perceived more other-distress than participants in the “low distress” conditions and participants in the “high empathy” conditions need to have felt more empathy towards Person A than participants in the “low empathy” conditions. As far as I can tell, no checks were made to ensure these things.

The empathy manipulation in particular looks as though it might not have worked as intended. Participants in the perspective taking condition were told: “Try not to concern yourself with attending to all the information presented, because it will distract you from recall for the relevant information”. Not attending to all the information may have led some participants not to pay attention to the perspective taking manipulation (which was placed at the end of the letter) and/or the instructions’ emphasis on the need for “recall of relevant information” may have undermined attempts to evoke emotional empathy. Participants in the objective conditions were told to “pay attention … to the facts presented”. This may have led participants in this condition to become particularly aware when Person A revealed their money concerns and enhanced participants’ other-understanding and concern, i.e., their empathy. The possibility of any or all of these processes undermining the success of the ‘empathy’ manipulation makes it particularly unfortunate that there was no report that participants in the perspective-taking condition experienced more empathy than did participants in the objective reading condition. Without that reassurance, it is difficult to confidently attribute differences across the ‘empathy’ conditions to actual differences in empathy across those conditions.

The above paragraph is of course mostly based on speculation and possibility. But in the absence of evidence from manipulation checks or similar, so is the authors’ interpretation of their results. The authors of course assume that their empathy and other-distress manipulations worked as intended. And maybe they did. But maybe they didn’t. Without knowing, no one can confidently say what differences across conditions led to any differences in outcome variables.


Both studies in Buffone and Poulin (2014) are multiply flawed. Prominent among their problems is how they measure aggression. Study 1 measured aggression by counting how many different ways participants remembered intervening (physically, verbally, and/or in another way) when cared-for others were in various sorts of conflict with third parties. Study 2 measured aggression by seeing how much hot sauce participants allocated to someone who they were instructed to cause pain to and who was in a mentally demanding competition for a cash prize with someone else who participants were informed was in need of money. Even if these crucial outcome measures were thought to be valid, there are multiple reasons why one cannot confidently interpret the results of either study, e.g., problematic measurement of key variables thought to causally affect aggression, a lack of control groups and manipulation checks, inadequate sample sizes for the analyses run and the conclusions reached, etc. And if one puts all of this to one side, one of the two studies had results which do not support the authors’ central claim.

In short, Buffone and Poulin (2014) provides no good evidence for the extraordinary claim that increasing empathy fosters baseless aggression and/or aggression without provocation. And, as far as I know, no one else does, either. And I’m willing to bet that no one will.

Sadly, I’m also willing to bet that nothing anyone says will deter Paul Bloom from continuing his campaign “against empathy”.

References and further reading

Bloom, P. (2015, September 25). The dark side of empathy: How caring for one person can foster baseless aggression towards another. The Atlantic.
Bloom, P. (2014, August 26). Against empathy. Boston Review.
Bloom, P. (2013, May 20). The baby in the well: The case against empathy. The New Yorker.
Buffone, A. E., & Poulin, M. J. (2014). Empathy, target distress, and neurohormone genes interact to predict aggression for others – even without provocation. Personality and Social Psychology Bulletin, 40, 1406-1422. doi: 10.1177/0146167214549320. See also its Online Appendix.

