“The dark side of empathy,”
intones Paul Bloom: “How caring for one person can foster baseless aggression
towards another”. This is the dramatic headline for Bloom’s review of Anneke Buffone
and Michael Poulin’s (2014) paper with an equally striking title: “Empathy
[interacts with other things] to predict aggression for others – even without
provocation”.
It is common knowledge to the
point of cliché that people can be a bit belligerent towards others who
threaten or harm those they love. But caring fostering baseless
aggression? That would be news…
In this post, I summarise and
critique Buffone and Poulin (2014). It is not a favourable review. I find no
compelling evidence that empathy can foster baseless aggression.
Other than when stated otherwise,
all quotes are from the Online Appendix to the article, accessed a week or so
before this piece was posted to ‘The Altruism Option’ blog.
Summary of Study 1
Psychology
undergraduates were instructed (and rewarded with course credits) to recall an
event during the past 12 months in which “someone they cared about [‘Person A’]
had a serious conflict with another person [‘Person B’]”. Participants were
given some examples of such conflicts, including being “physically assaulted”,
“in a bad relationship”, or in a “conflict at work”.
Aggression was measured by adding up the number of
ways (from 0 to 3) that participants remembered confronting Person B, i.e.,
“using physical force”, “verbally”, and/or “in another way”.
Distress apparently experienced by Person A during
the conflict was measured by the question, “To what extent do you think this
conflict was emotionally harmful to the person you cared about?” (1 = “not at
all”; 7 = “extremely”).
Empathy was measured by taking an average of
participants’ memories of how much they felt each of the following during the recalled
event: compassionate, softhearted, sympathetic, moved,
tender, and warm (1 = “not at all”; 4 = “somewhat”; 7 =
“extremely”). This is a standard measure of what is variously called “empathic
concern”, “compassion”, or “sympathy”.
Main result: When the conflict was thought to be
relatively emotionally harmless for Person A, higher participant empathy during
the conflict was significantly associated with fewer forms of confrontation of Person B (95% CI = [-.18, -.06], β
= -.54, p < .001). When
the conflict was thought to be relatively emotionally harmful to Person A, participants’
empathy during the conflict was not
significantly associated with the number of forms of confrontation participants
engaged in (95% CI = [-.01, .14], β = .28, p = .09). Seriously. That was the main finding.
Secondary result: This ‘distress x empathy interactive
effect on aggression’ was apparently moderated by two of the three gene
variants the researchers investigated (AVPR1a rs3 and OXTR rs53576 but
not AVPR1a rs1).
Issues with Study 1
What was found. A ‘distress x empathy interaction’ effect on
‘aggression’ was found in Study 1 but it appears not to have been the one the
authors suggest (and that commentators such as Bloom proclaim). Buffone and Poulin
predicted and claimed to have found that high “empathy would predict aggression
on behalf of a distressed, but not a non-distressed, empathy target” (p. 1408).
If we temporarily accept their terminology, what they actually found in Study 1
was that low empathy predicted
aggression on behalf of a non-distressed, but not a distressed, empathy target.
When Person A was high in ‘distress’, there was no significant effect of empathy on
‘aggression’. To repeat, the effects of empathy on ‘aggression’ occurred only
when Person A was low in ‘distress.’ Moreover, this effect was that lower
empathy predicted higher ‘aggression’.
‘Aggression’. Properly interpreting the findings of
Study 1 depends crucially on identifying what the key outcome variable actually
measured. In particular, how valid was it as a measure of aggression? That is,
to what extent did it measure aggression and nothing but aggression? To me, it
looks likely to be very low in validity. Strictly speaking, it measured numbers
of forms of confrontation engaged in, with none of those forms necessarily
being acts of aggression as most people would understand that phrase.
To recap, in Study
1 aggression was thought to have occurred if participants “confronted” Person B
in one or more of three ways, i.e., using physical force and/or verbally and/or
“in another way … e.g., through the justice system [or] getting assistance from
others”. By such reckoning, a father pinning someone down to end a violent
assault on his young daughter would be considered aggressive. So too would a
mother politely asking her son to stop being so mean to her cherished
daughter-in-law. Ditto a man suggesting to some friends that they should stop habitually
using offensive humour in front of a colleague they all liked and respected and who
felt upset by it. Ditto a woman calling
the police to report that her husband was being mugged. And so on.
Sensible debate can
be had concerning “what aggression is” and how to define and measure it but I
would argue that few of the actions just considered are well-described using
the label ‘aggression’. The best collective term I can think of for such
actions is “intervention”. Crucially, intervention is often motivated by
altruistic and moral desires, with aggression being at most a secondary concern
and often altogether absent.
In truth, the key
outcome variable was not even intervention. It was the number of different
types of intervention participants reported engaging in, i.e., physical
and/or verbal and/or “another form”. A woman who once held back her husband’s
arm and said “Stop scaring our son!” would score “2” for aggression while a man
who bit, kicked, scratched, punched, and pummelled his wife every day for a
month would score “1”.
Counting how many forms of intervention
someone engages in is a very strange measure, and arguably a very imprecise instrument for measuring aggression.
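To make the scoring concrete, here is a minimal sketch of a ‘number of forms’ score as I understand it from the description above. The function name and the labels for the three forms are my own, invented for illustration; this is not the authors’ code.

```python
# Hypothetical labels for the three forms of confrontation described in Study 1.
FORMS = {"physical", "verbal", "other"}


def forms_score(reported_acts):
    """Count how many distinct forms of confrontation appear (0 to 3).

    Repeated acts of the same form add nothing to the score.
    """
    return len(set(reported_acts) & FORMS)


# Held back an arm and said "Stop scaring our son!" on one occasion.
one_off_intervention = ["physical", "verbal"]
# Physical attacks every day for a month (30 acts, but only one form).
daily_assault = ["physical"] * 30

print(forms_score(one_off_intervention))  # 2
print(forms_score(daily_assault))         # 1
```

On a score built this way, frequency and severity are invisible; only variety counts.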
Variability in
the dependent variable. A
tiny percentage of the sample engaged in more than one form of ‘aggression’. Of
the 69 participants in the sample, most (43) engaged in no ‘aggression’ at all;
most of the rest (21) engaged in only one form of ‘aggression’ (mostly verbal);
only 5 people engaged in two forms of ‘aggression’; and no one (0) engaged in
all three forms. This means that the reported behaviour of a very small number
of participants will have had a disproportionately large effect on the results found. (This
is because the researchers’ analyses sought to account for differences in ‘aggression’ and a very small number of participants provided what few
differences there were.)
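A quick tally of the figures just given shows how little spread there was to explain. This is only my own arithmetic on the counts reported above; the variable names are mine.

```python
# Participants at each 'aggression' score, as reported above.
counts = {0: 43, 1: 21, 2: 5, 3: 0}

n = sum(counts.values())
mean_score = sum(score * k for score, k in counts.items()) / n
pct_none = 100 * counts[0] / n
pct_two_or_more = 100 * (counts[2] + counts[3]) / n

print(n)                       # 69
print(round(mean_score, 2))    # 0.45
print(round(pct_none))         # 62
print(round(pct_two_or_more))  # 7
```

Roughly 62% of the sample scored zero and only about 7% scored above one, so any analysis of differences in this score leans heavily on a handful of people.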
‘Distress’.
As mentioned above, participants were asked, “To what extent do you think this
conflict was emotionally harmful to the person you cared about?” I would argue
that distress is not the same thing as “emotional harm”. One can be distressed
without being emotionally harmed and one can be emotionally harmed without
being distressed. More importantly for current purposes, participants could
easily have answered this question in the affirmative if the emotional harm they
perceived happened after the serious conflict. It is common to hear
people say things like, “I used to think that I enjoyed the arguing but now I
understand that I was deeply damaged by our relationship.” Emotional harm as a subsequent
result of conflict cannot,
of course, be a determinant of any actions (e.g., ‘aggression’) during
that conflict.
Empathy. Participants were instructed, “Thinking back on the conflict
experienced by the person you cared about, please try to remember your feelings
[to] describe how you felt at the time”.
Leaving to one side the likely accuracy of such memories, two things can be
noted about these instructions. First, they avoid the problem of the ‘distress’
question by specifying that participants are to respond with respect to events
at the time of the conflict. Second, unlike with the ‘distress’ question, these
instructions do not say that
participants should indicate how much they felt empathy specifically with
respect to Person A. It seems perfectly possible that some participants may
have felt compassion towards both parties
in the conflict and that some may have felt compassion specifically towards Person
B. I am not joking when I say that I often feel sorry for anyone in conflict
with one particular person I care about.
Sample sizes. The main analysis in Study 1 employed a
‘between-subjects’ design. That is, if a participant was in one condition
(e.g., the ‘low empathy/low distress’ one) they were not and could not be in
any of the other conditions (in the example just given, the ones that involved either
‘high empathy’ and/or ‘high-distress’). This means that about 17 participants
were in each of the main ‘distress x empathy’ conditions. That is a small
sample size. Other analyses had eight conditions, where each of the conditions
above was split according to which version (or versions) of a particular gene
participants had. This is also a ‘between-subjects’ analysis and it means that
the overall sample size has to be divided by 8 to work out the maximum number
of participants in each condition. Only 51 participants were typed for one
particular gene. The analysis examining the moderating effect of this gene on
the ‘distress x empathy interaction’ therefore had a maximum of 6
participants in each condition. Actually,
some cells must have contained even fewer participants. (For example, only 16
participants had “A” variants of the gene rs53576. Splitting these across the
key ‘distress x empathy’ conditions means a maximum
sub-sample size of four participants with this gene type per condition.) In short,
the sample size for Study 1 was way too small to appropriately use many
of the statistical analyses reported. (Apparently, “you need 47 participants per [condition] to detect [even] that people who like eggs eat egg salad more often than those who dislike eggs.”)
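The cell-size arithmetic above can be checked in a few lines. This sketch assumes participants were spread evenly across conditions, which is the most generous case; the paper does not report the actual cell counts, and the function name is my own.

```python
def largest_even_cell(n_participants, n_cells):
    """Largest size every cell could have if participants were spread evenly."""
    return n_participants // n_cells


print(largest_even_cell(69, 4))  # 17: full sample, four distress x empathy cells
print(largest_even_cell(51, 8))  # 6: genotyped subsample, eight cells
print(largest_even_cell(16, 4))  # 4: carriers of one variant, four cells
```

Any unevenness in how participants fell across conditions would push some cells below these figures.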
Numbers of
measures. Study 1 included
a lot of variables beyond those used to test the key hypotheses. Some of these
were identified as “control variables” (p. 1409) and many more can be found in
the Online Appendix (which ends with a note mentioning that “several other”
measures were also administered). Many of these ‘additional’ variables were not
included in any analysis that I could find, not even all the listed
“control variables”. Some of these variables might easily have been used in
combination with or instead of some of the ones that did receive attention.
Empathy relationships, for example, were tested using a standard measure of “empathic
concern” but another standard measure of a type of empathy (vicarious personal
distress) was administered but not used. Similarly, a question was asked
about Person A being “physically endangered by” the conflict but this
was also not used, alone or in combination with the question asking about
emotional harm (which was used as the sole indicator of ‘distress’). Similarly,
variables might easily have been combined in different ways to those reported. For
example, instead of compiling their rather odd measure of number of forms of
‘aggression’ participants engaged in, the researchers could easily have
additionally or instead used a measure of whether or not participants engaged
in any form of ‘aggression’. The more
measures are included in a study, the greater the likelihood of finding
apparently important results ‘by chance’, especially if one explores what
happens when certain measures are combined in various ways, others are split to
see what happens with each component part, etc. Such an approach is not “scientific” but it is common.
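To give a rough sense of scale, here is a sketch of how quickly the chance of at least one ‘significant’ result grows with the number of tests run. It assumes the tests are independent, each uses a .05 threshold, and no real effects exist. Those are simplifying assumptions of mine; the numbers of tests shown are illustrative and are not counts taken from the paper.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(k, round(chance_of_false_positive(k), 2))
# 1 0.05
# 5 0.23
# 10 0.4
# 20 0.64
```

Tests on overlapping measures are not independent, so the real figures would differ, but the direction of the problem is the same.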
I could go on but
enough already. Maybe Study 2 provides more compelling evidence.
Summary of Study 2
Psychology undergraduates
were instructed (and rewarded with course credits) to read a letter allegedly written
by another student on the same course and allegedly taking part in the same
study (“Person A”). Person A’s letter included the information that he or she
had recently bought a new car and “had never been this low on funds”. Participants in the “low distress” condition
also read that this “did not really bother” Person A because he or she was
“pretty sure things will get better soon, plus at least I have a new car”.
Participants in the “high distress” condition read Person A report that the
situation “really scares me, to be honest. What if I need to pay for something
I didn’t expect?”
In each distress
condition, half the participants were told to read the letter “as objectively
as possible and pay attention only to the facts presented. Please try not to
get caught up in thoughts about how the person feels”. The other half of
participants in each distress condition were told to “try to imagine how the
person … feels … and how it has affected his or her life”.
Participants were
then told that Person A was to take part in a maths competition with another
student on their course (“Person B”) and that the winner of this competition
would be awarded $20.
Participants were also
told that “THE PURPOSE OF OUR STUDY IS TO EXMAINE THE EFFECTS OF PAIN ON MENTAL
PERFORMANCE, AND WE ARE USING HOT SAUCE AS A SOURCE OF PARTICIPANTS’ PAIN. WITH
THIS IN MIND, we would like you to assign an amount of hot sauce to [Person]
B”.
Aggression was measured by how much hot sauce
participants allocated to Person B. This was scored either “on a 6-point scale
ranging from no hot sauce to three teaspoons … in
half-teaspoon increments” (p. 1415) or, more likely, on a 7-point scale
from 1 = none to 7 = 3 teaspoons in half-teaspoon increments
(Online Appendix).
Main result: “An empathy manipulation increased
aggression … against the empathy target’s competitor, but only when the empathy
target was described as distressed” (p. 1417). “In this study, the competitor
was an innocent [person] about whom the test participant did not have … any
cause for provocation” (p. 1418).
Secondary result: As in Study 1, the ‘distress x empathy
interactive effect on aggression’ was apparently moderated by two of the three
gene variants the researchers investigated (AVPR1a rs3 and OXTR rs53576
but not AVPR1a rs1).
Issues with Study 2
Baseline for
“baseless” ‘aggression’. There
are at least two complicating factors in considering the amount of hot sauce allocated
as a valid measure of aggression (and nothing but aggression). First, participants
– all of whom had been told that Person A was short of money – may have been
trying to help Person A win the $20 competition prize by making things more
challenging for Person B. Such altruism would undoubtedly have been a bias in
favour of Person A over Person B but it is far from clear that it would be properly
thought of as aggression towards Person B.
Perhaps even more
importantly, participants were told – in block capitals – that the purpose of
the study was to examine the effects of pain on mental performance and that hot
sauce was being used to bring about such pain. They were also told that they
could assign up to 3 teaspoons of hot sauce, which they presumably thought was
very safe to do. Allocating hot sauce was therefore not in and of itself an
indication of aggression. At some non-zero level it was (perhaps also) complying with the apparent requirements of
the experiment. How much hot sauce was ‘enough’ to do what the
experimenters said they wanted? That is, at what level did allocating hot sauce
become clear evidence of aggression,
i.e., a deliberate attempt to harm Person B in addition to or instead of
wanting to fulfil the experimental protocol (and/or help Person A)? The short
answer is that we don’t know. As far as I can tell, participants were not asked
if they were trying to be aggressive. In the absence of a control group (in
which participants would not have been given distress information about Person
A and would also not have been given instructions about how to read their
note), all we can do is look at differences in hot sauce allocation across
conditions to see if something can serve as a “baseline” level above which aggression
might be indicated.
Empathy, distress, and ‘aggression’. Here are the key results for the ‘empathy
x distress interaction on aggression’ analysis reported on pp. 1415-6, with the
vertical axis indicating how many half-teaspoons of hot sauce were allocated
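(with 1 = zero and 7 = 3 teaspoons):
[Figure: mean hot sauce allocated to Person B, by reading instruction (objective vs. perspective taking) and Person A’s distress (low vs. high)]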
The authors focus
on the fact that when Person A reported relatively high distress, participants
in the perspective taking condition (assumed to have relatively high empathy) allocated
more hot sauce than participants in the objective reading condition (assumed to
have relatively low empathy). This is the basis of their claim that, in such
circumstances, “empathy leads to aggression” (p. 1416). Other interpretations
of these results are possible. Most obviously, maybe something about being in
the objective-reading condition lowered aggression when Person A was in
distress.
It might be thought
that the low other-distress conditions provide a baseline from which to
evaluate other results. In these conditions, after all, there seems no obvious reason
to expect participants to have been aggressive (rather than, say, simply complying
with perceived procedural demands, trying to do Person A a favour, or similar).
Hot sauce allocation in these conditions fell between M = 3.91 (when participants
were perspective taking) and M = 4.64 (when they were reading objectively).
When Person A reported distress, participants following perspective taking
instructions recommended allocation of hot sauce that fell at about the top end
of this potential baseline (M = 4.70), while participants reading
objectively recommended allocation of hot sauce at levels that fell a little
below its bottom end (M = 3.57).
In other words, the
significant distress x empathy interaction seems to be driven at least as
much by what is happening when there is thought to be no empathy as when there
is. At least some of the key results in Study 2 appear to have come about from
what happened when participants were told to be objective and Person A was in
distress – which seems to have reduced ‘aggression’ towards Person B. Ignoring
this fact makes the authors’ emphasis at best partial.
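The comparison can be laid out as simple arithmetic on the four cell means quoted above. This is my own back-of-envelope sketch, not a reanalysis: it uses only the reported means (on the 1 to 7 coding), ignores variability and cell sizes, and treats the two low-distress means as a rough ‘baseline’ band, which is my assumption rather than anything the authors propose.

```python
# Reported cell means, keyed by (Person A's distress, reading instruction).
means = {
    ("low", "objective"): 4.64,
    ("low", "perspective"): 3.91,
    ("high", "objective"): 3.57,
    ("high", "perspective"): 4.70,
}

# Effect of the perspective-taking instruction within each distress level.
effect_high = means[("high", "perspective")] - means[("high", "objective")]
effect_low = means[("low", "perspective")] - means[("low", "objective")]
interaction = effect_high - effect_low

# Treat the two low-distress means as a rough baseline band.
band_bottom = min(means[("low", "perspective")], means[("low", "objective")])
band_top = max(means[("low", "perspective")], means[("low", "objective")])
above_band = means[("high", "perspective")] - band_top
below_band = band_bottom - means[("high", "objective")]

print(round(effect_high, 2))  # 1.13
print(round(effect_low, 2))   # -0.73
print(round(interaction, 2))  # 1.86
print(round(above_band, 2))   # 0.06
print(round(below_band, 2))   # 0.34
```

On these figures the high-distress, objective-reading cell sits further below the band (0.34) than the high-distress, perspective-taking cell sits above it (0.06).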
Empathy and Distress. Any interpretation of the results in Study
2 relies heavily on the intended manipulations having worked. That is,
participants in the “high distress” conditions need to have perceived more
other-distress than participants in the “low distress” conditions and
participants in the “high empathy” conditions need to have felt more empathy
towards Person A than people in the “low empathy” conditions. As far as I can
tell, no checks were made to ensure these things.
The empathy
manipulation in particular looks as though it might not have worked as
intended. Participants in the perspective taking condition were told: “Try not
to concern yourself with attending to all the information presented, because it
will distract you from recall for the relevant information”. Not attending to
all the information may have led some participants not to pay attention to the
perspective taking manipulation (which was placed at the end of the letter)
and/or the instructions’ emphasis on the need for “recall of relevant
information” may have undermined attempts to evoke emotional empathy.
Participants in the objective conditions were told to “pay attention … to the
facts presented”. This may have led participants in this condition to become
particularly aware when Person A revealed their money concerns and enhanced
participants’ other-understanding and concern, i.e., their empathy. The
possibility of any or all of these processes undermining the success of the
‘empathy’ manipulation makes it particularly unfortunate that there was no
report that participants in the perspective-taking condition experienced more
empathy than did participants in the objective reading condition. Without that
reassurance, it is difficult to confidently attribute differences across the
‘empathy’ conditions to actual differences in empathy across those conditions.
The above paragraph
is of course mostly based on speculation and possibility. But in the absence of
evidence from manipulation checks or similar, so is the authors’ interpretation
of their results. The authors of course assume that their empathy and
other-distress manipulations worked as intended. And maybe they did. But maybe they
didn’t. Without knowing, no one can confidently say what differences
across conditions led to any differences in outcome variables.
Summary
Both studies in
Buffone and Poulin (2014) are multiply flawed. Prominent among their problems is
how they measure aggression. Study 1 measured aggression by counting how many
different ways participants remembered intervening (physically, verbally, and/or
in another way) when cared-for others were in various sorts of conflict with
third parties. Study 2 measured aggression by seeing how much hot sauce
participants allocated to someone whom they were instructed to cause pain to and
who was in a mentally demanding competition for a cash prize with someone else
who participants were informed was in need of money. Even if these crucial outcome
measures were thought to be valid, there are multiple reasons why one cannot confidently
interpret the results of either study, e.g., problematic measurement of key
variables thought to causally affect aggression, a lack of control groups and
manipulation checks, inadequate sample sizes for the analyses run and the
conclusions reached, etc. And if one puts all of this to one side, one of the
two studies had results which do not support the authors’ central claim.
In short, Buffone
and Poulin (2014) provides no good evidence for the extraordinary
claim that increasing empathy fosters baseless aggression and/or aggression
without provocation. And, as far as I know, no one else does, either. And I’m willing to bet that no one will.
Sadly, I’m also
willing to bet that nothing anyone says will deter Paul Bloom from continuing
his campaign “against empathy”.
References and further reading
Bloom, P. (2015, September 25). The dark side
of empathy: How caring for one person can foster baseless aggression towards
another. The Atlantic, http://www.theatlantic.com/science/archive/2015/09/the-violence-of-empathy/407155/
Bloom, P. (2014, August 26). Against empathy.
Boston Review, http://www.bostonreview.net/forum/paul-bloom-against-empathy
Bloom, P.
(2013, May 20). The baby in the well: The case against empathy. The New
Yorker, http://www.newyorker.com/magazine/2013/05/20/the-baby-in-the-well
Buffone, A. E., & Poulin, M. J.
(2014). Empathy, target distress, and neurohormone genes interact to
predict aggression for others – even without provocation. Personality and
Social Psychology Bulletin, 40, 1406-1422. doi:
10.1177/0146167214549320. And its Online Appendix: http://psp.sagepub.com/content/suppl/2014/09/10/0146167214549320.DC1/10.1177_0146167214549320_Online_Appendix.pdf
How to cite this
blog post using APA Style
Farsides, T.
(2015, November 26). Does empathy promote baseless aggression? Retrieved
from http://tomfarsides.blogspot.com/2015/11/can-empathy-foster-baseless-aggression.html