Counterfactuals and Policy Interventions

Figure: a synthetic control-derived counterfactual of Ecuador’s predicted GDP in the absence of the 1973 oil price shock. Liou and Musgrave, 2014 (not yet published).

A pressing question in policy analysis concerns estimating counterfactual outcomes. Given that we only observe one world, how do we know that policymakers’ decisions had an impact compared to likely alternative outcomes? If we assess that their decisions did have an impact, how confident can we be that the impact was positive or negative? Such questions confront what social scientists call the Fundamental Problem of Causal Inference: we can’t know for certain what the outcome would have been had a different intervention (or none) been chosen, so instead we have to infer the existence and magnitude of an effect from other sources.
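
In the Neyman–Rubin potential-outcomes notation (a standard formalization, not anything specific to the works discussed here), the problem can be stated compactly. Write D_i for whether unit i received the intervention, and Y_i(1) and Y_i(0) for the outcomes that unit would experience with and without it:

    \tau_i = Y_i(1) - Y_i(0), \qquad Y_i^{\mathrm{obs}} = D_i \, Y_i(1) + (1 - D_i) \, Y_i(0)

Because D_i reveals only one of the two potential outcomes, the unit-level effect \tau_i can never be computed directly; it has to be inferred from comparisons across units or over time.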

This problem is not merely academic: it affects everything. Any causal claim of the form “If X, then Y; if not-X, then not-Y” requires assuming that we can evaluate the relationship between X and Y even though we will only ever observe one potential outcome. Things get complicated in the real world, where we might observe Y through processes not involving X (for instance, if I drink coffee, I may feel more alert, but I might also feel more alert after going for a bike ride instead, even without the coffee) or where some other process might interrupt the postulated mechanism (if I drink coffee, I may not feel more alert if my body has developed too high a tolerance for caffeine).

Practical people assume that speculating about such counterfactuals is a waste of time, that it is playing tennis without a net. But everyone does it, whether they acknowledge it or not. Did the TARP bailout reduce the severity and duration of the US recession caused by the bursting of the financial bubble? Would a demonstration of the atomic bomb have been effective in convincing the government of Japan to surrender in World War II? Did the assassination of Archduke Franz Ferdinand cause the First World War? And do higher taxes lead to lower economic growth? People routinely engage in arguments like these, and their claims can usually be restated (if they are falsifiable at all) in terms amenable to counterfactual analysis. Some of these debates are intractable, in fact, exactly because they revolve around unobservable cases.

For policy analysts, there are many strategies for mitigating this problem. We can assume that a case is sufficiently like other cases to allow us to use evidence from a universe of other cases to infer a treatment effect; your doctor does this when she prescribes you medicine after diagnosing your symptoms. We can make these results more or less convincing depending on the rigor of the research design (bluntly, results from experiments = more convincing; results from observational studies = less convincing). But sometimes we are interested in the answer to a question about a particular case: it’s well and good to know that domestic partners are more likely to commit murders than strangers, but was this murder committed by a domestic partner or by a stranger? Knowing the base rate helps, but establishing a precise causal chain matters more. And working through this logic will often involve assessing whether the observed evidence is consistent with one or another causal process (if the murder was committed by the domestic partner, he would have had to be in the same location as the victim, but since we know he was in Milwaukee, that is highly unlikely).
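
To see how a base rate and case-specific evidence combine, here is a minimal sketch in Python using Bayes’ rule; every number in it is made up for illustration, not drawn from any real crime statistics.

    # Hypothetical base rate: the share of such murders committed by partners.
    p_partner = 0.55
    p_stranger = 1 - p_partner

    # Hypothetical likelihoods of the observed evidence ("he was in Milwaukee")
    # under each hypothesis: partners are rarely far away at the time.
    p_evidence_given_partner = 0.02
    p_evidence_given_stranger = 0.30

    # Bayes' rule: P(partner | evidence).
    posterior = (p_evidence_given_partner * p_partner) / (
        p_evidence_given_partner * p_partner
        + p_evidence_given_stranger * p_stranger
    )
    print(f"P(partner | evidence) = {posterior:.2f}")  # ~0.08

The base rate alone favors the partner, but the case-specific evidence reverses that judgment, which is exactly why working out the causal chain matters more than the base rate.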

For complex causal claims, these evaluations are more difficult. Evidence is more suspect; the hypothesized causal relationships themselves are more subtle than “bullets kill people”; and the number of conceivable counterfactuals is literally infinite (even the range of plausible counterfactuals is enormous). Moreover, the rarer the case, the fewer the comparison cases, and thus the weaker our evidentiary basis for making inferences: this is why it’s easier to talk about how a given intervention affects voters than it is to forecast a presidential election. There are lots of voters, but few elections.
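
A back-of-the-envelope calculation (sample sizes invented for illustration) shows why: the standard error of an estimated proportion shrinks only with the square root of the number of observations.

    import math

    # Standard error of a sample proportion, using p = 0.5 (the worst case).
    p = 0.5
    for n in (20, 1_000, 100_000):  # e.g., elections vs. polled voters
        se = math.sqrt(p * (1 - p) / n)
        print(f"n = {n:>7}: standard error = {se:.3f}")

With twenty observations the standard error is around 0.11; with a hundred thousand it is under 0.002. Elections give us something like the former, voters the latter.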

Nevertheless, reasonable attempts can be made, and evaluating counterfactual outcomes both prospectively (“If we do this, what is likely to happen?”) and retrospectively (“Had we done something else, would we have attained a different result?”) is crucial for assessing lessons learned and choosing strategies. Doing so as carefully and as well as possible is key. Not only is evaluating counterfactuals intrinsically difficult, but motivated reasoning and self-interest can pollute what is already a challenging process.

Recently, the US Holocaust Memorial Museum and its associated Simon-Skjodt Center for the Prevention of Genocide asked whether the Obama administration could have done anything to prevent or mitigate violence against civilians in Syria. I do not claim to know anything beyond what a general reader knows about the case, so I cannot evaluate their findings. I do, however, have a little expertise in constructing and using counterfactuals; they play a major role in my dissertation and in a 2014 Comparative Political Studies article I co-wrote about evaluating the resource curse. Further, I trained with a leading qualitative methodologist (Andrew Bennett, who is blameless in this paper) and have read much (not all!) of the political-science literature on the question.

On that basis, I think I can evaluate part of the Syria project, especially the piece by Daniel Solomon: “Evaluating Counterfactual US Policy Action in Syria, 2011-2016.” Solomon takes two tacks in assessing the question. First, he builds on other work in the project to identify potential points–critical junctures–when the United States could have taken a different policy course, and he identifies what other strategies were discussed or possible. Second, on the basis of within-case evidence and evidence from other studies, Solomon evaluates whether those strategies could have led to a different outcome. Solomon is engaged with the political science literature on counterfactuals and cites many of the classic and emergent works on the question of how to conduct inference (although he does not engage with the potential-outcomes framework I lay out above, the claims he makes are largely equivalent to claims within that framework). In particular, Solomon examines:

  1. US provision of lethal support to the armed opposition
  2. limited (air) strikes against Syrian government targets
  3. no-fly zones

All of these are policy options that were hotly debated at the time and since. They therefore meet even the (perhaps overly) restrictive criterion laid out in Niall Ferguson’s Virtual History: that policymakers must actually have considered an option at the time. (I think this is too restrictive because an action that was plausible but went undiscussed for idiosyncratic or strategic reasons may still be worth analyzing; for instance, a counterfactual in which FDR died in January 1945 rather than that April is plausible even though no one wrote it down at the time, yet Ferguson’s criterion would exclude it.)

Solomon carefully seeks to classify each of these options, and the Syria case itself, within the broader framework of scholarship about civil wars. In this, he runs into two problems, one of which he mentions explicitly and the other of which he deals with implicitly. The explicit problem is that “the small universe of contemporary cases relevant to the Syrian civil war is characterized by extensive heterogeneity. Even where relevant cases share some similar characteristics, diverse endogenous and exogenous factors interact to produce unanticipated causal effects.” (p. 2) This isn’t just a throwaway caveat; it’s an important recognition of the problem (in an academic sense) that there are too few cases like Syria’s civil war to give us a sense of the range of possible interventions and associated outcomes. (Political scientists frequently lament the lack of data whose generation would require tragedies that no political scientist would ever want to occur.)

The implicit problem Solomon notes is that the cases we do observe have not been randomly assigned to treatments; rather, their interventions may be polluted by endogeneity. Endogeneity refers to the concern that we may observe certain ‘treatments’ only when agents perceive them to be beneficial. We assume, for instance, that college students benefit from their college educations, but estimating how much they benefit is difficult because the same types of people who would likely have superior later-in-life outcomes (income, civic activism, etc.) are more likely to attend college in the first place. In the same way, observing that policy X generated outcome Y in situation Z does not mean we can assert that X will always give us Y; perhaps we observe that intervention only when Y is already assured, in which case the causal claim is much weaker. Nevertheless, we do have to start somewhere, and any caution along these lines applies as much to those who want to argue that Obama could have done more as to those who applaud the administration’s relative inaction.
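
A toy simulation makes the point; the data-generating process below is entirely invented, with a true “college effect” of 5 on income that a naive comparison of means badly overstates because latent ability drives both college attendance and income.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    # Latent ability drives both selection into "treatment" (college) and the outcome.
    ability = rng.normal(0.0, 1.0, n)
    college = (ability + rng.normal(0.0, 1.0, n)) > 0.5
    income = 30 + 5 * college + 10 * ability + rng.normal(0.0, 5.0, n)

    # The naive difference in means confounds the true effect with selection.
    naive = income[college].mean() - income[~college].mean()
    print(f"naive estimate: {naive:.1f} (true effect: 5.0)")

The naive estimate comes out at roughly three times the true effect. Observational studies of interventions face the same trap: we may see an intervention only where success was already likely.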

Solomon recognizes these limitations and deals with them in two ways. First, he checks whether quantitative studies of like cases converge on a consistent finding; if they do, he uses that finding to evaluate the counterfactual claim. Second, if the evidence is mixed, he analyzes the cases most like Syria in order to gauge the likely range of potential outcomes. This seems reasonable to me, although I think the second process (“most like”) could also have been employed even when average findings tend toward a common outcome; perhaps the average treatment effect does not hold across the cases most like Syria, regardless of the consensus on that average.
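
A stylized sketch of the “most like” step, with invented case names and covariate values: standardize a few conflict covariates and rank comparison cases by their distance to Syria.

    import numpy as np

    # Invented covariates: [duration in years, number of external sponsors,
    # ethnic fractionalization]. Real applications would use many more.
    cases = {
        "Syria":  [10.0, 5.0, 0.8],
        "Case A": [8.0, 4.0, 0.7],
        "Case B": [3.0, 1.0, 0.2],
        "Case C": [12.0, 6.0, 0.9],
    }
    X = np.array(list(cases.values()))
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # put covariates on a common scale

    # Euclidean distance from each comparison case to Syria (row 0).
    names = list(cases)[1:]
    dists = np.linalg.norm(X[1:] - X[0], axis=1)
    for name, d in sorted(zip(names, dists), key=lambda pair: pair[1]):
        print(f"{name}: distance {d:.2f}")

Whatever the distance metric, the ranking is only as good as the covariates chosen, which is where substantive expertise does the real work.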

Note that Solomon’s methodology must also cope with the fact that evidence about the motivations, capabilities, and intentions of all the actors–Assad, Obama, Gulf rulers, etc.–is partial. We don’t know what these people were thinking and are unlikely to find out within any reasonable amount of time. This complicates the analysis but, again, is a limitation that applies to all debates on this issue and is not unique to Solomon’s analysis. Furthermore, his methodology can accommodate additional information: new evidence would allow future analysts to refine his classification of cases and thereby provide more granularity about which cases really are informative.

Solomon consistently describes each policy option’s deliberations (from open sources), summarizes the broader academic literature about the magnitude of effects and the processes generating them, and then applies those findings to the policy option in context. For instance, to evaluate whether providing external lethal support to the armed Syrian opposition would have lowered civilian casualties, he first sketches the debate within the administration, including the causal claims about efficacy advanced in that dispute. He establishes that advocates of the action argued that it would have “allowed the [Free Syrian Army] and other ‘moderate’ opposition groups to achieve military parity with the regime, thereby imposing extreme costs for new violence against civilians” and “would have given the United States and its partners greater influence over the makeup of the opposition, allowing moderate groups less likely to perpetrate atrocities an upper hand over jihadi factions.” (p. 8) Solomon then concludes from a review of academic studies of support for rebel groups that external support prolongs civil wars in general, but that external support in the early years of a conflict increases the likelihood of a rebel victory and a negotiated end. However, external support is also likely to lead to greater atrocities, and any mediating influence exerted by a democratic sponsor (such as the United States) is reduced by the influence of other donors (such as the Gulf countries). Accordingly, Solomon concludes (p. 13) that “the most likely consequence of greater and earlier support to the Syrian opposition would have been an increase in atrocities against civilians.”

From this process, Solomon arrives at an overall conclusion (p. 22) that “the effect of US action on the duration, severity, and extent [of atrocities] often hinges on the response of other external conflict actors”–that, in other words, talking only about the US options without thinking through what Russia, Hezbollah, and Iran would have done in response to different US strategy choices badly biases accounts. He continues that “Evidence from qualitative and quantitative studies, however, indicates that military intervention is more effective in reducing the duration of conflict and increasing the likelihood of negotiated settlement when accompanied by different forms of economic pressure and diplomatic mediation. The importance of a multifaceted strategy contradicts arguments, commonly expressed in the context of Syria, about the independent signaling power of military force.”

Any conclusion rests on the strength and reliability of the method chosen and on the quality of the data to which that method is applied. I am not a civil war or intervention expert (and I try not to play one even in the classroom). But I can attest that the methodology chosen here, assuming that the evidence and literature are honestly and properly summarized, is one that should yield reliable results. The conclusions of the piece may be politically welcome or unwelcome depending on observers’ priors, but it is those priors that should shift in response to the research rather than the other way around.