2 New Developments in Program Evaluation
The econometric literature on estimating causal effects has been a very active one for over three
decades now. Since the early 1990s the potential outcome, or Neyman-Rubin Causal Model,
approach to these problems has gained substantial acceptance as a framework for analyzing
causal problems. (We should note, however, that there is a complementary approach based on
graphical models (e.g., Pearl [2000]) that is widely used in other disciplines, though less so in
economics.) In the potential outcome approach, there is for each unit $i$, and each level of the
treatment $w$, a potential outcome $Y_i(w)$ that describes the level of the outcome under treatment
level $w$ for that unit. In this perspective, causal effects are comparisons of pairs of potential
outcomes for the same unit, e.g., the difference $Y_i(w') - Y_i(w)$. Because a given unit can only
receive one level of the treatment, say $W_i$, and only the corresponding level of the outcome,
$Y_i^{\mathrm{obs}} = Y_i(W_i)$, can be observed, we can never directly observe the causal effects, which is what
Holland [1986] calls the “fundamental problem of causal inference.” Estimates of causal effects
are ultimately based on comparisons of different units with different levels of the treatment.
A large part of the causal or treatment effect literature has focused on estimating average
treatment effects in a binary treatment setting under the unconfoundedness assumption (e.g.,
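
The point that each unit reveals only one potential outcome, so that estimates must compare different units, can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the text: outcomes are normally distributed, the unit-level effect is a constant 2, and treatment is assigned by a fair coin flip. All variable names are illustrative.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

n = 10_000
units = []
for _ in range(n):
    y0 = random.gauss(0.0, 1.0)      # potential outcome Y_i(0)
    y1 = y0 + 2.0                    # potential outcome Y_i(1); constant effect of 2 is an assumption
    w = random.randint(0, 1)         # treatment W_i, assigned by a fair coin (an assumption)
    y_obs = y1 if w == 1 else y0     # only Y_i(W_i) is ever observed
    units.append((y0, y1, w, y_obs))

# Unit-level effects Y_i(1) - Y_i(0) are available here only because the data are simulated.
true_ate = sum(y1 - y0 for y0, y1, _, _ in units) / n

# With real data, only (W_i, Y_i^obs) is available, so we compare different units.
treated = [y for _, _, w, y in units if w == 1]
control = [y for _, _, w, y in units if w == 0]
diff_in_means = sum(treated) / len(treated) - sum(control) / len(control)

print(f"average of unit-level effects: {true_ate:.3f}")
print(f"treated-minus-control difference in means: {diff_in_means:.3f}")
```

Because assignment is random in this sketch, the comparison across units lands close to the average of the unit-level effects, even though no single unit-level effect is observed.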
Rosenbaum and Rubin [1983a]),
$$W_i \perp\!\!\!\perp \bigl(Y_i(0), Y_i(1)\bigr) \mid X_i.$$
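
To make the unconfoundedness assumption concrete, the following sketch simulates a setting in which it holds by construction. Treatment depends only on a binary covariate $X_i$, which also shifts the outcome. The data-generating process, the constant effect of 1, and the treatment probabilities of 0.2 and 0.8 are all assumptions chosen for illustration, not taken from the text.

```python
import random

random.seed(1)  # fixed seed so the illustration is reproducible

n = 40_000
rows = []
for _ in range(n):
    x = random.randint(0, 1)                  # binary covariate X_i
    y0 = 3.0 * x + random.gauss(0.0, 1.0)     # Y_i(0) depends on X_i
    y1 = y0 + 1.0                             # Y_i(1); constant effect of 1 is an assumption
    p_treat = 0.8 if x == 1 else 0.2          # pr(W_i = 1 | X_i = x), assumed values
    w = 1 if random.random() < p_treat else 0 # W_i depends on X_i only, so unconfoundedness holds
    rows.append((x, w, y1 if w == 1 else y0))

def mean(values):
    return sum(values) / len(values)

def diff_in_means(data):
    """Treated-minus-control difference in mean observed outcomes."""
    treated = [y for _, w, y in data if w == 1]
    control = [y for _, w, y in data if w == 0]
    return mean(treated) - mean(control)

# Ignoring X_i mixes the treatment effect with the effect of X_i on the outcome.
naive = diff_in_means(rows)

# Conditioning on X_i = x compares like with like within each stratum.
by_x = {x: diff_in_means([r for r in rows if r[0] == x]) for x in (0, 1)}

# Averaging the within-stratum differences over the distribution of X_i.
share = {x: sum(1 for r in rows if r[0] == x) / n for x in (0, 1)}
adjusted = sum(share[x] * by_x[x] for x in (0, 1))

print(f"naive difference: {naive:.3f}")
print(f"within-stratum differences: {by_x}")
print(f"covariate-adjusted estimate: {adjusted:.3f}")
```

In this simulated design the unadjusted comparison is far from the assumed effect of 1, while the comparisons made within levels of $X_i$ are close to it.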
Under this assumption, associational or correlational relations such as
$E[Y_i^{\mathrm{obs}} \mid W_i = 1, X_i = x] - E[Y_i^{\mathrm{obs}} \mid W_i = 0, X_i = x]$ can be given a causal interpretation as the average treatment
effect $E[Y_i(1) - Y_i(0) \mid X_i = x]$. The literature on estimating average treatment effects under
unconfoundedness is by now very mature, with a number of competing estimators
and many applications. Some estimators use matching methods, some rely on weighting,
and some involve the propensity score, the conditional probability of receiving the treatment
given the covariates, $e(x) = \mathrm{pr}(W_i = 1 \mid X_i = x)$. There are a number of recent reviews of
the general literature (Imbens [2004], Imbens and Rubin [2015], and for a different perspective
Heckman and Vytlacil [2007a,b]), and we do not review it in its entirety here. However,
one area with continuing developments concerns settings with many covariates, possibly more
than there are units. For this setting connections have been made with the machine learning and