## 2019

Gelman, A., Imbens, G., **Why High-Order Polynomials Should Not Be Used in Regression Discontinuity Designs**,

*Journal of Business & Economic Statistics*, 2019. Published Paper

It is common in regression discontinuity analysis to control for third, fourth, or higher-degree polynomials of the forcing variable. There appears to be a perception that such methods are theoretically justified, even though they can lead to evidently nonsensical results. We argue that controlling for global high-order polynomials in regression discontinuity analysis is a flawed approach with three major problems: it leads to noisy estimates, sensitivity to the degree of the polynomial, and poor coverage of confidence intervals. We recommend researchers instead use estimators based on local linear or quadratic polynomials or other smooth functions.
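As a rough illustration of the recommended alternative, the sketch below fits separate local linear regressions on each side of the threshold and takes the difference of the fitted values at the cutoff. The uniform kernel, fixed bandwidth, and function names are simplifying assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def rd_local_linear(x, y, cutoff=0.0, bandwidth=1.0):
    """Local linear regression discontinuity estimate (uniform kernel).

    Fits a separate linear regression within `bandwidth` of the cutoff
    on each side and returns the difference of the two fitted values
    at the cutoff."""
    x, y = np.asarray(x, float), np.asarray(y, float)

    def fit_at_cutoff(mask):
        # Regress y on an intercept and the centered forcing variable.
        X = np.column_stack([np.ones(mask.sum()), x[mask] - cutoff])
        beta, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        return beta[0]  # fitted value at the cutoff

    near = np.abs(x - cutoff) <= bandwidth
    return fit_at_cutoff(near & (x >= cutoff)) - fit_at_cutoff(near & (x < cutoff))
```

In practice the bandwidth would be chosen by a data-driven rule rather than fixed in advance.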

Imbens, G., Wager, S., **Optimized Regression Discontinuity Designs**,

*The Review of Economics and Statistics*, May 2019.

The increasing popularity of regression discontinuity methods for causal inference in observational studies has led to a proliferation of different estimating strategies, most of which involve first fitting nonparametric regression models on both sides of a treatment assignment boundary and then reporting plug-in estimates for the effect of interest. In applications, however, it is often difficult to tune the nonparametric regressions in a way that is well calibrated for the specific target of inference; for example, the model with the best global in-sample fit may provide poor estimates of the discontinuity parameter, which depends on the regression function at boundary points. We propose an alternative method for estimation and statistical inference in regression discontinuity designs that uses numerical convex optimization to directly obtain the finite-sample minimax linear estimator for the regression discontinuity parameter, subject to bounds on the second derivative of the conditional response function. Given a bound on the second derivative, our proposed method is fully data driven and provides uniform confidence intervals for the regression discontinuity parameter with both discrete and continuous running variables. The method also naturally extends to the case of multiple running variables.

Athey S., Bayati M., Imbens G., Qu Z., **Ensemble Methods for Causal Effects in Panel Data Settings**.

*American Economic Review Papers and Proceedings*, May 2019. Published Paper | Working Paper

This paper studies a panel data setting where the goal is to estimate causal effects of an intervention by predicting the counterfactual values of outcomes for treated units, had they not received the treatment. Several approaches have been proposed for this problem, including regression methods, synthetic control methods and matrix completion methods. This paper considers an ensemble approach, and shows that it performs better than any of the individual methods in several economic datasets. Matrix completion methods are often given the most weight by the ensemble, but this clearly depends on the setting. We argue that ensemble methods present a fruitful direction for further research in the causal panel data setting.

Bertanha, M., Imbens G., **External Validity in Fuzzy Regression Discontinuity Designs**.

*Journal of Business & Economic Statistics*, April 2019. Working Paper

Fuzzy regression discontinuity designs identify the local average treatment effect (LATE) for the subpopulation of compliers with the forcing variable equal to the threshold. We develop methods that assess the external validity of LATE to other compliance groups at the threshold, and allow for identification away from the threshold. Specifically, we focus on the equality of outcome distributions between treated compliers and always-takers, and between untreated compliers and never-takers. These equalities imply continuity of expected outcomes conditional on both the forcing variable and the treatment status. We recommend that researchers plot these conditional expectations and test for discontinuities at the threshold to assess external validity. We provide new commands in Stata and MATLAB to implement our proposed procedures.

## 2018

Imbens G., **Comments on Understanding and Misunderstanding Randomized Controlled Trials: A commentary on Cartwright and Deaton**. *Social Science & Medicine*, 2018; 210: 50-52. Published Paper

Deaton and Cartwright (DC2017 from hereon) view the increasing popularity of randomized experiments in social sciences with some skepticism. They are concerned about the quality of the inferences in practice, and fear that researchers may not fully appreciate the pitfalls and limitations of such experiments. I am more sanguine about the recent developments in empirical practice in economics and other social sciences, and am optimistic about the ongoing research in this area, both empirical and theoretical. I see the surge in use of randomized experiments as part of what Angrist and Pischke [2010] call the credibility revolution, where, starting in the late eighties and early nineties, a group of researchers associated with the labor economics group at Princeton University, including Orley Ashenfelter, David Card, Alan Krueger and Joshua Angrist, led empirical researchers to pay more attention to the identification strategies underlying empirical work. This has led to important methodological developments in causal inference, including new approaches to instrumental variables, difference-in-differences, regression discontinuity designs, and, most recently, synthetic control methods (Abadie et al. [2010]). I view the increased focus on randomized experiments, in particular in development economics, led by researchers such as Michael Kremer, Abhijit Banerjee, Esther Duflo, and their many coauthors and students, as taking this development even further. Notwithstanding the limitations of experimentation in answering some questions, and the difficulties in implementation, these developments have greatly improved the credibility of empirical work in economics compared to the standards prior to the mid-eighties, and I view this as a major achievement by these researchers. It would be disappointing if DC2017 were to take away from this and move empirical practice away from the attention paid to identification and the use of randomized experiments.
In the remainder of this comment I will discuss four specific issues. Some of these elaborate on points I raised in a previous discussion of D2010, Imbens [2010].

Perlis R., Mehta R., Edwards A., Tiwari A., Imbens G., **Pharmacogenetic Testing Among Patients With Mood and Anxiety Disorders Is Associated With Decreased Utilization and Cost: A Propensity-Score Matched Study**. *Depression and Anxiety*, 2018; 35: 10. Published Paper

Naturalistic and small randomized trials have suggested that pharmacogenetic testing may improve treatment outcomes in depression, but its cost-effectiveness is not known. There is growing enthusiasm for personalized medicine, relying on genetic variation as a contributor to heterogeneity of treatment effects. We sought to examine the relationship between a commercial pharmacogenetic test for psychotropic medications and 6-month cost of care and utilization in a large commercial health plan.

Zhang X., Faries D., Li H., Stamey J., **Addressing Unmeasured Confounding in Comparative Observational Research**. *Pharmacoepidemiology & Drug Safety*, 2018; 27(4): 373-382. Published Paper

Observational pharmacoepidemiological studies can provide valuable information on the effectiveness or safety of interventions in the real world, but one major challenge is the existence of unmeasured confounder(s). While many analytical methods have been developed for dealing with this challenge, they appear under-utilized, perhaps due to the complexity and varied requirements for implementation. Thus, there is an unmet need to improve understanding the appropriate course of action to address unmeasured confounding under a variety of research scenarios.

Athey S., Imbens G., Wager S., **Approximate Residual Balancing: Debiased Inference of Average Treatment Effects in High Dimensions**. *Journal of the Royal Statistical Society, Series B*, 2018; 80(4): 597-623. Published Paper

## 2017

Athey S., Imbens G., Pham T., Wager S., **Estimating Average Treatment Effects: Supplementary Analyses and Remaining Challenges**. *American Economic Review*, 2017; 107(5): 278-281. Published Paper | Working Paper

There is a large literature on semiparametric estimation of average treatment effects under unconfounded treatment assignment in settings with a fixed number of covariates. More recently attention has focused on settings with a large number of covariates. In this paper we extend lessons from the earlier literature to this new setting. We propose that in addition to reporting point estimates and standard errors, researchers report results from a number of supplementary analyses to assist in assessing the credibility of their estimates.

## 2016

Yang S., Imbens G., Cui Z., Faries D., Kadziola Z., **Propensity Score Matching and Subclassification in Observational Studies with Multi-Level Treatments**. *Biometrics*, 2016;72(4):1055-1065. Published Paper | Working Paper

In this article, we develop new methods for estimating average treatment effects in observational studies, in settings with more than two treatment levels, assuming unconfoundedness given pretreatment variables. We emphasize propensity score subclassification and matching methods which have been among the most popular methods in the binary treatment literature. Whereas the literature has suggested that these particular propensity-based methods do not naturally extend to the multi-level treatment case, we show, using the concept of weak unconfoundedness and the notion of the generalized propensity score, that adjusting for a scalar function of the pretreatment variables removes all biases associated with observed pretreatment variables. We apply the proposed methods to an analysis of the effect of treatments for fibromyalgia. We also carry out a simulation study to assess the finite sample performance of the methods relative to previously proposed methods.

Imbens G., Kolesár M. **Robust Standard Errors in Small Samples: Some Practical Advice**. *The Review of Economics and Statistics*, 2016;98(4):701-712. Published Paper | Working Paper

We study the properties of heteroskedasticity-robust confidence intervals for regression parameters. We show that confidence intervals based on a degrees-of-freedom correction suggested by Bell and McCaffrey (2002) are a natural extension of a principled approach to the Behrens-Fisher problem. We suggest a further improvement for the case with clustering. We show that these standard errors can lead to substantial improvements in coverage rates even for samples with fifty or more clusters. We recommend that researchers routinely calculate the Bell-McCaffrey degrees-of-freedom adjustment to assess potential problems with conventional robust standard errors.
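For context, the Bell-McCaffrey proposal builds on the HC2 heteroskedasticity-robust variance. The sketch below computes only the HC2 variance for OLS; the degrees-of-freedom adjustment that the paper recommends is omitted, and the function and variable names are illustrative, not the authors' code.

```python
import numpy as np

def hc2_se(X, y):
    """OLS coefficients with HC2 heteroskedasticity-robust standard errors.

    HC2 rescales each squared residual by 1/(1 - h_i), where h_i is the
    unit's leverage, before forming the sandwich variance."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta                        # residuals
    h = np.sum((X @ XtX_inv) * X, axis=1)   # leverage values h_i
    u2 = e**2 / (1.0 - h)                   # HC2-rescaled squared residuals
    V = XtX_inv @ (X.T * u2) @ X @ XtX_inv  # sandwich variance estimate
    return beta, np.sqrt(np.diag(V))
```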

Abadie, A, Imbens G. **Matching on the Estimated Propensity Score**. *Econometrica*, 2016;84(2):781-807. Published Paper | Working Paper

Propensity score matching estimators (Rosenbaum and Rubin (1983)) are widely used in evaluation research to estimate average treatment effects. In this article, we derive the large sample distribution of propensity score matching estimators. Our derivations take into account that the propensity score is itself estimated in a first step, prior to matching. We prove that first step estimation of the propensity score affects the large sample distribution of propensity score matching estimators, and derive adjustments to the large sample variances of propensity score matching estimators of the average treatment effect (ATE) and the average treatment effect on the treated (ATET). The adjustment for the ATE estimator is negative (or zero in some special cases), implying that matching on the estimated propensity score is more efficient than matching on the true propensity score in large samples. However, for the ATET estimator, the sign of the adjustment term depends on the data generating process, and ignoring the estimation error in the propensity score may lead to confidence intervals that are either too large or too small.
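A stylized sketch of the estimator being analyzed, taking the estimated propensity score as a given input; the paper's point is precisely that the first-step estimation of this score alters the estimator's large-sample variance. The single-match choice and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def matching_ate(score, w, y):
    """One-nearest-neighbor matching estimator of the ATE.

    For each unit, impute the missing potential outcome with the outcome
    of the closest unit (by `score`, e.g. an estimated propensity score)
    in the opposite treatment group, then average the differences."""
    score, w, y = map(np.asarray, (score, w, y))
    tau = np.empty(len(y))
    for i in range(len(y)):
        opp = np.flatnonzero(w != w[i])               # opposite treatment group
        j = opp[np.argmin(np.abs(score[opp] - score[i]))]  # closest match
        y1 = y[i] if w[i] == 1 else y[j]
        y0 = y[j] if w[i] == 1 else y[i]
        tau[i] = y1 - y0
    return tau.mean()
```

Naive standard errors for this estimator that treat `score` as known would miss the variance adjustment derived in the paper.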

## 2015

Kolesar M, Chetty R, Friedman J, Glaeser E, Imbens G. **Identification and Inference With Many Invalid Instruments**. *Journal of Business & Economic Statistics*, 2015;33(4):474-484. Published Paper | Working Paper

We study estimation and inference in settings where the interest is in the effect of a potentially endogenous regressor on some outcome. To address the endogeneity, we exploit the presence of additional variables. Like conventional instrumental variables, these variables are correlated with the endogenous regressor. However, unlike conventional instrumental variables, they also have direct effects on the outcome, and thus are “invalid” instruments. Our novel identifying assumption is that the direct effects of these invalid instruments are uncorrelated with the effects of the instruments on the endogenous regressor. We show that in this case the limited-information-maximum-likelihood (liml) estimator is no longer consistent, but that a modification of the bias-corrected two-stage-least-square (tsls) estimator is consistent. We also show that conventional tests for over-identifying restrictions, adapted to the many instruments setting, can be used to test for the presence of these direct effects. We recommend that empirical researchers carry out such tests and compare estimates based on liml and the modified version of bias-corrected tsls. We illustrate in the context of two applications that such practice can be illuminating, and that our novel identifying assumption has substantive empirical content.

Imbens G. **Matching Methods in Practice**. *Journal of Human Resources*, 2015;50(2):373-419. Published Paper | Working Paper

There is a large theoretical literature on methods for estimating causal effects under unconfoundedness, exogeneity, or selection-on-observables type assumptions using matching or propensity score methods. Much of this literature is highly technical and has not made inroads into empirical practice where many researchers continue to use simple methods such as ordinary least squares regression even in settings where those methods do not have attractive properties. In this paper, I discuss some of the lessons for practice from the theoretical literature and provide detailed recommendations on what to do. I illustrate the recommendations with three detailed applications.

## 2014

Abadie, A, Imbens G, Zheng, F. **Inference for Misspecified Models With Fixed Regressors**. *Journal of the American Statistical Association*, 2014;109(508):1601-1614. Published Paper | Working Paper

Following the work by Eicker, Huber, and White it is common in empirical work to report standard errors that are robust against general misspecification. In a regression setting, these standard errors are valid for the parameter that minimizes the squared difference between the conditional expectation and a linear approximation, averaged over the population distribution of the covariates. Here, we discuss an alternative parameter that corresponds to the approximation to the conditional expectation based on minimization of the squared difference averaged over the sample, rather than the population, distribution of the covariates. We argue that in some cases this may be a more interesting parameter. We derive the asymptotic variance for this parameter, which is generally smaller than the Eicker–Huber–White robust variance, and propose a consistent estimator for this asymptotic variance. Supplementary materials for this article are available online.

Graham, B, Imbens, G, Ridder, G. **Complementarity and aggregate implications of assortative matching: A nonparametric analysis**. *Quantitative Economics*, 2014;5(1):29-66. Published Paper | Working Paper

This paper presents econometric methods for measuring the average output effect of reallocating an indivisible input across production units. A distinctive feature of reallocations is that, by definition, they involve no augmentation of resources and, as such, leave the marginal distribution of the reallocated input unchanged. Nevertheless, if the production technology is nonseparable, they may alter average output. An example is the reallocation of teachers across classrooms composed of students of varying mean ability. We focus on the effects of reallocating one input, while holding the assignment of another, potentially complementary, input fixed. We introduce a class of such reallocations, correlated matching rules, that includes the status quo allocation, a random allocation, and both the perfect positive and negative assortative matching allocations as special cases. We also characterize the effects of small changes in the status quo allocation. Our analysis leaves the production technology nonparametric. Identification therefore requires conditional exogeneity of the input to be reallocated given the potentially complementary (and possibly other) input(s). We relate this exogeneity assumption to the pairwise stability concept used in the game theoretic literature on two-sided matching models with transfers. For estimation, we use a two-step approach. In the first step, we nonparametrically estimate the production function. In the second step, we average the estimated production function over the distribution of inputs induced by the new assignment rule. Our methods build upon the partial mean literature, but require extensions involving boundary issues and the fact that the weight function used in averaging is itself estimated. We derive the large-sample properties of our proposed estimators and assess their small-sample properties via a limited set of Monte Carlo experiments.
Our characterization of the large-sample properties of estimated correlated matching rules uses a new result on kernel estimated “double averages,” which may be of independent interest.

## 2013

Goldsmith-Pinkham, P, Imbens G. **Social Networks and the Identification of Peer Effects**. *Journal of Business & Economic Statistics*, 2013;31(3):253-264. Published Paper | Working Paper

There is a large and growing literature on peer effects in economics. In the current article, we focus on a Manski-type linear-in-means model that has proved to be popular in empirical work. We critically examine some aspects of the statistical model that may be restrictive in empirical analyses. Specifically, we focus on three aspects. First, we examine the endogeneity of the network or peer groups. Second, we investigate simultaneously alternative definitions of links and the possibility of peer effects arising through multiple networks. Third, we highlight the representation of the traditional linear-in-means model as an autoregressive model, and contrast it with an alternative moving-average model, where the correlation between unconnected individuals who are indirectly connected is limited. Using data on friendship networks from the Add Health dataset, we illustrate the empirical relevance of these ideas.

## 2012

Abadie A., Imbens G. **A Martingale Representation for Matching Estimators**. *Journal of the American Statistical Association,* 2012;107(498):833-843. Published Paper | Working Paper

Matching estimators (Rubin, 1973a, 1977; Rosenbaum, 2002) are widely used in statistical data analysis. However, the large sample distribution of matching estimators has been derived only for particular cases (Abadie and Imbens, 2006). This article establishes a martingale representation for matching estimators. This representation allows the use of martingale limit theorems to derive the large sample distribution of matching estimators. As an illustration of the applicability of the theory, we derive the asymptotic distribution of a matching estimator when matching is carried out without replacement, a result previously unavailable in the literature. In addition, we apply the techniques proposed in this article to derive a correction to the standard error of a sample mean when missing data are imputed using the “hot deck”, a matching imputation method widely used in the Current Population Survey (CPS) and other large surveys in the social sciences. We demonstrate the empirical relevance of our methods using two Monte Carlo designs based on actual data sets. In these realistic Monte Carlo exercises the large sample distribution of matching estimators derived in this article provides an accurate approximation to the small sample behavior of these estimators. In addition, our simulations show that standard errors that do not take into account hot deck imputation of missing data may be severely downward biased, while standard errors that incorporate the correction proposed in this article for hot deck imputation perform extremely well. This result demonstrates the practical relevance of the standard error correction for the hot deck proposed in this article.
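A minimal sketch of hot-deck imputation as discussed above: each missing outcome is filled in with the outcome of the nearest observed unit ("donor") on a covariate. The names and the nearest-neighbor donor rule are illustrative assumptions; the paper's contribution is the standard error correction for means computed from such imputed data.

```python
import numpy as np

def hot_deck_impute(x, y, missing):
    """Fill each missing y with the y of the closest observed unit on x."""
    x = np.asarray(x, float)
    y = np.asarray(y, float).copy()
    missing = np.asarray(missing, bool)
    obs = np.flatnonzero(~missing)
    for i in np.flatnonzero(missing):
        donor = obs[np.argmin(np.abs(x[obs] - x[i]))]  # closest donor on x
        y[i] = y[donor]
    return y
```

Treating the filled-in values as real observations when computing the standard error of the mean understates the variance, which is the bias the paper's correction addresses.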

Imbens G, Barrios T, Diamond R, Kolesar M. **Clustering, Spatial Correlations and Randomization Inference**. *Journal of the American Statistical Association*, 2012;107(498):578-591. Published Paper | Working Paper

It is standard practice in empirical work to allow for clustering in the error covariance matrix if the explanatory variables of interest vary at a more aggregate level than the units of observation. Often, however, the structure of the error covariance matrix is more complex, with correlations varying in magnitude within clusters, and not vanishing between clusters. Here we explore the implications of such correlations for the actual and estimated precision of least squares estimators. We show that with equal-sized clusters, if the covariate of interest is randomly assigned at the cluster level, only accounting for non-zero covariances at the cluster level, and ignoring correlations between clusters, leads to valid standard errors and confidence intervals. However, in many cases this may not suffice. For example, state policies exhibit substantial spatial correlations. As a result, ignoring spatial correlations in outcomes beyond that accounted for by the clustering at the state level may well bias standard errors. We illustrate our findings using the 5% public use census data. Based on these results, we recommend researchers assess the extent of spatial correlations in explanatory variables beyond state level clustering, and if such correlations are present, take into account spatial correlations beyond the clustering correlations typically accounted for.

Imbens G, Kalyanaraman K. **Optimal Bandwidth Choice for the Regression Discontinuity Estimator**. *Review of Economic Studies*, 2012;79(3):933-959. Published Paper | Working Paper

We investigate the problem of optimal choice of the smoothing parameter (bandwidth) for the regression discontinuity estimator. We focus on estimation by local linear regression, which was shown to be rate optimal (Porter, 2003). Investigation of an expected-squared-error loss criterion reveals the need for regularization. We propose an optimal, data-dependent bandwidth choice rule. We illustrate the proposed bandwidth choice using data previously analyzed by Lee (2008), as well as in a simulation study based on this data set. The simulations suggest that the proposed rule performs well.

## 2011

Imbens G, Abadie A. **Bias-Corrected Matching Estimators for Average Treatment Effects**. *Journal of Business and Economic Statistics*, 2011;29(1):1-11. Published Paper | Working Paper

Matching estimators for average treatment effects are widely used in evaluation research despite the fact that their large sample properties have not been established in many cases. In this article, we develop a new framework to analyze the properties of matching estimators and establish a number of new results. First, we show that matching estimators include a conditional bias term which may not vanish fast enough for the estimators to be root-N-consistent. Second, we show that even after removing the conditional bias, matching estimators with a fixed number of matches are not efficient, although the efficiency loss may be small. Third, we propose a bias-correction that removes the conditional bias asymptotically, making matching estimators root-N-consistent. Fourth, we provide a new estimator for the conditional variance that does not require consistent nonparametric estimation of unknown functions. We carry out a small simulation study in which a simple implementation of the bias-corrected matching estimator performs well compared to both simple matching estimators and to regression estimators in terms of bias and root-mean-squared-error. Software for implementing the proposed estimators in Stata and Matlab is available from the authors on the web.

## 2010

Imbens G. **Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009)**. *Journal of Economic Literature*, 2010;48(2):399-423. Published Paper | Working Paper

Two recent papers, Deaton (2009) and Heckman and Urzua (2009), argue against what they see as an excessive and inappropriate use of experimental and quasi-experimental methods in empirical work in economics in the last decade. They specifically question the increased use of instrumental variables and natural experiments in labor economics, and of randomized experiments in development economics. In these comments I will make the case that this move towards shoring up the internal validity of estimates, and towards clarifying the description of the population these estimates are relevant for, has been important and beneficial in increasing the credibility of empirical work in economics. I also address some other concerns raised by the Deaton and Heckman-Urzua papers.

Imbens G. **An Economist’s Perspective on Shadish (2010) and West and Thoemmes (2010)**. *Psychological Methods*, 2010;15(1):47-55. Published Paper

In Shadish (2010) and West and Thoemmes (2010), the authors contrasted 2 approaches to causality. The first originated in the psychology literature and is associated with work by Campbell (e.g., Shadish, Cook, & Campbell, 2002), and the second has its roots in the statistics literature and is associated with work by Rubin (e.g., Rubin, 2006). In this article, I discuss some of the issues raised by Shadish and by West and Thoemmes. I focus mostly on the impact the 2 approaches have had on research in a 3rd field, economics. In economics, the ideas of both Campbell and Rubin have been very influential, with some of the methods they developed now routinely taught in graduate programs and routinely used in empirical work and other methods receiving much less attention. At the same time, economists have added to the understanding of these methods and through these extensions have further improved researchers’ ability to draw causal inferences in observational studies.

## 2009

Imbens G, Donald S, Newey W. **Choosing Instrumental Variables in Conditional Moment Restriction Models**. *Journal of Econometrics*. 2009;152(1):28-36. Published Paper

Properties of GMM estimators are sensitive to the choice of instrument. Using many instruments leads to high asymptotic efficiency but can cause high bias and/or variance in small samples. In this paper we develop and implement asymptotic mean square error (MSE) based criteria for instrument selection in estimation of conditional moment restriction models. The models we consider include various nonlinear simultaneous equations models with unknown heteroskedasticity. We develop moment selection criteria for the familiar two-step optimal GMM estimator (GMM), a bias corrected version, and generalized empirical likelihood estimators (GEL), that include the continuous updating estimator (CUE) as a special case. We also find that the CUE has lower higher-order variance than the bias-corrected GMM estimator, and that the higher-order efficiency of other GEL estimators depends on conditional kurtosis of the moments.

Imbens G, Crump R, Hotz JV, Mitnik O. **Dealing with Limited Overlap in Estimation of Average Treatment Effects**. *Biometrika*, 2009;96(1):187-199. Published Paper

Estimation of average treatment effects under unconfounded or ignorable treatment assignment is often hampered by lack of overlap in the covariate distributions between treatment groups. This lack of overlap can lead to imprecise estimates, and can make commonly used estimators sensitive to the choice of specification. In such cases researchers have often used *ad hoc* methods for trimming the sample. We develop a systematic approach to addressing lack of overlap. We characterize optimal subsamples for which the average treatment effect can be estimated most precisely. Under some conditions, the optimal selection rules depend solely on the propensity score. For a wide range of distributions, a good approximation to the optimal rule is provided by the simple rule of thumb to discard all units with estimated propensity scores outside the range [0.1,0.9].
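The rule of thumb at the end of the abstract is simple enough to sketch directly; the function name and interface here are illustrative, not the authors' code.

```python
import numpy as np

def trim_by_overlap(pscore, lo=0.1, hi=0.9):
    """Return a boolean mask keeping units with estimated propensity
    score inside [lo, hi], the paper's rule-of-thumb approximation to
    the optimal trimming rule."""
    pscore = np.asarray(pscore, float)
    return (pscore >= lo) & (pscore <= hi)
```

The mask would then be used to subset the data before applying any of the usual estimators of the average treatment effect.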

Imbens G, Newey W. **Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity**. *Econometrica*, 2009;77(5):1481-1512. Published Paper | Working Paper

This paper uses control variables to identify and estimate models with nonseparable, multidimensional disturbances. Triangular simultaneous equations models are considered, with instruments and disturbances independent and a reduced form that is strictly monotonic in a scalar disturbance. Here it is shown that the conditional cumulative distribution function of the endogenous variable given the instruments is a control variable. Also, for any control variable, identification results are given for quantile, average, and policy effects. Bounds are given when a common support assumption is not satisfied. Estimators of identified objects and bounds are provided, and an empirical demand analysis example is given.

Imbens G, Wooldridge J. **Recent Developments in the Econometrics of Program Evaluation**. *Journal of Economic Literature*, 2009;47(1):5-86. Published Paper | Working Paper

Many empirical questions in economics and other social sciences depend on causal effects of programs or policies. In the last two decades much research has been done on the econometric and statistical analysis of the effects of such programs or treatments. This recent theoretical literature has built on, and combined features of, earlier work in both the statistics and econometrics literatures. It has by now reached a level of maturity that makes it an important tool in many areas of empirical research in economics, including labor economics, public finance, development economics, industrial organization and other areas of empirical micro-economics. In this review we discuss some of the recent developments. We focus primarily on practical issues for empirical researchers, as well as provide a historical overview of the area and give references to more technical research.

## 2008

Imbens G, Abadie A. **On the Failure of the Bootstrap for Matching Estimators**. *Econometrica*, 2008;76(6):1537-1557. Published Paper | Working Paper

Matching estimators are widely used for the evaluation of programs or treatments. Often researchers use bootstrapping methods for inference. However, no formal justification for the use of the bootstrap has been provided. Here we show that the bootstrap is in general not valid, even in the simple case with a single continuous covariate when the estimator is root-N consistent and asymptotically normally distributed with zero asymptotic bias. Due to the extreme non-smoothness of nearest neighbor matching, the standard conditions for the bootstrap are not satisfied, leading the bootstrap variance to diverge from the actual variance. Simulations confirm the difference between actual and nominal coverage rates for bootstrap confidence intervals predicted by the theoretical calculations. To our knowledge, this is the first example of a root-N consistent and asymptotically normal estimator for which the bootstrap fails to work.

Imbens G, Crump R, Hotz JV, Mitnik O. **Nonparametric Tests for Treatment Effect Heterogeneity**. *Review of Economics and Statistics*, 2008;90(3):389-405. Published Paper | Working Paper

In this paper we develop two nonparametric tests of treatment effect heterogeneity. The first test is for the null hypothesis that the treatment has a zero average effect for all subpopulations defined by covariates. The second test is for the null hypothesis that the average effect conditional on the covariates is identical for all subpopulations, that is, that there is no heterogeneity in average treatment effects by covariates. We derive tests that are straightforward to implement and illustrate the use of these tests on data from two sets of experimental evaluations of the effects of welfare-to-work programs.

Imbens G, Lemieux T. **Regression Discontinuity Designs: A Guide to Practice**. *Journal of Econometrics*, 2008;142(2):615-635. Published Paper | Working Paper

In Regression Discontinuity (RD) designs for evaluating causal effects of interventions, assignment to a treatment is determined at least partly by the value of an observed covariate lying on either side of a fixed threshold. These designs were first introduced in the evaluation literature by Thistlethwaite and Campbell (1960). With the exception of a few unpublished theoretical papers, these methods did not attract much attention in the economics literature until recently. Starting in the late 1990s, there has been a large number of studies in economics applying and extending RD methods. In this paper we review some of the practical and theoretical issues involved in the implementation of RD methods.

Imbens G, Lemieux T. **Special Issue Editors’ Introduction: The Regression Discontinuity Design – Theory and Applications**. *Journal of Econometrics* [Internet], 2008;142(2):611-614. Published Paper | Working Paper

Regression discontinuity (RD) designs for evaluating causal effects of interventions where assignment to a treatment is determined at least partly by the value of an observed covariate lying on either side of a cutoff point were first introduced by Thistlethwaite and Campbell (1960). With the exception of a few unpublished theoretical papers (Goldberger, 1972a and Goldberger, 1972b), these methods did not attract much attention in the economics literature until recently. Starting in the late 1990s, there has been a growing number of studies in economics applying and extending RD methods, starting with Van der Klaauw (2002), Lee (2007), Angrist and Lavy (1999), and Black (1999). Around the same time, key theoretical and conceptual contributions, including the interpretation of estimates for fuzzy RD designs and allowing for general heterogeneity of treatment effects, were developed by Hahn et al. (2001). The time appeared right for a conference focusing on these methods, and most papers in this volume are the result of such a conference, organized by David Card and Thomas Lemieux at the Banff International Research Station (BIRS) in Banff, Canada, in the Spring of 2003. We first set the stage for the volume by reviewing some of the practical issues in implementation of RD methods. There is relatively little that is novel in this discussion, but it addresses some of the practical issues in implementing RD designs, including the new theoretical developments. Given the very recent nature of this renaissance of RD methods in economics, such reviews have so far been largely absent from the economic literature (exceptions include Van der Klaauw, 2008 and the discussion in Angrist and Krueger, 1999).

Imbens G, Abadie A. **Estimation of the Conditional Variance in Paired Experiments**. *Annales d’Economie et de Statistique*, 2008;(91-92):175-187. Published Paper | Working Paper

In paired randomized experiments units are grouped in pairs, often based on covariate information, with random assignment within the pairs. Average treatment effects are then estimated by averaging the within-pair differences in outcomes. Typically the variance of the average treatment effect estimator is estimated using the sample variance of the within-pair differences. However, conditional on the covariates the variance of the average treatment effect estimator may be substantially smaller. Here we propose a simple way of estimating the conditional variance of the average treatment effect estimator by forming pairs-of-pairs with similar covariate values and estimating the variances within these pairs-of-pairs. Even though these within-pairs-of-pairs variance estimators are not consistent, their average is consistent for the conditional variance of the average treatment effect estimator and leads to asymptotically valid confidence intervals.
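The pairs-of-pairs construction can be sketched in a few lines. This is an illustrative reading of the proposal (assuming a scalar pair-level covariate, with adjacent pairs grouped after sorting), not the authors' code:

```python
# Sketch of the pairs-of-pairs variance idea: sort pairs by their
# covariate, group adjacent pairs, and use the squared difference of
# their within-pair outcome differences to estimate twice the
# conditional variance (assumption: scalar covariate for simplicity).

def paired_experiment_variance(covariates, diffs):
    """covariates[i]: pair-level covariate; diffs[i]: within-pair
    treated-minus-control outcome difference. Returns (ATE estimate,
    conditional variance estimate of that ATE estimator)."""
    n = len(diffs)
    ate_hat = sum(diffs) / n
    # Sort pairs by covariate; form pairs-of-pairs from adjacent pairs.
    order = sorted(range(n), key=lambda i: covariates[i])
    var_terms = []
    for a, b in zip(order[0::2], order[1::2]):
        # E[(D_a - D_b)^2] is roughly 2 * Var(D | X) when the two
        # pairs have similar covariate values.
        var_terms.append((diffs[a] - diffs[b]) ** 2 / 2.0)
    # The local estimates are individually inconsistent, but their
    # average is consistent; divide by n for the variance of the mean.
    cond_var = sum(var_terms) / len(var_terms) / n
    return ate_hat, cond_var
```

The key point of the paper survives even in this toy version: each pairs-of-pairs term is a noisy variance estimate, but averaging many of them yields a consistent estimate of the conditional variance.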

## 2007

Imbens G, Athey S. **Discrete Choice Models with Multiple Unobserved Choice Characteristics**. *International Economic Review*, 2007;48(4):1159-1192. Published Paper | Working Paper

Since the pioneering work by Daniel McFadden, utility-maximization-based multinomial response models have become important tools of empirical researchers. Various generalizations of these models have been developed to allow for unobserved heterogeneity in taste parameters and choice characteristics. Here we investigate how rich a specification of the unobserved components is needed to rationalize arbitrary choice patterns in settings with many individual decision makers, multiple markets, and large choice sets. We find that if one restricts the utility function to be monotone in the unobserved choice characteristics, then up to two unobserved choice characteristics may be needed to rationalize the choices.

Imbens G, Chernozhukov V, Newey W. **Instrumental Variable Estimation of Nonseparable Models**. *Journal of Econometrics*, 2007;139(1):4-14. Published Paper

There are many environments where knowledge of a structural relationship is required to answer questions of interest. Also, nonseparability of a structural disturbance is a key feature of many models. Here, we consider nonparametric identification and estimation of a model that is monotonic in a nonseparable scalar disturbance that is independent of the instruments. This model leads to conditional quantile restrictions. We give local identification conditions for the structural equations from those quantile restrictions. We find that a modified completeness condition is sufficient for local identification. We also consider estimation via a nonparametric minimum distance estimator. The estimator minimizes the sum of squares of predicted values from a nonparametric regression of the quantile residual on the instruments. We show consistency of this estimator.

Imbens G, Blundell R, Newey W, Persson T. **Nonadditive Models with Endogenous Regressors**. In: *Advances in Economics and Econometrics: Ninth World Congress of the Econometric Society*, Vol. III, 2007, Chapter 2. Published Paper | Working Paper

In the last fifteen years there has been much work on nonparametric identification of causal effects in settings with endogeneity. Earlier, researchers focused on linear systems with additive residuals. However, such systems are difficult to motivate by economic theory. In many cases it is precisely the nonlinearity of the system and the presence of unobserved heterogeneity in returns (and thus non-additivity in the residuals) that leads to the type of endogeneity problems that economists are concerned with. In the more recent literature researchers have attempted to characterize conditions for identification that do not rely on such functional form or homogeneity assumptions, instead relying on assumptions that are more tightly linked to economic theory. Such assumptions often include exclusion and monotonicity restrictions and (conditional) independence assumptions. In this paper I will discuss part of this literature. I will focus on a two-equation triangular (recursive) system of simultaneous equations with a single endogenous regressor and a single instrument, with the main interest in the outcome equation relating the outcome to the (endogenous) regressor of interest. The discussion will include settings with binary, continuous, and discrete regressors.

## 2006

Imbens G, Hotz JV, Klerman J. **Evaluating the Differential Effects of Alternative Welfare-to-Work Training Components: A Re-Analysis of the California GAIN Program**. *Journal of Labor Economics*, 2006;24(3):521-566. Published Paper | Working Paper

In this paper, we explore ways of combining experimental data and non-experimental methods to estimate the differential effects of components of training programs. We show how data from a multi-site experimental evaluation in which subjects are randomly assigned to any treatment versus a control group who receives no treatment can be combined with non-experimental regression-adjustment methods to estimate the differential effects of particular types of treatments. We also devise tests of the validity of using the latter methods. We use these methods and tests to re-analyze data from the MDRC Evaluation of California’s Greater Avenues to Independence (GAIN) program. While not designed to estimate the differential effects of the Labor Force Attachment (LFA) training and Human Capital Development (HCD) training components used in this program, we show how data from this experimental evaluation can be used in conjunction with non-experimental methods to estimate such effects. We present estimates of both the short- and long-term differential effects of these two training components on employment and earnings. We find that while there are short-term positive differential effects of LFA versus HCD, the latter training component is relatively more beneficial in the longer-term.

Imbens G, Athey S. **Identification and Inference in Nonlinear Difference-In-Differences Models**. *Econometrica*, 2006;74(2):431-497. Published Paper | Working Paper

This paper develops an alternative approach to the widely used Difference-in-Differences (DID) method for evaluating the effects of policy changes. In contrast to the standard approach, we introduce a nonlinear model that permits changes over time in the effect of unobservables (e.g., there may be a time trend in the level of wages as well as the returns to skill in the labor market). Further, our assumptions are independent of the scaling of the outcome. Our approach provides an estimate of the entire counterfactual distribution of outcomes that would have been experienced by the treatment group in the absence of the treatment, and likewise for the untreated group in the presence of the treatment. Thus, it enables the evaluation of policy interventions according to criteria such as a mean-variance tradeoff. We provide conditions under which the model is nonparametrically identified and propose an estimator. We consider extensions to allow for covariates and discrete dependent variables. We also analyze inference, showing that our estimator is root-N consistent and asymptotically normal. Finally, we consider an application.

Imbens G, Abadie A. **Large Sample Properties of Matching Estimators for Average Treatment Effects**. *Econometrica*, 2006;74(1):235-267. Published Paper | Working Paper

Matching estimators for average treatment effects are widely used in evaluation research despite the fact that their large sample properties have not been established in many cases. The absence of formal results in this area may be partly due to the fact that standard asymptotic expansions do not apply to matching estimators with a fixed number of matches because such estimators are highly nonsmooth functionals of the data. In this article we develop new methods for analyzing the large sample properties of matching estimators and establish a number of new results. We focus on matching with replacement with a fixed number of matches. First, we show that matching estimators are not N^1/2-consistent in general and describe conditions under which matching estimators do attain N^1/2-consistency. Second, we show that even in settings where matching estimators are N^1/2-consistent, simple matching estimators with a fixed number of matches do not attain the semiparametric efficiency bound. Third, we provide a consistent estimator for the large sample variance that does not require consistent nonparametric estimation of unknown functions. Software for implementing these methods is available in Matlab, Stata, and R.
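The estimator the paper analyzes — matching with replacement with a fixed number of matches M — can be sketched as follows. This is an illustrative version with a scalar covariate; the paper's bias adjustment and variance estimator are omitted:

```python
# Minimal sketch of a simple matching estimator for the average
# treatment effect: each unit's missing potential outcome is imputed
# by the average outcome of its M nearest neighbors (by covariate
# distance) in the opposite treatment group, matching with replacement.

def matching_ate(x, y, treated, m=1):
    """x: scalar covariates; y: outcomes; treated: 0/1 indicators."""
    n = len(y)
    effects = []
    for i in range(n):
        # Candidate matches: all units in the opposite treatment arm.
        pool = [j for j in range(n) if treated[j] != treated[i]]
        nearest = sorted(pool, key=lambda j: abs(x[j] - x[i]))[:m]
        imputed = sum(y[j] for j in nearest) / len(nearest)
        # Treated unit: effect = observed minus imputed control outcome;
        # control unit: effect = imputed treated outcome minus observed.
        effects.append(y[i] - imputed if treated[i] else imputed - y[i])
    return sum(effects) / n
```

Because the set of nearest neighbors changes discontinuously with the data, this estimator is exactly the kind of "highly nonsmooth functional" for which, as the abstract above notes, standard asymptotic expansions (and, per the 2008 *Econometrica* paper, the bootstrap) break down.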

Imbens G, Lynch L. **Re-employment Probabilities Over the Business Cycle**. *Portuguese Economic Journal*, 2006;5(2):111-134. Published Paper | Working Paper

Using a Cox proportional hazard model that allows for a flexible time dependence in order to incorporate business cycle effects, we analyze the determinants of reemployment probabilities of young workers in the U.S. from 1978-1989. We find considerable changes in the chances of young workers finding jobs over the business cycle despite the fact that personal characteristics of those starting jobless spells do not vary much over time. Therefore, government programs that target specific demographic groups may change individuals’ positions within the queue of job seekers, but may only have a more limited impact on average re-employment probabilities. Living in an area with high local unemployment reduces re-employment chances as does being in a long spell of nonemployment. However, the damage associated with being in a long spell seems to be reduced somewhat if a worker is unemployed in an area with high overall unemployment.

## 2005

Imbens G, Porter J. **Bias-adjusted Nearest Neighbor Estimation for the Partial Linear Model**, 2005. Working Paper

In semiparametric models, interest is often in the finite-dimensional parameter, with the nonparametric component a nuisance function. In many examples, including Robinson’s partial linear model and the estimation of average treatment effects, the nuisance function is a conditional expectation. For the large sample properties of the estimators of the parameters of interest it is typically important that the estimators for these nuisance functions satisfy certain bias and variance properties. Estimators that have been used in these settings include series estimators and higher-order kernel methods. In both cases the smoothing parameters have to be chosen in a sample-size-dependent manner. On the other hand, nearest neighbor methods with a fixed number of neighbors do not rely on sample-size-dependent smoothing parameters, but they often violate the conditions on the rate of the bias unless the covariates in the regression are of very low dimension. In many cases only scalar covariates are allowed. In this paper we develop an alternative method for estimating the unknown regression functions that, like nearest neighbor methods, does not rely on sample-size-dependent smoothing parameters, but that, like the series and higher-order kernel methods, does not suffer from bias-rate problems. We do so by combining nearest neighbor methods with local polynomial regression using a fixed number of neighbors.
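The combination described — a fixed number of nearest neighbors with a local polynomial fit through them — can be sketched for the local linear case. This is a hypothetical implementation of the idea, not the authors' code:

```python
# Sketch: estimate m(x0) by fitting a local linear regression (OLS of
# y on an intercept and x - x0) through the k nearest neighbors of x0,
# rather than simply averaging them. The intercept of the fit is the
# estimate of m(x0); the linear term removes the leading bias.

def knn_local_linear(xs, ys, x0, k=3):
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    u = [xs[i] - x0 for i in nearest]  # centered regressor
    v = [ys[i] for i in nearest]
    n = len(u)
    su, suu = sum(u), sum(ui * ui for ui in u)
    sv, suv = sum(v), sum(ui * vi for ui, vi in zip(u, v))
    det = n * suu - su * su
    if det == 0:  # degenerate neighbors: fall back to the plain k-NN mean
        return sv / n
    # Closed-form OLS intercept of the two-parameter fit.
    return (suu * sv - su * suv) / det
```

On exactly linear data the local linear fit recovers the regression function without bias, whereas the plain k-NN average would be biased whenever x0 is not centered among its neighbors.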

Imbens G, Graham B, Ridder G. **Measuring the Average Outcome and Inequality Effects of Segregation in the Presence of Social Spillovers**, 2005. Working Paper

In this paper we provide a nonparametric treatment of identification in models with social spillovers. We consider a setting with ‘high’ and ‘low’ type individuals. Individual outcomes depend upon the fraction of high types in one’s group. We refer to this dependence as a social spillover or peer group effect. We define estimands measuring local and global spillover strength as well as the outcome and inequality effects of increasing segregation (by type) across groups. We relate our estimands to the theory of sorting in the presence of social externalities.

Imbens G, Spady R. **The Performance of Empirical Likelihood and its Generalizations**. *Identification and Inference for Econometric Models, Essays in Honor of Thomas Rothenberg*, Cambridge University Press, 2005. Published Chapter

We calculate higher-order asymptotic biases and mean-squared errors (MSE) for a simple model with a sequence of moment conditions. In this setup, generalized empirical likelihood (GEL) and infeasible optimal GMM (OGMM) have the same higher-order biases, with GEL apparently having an MSE that exceeds OGMM’s by an additional term of order (M – 1)/N, i.e. the degree of overidentification divided by sample size. In contrast, any two-step GMM estimator has an additional bias relative to OGMM of order (M – 1)/N and an additional MSE of order (M – 1)^2/N. Consequently, GEL must be expected to dominate two-step GMM. In our simple model all GELs have equivalent next higher order behavior because generalized third moments of moment conditions are assumed to be zero; we explore in further analysis and simulations the implications of dropping this assumption.

Imbens G, Hotz J, Mortimer J. **Predicting the Efficacy of Future Training Programs Using Past Experiences at Other Locations.** *Journal of Econometrics*, 2005;125(1-2):241-270. Published Paper | Working Paper

We investigate the problem of predicting the average effect of a new training program using experiences with previous implementations. There are two principal complications in doing so. First, the population in which the new program will be implemented may differ from the population in which the old program was implemented. Second, the two programs may differ in the mix of their components. With sufficient detail on characteristics of the two populations and sufficient overlap in their distributions, one may be able to adjust for differences due to the first complication. Dealing with the second difficulty requires data on the exact treatments the individuals received. However, even in the presence of differences in the mix of components across training programs, comparisons of controls in both populations who were excluded from participating in any of the programs should not be affected. To investigate the empirical importance of these issues, we compare four job training programs implemented in the mid-eighties in different parts of the U.S. We find that adjusting for pre-training earnings and individual characteristics removes most of the differences between control units, but that even after such adjustments, post-training earnings for trainees are not comparable. We surmise that differences in treatment components across training programs are the likely cause, and that more details on the specific services provided by these programs are necessary to predict the effect of future programs. We also conclude that, given effect heterogeneity, it is essential, even in experimental evaluations, that training programs record pre-training earnings and individual characteristics in order to render the extrapolation of the results to different locations more credible.

Imbens G, Rosenbaum P. **Randomization Inference with an Instrumental Variable**. *Journal of the Royal Statistical Society*, Series A. 2005;168(1):109-126. Published Paper

An instrument or instrumental variable manipulates a treatment and affects the outcome only indirectly through its manipulation of the treatment. For instance, encouragement to exercise might increase cardiovascular fitness, but only indirectly to the extent that it increases exercise. If instrument levels are randomly assigned to individuals, then the instrument may permit consistent estimation of the effects caused by the treatment, even though the treatment assignment itself is far from random. For instance, one can conduct a randomized experiment assigning some subjects to ‘encouragement to exercise’ and others to ‘no encouragement’ but, for reasons of habit or taste, some subjects will not exercise when encouraged and others will exercise without encouragement; nonetheless, such an instrument aids in estimating the effect of exercise. Instruments that are weak, i.e. instruments that have only a slight effect on the treatment, present inferential problems. We evaluate a recent proposal for permutation inference with an instrumental variable in four ways: using Angrist and Krueger’s data on the effects of education on earnings with quarter of birth as an instrument; following Bound, Jaeger and Baker in using simulated independent observations in place of the instrument in Angrist and Krueger’s data; using entirely simulated data in which correct answers are known; and finally using statistical theory to show that only permutation inferences maintain correct coverage rates. The permutation inferences perform well in both easy and hard cases, with weak instruments, as well as with long-tailed responses.
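The permutation logic can be sketched as follows. This is an illustrative version using a simple covariance-type statistic (an assumption for the sketch, not necessarily the statistic evaluated in the paper): under the null that the treatment effect equals beta0, the adjusted response y − beta0·d is unaffected by the randomly assigned instrument, so re-randomizing the instrument generates the null reference distribution.

```python
import random

# Permutation test of H0: treatment effect = beta0, with a randomly
# assigned instrument z, treatment d, and outcome y. Under H0 the
# adjusted response y - beta0*d is independent of z, so the observed
# association can be compared with its permutation distribution.

def iv_permutation_pvalue(z, d, y, beta0, draws=2000, seed=0):
    rng = random.Random(seed)
    adj = [yi - beta0 * di for yi, di in zip(y, d)]

    def stat(instr):
        # Absolute covariance-type association between the instrument
        # and the adjusted responses.
        zbar = sum(instr) / len(instr)
        return abs(sum((zi - zbar) * ai for zi, ai in zip(instr, adj)))

    observed = stat(z)
    z_perm, exceed = list(z), 0
    for _ in range(draws):
        rng.shuffle(z_perm)  # re-randomize the instrument under H0
        if stat(z_perm) >= observed:
            exceed += 1
    return (exceed + 1) / (draws + 1)  # add-one Monte Carlo p-value
```

Inverting such tests over a grid of beta0 values yields the confidence intervals whose coverage the paper studies; because validity rests only on the random assignment of the instrument, the inference remains exact even when the instrument is weak.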

## 2004

Imbens G, Mealli F, Ferro M, Biggeri M. **Analyzing a Randomized Trial on Breast Self-examination with Noncompliance and Missing Outcomes**. *Biostatistics*, 2004;5(2):207-222. Published Paper

Recently, instrumental variables methods have been used to address non-compliance in randomized experiments. Complicating such analyses is often the presence of missing data. The standard model for missing data, missing at random (MAR), has some unattractive features in this context. In this paper we compare MAR-based estimates of the complier average causal effect (CACE) with an estimator based on an alternative, nonignorable model for the missing data process, developed by Frangakis and Rubin (1999, Biometrika, 86, 365-379). We also introduce a new missing data model that, like the Frangakis-Rubin model, is specially suited for models with instrumental variables, but makes different substantive assumptions. We analyze these issues in the context of a randomized trial of breast self-examination (BSE). In the study two methods of teaching BSE, consisting of either mailed information about BSE (the standard treatment) or the attendance of a course involving theoretical and practical sessions (the new treatment), were compared with the aim of assessing whether teaching programs could increase BSE practice and improve examination skills. The study was affected by the two sources of bias mentioned above: only 55% of women assigned to receive the new treatment complied with their assignment and 35% of the women did not respond to the post-test questionnaire. Comparing the causal estimand of the new treatment using the MAR, Frangakis-Rubin, and our new approach, the results suggest that for these data the MAR assumption appears least plausible, and that the new model appears most plausible among the three choices.

Imbens G, Manski C. **Confidence Intervals for Partially Identified Parameters**. *Econometrica*, 2004;72(6):1845-1857. Published Paper

In the last decade a growing body of research has studied inference on partially identified parameters (e.g., Manski, 1990, 2003). In many cases where the parameter of interest is real valued, the identification region is an interval whose lower and upper bounds may be estimated from sample data. Confidence intervals may be constructed to take account of the sampling variation in estimates of these bounds. Horowitz and Manski (1998, 2000) proposed and applied interval estimates that asymptotically cover the entire identification region with fixed probability. Here we introduce conceptually different interval estimates that asymptotically cover each element in the identification region with fixed probability (but not necessarily every element simultaneously). We show that these two types of interval estimate are different in practice, the latter in general being shorter. The difference in length (in excess of the length of the identification set itself) can be substantial, and in large samples is comparable to the difference between one-sided and two-sided confidence intervals. A complication arises from the fact that the simplest version of the proposed interval is discontinuous in the limit case of point identification, leading to coverage rates that are not uniform in important subsets of the parameter space. We develop a modification depending on the width of the identification region that restores uniformity. We show that under some conditions, using the estimated width of the identification region instead of the true width maintains uniformity.
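In the simplest case — estimated bounds with known standard errors — the width-dependent construction can be sketched as follows (an illustrative implementation of the interpolation idea, assuming positive standard errors):

```python
from math import erf, sqrt

# Sketch of a confidence interval for a partially identified parameter
# with estimated bounds [lo, hi]: the critical value C solves
#   Phi(C + delta / max(se)) - Phi(-C) = 1 - alpha,
# where delta = hi - lo, so C moves smoothly between the one-sided
# value (wide identified set) and the two-sided value (delta = 0).

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def partial_id_ci(lo, hi, se_lo, se_hi, alpha=0.05):
    delta = max(hi - lo, 0.0)
    ratio = delta / max(se_lo, se_hi)

    def gap(c):
        return norm_cdf(c + ratio) - norm_cdf(-c) - (1.0 - alpha)

    a, b = 0.0, 10.0  # gap is increasing in c, so bisection works
    for _ in range(100):
        mid = 0.5 * (a + b)
        if gap(mid) < 0.0:
            a = mid
        else:
            b = mid
    c = 0.5 * (a + b)
    return lo - c * se_lo, hi + c * se_hi
```

With a degenerate identified set (delta = 0) the critical value is the familiar two-sided 1.96; as the set widens relative to the standard errors it shrinks toward the one-sided 1.645, which is why these intervals are shorter than ones covering the whole identification region.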

Imbens G, Abadie A, Drukker D, Herr J. **Implementing Matching Estimators for Average Treatment Effects in Stata.** *The STATA Journal*, 2004;4(3):290-311. Published Paper | Working Paper

This paper presents an implementation of matching estimators for average treatment effects in Stata. The nnmatch command allows you to estimate the average effect for all units or only for the treated or control units; to choose the number of matches; to specify the distance metric; to select a bias adjustment; and to use heteroskedastic-robust variance estimators.

Imbens G. **Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Review.** *Review of Economics and Statistics*, 2004;86(1):4-29. Published Paper | Working Paper

Recently there has been a surge in econometric work focusing on estimating average treatment effects under various sets of assumptions. One strand of this literature has developed methods for estimating average treatment effects for a binary treatment under assumptions variously described as exogeneity, unconfoundedness, or selection on observables. The implication of these assumptions is that systematic (e.g., average or distributional) differences in outcomes between treated and control units with the same values for the covariates are attributable to the treatment. Recent analysis has considered estimation and inference for average treatment effects under weaker assumptions than typical of the earlier literature by avoiding distributional and functional form assumptions. Various methods of semiparametric estimation have been proposed, including estimating the unknown regression functions, matching, methods using the propensity score such as weighting and blocking, and combinations of these approaches. In this paper I review the state of this literature and discuss some of its unanswered questions, focusing in particular on the practical implementation of these methods, the plausibility of the exogeneity assumption in economic applications, the relative performance of the various semiparametric estimators when the key assumptions (unconfoundedness and overlap) are satisfied, alternative estimands such as quantile treatment effects, and alternate methods such as Bayesian inference.

Imbens G, Hirano K. **The Propensity Score with Continuous Treatments**. *Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives.* 2004. Published Chapter

Much of the work on propensity score analysis has focused on the case where the treatment is binary. In this chapter we examine an extension to the propensity score method, in a setting with a continuous treatment. Following Rosenbaum and Rubin (1983) and most of the other literature on propensity score analysis, we make an unconfoundedness or ignorability assumption, that adjusting for differences in a set of covariates removes all biases in comparisons by treatment status. Then, building on Imbens (2000) we define a generalization of the binary treatment propensity score, which we label the generalized propensity score (GPS). We demonstrate that the GPS has many of the attractive properties of the binary treatment propensity score. Just as in the binary treatment case, adjusting for this scalar function of the covariates removes all biases associated with differences in the covariates. The GPS also has certain balancing properties that can be used to assess the adequacy of particular specifications of the score. We discuss estimation and inference in a parametric version of this procedure, although more flexible approaches are also possible.
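The GPS is just the estimated conditional density of the treatment given the covariates, evaluated at each unit's own treatment level. A minimal sketch under an assumed normal linear model for the treatment, T | X ~ N(b0 + b1·X, s2), with a scalar covariate and a nondegenerate fit (the parametric form is an assumption of this sketch, not a requirement of the method):

```python
from math import exp, pi, sqrt

# Sketch of generalized propensity score estimation: fit the treatment
# model T = b0 + b1*X + eps by OLS, then evaluate the implied normal
# density at each unit's observed treatment level.

def gps_normal(x, t):
    n = len(x)
    xbar, tbar = sum(x) / n, sum(t) / n
    b1 = sum((xi - xbar) * (ti - tbar) for xi, ti in zip(x, t)) / sum(
        (xi - xbar) ** 2 for xi in x
    )
    b0 = tbar - b1 * xbar
    resid = [ti - b0 - b1 * xi for xi, ti in zip(x, t)]
    s2 = sum(r * r for r in resid) / n  # maximum-likelihood error variance
    # GPS_i = estimated density of T at t_i given x_i.
    return [exp(-r * r / (2 * s2)) / sqrt(2 * pi * s2) for r in resid]
```

In a second stage one would typically regress the outcome on the treatment level and the GPS and then average over the covariate distribution to trace out a dose-response function; the balancing property mentioned in the abstract can be checked by comparing covariates across treatment levels after adjusting for the estimated score.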

Imbens G, Chamberlain G. **Random Effects Estimators with Many Instrumental Variables.** *Econometrica*, 2004;72(1):295-306. Published Paper | Working Paper

In this paper we propose a new estimator for a model with one endogenous regressor and many instrumental variables. Our motivation comes from the recent literature on the poor properties of standard instrumental variables estimators when the instrumental variables are weakly correlated with the endogenous regressor. Our proposed estimator puts a random coefficients structure on the relation between the endogenous regressor and the instruments. The variance of the random coefficients is modelled as an unknown parameter. In addition to proposing a new estimator, our analysis yields new insights into the properties of the standard two-stage least squares (TSLS) and limited-information maximum likelihood (LIML) estimators in the case with many weak instruments. We show that in some interesting cases, TSLS and LIML can be approximated by maximizing the random effects likelihood subject to particular constraints. We show that statistics based on comparisons of the unconstrained estimates of these parameters to the implicit TSLS and LIML restrictions can be used to identify settings when standard large sample approximations to the distributions of TSLS and LIML are likely to perform poorly. We also show that with many weak instruments, LIML confidence intervals are likely to have under-coverage, even though its finite sample distribution is approximately centered at the true value of the parameter. In an application with real data and simulations around this data set, the proposed estimator performs markedly better than TSLS and LIML, both in terms of coverage rate and in terms of risk.

## 2003

Imbens G, Hirano K, Ridder G. **Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score.** *Econometrica*, 2003;71(4):1161-1189. Published Paper | Working Paper

We are interested in estimating the average effect of a binary treatment on a scalar outcome. If assignment to the treatment is exogenous or unconfounded, that is, independent of the potential outcomes given covariates, biases associated with simple treatment-control average comparisons can be removed by adjusting for differences in the covariates. Rosenbaum and Rubin (1983) show that adjusting solely for differences between treated and control units in the propensity score removes all biases associated with differences in covariates. Although adjusting for differences in the propensity score removes all the bias, this can come at the expense of efficiency, as shown by Hahn (1998), Heckman, Ichimura, and Todd (1998), and Robins, Mark, and Newey (1992). We show that weighting by the inverse of a nonparametric estimate of the propensity score, rather than the true propensity score, leads to an efficient estimate of the average treatment effect. We provide intuition for this result by showing that this estimator can be interpreted as an empirical likelihood estimator that efficiently incorporates the information about the propensity score.
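The estimator can be illustrated in the simplest nonparametric setting: discrete covariate cells, where the nonparametric estimate of the propensity score is the within-cell treated fraction. This is an illustrative sketch assuming overlap (every cell contains both treated and control units):

```python
from collections import defaultdict

# Sketch of inverse-probability weighting with a nonparametrically
# *estimated* propensity score: e_hat(x) is the treated fraction in
# each covariate cell, and units are weighted by 1/e_hat or 1/(1-e_hat).

def ipw_ate(x, treated, y):
    cells = defaultdict(lambda: [0, 0])  # cell -> [num treated, num total]
    for xi, ti in zip(x, treated):
        cells[xi][0] += ti
        cells[xi][1] += 1
    # Nonparametric propensity score estimate per cell (overlap assumed,
    # so 0 < e_hat < 1 in every cell).
    e_hat = {xi: nt / nn for xi, (nt, nn) in cells.items()}
    n = len(y)
    return sum(
        ti * yi / e_hat[xi] - (1 - ti) * yi / (1 - e_hat[xi])
        for xi, ti, yi in zip(x, treated, y)
    ) / n
```

In this discrete-cell case the estimator reduces to the covariate-weighted average of within-cell treated-control mean differences, which makes the paper's efficiency claim intuitive: the estimated score automatically incorporates the information in the realized cell assignments.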

Imbens G, Donald S, Newey W. **Empirical Likelihood Estimation and Consistent Tests with Conditional Moment Restrictions.** *Journal of Econometrics*, 2003;117(1):55-93. Published Paper | Working Paper

This paper is about efficient estimation and consistent tests of conditional moment restrictions. We use unconditional moment restrictions based on splines or other approximating functions for this purpose. Empirical likelihood estimation is particularly appropriate for this setting, because of its relatively low bias with many moment conditions. We give conditions so that efficiency of estimators and consistency of tests are achieved as the number of restrictions grows with the sample size. We also give results for generalized empirical likelihood, generalized method of moments, and nonlinear instrumental variable estimators.

Imbens G, Chamberlain G. **Nonparametric Applications of Bayesian Inference.** *Journal of Business and Economic Statistics*. 2003;21(1):12-18. Published Paper | Working Paper

The paper evaluates the usefulness of a nonparametric approach to Bayesian inference by presenting two applications. The approach is due to Ferguson (1973, 1974) and Rubin (1981). Our first application considers an educational choice problem. We focus on obtaining a predictive distribution for earnings corresponding to various levels of schooling. This predictive distribution incorporates the parameter uncertainty, so that it is relevant for decision making under uncertainty in the expected utility framework of microeconomics. The second application is to quantile regression. Our point here is to examine the potential of the nonparametric framework to provide inferences without making asymptotic approximations. Unlike in the first application, the standard asymptotic normal approximation turns out to not be a good guide. We also consider a comparison with a bootstrap approach.

Imbens G. **Sensitivity to Exogeneity Assumptions in Program Evaluation.** *The American Economic Review*, 2003;93(2):126-132. Published Paper

In many empirical studies of the effect of social programs researchers assume that, conditional on a set of observed covariates, assignment to the treatment is exogenous or unconfounded (aka selection on observables). Often this assumption is not realistic, and researchers are concerned about the robustness of their results to departures from it. One approach (e.g., Charles Manski, 1990) is to entirely drop the exogeneity assumption and investigate what can be learned about treatment effects without it. With unbounded outcomes, and in the absence of alternative identifying assumptions, there are no restrictions on the set of possible values for average treatment effects. This does not mean, however, that all evaluations are equally sensitive to departures from the exogeneity assumption. In this paper I explore an alternative approach, developed by Paul Rosenbaum and Donald Rubin (1983), where the assumption of exogeneity is explicitly relaxed by allowing for a limited amount of correlation between treatment and unobserved components of the outcomes.

## 2002

Imbens G, Angrist J. **Comment on: ‘Covariance Adjustment in Randomized Experiments and Observational Studies’, by Paul Rosenbaum.** *Statistical Science*, 2002;17(3):304-307. Published Paper

Paul Rosenbaum has been an articulate and tireless advocate of randomization inference (RI) as a “reasoned basis for inference” when assessing treatment effects. In this paper and previous work he has extended the scope for RI beyond the traditional field of randomized trials into the much messier world of observational studies. The current paper provides a characteristically lucid discussion of the use of RI in observational studies, where the possibility of overt biases commonly motivates covariance adjustment. The paper discusses an approach based on propensity-score style conditioning on sufficient statistics, incorporates regression adjustment into an RI framework and offers an extension to research designs involving instrumental variables (IV). An especially interesting feature of his discussion of IV is the link to the recent literature on weak instruments, where standard inference based on normal approximation to sampling distributions is often inaccurate. Rosenbaum also discusses the use of sensitivity analyses.

Imbens G, Spady R. **Confidence Intervals in Generalized Method of Moments Models.** *Journal of Econometrics*, 2002;107:87-98. Published Paper

We consider the construction of confidence intervals for parameters characterized by moment restrictions. In the standard approach to generalized method of moments (GMM) estimation, confidence intervals are based on the normal approximation to the sampling distribution of the parameters. There is often considerable disagreement between the nominal and actual coverage rates of these intervals, especially in cases with a large degree of overidentification. We consider alternative confidence intervals based on empirical likelihood methods which exploit the normal approximation to the Lagrange multipliers calculated as a byproduct in empirical likelihood estimation. In large samples such confidence intervals are identical to the standard GMM ones, but in finite samples their properties can be substantially different. In some of the examples we consider, the proposed confidence intervals have coverage rates much closer to the nominal coverage rates than the corresponding GMM intervals.

Imbens G. **Generalized method of moments and empirical likelihood.** *Journal of Business and Economic Statistics*, 2002;20(4):493-506. Published Paper

Generalized method of moments (GMM) estimation has become an important unifying framework for inference in econometrics in the last 20 years. It can be thought of as encompassing almost all of the common estimation methods, such as maximum likelihood, ordinary least squares, instrumental variables, and two-stage least squares, and nowadays is an important part of all advanced econometrics textbooks. The GMM approach links nicely to economic theory where orthogonality conditions that can serve as such moment functions often arise from optimizing behavior of agents. Much work has been done on these methods since the seminal article by Hansen, and much remains in progress. This article discusses some of the developments since Hansen’s original work. In particular, it focuses on some of the recent work on empirical likelihood-type estimators, which circumvent the need for a first step in which the optimal weight matrix is estimated and have attractive information theoretic interpretations.

Imbens G, Abadie A, Angrist J. **Instrumental variables estimates of the effect of subsidized training on the quantiles of trainee earnings**. *Econometrica*, 2002;70(1):91-117. Published Paper | Working Paper

This paper reports estimates of the effects of JTPA training programs on the distribution of earnings. The estimation uses a new instrumental variable (IV) method that measures program impacts on quantiles. The quantile treatment effects (QTE) estimator reduces to quantile regression when selection for treatment is exogenously determined. QTE can be computed as the solution to a convex linear programming problem, although this requires first-step estimation of a nuisance function. We develop distribution theory for the case where the first step is estimated nonparametrically. For women, the empirical results show that the JTPA program had the largest proportional impact at low quantiles. Perhaps surprisingly, however, JTPA training raised the quantiles of earnings for men only in the upper half of the trainee earnings distribution.

## 2001

Imbens G, Hyslop D. **Bias from Classical and Other Forms of Measurement Error.** *Journal of Business and Economic Statistics*, 2001;19(October, 4):475-481. Published Paper | Working Paper

We consider the implications of a specific alternative to the classical measurement error model, in which the data are optimal predictions based on some information set. One motivation for this model is that if respondents are aware of their ignorance they may interpret the question ‘what is the value of this variable?’ as ‘what is your best estimate of this variable?’, and provide optimal predictions of the variable of interest given their information set. In contrast to the classical measurement error model, this model implies that the measurement error is uncorrelated with the reported value and, by necessity, correlated with the true value of the variable. In the context of the linear regression framework, we show that measurement error can lead to over- as well as under-estimation of the coefficients of interest. Critical for determining the bias is the model for the individual reporting the mismeasured variables, the individual’s information set, and the correlation structure of the errors. We also investigate the implications of instrumental variables methods in the presence of measurement error of the optimal prediction error form and show that such methods may in fact introduce bias. Finally, we present some calculations indicating the range of estimates of the returns to education consistent with amounts of measurement error found in previous studies. This range can be quite wide, especially if one allows for correlation between the measurement errors.

Imbens G, Hirano K, Ridder G, Rubin D. **Combining Panel Data Sets with Attrition and Refreshment Samples.** *Econometrica*, 2001;69(6):1645-1659. Published Paper | Working Paper

In many fields researchers wish to consider statistical models that allow for more complex relationships than can be inferred using only cross-sectional data. Panel or longitudinal data where the same units are observed repeatedly at different points in time can often provide the richer data needed for such models. Although such data allows researchers to identify more complex models than cross-sectional data, missing data problems can be more severe in panels. In particular, even units who respond in initial waves of the panel may drop out in subsequent waves, so that the subsample with complete data for all waves of the panel can be less representative of the population than the original sample. Sometimes, in the hope of mitigating the effects of attrition without losing the advantages of panel data over cross-sections, panel data sets are augmented by replacing units who have dropped out with new units randomly sampled from the original population. Following Ridder (1992), who used these replacement units to test some models for attrition, we call such additional samples refreshment samples. We explore the benefits of these samples for estimating models of attrition. We describe the manner in which the presence of refreshment samples allows the researcher to test various models for attrition in panel data, including models based on the assumption that missing data are missing at random (MAR, Rubin, 1976; Little and Rubin, 1987). The main result in the paper makes precise the extent to which refreshment samples are informative about the attrition process; a class of non-ignorable missing data models can be identified without making strong distributional or functional form assumptions if refreshment samples are available.

Imbens G. **Comment on: “Estimation of Limited-Dependent Variable Models with Dummy Endogenous Regressors: Simple Strategies for Empirical Practice”, by Joshua Angrist.** *Journal of Business and Economic Statistics*, 2001;19(1):17-20. Published Paper

Applied economists have long struggled with the question of how to accommodate binary endogenous regressors in models with binary and non-negative outcomes. I argue here that much of the difficulty with limited-dependent variables comes from a focus on structural parameters, such as index coefficients, instead of causal effects. Once the object of estimation is taken to be the causal effect of treatment, a number of simple strategies is available. These include conventional two-stage least squares, multiplicative models for conditional means, linear approximation of nonlinear causal models, models for distribution effects, and quantile regression with an endogenous binary regressor. The estimation strategies discussed in the paper are illustrated by using multiple births to estimate the effect of childbearing on employment status and hours of work.

Imbens G, Rubin D, Sacerdote B. **Estimating the Effect of Unearned Income on Labor Supply, Earnings, Savings and Consumption: Evidence from a Survey of Lottery Players**. *The American Economic Review*, 2001;91(4):778-794. Published Paper | Working Paper

Knowledge of the effect of unearned income on economic behavior of individuals in general, and on labor supply in particular, is of great importance to policy makers. Estimation of income effects, however, is a difficult problem because income is not randomly assigned and exogenous changes in income are difficult to identify. Here we exploit the randomized assignment of large amounts of money over long periods of time through lotteries. We carried out a survey of people who played the lottery in the mid-eighties and estimate the effect of lottery winnings on their subsequent earnings, labor supply, consumption, and savings. We find that winning a modest prize ($15,000 per year for twenty years) does not affect labor supply or earnings substantially. Winning such a prize does not considerably reduce savings. Winning a much larger prize ($80,000 rather than $15,000 per year) reduces labor supply as measured by hours, as well as participation and social security earnings; elasticities for hours and earnings are around -0.20 and for participation around -0.14. Winning a large versus modest amount also leads to increased expenditures on cars and larger home values, although mortgage values appear to increase by approximately the same amount. Winning $80,000 increases overall savings, although savings in retirement accounts are not significantly affected. The results do not vary much by gender, age, or prior employment status. There is some evidence that for those with zero earnings prior to winning the lottery there is a positive effect of winning a small prize on subsequent labor market participation.

Hirano K, Imbens G. **Estimation of Causal Effects Using Propensity Score Weighting: An Application to Data on Right Heart Catheterization**. *Health Services and Outcomes Research Methodology*, 2001. Published Paper

We consider methods for estimating causal effects of treatments when treatment assignment is unconfounded with outcomes conditional on a possibly large set of covariates. Robins and Rotnitzky (1995) suggested combining regression adjustment with weighting based on the propensity score (Rosenbaum and Rubin, 1983). We adopt this approach, allowing for a flexible specification of both the propensity score and the regression function. We apply these methods to data on the effects of right heart catheterization (RHC) studied in Connors et al (1996), and we find that our estimator gives stable estimates over a wide range of values for the two parameters governing the selection of variables.

Imbens G. **Some Remarks on Instrumental Variables**. In: Lechner, Pfeiffer. *Econometric Evaluations of Labour Market Policies*. Springer Verlag; 2001. Published Paper

There has been much work on identification and inference with instrumental variables in the last decade. Researchers have investigated conditions for identification of causal effects without normality, linearity, and additivity assumptions. In this discussion, I will comment on some of the new results in this area and discuss some implications for applied researchers in the context of some specific examples, focusing on identification rather than inference. Most of the comments will be limited to the case with a binary endogenous regressor.

## 2000

Imbens G, Hirano K, Rubin D, Zhou X-H. **Assessing the Effect of an Influenza Vaccine in an Encouragement Design**. *Biostatistics*, 2000;1(1):69-88. Published Paper | Working Paper

Many randomized experiments suffer from noncompliance. Some of these experiments, so-called encouragement designs, can be expected to have especially large amounts of noncompliance, because encouragement to take the treatment rather than the treatment itself is randomly assigned to individuals. We present an extended framework for the analysis of data from such experiments with a binary treatment, binary encouragement, and background covariates. There are two key features of this framework: we use an instrumental variables approach to link intention-to-treat effects to treatment effects and we adopt a Bayesian approach for inference and sensitivity analysis. This framework is illustrated in a medical example concerning the effects of inoculation for influenza. In this example, the analyses suggest that positive estimates of the intention-to-treat effect need not be due to the treatment itself, but rather to the encouragement to take the treatment: the intention-to-treat effect for the subpopulation who would be inoculated whether or not encouraged is estimated to be approximately as large as the intention-to-treat effect for the subpopulation whose inoculation status would agree with their (randomized) encouragement status whether or not encouraged. Thus, our methods suggest that global intention-to-treat estimates, although often regarded as conservative, can be too coarse and even misleading when taken as summarizing the evidence in the data for the effects of treatments.

Imbens G, Angrist J, Graddy K. **The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish**. *Review of Economic Studies*, 2000;67, July:499-527. Published Paper | Working Paper

In markets where prices are determined by the intersection of supply and demand curves, standard identification results require the presence of instruments that shift one curve but not the other. These results are typically presented in the context of linear models with fixed coefficients and additive residuals. The first contribution of this paper is an investigation of the consequences of relaxing both the linearity and the additivity assumption for the interpretation of linear instrumental variables estimators. Without these assumptions, the standard linear instrumental variables estimator identifies a weighted average of the derivative of the behavioural relationship of interest. A second contribution is the formulation of critical identifying assumptions in terms of demand and supply at different prices and instruments, rather than in terms of functional-form specific residuals. Our approach to the simultaneous equations problem and the average-derivative interpretation of instrumental variables estimates is illustrated by estimating the demand for fresh whiting at the Fulton fish market. Strong and credible instruments for identification of this demand function are available in the form of weather conditions at sea.

Imbens G. **The Role of the Propensity Score in Estimating Dose-Response Functions**. *Biometrika*, 2000;87(3):706-710. Published Paper | Working Paper

Estimation of average treatment effects in observational, or non-experimental, studies often requires adjustment for differences in pre-treatment variables. If the number of pre-treatment variables is large, and their distribution varies substantially with treatment status, standard adjustment methods such as covariance adjustment are often inadequate. Rosenbaum and Rubin (1983) propose an alternative method for adjusting for pre-treatment variables based on the propensity score, the conditional probability of receiving the treatment given pre-treatment variables. They demonstrate that adjusting solely for the propensity score removes all the bias associated with differences in pre-treatment variables between treatment and control groups. The Rosenbaum-Rubin proposals deal exclusively with the case where treatment takes on only two values. In this paper an extension of this methodology is proposed that allows for estimation of average causal effects with multi-valued treatments while maintaining the advantages of the propensity score approach.

## 1999

Imbens G, Angrist J. **Comment on James Heckman, “Instrumental Variables: A Study of Implicit Behavioral Assumptions Used in Making Program Evaluations”**. *The Journal of Human Resources*, 1999;34(Autumn, 1999):823-827. Published Paper

In a recent paper in this journal, Heckman discussed the use of instrumental variables methods in evaluation research and our local average treatment effects (LATE) interpretation of instrumental variables estimates. This comment provides additional background for Heckman’s paper, and a review of our rationale for focusing on LATE. We also show that a set of assumptions proposed by Heckman as an alternative to the LATE assumptions are not compatible with either latent-index assignment models or the definition we proposed for an instrument.

Imbens G, Hellerstein J. **Imposing Moment Restrictions by Weighting**. *Review of Economics and Statistics*, 1999;LXXXI:1-14. Published Paper | Working Paper

In this paper we analyze estimation of coefficients in regression models under moment restrictions where the moment restrictions are derived from auxiliary data. Our approach is similar to those that have been used in statistics for analyzing contingency tables with known marginals. These methods are useful in cases where data from a small, potentially non-representative data set can be supplemented with auxiliary information from another data set which may be larger and/or more representative of the target population. The moment restrictions yield weights for each observation that can subsequently be used in weighted regression analysis. We discuss the interpretation of these weights both under the assumption that the target population and the sampled population are the same, as well as under the assumption that these populations differ. We present an application based on omitted ability bias in estimation of wage regressions. The National Longitudinal Survey Young Men’s Cohort (NLS), in addition to containing information for each observation on earnings, education, and experience, records data on two test scores that may be considered proxies for ability. The NLS is a small data set, however, with a high attrition rate. We investigate how to mitigate these problems in the NLS by forming moments from the joint distribution of education, experience and earnings in the 1% sample of the 1980 U.S. Census and using these moments to construct weights for weighted regression analysis of the NLS. We analyze the impacts of our weighted regression techniques on the estimated coefficients and standard errors on returns to education and experience in the NLS, controlling for ability, with and without assuming that the NLS and the Census samples are random samples from the same population.

Imbens G, Angrist J, Krueger A. **Jackknife Instrumental Variables Estimation.** *Journal of Applied Econometrics*, 1999;14(1):57-67. Published Paper | Working Paper

Two-stage-least-squares (2SLS) estimates are biased towards OLS estimates. This bias grows with the degree of over-identification and can generate highly misleading results. In this paper we propose two simple alternatives to 2SLS and limited-information-maximum-likelihood (LIML) estimators for models with more instruments than endogenous regressors. These estimators can be interpreted as instrumental variables procedures using an instrument that is independent of disturbances even in finite samples. Independence is achieved by using a ‘leave-one-out’ jackknife-type fitted value in place of the usual first-stage equation. The new estimators are first-order equivalent to 2SLS but with finite-sample properties superior to those of 2SLS and similar to LIML when there are many instruments. Moreover, the jackknife estimators appear to be less sensitive than LIML to deviations from the linear reduced form used in classical simultaneous equations models.
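The leave-one-out construction behind the jackknife estimator can be sketched in a few lines. This is an illustrative simulation of the JIVE1 variant, using the standard hat-matrix shortcut for leave-one-out fitted values; the data-generating process and coefficient values are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 800, 10                       # many instruments, one endogenous regressor
Z = rng.normal(size=(n, k))
u = rng.normal(size=n)               # structural error
v = 0.8 * u + rng.normal(size=n)     # first-stage error, correlated with u
x = Z @ np.full(k, 0.3) + v          # endogenous regressor
y = 1.5 * x + u                      # true coefficient = 1.5

# Conventional 2SLS: project x onto the instrument space
P = Z @ np.linalg.solve(Z.T @ Z, Z.T)
xhat_2sls = P @ x
beta_2sls = (xhat_2sls @ y) / (xhat_2sls @ x)

# JIVE1: leave-one-out first-stage fitted values, via the hat-matrix identity
h = np.diag(P)
xhat_jive = (P @ x - h * x) / (1 - h)
beta_jive = (xhat_jive @ y) / (xhat_jive @ x)
```

Because observation i is excluded from its own first-stage fit, `xhat_jive[i]` is independent of the disturbances for that observation, which is the source of the finite-sample improvement the abstract describes.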

## 1998

Imbens G, Johnson P, Spady R. **Information Theoretic Approaches to Inference in Moment Condition Models**. *Econometrica*, 1998;66:333-357. Published Paper | Working Paper

One-step efficient GMM estimation has been developed in the recent papers of Back and Brown (1990), Imbens (1993) and Qin and Lawless (1994). These papers emphasized methods that correspond to using Owen’s (1988) method of empirical likelihood to reweight the data so that the reweighted sample obeys all the moment restrictions at the parameter estimates. In this paper we consider an alternative KLIC-motivated weighting and show how it and similar discrete reweightings define a class of unconstrained optimization problems which includes GMM as a special case. Such KLIC-motivated reweightings introduce M auxiliary ‘tilting’ parameters, where M is the number of moments; parameter and overidentification hypotheses can be recast in terms of these tilting parameters. Such tests, when appropriately conditioned on the estimates of the original parameters, are often startlingly more effective than their conventional counterparts. This is apparently due to the local ancillarity of the original parameters for the tilting parameters.

## 1997

Imbens G, Rubin D. **Bayesian Inference for Causal Effects in Randomized Experiments with Noncompliance.** *Annals of Statistics*, 1997;25(1):305-327. Published Paper

For most of this century, randomization has been a cornerstone of scientific experimentation, especially when dealing with humans as experimental units. In practice, however, noncompliance is relatively common with human subjects, complicating traditional theories of inference that require adherence to the random treatment assignment. In this paper we present Bayesian inferential methods for causal estimands in the presence of noncompliance, when the binary treatment assignment is random and hence ignorable, but the binary treatment received is not ignorable. We assume that both the treatment assigned and the treatment received are observed. We describe posterior estimation using EM and data augmentation algorithms. Also, we investigate the role of two assumptions often made in econometric instrumental variables analyses, the exclusion restriction and the monotonicity assumption, without which the likelihood functions generally have substantial regions of maxima. We apply our procedures to real and artificial data, thereby demonstrating the technology and showing that our new methods can yield valid inferences that differ in practically important ways from those based on previous methods for analysis in the presence of noncompliance, including intention-to-treat analyses and analyses based on econometric instrumental variables techniques. Finally, we perform a simulation to investigate the operating characteristics of the competing procedures in a simple setting, which indicates relatively dramatic improvements in frequency operating characteristics attainable using our Bayesian procedures.

Imbens G. **Book review of “The Foundations of Econometric Analysis” by David Hendry and Mary Morgan**. *Journal of Applied Econometrics*, 1997;12:91-94. Published Review

Imbens G, Rubin D. **Estimating Outcome Distributions for Compliers in Instrumental Variable Models**. *Review of Economic Studies*, 1997;(October). Published Paper

In Imbens and Angrist (1994), Angrist, Imbens and Rubin (1996) and Imbens and Rubin (1997), assumptions have been outlined under which instrumental variables estimands can be given a causal interpretation as a local average treatment effect without requiring functional form or constant treatment effect assumptions. We extend these results by showing that under these assumptions one can estimate more from the data than the average causal effect for the subpopulation of compliers; one can, in principle, estimate the entire marginal distribution of the outcome under different treatments for this subpopulation. These distributions might be useful for a policy maker who wishes to take into account not only differences in average earnings when contemplating the merits of one job training programme vs. another. We also show that the standard instrumental variables estimator implicitly estimates these underlying outcome distributions without imposing the required nonnegativity on these implicit density estimates, and that imposing non-negativity can substantially alter the estimates of the local average treatment effect. We illustrate these points by presenting an analysis of the returns to a high school education using quarter of birth as an instrument. We show that the standard instrumental variables estimates implicitly estimate the outcome distributions to be negative over a substantial range, and that the estimates of the local average treatment effect change considerably when we impose nonnegativity in any of a variety of ways.

Imbens G. **One-step Estimators for Over-identified Generalized Method of Moments Models**. *Review of Economic Studies*, 1997;(July). Published Paper

In this paper I discuss alternatives to the GMM estimators proposed by Hansen (1982) and others. These estimators are shown to have a number of advantages. First of all, there is no need to estimate in an initial step a weight matrix as required in the conventional estimation procedure. Second, it is straightforward to derive the distribution of the estimator under general misspecifications. Third, some of the alternative estimators have appealing information-theoretic interpretations. In particular, one of the estimators is an empirical likelihood estimator with an interpretation as a discrete support maximum likelihood estimator. Fourth, in an empirical example one of the new estimators is shown to perform better than the conventional estimators. Finally, the new estimators make it easier for the researcher to get better approximations to their distributions using saddlepoint approximations. The main cost is computational: the system of equations that has to be solved is of greater dimension than the number of parameters of interest. In practice this may or may not be a problem in particular applications.

## 1996

Imbens G, Lancaster T. **Case-Control Studies with Contaminated Controls**. *Journal of Econometrics*, 1996;71(1-2):145-160. Published Paper | Working Paper

This paper considers inference about a parametric binary choice model when the data consist of two distinct samples. The first is a random sample from the people who made choice 1, say, with all relevant covariates completely observed. The second is a random sample from the whole population with only the covariates observed. This is called a contaminated sampling scheme. An example might be where we have a random sample of female labor force participants and their covariate values and a second random sample of working age women, with covariates, whose participant status is unknown. We consider the cases in which the fraction of the population making choice 1 is known and that in which it is not. For both cases we give semiparametrically efficient procedures for estimating the choice model parameters.

Imbens G, Lancaster T. **Efficient Estimation and Stratified Sampling**. *Journal of Econometrics*, 1996;74(2):289-318. Published Paper | Working Paper

In this paper we investigate estimation of a class of semi-parametric models. The part of the model that is not specified is the marginal distribution of the explanatory variables. The sampling is stratified on the dependent variables, implying that the explanatory variables are no longer exogenous or ancillary. We develop a new estimator for this estimation problem and show that it achieves the semi-parametric efficiency bound for this case. In addition we show that the estimator applies to a number of sampling schemes that have previously been treated separately.

Imbens G, Angrist J, Rubin D. **Identification of Causal Effects Using Instrumental Variables**. *Journal of the American Statistical Association*, 1996;91(434):444-455. Published Paper | Working Paper

We outline a framework for causal inference in settings where assignment to a binary treatment is ignorable, but compliance with the assignment is not perfect, so that the receipt of treatment is nonignorable. To address the problems associated with comparing subjects by the ignorable assignment, an “intention-to-treat analysis”, we make use of instrumental variables, which have long been used by economists in the context of regression models with constant treatment effects. We show that the instrumental variables (IV) estimand can be embedded within the Rubin Causal Model (RCM) and that under some simple and easily interpretable assumptions, the IV estimand is the average causal effect for a subgroup of units, the compliers. Without these assumptions, the IV estimand is simply the ratio of intention-to-treat causal estimands with no interpretation as an average causal effect. The advantages of embedding the IV approach in the RCM are that it clarifies the nature of critical assumptions needed for a causal interpretation, and moreover allows us to consider sensitivity of the results to deviations from key assumptions in a straightforward manner. We apply our analysis to estimate the effect of veteran status in the Vietnam era on mortality, using the lottery number that assigned priority for the draft as an instrument, and we use our results to investigate the sensitivity of the conclusions to critical assumptions.
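The central identity of this paper, that the IV estimand is the ratio of two intention-to-treat contrasts, can be sketched numerically. The compliance structure and effect size below are invented for illustration; under random assignment, exclusion, and monotonicity, the ratio recovers the complier average causal effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000
z = rng.binomial(1, 0.5, n)                 # randomized assignment (the instrument)
complier = rng.random(n) < 0.4              # compliers take treatment iff assigned
always = ~complier & (rng.random(n) < 0.3)  # always-takers among non-compliers
d = np.where(complier, z, always.astype(int))   # treatment actually received
y = 1.0 * d + rng.normal(size=n)                # treatment effect = 1 for everyone

itt_y = y[z == 1].mean() - y[z == 0].mean()  # ITT effect on the outcome
itt_d = d[z == 1].mean() - d[z == 0].mean()  # ITT effect on receipt (complier share)
late = itt_y / itt_d                         # the IV (Wald) estimand
```

Here the raw ITT effect on `y` understates the effect of treatment because only about 40% of units comply; dividing by `itt_d` rescales it to the effect for compliers.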

## 1995

Imbens G, Rubin D. **Discussion of: ‘Causal Diagrams for Empirical Research’ by J. Pearl**. *Biometrika*, 1995;82(4):694-695. Published Paper

Imbens G, van der Klaauw W. **Evaluating the Cost of Conscription in The Netherlands**. *Journal of Business and Economic Statistics*, 1995;13(2):207–215. Published Paper | Working Paper

In this article we investigate the effect of military service in the Netherlands on future earnings. Estimating the cost or benefit of military service is complicated by the complex selection that determines who eventually serves in the military: On the one hand, potential conscripts have to pass medical and psychological examinations before entering the military, and on the other hand numerous (temporary) exemptions exist that can be manipulated by young men to avoid military service. We use substantial, policy-induced variation in aggregate military enrollment rates to deal with these selection issues. We find that approximately 10 years after serving in the military former conscripts have earnings that are on average 5% lower than the earnings of members of their birth cohort who did not serve in the military. These findings are shown to be robust against a variety of specifications.

Imbens G, Angrist J. **Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity**. *Journal of the American Statistical Association*, 1995;90(430):431-442. Published Paper

Two-stage least squares (TSLS) is widely used in econometrics to estimate parameters in systems of linear simultaneous equations and to solve problems of omitted-variables bias in single-equation estimation. We show here that TSLS can also be used to estimate the average causal effect of variable treatments such as drug dosage, hours of exam preparation, cigarette smoking, and years of schooling. The average causal effect in which we are interested is a conditional expectation of the difference between the outcomes of the treated and what these outcomes would have been in the absence of treatment. Given mild regularity assumptions, the probability limit of TSLS is a weighted average of per-unit average causal effects along the length of an appropriately defined causal response function. The weighting function is illustrated in an empirical example based on the relationship between schooling and earnings.
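
A minimal sketch of the two-stage computation for a variable-intensity treatment such as years of schooling; the data-generating process below is invented, not the paper's application:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Invented data: schooling s (variable-intensity treatment) is confounded
# by unobserved ability a; z is a binary instrument shifting schooling.
z = rng.integers(0, 2, n).astype(float)
a = rng.normal(0.0, 1.0, n)
s = 10.0 + 2.0 * z + a + rng.normal(0.0, 1.0, n)
y = 1.0 + 0.5 * s + a + rng.normal(0.0, 1.0, n)  # true effect per year: 0.5

# Stage 1: project the treatment intensity on the instrument.
s_hat = np.poly1d(np.polyfit(z, s, 1))(z)
# Stage 2: regress the outcome on the fitted treatment.
tsls = np.polyfit(s_hat, y, 1)[0]

# Ordinary least squares for comparison: biased upward by ability.
ols = np.polyfit(s, y, 1)[0]
print(round(tsls, 2), round(ols, 2))  # TSLS near 0.5; OLS biased upward
```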

## 1994

Imbens G, Lancaster T. **Combining Micro and Macro Data in Microeconometric Models**. *Review of Economic Studies*, 1994;61(4):655-680. Published Paper

Census reports can be interpreted as providing nearly exact knowledge of moments of the marginal distribution of economic variables. This information can be combined with cross-sectional or panel samples to improve accuracy of estimation. In this paper, the authors show how to do this efficiently. They show that the gains from use of marginal information can be substantial. The authors also discuss how to test the compatibility of sample and marginal information.

Imbens G, Angrist J. **Identification and Estimation of Local Average Treatment Effects**. *Econometrica*, 1994;62(2):467-476. Published Paper | Working Paper

We investigate conditions sufficient for identification of average treatment effects using instrumental variables. First we show that the existence of valid instruments is not sufficient to identify any meaningful average treatment effect. We then establish that the combination of an instrument and a condition on the relation between the instrument and the participation status is sufficient for identification of a local average treatment effect for those who can be induced to change their participation status by changing the value of the instrument. Finally we derive the probability limit of the standard IV estimator under these conditions. It is seen to be a weighted average of local average treatment effects.
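
The role of the condition on participation status (monotonicity) can be seen in a small simulation with invented compliance shares: once "defiers" are present, the IV estimand need not be an average of any units' effects:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400_000
z = rng.integers(0, 2, n)

# Invented shares: 50% compliers, 30% defiers (monotonicity violated).
u = rng.random(n)
complier = u < 0.5
defier = (u >= 0.5) & (u < 0.8)
d = np.where(complier, z, np.where(defier, 1 - z, 0))

# Every unit's causal effect is nonnegative (1 for compliers, 3 for defiers).
tau = np.where(complier, 1.0, np.where(defier, 3.0, 0.0))
y = tau * d + rng.normal(0.0, 1.0, n)

iv = (y[z == 1].mean() - y[z == 0].mean()) / (d[z == 1].mean() - d[z == 0].mean())
print(round(iv, 1))  # about -2.0: negative despite all effects being >= 0
```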

Imbens G, Lancaster T. **Optimal Stock/Flow Panels**. *Journal of Econometrics*, 1994;66(1-2). Published Paper

A stock/flow panel is a way of sampling a population of agents moving through a collection of discrete states. The scheme is to form separate samples of the residents of each state — the stocks — and of those moving between states — the flows. We calculate optimal stock/flow sampling schemes and provide efficient estimators of the transition intensities in the particular case of an alternating Poisson process. We also compute the efficiency gains compared to randomly sampled panels.

Imbens G. **Transition Models in a Non-Stationary Environment**. *Review of Economics and Statistics*, 1994;76(4):703-720. Published Paper

An alternative form of the proportional hazard model is proposed. It allows one to introduce correlation between exit rates at the same (calendar) time for different individuals. One can, in the context of this model, still allow for, and estimate, duration effects. These should be parametrized. These modifications to the original Cox model are possible by reversing the roles of duration and calendar time. It is argued that flexibility with respect to the effects of these macro processes is of particular relevance in economic models. An example using Dutch data on labor market transitions illustrates the idea that to ignore calendar time effects may have severe consequences for the estimation of duration dependence.

## 1992

Imbens G. **An Efficient Method of Moments Estimator for Discrete Choice Models with Choice-Based Sampling**. *Econometrica*, 1992;60(5):1187-1214. Published Paper

In this paper a new estimator is proposed for discrete choice models with choice-based sampling. The estimator is efficient and can incorporate information on the marginal choice probabilities in a straightforward manner and for that case leads to a procedure that is computationally and intuitively more appealing than the estimators that have been proposed before. The idea is to start with a flexible parameterization of the distribution of the explanatory variables and then rewrite the estimator to remove dependence on these parametric assumptions.

## 1990

Imbens G, Lancaster T. **Choice-Based Sampling of Dynamic Populations**. In: Theeuwes, Ridder. *Panel Data and Labor Market Studies*, 1990

## 1987

Imbens G, Lancaster T, Dolton P. **Job Separation and Job Matching**. In: Heymans, Neudecker. *The Practice of Econometrics*, 1987. eBook

## Working Papers

Athey S, Imbens G. **Machine Learning Methods Economists Should Know About**. 2019. Working Paper

We discuss the relevance of the recent Machine Learning (ML) literature for economics and econometrics. First we discuss the differences in goals, methods and settings between the ML literature and the traditional econometrics and statistics literatures. Then we discuss some specific methods from the machine learning literature that we view as important for empirical researchers in economics. These include supervised learning methods for regression and classification, unsupervised learning methods, as well as matrix completion methods. Finally, we highlight newly developed methods at the intersection of ML and econometrics, methods that typically perform better than either off-the-shelf ML or more traditional econometric methods when applied to particular classes of problems, problems that include causal inference for average treatment effects, optimal policy estimation, and estimation of the counterfactual effect of price changes in consumer choice models.

Athey S, Imbens G. **Design-based Analysis in Difference-In-Differences Settings with Staggered Adoption**. 2018. Working Paper

In this paper we study estimation of and inference for average treatment effects in a setting with panel data. We focus on the setting where units, e.g., individuals, firms, or states, adopt the policy or treatment of interest at a particular point in time, and then remain exposed to this treatment at all times afterwards. We take a design perspective where we investigate the properties of estimators and procedures given assumptions on the assignment process. We show that under random assignment of the adoption date the standard Difference-In-Differences estimator is an unbiased estimator of a particular weighted average causal effect. We characterize the properties of this estimand, and show that the standard variance estimator is conservative.

Athey S, Imbens G. **Estimation Considerations in Contextual Bandits**. 2018. Working Paper

Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We study a consideration for the exploration vs. exploitation framework that does not arise in multi-armed bandits but is crucial in contextual bandits: the way exploration and exploitation are conducted in the present affects the bias and variance in the potential outcome model estimation in subsequent stages of learning. We develop parametric and non-parametric contextual bandits that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation bias. We provide the first regret bound analyses for contextual bandits with balancing in the domain of linear contextual bandits that match the state-of-the-art regret bounds. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model mis-specification and prejudice in the initial training data. Additionally, we develop contextual bandits with simpler assignment policies by leveraging sparse model estimation methods from the econometrics literature and demonstrate empirically that in the early stages they can improve the rate of learning and decrease regret.

Athey S, Bayati M, Doudchenko N, Imbens G, Khosravi K. **Matrix Completion Methods for Causal Panel Data Models**. 2017. Working Paper

In this paper we develop new methods for estimating causal effects in settings with panel data, where a subset of units are exposed to a treatment during a subset of periods, and the goal is estimating counterfactual (untreated) outcomes for the treated unit/period combinations. We develop a class of estimators that uses the observed elements of the matrix of control outcomes corresponding to untreated unit/periods to predict the “missing” elements of the matrix, corresponding to treated units/periods. The approach estimates a matrix that well-approximates the original (incomplete) matrix, but has lower complexity according to a matrix norm, where we consider the family of Schatten norms based on the singular values of the matrix. The proposed methods have attractive computational properties. From a technical perspective, we generalize results from the matrix completion literature by allowing the patterns of missing data to have a time series dependency structure. We also present new insights concerning the connections between the interactive fixed effects models and the literatures on program evaluation under unconfoundedness as well as on synthetic control methods. If there are few time periods and many units, our method approximates a regression approach where counterfactual outcomes are estimated through a regression of current outcomes on lagged outcomes for the same unit. In contrast, if there are few units and many periods, our proposed method approximates a synthetic control estimator where counterfactual outcomes are estimated through a regression of the lagged outcomes for the treated unit on lagged outcomes for the control units. The advantage of our proposed method is that it moves seamlessly between these two different approaches, utilizing both cross-sectional and within-unit patterns in the data.
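
A minimal soft-impute-style sketch of completion with a nuclear-norm (Schatten-1) penalty; this is an illustration on an invented low-rank panel, not the authors' estimator:

```python
import numpy as np

rng = np.random.default_rng(3)

# Invented low-rank panel: units x periods, rank 2, plus noise.
n_units, n_periods, rank = 40, 30, 2
truth = rng.normal(size=(n_units, rank)) @ rng.normal(size=(rank, n_periods))
obs = truth + 0.1 * rng.normal(size=truth.shape)
mask = rng.random(truth.shape) < 0.8  # True = observed control unit/period

def soft_impute(y_obs, mask, lam, iters=200):
    """Fill missing cells with the current estimate, then soft-threshold
    singular values (the proximal step for a nuclear-norm penalty)."""
    x = np.where(mask, y_obs, 0.0)
    for _ in range(iters):
        u, s, vt = np.linalg.svd(np.where(mask, y_obs, x), full_matrices=False)
        x = (u * np.maximum(s - lam, 0.0)) @ vt
    return x

completed = soft_impute(obs, mask, lam=0.5)
err = np.abs(completed[~mask] - truth[~mask]).mean()
print(round(err, 2))  # small relative to the entries' scale
```

The "missing" cells stand in for treated unit/period combinations whose counterfactual control outcomes are to be predicted.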

Abadie A, Athey S, Imbens G, Wooldridge J. **Sampling-based vs. Design-based Uncertainty in Regression Analysis**. 2017. Working Paper

Consider a researcher estimating the parameters of a regression function based on data for all 50 states in the United States or on data for all visits to a website. What is the interpretation of the estimated parameters and the standard errors? In practice, researchers typically assume that the sample is randomly drawn from a large population of interest and report standard errors that are designed to capture sampling variation. This is common practice, even in applications where it is difficult to articulate what that population of interest is, and how it differs from the sample. In this article, we explore an alternative approach to inference, which is partly design-based. In a design-based setting, the values of some of the regressors can be manipulated, perhaps through a policy intervention. Design-based uncertainty emanates from lack of knowledge about the values that the regression outcome would have taken under alternative interventions. We derive standard errors that account for design-based uncertainty instead of, or in addition to, sampling-based uncertainty. We show that our standard errors in general are smaller than the infinite-population sampling-based standard errors and provide conditions under which they coincide.

Athey S, Imbens G. **Machine Learning Methods for Estimating Heterogeneous Causal Effects**. 2015. Working Paper

In this paper we study the problems of estimating heterogeneity in causal effects in experimental or observational studies and conducting inference about the magnitude of the differences in treatment effects across subsets of the population. In applications, our method provides a data-driven approach to determine which subpopulations have large or small treatment effects and to test hypotheses about the differences in these effects. For experiments, our method allows researchers to identify heterogeneity in treatment effects that was not specified in a pre-analysis plan, without concern about invalidating inference due to multiple testing. In most of the literature on supervised machine learning (e.g. regression trees, random forests, LASSO, etc.), the goal is to build a model of the relationship between a unit’s attributes and an observed outcome. A prominent role in these methods is played by cross-validation which compares predictions to actual outcomes in test samples, in order to select the level of complexity of the model that provides the best predictive power. Our method is closely related, but it differs in that it is tailored for predicting causal effects of a treatment rather than a unit’s outcome. The challenge is that the “ground truth” for a causal effect is not observed for any individual unit: we observe the unit with the treatment, or without the treatment, but not both at the same time. Thus, it is not obvious how to use cross-validation to determine whether a causal effect has been accurately predicted. We propose several novel cross-validation criteria for this problem and demonstrate through simulations the conditions under which they perform better than standard methods for the problem of causal effects. We then apply the method to a large-scale field experiment re-ranking results on a search engine.

Athey S, Eckles D, Imbens G. **Exact P-values for Network Interference**. 2015. Working Paper

We study the calculation of exact p-values for a large class of non-sharp null hypotheses about treatment effects in a setting with data from experiments involving members of a single connected network. The class includes null hypotheses that limit the effect of one unit’s treatment status on another according to the distance between units; for example, the hypothesis might specify that the treatment status of immediate neighbors has no effect, or that units more than two edges away have no effect. We also consider hypotheses concerning the validity of sparsification of a network (for example based on the strength of ties) and hypotheses restricting heterogeneity in peer effects (so that, for example, only the number or fraction treated among neighboring units matters). Our general approach is to define an artificial experiment, such that the null hypothesis that was not sharp for the original experiment is sharp for the artificial experiment, and such that the randomization analysis for the artificial experiment is validated by the design of the original experiment.
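
For contrast with the non-sharp nulls studied here, a sharp null (no effect for any unit) already admits an exact randomization p-value; a minimal sketch with invented data and no network structure:

```python
import numpy as np

rng = np.random.default_rng(4)

# Invented completely randomized experiment: 20 units, no interference.
n = 20
w = np.array([1] * 10 + [0] * 10)
rng.shuffle(w)
y = rng.normal(0.0, 1.0, n) + 1.0 * w  # invented constant effect of 1.0

def exact_p_value(y, w, n_draws=10_000):
    """Randomization p-value for the sharp null of no effect for any unit:
    y is fixed under the null, so re-draw assignments and compare the
    observed difference in means to its simulated null distribution."""
    stat = y[w == 1].mean() - y[w == 0].mean()
    null = np.empty(n_draws)
    for b in range(n_draws):
        ws = rng.permutation(w)
        null[b] = y[ws == 1].mean() - y[ws == 0].mean()
    return float(np.mean(np.abs(null) >= abs(stat)))

p = exact_p_value(y, w)
print(p)
```

The paper's contribution is to extend this logic to non-sharp nulls by constructing an artificial experiment for which the null becomes sharp.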

Graham B, Imbens G, Ridder G. **Measuring the Effects of Segregation in the Presence of Social Spillovers: A Nonparametric Approach**. 2010. Working Paper

Deaton and Cartwright (DC2017 from hereon) view the increasing popularity of randomized experiments in social sciences with some skepticism. They are concerned about the quality of the inferences in practice, and fear that researchers may not fully appreciate the pitfalls and limitations of such experiments. I am more sanguine about the recent developments in empirical practice in economics and other social sciences, and am optimistic about the ongoing research in this area, both empirical and theoretical. I see the surge in use of randomized experiments as part of what Angrist and Pischke [2010] call the credibility revolution, where, starting in the late eighties and early nineties, a group of researchers associated with the labor economics group at Princeton University, including Orley Ashenfelter, David Card, Alan Krueger and Joshua Angrist, led empirical researchers to pay more attention to the identification strategies underlying empirical work. This has led to important methodological developments in causal inference, including new approaches to instrumental variables, difference-in-differences, regression discontinuity designs, and, most recently, synthetic control methods (Abadie et al. [2010]). I view the increased focus on randomized experiments, in particular in development economics, led by researchers such as Michael Kremer, Abhijit Banerjee, Esther Duflo, and their many coauthors and students, as taking this development even further. Notwithstanding the limitations of experimentation in answering some questions, and the difficulties in implementation, these developments have greatly improved the credibility of empirical work in economics compared to the standards prior to the mid-eighties, and I view this as a major achievement by these researchers. It would be disappointing if DC2017 were to take away from this and move empirical practice away from the attention paid to identification and the use of randomized experiments.
In the remainder of this comment I will discuss four specific issues. Some of these elaborate on points I raised in a previous discussion of D2010, Imbens [2010].

Imbens G, Kalyanaraman K. **An Empirical Model for Strategic Network Formation**. 2010. Working Paper


Imbens G, Ridder G. **Estimation and Inference for Generalized Full and Partial Means and Average Derivatives**. 2009. Working Paper

Many empirical studies use Fuzzy Regression Discontinuity (FRD) designs to identify treatment effects when the receipt of treatment is potentially correlated with outcomes. Existing FRD methods identify the local average treatment effect (LATE) on the subpopulation of compliers with values of the forcing variable that are equal to the threshold. We develop methods that assess the plausibility of generalizing LATE to subpopulations other than compliers, and to subpopulations other than those with forcing variable equal to the threshold. Specifically, we focus on testing the equality of the distributions of potential outcomes for treated compliers and always-takers, and for non-treated compliers and never-takers. We show that equality of these pairs of distributions implies that the expected outcome conditional on the forcing variable and the treatment status is continuous in the forcing variable at the threshold, for each of the two treatment regimes. As a matter of routine, we recommend that researchers present graphs with estimates of these two conditional expectations in addition to graphs with estimates of the expected outcome conditional on the forcing variable alone. We illustrate our methods using data on the academic performance of students attending the summer school program in two large school districts in the US.

Imbens G, Newey W, Ridder G. **Mean-Squared-Error Calculations for Average Treatment Effects**. 2007. Working Paper

This paper develops a new efficient estimator for the average treatment effect when selection for treatment is on observables. The new estimator is linear in the first-stage nonparametric estimator. This simplifies the derivation of the mean squared error (MSE) of the estimator as a function of the number of basis functions used in the first-stage nonparametric regression. We propose an estimator for the MSE and show that in large samples minimization of this estimator is equivalent to minimization of the population MSE.