Published and Forthcoming Papers
Robust Post-Matching Inference (with Alberto Abadie)
Journal of the American Statistical Association (forthcoming)
Nearest-neighbor matching is a popular nonparametric tool to create balance between treatment and control groups in observational studies. As a preprocessing step before regression, matching reduces the dependence on parametric modeling assumptions. In current empirical practice, however, the matching step is often ignored in the calculation of standard errors and confidence intervals. In this article, we show that ignoring the matching step results in asymptotically valid standard errors if matching is done without replacement and the regression model is correctly specified relative to the population regression function of the outcome variable on the treatment variable and all the covariates used for matching. However, standard errors that ignore the matching step are not valid if matching is conducted with replacement or, more crucially, if the second step regression model is misspecified in the sense indicated above. Moreover, correct specification of the regression model is not required for consistent estimation of treatment effects with matched data. We show that two easily implementable alternatives produce approximations to the distribution of the post-matching estimator that are robust to misspecification. A simulation study and an empirical example demonstrate the empirical relevance of our results.
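As a concrete illustration of the setting the paper studies (a toy simulation of our own, not the article's empirical example; all variable names and parameter values are assumptions), the sketch below pairs each treated unit with its nearest unused control, i.e. matching without replacement, and then runs a post-matching regression of the outcome on treatment and the matching covariate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated observational data: one covariate x, outcome y linear in x,
# a constant treatment effect of 2, and treated units shifted in x.
n_t, n_c = 50, 200
x_t = rng.normal(1.0, 1.0, n_t)                  # treated covariates
x_c = rng.normal(0.0, 1.0, n_c)                  # control covariates
y_t = 2.0 + 1.5 * x_t + rng.normal(0, 1, n_t)    # true effect = 2
y_c = 0.0 + 1.5 * x_c + rng.normal(0, 1, n_c)

# Nearest-neighbor matching WITHOUT replacement on x: greedily pair
# each treated unit with the closest control not yet used.
available = np.ones(n_c, dtype=bool)
matched_c = []
for xt in x_t:
    dist = np.abs(x_c - xt)
    dist[~available] = np.inf
    j = int(np.argmin(dist))
    available[j] = False
    matched_c.append(j)
matched_c = np.array(matched_c)

# Post-matching regression of y on an intercept, treatment, and x,
# using only the matched sample (here the regression is correctly
# specified, the case in which naive standard errors remain valid).
y = np.concatenate([y_t, y_c[matched_c]])
d = np.concatenate([np.ones(n_t), np.zeros(n_t)])
x = np.concatenate([x_t, x_c[matched_c]])
X = np.column_stack([np.ones(2 * n_t), d, x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(float(beta[1]), 2))  # estimated treatment effect, near 2
```

The point of the abstract is what this sketch leaves out: whether the usual OLS standard error for `beta[1]` is valid depends on matching being without replacement and on the regression being correctly specified; otherwise the matching step must be accounted for.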
Augmenting Pre-Analysis Plans with Machine Learning (with Jens Ludwig and Sendhil Mullainathan)
AEA Papers and Proceedings, Vol. 109 (2019), Pages 71-76
Concerns about the dissemination of spurious results have led to calls for pre-analysis plans (PAPs) to avoid ex-post “p-hacking.” But often the conceptual hypotheses being tested do not imply the level of specificity required for a PAP. In this paper we suggest a framework for PAPs that capitalizes on the availability of causal machine-learning (ML) techniques, in which researchers combine specific aspects of the analysis with ML for the flexible estimation of unspecific remainders. A “cheap-lunch” result shows that the inclusion of ML produces limited worst-case costs in power, while offering a substantial upside from systematic specification searches.
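The paper's "cheap-lunch" result concerns its own framework; purely as intuition for why adding a data-driven test need not cost much worst-case power, consider a simple Bonferroni split (our toy example, not the paper's procedure): reject if either the prespecified test or a flexible second test is significant at alpha/2, so overall size stays controlled and the prespecified test loses only the move from alpha to alpha/2.

```python
import math
import numpy as np

rng = np.random.default_rng(2)

def pval_z(x):
    """Two-sided one-sample z-test p-value for mean zero (unit variance)."""
    z = x.mean() * math.sqrt(len(x))
    return math.erfc(abs(z) / math.sqrt(2))

def reject_split(p_pre, p_ml, alpha=0.05):
    # Bonferroni split of the test budget between the prespecified
    # analysis and a flexible (here simulated) second analysis.
    return p_pre < alpha / 2 or p_ml < alpha / 2

# Size check under the null: both "analyses" see pure noise, and the
# combined rejection rate should stay at (roughly) alpha or below.
n, reps = 100, 2000
rejections = 0
for _ in range(reps):
    p_pre = pval_z(rng.normal(size=n))
    p_ml = pval_z(rng.normal(size=n))  # stand-in for an ML-based test
    rejections += reject_split(p_pre, p_ml)
print(rejections / reps)
```

The upside side of the trade, which this sketch does not capture, is that the flexible second test can pick up effects the prespecified specification would miss entirely.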
Big Data and Discrimination (with Talia Gillis)
The University of Chicago Law Review, Vol. 86 (2019), Issue 2, Pages 459-487
The ability to distinguish between people in setting the price of credit is often constrained by legal rules that aim to prevent discrimination. These legal requirements have developed focusing on human decision-making contexts, and so their effectiveness is challenged as pricing increasingly relies on intelligent algorithms that extract information from big data. In this Essay, we bring together existing legal requirements with the structure of machine-learning decision-making in order to identify tensions between old law and new methods and lay the ground for legal solutions. We argue that, while automated pricing rules provide increased transparency, their complexity also limits the application of existing law. Using a simulation exercise based on real-world mortgage data to illustrate our arguments, we note that restricting the characteristics that the algorithm is allowed to use can have a limited effect on disparity and can in fact increase pricing gaps. Furthermore, we argue that there are limits to interpreting the pricing rules set by machine learning that hinder the application of existing discrimination laws. We end by discussing a framework for testing discrimination that evaluates algorithmic pricing rules in a controlled environment. Unlike the human decision-making context, this framework allows for ex ante testing of price rules, facilitating comparisons between lenders.
Machine Learning: An Applied Econometric Approach (with Sendhil Mullainathan)
Journal of Economic Perspectives, Vol. 31 (2017), Issue 2, Pages 87–106
Machines are increasingly doing “intelligent” things. Face recognition algorithms use a large dataset of photos labeled as having a face or not to estimate a function that predicts the presence y of a face from pixels x. This similarity to econometrics raises questions: How do these new empirical tools fit with what we know? As empirical economists, how can we use them? We present a way of thinking about machine learning that gives it its own place in the econometric toolbox. Machine learning not only provides new tools, it solves a different problem. Specifically, machine learning revolves around the problem of prediction, while many economic applications revolve around parameter estimation. So applying machine learning to economics requires finding relevant tasks. Machine learning algorithms are now technically easy to use: you can download convenient packages in R or Python. This also raises the risk that the algorithms are applied naively or their output is misinterpreted. We hope to make them conceptually easier to use by providing a crisper understanding of how these algorithms work, where they excel, and where they can stumble—and thus where they can be most usefully applied.
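To make the prediction-versus-estimation distinction concrete, here is a minimal simulation (our own toy example, not from the article) in which a ridge regression predicts well even though its individual coefficients are poor parameter estimates: the penalty splits the signal across two nearly identical columns, a failure mode for estimation that is harmless for prediction.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: 50 predictors, only the first matters, and the
# second is an almost exact copy of the first.
n, p = 200, 50
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n)  # near-duplicate column
beta_true = np.zeros(p)
beta_true[0] = 1.0
y = X @ beta_true + rng.normal(size=n)

# Ridge regression: solve (X'X + lam*I) beta = X'y.
lam = 10.0
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Predictions track the true signal closely...
pred_corr = np.corrcoef(X @ beta_hat, X @ beta_true)[0, 1]
# ...but the coefficient on the truly relevant column is far from 1,
# because the penalty shares the signal with its near-duplicate.
print(round(float(pred_corr), 2),
      round(float(beta_hat[0]), 2),
      round(float(beta_hat[1]), 2))
```

This is the sense in which machine learning "solves a different problem": the fitted function is a good prediction rule, while reading economic meaning into any single coefficient would be a mistake.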