segregsmall: a Command to Estimate Segregation in the Presence of Small Units
Stata Journal, 21(1), pp. 152-179, March 2021 (link to PDF article)
Joint work with: Xavier D’Haultfoeuille and Roland Rathelot.
The Stata package “segregsmall” implements three methods to measure classical segregation indices (Duncan, Theil, Atkinson, Coworker a.k.a. Isolation, and Gini) in the context of small units. Units can be geographical areas, residential neighborhoods, firms, classrooms, or any other clusters, provided that every individual belongs to exactly one unit. "Small" means that the number of individuals per unit is low, typically a few dozen at most. In such settings, which are common in applied work, measuring segregation through the variation of the empirical minority shares across units, although a natural idea, yields biased indices. They are biased upward and cannot be reliably compared over time or across settings, since the bias itself may vary. Hence the need for methods that correct for the small-unit bias.
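To make the bias concrete, here is a minimal Python sketch (ours, not part of segregsmall) that computes the naive Duncan dissimilarity index, D = 1/2 * sum_k |m_k/M - (n_k - m_k)/(N - M)|, where m_k and n_k are the minority count and size of unit k, M is the total minority count, and N is the total population. Individuals are allocated strictly at random, each being minority with an assumed probability of 0.2, so there is no segregation at all; the same 20,000 individuals are split either into units of 10 or into units of 1,000. The function names, sizes, and probability are hypothetical.

```python
import random

def duncan(minority, sizes):
    """Naive Duncan index computed from unit-level minority counts and unit sizes."""
    total_min = sum(minority)
    total_maj = sum(sizes) - total_min
    return 0.5 * sum(abs(m / total_min - (n - m) / total_maj)
                     for m, n in zip(minority, sizes))

def random_allocation(n_units, unit_size, p, rng):
    """Each individual is independently minority with probability p.

    This is a strictly random allocation: true segregation is zero.
    """
    sizes = [unit_size] * n_units
    minority = [sum(rng.random() < p for _ in range(unit_size)) for _ in sizes]
    return minority, sizes

rng = random.Random(0)
d_small = duncan(*random_allocation(2000, 10, 0.2, rng))   # 2,000 units of 10
d_large = duncan(*random_allocation(20, 1000, 0.2, rng))   # 20 units of 1,000
print(f"naive D, small units: {d_small:.3f}; large units: {d_large:.3f}")
```

By a back-of-the-envelope calculation, the naive index should come out around 0.3 with units of 10 and only a few hundredths with units of 1,000, although the allocation is purely random in both cases.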
The package is operational and can be installed in Stata from this GitHub page. Please contact me if you have any questions or run into difficulties.
Abstract: Suppose that a population, composed of a minority and a majority group, is allocated into units, which can be neighborhoods, firms, classrooms, etc. Qualitatively, there is some segregation whenever the allocation process concentrates minority individuals in some units more than in others. Quantitative measures of segregation have struggled with the small-unit bias. When units contain few individuals, indices based on the minority shares in units are upward biased. For instance, they would point to a positive amount of segregation even when the allocation process is strictly random. The Stata command segregsmall implements three recent methods correcting for such bias: the nonparametric, partial identification approach of D’Haultfœuille and Rathelot (2017), the parametric model of Rathelot (2012), and the linear correction of Carrington and Troske (1997). The package also allows for conditional analyses, namely measures of segregation taking into account characteristics of the individuals or the units.
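As an illustration of the simplest of the three corrections, here is a rough Python sketch of the Carrington and Troske (1997) adjustment, written from the general idea rather than from segregsmall's code. It compares the observed Duncan index D with D*, the index expected under random allocation, and rescales: (D - D*)/(1 - D*) when D >= D*, and (D - D*)/D* otherwise. Estimating D* by randomly permuting individuals across units while keeping unit sizes and the minority total fixed is our assumption; the function names, the simulated data, and the number of simulations are hypothetical.

```python
import random

def duncan(minority, sizes):
    """Naive Duncan index from unit-level minority counts and unit sizes."""
    total_min = sum(minority)
    total_maj = sum(sizes) - total_min
    return 0.5 * sum(abs(m / total_min - (n - m) / total_maj)
                     for m, n in zip(minority, sizes))

def expected_random_duncan(minority, sizes, n_sims=100, seed=0):
    """Monte Carlo estimate of E[D] under random allocation.

    Assumption: individuals are randomly permuted across units, keeping
    each unit's size and the overall minority count fixed.
    """
    rng = random.Random(seed)
    total_min = sum(minority)
    labels = [1] * total_min + [0] * (sum(sizes) - total_min)
    total = 0.0
    for _ in range(n_sims):
        rng.shuffle(labels)
        sim_minority, start = [], 0
        for n in sizes:
            sim_minority.append(sum(labels[start:start + n]))
            start += n
        total += duncan(sim_minority, sizes)
    return total / n_sims

def carrington_troske(minority, sizes, n_sims=100, seed=0):
    """Rescale the observed index relative to its random-allocation benchmark."""
    d_obs = duncan(minority, sizes)
    d_rand = expected_random_duncan(minority, sizes, n_sims, seed)
    if d_obs >= d_rand:
        return (d_obs - d_rand) / (1.0 - d_rand)
    return (d_obs - d_rand) / d_rand

# Hypothetical data: 500 units of 10, allocated strictly at random.
data_rng = random.Random(42)
sizes = [10] * 500
minority = [sum(data_rng.random() < 0.2 for _ in range(10)) for _ in sizes]
d_naive = duncan(minority, sizes)
d_corrected = carrington_troske(minority, sizes)
print(f"naive D: {d_naive:.3f}; corrected: {d_corrected:.3f}")
```

On such randomly allocated data, the corrected index should be close to 0 while the naive one is clearly positive; a perfectly segregated allocation still yields 1. The other two methods mentioned in the abstract rely on different (parametric or partial identification) arguments and are not sketched here.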
Bridging Methodologies: J. Angrist and G. Imbens’ Contributions to Causal Identification
Revue d’économie politique, 133(6), pp. 845-905, 2023 (link); arXiv version (link)
Joint work with: Yannick Guyonvarch.
This article is a review requested by the Revue d’économie politique in the context of the Nobel Prize received by J. Angrist, D. Card, and G. Imbens in 2021. It focuses on J. Angrist and G. Imbens; the companion article devoted to D. Card’s work was written by Dominique Goux and Eric Maurin (link).
Abstract: In the 1990s, Joshua Angrist and Guido Imbens studied the causal interpretation of Instrumental Variable estimates (a widespread methodology in economics) through the lens of potential outcomes (a classical framework to formalize causality in statistics). Bridging a gap between those two strands of literature, they stress the importance of treatment effect heterogeneity and show that, under defensible assumptions in various applications, this method recovers an average causal effect for a specific subpopulation of individuals whose treatment is affected by the instrument. They were awarded the Nobel Prize primarily for this Local Average Treatment Effect (LATE). The first part of this article presents that methodological contribution in depth: its origins in earlier applied articles, the different identification results and extensions, and related debates on the relevance of LATEs for public policy decisions. The second part reviews the main contributions of the authors beyond the LATE. J. Angrist has pursued the search for informative and varied empirical research designs in several fields, particularly in education. G. Imbens has complemented the toolbox for treatment effect estimation in many ways, notably through propensity score reweighting, matching, and, more recently, by adapting machine learning procedures.
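The LATE result can be illustrated with a small simulation. The Python sketch below is a toy example of ours, not taken from the article: with a randomized binary instrument Z, an exclusion restriction, and no defiers (monotonicity), the Wald ratio (E[Y|Z=1] - E[Y|Z=0]) / (E[D|Z=1] - E[D|Z=0]) recovers the average effect among compliers, even when effects are heterogeneous. The compliance shares, baselines, and effects are hypothetical: compliers have an effect of 2, always-takers 6, never-takers 0, so the population average effect is 2.5.

```python
import random

# Hypothetical compliance types, shares, baselines, and treatment effects.
SHARES = [("complier", 0.50), ("always", 0.25), ("never", 0.25)]
BASELINE = {"complier": 0.0, "always": 1.0, "never": -1.0}
EFFECT = {"complier": 2.0, "always": 6.0, "never": 0.0}

def draw_type(rng):
    u, cum = rng.random(), 0.0
    for kind, share in SHARES:
        cum += share
        if u < cum:
            return kind
    return SHARES[-1][0]

def simulate(n, seed=0):
    """Return (z, d, y) triples under independence, exclusion, and monotonicity."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        kind = draw_type(rng)
        z = rng.random() < 0.5                               # randomized instrument
        d = kind == "always" or (kind == "complier" and z)   # no defiers
        # Z affects Y only through D (exclusion restriction).
        y = BASELINE[kind] + rng.gauss(0.0, 1.0) + EFFECT[kind] * d
        rows.append((z, d, y))
    return rows

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def wald(rows):
    """Reduced form divided by first stage."""
    reduced = mean(y for z, _, y in rows if z) - mean(y for z, _, y in rows if not z)
    first = mean(d for z, d, _ in rows if z) - mean(d for z, d, _ in rows if not z)
    return reduced / first

def naive_difference(rows):
    """Treated-minus-untreated mean outcome, confounded by compliance type."""
    return mean(y for _, d, y in rows if d) - mean(y for _, d, y in rows if not d)

rows = simulate(100_000)
late_hat = wald(rows)
naive_hat = naive_difference(rows)
print(f"Wald (LATE): {late_hat:.2f}; naive difference: {naive_hat:.2f}")
```

By construction, the Wald ratio should be close to the compliers' effect of 2, neither the population average effect of 2.5 nor the naive treated-versus-untreated gap of about 5, which mixes causal effects with baseline differences across compliance types.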
Explicit non-asymptotic bounds for the distance to the first-order Edgeworth expansion
Sankhya A, 86(1), pp. 261-336, 2024 (link)
Joint work with: Alexis Derumigny, Yannick Guyonvarch.
Chronologically, Alexis, Yannick, and I started working on Edgeworth expansions and Berry-Esseen bounds after an initial version of our ongoing work on nonasymptotic confidence intervals for the coefficients of linear regressions, in which we used existing Berry-Esseen inequalities valid under finite third-order moments. However, in the econometrics of linear regressions, it is standard to assume finite fourth-order moments for the regressors so as to have a consistent estimator of the asymptotic variance of the OLS estimator, and hence classical asymptotic confidence intervals and tests. Refined bounds, with numerical constants as small as possible, are important for the practical use of nonasymptotic inference tools. Hence the initial motivation of this article: to improve existing Berry-Esseen bounds under finite fourth-order moments. We do so through bounds on Edgeworth expansions, which essentially refine Berry-Esseen inequalities by adjusting for possible skewness. The project then expanded to cover both the i.n.i.d. and i.i.d. cases, as well as tighter bounds under additional regularity assumptions, which, in essence, relate to the observations having a distribution that is absolutely continuous with respect to Lebesgue measure (as opposed to a discrete distribution). This article is the completed version of Chapter 4 of my PhD manuscript.
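To show why numerical constants matter in practice, here is a short Python sketch of the classical i.i.d. Berry-Esseen bound under finite third moments, sup_x |P(S <= x) - Phi(x)| <= C * E|X - mu|^3 / (sigma^3 * sqrt(n)), used to compute a sufficient sample size for a target accuracy. It does not implement the bounds of this article. The constant C = 0.4748 is, to our knowledge, the best published value for identically distributed summands (Shevtsova, 2011); the Bernoulli example, the tolerance, and the function names are our own hypothetical choices.

```python
import math

# Classical i.i.d. Berry-Esseen constant (Shevtsova, 2011), to our knowledge.
SHEVTSOVA_C = 0.4748

def bernoulli_lyapunov_ratio(p):
    """E|X - p|^3 / sigma^3 for X ~ Bernoulli(p)."""
    q = 1.0 - p
    return (p * p + q * q) / math.sqrt(p * q)

def berry_esseen_bound(p, n, c=SHEVTSOVA_C):
    """Upper bound on the uniform distance between the CDF of S and Phi."""
    return c * bernoulli_lyapunov_ratio(p) / math.sqrt(n)

def sufficient_n(p, eps, c=SHEVTSOVA_C):
    """Smallest n for which the Berry-Esseen bound is at most eps."""
    n = math.ceil((c * bernoulli_lyapunov_ratio(p) / eps) ** 2)
    # Guard against floating-point rounding at the boundary.
    while berry_esseen_bound(p, n, c) > eps:
        n += 1
    while n > 1 and berry_esseen_bound(p, n - 1, c) <= eps:
        n -= 1
    return n

for p in (0.5, 0.1):
    print(f"p = {p}: n >= {sufficient_n(p, 0.01)} for a bound of 0.01")
```

For a fair coin (p = 0.5), guaranteeing a uniform distance of at most 0.01 requires n = 2255, and skewed distributions (p = 0.1) require many more observations. Since the required n scales with C squared, even modest improvements in constants translate into substantially smaller sufficient sample sizes.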
Abstract: In this article, we obtain explicit bounds on the uniform distance between the cumulative distribution function of a standardized sum S of n independent centered random variables with moments of order four and its first-order Edgeworth expansion. Those bounds are valid for any sample size, with an n^{-1/2} rate under moment conditions only and an n^{-1} rate under additional regularity constraints on the tail behavior of the characteristic function of S. In both cases, the bounds are further sharpened if the variables involved in S are unskewed. We also derive new Berry-Esseen-type bounds from our results and discuss their links with existing ones. Following these theoretical results, we discuss the practical use of our bounds, which depend on possibly unknown moments of the distribution of S. Finally, we apply our bounds to investigate several aspects of the non-asymptotic behavior of one-sided tests: informativeness, sufficient sample size in experimental design, and distortions in terms of levels and p-values.
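The gain from the first-order Edgeworth expansion can be checked numerically. For i.i.d. summands with skewness lambda = E[(X - mu)^3] / sigma^3, the expansion of P(S <= x) is Phi(x) + lambda * (1 - x^2) * phi(x) / (6 * sqrt(n)). The Python sketch below (an illustration of ours, not the article's bounds) takes X ~ Exp(1), for which lambda = 2 and the exact CDF of the sum is the Erlang (Gamma) CDF, and computes the uniform distance to the normal approximation and to the Edgeworth expansion on a grid of points. The choice of distribution, grid, and sample sizes is hypothetical; Exp(1) has a density, the kind of setting where faster rates become possible under regularity conditions.

```python
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def edgeworth_cdf(x, n, skew):
    """First-order Edgeworth expansion for a standardized i.i.d. sum."""
    return normal_cdf(x) + skew * (1.0 - x * x) * normal_pdf(x) / (6.0 * math.sqrt(n))

def exact_cdf_exp_sum(x, n):
    """P(S <= x) with S = (X_1 + ... + X_n - n) / sqrt(n) and X_i ~ Exp(1).

    The sum is Gamma(n, 1) (Erlang), with CDF 1 - exp(-t) * sum_{k<n} t^k / k!.
    """
    t = n + x * math.sqrt(n)
    if t <= 0.0:
        return 0.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= t / k
        total += term
    return 1.0 - math.exp(-t) * total

def sup_distances(n, grid, skew=2.0):
    """Uniform distances (over the grid) to the normal and Edgeworth approximations."""
    normal_err, edge_err = 0.0, 0.0
    for x in grid:
        exact = exact_cdf_exp_sum(x, n)
        normal_err = max(normal_err, abs(exact - normal_cdf(x)))
        edge_err = max(edge_err, abs(exact - edgeworth_cdf(x, n, skew)))
    return normal_err, edge_err

grid = [i / 100 for i in range(-400, 401)]
for n in (10, 40, 160):
    normal_err, edge_err = sup_distances(n, grid)
    print(f"n = {n:3d}: normal {normal_err:.4f}, Edgeworth {edge_err:.4f}")
```

The normal error should shrink roughly like n^{-1/2} (halving each time n is multiplied by 4), while the Edgeworth error is smaller and shrinks faster, in line with an n^{-1} rate for this smooth distribution.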