Publications

segregsmall: a Command to Estimate Segregation in the Presence of Small Units

Stata Journal, 21(1), pp. 152-179, March 2021

Joint work with: Xavier D’Haultfoeuille and Roland Rathelot.

The Stata package “segregsmall” implements three methods to measure classical segregation indices (Duncan, Theil, Atkinson, Coworker a.k.a. Isolation, and Gini) in the context of small units. Units can be geographical areas, residential neighborhoods, firms, classrooms, or other clusters, provided that every individual belongs to exactly one unit. “Small units” means that the number of individuals per unit is small, typically a few dozen at most. In such settings, which are common in applications, it is natural to measure segregation from the variation of the empirical minority shares across units, but doing so yields biased indices. The indices are biased upward and cannot be reliably compared over time or across settings, since the bias itself may vary. Hence the interest of methods that account for the small-unit bias.

The package is operational and can be installed in Stata from this GitHub page. Please contact me should you have any questions or difficulties.

Abstract: Suppose that a population, comprised of a minority and a majority group, is allocated into units, which can be neighborhoods, firms, classrooms, etc. Qualitatively, there is some segregation whenever the allocation process leads to the concentration of minority individuals in some units more than in others. Quantitative measures of segregation have struggled with the small-unit bias. When units contain few individuals, indices based on the minority shares in units are upward biased. For instance, they would point to a positive amount of segregation even when the allocation process is strictly random. The Stata command segregsmall implements three recent methods correcting for such bias: the nonparametric, partial identification approach of D’Haultfœuille and Rathelot (2017), the parametric model of Rathelot (2012), and the linear correction of Carrington and Troske (1997). The package also allows for conditional analyses, namely measures of segregation taking into account characteristics of the individuals or the units.
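
To make the small-unit bias concrete, here is a minimal simulation sketch in Python. It is an illustration only and is not part of segregsmall (which is a Stata command), nor does it implement any of the three correction methods. It assumes a strictly random allocation in which each individual belongs to the minority with the same probability in every unit, and computes the naive Duncan index from the empirical minority shares. The function names, the minority probability of 0.2, and the unit sizes are all hypothetical choices made for the example.

```python
import random


def duncan_index(minority_counts, unit_sizes):
    """Naive (plug-in) Duncan dissimilarity index computed from per-unit counts."""
    total_minority = sum(minority_counts)
    total_majority = sum(unit_sizes) - total_minority
    if total_minority == 0 or total_majority == 0:
        raise ValueError("both groups must be present in the population")
    return 0.5 * sum(
        abs(m / total_minority - (n - m) / total_majority)
        for m, n in zip(minority_counts, unit_sizes)
    )


def simulate_random_allocation(n_units, unit_size, p, rng):
    """Minority count per unit when each individual is minority with
    probability p, independently of the unit (no true segregation)."""
    return [
        sum(rng.random() < p for _ in range(unit_size))
        for _ in range(n_units)
    ]


rng = random.Random(0)   # fixed seed so the run is reproducible
p = 0.2                  # assumed minority probability, identical in all units
n_units = 500            # assumed number of units

results = {}
for unit_size in (5, 20, 100, 1000):
    counts = simulate_random_allocation(n_units, unit_size, p, rng)
    results[unit_size] = duncan_index(counts, [unit_size] * n_units)
    print(f"unit size {unit_size:>4}: naive Duncan index = {results[unit_size]:.3f}")
```

Under these assumptions there is no segregation in the allocation process, yet the naive index is clearly positive for small units and shrinks toward zero only as units grow. This is the upward bias that the methods implemented in segregsmall are designed to address.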