key: cord-0984150-ozbl2m25 authors: Johnson, Janna E. title: Does the Census Miss the Native-Born Children of Immigrant Mothers? Evidence from State-Level Undercount by Race and Hispanic Status date: 2021-04-20 journal: Popul Res Policy Rev DOI: 10.1007/s11113-021-09651-w sha: 511b6784ea698e0e46fd63c249ae43f68919f118 doc_id: 984150 cord_uid: ozbl2m25
Despite research going back over a century showing the U.S. census counts some groups more accurately than others at the national level, little is known about how undercount varies within the country. I focus on a population easily measured with administrative data yet known to suffer high levels of undercount (native-born young children) to document state-level variation in undercount by race and Hispanic status. Although the race-specific analysis is only possible for the 2000 census, the patterns I show for all children are similar to those in 2010, implying the results from 2000 are likely relevant to today. Undercount levels vary widely across states, with non-Black children having the highest rates in the south and southwest, and Black children in the northeast. Results by Hispanic status show non-Black Hispanic young children are highly undercounted in several states with high Hispanic populations, but not all, and are also highly undercounted in the northeast and New England. In several states with high non-Black Hispanic undercounts, non-Black non-Hispanic children are also undercounted at a high rate. I find a very strong correlation between the fraction of births to foreign-born mothers in the state and the undercount of Black and non-Black children; in fact, it is the strongest correlate with the undercount of native-born Black children of those I investigate. The fraction of foreign-born mothers does not correlate with the undercount of non-Black Hispanic and non-Hispanic young children, although the Hispanic status of the parents does.
My results suggest a group-specific, local focus for future work is needed to determine the causes of census undercount. The 2020 census has received more attention, and been the subject of more controversy, than any other United States census in recent memory. Census-related issues, from the proposed addition of a citizenship question to the challenges of collecting data in the midst of the global COVID-19 pandemic, have been ever-present in the headlines. As of December 2020, the controversy around the census is still ongoing: the Supreme Court is now hearing a case proposing to remove unauthorized immigrants from the apportionment counts, the population measurements that re-allocate seats in the House of Representatives (and presidential electoral votes) across states. The potential implications of this ruling cannot be overstated and will likely be felt far beyond the next 10 years. Behind all of this controversy and attention is concern about the failure of the census to accurately count all residents of the United States, commonly referred to as undercount. The circumstances around Census 2020 have aroused great concern among demographic experts and policymakers that this census may suffer from unprecedented levels of inaccuracy. Particularly troubling is a possible increase in the undercount of communities of color and the foreign born. These populations are known to have experienced disproportionately high levels of undercount in previous censuses and are likely among the groups most affected by the challenges presented by the pandemic and the current political climate. Despite the long-standing knowledge that particular groups are missed more frequently in the census at the national level, very little is known about how undercount rates vary locally by race and Hispanic status.
For example, Hispanic children in Ohio and Black children in Pennsylvania may be undercounted at a different rate than Hispanic children in New Mexico and Black children in Mississippi. As decennial census counts determine the allocation of over $1.5 trillion annually in federal funding across states and counties (Sullivan, 2020) , differences in undercount across geographic areas may leave some communities under-resourced to serve their neediest members, many of whom are children. Indeed, state and local funding allocations for several programs that directly serve low income children, such as the Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), Head Start, and the State Children's Health Insurance Program (SCHIP) are determined by local area census counts of young children (Reamer, 2017) . The population aged 0-4 has also historically suffered some of the highest census undercount rates of any age group. In this paper I estimate how the census undercount of young children varies across states by race and Hispanic status. Research on geographic variation in undercount is scarce as the data necessary to form highly accurate subnational population measures independent of the census is difficult to obtain. Besides being of interest due to their outsize role in determining both undercount levels and funding allocations, young children are among the easiest groups to estimate from administrative sources, mostly due to the high accuracy of U.S. birth records. To further increase accuracy, I narrow my focus to native-born children and estimate undercount by state of birth (rather than state of residence) to remove any reliance on measures of international and internal migration. I am the first to my knowledge to document the substantial variation in undercount rates by state in the 2000 census, and the first to show state-level undercount patterns by race and Hispanic status for any census. 
As the identification of place of birth relies on information collected on the long form, discontinued after Census 2000, I cannot repeat my analysis for more recent censuses. However, the patterns shown in my 2000 results are very similar to those shown in the single extant similar study of young children in 2010, which is evidence that such patterns are likely consistent across time and still informative today. I am also the first to correlate state-level undercount patterns with administrative data from birth certificates, showing that for native-born young children, foreign-born status of the mother is strongly associated with higher undercount at the state level, particularly for Black children. Results for Hispanic children suggest caution is needed when using the Hispanic status of the parents to define that of the child, particularly in light of recent work on the fluidity of race and ethnic status (Liebler et al., 2017). This paper proceeds as follows. The next section provides a short summary of the existing research on census undercount of young children, framing my work within the literature. The following section describes the calculation of the undercount of young children by state of birth, presents results, and tests the robustness of the results to assumptions about net emigration. The penultimate section correlates birth certificate information on mothers with state-level undercounts by race and Hispanic status. The final section concludes. Young children are among the most undercounted age groups in the United States census and those of other countries (O'Hare, 2017), dating back to at least 1850 (Hacker, 2013) through the present day (O'Hare, 2014b, 2015; Robinson et al., 1993, 2002). Despite the prevalence of the phenomenon, the underlying reasons why young children are more often missed than older individuals remain mysterious.
To be undercounted, young children are either left off a household's census form while other members are enumerated, or the entire household is missed but only the undercount of young children is evident due to overcount of older age groups. How these two mechanisms play a role in the undercount of children remains unknown. However, children in households with so-called "hard-to-count" characteristics, such as belonging to a minority racial and ethnic group or residing in rental housing, are more likely to be undercounted than children who do not share these characteristics (Bruce & Robinson, 2003) . 1 The two existing studies most similar to mine are O'Hare (2014c) and O'Hare et al. (2016) . They both use the demographic analysis (DA) method to estimate state-level undercounts of young children aged 0-4 in the 2010 census. The DA methodology involves creating a measure of the population independent of census counts, usually using births, deaths, and measures of net migration, to estimate undercount. 2 O'Hare (2014c) and O'Hare et al. (2016) use the Census Bureau's Vintage 2010 state population estimates as a benchmark for comparison to the census counts. For children aged 0-4, these population estimates consist of birth records adjusted for deaths and net internal and international migration. O'Hare (2014c) calculates undercount by state for all young children in 2010, and shows this undercount is strongly correlated with the size of the Black and Hispanic populations in each state, as well as with state-level measures of linguistic isolation, low education, and unemployment. O'Hare et al. (2016) focus on Hispanic children only, and show that the undercount of Hispanic children in the 2010 census is concentrated among a small number of states. 3 This paper expands on this existing work in several important ways. 
First, despite using slightly different samples of young children, the pattern of undercount across states is remarkably similar across my results and those of the two studies above. As we use two completely different censuses (2000 and 2010), this is evidence that the factors driving these patterns are not unique to a single census and are likely persistent through time. Second, I estimate state-level undercounts for four different racial and ethnic groups: non-Black, Black, non-Black Hispanic, and non-Black non-Hispanic, while previous work only did so for all children and Hispanic children. State-level undercount patterns are quite different across these four groups. Third, unlike previous work, I am able to form uncertainty measures on my undercount estimates and test how assumptions about net emigration affect my results. Finally, I employ multivariate regression analysis to correlate the characteristics of mothers and fathers of young children from birth certificates with state-of-birth level undercounts, providing a more detailed picture of the relationship between the undercount of young children and parents' characteristics than the methods used in previous work. Demographic analysis has been used by the Census Bureau since 1950 to measure the population coverage of the decennial census (Robinson, 2010). DA uses measures of births, deaths, and migration to form a measure of the national population independent of any census count, called the demographic estimate (DE). This value is then compared to the census estimate (CE) from the decennial census to calculate the undercount rate of the census:
$$\%U = \frac{DE - CE}{DE} \times 100 \tag{1}$$
The undercount rate ($\%U$) is expressed as a percentage. I follow the convention used by the Census Bureau: positive values imply an undercount (the demographic estimate is larger than the census estimate) and negative an overcount (the census estimate exceeds the demographic estimate).
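The undercount rate described above can be sketched in a few lines. This is a minimal illustration, assuming the rate is the difference between the demographic and census estimates expressed as a percentage of the demographic estimate; the function name `undercount_rate` is hypothetical and not taken from the paper. The check uses the demographic and census estimates the paper reports in a footnote for Black children aged 0-4 born in Vermont.

```python
def undercount_rate(demographic_estimate: float, census_estimate: float) -> float:
    """Percent undercount; positive means the census counted fewer children than DE."""
    if demographic_estimate <= 0:
        raise ValueError("demographic estimate must be positive")
    return (demographic_estimate - census_estimate) / demographic_estimate * 100.0


# Vermont, Black children aged 0-4 (paper's footnote): DE = 210, CE = 439.
vermont = undercount_rate(210, 439)
print(round(vermont, 2))  # -109.05, i.e. an overcount
```

A negative value signals an overcount under the sign convention stated in the text.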
In DA, the demographic estimate is assumed to be the correct measure of the population. However, some components of the demographic estimate are more accurate than others. In particular, three components of net migration have no administrative records: foreign-born emigration, native-born emigration, and "residual foreign-born migration" (including unauthorized and quasi-legal migrants). 4 Measures of these components are based on residual methods, combining estimates from previous censuses with administrative records like death records and legal immigration records. 5 As the estimates of these components are based largely on assumptions (such as the accuracy of the enumeration of the previous census) rather than administrative records, it is hard to evaluate their accuracy. By focusing exclusively on native-born young children, I do not need to incorporate potentially inaccurate estimates of the foreign-born. However, children born in the United States may emigrate prior to the census, and these children should therefore be excluded from the demographic estimate. As the United States does not keep records of citizens who leave the country, measuring emigrants is difficult. Given the lack of reliable estimates of emigration, I make the necessary assumption that the emigration of native-born young children is zero. If this assumption is violated, the demographic estimate is biased upward, increasing the likelihood my results show an undercount of native-born children. Later in the paper, I test the implications of this assumption for my results. Under the assumption of zero out-migration, I only require data on births and deaths to calculate the demographic estimate of native-born young children aged 0-4. Data on births come from the National Center for Health Statistics' (NCHS) Vital Statistics Natality Birth Data.
As I calculate undercount for native-born children aged 0-4 on April 1, 2000, I use births that occurred in the United States between April 1, 1995 and March 31, 2000. Each birth record contains information on the child (such as birthweight, sex, month of birth, location of birth, etc.) and the parents (such as age, race, education, place of residence, etc.). In addition to using these records for birth counts by cohort, race/Hispanic status, and state of birth, I use them to compute the state-of-birth-level information on mothers used later in the paper. Registration of births in the United States is assumed to be complete for my cohorts of interest, as the Census Bureau assumes complete registration beginning with the 1985 birth cohort. I use death data from the NCHS Vital Statistics Multiple Cause-of-Death Mortality Data. The data contain individual records for all deaths occurring in the United States in each calendar year, and include date of death, place of death, state of birth, age, residence, sex, race, and cause of death. I use all deaths occurring between April 1, 1995 and March 31, 2000 of children in my birth cohorts of interest. However, as death records contain only month of death and age, not month of birth, a challenge arises in assigning decedents to specific birth cohorts. I describe the procedure I employ to assign deaths to birth cohorts in Appendix 2. For my census estimate, I use a different source than the Census Bureau does. As the Bureau only reports undercount by race and age, they use the 100% census counts for their census estimate. As I measure undercount for the native born by state of birth, I use the 5% PUMS as my census estimate (Ruggles et al., 2020), as place of birth is only recorded on the census long form in 2000. The long form was administered to approximately 16% of the population.
However, one can still form full population counts from the 5% PUMS using the provided weights, which are constructed to match the full population counts. One of the largest challenges in demographic analysis is the assignment of race and Hispanic status. In order for DA to be accurate, the values of these characteristics must be consistently defined across multiple sources: birth certificates, death certificates, and census responses. In the case of young children, census responses are not self-responses, but those of a proxy (usually a parent). Race and Hispanic status are similarly recorded by proxy on death certificates. Birth certificates contain this information for both parents (if it's not missing, as discussed below), which must be used to define the race and Hispanic status of the child. Due to the difficulty of ensuring race is defined consistently across these sources, the Census Bureau historically reported DA estimates for two race categories: Black and non-Black, as they did in 2000. 6 Starting in 2010, an additional category was added: Black alone or in combination (Devine et al., 2010) . As the Bureau did in 2000, I also use two race categories in my analysis. I use the same race assignment rule for Black and non-Black as the Census Bureau did in 2000: the father rule. Research done by the Bureau and others showed that this rule produced an estimate of the Black population that most closely matched census estimates (Passel, 1990; Robinson, 2010) . This involves assigning the race of the father to the child, and if the father's race is missing, assigning the mother's race. 7 Census 2000 was the last year this rule was used in DA. With the increase in multiracial births, the Census Bureau developed a file in the early 2000s which linked children aged 0 to 17 with their parents in the 2000 census. 
This "Kid Link File" was used to establish patterns of race reporting in multiracial households, and the distribution of responses was applied to aggregated birth records to assign race in 2010 (Sink & Jensen, 2019) . As this file was not available in 2000, the father rule was used to assign births to the two race categories. The assignment of Hispanic status to births in 2000 is more complicated than the assignment of race. The Census Bureau did not report DA results by Hispanic status in 2000; they began doing so in 2010 for ages 0-19 only (Devine et al., 2010) . Hispanic status was not reported by all states on the birth and death certificates until 1993 and 1997, respectively, preventing the use of DA for the Hispanic population for older birth cohorts. As I focus on young children aged 0-4 in 2000, Hispanic status is available for these cohorts. However, unlike race, Hispanic status is missing on some birth and death records, and the fraction of records missing this characteristic varies by state. In Appendix 3, I show the extent of this variation and describe how I account for missing values when assigning Hispanic status to births and deaths. As I did for race, I use the father rule to assign Hispanic status to births. 8 The 2000 census was the first census in which a respondent could identify as belonging to multiple races, which could complicate the assignment of race and Hispanic status. To simplify the issue, I use the variable RACESING, generated by IPUMS-USA, to identify a child as Black or non-Black in census data (Ruggles et al., 2020) . This variable was created by the IPUMS team to enable the consistent definition of race across earlier censuses that only allowed one race and census 2000 and later. It assigns each multiple-race person to a single race based on age, sex, Hispanic status, and geographic characteristics. 
Hispanic status is recorded separately from race, and I use the IPUMS variable HISPAN to assign children as Hispanic or non-Hispanic. Once I form the demographic and census estimates, I calculate the undercount rate as in Eq. (1). I estimate undercounts for the native-born population aged 0-4 by race (Black and non-Black), Hispanic status, 9 and state of birth. I do not estimate undercount by sex as undercount rates for young children do not vary by sex (O'Hare, 2014a).
8 An alternative Hispanic status assignment rule based on the Census Bureau's procedure for assigning Hispanic status to births in 2010 is also described in Appendix 3. Use of this alternative assignment rule does not substantially impact results, as shown in Appendix Tables 6 and 16.
9 I only calculate undercount by Hispanic status for non-Black children due to the small size of the Hispanic Black population.
Typically, undercount rates are reported without any measures of uncertainty. 10 Birth and death records as well as the 100% census counts are assumed to comprise the full population and therefore have no associated standard error. Uncertainty in measures of migration, while suspected to be substantial, is not incorporated in official undercount estimates reported by the Census Bureau. In my case, as my demographic estimate only contains birth and death records of recent birth cohorts, I assume it has no associated error. However, to calculate undercount by state of birth, I use the 5% PUMS for the census estimate. The 2000 5% PUMS is a stratified subsample of the full census sample that received the census long form (about 16% of the population), and therefore has sampling error (U.S. Census Bureau, 2003). The supplied sampling weights can be used to form standard errors and confidence intervals. If the demographic estimate falls outside of the 95% confidence interval of the census estimate, the undercount rate is significantly different from zero at that level of confidence.
If not, the undercount rate is statistically indistinguishable from zero. The undercount rate for all native-born children aged 0-4 in the 2000 U.S. census is 5.0%, significant at the 95% level. Undercount of this group by state of birth and race (Black and non-Black) is shown in Fig. 1, and in tabular form in Appendix Table 5. The maps in Fig. 1 show states with statistically significant undercount rates; rates for all states are shown in Appendix Table 5. For non-Black young children (Fig. 1a), the overall national undercount is 4.72%, significant at the 95% level. Significant undercount rates range from 1.61 (Pennsylvania) to 10.10 (Hawaii). 11 Twenty-six of the fifty states have statistically significant levels of undercount for native-born non-Black children. States with the highest levels of undercount tend to be those with high Hispanic populations, such as California (9.74%), Arizona (9.52), Texas (8.36), and Florida (6.11). Two exceptions to this are Tennessee (6.30%) and North Dakota (6.12). 12 The pattern for undercount of Black native-born young children by state of birth in Fig. 1b is quite different. The national overall undercount is higher than for non-Black native-born young children: 6.45%. States with significant undercount range from Louisiana (4.73%) to Massachusetts (24.56). Several states have undercount estimates less than zero, indicating an overcount, with Washington's (−14.75) and New Mexico's (−38.73) statistically significant at the 95% level. Fewer states (16 out of 50) have significant undercounts than for non-Black children, although this is unsurprising due to the small Black population in many states leading to very large confidence intervals on the census estimate. 13 States with the highest undercounts are mostly located in the Northeast, with the exception of Florida and Hawaii. 14
10 Two examples of studies reporting uncertainty in undercount rates are Robinson et al. (1993), who formally model the uncertainty in undercounts for the 1990 census, and Devine et al. (2012), who apply various assumptions about the range of the components in the demographic estimate to produce a high, medium, and low estimate of undercount for 2010.
11 State of birth undercount rates not significant at the 95% level range from −1.47 (Montana) to 5.27 (Delaware). Undercount rates for the District of Columbia are shown in Appendix Table 5 but not in Fig. 1. For both Black and non-Black, the undercount in DC is very high, likely due to recall bias when individuals fill out their census form. A potential explanation is mis-reporting of state of birth for many children born in a DC hospital, as the parents of these children may mistakenly report the state of birth of these children as their state of residence at the time of their birth (such as Virginia or Maryland).
12 Oregon, not widely known as a high Hispanic state, also has a relatively high undercount of non-Black children: 5.96%. Oregon's population in 2000 was 8% Hispanic, while the percent Hispanic for Tennessee and North Dakota in 2000 was 2.2 and 1.2%, respectively (Guzmán, 2001).
Undercount by Hispanic Status
Figure 2 shows the undercount of native-born non-Black young children in 2000 by state of birth and Hispanic status. Results are shown in tabular form in Appendix Table 6. Undercount for non-Hispanic non-Black young children is shown in Fig. 2a. Total national undercount for this group is 3.51%, and states with significant undercounts range from Pennsylvania (1.72) to Hawaii (12.52). States with statistically insignificant undercounts have values ranging from −0.89 (Rhode Island) to 5.79 (Wyoming). As with the undercount for all non-Black children, states with the highest undercounts tend to be in the Southwest. Undercount for Hispanic children is shown in Fig. 2b.
The national total for native-born non-Black Hispanic young children is higher than for non-Hispanic and Black young children: 8.97%. States with significant undercount rates range from Colorado (4.44) to Rhode Island (20.78), and those with non-significant rates from Vermont (−139) to North Dakota (17.45). Six states have significant overcounts of Hispanic young children. The states with the highest Hispanic undercount rates are in the Northeast and Southwest. Several states with high Hispanic undercount rates also have high non-Hispanic undercount rates, such as Arizona and California. As I am only able to calculate undercount of native-born children for the 2000 census, a natural concern is whether the patterns across states and racial/ethnic groups evident in the 2000 results are unique to that census, or if they are more consistent through time. Additionally, as my results are only for native-born children by state of birth, can they be used to infer the characteristics of undercount for all children and by state of residence? Insight on these questions may be drawn from comparing
13 For example, Vermont has an overcount of −109.05%, but the demographic and census estimates of Black children aged 0-4 born in Vermont in 2000 are only 210 and 439, respectively, and the 95% confidence interval on the census estimate is (206, 671).
14 Hawaii has among the highest undercount rates for both Black and non-Black children. This is likely due to the large proportion of mixed-race individuals in Hawaii leading to difficulties in matching race and ethnic classification across administrative records and census responses. Over 39% of young children born in Hawaii were recorded as belonging to 2 or more races in the 2000 census, compared to 5% nationally.
Table 7. There appears to be little difference in undercount of native-born children by state of birth in 2000 and all children by state of residence in 2010.
Most states have very similar levels of undercount across the two censuses, and the total undercount rate for the entire country is the same in both years (5.0%). Only seven states have statistically significantly different undercount rates across the two censuses. 17 In terms of numerical difference, 24 states have undercount values less than 1 percentage point apart in 2000 and 2010, and 39 states are less than 2 percentage points apart. The patterns of undercount across states are also quite similar between the two censuses. Despite the substantial differences between the samples used by me for 2000 and by O'Hare for 2010, both the numerical undercount values and the patterns across states are very similar. Therefore, it appears reasonable to conclude that the patterns and values of undercount for native-born children in 2000 by state of birth are likely evidence of state-level undercount patterns for all young children that are persistent across recent census years. A key assumption in my undercount calculations is that the emigration of native-born young children is zero. Obviously, this is untrue; some children born in the U.S. leave soon after birth. If this emigration is substantial, it could mean my demographic estimates, and therefore my undercount results, are too high. Additionally, if children born in some states emigrate at higher rates than children born in other states, not accounting for this emigration could affect the pattern of undercount across states of birth shown by my results. I estimate the potential effect of emigration on my results using estimates of emigration rates reported in Van Hook et al. (2006), who estimate emigration circa the year 2000 using the Current Population Survey (CPS). Van Hook and coauthors report estimates for the foreign-born population by age and country of origin. The process for adjusting undercount for emigration is described in Appendix 4.
The adjustment is state-of-birth specific, as the fraction of births to foreign-born mothers varies by state, as well as by racial/ethnic group. I use two sets of Van Hook et al. (2006)'s annual emigration rates: the age 25-34 rate (2.7%) and a rate based on national origin: Mexican for non-Black and Hispanic (4.3%), Chinese for non-Hispanic (2.4%), and Caribbean for Black (1.8%). The age 25-34 rate is used as I assume that native-born young children emigrate with their mothers, a large portion of whom are likely in this age range. The national origin rates are chosen based on the largest country of origin for foreign-born women aged 15-49 in the 2000 census in the corresponding race/ethnic category. (The birth records only identify Canada, Cuba, and Mexico for foreign-born mothers, with those born in other countries classified as "remainder of world". This is not enough detail to be useful in identifying the place of birth of the mother by race and Hispanic status.) All of these rates are likely upper bound estimates on the emigration of native-born young children of foreign-born mothers, as Van Hook and coauthors report that overall women emigrate at a much lower rate than men (1.7 vs. 4.4%), and all the rates I use are for both genders combined. I only adjust the demographic estimate of native-born children born to foreign-born mothers; I still assume that emigration of native-born children born to native-born mothers is zero. Selected results from this emigration adjustment are shown in Table 1. Overall U.S. results for the four racial/ethnic groups are shown, as well as the number of states with significant undercounts and the unadjusted and emigration adjusted undercounts for the five states with the highest unadjusted undercount rates in
[Table 1 notes] Five highest undercount states in each category shown for simplicity, excluding District of Columbia. All states plus District of Columbia shown in Appendix Tables 8, 9, 10, and 11. ** indicates 95% confidence interval on undercount estimate does not include zero. Emigration rates are 2.7% for ages 25-34 and 4.3, 1.8, and 2.4% for those originating in Mexico, the Caribbean, and China, respectively. Mexican rate used for non-Black and Hispanic, Caribbean rate for Black, and Chinese rate for Non-Hispanic. These annual net emigration rates are from Van Hook et al. (2006). Undercount accounting for these rates calculated by subtracting emigrants from total demographic estimate. Emigrants calculated using listed rates, compounded annually for each year of life for the fraction of births in each state to foreign-born mothers. For details see Appendix 4. States with significant undercount rates are those for which 95% confidence interval on the undercount estimate does not include zero and the undercount rate is positive. For Black, 2 states have significant overcount (negative undercount) rates using all three demographic estimates, and 1 state has a significant overcount adjusting for emigration using the age 25-34 rate. For non-Hispanic, 1 state has a significant overcount using both the age 25-34 and Chinese rates. Hispanic has the largest number of states with significant overcounts: 7 using the unadjusted estimates, and 11 and 15 using the age 25-34 and Mexican emigration rates, respectively.
Appendix Tables 8, 9, 10, and 11. Also shown in Table 1 are the number of states with significant undercount rates before and after adjusting for emigration, as well as the fraction of births in each state to foreign-born mothers. Total U.S. undercount of native-born non-Black children is reduced by approximately 1.5 and 2.5 percentage points when emigration is accounted for using the age 25-34 rate and the Mexican rate, respectively. Of the 27 states with significant unadjusted undercount rates, 21 and 16 remain significant after adjustment using the two different rates, respectively.
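The emigration adjustment and the significance rule stated in the Table 1 notes can be sketched as follows. This is only an illustration under stated assumptions: the exact procedure is in Appendix 4, which is not reproduced here, so the cohort layout (a demographic estimate, a share of births to foreign-born mothers, and a number of years of exposure per cohort), the function names, and the example cohort values are all hypothetical. The sketch assumes that only children of foreign-born mothers emigrate and that the annual rate compounds over each year of life.

```python
def adjust_for_emigration(cohorts, annual_rate):
    """Demographic estimate after subtracting assumed emigrants.

    cohorts: iterable of (demographic_estimate, share_foreign_born_mothers, years_of_life).
    Children of native-born mothers are assumed never to emigrate.
    """
    adjusted = 0.0
    for de, share_fb, years in cohorts:
        exposed = de * share_fb                              # children of foreign-born mothers
        remaining = exposed * (1.0 - annual_rate) ** years   # rate compounded annually
        adjusted += (de - exposed) + remaining
    return adjusted


def undercount_rate(de, ce):
    """Percent undercount relative to the demographic estimate."""
    return (de - ce) / de * 100.0


def is_significant_undercount(de, ce_low, ce_high):
    """True when the interval on the undercount rate lies entirely above zero.

    (ce_low, ce_high) is the 95% confidence interval on the census estimate.
    """
    lower_bound = undercount_rate(de, ce_high)  # largest census value gives smallest rate
    return lower_bound > 0.0


# Hypothetical two-cohort state, adjusted with the 2.7% age 25-34 rate from the text.
cohorts = [(1000, 0.4, 1), (1000, 0.4, 2)]
print(adjust_for_emigration(cohorts, 0.027))

# Vermont figures from the paper's footnote: DE = 210, census interval (206, 671).
print(is_significant_undercount(210, 206, 671))  # False: DE lies inside the interval
```

Because the adjustment only ever lowers the demographic estimate, it can only lower the undercount rate, which matches the direction of the changes reported in the text.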
The reduction for Black children is more modest, from 6.45% to 5.67 and 5.93%, respectively, and all the states with significant unadjusted rates also have significant adjusted undercount rates. For non-Hispanic children, adjusting for emigration reduces undercount by around one percentage point using both rates. Hispanic children are the group with the largest reduction in undercount rates after adjusting for emigration. Adjusting for emigration using the age 25-34 rate cuts the overall national undercount rate by nearly half, and by nearly three-quarters using the Mexican rate. Similarly, the number of states with significant undercount rates falls from 13 to 9 and 6, respectively. In all categories, the states that see the largest reductions in undercount rates are those with the highest proportion of native-born births to foreign-born mothers. As the emigration rates from Van Hook et al. (2006) are not specifically for children aged 0-4, it is difficult to know whether they accurately estimate the emigration of native-born children. However, recent evidence from Masferrer et al. (2019) In sum, emigration of native-born children of foreign-born mothers may account for high undercount levels of Hispanic, non-Black, and non-Hispanic young children in some states, but rates in large population states with large fractions of births to foreign-born mothers remain high and significant. Emigration of the children of foreign-born Black mothers does not explain the high undercount in states with high foreign-born Black populations. In this section, I examine the association between state-level characteristics and the undercount of native-born young children aged 0-4 in the 2000 census. The results of this analysis are not intended to be interpreted as causal evidence of the reasons for the variation in undercount across states. Drawing such conclusions based on correlations without a source of exogenous variation and using relatively sparse data is incorrect at best. 
Instead, the intent is to provide insights into characteristics associated with the undercount of young children that can potentially provide a foundation for future research into the causes of undercount. While O'Hare (2014c) examined the correlation between state-level characteristics and the undercount of young children in the 2010 census, my work expands on his in several ways. First, I perform my analysis for each race and Hispanic group separately, while O'Hare only looks at total state undercount. As undercount patterns across states vary substantially by race and Hispanic status, examining the correlates with undercount separately for these groups may provide more insight into potential explanations for the undercount of young children. Second, I correlate state-level undercounts with mothers' characteristics from birth records, while O'Hare (2014c) uses full population characteristics drawn from 2010 census counts and the 2010 American Community Survey (ACS). Birth certificate information on mothers, while imperfect, is likely more reflective of the characteristics of individuals who will respond to the census on behalf of their children than full population census or ACS data. The mothers' characteristics also contain potentially important measures in explaining undercount unexamined by O'Hare, particularly the fraction foreign born. Third, I use OLS regressions to measure the correlation between undercount and the mothers' characteristics, which enables me to consider not only single-variable correlations, as O'Hare does, but also how multiple characteristics combine to explain patterns of undercount across states. These differences enable me to form a more complete, nuanced picture of how the undercount of young children varies by state than previous work. 
To measure the correlation between state-of-birth level undercount and state-level characteristics of mothers, I use the following specification:

$$U_{s,g} = X_{s,g}\beta_g + \varepsilon_{s,g} \qquad (2)$$

where $U_{s,g}$ is undercount for native-born children born in state $s$ in racial/ethnic group $g$ (non-Black, Black, non-Black non-Hispanic, and non-Black Hispanic), $X_{s,g}$ is a state-level characteristic of mothers of children born in state $s$ belonging to group $g$ (footnote 19), and $\varepsilon_{s,g}$ is a standard error term. The vector $X_{s,g}$ can consist of up to five variables: the fraction of mothers who are foreign born, who have less than a high school degree, who are unmarried, and the fraction of both mothers and fathers who are Hispanic (footnote 20). These five measures are constructed using the same birth certificate data used to form the demographic estimate of native-born children aged 0-4. As the information reported on birth certificates is limited, these are the only characteristics I can examine from this source. The coefficient $\beta_g$ measures the relationship between the characteristic(s) and undercount for racial/ethnic group $g$. All variables are measured as fractions. The main specification includes all 50 states and the District of Columbia in the sample, and is weighted by the total state population of group $g$ in the 2000 census. Equation 2 is estimated using OLS for all four ethnic groups $g$ separately.

As an initial analysis, I estimate single-variable versions of Eq. 2 for each of the four racial/ethnic groups. Results are shown in Table 2. As all variables are entered as fractions, the coefficient for non-Black in column 1 implies that an increase in foreign-born mothers of 10 percentage points is associated with an increase in undercount of 1.78 percentage points (footnote 21). However, more informative than the coefficients is the $R^2$ of each regression, which measures the fraction of between-state variation in undercount "explained" by (i.e., associated with) that characteristic.
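As an illustration of how a population-weighted regression of the Eq. 2 type and its R² can be computed, here is a minimal sketch using NumPy. It is not the paper's estimation code: the function name is hypothetical, the inclusion of a constant term is an assumption (suggested by the table notes, which mention a constant when computing condition numbers), and the five "states" in the example are invented numbers.

```python
import numpy as np


def weighted_ols(y, X, w):
    """Weighted least squares with a constant term.

    y: undercount by state (fractions); X: one or more state-level
    characteristics (fractions); w: state population weights.
    Returns (coefficients with the constant first, weighted R^2).
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    w = np.asarray(w, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    # Weighting by w is equivalent to OLS on rows scaled by sqrt(w).
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * sw[:, None], y * sw, rcond=None)
    resid = y - design @ beta
    y_bar = np.average(y, weights=w)
    r2 = 1.0 - np.sum(w * resid**2) / np.sum(w * (y - y_bar) ** 2)
    return beta, r2


# Invented data: undercount rises exactly 0.178 per unit of the
# foreign-born fraction, so the fit is perfect by construction.
frac_fb = np.array([0.05, 0.10, 0.20, 0.30, 0.40])
undercount = 0.02 + 0.178 * frac_fb
population = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
beta, r2 = weighted_ols(undercount, frac_fb, population)
```

With all variables measured as fractions, a slope of 0.178 corresponds to the interpretation given in the text: a 10 percentage point increase in the characteristic goes with a 1.78 percentage point increase in undercount.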
For non-Black children, a very large portion of the between-state variation is correlated with the fraction of foreign-born mothers, mothers with less than a high school degree, and Hispanic mothers and fathers, with $R^2$ values of 0.67, 0.71, 0.77, and 0.77, respectively. The fraction of mothers who are unmarried explains a smaller portion, with an $R^2$ of 0.30. However, all of these variables are strongly correlated with one another, as evidenced by the correlation coefficients also shown in Table 2. The fraction Hispanic mothers and fraction Hispanic fathers are extremely strongly correlated (with a correlation coefficient of almost 1), and both of these are highly correlated with the fraction of mothers who are foreign-born (correlation coefficient of 0.91). These high correlations are unsurprising, as the Hispanic population makes up a large proportion of the foreign-born non-Black population. Additionally, higher rates of low education and non-husband/wife households have been shown in previous work to be associated with hard-to-enumerate populations (Bruce & Robinson, 2003).

The pattern is quite different for Black young children. The fraction of mothers who are foreign-born explains the highest proportion of the between-state variation in the undercount of young children for this group, although it is a much lower amount than for non-Black children ($R^2$ = 0.29). The fractions of mothers and fathers who are Hispanic are both positively correlated with undercount, but the fractions of unmarried and low-educated mothers do not significantly explain any of the between-state variation in undercount for Black children (footnote 22). For both non-Black non-Hispanic and non-Black Hispanic children, the fraction Hispanic mothers is most strongly correlated with state-of-birth-level undercount of young children, but in each case this variable measures a slightly different characteristic of the child.
For non-Hispanic children, the fraction Hispanic mothers is a measure of the fraction of non-Black children with a non-Hispanic father and a Hispanic mother, as Hispanic status is determined using the father rule. (This is also why the fraction Hispanic fathers cannot be included in the Hispanic status regressions, as it is perfectly correlated with Hispanic status using the father rule.) For Hispanic children, this variable measures the fraction of children with two Hispanic parents. Therefore, states with higher fractions of children with a Hispanic mother and a non-Hispanic father tend to have higher undercounts of non-Black non-Hispanic children, and states with a higher proportion of children with two Hispanic parents tend to have higher undercounts of Hispanic young children. The fraction of foreign-born mothers is associated with higher undercount of non-Black non-Hispanic children, but not Hispanic children, while the reverse is true for mothers with less than a high school degree. The fraction of unmarried mothers is not correlated with state-level undercounts of either non-Black Hispanic or non-Hispanic children.

Footnote 21: I report standard errors for the coefficient estimates in Tables 2 and 3 as my undercount estimates use a sample of the full census to measure the population of young children by state of birth. Additionally, it is likely there is measurement error in the information reported on the birth certificates, such as mother's education and marital status.

Footnote 22: The strong correlations between Hispanic parentage and foreign-born mothers are likely due to the Black population of Dominican descent, who are Hispanic.

Results of multivariate analyses are shown in Table 3. As several of the independent variables are highly correlated, I report variance inflation factors (VIFs) and condition numbers for all specifications.
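For readers unfamiliar with these diagnostics, the sketch below shows one common way to compute them. It is an illustration under stated assumptions, not the paper's code: it is unweighted, the function names are hypothetical, the condition number is computed on a design matrix that includes the constant with columns scaled to unit length (a common convention that the paper does not confirm), and the data are simulated.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF for each column: 1 / (1 - R^2) from regressing that column
    on a constant and all the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs


def condition_number(X):
    """Ratio of largest to smallest singular value of the design matrix,
    including the constant, after scaling each column to unit length."""
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])
    design = design / np.linalg.norm(design, axis=0)
    singular_values = np.linalg.svd(design, compute_uv=False)
    return singular_values.max() / singular_values.min()


# Simulated data for 51 units: x2 is almost a copy of x1 (like the fractions
# of Hispanic mothers and Hispanic fathers), while x3 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.uniform(0.0, 1.0, size=51)
x2 = x1 + 0.01 * rng.normal(size=51)
x3 = rng.uniform(0.0, 1.0, size=51)
X_sim = np.column_stack([x1, x2, x3])
vifs_sim = variance_inflation_factors(X_sim)
cond_sim = condition_number(X_sim)
```

In this simulated case the near-duplicate columns produce very large VIFs and a large condition number, while the unrelated column does not, which is the pattern these diagnostics are meant to flag.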
Many specifications in Table 3 may suffer from large standard errors and unstable coefficient estimates due to multicollinearity issues, so again I focus my discussion on the R 2 of each regression, which is unaffected by multicollinearity. The R 2 s show how much of the total between-state variation in undercount is correlated with the variables included in the model. Adjusted R 2 s, which account for the increasing number of covariates in each specification, are also shown. For non-Black, including all 5 variables in the model explains over 80% of the between-state variation in the undercount of native-born young children, but the addition of the final 3 variables adds very little explanatory power once the fraction of mothers who are foreign-born and have less than a high school degree are controlled for. Indeed, the VIFs and condition numbers indicate strong multicollinearity when all variables are included. 23 In contrast, the 5 variables together only explain around 30% of the between-state variation in undercount for Black nativeborn children. Nearly all of this explanatory power is from the fraction of mothers who are foreign-born. For non-Black non-Hispanic children, all variables contribute to the nearly 60% of the explained between-state variation in undercount, although the fraction mothers with less than a high school degree, Hispanic mothers (children with Hispanic mothers and non-Hispanic fathers), and fraction foreign-born mothers are more important in terms of increasing R 2 than fraction unmarried mothers. The fraction of mothers of non-Black Hispanic children who are foreign born doesn't contribute much to the explanatory power of the model of undercount. The total R 2 of 0.64 is mostly due to the fraction of mothers who are Hispanic, have less than a high school degree, and are unmarried. The appendix contains several robustness analyses for the models in Table 3 . 
To see whether the results are driven by states with very high levels of undercount, the highest undercount states in each of the four groups are excluded from the models in Appendix Table 12. Results are nearly identical, as are those using only states with statistically significant undercount levels (Appendix Table 13). Appendix Table 14 uses the emigration-adjusted undercounts as the dependent variable. As these undercounts are adjusted using the fraction of mothers who are foreign born, the fraction of between-state variation explained by this characteristic falls substantially for the non-Black, non-Black non-Hispanic, and non-Black Hispanic populations. However, the reduction is much smaller for Black children, and the fraction of mothers who are foreign-born still contributes more to the total explanatory power of the model than the other four variables. The R² values for the full models containing all variables are very similar to those in Table 3. Results using state-level measures of foreign-born, Hispanic, and low-educated women aged 15-49 constructed from 2000 census data, shown in Appendix Table 15, are nearly identical to those using birth certificate data. Results based on undercounts with Hispanic status assigned using the middle rule rather than the father rule are also very similar (Appendix Table 16).

Footnote 23: A generally accepted rule of thumb in the literature is that a condition number greater than 30 and/or a VIF greater than 10 indicates strong, likely problematic multicollinearity. However, multicollinearity can cause issues even if the VIFs and/or condition numbers are below these thresholds (Allison, 1999; Kennedy, 2008).

This paper is the first to calculate census undercount by state of birth, race, and Hispanic status for young children aged 0-4 (a population known to suffer from high undercount but easy to measure using administrative records) to gain insight into geographic patterns of undercount by these characteristics.
States show a wide range of undercount levels. Non-Black undercount is highest in states in the south and southwest with high Hispanic populations. Undercount for Black native-born children is highest in northeastern states and Florida, states that have a high foreignborn Black population. Non-Black non-Hispanic children have the highest undercount rates among those born in Hawaii, Arizona, and Florida, while the highest rates for non-Black Hispanic children are in the northeastern states of Rhode Island, Connecticut, and New York. Several states, such as California and Texas, have high undercount rates of both Hispanic and non-Hispanic non-Black children. Adjusting for the potential emigration of children of foreign-born mothers may account for the high undercount rates of non-Black, non-Black Hispanic, and non-Black non-Hispanic children born in some states, but does not explain the high undercount of native-born Black children born in states with a high fraction of foreign-born Black mothers. The strong association between foreign-born mothers and the undercount of Black native-born children is confirmed using OLS regressions of state-of-birth level undercount rates on mothers' characteristics from birth certificates. The fraction of births to foreign-born mothers has the highest correlation with state-of-birth level undercount of the characteristics examined, although it is highly correlated with fraction of mothers of Black children who are Hispanic (likely due to the large Hispanic Black population from the Caribbean). Fraction of mothers who are Hispanic is also highly correlated with state-of-birth undercounts of non-Black, non-Black Hispanic, and non-Black non-Hispanic children, but fraction foreign-born is not for non-Hispanic and Hispanic children. 
The fraction Hispanic mothers variable for these two groups actually measures the fraction of mothers of non-Hispanic children and Hispanic children who are Hispanic, as Hispanic status is defined using the father rule. However, using an alternative assignment rule that takes the mother's Hispanic status into account does not affect the results. The results for non-Hispanic children may indicate a disconnect between these assignment rules and the classification of children's Hispanic status by their parents in the census. Those for Hispanic children may indicate a higher likelihood of undercount if both parents are Hispanic. These correlations should not be considered causal evidence of the underlying reasons for undercount. The relatively sparse information contained on birth certificates combined with no source of exogenous variation makes it highly likely that other, unobserved factors are behind the observed undercount patterns, and the statelevel correlations may or may not reflect those at a more local (i.e., city or county) level. Despite these caveats, my results highlight the importance of focusing on the foreign-born population to mitigate coverage issues in the census or other surveys. While much attention has deservedly been paid to increasing the census response of foreign-born non-Black Hispanic individuals, my findings show this should not be done at the expense of other foreign-born populations-such as the foreign-born Black population-as they are also likely to suffer from high levels of undercount. Furthermore, my findings show that to address the undercount of young children, we need to focus on parents, particularly mothers. Determining which mechanism is behind child undercount-missing the child only, but counting the parents, or not counting the entire family-is a priority, as remedies would likely take different forms depending on the underlying cause. 
More research is needed, perhaps using administrative records like birth certificates matched to individual census responses, to uncover the true causes of undercount and ensure the census counts all residents of the United States. See Tables 4, 5 , 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, and 16 . (1) if both parents are of same Hispanic status, assign that status to child, (2) if father is Hispanic and mother non-Hispanic, assign 69.8% of these births as Hispanic and the remainder non-Hispanic, (3) if father is non-Hispanic and mother Hispanic,assign 61.4% of these births as Hispanic and the remainder non-Hispanic. If father's Hispanic status missing, assign a status for the father using fraction Hispanic by birth cohort, state of birth, and Hispanic status of mother, followed by applying rules (2) or (3) above. For births missing both mother's and father's Hispanic status, Hispanic status of the child assigned using fraction Hispanic by birth cohort and state of birth (2006) . Undercount accounting for these rates calculated by subtracting emigrants from total demographic estimate. Emigrants calculated using listed rates, compounded annually for each year of life for the fraction of births in each state to foreign-born mothers. For details see Appendix 4 Table 13 Correlates with state of birth census undercount by race and Hispanic status, including only states with statistically significant undercount, children aged 0-4, (1) (2) (3) brackets. Estimated using OLS weighted by state population of relevant race/ethic category. "LTHS" denotes "less than a high school degree." Dependent variable is 2000 undercount by state of birth (expressed as a fraction, not a percent) adjusted for emigration using either the age 25-34 emigration rate (columns 1-5) or the country of origin emigration rate (columns 6-10). This rate is the Mexican rate for non-Black and Hispanic, the Caribbean rate for Black, and the Chinese rate for non-Hispanic. 
For more details see text and Appendix 4. Independent variables calculated from birth certificate data, separately for each race/ethnic category. Race and ethnic status determined using father rule. Cannot include fraction births to Hispanic fathers in Hispanic regressions as father's Hispanic status perfectly determines the child's Hispanic status using the father rule. Condition numbers calculated including the constant term.

2000 census in the corresponding race/ethnic category. (The birth records do not provide enough detail on the country of origin of the mother to be useful.)

(b) Assume in each year of life a child is exposed to one of the above rates, and use the following compounding formula to estimate the demographic estimate of native-born children born to foreign-born mothers adjusted for emigration:

$$DE^{adj}_{FB,g,i} = DE_{FB,g,i}\,(1 - rate_g)^{i+1}$$

where $DE^{adj}_{FB,g,i}$ is the emigration-adjusted estimate, $DE_{FB,g,i}$ is the demographic estimate of native-born young children for group $g$ (defined by race/ethnic group) and single year of age $i$, and $rate_g$ is the chosen annual emigration rate.

(3) Add the demographic estimate of native-born children born to native-born mothers to the emigration-adjusted demographic estimate of native-born children born to foreign-born mothers to get the emigration-adjusted demographic estimate of native-born children.

(4) Calculate emigration-adjusted undercount rates using these demographic estimates and the corresponding census estimates using Eq. 1.

Multiple regression: A primer. Pine Forge Press series in research methods and statistics
The seasonality of live birth is strongly influenced by sociodemographic factors
Tract-level planning database with census 2000 census data
Evaluating components of international migration: The residual foreign born
The development and sensitivity analysis of the 2010 demographic analysis estimates
The use of vital statistics in the 2010 demographic analysis estimates
Evaluating components of international migration: Native emigrants
The Hispanic population
New estimates of census coverage in the United States, 1850-1930
A review of methods for estimating emigration
A guide to econometrics
America's churning races: Race and ethnicity response changes between census 2000 and the 2010 census
Immigrants in their parental homeland: Half a million U.S.-born minors settle throughout Mexico
Evaluating components of international migration: Foreign-born emigrants
Investigating the 2010 undercount of young children - summary of recent research. Report from the task force on the undercount of young children
Assessing net coverage for young children in the 2010 U.S. decennial census
Historical examination of net coverage error for children in the U.S. decennial census: 1950 to 2010
State-level 2010 census coverage rates for young children
The net undercount of children in the 2010 U.S. decennial census
An international perspective on the undercount of young children in the U.S. census
The invisible ones: How Latino children are left out of our nation's census count
Demographic analysis: A report on its utility for adjusting the 1990 census. Working paper
Counting for Dollars 2020: The Role of the Decennial Census in the Geographic Distribution of Federal Funds
Coverage of population in census 2000 based on demographic analysis: The history behind the numbers
Estimation of population coverage in the 1990 United States census based on demographic analysis
Coverage of the population in Census 2000: Results from demographic analysis
Seasonality in human reproduction: An update
IPUMS USA: Version 10
Race of children in population estimates. Presented at the Annual Meeting of the Southern Demographic Association
Who's Missing? Undercounting and Underreporting
Seasonality of coitus and seasonality of birth
2000 census of population and housing, public use microdata sample
Foreign-born emigration: A new approach and estimates based on matched CPS files

I use death data from the NCHS Vital Statistics Multiple Cause-of-Death Mortality Data. The data contain individual records for all deaths occurring in the United States in each calendar year, and include date of death, place of death, state of birth, age, residence, sex, race, and cause of death. I use all deaths occurring between April 1, 1995 and March 31, 2000 of children in my birth cohorts of interest. As the NCHS Vital Statistics Multiple Cause-of-Death Mortality Data contain only month of death and age, not month of birth, a challenge arises in assigning decedents to specific birth cohorts. For example, a child who died at age 2 in June 1999 could either have been born in census year 1996 (and her birthday would fall after her date of death) or in census year 1997 (and her birthday would fall before her date of death). I make several assumptions to assign deaths to specific ages. For deaths of those aged 1-4, I assume the likelihood of birth and death to be independent and uniformly distributed throughout the year, and that each month is of equal length.
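The cohort probabilities implied by these assumptions can be computed mechanically. The following is a minimal sketch, not the paper's code, for deaths at ages 1-4 where only the month of death and age in completed years are known. It assumes a census-year cohort runs from April of the named year through March of the next, which is how the text's example treats births; the function names are hypothetical.

```python
from fractions import Fraction


def census_year(year, month):
    """Census-year cohort of a birth month (assumed April through March)."""
    return year if month >= 4 else year - 1


def cohort_probabilities(death_year, death_month, age_years):
    """Probability of each census-year birth cohort for a child who died
    in the given month at the given age in completed years (ages 1-4).

    Possible birth months span 13 calendar months. The first and last
    (the death month, age + 1 and age years earlier) get half weight:
    with independent, uniform birth and death days, the birthday falls on
    the required side of the death date half the time.
    """
    probs = {}
    first = (death_year - age_years - 1) * 12 + (death_month - 1)
    for k in range(13):
        year, month_index = divmod(first + k, 12)
        weight = Fraction(1, 2) if k in (0, 12) else Fraction(1)
        cohort = census_year(year, month_index + 1)
        probs[cohort] = probs.get(cohort, Fraction(0)) + weight / 12
    return probs


# The example from the text: death at age 2 in June 1999.
example_probs = cohort_probabilities(1999, 6, 2)
```

The half weights on the two end months are what produce fractions with a denominator of 24 rather than 12.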
Under these assumptions (footnote 24), I assign the child discussed above a 5/24 probability of being born in census year 1997 (the probability of her being born in April or May 1997 plus the probability of being born in June 1997 and dying after her birthday), and a probability of 19/24 that she was born in census year 1996 (the probability of her birthday falling in July 1996 through March 1997 plus the probability of being born in June 1996 and dying prior to her birthday). I follow the same procedure to assign deaths to each possible cohort by month of death.

The process is simpler for children who die at less than 1 year of age, as age in months is provided for these decedents. For example, a child who dies in June 1999 at less than a year of age is in the 1998 census cohort if they are 3 or more months of age (born in July 1998-March 1999), or in the 1999 cohort if they are 1 month of age or younger (born in May or June 1999). For those who die at 2 months old, I again assume uniformity and independence of date of birth and death, and therefore assign a probability of 0.5 to belonging to each of the 1998 and 1999 cohorts.

Independent variables calculated from birth certificate data, separately for each race/ethnic category. Race determined using father rule, Hispanic status using middle rule. Middle rule described in notes to Appendix Table 6. Condition numbers calculated including the constant term.

The assignment of Hispanic status to births and deaths is complicated by the presence of missing Hispanic information, the extent of which varies substantially by state of birth. Appendix Table 4 shows the fraction of birth and death records missing Hispanic status for non-Black births in my sample. Overall, 1.23% of births and 1.70% of deaths to those aged 0-4 at the time of the 2000 census are missing Hispanic status information. Births missing Hispanic status are missing both the mother's and father's Hispanic status.
Overall, 11.2 and 1.5% of birth records are missing the father's and mother's Hispanic status, respectively. For births, Rhode Island and New York are missing it on the highest percentage of records: 10.86 and 8.18%, respectively. Two other states are missing Hispanic status on around 5% of birth records, and a further 8 are missing it on between 1 and 5% of records. The remaining 39 states have missing rates of less than 1%. Oklahoma, which did not report Hispanic status on the death certificate at all until 1997, is missing it on over 30% of deaths in my sample, followed by Rhode Island at 23%. New York and North Dakota are next, missing Hispanic status on 7.1 and 5.4% of death records of young children, respectively. However, like birth records, the vast majority of states miss Hispanic status on less than 1% of death records in my sample.To address the missing Hispanic status problem, I assume that Hispanic status of the child is missing at random, conditional on state and year of birth. For example, if birth records with non-missing Hispanic status born in 1997 in a state are 21% Hispanic, I assign 21% of the children with missing Hispanic status born in the same year and state as Hispanic. I use the same process for death records. This assumption implicitly rules out any correlation between Hispanic status and the likelihood a record is missing Hispanic status, and unfortunately there is no way to test this assumption. However, as the vast majority of states are missing Hispanic status on a very small percentage of birth and death records, violations of this assumption should not make much difference in overall undercount results for these states. However, a few states, like Oklahoma, New York, and Rhode Island, are missing Hispanic status on a high enough percentage of birth and death records that the undercount results by Hispanic status for these states should be viewed with caution.I use the father rule to assign Hispanic status to births. 
The father rule for Hispanic status works the same way it did for assigning race: assign the child the Hispanic status of the father, and if the father's Hispanic status is missing, assign that of the mother. Those births missing both mother's and father's Hispanic status are assigned Hispanic or non-Hispanic using the fraction Hispanic by year and state of birth. However, when the Census Bureau began reporting undercount by Hispanic status in 2010, they did not use the father rule to assign Hispanic status to births. Instead, they used the 2000 Kid Link file to estimate how parents reported the Hispanic status of their children when one parent was Hispanic and the other non-Hispanic. According to their calculations, 61.4% of children were reported as Hispanic when the mother was Hispanic and the father non-Hispanic, and 69.8% when the father was Hispanic and the mother non-Hispanic (Devine et al., 2012) . As an alternative to the father rule, I use these proportions to assign Hispanic status to children of one Hispanic and one non-Hispanic parent. Children of two Hispanic or two non-Hispanic parents were assigned the Hispanic status of both parents. Those records missing the Hispanic status of both parents are assigned Hispanic or non-Hispanic using the same procedure as for the father rule, using the fraction Hispanic by year and state of birth. This appendix describes the process for adjusting undercount for emigration. 
The adjustment process assumes native-born young children of native-born mothers do not emigrate, and native-born children of foreign-born mothers emigrate at the same rate as their mothers.

(1) Calculate demographic estimates of native-born children born to foreign-born mothers and native-born children born to native-born mothers, by race/ethnic status (Black, non-Black, non-Black non-Hispanic, non-Black Hispanic), single year of age, and state of birth.

(a) Calculate demographic estimate of all native-born children by single year of age, race/ethnic status, and state of birth.

(b) Calculate fraction of native-born children born to foreign-born mothers by single year of age, race/ethnic status, and state of birth using birth records.

(c) Multiply the above to get the demographic estimate of native-born children born to foreign-born mothers by single year of age, race/ethnic status, and state of birth. This implicitly assumes that the death rate is the same for children of foreign-born and native-born mothers within each state of birth/race-ethnic/age cell, as I cannot distinguish deaths by mother's birthplace.

(d) Subtract this number from the original demographic estimate to calculate the demographic estimate of native-born children born to native-born mothers.

(2) Adjust demographic estimate of native-born children born to foreign-born mothers for emigration.

(a) Assume native-born children born to foreign-born mothers emigrate at one of two possible annual rates:

(i) Age 25-34 rate (2.7%): this assumes that children emigrate with their mothers, who emigrate at this rate.

(ii) A rate based on country of origin of the mother: for non-Black and Hispanic, Mexico (4.3%), for non-Hispanic, China (2.4%), and for Black, the Caribbean (1.8%). These rates are chosen based on the largest country of origin for foreign-born women aged 15-49 in the