The Quality of the 2020 Census: An Independent Assessment of Census Bureau Activities Critical to Data Quality

Paul Biemer, Joseph Salvo, and Jonathan Auerbach

October 5, 2021

This report summarizes major findings from an independent evaluation of 2020 census operations. The American Statistical Association 2020 Census Quality Indicators Task Force selected the authors to conduct the evaluation using nonpublic operations data provided by the Census Bureau. The evaluation focused on the quality of state-level population counts released by the Census Bureau for congressional apportionment. The authors first partitioned the census enumeration process into five operation phases. Within each phase, one or more activities considered to be critical to census data quality were identified. Operational data from each activity were then analyzed to assess the risk of error, particularly as they related to similar activities in the 2010 census. Overall, the evaluation found that census operations relied on higher risk activities at a higher rate in 2020 than in 2010, suggesting that the risk of error may be higher in 2020 than in 2010. However, the available data were insufficient to determine whether the apportionment counts are of lower quality in 2020 than in 2010.

The challenges facing the 2020 census were unprecedented. The coronavirus outbreak hit just as the Census Bureau began its mail-out procedures. The census schedule, encompassing hundreds of interconnected activities, changed repeatedly as conditions worsened. Activities were modified or canceled in real time to accommodate schedule changes, only to have the schedule revised again. Wildfires in the West and weather events in the South further impeded operations. Because of these and other challenges, many data users voiced concerns about potential coverage errors. Such errors jeopardize the fair division of representation and resources, the basis of U.S. government outlined in the Constitution.

These concerns prompted the American Statistical Association (ASA) to convene the 2020 Census Quality Indicators Task Force, whose deliberations culminated in an October 2020 report, Census Quality Indicators: A Report from the American Statistical Association. The report recommended that the Census Bureau grant external census experts access to operational data so they can independently address the concerns of data users. The Census Bureau agreed, and the Task Force selected Paul Biemer, Joseph Salvo, and Jonathan Auerbach to conduct an evaluation.

This document summarizes the findings of this independent evaluation of the quality of the state-level population counts released for congressional apportionment. It examines 10 process statistics (PSs) largely based on the quality indicators recommended in the Task Force report. Each PS reflects an activity or group of activities in the Bureau's schedule that may have been affected by the pandemic, other unprecedented events, or otherwise present some appreciable risk of producing errors in the 2020 census count as determined by the authors. This document does not consider the quality of characteristics, such as sex, race, or ethnicity, or the quality of the population count at smaller geographies, such as blocks, tracts, or counties.
Further, because of the limitations of the data currently available, it does not attempt to quantify the impact of coverage errors on congressional apportionment. These findings contribute to the ongoing discussion about the quality of the 2020 census. Data users may also wish to review the growing body of literature on 2020 census quality to supplement the information in this report. For example, the Census Bureau engaged the JASON group and the National Academies' Committee on National Statistics to assess 2020 census quality. The Bureau also released its own preliminary evaluation in April 2021, and it will soon release the Post-Enumeration Survey, Demographic Analysis, and the 2020 Census Program for Evaluations, Assessments, and Experiments.

This analysis partitions the census enumeration process into five operational phases and identifies major activities within each phase that are critical to 2020 census data quality. For each critical activity, one or more PSs are defined at the state level to assess the performance of the activity as it may affect the apportionment counts. All PSs defined in this report are proportions of cases (i.e., housing units [HUs], persons, or addresses) affected by the operation within its particular universe or purview. For example, one critical activity considered is the imputation of HU status. The PS defined for this activity is the proportion of addresses in a state for which HU status was imputed.

Many activities associated with the census are interrelated, including the critical activities identified in this analysis. For example, administrative records are sometimes used in place of proxy respondents. In addition, field verification and quality assurance checks are conducted continually throughout the enumeration and are designed to reduce the risk of coverage error. No assessment of the effectiveness of these checks is available; therefore, they are not considered in this analysis.

A total of 10 critical activities and associated PSs are defined for the 2020 census. Six of these compare the same critical activity from the 2020 and 2010 censuses. Thus, these PSs reflect the change in the frequency with which a critical activity was employed in 2020 relative to 2010. In four cases, the critical activity either did not exist in 2010 or its comparability to 2020 could not be established. Thus, these PSs reflect only the use of the critical activity in 2020. Assuming the PS is an indicator of coverage error risk from the critical activity, the variation in the PSs across states reflects the variation in error risks from that activity. Thus, for PSs that are differences between 2020 and 2010, positive values imply greater risk from the activity in 2020 than in 2010, while negative values imply lesser risk in 2020. Similarly, for PSs defined for 2020 only, larger values for a state imply larger risks from the critical activity compared to states with smaller values.

The analysis in this report finds substantial increases in the frequency of several critical activities that the authors believe to be associated with higher error risk. Main findings can be divided into three groups: (1) increases in critical activities from 2010 to 2020 that were expected, and Census Bureau procedures mitigated some of the error risk; (2) increases in critical activities from 2010 to 2020 that were not fully expected, and additional procedures may not have mitigated error risk; and (3) critical activities whose use in 2020 varied considerably across states.
1. Increase in critical activities from 2010 to 2020 that were expected, and Census Bureau procedures mitigated some of the error risk.

Households for which the Census Bureau obtained multiple responses, whether by enumeration or another census activity, increased by nearly 20 percentage points in 2020 compared to 2010. Part of the increase in multiple responses is the result of allowing households to respond over the Internet, which was a new response option in 2020. The ease of responding by Internet likely increased the number of responses per household. Anticipating this increase and its associated error risks, the Bureau established a number of auxiliary error mitigation procedures. For example, the Bureau prepared a quality control approach referred to as the Primary Selection Algorithm to select the best response from the multiple responses, which at least partially mitigated the risk of overcounting.

The use of administrative records was a new critical activity in 2020. Administrative records were used to enumerate up to 5% of households in some states. An error risk associated with their use is that records may be outdated or otherwise inaccurate, which may translate into coverage errors. This risk was mitigated to some extent through the use of multiple sources of administrative data rather than a single source.

2. Increase in critical activities from 2010 to 2020 that were not fully expected, and additional procedures may not have mitigated error risk.

The frequency with which college students were initially enumerated in the wrong location also increased in 2020 compared to 2010. Although the Bureau had corrective measures in place to mitigate these errors, it could not have anticipated the pandemic and its effect on school closings, which displaced greater numbers of college students from their usual residences. Although the Bureau had established procedures for dealing with persons not counted at the usual place of residence, the increased number of displacements increased the potential for error from those procedures.

3. Critical activities whose use in 2020 varied considerably across states.

The imputation of persons in Group Quarters (GQs) was a new critical activity in 2020. The percent of all persons in GQs who were imputed varied by more than 10 percentage points across states. Another main finding is that about 11% of addresses were either added to or deleted from the Master Address File (MAF) during the 2020 census. This percentage varied across states by more than 20 points. Because of changes to the MAF development process implemented in 2020 (described subsequently), the volume of MAF revisions likely exceeds the volume of revisions made to the MAF in 2010; however, there are no data available to verify this.

As recommended by the ASA Task Force report, the findings in this report are all based on PSs calculated from operational data and are subject to a number of limitations. First, a PS reflects the frequency of an activity that could result in an error, not actual errors. As an example, the Bureau attempts to identify and reassign all college students initially counted in the wrong place to their proper locations. Although there is an error risk associated with this activity, the extent of the error is unknown. Without that information, it is impossible to draw conclusions about the error contributed by the activity. Second, errors introduced by one activity could be either mitigated or exacerbated by other activities.
In that regard, the PSs may overstate or understate the actual error risk for an activity. Third, the selection of critical activities for analysis is a subjective process based on the knowledge and experience of the authors. It is conceivable that another team of researchers could select other PSs that would lead to different determinations about 2020 census quality.

Nevertheless, the analysis presented in this report is an essential first step towards better understanding 2020 Census data quality. It provides important context and insights that will facilitate the interpretation of the 2020 Census Post-Enumeration Survey estimates of undercounts and overcounts that will be released in early 2022. The strongest statement that can be made based on this analysis is that most of the critical activities considered in this report were exercised more in 2020 than in 2010. To the extent these activities reflect coverage error risk, it follows that the risk of coverage errors also increased in 2020 for these activities. However, given the data at hand, this review did not find conclusive evidence that state-level counts used for apportionment purposes are of lower quality in 2020 than in 2010. Nor is there evidence that the apportionment count for any given state is in error. A more conclusive assessment of 2020 census quality would be achieved by combining the PSs described in this analysis with the results of other methodologies, such as the estimates of undercounts and overcounts from the Census Bureau's Post-Enumeration Survey and the results of the Demographic Analysis.

Disclaimers

• Process statistics are largely based on the American Statistical Association 2020 Census Quality Indicators Task Force report, modified to reflect Census Bureau limitations and the best judgments of the authors.
• Process statistics reflect the reliance on activities that are believed to increase the risk of an error, not the rate of error. Additional actions taken by the Census Bureau may have mitigated the error risks to an extent that cannot be determined from the data available.
• Both the American Statistical Association 2020 Census Quality Indicators Task Force and the Census Bureau reviewed an earlier version of this report. This version of the report incorporates changes made as a result of those reviews, correcting factual errors and adding clarifications. All changes were made at the discretion of the authors.

This analysis partitions the census enumeration process into five phases:

1. Master Address File (MAF) Development - update the initial list of residential addresses used by the Census Bureau to contact households and elicit their response to the census.
2. Self-Response - encourage a household resident to complete the questionnaire by mail, telephone, or internet.
3. Nonresponse Follow-up (NRFU) - a resident of each household that did not self-respond is contacted by a census enumerator who assists in completing the questionnaire.
4. Data Processing - apply statistical methods to fill missing information and resolve multiple responses from the same address.
5. Group Quarters (GQs) - enumerate persons in group facilities such as student housing, nursing homes, and military barracks.

The quality of data from the enumeration process is investigated in three steps. First, an ideal scenario is envisioned for each phase. For example, for MAF Development: Ideally, the MAF would be completely accurate and not require revision. For Self-Response: Ideally, one and only one resident from each household would complete a census questionnaire, etc.
Second, Census Bureau activities that deviate from the ideal scenario are identified. For example, for MAF Development: revising an address on the MAF. For Self-Response: collecting multiple responses for the same household, etc. Finally, ten process statistics (PSs) are created for all states, DC, and the United States overall, which measure the relative frequency with which an activity was exercised, or the opportunity for error from the activity. A higher PS value for a state indicates greater use of the activity and, therefore, a greater risk of error in the state's count (see Disclaimers).

As an example, PS 3 in Table 1 measures how often the Census Bureau received more than one response for a household in the 2020 census relative to the 2010 census. Multiple responses have the potential for introducing error because the Bureau must decide which response to count. An error in this decision process could increase coverage error. Thus, greater numbers of HUs with multiple responses imply a greater error risk from this activity.

This analysis converts each PS to a number between 1 and 5 (quintile), where 1 denotes that it ranks in the lower 20% of all states, 2 denotes that it ranks between 20% and 40% of all states, and so on. The higher a state's rank, the higher the risk from that activity. A summary statistic is created by taking the weighted average of the quintile ranks for the 10 PSs, where the weights are proportional to the number of cases within the universe or purview of the activity. This summary process statistic (SPS) ranges from 1 to 5 and is regarded as an average error risk from all 10 critical activities.

This section provides an overview of each Process Statistic (PS). It begins by reviewing the five phases of census operations. It then explains how the statistics relate to these phases. This is followed by a profile of each statistic, including a combined statistic referred to as the summary process statistic (SPS).

[Figure caption: Each person icon represents one five-hundredth of the occupied housing units counted in the 2020 census (roughly 250,000 households), colored by the phase in which it was counted (see census phases on facing page). MAF not applicable and GQs excluded.]

How is this process statistic calculated?
Percent of HUs submitting questionnaires without census IDs for which no matching address was found on the MAF for 2020.

How is this process statistic interpreted?
Ideally, every response includes a census ID, linking it to the Census Bureau address list. Some responses lack IDs, and the Bureau must carefully examine these responses to ensure that they come from valid addresses. In particular, the Bureau checks if the address is in the MAF. But nonstandard addresses (e.g., those without a number or street) may not be in the MAF. This statistic reflects quality because responses without IDs that are not in the MAF are at higher risk of being incorrectly included in or excluded from the census count.

• Nationwide, nearly 9% of households without census IDs had no match in the MAF.
• The percentages varied widely by state, from 4.3% (MD) to 21.6% (AK).
• Nonstandard addresses in urban and rural areas are a contributing factor.
• Does an increase in non-ID returns decrease count accuracy?
• Did encouraging non-ID responses also encourage the submission of addresses that could not be matched to the MAF?
• Did residence issues related to the pandemic affect the evaluation of nonmatches in the field?
*States are sorted and colored by the risk of error implied by the process statistic, from lowest risk (rank 1, lightest color) to highest risk (rank 51, darkest color). The bars facilitate comparisons between states, having 0 length for the lowest risk state and filling the entire cell for the highest.

How is this process statistic calculated?
Percent of occupied HUs with two or more responses from various sources for 2020 minus the corresponding percentage for 2010.

How is this process statistic interpreted?
Ideally, one member of every household fills out one and only one questionnaire. But inevitably some households return multiple questionnaires. Other operations also produce multiple returns. This statistic reflects quality because the Census Bureau must correctly identify and eliminate multiple returns, and that process, especially for duplicate questionnaires, can increase error risk. Little is known, however, about the overall accuracy of deduplication methods.

• All states had double-digit increases in the percentage of multiple responses in 2020. Nationwide, the increase was 18 percentage points.
• The increase in the percentage of multiple-response households ranged from 15 points (ME) to 20 points (NM).

How is this process statistic calculated?
Percent of occupied HUs with two or more people where one or more occupants indicated their usual residence was at college for 2020 minus the corresponding percentage for 2010.

How is this process statistic interpreted?
Ideally, students in college residences as of April 1, 2020, are counted as living in college residences. But because of the pandemic, many students moved and instead responded as living in a household away from college. This PS reflects quality because the Census Bureau must decide whether to reassign these students, increasing the risk of error. Little is known, however, about the overall accuracy of reassignment.

• The percent of households with a college student reporting a Usual Residence at College (URC) rose modestly between 2010 and 2020, roughly half a percentage point nationwide.
• The increase in households with a college student reporting a URC ranged from -0.11 points (DC) to 0.89 points (VA).
• Does the reassignment of URC residents increase coverage error?
• How often does the Bureau fail to correctly reassign URCs?
• How much does a failure to correctly reassign URCs contribute to coverage error?

*States are sorted and colored by the risk of error implied by the process statistic, from lowest risk (rank 1, lightest color) to highest risk (rank 51, darkest color). The bars facilitate comparisons between states, having 0 length for the lowest risk state and filling the entire cell for the highest.

Process Statistic 5: Percent of persons in occupied HUs whose count was obtained by proxy interview for 2020 minus the corresponding percentage for 2010*

How is this process statistic calculated?
Percent of persons in occupied HUs whose count was obtained by proxy interview (during NRFU) for 2020 minus the corresponding percentage for 2010.

How is this process statistic interpreted?
Ideally, the Census Bureau interviews a member of every household that did not self-respond. But inevitably some households cannot be reached, and enumerators must interview proxies, such as a neighbor or building manager. This PS reflects quality because research from prior censuses has demonstrated that proxy interviews are more likely to contain errors.

• Nationwide, the percentage of proxies declined by 0.35 percentage points in 2020 relative to 2010. The greatest declines were in WV, LA, and MS.
• Four states (KS, VT, UT, and RI) saw increases in the percentage of proxies of 0.5 percentage points or more; RI increased by 1.7 points.
• How complete were proxy responses?
• Did the use of administrative records lead to a reduction in the use of proxies in the 2020 census data collection?

*States are sorted and colored by the risk of error implied by the process statistic, from lowest risk (rank 1, lightest color) to highest risk (rank 51, darkest color). The bars facilitate comparisons between states, having 0 length for the lowest risk state and filling the entire cell for the highest.

How is this process statistic calculated?
Percent of occupied HUs where only a population count was obtained for 2020 minus the corresponding percentage for 2010.

How is this process statistic interpreted?
Ideally, every response contains complete information on all residents. But inevitably some responses are population count only, meaning only the number of persons in each household is known. These responses can arise from interviews with apprehensive household members, which may have been more common during the pandemic. This PS reflects quality because incomplete responses are more likely to have incorrect information.

• Nationwide, the percent of population count-only households increased roughly half a percentage point between 2010 and 2020.
• The change varied widely by state, from an increase of 1.6 points (NY) to a decline of 1.3 points (HI).
• The increase in 13 states exceeded 1 point, while 3 states declined by more than 1 point.
• How is the timestamp of an enumeration related to the frequency of population count-only households?
• How much did proxy interviews and administrative records contribute to the number of population count-only households?

*States are sorted and colored by the risk of error implied by the process statistic, from lowest risk (rank 1, lightest color) to highest risk (rank 51, darkest color). The bars facilitate comparisons between states, having 0 length for the lowest risk state and filling the entire cell for the highest.

How is this process statistic calculated?
Percent of occupied HUs enumerated by administrative records for 2020.

How is this process statistic interpreted?
Ideally, the Census Bureau interviews a member of every household that did not self-respond. But inevitably some households cannot be reached. Administrative records, new for 2020, can provide the missing information, both housing status (occupied, vacant, etc.) and characteristics (race, gender, etc.). This PS reflects quality because administrative records may be outdated, inaccurate, or incomplete, increasing the risk of error.

• Nationwide, the percentage of households enumerated with administrative records was 3.8%.
• Percentages varied widely by state, from 1.7% (HI) to more than 5% (RI and LA).
• Administrative records may have reduced reliance on less accurate enumeration methods, such as unknowledgeable proxies and imputation.
• Did using administrative records increase coverage error?
• How does the error rate compare with alternatives such as using proxies and statistical imputation?

*States are sorted and colored by the risk of error implied by the process statistic, from lowest risk (rank 1, lightest color) to highest risk (rank 51, darkest color). The bars facilitate comparisons between states, having 0 length for the lowest risk state and filling the entire cell for the highest.

Process Statistic 8: Percent of MAF units whose status was imputed for 2020 minus the corresponding percentage for 2010*

How is this process statistic calculated?
Percent of MAF units whose status was imputed for 2020 minus the corresponding percentage for 2010.
How is this process statistic interpreted?
Ideally, the Census Bureau can determine whether a HU is occupied after several visits or with administrative records. But inevitably, these procedures may be inconclusive. The Bureau may then use information on neighboring HUs to predict the status, a process known as imputation. This PS measures quality because imputations may incorrectly classify vacant or nonexistent units as occupied or occupied units as vacant, increasing the risk of error.

• All states experienced an increase in the percentage of HUs whose status was imputed. Nationwide, the increase was roughly three-quarters of a percentage point.
• In five states (LA, NY, MA, RI, and HI), the percentages increased by one point or more.
• The pandemic may have decreased the accuracy of imputations.
• Do status imputations increase coverage errors?
• How much did the pandemic affect status imputations?

*States are sorted and colored by the risk of error implied by the process statistic, from lowest risk (rank 1, lightest color) to highest risk (rank 51, darkest color). The bars facilitate comparisons between states, having 0 length for the lowest risk state and filling the entire cell for the highest.

Process Statistic 9: Percent of occupied HUs with known status but whose population count was imputed for 2020 minus the corresponding percentage for 2010*

How is this process statistic calculated?
Percent of occupied HUs with known status but whose population count was imputed for 2020 minus the corresponding percentage for 2010.

How is this process statistic interpreted?
Ideally, the Census Bureau can determine how many people reside in a HU after several visits or with administrative records. But inevitably, these procedures may prove inconclusive. The Bureau may then use information on neighboring HUs, a process known as imputation. This PS reflects quality because imputations may underestimate or overestimate the number of residents, increasing the risk of error.

• Nationwide, the percent of imputed population counts declined slightly between 2010 and 2020.
• Levels stayed the same or declined in all but a handful of states. In those states, the increases were relatively small. This suggests that one or more census operations precluded the need for count imputation.
• How are count imputation and coverage error related?
• What occurred in the census process, and especially in field operations, that held the level of count imputation in check, despite the pandemic?

*States are sorted and colored by the risk of error implied by the process statistic, from lowest risk (rank 1, lightest color) to highest risk (rank 51, darkest color). The bars facilitate comparisons between states, having 0 length for the lowest risk state and filling the entire cell for the highest.

How is this process statistic calculated?
Percent of the GQs population that was imputed in 2020.

How is this process statistic interpreted?
Ideally, administrators of GQs facilities submit an accurate population count. But the pandemic greatly complicated efforts to count GQs, particularly skilled nursing facilities and college residences. The Census Bureau used statistical methods to predict the population when population counts were not available, a process known as imputation. This PS reflects quality because imputations may underestimate or overestimate the population of GQs, increasing the risk of error.

• Nationwide, roughly 2% of the total GQs population was imputed in 2020.
• The percentages varied widely by state, from 0.13% (NH) to more than 11% (DE).
• DE and MS were unusually high relative to other states.
• The GQs population is small overall, but some states had fairly high levels of count imputation.
• Did GQ imputation increase coverage error?
• How did the level of imputation vary by GQs type, given the unequal impact of the pandemic?

*States are sorted and colored by the risk of error implied by the process statistic, from lowest risk (rank 1, lightest color) to highest risk (rank 51, darkest color). The bars facilitate comparisons between states, having 0 length for the lowest risk state and filling the entire cell for the highest.

How is the summary process statistic calculated?
For each PS, states are assigned a number between 1 and 5 (quintile) reflecting risk of error: 1 denotes very low risk, 2 low, 3 medium, 4 high, 5 very high. The SPS is the weighted average of these numbers. Weights are chosen to reflect the proportion of the count at risk.

How is the summary process statistic interpreted?
The SPS is an overall measure of the error risk from all 10 PSs. A state at higher risk across all statistics will have a higher SPS value. For example, an SPS value of 5 means all 10 PSs are at the highest risk level, and a value of 1 means they are all at the lowest risk level.

• The five highest risk states are AK, NJ, UT, NY, and TX.
• The eight lowest risk states have an average SPS less than 2.
• Do states with a higher SPS also have higher levels of coverage error?
• What is the best way to summarize the risk that a state's population count will be significantly higher or lower than the true count?
• What are the characteristics of substate areas with higher values for the SPS?

*States are listed alphabetically following the U.S. The width of the second column represents the SPS of each state. Each color represents the relative contribution of the process statistics in each census phase: MAF Development (red), SR (blue), NRFU (green), Data Processing (purple), and GQ (orange).

How is the summary process statistic decomposed?
Rather than summing the risk of error (quintiles) across all 10 PSs as in the previous profile, sum only PSs within the same census phase (i.e., MAF Development, SR, NRFU, Data Processing, or GQ). The weight reflects the proportion of the count potentially at risk. The phase-level SPS reflects the contribution of each of the five phases to total census quality (see Table 1).

General Approach

This report attempts to address the question: "What evidence is there that the quality of the 2020 census apportionment counts is less than the quality of the 2010 census counts?" To this end, PSs from five phases of the 2020 census are identified, evaluated, and compared with PSs from the 2010 census whenever possible. Use of the 2010 census as a reference point is not meant to imply that that census is regarded as a standard for perfection. Rather, because the 2010 census was not subject to a pandemic and had fewer other issues than 2020, it can be used as the most recent example of how a "typical" census might perform.

More specifically, this analysis partitions Census Bureau activities into the five census phases described previously: MAF Development, Self-Response, NRFU, Data Processing, and Group Quarters. It then seeks to discover the following:

1. For all phases of the census, from MAF development to data compilation and processing, identify activities within each census phase that pose some risk of coverage error at the state level.
2. For each activity, define one or more PSs that reflect potentially elevated error risks in performing the activity.
3. If similar PSs are available from 2010, compare the 2020 and 2010 PSs to determine activities that could have greater error risk (i.e., more opportunities for error in 2020 than in 2010).

For each census phase, an ideal scenario can be envisioned. For example, for MAF Development, the MAF would be complete and accurate; for Self-Response, one and only one resident from each household would complete a census questionnaire; and so on. Deviations from these ideals tend to increase the risk of coverage errors.

A total of 10 critical activities are identified, with at least one critical activity per census phase. A critical activity is a major census operation designed to mitigate possible errors resulting from deviations from an ideal process. Critical activities are associated with error risk in that some deviations from the ideal will not be mitigated successfully and, for those, coverage errors could result. An activity is critical if its successful implementation is critical to census quality. Our analysis assumes that the risk of error increases each time the activity is performed. In other words, errors are more likely to occur as the opportunities for an error are increased.

For each critical activity, one or more PSs are defined at the state level to assess the performance of the activity as it may affect the apportionment counts. All 10 PSs defined in this report are based on the proportions of cases (i.e., HUs, persons, or addresses) affected by some critical activity. Six PSs are formed as the difference between the proportions from the 2020 and 2010 censuses for the same critical activity. For example, the Status Imputations PS (MAF Addresses Having Imputed Status) is the difference in the proportions of MAF addresses with imputed status between 2020 and 2010. In four cases, the critical activity either did not exist in 2010 or its comparability to 2020 could not be established. Thus, the PS only reflects the performance of the critical activity in 2020.

To the extent that a PS can be associated with the risk of coverage error from an activity, the variation in a PS across states also reflects the variation in error risks from its related activity. For PSs that are differences between 2020 and 2010, positive values imply greater risk from the activity in 2020 than in 2010, while negative values imply lesser risk in 2020. Similarly, for PSs defined for 2020 only, larger values for a state imply larger risks from the critical activity in that state compared with other states having smaller values.

All 50 states, Washington, D.C., and the entire United States are ranked according to each PS. The higher the value of the PS, the higher the ranking. In addition, an SPS is created that may be regarded as an indicator of average error risk from all 10 critical activities. The SPS is formed by first replacing each PS by its quintile ranking and then taking the weighted average of the 10 quintile-ranked PSs. Here the weight applied to a PS for a state is proportional to the number of persons in the state count affected by the critical activity.

It is important to understand that PSs are not error rates and should not be interpreted as indicators of error. The process statistic is a well-known concept in the survey quality literature. It is defined as any quantity computed from operational data that may be related to the performance of the operation (see, for example, Biemer & Lyberg, 2003, Table 7.2). A large value for a PS does not mean that errors occurred in an operation.
Rather, the PS is intended to reflect the error risk its underlying activity presents to data quality. A higher PS value simply implies more opportunities for error and, thus, a greater chance that some of those opportunities resulted in an error. It is also important to understand that errors can be offsetting when aggregated to the state level (i.e., undercounts may offset overcounts to some unknown extent when the apportionment counts are tallied). The PSs in this report are not intended to provide any information on the risk of net coverage error, which is more relevant than gross error for the purposes of assessing apportionment count accuracy. This is another important reason that interpreting error risks at the state level as evidence of actual error in state counts is inappropriate.

To illustrate the proper interpretation of a PS, consider the process statistic defined for the use of proxy respondents, a critical activity that is performed on the occupied HUs in the NRFU universe. When a household member cannot be contacted or refuses to provide information on an HU, a neighbor or other knowledgeable informant, referred to as a "proxy" respondent, may be consulted to supply the required information. The PS for proxy respondents is the difference between 2020 and 2010 in the proportion of occupied HUs where a proxy respondent was used to determine the count of persons living at an address and possibly other characteristics about the residents. The Census Bureau's analysis of the 2010 Post-Enumeration Survey results suggests that proxy interviewing poses some appreciable risk of coverage error because proxies may not be sufficiently knowledgeable about the household to provide accurate information. Therefore, the proportion of proxy interviews is an informative PS for assessing the error risk associated with NRFU proxy responses. The PS measures how often a proxy was used during NRFU as a proportion of occupied HUs. Larger proportions suggest greater potential for proxy error than smaller proportions.

Activities like proxy interviewing, status imputation, count imputation, and GQ imputation carry a relatively high risk, supported by the census literature, that a household count could be in error. For other activities, like the use of administrative records, resolving multiple responses, and MAF revisions, the evidence of error risk is weaker and more speculative. Nevertheless, these activities involve decisions that may be difficult in many situations and thus subject to error. If the proportion of cases affected by these activities is larger in 2020 than in 2010, inferring greater error risk in 2020 from these activities than in 2010 is still justified. This is because the interpretation of the PS does not depend on whether the actual error rates (i.e., the percent of opportunities for error that actually result in error) are large or small. What matters is whether the activity error rates are about the same for 2010 and 2020. If this is true, then it follows that an increase in the PS suggests an increase in error risk.

Finally, note that this evaluation does not represent the definitive statement on the quality of the 2020 census. Rather, the Census Bureau is expected to release the results of its Post-Enumeration Survey as it always has in recent censuses. This survey is conducted as a follow-up to the census and will provide estimates of census coverage error (i.e., overcounts and undercounts) for all states, many substate areas, and various subpopulations.
The first Post-Enumeration Survey results are expected in the first quarter of 2022. Nevertheless, the PSs in this report provide a glimpse of the census operations that we hope will be informative and useful for understanding the quality of the state population counts as they may affect apportionment.

Although raw values for a single PS may be useful for a state-by-state comparison of the risks for a particular activity or phase, they are not useful for depicting patterns across different PSs within a state because the measurement scales may be very different. For example, an important question in this evaluation is whether a particular state's PSs are predominantly high (indicating high overall risk) or low (indicating low overall risk). Because each PS may have a different base (e.g., addresses, HUs, or persons) and a different range of values, comparing their raw values can be both confusing and misleading. To address this issue, we decided to standardize the PSs so that their relative magnitudes are more meaningful and comparable across activities. Thus, for our primary analysis, we replaced each PS's raw value by its quintile (i.e., 1, 2, 3, 4, or 5) among the 52 entities (i.e., 50 states, Washington, D.C., and the United States), where each quintile represents 20 percentage points. Quintiles for a PS can be computed by first ranking the PS values from smallest to largest. The smallest 11 states are assigned the value 1, the next 10 the value 2, the next 10 the value 3, the next 10 the value 4, and the final 11 the value 5. As an example, the value of the PS for Multiple Responses is 15.18 for Alabama, which is assigned the value 1 because it is among the smallest 11 states for this PS. In this same manner, all the PSs are converted to the numbers 1, 2, 3, 4, or 5 corresponding to their respective quintiles. Furthermore, in each case, the higher a PS's quintile ranking, the higher the error risk for the census activity measured by that PS relative to the other states. The original values of the PSs have been documented in this report and can also be found in Appendix A (see Table A1).

Unless otherwise stated in our analysis, the PSs are defined across all census type of enumeration areas (TEAs) for both 2020 and 2010. In 2020, four TEAs were defined. About 95% of households received their census invitation in the mail (TEA 1), and almost 5% received their invitation when a census taker dropped it off at their home in so-called update leave (UL) areas (TEA 6). The remaining less than 1% of areas, mostly remote, were counted in person by a census taker instead of being invited to respond on their own (TEAs 2 and 4). Although TEA 6 accounts for only a small percentage of all HUs in the nation, in some states a much higher percentage of housing is in these areas (see 2020 Census: Type of Enumeration Area (TEA) Viewer). It is also important to note that states having a higher percentage of UL or Update Enumerate (UE) areas may have different error risks than states where only a small fraction of the enumerations used these data collection methods.

This section describes the 10 critical activities examined in this review along with their respective PSs as listed in Process Statistics at a Glance. The section is organized by the five phases of the census identified in the previous section. After discussing each critical activity and its corresponding PS, we describe the SPS that combines all 10 PSs into a single PS.
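Before turning to the individual activities, the quintile conversion described above can be made concrete with a short sketch. The sketch below is illustrative only: the function name, entity labels, and PS values are hypothetical placeholders rather than the values in Appendix A; the only elements taken from the report are the 52-entity universe and the 11/10/10/10/11 split.

```python
# Minimal sketch of the quintile conversion: rank 52 entities (50 states, DC,
# and the U.S.) on a single PS and assign quintiles 1-5 using the
# 11/10/10/10/11 split described in the text. All values are hypothetical.
import random


def quintile_ranks(ps_values):
    """Map each entity to a quintile rank (1 = lowest 20%, 5 = highest 20%).

    ps_values: dict of entity name -> raw PS value (52 entries expected).
    Higher ranks indicate greater reliance on the activity and, therefore,
    greater error risk relative to the other entities.
    """
    group_sizes = [11, 10, 10, 10, 11]              # smallest 11 -> rank 1, ..., largest 11 -> rank 5
    ordered = sorted(ps_values, key=ps_values.get)  # entities sorted by PS value, smallest first
    ranks, start = {}, 0
    for quintile, size in enumerate(group_sizes, start=1):
        for entity in ordered[start:start + size]:
            ranks[entity] = quintile
        start += size
    return ranks


if __name__ == "__main__":
    random.seed(0)
    entities = [f"Entity{i:02d}" for i in range(52)]                 # placeholders for the 52 entities
    hypothetical_ps = {e: random.uniform(10, 25) for e in entities}  # illustrative PS values
    print(quintile_ranks(hypothetical_ps))
```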
1. MAF revisions (2020)

Process Statistic: Percent of all MAF addresses that were either added or deleted during the 2020 census data collection period.

The foundation for the decennial census is the list of addresses used to mail packets with a request to respond to the decennial census. To be included in the census, every person must be linked to a physical address of an HU or GQs facility. This list comes from the Census Bureau's MAF, a repository for all information collected over time for an address, a kind of longitudinal history of addresses associated with an HU. The Census Bureau makes biannual updates to the MAF using data from the U.S. Postal Service's Delivery Sequence File, which is used for mail delivery. In addition, the Census Bureau solicits address updates from tribal, state, and local governments as part of several geographic partnership programs throughout the decade. The MAF is a dynamic list that reflects the ever-changing inventory of housing in the United States.

The MAF errors that are the biggest concern for the apportionment counts occur when addresses on the MAF that are not living quarters are nonetheless classified as living quarters, or when addresses that are associated with living quarters are missing from the MAF. Census Bureau procedures during the enumeration phase of 2020 census operations dictate that addresses on the MAF that are not living quarters should be deleted from the MAF (referred to as "deletes") and living quarters that are not on the MAF should be added to the MAF (referred to as "adds"). However, as in any extremely large and complex operation, errors can occur. The most common errors involve living quarters that are erroneously deleted or never added and addresses that are not living quarters that are erroneously added or retained.

Consideration was given to the development of an equivalent MAF revisions PS for 2010; however, this was not possible because the MAF development processes for 2020 and 2010 were so different. For example, rather than the full address canvassing operation implemented in 2010, about 65% of the 2020 MAF was developed using an "in-office" canvassing and review approach. Field address canvassing was implemented only in difficult-to-canvass areas that required ground truth data. In addition, the initial MAF in 2020 (MAF1 in Figure 1) contained many addresses whose validity was equivocal because of conflicting information from various administrative data sources, including the U.S. Postal Service. Thus, many addresses on the 2020 MAF likely would not have been included on the MAF in 2010. These differences between 2020 and 2010 made the creation of a comparable PS for 2010 infeasible.

2. Non-ID returns not matched to the MAF (2020)

Process Statistic: The percent of 2020 census HUs submitting questionnaires without census IDs for which no matching address was found on the MAF.

A census ID is a unique identifier (somewhat like a Social Security number) that is assigned to every address on the MAF. Every mailed letter requesting self-response included this ID. However, to increase the self-response rate, the Bureau allowed respondents to submit a questionnaire with an address but without a census ID. In that case, the Census Bureau used the respondent's address and other identifiers to match the responding household to a MAF address. The non-ID procedure can work well for returns that have a standard city-style address (i.e., number, street, apartment) and can be readily linked to the Census Bureau's MAF. But it may not work as well for returns that have nonstandard addresses (e.g., when apartment numbers do not formally exist in some cities and towns) that cannot be readily identified on the MAF.
The same is true for rural areas where addresses may not have standard or easily identified labeling. Many governments use the Geographic Support Program or the Local Update of Census Addresses (LUCA) program run by the Census Bureau to update the MAF prior to the census to include these types of addresses with appropriate labels. Although allowing non-ID submissions made it easier for people to respond to the census, it also resulted in returns whose addresses did not match any address on the MAF. The non-ID nonmatching returns resulted in extra work both in the office and in the field to resolve these cases. This problem is unique to the 2020 census because active promotion of non-ID returns was not a feature of the 2010 census. Thus, the presence of non-ID returns, particularly those with nonmatching addresses, increased the opportunities for error in 2020 compared to 2010.

3. Resolving multiple responses (2020 vs. 2010)

Process Statistic: Percent of HUs having two or more questionnaires for 2020 minus the corresponding percentage for 2010.

The single largest cause of multiple returns is duplicate questionnaires (i.e., households with the same identification number or address responding multiple times with possibly different responses). The 2020 census provided multiple options for responding: mail, telephone, Internet (for the first time), or enumerator-assisted interviews. In addition, the same household could submit returns with or without an ID. Returns submitted with the same ID could be readily identified during data processing, where a process referred to as the Primary Selection Algorithm decided which return to count. Identifying multiple or duplicate returns where one or more returns were submitted without an ID required more effort, particularly if the same household used slight variations of the same address. In many of these situations, additional fieldwork was required by the Bureau to verify these addresses and reconcile conflicting address information, often under tight time constraints. The time-constrained field period may have increased the risk that enumerators would falsify information in the field to save time. It may have also required the Census Bureau to perform additional investigations and evaluations to reconcile inconsistent data during the post-data collection processing phase. Duplicate submissions for the same household that are not detected could increase population counts in some states more than others, leading to differential coverage error by state that could skew the apportionment results.

However, a majority of multiple responses arose for reasons other than respondent-generated duplication. These include:

• Duplicate questionnaires created when an original return's accuracy was deemed to be unacceptable (e.g., questionnaires suspected of being falsified).
• HUs that were ultimately classified as GQs.
• Invalidated records, for example, continuation forms that were later linked to their parent form.
• Multiple returns and dummy records generated by the Bureau's systems to address various data processing issues.
• Partial internet responses that were deemed ineligible when a more complete response was received.

Regardless of the cause of a multiple return, its handling and resolution may still be viewed as an error risk. In addition, the above reasons for multiple returns also existed in 2010.
Thus, an increase in this activity in 2020 may suggest not only greater respondent-generated duplication, but also greater falsification of questionnaires, system problems, data processing issues, and so on. Although each individual cause may not carry the same risk to count accuracy, they all pose some appreciable risk. In addition, a large increase in the volume of multiple returns in 2020 should raise concerns that error risks associated with multiple returns may have also increased.

4. Reassignment of college students with usual residence elsewhere (2020 vs. 2010)

Process Statistic: Percent of occupied HUs with two or more people, where one or more occupants indicated their usual residence was at college, for 2020 minus the corresponding percentage for 2010.

Population relocations were especially frequent for college students since many campuses closed their dormitories because of the pandemic. Along with those living off-campus, many students decided to relocate to the homes of parents, relatives, and friends. These so-called URC persons created additional challenges for the Census Bureau, especially in locations with high infection rates. Thus, the risk that college students could be counted in multiple locations was greater in 2020 than in 2010. The URC PS captures that risk.

The URC PS is the number of HUs with one or more URCs divided by the number of households of size 2 or more. The justification for this divisor is that, although the URE question was asked of internet and phone respondents, it was not asked for single-person HUs in the paper questionnaire. In addition, households within which the sole resident is a displaced college student are likely to be quite rare compared to their prevalence in households of size 2 or more. Thus, using the total number of HUs as the divisor would artificially attenuate the impact of URC on the enumeration process.

Nonresponse Follow-up Phase

5. Proxy response (2020 vs. 2010)

Process Statistic: Percent of 2020 census occupied HUs whose census count was obtained from a proxy respondent during the NRFU phase of the census minus the corresponding number from the 2010 census.

Data from the 2010 Census Coverage Measurement program have shown that well over 95% of households that self-responded were correctly enumerated. The comparable estimate for households enumerated through an interview with a household member was 93%. This compares with just 70% for proxy respondents and 68% for situations where the type of respondent was unknown. These data suggest that the percentage of returns collected by proxy reflects an important source of error risk. If the proxy rate is greater in 2020 than in 2010 for some state, then the risk of coverage error from this activity is greater in 2020 than in 2010.

6. Count-only HUs (2020 vs. 2010)

Process Statistic: Percent of HUs where only a population count was obtained for 2020, minus the corresponding percentage for 2010.

The percent of households where the only information available was a population count could be a sign of resistance among household informants who are only willing to provide the minimal amount of information. It could also suggest a greater willingness of enumerators to accept the minimum rather than pursuing more complete information. This could be the case when enumerators are under pressure to close out their assignments. In either case, it can be regarded as an informative PS for assessing error risk.
For example, it is difficult to know if households that provided only a population count were aware of and followed the somewhat complex rules for determining how many persons resided at the address on Census Day. URCs and other occupants whose usual address is elsewhere may have been counted. Other persons who are away but still use the address as their usual residence may not have been counted.

7. Use of administrative records (2020)

Process Statistic: Percent of occupied HUs enumerated by administrative records for 2020.

Administrative records played a much smaller role in the 2010 census compared to 2020, although some matching of census responses to administrative records was done in 2010 as a way of resolving potential undercounts of persons reported on the census form as part of Coverage Follow-Up checks. For this reason, the PS for the administrative records use activity only reflects their use in the 2020 census, not the difference between the 2020 and 2010 censuses. Of the more than 2.3 million persons matched to administrative records, counts were determined in the 2010 Census Coverage Measurement program to be correct more than 95% of the time. However, this sample was limited, and census tests conducted to prepare for the 2020 census revealed flaws in the ability of administrative records to determine occupancy status. In fact, these tests led to a one-visit requirement to confirm occupancy status in 2020. Thus, any definitive conclusions about the accuracy of administrative records must await the results of the Post-Enumeration Survey due out in early 2022. Nevertheless, because administrative records have been shown to be generally less accurate than self-response for determining household size, the use of administrative records is a critical activity in our analysis.

8. Status Imputations (2020 vs. 2010)

Process Statistic: Percent of MAF units whose status was imputed in 2020, minus the corresponding percentage for 2010.

Sometimes no information is available for an address, even after multiple visits. It is critically important to know whether the address is an occupied or vacant HU, a commercial building, or nonexistent. When the occupancy status or very existence of HUs cannot be determined, statistical imputation is applied, using the attributes of HUs in the neighborhood or other surrounding areas to assign a status to the address in question. The risk of misclassifying an address using imputation is high and can lead to both undercoverage and overcoverage.

9. Count Imputations (2020 vs. 2010)

Process Statistic: Percent of occupied HUs with known status, but whose population count was imputed for 2020, minus the corresponding percentage for 2010.

As a last resort, when a building or apartment is determined to be an occupied HU but there is no reliable source of information about its number of residents, its population count is imputed. Imputation is a process whereby the missing count at an address is obtained from other, similarly occupied HUs within the same general vicinity whose counts are known. Although the population count for the unit in question and its "donor" may be quite similar, there is still an obvious risk that the imputed count may be in error.

10. Group Quarters Count Imputation (2020)

Process Statistic: Percent of the GQs population that was imputed in 2020.

In addition to the designation of addresses as GQs on the MAF, the GQs operation for the 2020 census was conducted using advance visits. These visits were done in January and February 2020, before the onset of the pandemic.
Along with input from the LUCA program, the Bureau began the census with a good initial list of facilities and well-developed plans for capturing GQs data. Once pandemic restrictions took hold, however, the enumeration of GQs in a number of categories was disrupted, most notably in college dorms, skilled nursing facilities, and the service-based enumeration (SBE), which was initially scheduled for the end of March 2020. Efforts to enumerate the population in various GQs categories took place over the course of the year, including the SBE from September 22 through 24 and much work in the post-data collection phase of the census. Moreover, when the Census Bureau was still on a December 31 timeline for apportionment numbers to be delivered, a decision was made not to do a Count Review of GQs by members of the Federal-State Cooperative on Population Estimates, a check using local lists of facilities. Once the December 31 deadline was moved, the Census Bureau mounted an effort to "fill in the holes" and avert a potential undercount of GQs in post-data collection processing. Calls to GQs facilities and extended use of administrative data from a variety of sources were deployed in the late stages of data processing. For the first time, imputation was used as a method to determine counts of persons in GQs. Thus, the level of count imputation used to create the GQs population is included as a measure of risk. Because GQ imputation was not used in 2010, this PS only reflects the 2020 census activity.

As previously noted, a major goal of this work is to distinguish states by their risks for errors in the apportionment counts, especially as they exceed the corresponding risks in 2010. This goal is facilitated by using the SPS. The SPS combines all the PSs of interest into a single measure for each state after the PSs have been converted to quintile ranks. The SPS is intended as a measure of the total error risk to the accuracy of a state's count across the 10 critical activities of interest.

Note that a measure of total error risk could be formed by simply averaging the PSs of interest. However, some activities, such as address status imputation, affect a relatively small proportion of a state's count, while other activities, such as resolving multiple responses, affect a much larger proportion of the count. To account for this variation in effects across critical activities, the PSs are weighted according to their potential impacts and then summed to produce the SPS. The purpose of PS weighting is to produce a summary measure of the total error risk for the 2020 census by ensuring that activities affecting more cases carry more weight than activities that affect fewer cases. A PS's weight is derived by estimating the proportion of the state count that is affected by the census activity underlying the PS, for example, the percentage of occupied HUs that submitted two or more questionnaires as a percent of all occupied HUs. Separate weights are computed for each PS for each state (i.e., 510 weights in all). So that the resulting SPS is also a number between 1 and 5, the weights are scaled so that their sum is 1. These scaled weights are applied to their respective quintile-ranked PSs before summing. In other words, the SPS equals the sum, over all 10 PSs, of each PS's quintile rank multiplied by its scaled weight. For example, as shown in Table A2 in the Appendix, the weight for Status Imputations for Alabama (AL) is 1.09. The sum of all 10 weights for AL is 54.32. Therefore, the scaled weight for Status Imputations for AL is 1.09/54.32, or 0.02.
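The SPS arithmetic just described can be written compactly in code. The sketch below is a minimal illustration, not the Bureau's or the authors' implementation: the function name and PS labels are ours, and only the Status Imputations weight (1.09), its quintile rank (5), and the weight total (54.32) follow the Alabama example. The other nine entries are hypothetical placeholders, so the printed result is not Alabama's actual SPS of 2.38.

```python
# Minimal sketch of the SPS computation for a single state. Only the Status
# Imputations weight (1.09), its quintile rank (5), and the weight total
# (54.32) follow the Alabama example in the text; every other entry is a
# hypothetical placeholder for illustration.

def summary_process_statistic(quintiles, weights):
    """Weighted average of quintile ranks (1-5), with weights scaled to sum to 1.

    quintiles: dict of PS name -> quintile rank (1-5) for the state.
    weights:   dict of PS name -> unscaled weight (share of the state's count
               affected by the activity). Per the report, a weight is set to 0
               beforehand when a PS's 2020-minus-2010 difference is negative.
    """
    total = sum(weights.values())
    return sum((weights[ps] / total) * quintiles[ps] for ps in quintiles)


if __name__ == "__main__":
    quintiles = {
        "StatusImputations": 5,     # quintile rank from the Alabama example
        "MultipleResponses": 1, "MAFRevisions": 3, "NonIDReturns": 2, "URC": 2,
        "ProxyResponse": 2, "CountOnly": 3, "AdminRecords": 2,
        "CountImputations": 1, "GQImputation": 3,
    }
    weights = {
        "StatusImputations": 1.09,  # unscaled weight from the Alabama example
        "MultipleResponses": 18.0, "MAFRevisions": 11.0, "NonIDReturns": 8.7, "URC": 1.0,
        "ProxyResponse": 4.0, "CountOnly": 4.0, "AdminRecords": 3.8,
        "CountImputations": 0.6, "GQImputation": 2.13,
    }
    # The weights above sum to 54.32, so Status Imputations alone contributes
    # (1.09 / 54.32) * 5, or about 0.02 * 5 = 0.1, matching the worked example.
    print(round(summary_process_statistic(quintiles, weights), 2))
```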
For example, as shown in Table A2 in the Appendix, the weight for Status Imputations for Alabama (AL) is 1.09. The sum of all 10 weights for AL is 54.32. Therefore, the scaled weight for AL is 1.09/54.32, or 0.02. Repeating this for all 10 PSs for AL will result in a scaled weight sum of 1. In addition, the quintile ranking for Status Imputations for AL is 5, and thus its weighted contribution is 0.02 × 5, or 0.1. Repeating this for the remaining nine PSs for AL and summing the results gives an SPS of 2.38.

One additional adjustment is applied to the weights before they are used to compute the SPS for PSs that are differences between 2020 and 2010. For some states, the difference between 2020 and 2010 can be negative, indicating that 2010 exceeded 2020 in terms of the error risk reflected in the PS. When this occurs, the weight is set to 0 for that state, indicating that the additional impact on the 2020 count relative to 2010 is 0. For example, the weight for the proxy response PS is the percent of occupied HUs whose census count was obtained by a proxy (scaled as previously described). The proxy response PS is the difference in this percent between 2020 and 2010. If the difference is negative for a state (indicating greater proxy use in that state in 2010 than in 2020), the weight is set to 0 so that the SPS shows no increased risk because of proxy respondents for that state.

Figure 2 shows the range and mean of the scaled weights for each of the 10 PSs. Three points can be made about this figure. First, recall that the weight for a PS is 0 whenever the difference between 2020 and 2010 is negative, which implies error risks were greater in 2010 for that PS's activity than in 2020. Thus, it is possible for the SPS to be very small (say, close to 1). Second, how would one interpret a situation in which (1) the range of the SPS across states is quite small and (2) the largest value of the SPS is also quite small? For example, suppose the SPS range is 1.5 to 2.0 for all 50 states and Washington, D.C. Then one can conclude that the 2020 census differed from 2010 by approximately the same amount for all states, but also that the difference was quite small. In other words, the quality risks for the 2020 census and the 2010 census are about the same. Finally, it is possible that the difference in error risks for a state in quintile 1 and a state in quintile 5 is very close, even though the state in quintile 1 is in the bottom 20% and the state in quintile 5 is in the top 20% of states. For example, the PS Status Imputations has a range of 0.3 to 1.4, the smallest range among the 10 PSs. This means that the lowest state differs from 2010 by 0.3 percentage points and the highest state by 1.4 percentage points regarding status imputations. Although this may not seem like a large range, the highest state's increase in status imputations is still about 4.7 times that of the lowest state. In terms of error risk, this may still be an important finding for 2020. It should be noted that the status imputation PS has a relatively small weight, which limits its influence on the SPS. In fact, for the 10 PSs in our analysis, larger weights are accompanied by wider ranges and smaller weights by narrower ranges of PS values. Thus, the range in values of the SPS across states tends to be driven by the PSs with the larger weights.
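As a companion to the Alabama example, the following is a minimal sketch of the SPS calculation for one state, including the zero-weight rule for 2020-versus-2010 PSs with negative differences. The input values and field names are hypothetical, and the treatment of zeroed weights in the rescaling step is an assumption based on the description above; only the arithmetic is meant to mirror the text.

```python
# Minimal sketch of the SPS calculation for a single state (hypothetical inputs).
# Each PS record carries: its value (a 2020 percentage, or a 2020-minus-2010
# difference), whether it is a difference, its quintile rank (1-5), and its raw weight.

def state_sps(process_stats):
    weights = []
    for ps in process_stats:
        # Zero-weight rule: a 2020-vs-2010 PS with a negative difference is taken
        # to add no 2020 risk relative to 2010.
        w = 0.0 if (ps["is_difference"] and ps["value"] < 0) else ps["weight"]
        weights.append(w)
    total = sum(weights)
    # Scale the weights to sum to 1, then take the weighted sum of quintile ranks,
    # so the result falls between 1 and 5.
    return sum((w / total) * ps["quintile"] for w, ps in zip(weights, process_stats))


# Hypothetical example with three PSs (a real state uses all ten).
example = [
    {"value": 1.1,  "is_difference": True,  "quintile": 5, "weight": 1.09},
    {"value": -0.4, "is_difference": True,  "quintile": 2, "weight": 6.50},  # zeroed
    {"value": 3.8,  "is_difference": False, "quintile": 3, "weight": 3.80},
]
print(round(state_sps(example), 2))  # -> 3.45
```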
Given the difficult circumstances surrounding the 2020 census, data users are concerned about data quality and about potential threats to the counts that affect their share of political representation and access to resources over an entire decade. These concerns were reflected in the 2020 Census Quality Indicators Task Force report, 2020 Census Quality Indicators: A Report from the American Statistical Association (see ASA Board Releases 2020 Census Quality Indicators), which recommended that the Census Bureau grant qualified external researchers access to operational data to help assess the quality of the 2020 census. The analysis in this report is in response to that recommendation. Specifically, our analysis examines 10 PSs associated with activities that are critical to 2020 census quality. The analysis provides important insights regarding the quality of the 2020 census state counts. However, it does not provide a definitive statement about 2020 census quality. That will hopefully come in the first quarter of 2022, when the Census Bureau begins releasing results from the Post-Enumeration Survey, which will provide estimates of coverage error rates. Still, the current report provides a glimpse of 2020 census operations. PSs have much to say about the origins of possible error. Thus, they are a necessary but not sufficient step toward a more comprehensive assessment of 2020 census quality. The next section answers nine questions that we believe are fundamental to understanding 2020 census quality and how it differs from 2010 census quality. Each question is posed, followed by an answer based on the data in this report.

Answers to Key Questions Regarding the Quality of the 2020 Census

The pandemic adversely affected many aspects of life in the United States, and the census was no exception. Hurricanes in the South and wildfires in the West further changed the lives of many Americans. These events created a very challenging environment for conducting the 2020 census, much more so than in 2010. It should not be surprising to learn that the risk of coverage errors was higher in 2020 than in 2010. We examined 10 PSs that either directly or indirectly compared the error risks of various operations and activities for 2020 and 2010. Weighting these activities by their expected relative impacts on total error risk, we constructed an overall PS referred to as the SPS, which varies between 1 and 5, where 1 indicates a minimal increase in risk and 5 a maximal increase in risk. The results suggest that error risks, as measured by the SPS, increased relative to 2010 for the 50 states and Washington, D.C. The SPS ranges from 1.21 (NE) to 4.47 (AK). The SPS suggests that the risk of coverage error in 2020 is higher than in 2010. (What this says about relative data quality is discussed further in our answer to Question 9.) However, while the analysis of PSs sheds light on potential error in the data collection for U.S. households, the strongest statement that can be made based on this analysis is the following: most of the critical activities considered in this report were exercised more frequently in 2020 than in 2010. To the extent an activity poses a coverage error risk, it follows that the risk of coverage errors also increased in 2020 for these activities. Despite these findings, there is no evidence that state-level counts used for apportionment purposes are of lower quality in 2020 than in 2010. Nor is there evidence that the apportionment count for any given state is in error.
A more conclusive assessment of 2020 census quality is probably best achieved by combining multiple approaches, such as PSs for critical activities and estimates of undercounts and overcounts from the Census Bureau's Post-Enumeration Survey and Demographic Analysis.

2. What new procedures were used in the 2020 census that were not used or were used much less frequently in 2010? Is there evidence from the PS analyses that these new census approaches increased or decreased error risk in 2020 compared to 2010?

Over the past several decades, the Census Bureau has revised its approaches for improving census data collection and processing. Two examples are paid professional advertising for the 2000 census and the LUCA program aimed at improving the 2000 census Address List with local information. There is evidence that paid advertising improved initial census response and that LUCA added many addresses that were missing from the MAF. However, the LUCA program also greatly increased the workload for the address de-duplication activity. This may have contributed to an overcount, as estimated by the Census Bureau's 2000 coverage evaluation program. A lesson from these two experiences is that innovations have both benefits and error risks. In 2010, problems with deploying new technology for field data collection and concerns about using the Internet for data collection delayed the adoption of these innovations. Although the 2010 census was judged a success in several areas, the use of the latest technology lagged. The decade leading up to the 2020 census saw the Bureau make a big leap into new technologies and innovations in four areas to make the address list better, self-response easier, and the data collection more efficient:

a. The Census Bureau began an effort to do more continuous updating of the address list through its Geographic Support System Initiative (GSS-I) in 2011 with a range of partners in tribal, state, and local governments, in addition to two yearly updates from the U.S. Postal Service. These efforts served to increase confidence in the MAF and led to the decision to replace the costly 100% pre-census address canvass with a combination of in-office and in-field address canvassing in preparation for the 2020 census. The Bureau "canvassed" about two-thirds of the nation using aerial photography, satellite imagery, and other local files to determine the validity of addresses for use in the census. These technological innovations were a new paradigm for the census, eclipsing the preparations for the 2010 census, which were largely confined to the LUCA program. Of course, like the censuses of the past, innovation brought benefits and risks, and 2020 was no exception. Although the MAF was enhanced through the GSS-I, the selection of a subset of addresses for inclusion in the address list for the census was still based on a series of judgments about the quality of addresses, which were mostly untested, given the absence or cancellation of census tests because of funding shortfalls. Moreover, the potential error resulting from the 2020 in-office canvassing operation has not yet been assessed, although an assessment is forthcoming. Based on the available data, it is likely that these innovations increased error risk to some extent because of the large percentage of deleted and added addresses reflected in the MAF Revision PS. In addition, the address "filtering" operation that created the initial census MAF also imparted error risks, as it retained addresses that were unlikely to be valid while possibly excluding others that were.
b. Self-response via the Internet

Table 2 contrasts 2020 data collection with 2010 data collection at a highly aggregated level. In 2020, a higher percentage of households self-responded, but a lower percentage responded via a household member in the NRFU phase. Adding these two percentages, data collection from the preferred methods (i.e., via a household member) totaled over 90% in 2020, compared to 93% in 2010. The Bureau used the Internet as the primary means of self-response for the first time in 2020, and more than 80% of self-respondents chose that option. Although self-response, particularly via the Internet, may be the most cost-effective and accurate data collection method, it produces error risks, primarily because of multiple responses that must then be accurately resolved. In addition to response by Internet, the 2020 census promoted the submission of questionnaires without census IDs, a new approach that harnessed the ability of the MAF to accurately match non-ID returns to an address. About 22.2 million returns were submitted without a census ID, the majority of which were verified via automated match to the MAF. Of the remaining addresses, about 9% could not be matched to the MAF, requiring more work on the part of the Census Bureau, in office and in the field, to determine their status (e.g., deletes, adds). Thus, a consequence of promoting non-ID responses is some level of risk associated with the large volume of nonmatching addresses, which increased the workload for Bureau staff. Finally, an important item for future research is the impact of potential error in parsing addresses, many without standard features or sufficient detail, when determining whether two addresses are the same HU.

Although administrative records have been used in the past for selected populations, such as certain types of GQs, this is the first time they were used to gauge occupancy status and to enumerate households. About 3.8% of all households were enumerated using administrative records in 2020. As discussed earlier, the use of administrative records has been controversial, especially regarding demographic characteristics. However, our sole focus here is the apportionment counts, and for counts, some data from 2010 census evaluations do show that administrative records are superior to alternatives such as proxy responses and count imputations. As Table 2 shows, the use of proxy responses for counts in the NRFU declined between 2010 and 2020, from 6.4% to 5.4% of all households, while administrative records accounted for 3.8% in 2020, compared to zero in 2010. It is likely that administrative records reduced reliance on proxy responses and imputations in 2020. It is also likely that the observed increase in imputations (from 0.4% in 2010 to 0.7% in 2020) was smaller than it would otherwise have been because of the option to use administrative records. Still, the 0.3 percentage point increase in imputed statuses/counts is important given that imputation is the least preferred option for completing the census, with most of the increase the result of status imputation.

3. The Census Bureau seeks to deploy its procedures in standardized fashion throughout the nation. However, conditions in each state can differ considerably regarding the base of addresses and difficulties surrounding the enumeration. How did the SPS vary across states? What does the range look like across states?

Table 3 lists the 10 PSs and their means, minimum, and maximum values for the 50 states and Washington, D.C. The last row of the table shows the same quantities for the SPS. The range is defined as the maximum PS minus the minimum PS. The relative range is the range divided by the absolute value of the mean PS.
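In symbols (notation introduced here for convenience), if $PS_{1}, \dots, PS_{51}$ are the values of a given PS for the 50 states and Washington, D.C., then

$$
\mathrm{Range} = \max_{s} PS_{s} - \min_{s} PS_{s},
\qquad
\mathrm{Relative\ range} = \frac{\max_{s} PS_{s} - \min_{s} PS_{s}}{\left|\overline{PS}\right|},
$$

where $\overline{PS}$ is the mean of the PS across the 51 jurisdictions.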
For almost all PSs, the relative range exceeds 1, which means that the range is at least as large as the magnitude of the mean PS.

We can use the Phase Decomposition of SPS approach discussed in the Profiles section to address the question of which states carry the highest error risk and which phases contribute most to that risk. As shown in Table 4, seven states have SPS values that exceed 4. In each case, the phase contributing most to the SPS is self-response. By comparison, self-response contributes slightly more than 69% of the SPS for the United States as a whole. MAF development and NRFU are the next largest contributors to total error risk. GQs contributes the least, with less than 0.3% of the SPS; for the United States as a whole, it is less than 0.13% of the SPS.

As emphasized throughout this report, neither the individual PSs nor the SPS alone can be used to make definitive statements about the accuracy of the apportionment counts. There are several reasons for this.

a. The PSs measure only the risk of error, that is, the opportunities for an error to occur. It is not known how many of these opportunities will result in an error (i.e., the error rate). For example, if we knew that for every 100 opportunities, one person is erroneously counted (i.e., a 1% error rate), then error risks could be converted to numbers of errors. However, because error rates are unknown, how data quality is affected by error risks is also unknown. Additionally, error rates are likely to vary considerably by PS; for example, the count imputation error rate is likely to be larger than the multiple responses resolution error rate. Since error rates are unknown, no statements regarding the magnitude of the errors in the apportionment counts, by state or for the United States, can be supported by the data in this report. However, because 2020 has greater error risk than 2010, it is more likely that the error in the state counts is larger (on average) in 2020 than it was in 2010. But, as the next point illustrates, the net coverage error may not be greater than in 2010.

b. The accuracy of the apportionment counts depends on net coverage error, not on the total number of errors. Some errors add to the apportionment counts (overcounts), while other errors subtract from them (undercounts). If the number of overcounted persons equals the number of undercounted persons for a state, then the state count would be perfectly accurate. For example, suppose a state was determined from our analysis to have 1 million opportunities for error. Furthermore, suppose that 1,000 of these opportunities resulted in undercounts and another 1,000 resulted in overcounts. Then the apportionment count would be perfectly accurate because the two sets of errors would offset. This is an important reason to avoid interpreting error risk as evidence of error in the apportionment counts. Knowing the error requires not only knowing the error rates for the 10 critical activities but also knowing their direction: positive or negative.
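Points a and b can be restated compactly with notation introduced here (it does not appear in the report itself):

$$
\text{expected errors} = \text{opportunities} \times \text{error rate},
\qquad
\text{net coverage error} = \text{overcounts} - \text{undercounts}.
$$

In the example above, the net coverage error is $1{,}000 - 1{,}000 = 0$, so the state count is accurate even though 2,000 individual errors occurred.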
c. The approach taken in this report is just one of many possible approaches, each potentially yielding somewhat different results and conclusions. The ASA Task Force report provided specific guidance as to how the analysis should be approached, and that guidance was followed in this report. The Task Force report suggested about two dozen PSs, or "quality indicators" as it referred to them, for assessing 2020 census error risks. This report chose only 10 PSs, some suggested by the Task Force and others based on the consideration of activities that are most critical to census quality. Although these selected PSs are comprehensive and the analytical approach reflects the recommendations of the Task Force report, many other PSs could have been examined. Furthermore, the general approach adopted in this report (i.e., decomposing the census process into phases, using quintiles to compare risks across PSs, using weights to summarize these risks, and using vetted methods to analyze the results) represents only one approach to 2020 census quality assessment. Other approaches are possible and could produce different results and possibly different conclusions. Data for such experimentation are provided in Tables A1, A2, and A3 in the Appendix.

6. There was a great deal of concern about the movement of college students related to the pandemic and its impact on the count. Do the PSs suggest any negative effects of these movements on the count?

According to the Census Bureau's rules on where to count people, college students are supposed to be counted where they attend college. College students living in dormitories are included in the enumeration of people living in GQs, such as dormitories, halfway houses, and skilled nursing facilities. In addition, college students living off campus were included in an early door-to-door operation carried out by mid-April to enumerate them before they left town at the end of the school year. However, because of the pandemic, most students left campus during March, before that enumeration was completed. Questions remain about how successfully the Census Bureau managed to obtain school records from colleges and universities indicating where students were supposed to have been living on April 1. In addition, the risk that students may have been counted twice (at school and in the locations where they moved after campuses were shut down, such as at their parents' house) is a legitimate concern. The question on the paper census form that asks, "Does this person usually live or stay somewhere else?" with an option for "Yes, for college" is shown in Figure 3. On the paper form, this question is asked of every person listed on the census questionnaire except the person completing the questionnaire. It is asked of everyone listed on the Internet or enumerator-assisted forms. (As previously described, the PS for the process of reassigning students enumerated in the wrong locations is the difference between 2020 and 2010 in the ratio of the number of URCs to the number of HUs consisting of at least two persons, expressed as a percentage.) Our results showed that the percentage increased in all states in 2020 except for Washington, D.C. Nationwide, about one percent of HUs with at least two persons had at least one URC. The percentage of the population affected by this PS is small at the state level, but that may not be the case at substate levels. Similar to the GQs count imputation activity, URCs are highly localized, and it is likely that the error in the URC reassignment activity is as well. Perhaps more importantly, the PS may reflect two related risks: (1) the reassignment of larger numbers of college students to alternate addresses in 2020 than in 2010, and (2) the risk of possible duplicate responses from college students who may be listed at two addresses. It may also suggest that college students were not always properly identified as URCs on the census form and were thus counted at the wrong address.
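Written out (with notation introduced here), the URC PS described in the parenthetical above is, for each state $s$,

$$
\mathrm{PS}_{s} = 100\left(\frac{\mathrm{URC}_{s,2020}}{H^{2+}_{s,2020}} - \frac{\mathrm{URC}_{s,2010}}{H^{2+}_{s,2010}}\right),
$$

where $\mathrm{URC}_{s,t}$ is the number of URCs identified in state $s$ in census year $t$ and $H^{2+}_{s,t}$ is the number of HUs with at least two persons.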
7. The GQs enumeration was greatly altered by the pandemic, given problems with access to facilities and changes in the approaches taken by the Bureau to collect data. What did the PSs tell us about the GQs enumeration?

For the first time, the population counts for some GQs were imputed in 2020. As previously noted, this was likely the result of GQs enumeration difficulties caused by the pandemic. Although its impact at the state level may be small, the GQs count can be influential for substate statistics, as evidenced by the wide variation in the percent of persons imputed by state. Locally, substantial coverage errors can result from the omission of a few large facilities, and although GQs count imputation may have reduced error risks, error risks remain because of the inherent inaccuracies of the count imputation process. Because GQ imputation is a new procedure, necessitated by difficulties encountered in the GQs enumeration, there are concerns regarding its implications for substate counts. An error risk evaluation for areas of high GQs concentration is advisable, especially in Delaware and Mississippi, where GQs count imputation rates were markedly higher than in other states. Local administrative records could be used as a resource to check the utility of the imputation methods (e.g., the kind that would have been deployed by the Federal-State Cooperative on Population Estimates in the GQs count review that was unfortunately cancelled).

8. The release of population estimates for states as of April 2020 allowed for comparison with the state census counts used for reapportionment. Is there any evidence from the PSs that states where population counts differed the most from their population estimates had higher levels of risk?

The Population Estimates Program provides annual population estimates for geographic and demographic categories, including total population for states, cities, and towns. The most current population estimates available represent the resident population as of July 1, 2020, but the Census Bureau has issued estimates for April 1, 2020, for comparison with census counts by state. The population estimates have been carried forward from a 2010 census base and likely have levels of uncertainty that make comparisons with the 2020 census counts challenging at best. Nonetheless, comparisons of the most recent population estimates available for states with the corresponding 2020 census counts may prove useful. Of special interest are those states with large discrepancies, which may indicate inaccuracies in the 2020 census. What this analysis shows is that there is little if any relationship between the PSs and count-estimate discrepancies. The SPS has a correlation of just 0.17 with the estimates-count difference. Table 5 shows the top quintile of states on the two most dominant critical activities in this analysis: resolving multiple responses and MAF revisions. In both instances, the lists of top states on each PS present a disparate picture of differences between the population estimates and the census counts. There is no apparent pattern in the magnitudes of the relative differences for these top-ranked states. This suggests that the relative difference may not be a reliable indicator of census error risks.

a. Seven states (AK, MT, NJ, NM, NY, TX, UT) have SPSs exceeding 4, which suggests that these states have the highest risk for census error. These states have very different populations and range from mostly urban to mostly rural.
Therefore, it is surprising that they all rank in the top tier for the SPS, which indicates that the error risks apply to very diverse populations and not only to either densely or sparsely populated areas of the country.

b. The Census Bureau made many important changes to ensure that the 2020 census counted everyone once and only once, and in the right place. These changes were designed to both reduce costs and increase the likelihood of response. However, as is usually the case with new procedures, these changes also increased error risks. Non-ID returns and multiple responses markedly increased in 2020 from 2010 levels. The address list required an unprecedented number of revisions, each carrying some, albeit small, error risk. Administrative records were employed and, in many ways, greatly benefited census data collection during the pandemic. However, although they facilitated the counting process, administrative records increased uncertainty because their use can impart error. Unfortunately, the currently available data are insufficient for assessing the effect of these new approaches on coverage error.

c. Despite operating in one of the most challenging environments imaginable, the Census Bureau managed to successfully conduct the 2020 census. This is a considerable accomplishment and is testimony to the Bureau's laser focus on the execution of operations, many of which have stood the test of time and others of which were new. Because the census is a massive operational and engineering project, the Bureau must rely on data-driven decisions and well-tested methodologies. However, the lean fiscal environment that prevailed in the intercensal years preceding 2020 curtailed some of the research and testing that would have been prudent considering the dramatic shift in census methodologies that occurred. As a result, the amount of error introduced by the critical activities identified in this report remains a matter of speculation. Perhaps in 2030, the Bureau will be better prepared to advance quality declarations that take error into account for any new, innovative methodologies it plans to roll out.

d. Many elements of the census are impressive in their design and execution. However, our assessment has brought a newfound appreciation of the complex process of building a census address list from the MAF. Without a complete and accurate MAF, other innovations and processes that are part of the execution of the census may become compromised, similar to a building with a poor foundation. Investments in research and evaluation on the potential errors related to decisions affecting the MAF in the 2020 census would go a long way as the Bureau looks forward to 2030.

e. At this time, little can be said about the causes of the differential between the population estimates published prior to the 2020 census and the 2020 census counts. The risk measures in this report do not explain why the census counts either fell short of or exceeded expectations. However, this result was anticipated because even the Post-Enumeration Survey results have not been able to explain gaps between actual and expected counts in prior censuses. One reason is that the differential could be attributed as much to inaccuracies in the population estimates as to inaccuracies in the census counts. It is important not to place much credence on differences between the census counts and the population estimates as an indicator of census error. While such differences may provide some information about error risk, they are not a reliable measure of census error.
Original Values for the 10 Process Statistics

The authors acknowledge the support of the US Census Bureau, which provided the data and valuable information needed for this report. The authors also thank the American Statistical Association's 2020 Census Quality Indicators Task Force for their support and guidance.