key: cord-0224930-o4o5n9f4 authors: Choi, Yunseo; Unwin, James title: Racial Impact on Infections and Deaths due to COVID-19 in New York City date: 2020-07-09 sha: 38330462543f980e6e8641a0be0de09c9c987150 doc_id: 224930 cord_uid: o4o5n9f4

Redlining is the discriminatory practice whereby institutions avoided investment in certain neighborhoods because of their demographics. Here we explore the lasting impacts of redlining on the spread of COVID-19 in New York City (NYC). Using data available through the Home Mortgage Disclosure Act, we construct a redlining index for each NYC census tract via a multilevel logistic model. We compare this redlining index with the COVID-19 statistics for each NYC Zip Code Tabulation Area. Accurate mappings of the pandemic would aid the identification of the most vulnerable areas and permit the most effective allocation of medical resources, while reducing ethnic health disparities.

Systemic racial segregation has left many United States (US) citizens, especially black Americans, cloistered in adverse living conditions. Broadly, institutionalized racism encompasses policies, norms, and institutional practices (both intended and unintended) that result in racial disparity [1]. Historically, institutionalized racism has left nonwhite or racially mixed communities with inadequate housing, disinvestment, and relatively low employment rates [2]. Many health researchers hypothesize that such practices of institutionalized racism are to blame for health disparities between ethnic groups in the US at the individual and neighborhood levels [3]. Such health disparities are a particular concern during the current COVID-19 pandemic. Current efforts to quantify inequalities surrounding the COVID-19 pandemic in the US (see e.g. [4, 5]) rely on identifying the vulnerability of subgroups according to traditional CDC-defined risk factors such as old age and underlying conditions [6].
However, racial differences in the number of COVID-19 cases and deaths are so severe that traditional risk factors alone cannot fully explain them [7] [8] [9] [10] [11] [12] [13]. In this study, we show that in New York City (NYC) the demographics of a neighborhood can imply enhanced risk for its residents, and should be considered when measuring an individual's vulnerability to COVID-19 in addition to the traditional CDC-defined risk factors. While several studies use preexisting health surveys to arrive at their results, we use data from the actual spread of the disease in NYC to arrive at our conclusions. For other COVID-19 studies focused on NYC, see e.g. [10] [11] [12] [13] [14] [15] [16] [17]. Specifically, here we compare COVID-19 data to a "redlining" index that we construct for NYC. The term "redlining" refers to discriminatory practices in which banks historically avoided investments based on neighborhood demographics, thereby denying services to specific ethnic groups based on where they lived [18]. Historically, banks disproportionately denied mortgage applications from black Americans, barring them from entering more affluent, traditionally white communities. Such practices have been a real and significant detriment to black Americans. In health research, redlining and other forms of mortgage discrimination have been empirically linked to racial health disparities, as such practices confined black Americans to poor neighborhoods with lower standards of living. Limited access to nearby health care, poor air and water quality, and stress from high levels of crime and impoverishment mean that living standards are closely linked to health levels in the community [3]. This paper is structured as follows. In Section II, we outline the construction of a redlining index for each census tract.

* Corresponding author: unwin@uic.edu
Then in Section III, we discuss the COVID-19 statistics for NYC and compare these to the redlining index of Section II. In Section IV, we discuss certain limitations of our model and possible extensions, and in Section V, we highlight the significance of our findings.

To address public concern about mortgage discrimination, since 1975 the Federal Reserve Board has required financial institutions to release information about mortgage applicants and their applications under the Home Mortgage Disclosure Act (HMDA) [18]. These data are currently publicly available online [20]. However, only a few researchers to date have used the HMDA database in health research. These existing studies examined the impacts of redlining on long-term, noncommunicable diseases such as cancer, and on perinatal health [21] [22] [23]. Another study [24] explored the effects of redlining on access to medical resources. These studies concluded that redlining has a statistically significant influence, increasing the rates of noncommunicable diseases and decreasing access to healthcare. Here we examine the relationship between redlining and COVID-19 infections and outcomes; in doing so, we also present the first study of the impact of redlining on the spread of communicable diseases.

To construct a redlining index, we follow a method similar to that of [21] [22] [23] and use the publicly available HMDA data sets for the years 2013-2017 [20]. These data sets report information about each applicant, such as ethnicity, income, loan amount, and sex, as well as information about the application, including the purpose of the mortgage and the property type. The smallest neighborhood unit reported in the HMDA data set is the census tract.
Since we are interested in the health disparities between black and white ethnic groups, we excluded primary applicants who did not identify as black or white. We also excluded applications for multifamily housing or home-improvement purposes, as well as incomplete and withdrawn applications. After this filtering, a total of 208,960 applications remained across 2095 census tracts within the five-year span 2013-2017. We then geocoded the census tracts into Zip Code Tabulation Areas (ZCTA) using the Census Bureau's Relationship File [25].

Using the HMDA data, we constructed a redlining index with a multilevel logistic model and then evaluated it on each census tract in NYC. The main predictor in the model was the ethnicity of the primary applicant. The outcome was the log-odds of the probability of mortgage acceptance $p_{ij}$, where $j$ indexes census tracts and $i$ indexes individuals within census tract $j$. Two covariates were used, based on variables shown to be influential in previous studies [21] [22] [23]: the applicant's sex and the ratio of the requested loan amount to the applicant's income. The index was computed from the two-level equations

$$\log\left(\frac{p_{ij}}{1-p_{ij}}\right) = \beta_{0j} + \beta_{1j} r_{ij} + \beta_{2j} s_{ij} + \beta_{3j} l_{ij} \quad \text{(level 1)}$$
$$\beta_{kj} = \gamma_{k0} + u_{kj}, \quad k \in \{0, 1, 2, 3\} \quad \text{(level 2)}$$

where $r_{ij}$ and $s_{ij}$ are the ethnicity and sex of applicant $i$ in census tract $j$ (with $r_{ij} = 1$ for white and $r_{ij} = 0$ for black; $s_{ij} = 1$ for male and $s_{ij} = 0$ for female), and $l_{ij}$ is the loan-to-income ratio of the applicant. At level 2, each coefficient $\beta_{kj}$ is identified with a fixed effect $\gamma_{k0}$, the coefficient that best fits all of the data points, plus a variation between census tracts captured by $u_{kj}$, whose value is chosen so that it best fits the data points within census tract $j$. Notably, $\beta_{1j}$, which tracks the ethnicity of the applicants, provides a measure of the black-to-white difference in mortgage acceptance for census tract $j$.
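As a minimal sketch of how a two-level model of this form is evaluated (not fitted), the Python below assumes level 1 is log(p/(1-p)) = b0j + b1j*r + b2j*s + b3j*l and level 2 is b_kj = g_k0 + u_kj, and plugs in hypothetical coefficient values. The numbers in `GAMMA` and `U_J`, the function names, and the example applicant are illustrative assumptions, not estimates from the HMDA data; actually fitting the model would require a mixed-effects package. The sketch also checks that e^(b1j) equals the white-to-black odds ratio of acceptance for two applicants with the same sex and loan-to-income ratio, which is in general larger than the ratio of acceptance probabilities.

```python
import math

# Hypothetical fixed effects g_k0 (k = 0..3: intercept, ethnicity, sex,
# loan-to-income ratio). Illustrative values only, NOT fitted HMDA estimates.
GAMMA = (0.5, 0.7, 0.05, -0.3)

# Hypothetical tract-level deviations u_kj for one tract j. The sex term
# u_2j is held at 0, mirroring a model without a random sex effect.
U_J = (0.1, -0.05, 0.0, 0.02)


def tract_coefficients(gamma, u):
    """Level 2: beta_kj = gamma_k0 + u_kj for k = 0..3."""
    return tuple(g + uk for g, uk in zip(gamma, u))


def log_odds(beta, r, s, l):
    """Level 1: log(p / (1 - p)) = b0 + b1*r + b2*s + b3*l.

    r = 1 for white, 0 for black; s = 1 for male, 0 for female;
    l = loan-to-income ratio.
    """
    b0, b1, b2, b3 = beta
    return b0 + b1 * r + b2 * s + b3 * l


def inv_logit(eta):
    """Convert log-odds back to a probability of acceptance."""
    return 1.0 / (1.0 + math.exp(-eta))


beta_j = tract_coefficients(GAMMA, U_J)

# Index for tract j, and its split into a global part and a tract part.
R_j = math.exp(beta_j[1])
R_F = math.exp(GAMMA[1])
R_from_parts = R_F * math.exp(U_J[1])

# Two applicants identical except for ethnicity (male, loan-to-income 3.0).
p_white = inv_logit(log_odds(beta_j, r=1, s=1, l=3.0))
p_black = inv_logit(log_odds(beta_j, r=0, s=1, l=3.0))

odds_ratio = (p_white / (1 - p_white)) / (p_black / (1 - p_black))
prob_ratio = p_white / p_black

print(f"R_j = {R_j:.4f}, R_F * e^u1j = {R_from_parts:.4f}")
print(f"odds ratio = {odds_ratio:.4f}, probability ratio = {prob_ratio:.4f}")
```

Because the sex and loan-to-income terms cancel when the two applicants differ only in ethnicity, the odds ratio reduces exactly to e^(b1j) regardless of the covariate values chosen.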
Each of the four tract-level variations $u_{kj}$ (with $k \in \{0, 1, 2, 3\}$) was tested for whether it improved the fit in terms of the $\chi^2$ statistic. Allowing variation due to the sex of the applicant, $u_{2j}$, was shown not to improve the fit and was therefore excluded from the final model; however, the fixed effect for the sex of the applicant, $\gamma_{20}$, was retained. From fits of the logistic model to the HMDA data, we constructed the redlining index $R = e^{\beta_{1j}}$, which places each census tract on a continuous scale of mortgage loan discrimination. In addition, one can identify the global component of the redlining index, $R_F = e^{\gamma_{10}}$, such that $R = R_F\, e^{u_{1j}}$. For the five-year dataset analysed, the index $R$ took values in the range 1.70 to 2.48 over the 177 NYC ZCTA. For reference, $R = 2.0$ implies that, in a given census tract, the odds of mortgage acceptance for a white applicant are twice those for a black applicant (adjusting for sex and loan-to-income ratio). An average of 99.7 applications were considered per census tract, and an average of 1180.6 applications per ZCTA. The percentage of mortgage applications denied in NYC over 2013-2017 ranged from 19.6% to 26.7%.

After the redlining indices were calculated for each census tract, we geocoded the census tracts into ZCTA. We then weighted each census tract by its population and calculated the redlining index for each ZCTA. The results are illustrated in Figure 2. Higher index scores indicate predominantly white, more affluent areas. The neighborhoods with the highest indices were Upper West (2.33) and Upper East (2.31), and those with the lowest were Rockaways (1.86) and Southeast Bronx (1.88).

COVID-19 is an infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The pathogen was first identified in Wuhan, China in December 2019 and rapidly led to a worldwide pandemic, whose impact was particularly pronounced in the US.
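Returning to the ZCTA-level redlining index described above, the population weighting that takes tract indices to ZCTA can be sketched as a weighted mean. This is a minimal Python sketch under stated assumptions: each input row is a hypothetical (ZCTA, tract index, population) triple, where a tract split across several ZCTA would contribute one row per piece with that piece's population; the labels "A" and "B" and all numbers are placeholders, not values from the study. The last lines re-derive the reported per-tract and per-ZCTA application averages from the totals given in the text (208,960 applications, 2095 tracts, 177 ZCTA).

```python
from collections import defaultdict


def zcta_redlining_index(rows):
    """Population-weighted mean of tract-level indices R within each ZCTA.

    rows: iterable of (zcta, tract_R, population) tuples. This flat schema is
    a hypothetical stand-in for joining tract indices to a tract-to-ZCTA
    relationship file.
    """
    weighted_sum = defaultdict(float)
    population = defaultdict(float)
    for zcta, tract_r, pop in rows:
        weighted_sum[zcta] += tract_r * pop
        population[zcta] += pop
    # Skip any ZCTA whose pieces have zero total population.
    return {z: weighted_sum[z] / population[z]
            for z in weighted_sum if population[z] > 0}


# Placeholder data: two tracts in ZCTA "A", one tract in ZCTA "B".
rows = [
    ("A", 2.40, 4000),
    ("A", 2.20, 1000),
    ("B", 1.90, 3000),
]
index_by_zcta = zcta_redlining_index(rows)
print(index_by_zcta)

# Consistency check of the averages quoted in the text.
applications, tracts, zctas = 208_960, 2095, 177
print(round(applications / tracts, 1), round(applications / zctas, 1))
```

Weighting by population rather than by number of mortgage applications is what the text describes; swapping in application counts as weights would be a one-argument change to the rows.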
Our analysis focuses on the spread of COVID-19 in NYC, which was an early epicentre of the pandemic in 2020. Data on the spread of COVID-19 were retrieved from NYC's official website [26], which was updated daily. Tests and cases with unknown ZCTA were excluded from our analysis; as of 5/26/2020, the ZCTA could not be identified for 3.1% of all positive tests and for 1.1% of all tests. We assigned ZCTA to neighborhoods using the 'Zip Code Definitions of NYC Neighborhoods' [27].

In Figures 3-5 we illustrate the variation in the number of confirmed cases, the percentage of positive tests, and the total deaths resulting from COVID-19 as of June 30 2020. The predominantly white neighborhoods of Greenwich Village/Soho reported 0.89 cases per 100 residents (phr), and Lower Manhattan had 0.078 phr, the lowest infection numbers in NYC. By contrast, West Queens and Rockaways had among the highest numbers of cases (3.52 and 3.51 phr, respectively), and both also had very low redlining scores (R = 1.91 and 1.86). Similar statements hold for the proportion of positive tests. Moreover, Greenwich Village/Soho had the fewest COVID-19 deaths (0.065 phr), whilst Rockaways, which is predominantly Black/Latino, reported the most deaths (0.46 phr).

Using the redlining index constructed in Section II, we compute the Pearson correlation coefficient [28] between the redlining index R and three COVID-19 data sets:
• The number of confirmed COVID-19 cases, 'cases'.
• The percentage of positive tests, '%+test'.
• The number of COVID-19 deaths.
Specifically, we computed the correlation coefficient over five-day periods from April 1st 2020 until June 30th 2020. Over this 90-day period, the cumulative number of cases increased from 73,533 to 252,585, the cumulative number of tests rose from 127,550 to 1,691,978, and the number of deaths rose sharply from 1374 to 18,492.
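The correlation step above can be sketched in a few lines of Python: a plain Pearson coefficient between ZCTA-level R and one COVID-19 measure, the 19 window start dates from April 1 to June 30 2020 in 5-day steps, and an approximate two-sided p-value. The p-value method is an assumption, since the text does not state how p-values were obtained: it uses the standard statistic t = r * sqrt((n - 2) / (1 - r^2)) with a normal approximation to Student's t, which is reasonable for n on the order of the 177 ZCTA. The function names are hypothetical, and n = 177 is assumed to be the number of ZCTA entering each correlation.

```python
import math
from datetime import date, timedelta


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_two_sided_p(r, n):
    """Approximate p-value for H0: rho = 0.

    Uses t = |r| * sqrt((n - 2) / (1 - r^2)) and treats t as standard normal,
    an approximation to Student's t with n - 2 degrees of freedom.
    """
    if abs(r) >= 1.0:
        return 0.0
    t = abs(r) * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(t / math.sqrt(2.0))  # equals 2 * (1 - Phi(t))


# Start dates of the 5-day windows, April 1 to June 30 2020 inclusive.
window_starts = [date(2020, 4, 1) + timedelta(days=5 * k) for k in range(19)]

# Toy example: an exactly decreasing linear relationship gives r = -1.
r_toy = pearson([1.7, 1.9, 2.1, 2.3], [3.5, 2.5, 1.5, 0.5])

# Significance of two coefficients from Table II, assuming n = 177 ZCTA.
p_early = approx_two_sided_p(-0.22, 177)
p_later = approx_two_sided_p(-0.36, 177)
print(window_starts[0], window_starts[-1], len(window_starts))
print(round(r_toy, 6), p_early, p_later)
```

Under these assumptions, r = -0.22 lands between the p < 0.01 and p < 0.001 thresholds while r = -0.36 is well below p < 0.001, which is consistent with the single and double asterisks reported for the first two dates in Table II.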
To gain some intuition, for each five-day period over which we calculated the correlation coefficient, we mapped each ZCTA to a point on scatterplots of R versus cases, percentage of positive tests, and deaths. We show one set of plots, for the 15th to 20th of May 2020, in Figure 6. Moreover, Figure 7 shows the evolution of the correlation coefficients over time (a table of the coefficients, along with the associated p-values, is given in the appendix). Inspecting Figure 7, we note that while the correlations started out relatively weak, they all strengthened substantially over time and then stabilized. The redlining index shows a significant negative correlation with all three COVID-19 data sets. This suggests that containing the disease is harder in redlined neighborhoods, likely due to a lack of sufficient medical resources. It may also suggest that fewer individuals tend to seek medical care, and perhaps that more individuals have to work and thus are at risk of infection. Regardless of the reason, the results emphasize the need for more medical resources in redlined areas.

Since NYC reported COVID-19 data only for each ZCTA, our analysis worked at the ZCTA level. Had COVID-19 data been released at the census-tract level, a more detailed analysis could have been conducted; such a fine-grained analysis would be useful for identifying sub-pockets of vulnerable individuals. Furthermore, while the HMDA data significantly increase the transparency of mortgage discrimination, potentially critical information such as the applicant's employment status, debt, and credit score was not reported, so these factors could not be included. Notably, a previous study of the 1993-1999 HMDA data set [29] observed that an application from a black applicant was more likely to have missing ethnicity information than one from a white applicant.
The study thus concluded that the mortgage discrimination visible in the HMDA database underestimates the true severity of the problem. This may imply that redlining has an even larger impact on racial health disparities than found in our analysis, which would lead to even stronger correlations than those reported in Figure 7. Moreover, although we focused on black and white ethnic groups in this study, a potential future research direction would be to examine the impacts of residential segregation on the spread of the pandemic among other ethnic groups. In particular, previous studies have concluded that the health of Hispanic residents of Milwaukee [23] and of Chinese Americans in Los Angeles [24] is affected by residential segregation.

Our analysis quantifies the impact of the COVID-19 pandemic on black Americans, a subgroup that has previously been shown to be disproportionately affected by the pandemic [7] [8] [9] [10] [11] [12] [13]. As of June 2020, the age-adjusted rate of confirmed COVID-19 cases among black NYC residents was roughly 60% higher than that of the white population [26], and the number of COVID-19-related deaths was double for black Americans compared to white Americans. The risk factors identified by the CDC [6] (old age and various underlying conditions) are not enough on their own to explain such a disparity. This naturally raises the question of whether medical resources were distributed equally among neighborhoods, or whether certain subgroups are more or less likely to reach out for medical assistance. This study has endeavored to address these apparent health disparities through the lens of historical residential segregation. Moreover, this work contributes to measuring the lasting impacts of institutionalized racism on the spread of communicable diseases (taking COVID-19 as a prime example).
While the medical literature is clear that environmental factors influence health, very few studies have quantified residential segregation and measured its relationship with racial health disparities, and those that have done so examined only noncommunicable diseases [21] [22] [23] [24]. Notably, the reasons why neighborhood-level factors should influence health (such as stress and scarce medical resources) apply similarly to communicable and noncommunicable diseases. This work aims to reduce racial health disparities that stem from the lasting impacts of institutionalized racism, particularly during a pandemic, when such disparities are amplified. As demonstrated by the present case study of NYC, we suggest that such index-based analyses may help predict the vulnerability of subgroups in other cities that COVID-19 has yet to hit, and help prepare for future pandemics. Accurate mappings of this pandemic allow us to predict the spread of communicable diseases and identify the most vulnerable subgroups. This information should be acted upon to allocate medical resources more appropriately in the future and to support the communities and neighborhoods most in need. Ultimately, an accurate model of the spread of COVID-19 can help minimize the lasting impacts of institutionalized racism and ensure that ethnicity does not determine access to good healthcare. In the long run, quantitative analyses such as the one presented here can guide policies that aid in reducing health disparities in the post-COVID-19 era.

Acknowledgements. This research was undertaken as part of the MIT-PRIMES program.

This appendix provides a tabulation in Table II of the Pearson correlation coefficients calculated in Section III. These tabulated results are presented graphically in Figure 7 of the main text. Note that deaths were only recorded for each ZCTA from May 18th. The table also indicates the associated p-values for each time period.
Observe that for the earliest date (4/1/2020) the correlation with cases was of more marginal significance (p < 0.01), whereas all other correlations, for every quantity and date, were pronounced, with p < 0.001.

Date   Cases      %+tests    Deaths
4/1    -0.22 *    -0.46 **   -
4/6    -0.36 **   -0.52 **   -
4/11   -0.42 **   -0.53 **   -
4/16   -0.47 **   -0.54 **   -
4/21   -0.49 **   -0.53 **   -
4/26   -0.49 **   -0.54 **   -
5/1    -0.53 **   -0.57 **   -
5/6    -0.53 **   -0.60 **   -
5/11   -0.54 **   -0.61 **   -
5/16   -0.54 **   -0.64 **   -
5/21   -0.54 **   -0.64 **   -0.43 **
5/26   -0.53 **   -0.65 **   -0.43 **
5/31   -0.53 **   -0.65 **   -0.43 **
6/5    -0.53 **   -0.65 **   -0.43 **
6/10   -0.53 **   -0.65 **   -0.42 **
6/15   -0.53 **   -0.65 **   -0.42 **
6/20   -0.53 **   -0.65 **   -0.43 **
6/25   -0.53 **   -0.65 **   -0.43 **
6/30   -0.53 **   -0.64 **   -0.44 **
TABLE II. Pearson correlation coefficients between the redlining index R and each COVID-19 data set. * p-value < 0.01; ** p-value < 0.001.

References
[1] Institutional racism in mental health care
[2] Redlining To Reinvestment
[3] The Relationship Between Perceived Racism/Discrimination and Health Among Black American Women: A Review of the Literature From
[4] Hospitalization Rates and Characteristics of Patients Hospitalized with Laboratory-Confirmed Coronavirus Disease 2019 - COVID-NET
[5] Disparities in the Population at Risk of Severe Illness From COVID-19 by Race/Ethnicity and Income
[6] People who are at Higher Risk
[7] COVID-19 and African Americans
[8] The COVID-19 Pandemic: a Call to Action to Identify and Address Racial and Ethnic Disparities
[9] COVID-19 and racial/ethnic disparities
[10] Disparities in mobility responses to COVID-19
[11] COVID-19: Testing Inequality in New York City, No. w27019
[12] The Determinants of the Differential Exposure to COVID-19 in New York City and Their Evolution Over Time
[13] An Unsupervised Machine Learning Approach to Assess the ZIP Code Level Impact of COVID-19 in NYC
[14] Factors associated with hospitalization and critical illness among 4,103 patients with COVID-19 disease
[15] Variation in COVID-19 hospitalizations and deaths across New York City boroughs
[16] Demographic determinants of testing incidence and COVID-19 infections in New York City neighborhoods, No. w26952, National Bureau of Economic Research
[17] Hospitalization dynamics during the first COVID-19 pandemic wave: SIR modeling compared to Belgium
[18] The Home Mortgage Disclosure Act: A Synopsis and Recent Legislative History
[20] Federal Financial Institutions Examination Council, Home Mortgage Disclosure Act
[21] New spatially continuous indices of redlining and racial bias in mortgage lending: links to survival after breast cancer diagnosis and implications for health disparities research
[22] Institutional racism and pregnancy health: using Home Mortgage Disclosure act data to develop an index for Mortgage discrimination at the community level
[23] Housing Discrimination, Residential Racial Segregation, and Colorectal Cancer Survival in Southeastern Wisconsin
[24] A Multilevel Analysis of the Relationship Between Institutional and Individual Racial Discrimination and Health Status
[25] Relationship Files
[26] COVID-19: Data
[28] Notes on Regression and Inheritance in the Case of Two Parents
[29] Missing Race Data in HMDA and the Implications for the Monitoring of Fair Lending Compliance