key: cord-0985241-88ewyslk
authors: Kim, Byoungjun; Rundle, Andrew G.; Goodwin, Alicia T.Singham; Morrison, Christopher N.; Branas, Charles C.; El-Sadr, Wafaa; Duncan, Dustin T.
title: COVID-19 testing, case, and death rates and spatial socio-demographics in New York City: An ecological analysis as of June 2020
date: 2021-02-19
journal: Health Place
DOI: 10.1016/j.healthplace.2021.102539
sha: 6158a5db11ff260fd42d08ad229eb054b02b47af
doc_id: 985241
cord_uid: 88ewyslk

We assessed the geographic variation in socio-demographics, mobility, and built environmental factors in relation to COVID-19 testing, case, and death rates in New York City (NYC). COVID-19 rates (as of June 10, 2020), relevant socio-demographic information, and built environment characteristics were aggregated by ZIP Code Tabulation Area (ZCTA). Spatially adjusted multivariable regression models were fitted to account for spatial autocorrelation. The results show that different sets of neighborhood characteristics were independently associated with COVID-19 testing, case, and death rates. For example, the proportions of Blacks and Hispanics in a ZCTA were positively associated with COVID-19 case rate. Contrary to the conventional hypothesis, neighborhoods with low-density housing experienced higher COVID-19 case rates. In addition, demographic changes (e.g. out-migration) during the pandemic may bias the estimates of COVID-19 rates. Future research should further investigate these neighborhood-level factors and their interactions over time to better understand the mechanisms by which they affect COVID-19.

After the first detected COVID-19 case in New York City (NYC) on March 1, 2020, the city rapidly became the first epicenter of disease in the United States. At its peak in April 2020, the NYC Department of Health and Mental Hygiene (NYCDOHMH) reported 15 days with over 6000 new confirmed cases of COVID-19 and over 500 deaths (New York City Department of Health and Mental Hygiene, 2020a). There is emerging evidence that marginalized (i.e., low-income) and vulnerable (i.e., racial and ethnic minority) populations are disproportionately affected by COVID-19 (Webb Hooper et al., 2020; Alsan et al., 2020). Black and Hispanic/Latinx people in the U.S. experienced higher case rates and death rates compared to Whites, and there was a clear socio-demographic gradient in COVID-19 infection by income and poverty (New York University Furman Center, 2020; Price-Haywood et al., 2020; Raifman and Raifman, 2020). Occupational characteristics were associated with risk for the disease as well as secondary transmissions; e.g., workers in the healthcare sector and other essential service occupations were at higher risk for infection, due to frequent interactions with possibly infected individuals and being in close quarters with other workers for extended periods of time (Baker et al., 2020). In addition to individual-level characteristics, environmental factors in urban contexts such as urban design, housing density, and transportation systems can affect the transmission of infectious diseases (Harlem, 2020). Highly populated neighborhoods and multi-family housing structures tend to increase person-to-person contacts, which in turn can exacerbate community transmission (Rocklov and Sjodin, 2020; Ghinai et al., 2020). The frequent use of public transportation systems has also been noted as a potential risk factor for COVID-19 in urban areas (Zheng et al., 2020).

Recent analytic studies looking at associations between neighborhood characteristics and the geographic distribution of COVID-19 have focused on socio-demographic factors, but have not explored the contribution of environmental factors. In addition, as SARS-CoV-2 testing and COVID-19 case/death rates show geographical clustering with high spatial autocorrelation (Kang et al., 2020), it is critical to properly adjust for such spatial similarity in the modeling stage. Lastly, with the advent of COVID-19, the socio-demographic landscapes in NYC changed due to residents moving out of NYC in response to the pandemic. On average, 5% of NYC residents left the city between March 1 and May 1, but the proportion of residents who left the city was socially patterned and varied substantially across neighborhoods (Quealy, 2020). The COVID-19 rates provided by NYCDOHMH are based on estimated populations from census data before the pandemic; thus, the SARS-CoV-2 testing and COVID-19 case/death rates are likely to be biased estimates. To the best of our knowledge, none of the recent studies on socio-demographic predictors of COVID-19 cases, testing, or hospitalizations have incorporated data on NYC's population changes across neighborhoods, and few have accounted for spatial dependencies that could bias estimates. In this study, we examine the geographic variation in socio-demographic characteristics, migration patterns, mobility, and built environmental factors in NYC in relation to COVID-19 rates, using spatial analytic methods to address potential issues of spatial autocorrelation.

COVID-19 statistics for testing, positive cases, and death counts of New York City residents by residential ZCTA (ZIP Code Tabulation Area) were obtained from the New York City Department of Health and Mental Hygiene on June 10, 2020 (New York City Department of Health and Mental Hygiene, 2020c). The NYCDOHMH reports data using modified ZCTAs, with certain boundaries modified to combine areas with small or no populations to allow for stable estimates of COVID-19 rates. After combining 34 such ZCTAs, there were a total of 177 modified ZCTAs (referred to simply as ZCTAs from now on) with valid COVID-19 data included in this analysis. The three outcomes (number of SARS-CoV-2 tests, COVID-19 cases, and deaths) were normalized by the population of their ZCTAs. The total population by ZCTA was obtained from the 2018 American Community Survey 5-year estimates. As our method for spatial regression analysis can only account for areas with physically touching neighbors, one ZCTA with no neighboring areas (10044: Roosevelt Island) was excluded from the analysis. Another island in NYC, Rikers Island, is included in the analysis as it is incorporated into ZCTA 11370, which contains land in Queens (Astoria Heights) that shares boundaries with neighboring ZCTAs.
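The normalization described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name and the example counts are hypothetical, and the per-100,000 scale is taken from the rates reported later in the paper.

```python
def rate_per_100k(count: int, population: int) -> float:
    """Return events per 100,000 residents for one ZCTA."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


# Hypothetical ZCTA: 1,200 confirmed cases among 50,000 residents.
example_rate = rate_per_100k(1200, 50_000)
```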

Pre-pandemic socio-demographic characteristics were calculated by ZCTA from the 2018 American Community Survey 5-year estimates, including age, sex, race/ethnicity, median income, household size, occupation, and commuting characteristics (U.S. Census Bureau, 2020). Specifically, variables included in the present analysis as predictors of COVID-19 outcomes were: male-to-female ratio (number of males/number of females); percentages of the population under 18 (used as reference), 18-44, 45-64, 65-74, and over 75 years; percentages of non-Hispanic White (used as reference), Black, Asian, other, and Hispanic populations; median household income; average household size; percentages involved in essential service occupations (firefighting, law enforcement, building and grounds cleaning and maintenance, food preparation and serving related, and personal care) and health-related occupations (healthcare practitioners and technical occupations, healthcare support services); and percentage commuting via public transit. The socio-demographic variables were estimates from 2014 to 2018 surveys, and therefore the data may not fully reflect characteristics during the pandemic. Socio-demographic variables expressed as proportions were re-scaled such that a 1-unit change reflected the inter-quartile range (IQR), that is, the difference between the 75th and 25th percentile ZCTA.
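A minimal sketch of the IQR re-scaling, assuming it amounts to dividing each ZCTA's value by the sample IQR (the text does not say whether variables were also centered). The function name, the quartile interpolation method, and the example percentages are assumptions made for illustration.

```python
from statistics import quantiles


def rescale_by_iqr(values):
    """Divide each value by the sample inter-quartile range (Q3 - Q1)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("IQR is zero; the variable cannot be re-scaled")
    return [v / iqr for v in values]


# Hypothetical percentages of transit commuters in five ZCTAs.
pct_transit = [10.0, 20.0, 30.0, 40.0, 50.0]
scaled = rescale_by_iqr(pct_transit)
```

After re-scaling, a regression coefficient reads as the expected difference in the outcome between a ZCTA at the 75th percentile and one at the 25th percentile of that variable.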

A zoning map for NYC was obtained from the NYC Department of City Planning (New York City Department of City Planning, 2020), and each ZCTA's percentages of land assigned to low- and high-density residential zoning were calculated using Quantum GIS v3.10 (QGIS Development Team, Open Source Geospatial Foundation Project). Residential zones R1-R5 are classified as low-density zones, predominantly consisting of detached or semi-detached single- and two-family housing. Zones R6-R10 are classified as higher-density residential zones, and allow for high-rise multifamily housing (New York City Department of City Planning, 2018).

Lastly, in order to account for residents moving out of the city during the COVID-19 pandemic, a dataset from cellular phone usage was utilized as a proxy measure of population changes. The data from cellular phone towers capture the mobility and migration patterns of a wide range of residents, as the towers interact with all types of cellular devices even when those devices are in stand-by and calls are not in progress. More than 1 million cellular devices that interacted with cellular towers in NYC were analyzed, and the percentage change in registered cellular phone signals between March 1, 2020 and May 1, 2020 was aggregated into census tracts by Teralytics Inc. (New York, NY). These data were provided to our research team by The New York Times (Quealy, 2020). The census-tract-level data were converted to ZCTAs with the help of Crosswalk Files provided by the U.S. Department of Housing and Urban Development. For ZCTAs containing area from multiple census tracts, we calculated a weighted average of the component census tracts' population change rates. Weights were determined by calculating the proportion of residential addresses in a ZCTA contained within each given census tract (U.S. Department of Housing and Urban Development). The resulting ZCTA-level percent decrease in residential population was conceptualized as an "out-migration" index, and this metric was included in our analyses as a covariate.
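The tract-to-ZCTA conversion can be illustrated with a short sketch. The tract identifiers, change values, and address shares below are hypothetical, and flipping the sign so that a population decrease gives a positive index is an assumption about the convention, not something the text states.

```python
def zcta_percent_change(tract_changes, address_shares):
    """Weighted mean of tract-level % change in phone signals.

    Each weight is the share of the ZCTA's residential addresses that
    fall inside the given census tract.
    """
    total_weight = sum(address_shares.values())
    weighted_sum = sum(tract_changes[t] * w for t, w in address_shares.items())
    return weighted_sum / total_weight


# Hypothetical ZCTA overlapping two census tracts.
tract_changes = {"tract_A": -10.0, "tract_B": -2.0}   # % change, March 1 to May 1
address_shares = {"tract_A": 0.25, "tract_B": 0.75}   # shares of residential addresses

change = zcta_percent_change(tract_changes, address_shares)
out_migration_index = -change  # assumed sign: percent decrease is positive
```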

Spatial autocorrelation indicates geographical interdependencies among observations in data. When spatial autocorrelation is detected, the major assumptions of uncorrelated error terms and independence of observations are violated. This can lead to biased parameter estimates, necessitating adjustment for spatial clustering (LeSage and Pace, 2009; Ward and Gleditsch, 2019). Therefore, we first tested for spatial autocorrelation of all variables using the Global Moran's I. In this study, a row-standardized binary contiguity spatial weight matrix with first-order queen's criteria was employed, which is a conventional spatial matrix for areal data (Haining, 1991). A pseudo p-value of the Global Moran's I for each variable was estimated from a Monte Carlo simulation of 999 random iterations. Second, spatially adjusted Spearman correlations were tested to evaluate bivariate correlations between study variables and COVID-19 rates based on a spatial adjustment method proposed by Clifford and Richardson (Clifford and Richardson, 1985; Duncan et al., 2011).
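To make the Global Moran's I and its permutation-based pseudo p-value concrete, here is a minimal pure-Python sketch. It is not the authors' R code: the neighbor lists describe a hypothetical chain of six areas rather than queen contiguity among real ZCTAs, and the one-sided (upper-tail) p-value is an assumed convention.

```python
import random


def morans_i(values, neighbors):
    """Global Moran's I with row-standardized binary contiguity weights."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    # Row standardization: each neighbor of area i gets weight 1/len(neighbors[i]).
    cross = sum(
        z[i] * sum(z[j] for j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    )
    s0 = float(n)  # every row of the weight matrix sums to 1
    return (n / s0) * cross / sum(zi * zi for zi in z)


def pseudo_p_value(values, neighbors, n_perm=999, seed=0):
    """Upper-tail pseudo p-value from random permutations of the values."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbors)
    shuffled = list(values)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbors) >= observed:
            at_least_as_large += 1
    return observed, (at_least_as_large + 1) / (n_perm + 1)


# Hypothetical chain of six areas with a smooth spatial trend.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
rates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
observed_i, p_value = pseudo_p_value(rates, neighbors)
```

With 999 permutations the smallest attainable pseudo p-value is 0.001, which is why results of this kind are usually reported as thresholds rather than exact values.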

After evaluating spatial autocorrelation for each variable of interest, we tested regression models for each exposure variable after adjusting for both the spatial autocorrelation and the out-migration index. This set of regression models provides unbiased crude associations between neighborhood-level factors and COVID-19 rates, adjusting for the potential confounding due to out-migration. Lastly, a set of spatially adjusted multivariable models with all neighborhood characteristic variables was specified. Because the neighborhood-level socio-demographic and built-environmental factors are interconnected (Leal et al., 2012), we tested variance inflation factors (VIF) for each variable to check the degree of multicollinearity (Song et al., 2017). We employed a cut-off point of 10, considering the sample size of the analysis and underlying correlations between socio-demographic characteristics (Craney and Surles, 2002).
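The VIF check can be sketched without external libraries by using the fact that the VIFs equal the diagonal of the inverse of the predictors' correlation matrix. The helper names and the two example predictors are hypothetical, and this is an illustration rather than the authors' implementation.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long columns."""
    centered = [[v - mean(col) for v in col] for col in columns]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    norms = [dot(c, c) ** 0.5 for c in centered]
    k = len(columns)
    return [
        [dot(centered[i], centered[j]) / (norms[i] * norms[j]) for j in range(k)]
        for i in range(k)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Two hypothetical predictors with correlation 0.8.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 3.0, 2.0, 4.0]
vifs = vif([x1, x2])
flagged = [j for j, v in enumerate(vifs) if v > 10]  # cut-off of 10, as in the text
```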

In the model with COVID-19 case rate as the outcome, we included testing rate as a covariate because the case rate is associated with the number of tests conducted in a ZCTA. Likewise, the case rate was included as a covariate in the model with COVID-19 death rate as the outcome. The multivariable models were adjusted for spatial autocorrelation contingent on Lagrange Multiplier (LM) test results. The LM tests evaluated spatial error and lag dependence from non-spatial ordinary least squares (OLS) models (LeSage and Pace, 2009). Based on the test results, spatial error models or spatial lag models were developed to account for spatial autocorrelation (LeSage and Pace, 2009; Ward and Gleditsch, 2019). The spatial error model equation can be represented as follows:

$$y = X\beta + u, \qquad u = \lambda W u + \mu$$

where y is the dependent variable; β is the vector of regression parameters associated with the matrix of observations on the covariates X; u is the spatially autocorrelated error term; λ is the spatial autoregressive coefficient that indicates the extent to which the spatial components of the errors are correlated with each other; W is the given spatial weight matrix; and μ is an independent error term. The spatial lag model considered is given by:

$$y = \rho W y + X\beta + \varepsilon$$

where y is the dependent variable; ρ is the spatial autoregressive coefficient for the lagged dependent variable matrix Wy (W is the given spatial weight matrix); β is the vector of coefficients of regression parameters associated with the independent variable matrix X; and ε is an error term that is assumed to be independent and identically distributed.

In the spatial lag model, therefore, spatial autocorrelation is introduced in the form of the spatially dependent variable, as the outcomes in one place predict an increased likelihood of similar outcomes in neighboring places.
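The spillover implied by the spatial lag model can be illustrated numerically. Its reduced form is y = (I − ρW)⁻¹(Xβ + ε), so a change in one area propagates to its neighbors. The sketch below solves the model by fixed-point iteration for a hypothetical chain of four areas; the neighbor lists, ρ = 0.5, and the input vector are illustrative assumptions, not estimates from this study.

```python
def spatial_lag_outcome(xb, neighbors, rho, tol=1e-12, max_iter=10_000):
    """Solve y = rho * W y + xb, with W row-standardized, by iteration.

    `xb` stands for X beta + epsilon. The iteration converges when
    |rho| < 1 because each row of W sums to 1.
    """
    y = list(xb)
    for _ in range(max_iter):
        new_y = [
            rho * sum(y[j] for j in neighbors[i]) / len(neighbors[i]) + xb[i]
            for i in range(len(xb))
        ]
        if max(abs(a - b) for a, b in zip(new_y, y)) < tol:
            return new_y
        y = new_y
    raise RuntimeError("iteration did not converge")


# Hypothetical chain of four areas; only area 0 has a nonzero input.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
xb = [1.0, 0.0, 0.0, 0.0]
y = spatial_lag_outcome(xb, neighbors, rho=0.5)
```

Although only area 0 has a nonzero input, all four areas end up with positive outcomes that decay with distance from area 0, which is the dependence the spatial lag term is meant to capture.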

The Akaike Information Criterion was examined for goodness-of-fit (Bozdogan, 1987) . All statistical analyses were conducted in R statistical software version.
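As a brief illustration of how the AIC supports model comparison, the sketch below applies the standard definition AIC = 2k − 2 ln L, where k is the number of estimated parameters and L the maximized likelihood. The log-likelihoods and parameter counts are hypothetical and do not come from this study.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion; lower values indicate better fit."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits of the same outcome.
aic_ols = aic(log_likelihood=-1450.0, n_params=12)  # non-spatial OLS
aic_lag = aic(log_likelihood=-1430.0, n_params=13)  # spatial lag model adds rho
preferred = "spatial lag" if aic_lag < aic_ols else "OLS"
```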

Maps showing COVID-19 testing, case and death rates and selected socio-demographic characteristics by ZCTA are provided in Fig. 1. Table 1 provides descriptive statistics and the spatial autocorrelation statistic (Moran's I) for each study variable. The city-wide SARS-CoV-2 testing rate was 9569.9 per 100,000 population (standard deviation across ZCTAs (SD): 2641.5), and the COVID-19 case and death rates were 2286.8 (SD: 878.0) and 188.7 (SD: 102.0) per 100,000 respectively. The Moran's I statistics show strong spatial autocorrelation of all variables of interest, indicating the violation of independence of observations. For example, the COVID-19 case rate has high spatial homogeneity (Moran's I = 0.75, p-value < 0.01).

Fig. 1. COVID-19 testing, case, death rates (/100,000) and socio-demographics in New York City by ZCTA.

The spatially adjusted Spearman test illustrates the rank-order correlations between COVID-19 rates and variables of interest (Table 2). For example, a high percentage of the population in health-related occupations is highly correlated with SARS-CoV-2 testing (rho = 0.41), case rate (rho = 0.62), and death rate (rho = 0.50). The out-migration index was associated with lower testing (rho = −0.29), case (rho = −0.57) and death (rho = −0.40) rates, meaning that ZCTAs that lost more of their population during the pandemic had lower rates for all three outcomes assessed. Bivariate and multivariable model estimates for each outcome are provided in Table 3. The bivariate analyses indicated predictors of each COVID-19 rate after adjusting for spatial autocorrelation and out-migration. The Moran's I values for the non-spatial multivariable OLS regression residuals showed evidence of spatial autocorrelation in each model, confirming the need for a spatial error model or a spatial lag model. The LM tests of OLS models indicate that spatial lag models would be better-fitting models for SARS-CoV-2 testing and case rates, whereas a spatial error model could be applied for COVID-19 death rate (Florax et al., 2003). Lastly, the VIF tests for each multivariable OLS model detected high multicollinearity of one independent variable: the percentage of people in the ZCTA working service jobs (VIF > 10). We therefore fitted spatial lag and error models without that variable, and the results remained stable in terms of the size and confidence interval of each coefficient (data not shown).

From the multivariable spatial lag model (Table 3) (the interquartile range) shows a +2238 per 100,000-person difference in the testing rate, after adjusting for covariates. (Table 3). The out-migration index score was positively associated with ZCTA-level case rates (β = 11.5, CI: [0.2, 22.8]). (β = 48.3, CI: [20.9, 75.7]) were positive predictors of COVID-19 death rate. The out-migration index score was not associated with ZCTA-level COVID-19 death rates.

The aim of this study was to examine social and environmental determinants of COVID-19 in NYC. To the best of our knowledge, this is the first study to address the potential bias due to residents moving out of the city at differential rates across neighborhoods. News media reported that out-migration varied strongly by neighborhood socio-demographic characteristics, and our analysis found that neighborhood-level out-migration, as measured by cell phone-derived data, was also associated with COVID-19 case rates. As reported in the Supplemental table (Table S1), there appeared to be modest confounding by out-migration; the sizes of coefficients for parameter estimates differed between models that did and did not adjust for out-migration. After adjustment for out-migration, we identified potential drivers of SARS-CoV-2 test, case, and death rates, including a ZCTA's composition by sex, age, race/ethnicity, income, and occupational risks. Specifically, our results showed that higher percentages of residents who were Black and Hispanic were positively associated with COVID-19 case rates. We also confirmed that neighborhoods with higher percentages of essential service and healthcare-related workers had increased SARS-CoV-2 testing and death rates. These findings are consistent with recent studies on the social determinants of COVID-19 (New York City Department of Health and Mental Hygiene, 2020a; Webb Hooper et al., 2020; New York University Furman Center, 2020; Baker et al., 2020). Contrary to the existing findings (Harris, 2020), higher pre-pandemic transit ridership in a neighborhood was not associated with higher COVID-19 case or death rates. Subway use decreased dramatically after the declaration of a local state of emergency (Sy et al., 2020), and this null association may reflect the reduced overall ridership during the pandemic. Another possible explanation could be increased vigilance in transit riders, and subsequent higher levels of health-promoting behaviors such as physical distancing and mask-wearing, due to public awareness of the risk for infection in enclosed spaces. Additionally, the NYC Metropolitan Transportation Authority (MTA) worked to mitigate COVID-19 transmission risk by regularly disinfecting all subways and buses, installing hand sanitizers in all stations, and marking six feet of distance on subway platforms (Goldbaum, 2020; New York City Office of the Mayor, 2020).

Our results indicated that neighborhoods zoned for predominantly low-density housing had higher COVID-19 case rates than those zoned for predominantly high-density housing. While this is not consistent with the conventional hypothesis (Rocklov and Sjodin, 2020), there are inconsistent findings regarding the association between population density and COVID-19 rates. Recent studies on neighborhood-level factors in Chinese and European cities reported that population density was negatively or not associated with COVID-19 (Liu, 2020; Gerli et al., 2020). One explanation could be the differential application of mitigation strategies. Early in the pandemic the Centers for Disease Control and Prevention and the NYCDOHMH announced guidelines for maintaining safe operations of multifamily housing, including closing public areas in the building, disinfecting common areas, providing hand sanitizer in common areas, and mandatory mask-wearing (Centers for Disease Control and Prevention, 2020; New York City Department of Health and Mental Hygiene, 2020b). Many high-rise residential buildings in NYC voluntarily implemented such recommendations (Plitt, 2020), and such vigilance may be associated with the relatively low COVID-19 case rates in neighborhoods zoned for high-density residential buildings. Visual inspection of the map (Fig. 1) shows that areas with the highest percentage of low-density housing are located on the periphery of the city. It is possible that such a geographic location could lead to lower access to healthcare and social support, and subsequent higher COVID-19 risk despite any protections conferred by living in low-density housing (Ji et al., 2020).

There are limitations in this analysis. First, the city's COVID-19 testing and case data are not necessarily representative of the underlying populations at risk or experiencing COVID-19 infection in their ZCTA. Of note, until early May the NYCDOHMH discouraged people with mild and moderate symptoms from being tested due to limited testing resources. Therefore, these reported rates are subject to potential selection/sampling bias as well as misclassification. To illustrate, NYC overall is 52% female and 48% male (U.S. Census Bureau, 2020), but the testing breakdown by sex was 56% female and 44% male. Because COVID-19 testing and case data did not come from a randomly sampled or representative population, our analyses may be biased by the skewness of the underlying data. However, the COVID-19 testing, case, and death rates from the NYCDOHMH were the best and only available data to estimate population-level COVID-19 in NYC. Second, this analysis is susceptible to many common problems in neighborhood-level analyses. It is susceptible to the ecological fallacy: the associations found in aggregated data may not translate to corresponding associations at the individual level. Additionally, in ecological studies measurement errors in the predictor variables can bias results away from the null (Brenner et al., 1992). Our unit of analysis, ZCTA, and similar geographic boundaries can be subject to the modifiable areal unit problem (MAUP), a potential bias due to the artificial aggregation of point-based data (Wong, 2009). Third, there is a temporal mismatch between the COVID-19 statistics and the socio-demographic data used in this analysis. COVID-19 data were retrieved on June 10, 2020, whereas the American Community Survey data are estimates from a 5-year survey conducted between 2014 and 2018. Fourth, the analysis is also susceptible to potential residual confounding, such as by the uneven distribution of underlying health conditions by neighborhood. For example, chronic respiratory and cardiovascular diseases can increase the risk for COVID-19 death (Jordan et al., 2020), and may also be associated with neighborhood conditions, but such measures were not incorporated in this analysis due to lack of available data. Lastly, the cellular phone data-based estimates of out-migration provided by The New York Times may not fully capture people's movement in and out of the city, since phone usage may systematically differ from the actual mobility of the residents in the corresponding ZCTA.

This study provides important information on neighborhood-level factors and their association with COVID-19, in the context of a large metropolitan city with a high burden of COVID-19 in the United States. In addition to socio-demographic characteristics like neighborhood-level distributions of sex, age, and race/ethnicity, we must also focus on the impacts of the built environment on COVID-19 transmission and mortality. Future research should emphasize interactions between health behaviors (e.g., social distancing and commuting behaviors) and built environments in order to shed light on the environmental determinants of COVID-19.

This research did not receive any specific grant or funding from agencies in the public, commercial, or not-for-profit sectors.

Disparities in coronavirus 2019 reported incidence, knowledge, and behavior among US adults

How New York City Residential Buildings Are Tackling Coronavirus. Curbed

Estimating the burden of United States workers exposed to infection or disease: a key factor in containing risk of COVID-19 infection

Model selection and Akaike's Information Criterion (AIC): the general theory and its analytical extensions

Effects of nondifferential exposure misclassification in ecologic studies

COVID-19 Guidance for Shared or Congregate Housing

Testing the association between two spatial processes

Model-dependent variance inflation factor cutoff values

Validation of Walk Score® for estimating neighborhood walkability: an analysis of four US metropolitan areas

Specification searches in spatial econometrics: the relevance of Hendry's methodology

COVID-19 mortality rates in the European Union, Switzerland, and the UK: effect of timeliness, lockdown rigidity, and population density

Community transmission of SARS-CoV-2 at two family gatherings -chicago

2020. N.Y.C.'s Subway, a 24/7 Mainstay, Will Close for Overnight Disinfection. The New York Times

Bivariate correlation with spatial data

Descriptive analysis of social determinant factors in urban communities affected by COVID-19

The Subways Seeded the Massive Coronavirus Epidemic in New York City

Potential association between COVID-19 mortality and health-care resource availability

Covid-19: risk factors for severe disease and death

Spatial epidemic dynamics of the COVID-19 outbreak in China

Multicollinearity in associations between multiple environmental features and body weight and abdominal fat: using matching techniques to assess whether the associations are separable

Introduction to Spatial Econometrics

Emerging study on the transmission of the Novel Coronavirus (COVID-19) from urban perspective: evidence from China

New York City Department of City Planning

COVID-19: Data Summary

COVID-19: FAQ for Residential and Commercial Buildings

Incident Command System For COVID-19 Response -NYC Coronavirus (COVID-19) Data. New York City Department of Health and Mental Hygiene

As New York Nears First Phase of Reopening. Mayor de Blasio Calls on MTA to Take Concrete Steps to Keep New Yorkers Safe on Mass Transit. The Official Website of the City of New York

Hospitalization and Mortality among Black Patients and White Patients with Covid-19

The Richest Neighborhoods Emptied Out Most as Coronavirus Hit New York City. The New York Times

Disparities in the population at risk of severe illness from COVID-19 by race/ethnicity and income

High population densities catalyse the spread of COVID-19

A Comparison between Spatial Econometric Models and Random Forest for Modeling Fire Occurrence

Socioeconomic Disparities in Subway Use and COVID-19 Outcomes

American Community Survey 5-year Estimates

Department Of Housing And Urban Development

Spatial Regression Models

COVID-19 and Racial/Ethnic Disparities

The modifiable areal unit problem (MAUP)

Spatial transmission of COVID-19 via public and private transportation in China

The authors gratefully acknowledge Kevin Quealy at the New York Times for providing the cellular phone usage data that supported this analysis. The authors also thank the New York City Department of Health and Mental Hygiene for the rapid and open provision of COVID-19 data.

Supplementary data to this article can be found online at https://doi.org/10.1016/j.healthplace.2021.102539.

Table notes: Boldface indicates statistical significance. a Adjusted for Out-migration index. b Annual income in $1000. c % changes in unique cellular phone signals between March 1, 2020 and May 1, 2020.