key: cord-0984637-upnggmpa
authors: Hamidi, Shima; Hamidi, Iman
title: Subway Ridership, Crowding, or Population Density: Determinants of COVID-19 Infection Rates in New York City
date: 2021-01-26
journal: Am J Prev Med
DOI: 10.1016/j.amepre.2020.11.016
sha: 6ecc4e277cd5d6366684e1244b6bd4e284fd43b7
doc_id: 984637
cord_uid: upnggmpa

Introduction: This study aims to determine whether subway ridership and built environmental factors such as population density and points of interest are linked to the per capita coronavirus disease 2019 (COVID-19) infection rate in New York City ZIP codes, after controlling for racial and socioeconomic characteristics.

Methods: Spatial lag models were employed to model the cumulative COVID-19 per capita infection rate in New York City ZIP codes (N=177) as of April 1 and May 25, 2020, accounting for the spatial relationships among observations. Both direct and total effects (through spatial relationships) were reported.

Results: This study distinguished between density and crowding. Crowding (and not density) was associated with the higher infection rate on April 1. Average household size was another significant crowding-related variable in both models. There was no evidence that subway ridership was related to the COVID-19 infection rate. Racial and socioeconomic compositions were among the most significant predictors of spatial variation in COVID-19 per capita infection rates in New York City, even more so than variables such as point of interest rates, density, and nursing home bed rates.

Conclusions: Point of interest destinations not only could facilitate the spread of virus to other parts of the city (through indirect effects) but also were significantly associated with the higher infection rate in their immediate neighborhoods during the early stages of the pandemic.
Policymakers should pay particularly close attention to neighborhoods with a high proportion of crowded households and these destinations during the early stages of pandemics.

Without any statistical analysis and largely based on observational data, the study by Harris argued that the New York subway system was a major disseminator and likely served as the transmission vehicle for the spread of the COVID-19 pandemic, particularly in the early days, during the first 2 weeks of March. The study concluded that ZIP codes located along the subway lines had a higher number of confirmed cases than ZIP codes that were not served by subway. 5 Despite the absence of data and statistical analysis, the claims in that paper have fueled political debates in conservative media outlets and among policymakers. In NYC, 4 council members cited the paper in their letter to New York Governor Cuomo demanding the complete shutdown of the New York subway system. The Metropolitan Transit Authority largely pushed back against the petition, emphasizing the critical role of public transit in providing mobility for frontline essential workers during the pandemic. 6, 7

In addition, there is very little evidence on how population density and crowding relate to spatial variations in COVID-19 infection rates at the ZIP code level in NYC. The effects of population density on COVID-19 have been at the center of attention; however, population density is distinct from crowding, which is defined as a large number of people gathered closely together. Crowding could happen in bars, restaurants, sport events, and any other destination that could attract visitors; in other words, "points of interest" (POIs). 8, 9 The Pearson correlation coefficient between population density and POIs per 1,000 population in NYC ZIP codes is <0.052, which also confirms the distinction between the 2 measures.
Very little is known about the relationship between different types of crowding venues at the neighborhood level and the COVID-19 infection rate. Another factor that has been largely missed by existing studies is the extent to which NYC neighborhoods have been emptying out to escape the pandemic. According to The New York Times, as of May 1, between 30% and 50% of residents were gone in many Manhattan neighborhoods. 10

This study is the first to conceptualize and integrate 3 dimensions of crowding (households, businesses, and subways) in a comprehensive framework. The major aim of this study is to investigate how these 3 crowding variables, population density, and other confounding factors relate to the per capita COVID-19 infection rate at the ZIP code level in NYC, both during the early stages (as of April 1) and after the epidemic curve was flattened (as of May 25). Spatial autoregressive modeling techniques were employed to control for the spatial dependency of observations (ZIP codes) in the sample. The authors hypothesize that, during the early stages, crowding-related factors such as POIs and crowded housing explain the spatial distribution of infection rates, whereas by May 25, racial and socioeconomic characteristics have the strongest relationship with the per capita infection rate.

The sample in this study consisted of 177 ZIP Code Tabulation

The independent variable of greatest interest is subway ridership. Raw data on transit ridership were obtained from the Metropolitan Transit Authority. 14 test. 15 This analysis also accounted for the number of POIs within each ZCTA in NYC, utilizing data from SafeGraph. 16 The SafeGraph database measures foot traffic patterns to POIs based on GPS data from more than 45 million smartphones in the U.S. POIs include restaurants, cafes, retail shops, movie theaters, parks, and other public places that could attract visitors.
Initially, 2 sets of POI variables representing the level of crowding at the baseline and in March were computed for each ZCTA. However, checking the face validity of these variables 17 via ArcMap and Google Maps showed that the most reliable and accurate variable was the number of POIs in each ZCTA per 1,000 population, which was computed and used as a proxy for business crowding in this study.

In addition, analyses controlled for the percentage of residents in each ZCTA who left NYC to escape the pandemic in March and April. The data were taken from The New York Times, which aggregated smartphone location data from Descartes Labs, and measured the proportion of the population who lived in NYC during the last 2 weeks of February but were not living there on May 1. 10 The population-weighted average of Census tracts was calculated to obtain the ZCTA-level variable.

Employing the same methodology as Yost et al., 18 an SES index was developed for each ZCTA based on the following variables from the 2018 American Community Survey (5-year estimates) 19 : median household income in the past 12 months; median gross rent; median home value; percentage unemployed (aged ≥16 years); percentage working class (aged ≥16 years); percentage living <150% of poverty line; and the education index, which is a weighted combination of the percentage below high school education, high school graduates, and more than high school degrees (adults aged ≥25 years). A higher value of the education index represents higher educational attainment. Using principal component analysis, these variables were combined into 1 score for each ZCTA, with an eigenvalue of 5.2, which explains 74.8% of the variance, following this equation:

(median home value × 0.141) + (median gross rent × 0.17) + (median household income × 0.181) + (percentage below poverty × -0.161) + (percentage unemployed × -0.147) + (percentage working class × -0.174) + (education score × 0.177).
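As a rough illustration of the weighted combination above, the following Python sketch applies the reported coefficients to a small set of ZCTA records and rescales the result to a mean of 100 and an SD of 25. It is a minimal sketch under stated assumptions, not the authors' code: the field names and the helper functions `zscores` and `ses_index` are hypothetical, and the sketch assumes that each input variable is z-scored before weighting and that population standard deviations are used, neither of which is stated in the text.

```python
from statistics import mean, pstdev

# Score coefficients as reported in the text; the dictionary keys are
# hypothetical field names chosen for this illustration.
SES_WEIGHTS = {
    "median_home_value": 0.141,
    "median_gross_rent": 0.17,
    "median_household_income": 0.181,
    "pct_below_poverty": -0.161,
    "pct_unemployed": -0.147,
    "pct_working_class": -0.174,
    "education_score": 0.177,
}


def zscores(values):
    """Standardize a list of numbers to mean 0 and (population) SD 1."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]


def ses_index(rows, target_mean=100.0, target_sd=25.0):
    """Return one SES score per row (one row per ZCTA).

    Assumption: inputs are z-scored before the coefficients are applied.
    The final rescaling to mean 100 and SD 25 follows the text.
    """
    standardized = {k: zscores([r[k] for r in rows]) for k in SES_WEIGHTS}
    raw = [
        sum(SES_WEIGHTS[k] * standardized[k][i] for k in SES_WEIGHTS)
        for i in range(len(rows))
    ]
    return [target_mean + target_sd * z for z in zscores(raw)]
```

With this convention, a ZCTA with higher income, rent, home value, and education and with lower poverty, unemployment, and working-class share receives a higher index value.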
The score was standardized to have a mean of 100 and an SD of 25. Measures of racial composition (percentage Black and percentage Hispanic) and average household size were computed based on data from the 2018 American Community Survey (5-year estimates). 19 In addition, population density was computed by dividing the ZCTA's total population by the land area in square miles. Finally, using ArcMap, the number of beds in nursing homes and assisted living facilities for each ZCTA was calculated based on data from the Homeland Infrastructure Foundation-Level Data 20 and was converted to a per capita rate by dividing the number of beds in each ZCTA by the ZCTA population. Pearson correlation coefficients between explanatory variables are presented in Appendix Table 1.

Virus spread is a spatial phenomenon, meaning that the per capita infection rate in a ZCTA is not independent of the infection rate in surrounding ZCTAs. People move beyond the boundaries of ZIP codes and so does the virus. The spatial relationship between ZCTAs violates the assumption of ordinary least squares, which requires the unexplained error term to be randomly distributed across observations. 21 This was also confirmed by the Moran's I analysis of the ordinary least squares regression residuals, which yielded a coefficient of 0.38 that was statistically significant at the <0.001 level. Two forms of spatial autoregressive modeling, spatial lag and spatial error, are used to account for spatial dependency among observations. 21 Based on the results of the Lagrange Multiplier tests, the spatial lag model was selected and estimated using R software, version 4.0.2. The spatial lag model estimates both direct and indirect effects of explanatory variables on the COVID-19 infection rates. The indirect effects are through the spatial relationship between observations (ZCTAs).
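To make the residual diagnostic and the direct and indirect effects more concrete, here is a small pure-Python sketch. The study itself used R, and the text does not say how the effects were computed, so this assumes the commonly used average-impact summaries for a spatial lag model y = ρWy + Xβ + ε, where the effect matrix for one variable is (I - ρW)^(-1)β. All function names, the weights matrix `w`, and the parameter values in the usage note are hypothetical.

```python
def row_standardize(w):
    """Scale each row of a spatial weights matrix to sum to 1 (zero rows stay zero)."""
    out = []
    for row in w:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out


def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * c for v, c in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def morans_i(residuals, w):
    """Moran's I of regression residuals for a given spatial weights matrix."""
    n = len(residuals)
    m = sum(residuals) / n
    d = [e - m for e in residuals]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * d[i] * d[j] for i in range(n) for j in range(n))
    den = sum(x * x for x in d)
    return (n / s0) * num / den


def lag_impacts(beta, rho, w):
    """Average direct, indirect, and total effects of one variable in a spatial lag model."""
    n = len(w)
    a = [[(1.0 if i == j else 0.0) - rho * w[i][j] for j in range(n)] for i in range(n)]
    inv = invert(a)
    direct = beta * sum(inv[i][i] for i in range(n)) / n   # mean own-ZCTA effect
    total = beta * sum(sum(row) for row in inv) / n        # mean effect summed over all ZCTAs
    return direct, total - direct, total
```

For example, with four hypothetical ZCTAs arranged in a chain, `w = row_standardize([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])`, calling `lag_impacts(0.5, 0.4, w)` returns a direct effect slightly above 0.5, a positive indirect (spillover) effect, and a total equal to 0.5 / (1 - 0.4), which holds for any row-standardized weights matrix.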
The total effect is the sum of direct and indirect effects, which is also presented in the Results tables. All variables except the subway ridership variables and the nursing home bed rate were log-transformed to achieve a better fit with the data, reduce the influence of outliers, and adjust for nonlinearity of the data. As a result, the coefficients in the Results tables are interpreted as elasticities. A collinearity diagnostic test was also performed, and the tolerance values of the explanatory variables in both models were higher than the 0.2 threshold, 22 which suggested no issue of multicollinearity.

The results of spatial lag models for the COVID-19 infection rate per 1,000 population as of April 1 and May 25 are shown in Tables 2 and 3, respectively. The comparison between the 2 tables shows noticeable differences between the factors that significantly explained the infection rate at these 2 times during the COVID-19 pandemic. The comparison between the 2 models revealed that at early stages of the pandemic and before NYC

On the other hand, among the control variables, the racial and socioeconomic compositions were among the most significant predictors of the spatial variation in COVID-19 per capita infection rates in NYC, even more so than variables such as POI rates, density, and nursing home bed rates. These findings align with recent findings about the increased prevalence of COVID-19 in low-income, Hispanic-, and Black-majority neighborhoods in NYC, possibly due to their greater risk of occupational exposure and other key social determinants of health. 1, 2, 26-28

This study found no evidence that subway ridership was related to the COVID-19 infection rate in NYC. The recent experience of a few developed countries in tracing infection clusters confirms this finding.
In Japan, since the state of emergency was lifted in late May, the majority of infection clusters were traced to gyms, bars, music clubs, and karaoke rooms, whereas not even a single infection cluster, defined as ≥3 COVID-19 infections linked by contact, was associated with its highly popular and often crowded commuter trains. 24 Similarly, according to the National Public Health Institute in France, between May 9 and June 15, of 150 clusters of new COVID-19 infections, none were traced to the nation's public transit system, which consists of 6 subway systems, trams, light rail, and bus networks. In fact, most of these clusters had emerged in hospitals, workplaces, and homeless shelters. 25

In addition, the finding of an insignificant link between population density and the per capita COVID-19 infection rate runs counter to recent discussions in news media outlets and among policymakers that highlight the role of density in the spread of COVID-19, particularly in NYC. 29 It's about the number of people in a small geographic location allowing that virus to spread. ... Dense environments are its feeding grounds." 30 Prior to the COVID-19 pandemic, extensive research had confirmed the environmental and public health benefits of dense, compact, and transit-accessible developments. 31-34 This study found no evidence that population density was associated with a higher per capita COVID-19 infection rate. Indeed, crowding (and not density) was associated with the higher infection rate on April 1.

One limitation of this study is that the analyses were based on ZCTA-level aggregated data and did not control for individual-level variations and interactions among variables. Therefore, individual-level conclusions cannot be drawn from these findings, particularly with respect to socioeconomic factors.
In addition, the aggregated nature of this study limits the ability to control for individual-level factors such as underlying health conditions that might be associated with the severity of disease and the likelihood of testing.

This study offers empirical evidence that distinguishes between population density and different forms of crowding and shows that crowded households, measured in terms of household size, are associated with a significantly higher per capita infection rate across NYC ZIP codes. In addition, destinations (POIs) that could attract visitors not only could facilitate the spread of virus to other parts of the city (through indirect effects) but also are significantly associated with the higher per capita infection rate in their immediate neighborhoods, particularly during the early stages of the pandemic. Policymakers should pay particularly close attention to neighborhoods with a high proportion of crowded households and these destinations (or POIs) during the early stages of pandemics.

Another major takeaway of this study is that investigators found no evidence that higher per capita subway ridership or percentage changes in subway ridership are related to the COVID-19 infection rate across NYC ZIP codes. These findings challenge Harris, 5 who argued that the ZCTAs along the subway lines had significantly higher infection rates than ZIP codes that were not served by subway. Still, it may be too early to draw a definitive conclusion, and more studies are needed to further investigate, through contact tracing, the role of the transit system (including other transit modes) in the spread of the COVID-19 pandemic.
Demographic determinants of testing incidence and COVID-19 infections in New York City neighborhoods
Neighborhood inequity: exploring the factors underlying racial and ethnic disparities in COVID-19 testing and infection rates using ZIP code data in Chicago and New York
Disparities in COVID-19 testing and positivity in New York City
The determinants of the differential exposure to COVID-19 in New York City and their evolution over time
The subways seeded the massive coronavirus epidemic in New York City. NBER working paper 27021
The New York subway got caught in the coronavirus culture war
Fear of public transit got ahead of the evidence
Does density aggravate the COVID-19 pandemic? Early findings and lessons for planners
Longitudinal analyses of the relationship between development density and the COVID-19 morbidity and mortality rates: early evidence from 1,165 metropolitan counties in the United States. Health Place
The richest neighborhoods emptied out most as coronavirus hit New York City. The New York Times
The New York Times. About 40% of U.S. coronavirus deaths are linked to nursing homes. The New York Times
Confirmed and probable COVID-19 deaths
A month of coronavirus in New York City: see the hardest-hit areas. The New York Times
Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe
Places schema
Basic Quantitative Research Methods for Urban Planners
Socioeconomic status and breast cancer incidence in California for different race/ethnic groups
American Community Survey 5-Year estimates
Department of Homeland Security. Homeland infrastructure foundation-level data (HIFLD)
Spatial Econometrics: Methods and Models
A caution regarding rules of thumb for variance inflation factors
Americans rapidly answering the call to isolate, prepare
Japan ends its COVID-19 state of emergency
Japan and France, riding transit looks surprisingly safe. Bloomberg CityLab
COVID-19 and African Americans
Health inequalities and infectious disease epidemics: a challenge for global health security
The impact of workplace policies and other social factors on self-reported influenza-like illness incidence during the 2009 H1N1 pandemic
Density is New York City's big "enemy" in the coronavirus fight. The New York Times
Coronavirus is making some people rethink where they want to live
Urban sprawl and the emergence of food deserts in the USA
Associations between urban sprawl and life expectancy in the United States
Urban sprawl as a risk factor in motor vehicle crashes
Costs of Sprawl

This research was supported by the Bloomberg American Health Initiative at the Johns Hopkins Bloomberg School of Public Health. SH contributed to conceptualization, formal analysis, methodology, validation, supervision, visualization, writing-original draft, and writing-review and editing. IH contributed to data curation.