key: cord-0854853-rseamr1s authors: Reichberg, Samuel B; Mitra, Partha P; Haghamad, Aya; Ramrattan, Girish; Crawford, James M; Berry, Gregory J; Davidson, Karina W; Drach, Alex; Duong, Scott; Juretschko, Stefan; Maria, Naomi I; Yang, Yihe; Ziemba, Yonah C title: Rapid Emergence of SARS-CoV-2 in the Greater New York Metropolitan Area: Geolocation, Demographics, Positivity Rates, and Hospitalization for 46,793 Persons Tested by Northwell Health date: 2020-07-08 journal: Clin Infect Dis DOI: 10.1093/cid/ciaa922 sha: 0a44fc0beae75175f9df96742f898c7498b90f89 doc_id: 854853 cord_uid: rseamr1s BACKGROUND: In March 2020, the greater New York metropolitan area became an epicenter for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. The initial evolution of case incidence has not been well characterized. METHODS: Northwell Health Laboratories tested 46,793 persons for SARS-CoV-2 from March 4 through April 10. The primary outcome measure was a positive reverse-transcription-polymerase-chain-reaction (RT-PCR) test for SARS-CoV-2. The secondary outcomes included patient age, sex, and race if stated; dates the specimen was obtained and the test result; clinical practice site sources; geo-location of patient residence; and hospitalization. RESULTS: From March 8 through April 10, a total of 26,735 of 46,793 persons (57.1%) tested positive for SARS-CoV-2. Males of each race were disproportionally more affected than females above age 25, with a progressive male predominance as age increased. Of the positive persons, 7,292 were hospitalized directly upon presentation; an additional 882 persons tested positive in an ambulatory setting before subsequent hospitalization, a median of 4.8 days later. Total hospitalization rate was thus 8,174 persons (30.6% of positive persons). There was a broad range (greater than 10-fold) in the cumulative number of positive cases across individual zip codes following documented first case incidents. 
Test positivity was greater for persons living in zip codes with lower annual household income. CONCLUSIONS: Our data reveal that SARS-CoV-2 incidence emerged rapidly and almost simultaneously across a broad demographic population in the region. These findings support the hypothesis that SARS-CoV-2 infection was widely distributed prior to virus testing availability. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has presented major challenges to health care institutions globally. A challenge in identifying infected patients is the speed at which they can develop severe infections following exposure, and the widely varying estimates for case incidence of those infected (1). Shortly after the first case of SARS-CoV-2 infection was identified in New York State (March 1, 2020), Northwell Health, a large integrated health care system that serves the greater New York City region, began testing for SARS-CoV-2 viral RNA. Our first positive case was found on March 8 for a specimen collected on March 4. Over the next five weeks, Northwell Health Laboratories (NWL) identified positive cases of SARS-CoV-2 in 26,735 of the 180,458 persons (14.8%) identified in New York State (2). With these data, we sought to understand the spread of SARS-CoV-2 through the greater New York metropolitan region. The population for this study was tested for SARS-CoV-2 by NWL from March 4 (first specimen collection date) through April 10, 2020 (last test result date). As NWL is an integrated laboratory network (3), SARS-CoV-2 testing was made available across the entire Northwell health system. NWL used three real-time reverse-transcription-polymerase-chain-reaction (RT-PCR) tests: Review Board approved this as minimal-risk research using de-identified data collected for routine clinical practice and waived the requirement for informed consent. 
From March 2 through April 10, 2020, a total of 345,838 SARS-CoV-2 tests were performed in the greater New York City region (the five counties of New York City plus Nassau, Suffolk, and Figure 1 (a time-lapse chronologic display of case accumulation per zip code is available as a Supplementary Video). SARS-CoV-2 was already widespread in our geographic region during the first week of testing, based on the almost simultaneous appearance of SARS-CoV-2 patients residing in widely dispersed zip codes. However, different zip code areas with the same starting date displayed markedly diverse case burdens over the course of the study period, as shown by the growth of cumulative case incidence (Figure 2A). This diversity is further quantified (Figure 2B): the percentage of the population cumulatively testing positive per zip code is plotted as a function of the days elapsed after identification of the first case in its respective zip code area (each circle denotes one zip code). On this semi-log plot, a 10-fold range in cumulative case incidence is observed across different zip codes for a fixed appearance date of the first case. The symmetric distribution of the points around the median (blue line) on a log scale indicates a long-tailed, log-normal type distribution, with a few extreme zip codes showing large percentages affected. One data point is Figure 3. At first, predominantly hospitalized patients were tested (inpatient floor or intensive care unit). As case incidence and familiarity with SARS-CoV-2 clinical presentation increased, the fraction of testing dedicated to hospitalized patients decreased to approximately 20%, while testing in emergency departments, urgent care centers and other outpatient settings increased. Daily Northwell SARS-CoV-2 testing volumes are shown in Figure 4A. The peak aggregate daily case incidence of SARS-CoV-2 occurred on April 1, with 1,862 positive cases. 
Figure 4B shows SARS-CoV-2 % test positivity rates beginning on March 13, when testing volumes began to increase dramatically; peak % test positivity rates occurred in the last week of March. Northwell daily test % positivity rates substantially exceeded regional rates, particularly from March 16 to 21, with the final cumulative % positive rate on April 10 being 54.5% (Northwell) vs. 46.1% (service area); a ratio of During the study, 24,058 females and 22,610 males were tested (no sex information was available for 125 persons). The age distribution of testing by gender is given in Figure 5A. Although the age distribution of persons tested generally follows the patterns reported in the 2010 US census, persons under 35 years were markedly under-represented (p = 0.021). Test positivity rates increased progressively with age (p < 0.0001), with males showing higher rates (p = 0.003) except at the earliest ages (less than 5 years) or the latest (greater than 85 years). We estimated the population-normalized distribution of the percentage of population affected by age and gender (Figure 5B). Cumulative Northwell SARS-CoV-2 positive cases across our service area accounted for 17% of the total cases reported in New York State. The estimated % of the regional population confirmed as SARS-CoV-2 positive (for females and males) was well below 1% for persons under age 25. For females age 25 and above, estimated case distribution rose steadily from 1.7% to 2.6% through age 84 and was 4.7% for age 85 and above. For males age 25 and above, estimated case incidence rose from 1.6% at age 25 to 4.4% through age 84 and was 6.0% for age 85 and above. Thus, in this population-normalized distribution, males were disproportionately more affected than females above age 25 (p < 0.001). We next examined the potential impact of socioeconomic factors and race. 
Figure 6A shows the % of persons tested by NWL in each zip code, as a function of zip code average annual household income; no significant relationship is evident. Figure 6B shows SARS-CoV-2 % test positivity versus average annual household income by zip code. From $25,000 to $125,000 per annum, there is a strong negative correlation (R² = 0.35, p < 0.0001). From $125,000 to $800,000 per annum, there is a slightly positive trend, which is not statistically significant. Supplemental Figure 3 shows that while the % of the population tested for SARS-CoV-2 by NWL did not correlate with zip code population and population density, there was a positive correlation of zip code % test positivity with these variables. Supplemental Figure 4A-B shows that zip code average annual household income inversely correlated with zip code population and population density. However, when our testing data were normalized to the respective fraction of the New York State-reported SARS-CoV-2 testing that NWL performed (Supplemental Figure 4C-F), the correlation of % test positivity with zip code average annual household income was eliminated. We therefore examined the relationship of NWL testing to zip code % persons below Poverty Level. Figure 7A shows the relationship of % of the population tested by Northwell per zip code to this variable, for the two counties in New York City for which such data were available and in which NWL testing represented greater than 20% of all SARS-CoV-2 testing performed. For Queens county but not Richmond county (Staten Island), there was a significant negative correlation between % testing and % Poverty Level (R² = 0.34, p < 0.02). Figure 7C shows the % of persons testing positive by Northwell for SARS-CoV-2 as a function of % Poverty Level. Again for Queens, there was a negative trend in % testing positive, but it did not reach significance (Figure 7C; R² = 0.26, p = 0.07). 
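The R² values quoted above summarize straight-line fits of a zip-code-level percentage against income or poverty level. The statistical software and exact fitting procedure are not stated in this text, so the following is only a minimal sketch of how a slope and coefficient of determination can be obtained from an ordinary least-squares fit. The function name `slope_and_r_squared` and the five data points are hypothetical illustrations and are not study data.

```python
def slope_and_r_squared(x, y):
    """Return (slope, R^2) for an ordinary least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    # For a one-predictor fit, R^2 equals the squared Pearson correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, r_squared


# HYPOTHETICAL values for illustration only (not taken from the study):
# average household income in thousands of dollars, and % test positivity.
income_thousands = [25, 50, 75, 100, 125]
percent_positive = [60, 55, 52, 45, 43]

slope, r2 = slope_and_r_squared(income_thousands, percent_positive)
print(f"slope = {slope:.3f} % per $1,000, R^2 = {r2:.2f}")
```

A negative slope with an R² well below 1, as reported for the $25,000 to $125,000 range, would indicate that income explains part, but not all, of the zip-code-to-zip-code variation in test positivity.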
Collectively, these economic data suggest that persons from lower income, higher population density zip codes had access to NWL-based SARS-CoV-2 testing that was comparable to the access of persons from higher income, lower population zip codes, but exhibited higher SARS-CoV-2 % test positivity rates. However, our population sampling from these respective zip codes may have differed from the overall regional SARS-CoV-2 testing as reported by New York State. This premise is supported by the higher % test positivity rates experienced by NWL, particularly during the latter half of March. This may have resulted from differential presentation of higher acuity patients from lower income zip codes to Northwell during the early phase of the pandemic, and hence differential sampling of the regional population. We cannot exclude statistical sampling variability as a confounding variable. show the relationship of % of the population tested by NWL and % test positivity, respectively, as a function of the % not-White population per zip code. Statistically significant relationships are not identified, although Queens appears to reveal positive trends (Figure 7B: R² = 0.13, p = 0.39; Figure 7D: R² = 0.15, p = 0.31). Looking then at our data specifically, information on "White", "Black", or "Asian" racial status was available for 17,574 (37.6%) of the 46,793 persons tested by NWL, with only 244 patients (0.5%) reporting "Hispanic" or "Indian", and unknown racial status for the remainder. Race information was patient-reported for less than 30% of persons below age 40 years, progressively rising to approximately 65% for the older age groups (see Supplemental rates are similar, and greater than White males. In aggregate, test positivity was highest in Blacks, followed by Asians and Whites (p < 0.0001). The respective sex differences in test positivity between the three racial groups also were statistically significant (p < 0.0001). 
The relationships of test results, age, and race, are further shown in positive patients who had been tested in an ambulatory setting were subsequently admitted to hospital. These results indicate that SARS-CoV-2 infection was already geographically widespread in the greater New York City region when testing began in early March 2020 (5), a premise supported by sequencing of viral genomes obtained from the New York area (6) and by modeling of the pandemic outbreak (7). Given literature estimates of serial intervals between infections (4 to 6 days (1, 8)) and R0 values of 2.6 to 3.2 during the exponential period of disease outbreak (9), it is unlikely that six cases from five geographically dispersed zip codes over the next four days could be explained by secondary infections from the first March 4 case, or from exposure to the first documented case in the New York City area on March 1 in Westchester County (10). It is more likely that the initially observed cases in our study originated from multiple infection sources already present across the geographical area when testing began (7). While the initial patients tested by NWL had already been admitted to hospital for respiratory illness, the rapid increase in SARS-CoV-2 testing from emergency departments, urgent care centers and ambulatory practice sites reflects the realization that patients presenting with respiratory illness were likely to have this illness (11). As reported elsewhere, males were more likely to have a positive test, and the % test positivity rates increased markedly with age for both males and females (12). Our data reveal large spatial heterogeneity in disease progression across the greater New York City region, in keeping with the geographic diversity found in countries across the globe (13, 14). 
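The serial-interval argument above can be made concrete with a rough back-of-the-envelope calculation. The sketch below is not the authors' analysis: it assumes a deliberately simple model in which infections occur in discrete generations spaced exactly one serial interval apart, every case infects R0 others, and infections are detected immediately. The function name `expected_descendants` is hypothetical; the numeric ranges are those quoted in the text.

```python
def expected_descendants(r0, serial_interval_days, elapsed_days):
    """Expected infections descending from a single index case.

    Assumption (not from the paper): discrete generations spaced exactly
    one serial interval apart, each case infecting r0 others, with no
    delay between infection and detection.
    """
    completed_generations = int(elapsed_days // serial_interval_days)
    return sum(r0 ** g for g in range(1, completed_generations + 1))


# Ranges quoted in the text: serial interval 4 to 6 days, R0 2.6 to 3.2.
elapsed_days = 4  # the four-day window after the first March 4 specimen

# Most favorable combination for rapid spread: shortest interval, largest R0.
upper = expected_descendants(3.2, 4, elapsed_days)
# Least favorable combination: longest interval, smallest R0.
lower = expected_descendants(2.6, 6, elapsed_days)

print(f"Expected descendants of one case within {elapsed_days} days: "
      f"{lower} to {upper}")
```

Even under these generous assumptions, a single index case yields at most about three descendants within four days, fewer than the six cases described, and those descendants would have had no time to become symptomatic and be tested. This is consistent with the interpretation that multiple independent infection sources were already present.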
We observe that current epidemiological models for contagion (e.g., (15)) largely stratify by demographics, infection status, and location at the county level, while micro-local geography has not been included. Our observations indicate that for accurate modeling of the progression of a pandemic through a geographic region, long-tailed spatial heterogeneity at a small scale will likely be important to incorporate. For this entire study population of 26,735 patients testing positive for SARS-CoV-2, a total of 8,174 persons (30.6%) were admitted to the hospital. This is comparable to hospitalization rates reported by the Centers for Disease Control for cases of SARS-CoV-2 disease for which case hospitalization statistics are known (6,354 of 24,925 cases; 25.5%; (16)). Our study provides the additional information that ambulatory patients testing positive for SARS-CoV-2 (either tested-and-released from emergency departments or otherwise tested at an ambulatory location) remain at risk for subsequent hospitalization. Our study constitutes a minimal estimate of outpatient hospitalization rates, since we did not include patients who might have been admitted to other hospitals. were not White, and % of persons above age 65. Recognizing that a high proportion of SARS-CoV-2 infected individuals who die have comorbid conditions (16), the strong negative correlation of these comorbidities (obesity, diabetes, hypertension, kidney disease, chronic obstructive pulmonary disease) with median household income by U.S. census tracts is striking when illustrated graphically (20). Our finding of a strong negative correlation between persons testing positive for SARS-CoV-2 and average household income by zip code for the range of $25,000 to $125,000 per annum provides further supporting evidence for the importance of these socioeconomic factors. 
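The hospitalization percentages discussed above follow directly from the counts reported in this article. As a simple arithmetic check (the helper name `pct` is introduced here only for illustration), the figures can be recomputed as follows:

```python
# Counts quoted in this article.
tested = 46_793                 # persons tested by Northwell
positive = 26_735               # persons testing positive
hosp_at_presentation = 7_292    # positive and hospitalized on presentation
hosp_after_ambulatory = 882     # positive in ambulatory setting, admitted later
cdc_hospitalized = 6_354        # CDC comparison: hospitalized cases
cdc_cases = 24_925              # CDC comparison: cases with known status


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total_hospitalized = hosp_at_presentation + hosp_after_ambulatory
positivity = pct(positive, tested)
northwell_rate = pct(total_hospitalized, positive)
cdc_rate = pct(cdc_hospitalized, cdc_cases)

print(f"Positive persons: {positivity}% of those tested")
print(f"Hospitalized: {total_hospitalized} ({northwell_rate}% of positives)")
print(f"CDC comparison: {cdc_rate}% of cases with known status")
```

The 882 persons admitted after an ambulatory positive test account for roughly one in nine of the 8,174 hospitalizations, which is the group the text identifies as remaining at risk after initial release.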
Owing to incompleteness in our patient-level data on racial status, care must be taken in drawing conclusions about the impact of race on SARS-CoV-2 burden in our regional community, particularly given the absence of information on Latino/Hispanic persons. For the 37.7% of persons tested who did report their race (almost all as Asian, Black, or White), Blacks had the highest aggregate % test positivity rates. The male predominance of test positivity was true for all three races, but was most pronounced for Blacks. For both genders and for all three races, the age distribution of persons who tested positive was significantly older than that of those who tested negative. We note that reporting of SARS-CoV-2 patient race and ethnicity is now required (21). Limitations. The information reported here only includes the results from one integrated laboratory network serving the parent health system, and does not include other laboratory results, home tests, or other regional testing that was conducted on study subjects during the study period. The number of SARS-CoV-2 tests performed during these initial weeks was a function of the progressively increasing test capacity at Northwell Health Laboratories from March 8 to April 10, 2020, as limited by the availability of reagents and supplies for the performance of these tests, and may have influenced the ability to detect cases in the region. We were not using zip codes for areal analysis, seeking instead to use zip codes as a mechanism to explore the chronologic timing of micro-local geographic heterogeneity. However, these results may be limited in their generalizability, because of restricted sample size and the potential selection bias that zip code grouping can introduce into geo-epidemiologic analyses (22). 
Reliance on the 2010 census may also introduce inaccuracy in estimates of population cumulative case incidence, to the extent that the regional population has changed in the ensuing 10 years. Reliance on publicly available 2017 data from the U.S. Internal Revenue Service and from the 2010 census permits only correlative statements to be made about the relationship of SARS-CoV-2 cumulative case incidence and geolocalized socioeconomic and racial factors. Lastly, the incompleteness of our patient-level data on racial status limits our ability to make statements about the impact of race on SARS-CoV-2 case incidence. In early March, positive SARS-CoV-2 cases were identified simultaneously across the region, with higher incidences in men and older persons. Our geographic analysis supports the hypothesis that SARS-CoV-2 infection was widely distributed in the greater New York City region when virus testing became available in early March. Test % positivity rates were higher in patients from zip codes with higher population density and lower average annual household income. Our data emphasize the importance of detailed chronologic, geospatial and demographic analysis of regional populations as part of understanding the evolution of SARS-CoV-2 as a pandemic event. We acknowledge and honor all of our Northwell team members who consistently put themselves in harm's way during the COVID-19 pandemic. We dedicate this article to them, as their vital contribution to knowledge about COVID-19 and sacrifices on the behalf of patients made it possible. The data that support the findings of this study are available on request from COVID19@northwell.edu. The data are not publicly available due to restrictions, as their release could compromise the privacy of research participants. 
The views expressed in this paper are those of the authors and do not represent the views of the National Institutes of Health, the United States Department of Health and Human Services, or any other government entity.
The red asterisks and the right y-axis denote the number of zip codes acquiring a "first case" on any given calendar day.
Epidemiology and Transmission of COVID-19 in Shenzhen China: Analysis of 391 cases and 1,286 of their close contacts
Statewide-COVID-19-Testing/xdss-u53e
Northwell Health Laboratories: The 10 year outcomes after deciding to keep the lab
'It's hit our front door': Homes for the disabled see a surge of COVID-19
COVID-19 Testing, Epidemic Features, Hospital Outcomes, and Household Prevalence
Sequencing identifies multiple, early introductions of SARS-CoV2 to New York City Region. medRxiv
Introductions and early spread of SARS-CoV-2 in the New York City area
Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the
A geoscience perspective on COVID-19 mortality
Risk for Transportation of Coronavirus Disease from Wuhan to Other Cities in China
Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand
Preliminary Estimates of the Prevalence of Selected Underlying Health Conditions Among Patients with Coronavirus Disease 2019 - United States
Disparities in reportable communicable disease incidence by Census Tract-level poverty
Using geospatial analysis and emergency claims data to improve minority health surveillance
Community and socioeconomic factors associated with COVID-19 in the United States: Zip code level cross sectional analysis. medRxiv
Who is most likely to die from Coronavirus
United States Health and Human Services. HHS Announces New Laboratory Data Reporting Guidance for COVID-19 Testing
Embedded link to: COVID-19 Pandemic Response
On the use of ZIP codes and ZIP code tabulation areas (ZCTAs) for the spatial analysis of epidemiological data