key: cord-0688542-pvshw2pn
authors: Reese, Heather; Iuliano, A Danielle; Patel, Neha N; Garg, Shikha; Kim, Lindsay; Silk, Benjamin J; Hall, Aron J; Fry, Alicia; Reed, Carrie
title: Estimated incidence of COVID-19 illness and hospitalization — United States, February–September, 2020
date: 2020-11-25
journal: Clin Infect Dis
DOI: 10.1093/cid/ciaa1780
sha: 52053d6010b31a660e8680edd38bc6ace47c7de3
doc_id: 688542
cord_uid: pvshw2pn

BACKGROUND: In the United States, laboratory-confirmed coronavirus disease 2019 (COVID-19) is nationally notifiable. However, reported case counts are recognized to be less than the true number of cases because detection and reporting are incomplete and can vary by disease severity, geography, and over time.
METHODS: To estimate the cumulative incidence of SARS-CoV-2 infections, symptomatic illnesses, and hospitalizations, we adapted a simple probabilistic multiplier model. Laboratory-confirmed case counts that were reported nationally were adjusted for sources of under-detection based on testing practices in inpatient and outpatient settings and assay sensitivity.
RESULTS: We estimated that through the end of September, 1 of every 2.5 (95% Uncertainty Interval (UI): 2.0–3.1) hospitalized infections and 1 of every 7.1 (95% UI: 5.8–9.0) non-hospitalized illnesses may have been nationally reported. Applying these multipliers to reported SARS-CoV-2 cases along with data on the prevalence of asymptomatic infection from published systematic reviews, we estimate that 2.4 million hospitalizations, 44.8 million symptomatic illnesses, and 52.9 million total infections may have occurred in the U.S. population from February 27–September 30, 2020.
CONCLUSIONS: These preliminary estimates help demonstrate the societal and healthcare burdens of the COVID-19 pandemic and can help inform resource allocation and mitigation planning.
In the United States, the earliest known patients with coronavirus disease 2019 (COVID-19), the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection, were associated with travel to affected countries or known contact with other infected persons [1]. By February 2020, persons with SARS-CoV-2 infection in the U.S. and no known exposure were detected [2]. Between February 27 and September 30, 2020, nearly 6.9 million laboratory-confirmed cases of domestically acquired infection were detected and reported nationally.

Persons with laboratory-confirmed SARS-CoV-2 infection reported through national surveillance do not represent all infected persons in the U.S. Seroprevalence studies have shown a higher level of SARS-CoV-2 infection than has been reflected by confirmed case counts [2-7]. Most unreported infections were in asymptomatic or mildly ill people who recovered without seeking medical care or testing [8-10]. However, even persons with SARS-CoV-2 infection in medical settings may not be tested or nationally reported as confirmed cases. Limited availability of tests, reagents, and laboratory capacity reduced case detection; in addition, patients may have avoided medical care settings or presented with non-specific symptoms and not been suspected of having SARS-CoV-2 infection. Furthermore, not all infected persons will test positive because of assay sensitivity, timing of specimen collection, or specimen quality [11]. Factors involved in detecting and reporting cases may vary by age, geography, time, healthcare setting, and severity of disease. Finally, some people may be infected with SARS-CoV-2 and never show clinical symptoms; these asymptomatic persons would be even less likely to be detected [9, 10].

To better estimate the U.S.
incidence of SARS-CoV-2 infection since the beginning of the pandemic, we adapted a probabilistic multiplier model to adjust nationally reported counts of confirmed cases for various sources of under-detection [12]; this model estimates total SARS-CoV-2 infections, symptomatic illnesses, and hospitalized patients in the U.S. population from February 27 to September 30, 2020.

Persons with SARS-CoV-2 infection that is laboratory-confirmed by molecular diagnostics are reported to CDC through the National Notifiable Diseases Surveillance System (NNDSS) at the person level or as aggregate counts at the reporting jurisdiction level (e.g., state, territory, New York City, District of Columbia) [13, 14]. The NNDSS uses a standardized case report form, including state of residence, age, hospital admission, and other demographic and clinical characteristics. Given data entry delays and incomplete national reporting, jurisdictions reported aggregated counts daily for the previous day. Probable, asymptomatic, and travel-associated cases were excluded from counts of confirmed cases used in this analysis.

We applied a probabilistic multiplier model to adjust the reported numbers of confirmed symptomatic cases for factors affecting detection of persons with SARS-CoV-2 infection, a method previously used to estimate the incidence of H1N1pdm09 during the 2009 influenza pandemic [12]. This method uses confirmed cases and data on case detection and the asymptomatic fraction to estimate the cumulative number of hospitalized patients with SARS-CoV-2 infection, the total number with symptomatic illness, and the total number of infected persons (Figure 1).
To account for variability in detection of SARS-CoV-2, we stratified reported cases into hospitalized and non-hospitalized symptomatic cases, and further by age group (0-4 years, 5-17 years, 18-49 years, 50-64 years, 65 years and older), time period when the case was reported (February-March, April-May, June-July, August-September), and U.S. Department of Health and Human Services (HHS) region [15]. Age group was imputed for cases with missing birth date according to the age distribution within each HHS region and reporting time period. If hospitalization status was missing, we imputed the percentage of patients who were hospitalized based on reported cases with complete data within each age group, HHS region, and reporting time period. More details on this process are available in the supplemental material.

We adjusted case counts for three factors that affected national detection of symptomatic cases: if a patient is symptomatic, they may not have sought medical attention or testing for their illness (parameter C); if a patient sought medical care, they may not have had a SARS-CoV-2 test completed (parameter B); or if a patient was tested, the SARS-CoV-2 assay used may have returned a false negative result because of its limited sensitivity to detect SARS-CoV-2 in the specimen (parameter A). We used several data sources to describe these factors (Table 2), with under-detection multipliers calculated as the inverse of the product of factors A-C. Each multiplier was calculated within strata of hospitalization status, age group, and reporting time period, as data were available, and applied to the relevant stratified case counts to estimate the number of symptomatic cases within that stratum.
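The multiplier calculation described above can be illustrated with a minimal sketch. The function names and all numeric inputs below are hypothetical and chosen only for illustration; they are not the values from Table 2, and the published analysis was carried out in R rather than Python.

```python
def under_detection_multiplier(a, b, c):
    """Return 1 / (A * B * C) for one stratum.

    a: probability that an infected, tested specimen is positive (parameter A)
    b: probability that a patient who sought care was tested (parameter B)
    c: probability that a symptomatic patient sought care (parameter C)
    """
    for name, p in (("a", a), ("b", b), ("c", c)):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"{name} must be in (0, 1], got {p}")
    return 1.0 / (a * b * c)


def estimate_symptomatic(reported_cases, a, b, c):
    """Scale a stratum's reported case count by its under-detection multiplier."""
    return reported_cases * under_detection_multiplier(a, b, c)


# Hypothetical stratum: 1,000 reported cases with illustrative probabilities.
estimated = estimate_symptomatic(1_000, a=0.9, b=0.5, c=0.4)
print(f"Estimated symptomatic cases in stratum: {estimated:,.0f}")
```

In this sketch each stratum is adjusted independently; the stratum-level estimates would then be summed to give a national total.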
After adjustment, we summed the strata to obtain the number of estimated symptomatic cases and applied one more source of under-detection, namely that a person infected with SARS-CoV-2 may never show clinical symptoms (parameter D), to estimate the number of total infections in the population. For all parameters and strata, we included a range of values; estimates were calculated using Latin hypercube sampling with 10,000 iterations, with 95% uncertainty intervals estimated as the 2.5th and 97.5th percentiles. Population rates were estimated using bridged-race population estimates from CDC WONDER [16]. Analyses were completed in R (version 3.6.1).

Patients infected with SARS-CoV-2 may not always test positive. Sensitivity of approved molecular diagnostic assays may be affected by the limits of detection of specific assays [10], specimen quality, source, handling, and timing of collection [11]. In a systematic review, 2%-21% of patients ultimately confirmed to have SARS-CoV-2 infection did not have a positive result unless multiple tests were performed over several days [17]. This review was used to estimate the probability that a specimen with SARS-CoV-2 will test positive (Table 2). For simplicity, since reported assay specificity has been high, with false positive results ranging between 1% and 4% [18, 19], we did not adjust for potential false positives.

Patients with SARS-CoV-2 infection who are not tested with molecular assays are not included in confirmed case counts. To characterize testing probabilities, we used data from two sources on healthcare visits and SARS-CoV-2 testing, and estimated this parameter separately for hospitalized and non-hospitalized patients.
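The sampling step described earlier (Latin hypercube draws over parameters A-D, with 95% uncertainty intervals taken as the 2.5th and 97.5th percentiles) might look roughly like the sketch below. Several things here are assumptions made for illustration: the parameters are drawn uniformly within their ranges, total infections are computed as symptomatic illnesses divided by (1 - D), and every range except A (taken from the 2%-21% false-negative range cited above) is hypothetical. The function names are also hypothetical.

```python
import numpy as np


def latin_hypercube(n_samples, n_dims, rng):
    """Return an (n_samples, n_dims) array of Latin hypercube draws in [0, 1)."""
    # One uniform draw inside each of n_samples equal-width strata per dimension.
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle each column independently so strata are paired at random.
    for j in range(n_dims):
        u[:, j] = rng.permutation(u[:, j])
    return u


def simulate(reported_cases, ranges, n_samples=10_000, seed=0):
    """Propagate parameter uncertainty to symptomatic and total infection counts."""
    rng = np.random.default_rng(seed)
    names = ("A", "B", "C", "D")
    u = latin_hypercube(n_samples, len(names), rng)
    low = np.array([ranges[k][0] for k in names])
    high = np.array([ranges[k][1] for k in names])
    # Assumption: uniform scaling of each draw onto its [low, high] range.
    a, b, c, d = (low + u * (high - low)).T
    symptomatic = reported_cases / (a * b * c)
    # Assumption: D is the asymptomatic fraction of all infections.
    infections = symptomatic / (1.0 - d)
    return symptomatic, infections


def uncertainty_interval(draws):
    """Return the 2.5th, 50th, and 97.5th percentiles of the draws."""
    lower, median, upper = np.percentile(draws, [2.5, 50.0, 97.5])
    return lower, median, upper


# Range for A follows the 2%-21% false-negative range; B, C, D are hypothetical.
HYPOTHETICAL_RANGES = {
    "A": (0.79, 0.98),
    "B": (0.30, 0.60),
    "C": (0.30, 0.50),
    "D": (0.10, 0.20),
}

symptomatic, infections = simulate(1_000, HYPOTHETICAL_RANGES)
lower, median, upper = uncertainty_interval(infections)
print(f"Total infections: {median:,.0f} (95% UI: {lower:,.0f}-{upper:,.0f})")
```

Latin hypercube sampling guarantees that each parameter's range is covered evenly across the iterations, which is why it is often preferred over simple random sampling when the number of iterations is modest.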
To capture the variability in testing practices across data sources, we represented this parameter using a beta PERT distribution centered on the median value and ranging between the minimum and maximum values reported across both data sources within each stratum of age (Table 2). The beta PERT distribution is a continuous probability distribution that emphasizes the most likely values in an acceptable range of parameter values (i.e., it draws more often close to the middle value of the interval, with a smaller probability at the extremes of the interval).

The first source of data was the IBM Watson Health Explorys electronic health record (EHR) database (IBM, Armonk, NY), which includes >39 health system partners across the country. We identified visits with an ICD-10 diagnosis or SNOMED code that indicated an acute respiratory illness (ARI) (Supplemental Table 5) and the number of those with evidence of SARS-CoV-2 test results from LOINC codes for SARS-CoV-2 RT-PCR tests (Supplemental Table 6). For each setting (inpatient, outpatient, ED), visits and tests performed were aggregated into strata for time period and age group.

We also included rates of testing in the COVID Near You (CNY) survey platform. CNY is a website application where participants can self-report symptoms, healthcare-seeking behaviors, and SARS-CoV-2 testing information [20-22]. COVID-like illness (CLI) was defined using self-reported presence of shortness of breath or cough, or two or more of: self-reported fever, chills, sore throat, body ache, headache, or loss of taste or smell. Proportions of individuals who self-reported receiving a SARS-CoV-2 test among those who sought care for CLI were estimated for each time period with available data by HHS region and age group (Table 2, Supplemental Table 4).

A symptomatic person with SARS-CoV-2 infection will not be included in confirmed case counts if they never sought medical attention or testing services.
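The beta PERT distribution described earlier in this section can be sampled by rescaling a standard beta distribution. The sketch below is illustrative only: it uses the conventional PERT shape factor of 4 and treats the median across data sources as the most likely value, both of which are assumptions rather than details confirmed by the text, and the example testing probabilities are hypothetical.

```python
import numpy as np


def sample_beta_pert(minimum, mode, maximum, size, rng, shape=4.0):
    """Draw samples from a beta PERT distribution on [minimum, maximum].

    `mode` is the most likely value; `shape` controls how strongly draws
    concentrate around it (4.0 is the conventional PERT choice).
    """
    if not minimum <= mode <= maximum:
        raise ValueError("require minimum <= mode <= maximum")
    if minimum == maximum:
        # Degenerate range: every draw equals the single allowed value.
        return np.full(size, float(minimum))
    span = maximum - minimum
    alpha = 1.0 + shape * (mode - minimum) / span
    beta = 1.0 + shape * (maximum - mode) / span
    # Rescale a Beta(alpha, beta) draw from [0, 1] onto [minimum, maximum].
    return minimum + span * rng.beta(alpha, beta, size=size)


# Hypothetical testing probability for one stratum (not values from Table 2).
rng = np.random.default_rng(0)
draws = sample_beta_pert(0.2, 0.5, 0.9, size=10_000, rng=rng)
print(f"mean = {draws.mean():.3f}, min = {draws.min():.3f}, max = {draws.max():.3f}")
```

With a shape factor of 4, the mean of the distribution is (minimum + 4 × mode + maximum) / 6, so draws cluster near the most likely value while still respecting the reported minimum and maximum.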
To estimate healthcare seeking, we used data obtained from both CNY and Flu Near You (FNY) [23], which has conducted participatory surveillance for influenza-like illnesses since 2011, to better capture the full time period and differences between participants of the two systems. We considered a range of symptomatic illness definitions, including: (1) CLI as described above, but excluding loss of taste or smell for FNY, which was not captured in that platform; (2) a more specific case definition of fever and either cough or shortness of breath; and (3) a broader case definition of at least one of fever, cough, or shortness of breath. Among patients who met the given case definition, we calculated the proportion that reported visiting a doctor's office, urgent care clinic, outpatient clinic, emergency department, testing center, telemedicine, or other healthcare setting for symptoms. Care-seeking proportions were included using a beta PERT distribution of the median and range of values across the three case definitions and two data sources, stratified by report date and age group (Table 2, Supplemental Table 2).

Some people infected with SARS-CoV-2 do not experience symptoms [24]. To estimate the number of infections in the population, we adjusted the sum of hospitalized and symptomatic non-hospitalized cases based on the proportion of persons with confirmed COVID-19 and no symptoms from a meta-analysis of available literature (Table 2) [17].

During February 27-September 30, 2020, there were 6,891,764 confirmed cases of symptomatic COVID-19 acquired domestically and reported nationally through individual or aggregate case counts. We estimated that approximately 14% of these patients had been hospitalized, with variation by age group, case report date, and HHS region (Table 1).
Adjusting case counts by HHS region, age group, and report date, we estimated a total of 2,397,777 (95% UI: 2,053,156-2,855,843) hospitalizations with SARS-CoV-2 infection (Table 3).

We estimated 7.1 (95% UI: 5.8-9.0) non-hospitalized symptomatic illnesses for every non-hospitalized case reported nationally, with variation by age group, HHS region, and report date. Under-detection multipliers decreased over time and were consistently highest among children (Supplemental Table 3). We summed the estimated hospitalized (Table 3) and non-hospitalized (Supplemental Table 5) illnesses for a total of 44.8 million symptomatic illnesses (Table 4). The highest rates of symptomatic illness were among adults 18- (Table 5). This indicates that 1 in 7.7, or 13%, of total infections were identified and reported. Detection varied by age, with lower detection rates among children, but with improvements over time (Supplemental Table 4).

We estimated that nearly 53 million SARS-CoV-2 infections, including nearly 45 million symptomatic illnesses and 2.4 million associated hospitalizations, may have occurred in the U.S. through September 30, 2020, with variation by geographic region, age group, and time. These preliminary estimates demonstrate the large incidence of disease in the U.S. population and better quantify the impact of the COVID-19 pandemic on the healthcare system and society; they will be updated as more data on under-detection become available.

at least 10 (range by U.S. site: 6-24) for every reported case [3], with improvements in this ratio by later time points. Severe cases were more likely to be detected and reported; we estimated 2.5 hospitalized patients for each hospitalized case reported.
In the Explorys EHR data, the proportion of ICU patients tested for SARS-CoV-2 was >90% by the end of September, though testing remained lower among other inpatients with ARI, and even lower for ARI visits in outpatient settings (Supplemental Figure 1).

For comparison, COVID-NET is an active, population-based surveillance system for laboratory-confirmed SARS-CoV-2-associated hospitalizations in defined areas of 14 states [26]. While direct comparisons with COVID-NET are imperfect because of the narrower geographic area of the surveillance sites, in 10 of the 14 sites our estimated hospitalization rates by region were 1.5-3.5 times higher than the reported rates from individual sites within those regions by the end of September, similar to the range of our estimated under-detection multiplier for confirmed hospitalizations. Likewise, COVID-NET showed similar trends across age; adults aged ≥65 years had 5-6 times higher rates of hospitalization than younger adults aged 18-49 years [27]. Both also showed lower hospitalization rates among children [28, 29].

For comparison of population-level incidence of infection, the estimated 53 million infections represent approximately 16% of the U.S. population, ranging from 9% to 31% across regions of the country. This is higher than estimates from a nationwide commercial laboratory seroprevalence survey, which found that 1%-22% of various state populations had antibodies to SARS-CoV-2 by early August, though our estimates include two more months of circulation [31]. There remain uncertainties in the interpretation of seroprevalence estimates, including how they vary by the population surveyed, the serologic assays used, the proportion of infected cases with a detectable antibody response, and how long antibody detection persists after infection. Additional studies and sources of data on population-based incidence will help resolve these concerns and provide better national estimates of illness and infection.
We recognize that our model has limitations. Almost a decade of monitoring data on testing practices for influenza [32, 33] shows that testing rates and the use of more sensitive molecular testing have varied by jurisdiction, care setting, age, and disease severity [34]. The availability and use of testing for SARS-CoV-2 have changed rapidly over time; thus far, data on the proportion of persons who are tested for COVID-19 and how this varies across all the previously described factors remain limited. Although data on testing by time, healthcare setting, and age were available, they lacked the coverage to allow for geographic-specific model inputs. These data limitations could have resulted in overestimation of cases from areas with higher testing rates, including areas where some hospitals perform universal testing or where there are more outpatient testing facilities and active contact tracing. Likewise, we may underestimate cases in areas with lower testing and contact tracing. Additionally, some infections, such as those among healthcare workers or from outbreaks in congregate residential settings, may be more likely to be tested and nationally reported compared with the general population, which could lead to overestimation of non-hospitalized cases and infections. We continue to seek information on the proportion of cases and testing rates in various settings to improve estimates.

With limited but growing information regarding the spectrum of clinical manifestations of SARS-CoV-2 infection, there could be a lower index of suspicion of COVID-19 for patients who present with nonspecific and non-respiratory symptoms; these cases may be less likely to be detected and reported. All of this highlights the importance of having data to monitor the proportions of patients with different clinical syndromes who are being tested for SARS-CoV-2 infection in a variety of healthcare and geographic settings, and not just total numbers of tests performed.
Finally, in some heavily affected areas, the size of the outbreaks exceeded capacities to complete detailed case reporting, including patient age and hospitalization status. For cases with missing hospitalization status, we imputed the proportion of reported cases that were hospitalized from the subset with complete data, but it is unclear if age and hospitalization status were missing at random [35]. If not random, and the data were more complete for hospitalized patients, the true hospitalization ratio would be lower than we imputed, and the number of hospitalized cases would be lower than we estimated. Furthermore, this was hospitalization status at the time of the case report, and would miss those who were diagnosed as outpatients but were hospitalized after they were reported as a case; thus our estimates of hospitalization may be an underestimate.

Despite these limitations, our model provides a relatively simple approach to illustrate why there are more persons who have had a SARS-CoV-2 infection than the reported confirmed case counts at multiple levels of disease severity. We used data currently available to provide a preliminary estimate of the overall incidence of SARS-CoV-2 infection, illness, and hospitalization in the U.S. CDC is actively working on refining methods to synthesize information across multiple data sources to better describe the national burden of SARS-CoV-2 infection on an ongoing basis and will update estimates as data become available. In summary, we estimated that in the U.
Persons Evaluated for 2019 Novel Coronavirus - United States
Evidence for Limited Early Spread of COVID-19 Within the United States
Seroprevalence of Antibodies to SARS-CoV-2 in 10 Sites in the United States
Antibody responses to SARS-CoV-2 in patients of novel coronavirus disease 2019
COVID-19 Antibody Seroprevalence in
Seroprevalence of SARS-CoV-2-Specific Antibodies Among Adults
Age-dependent effects in the transmission and control of COVID-19 epidemics
Epidemiology of COVID-19 Among Children in China
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
False Negative Tests for SARS-CoV-2 Infection - Challenges and Implications
Estimates of the prevalence of pandemic (H1N1) 2009, United States
Interim Case Definition
Interim Case Definition
Bridged-Race Population Estimates
False-negative results of initial RT-PCR assays for COVID-19: A systematic review. medRxiv
False positives in reverse transcription PCR testing for SARS-CoV-2. medRxiv
Impact of false-positives and false-negatives in the UK's COVID-19 RT-PCR testing programme
Putting the Public Back in Public Health - Surveying Symptoms of Covid-19
Web and phone-based COVID-19 syndromic surveillance in Canada: A cross-sectional study
Flu Near You: An Online Self-reported Influenza Surveillance System in the USA
A prospective cohort study in non-hospitalized household contacts with SARS-CoV-2 infection: symptom profiles and symptom change over time
Hospitalization Rates and Characteristics of Patients Hospitalized with Laboratory-Confirmed Coronavirus Disease 2019 - COVID-NET, 14 States
COVID-NET Weekly Summary of U.S. COVID-19 Hospitalization Data
Risk Factors for Intensive Care Unit Admission and In-hospital Mortality among Hospitalized Adults Identified through the U.S. Coronavirus Disease 2019 (COVID-19)-Associated Hospitalization Surveillance Network (COVID-NET)
Hospitalization Rates and Characteristics of Children Aged <18 Years Hospitalized with Laboratory-Confirmed COVID-19 - COVID-NET, 14 States
COVID-19 Estimated Patient Impact and Hospital Capacity by State
Estimating influenza disease burden from population-based surveillance data in the United States
Annual estimates of the burden of seasonal influenza in the United States: A tool for strengthening influenza surveillance and preparedness. Influenza Other Respir Viruses
Estimated Influenza Illnesses, Medical visits, Hospitalizations, and Deaths in the United States - 2017-2018 influenza season
The proportion of missing data should not be used to guide decisions on multiple imputation
Estimating the extent of asymptomatic COVID-19 and its potential for community transmission: systematic review and meta-analysis. medRxiv

Figure 1