key: cord-270953-z2zwdxrk
authors: Hittner, J. B.; Fasina, F. O.; Hoogesteijn, A. L.; Piccinini, R.; Kempaiah, P.; Smith, S. D.; Rivas, A. L.
title: Early and massive testing saves lives: COVID-19 related infections and deaths in the United States during March of 2020
date: 2020-05-16
journal: nan
DOI: 10.1101/2020.05.14.20102483
sha:
doc_id: 270953
cord_uid: z2zwdxrk

To optimize epidemiologic interventions, predictors of mortality should be identified. The US COVID-19 epidemic data, reported up to 31 March 2020, were analyzed using kernel regularized least squares regression. Six potential predictors of mortality were investigated: (i) the number of diagnostic tests performed in testing week I; (ii) the proportion of all tests conducted during week I of testing; (iii) the cumulative number of (test-positive) cases through 3-31-2020; (iv) the number of tests performed/million citizens; (v) the cumulative number of citizens tested; and (vi) the apparent prevalence rate, defined as the number of cases/million citizens. Two metrics estimated mortality: the number of deaths and the number of deaths/million citizens. While both expressions of mortality were predicted by the case count and the apparent prevalence rate, the number of deaths/million citizens was ≈3.5 times better predicted by the apparent prevalence rate than by the number of cases. In eighteen states, early testing/million citizens/population density was inversely associated with the cumulative mortality reported by 31 March 2020. Findings support the hypothesis that early and massive testing saves lives. Other factors, e.g., population density, may also influence outcomes. To optimize national and local policies, the creation and dissemination of high-resolution, geo-referenced epidemic data is recommended.
To control a pandemic associated with substantial mortality, such as COVID-19, WHO recommends massive testing [1]. In spite of its relevance, the power of testing-related variables to predict mortality has not yet been empirically investigated in this disease.

To predict and identify when and where mortality is likely to occur, at least three types of metrics may be considered, which focus on: (i) cases (counts), (ii) disease prevalence in a specific geographic location and/or time, and (iii) the demographic density of infected locations [2].
However, assessing the actual prevalence of a disease characterized by a substantial number of asymptomatic infections, such as COVID-19, is not possible unless 100% of the population is tested repeatedly with a highly sensitive test [3, 4]. Consequently, we use the term apparent prevalence to describe the ratio of test-positive cases to all tested individuals. If expressed per million residents, the apparent prevalence can be used to compare different geographical units, e.g., all states of the US.

Unfortunately, comprehensive studies that investigate numerous states require a protracted research program. To rapidly provide policy-makers with usable information, a quasi-real-time assessment was designed here, which captures both nationwide and state-specific dimensions. Analyzing the epidemic data reported in all 50 states of the USA during March of 2020 (the month when testing started), we investigated whether testing-related variables, including massive and early testing, predict mortality.

medRxiv preprint doi: https://doi.org/10.1101/2020.05.14.20102483; this version posted May 16, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Six variables were assessed as possible predictors of fatalities: (i) the number of diagnostic tests performed in testing week I; (ii) the proportion of all tests conducted during the first week of testing; (iii) the cumulative number of (test-positive) cases through 3-31-2020; (iv) the number of tests performed/million citizens; (v) the cumulative number of citizens tested; and (vi) the apparent prevalence rate, defined as the number of cases/million citizens. To examine the predictive ability of these variables, we modeled the data using a nonparametric machine learning approach known as kernel regularized least squares (KRLS) regression [5].
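As a rough illustration of the regularized least squares idea behind the approach named above, the sketch below fits a Gaussian-kernel model by solving (K + λI)c = y. It is not the KRLS R package and not the authors' analysis: the function names, the bandwidth `sigma2`, the penalty `lam`, and the toy data are all assumptions chosen for illustration, and no data-driven selection of the penalty is attempted.

```python
import math


def gaussian_kernel(a, b, sigma2):
    """Gaussian kernel exp(-||a - b||^2 / sigma2) between two feature vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / sigma2)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_krls(X, y, sigma2, lam):
    """Return coefficients c that solve (K + lam * I) c = y."""
    n = len(X)
    K = [[gaussian_kernel(X[i], X[j], sigma2) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += lam  # ridge-type penalty on the diagonal
    return solve_linear(K, y)


def predict_krls(X_train, coefs, x_new, sigma2):
    """Predict at x_new as a kernel-weighted sum over the training points."""
    return sum(c * gaussian_kernel(x, x_new, sigma2) for c, x in zip(coefs, X_train))


# Toy data (hypothetical, not taken from the study): one predictor, three units.
X_toy = [[0.0], [1.0], [2.0]]
y_toy = [0.0, 1.0, 4.0]
coefs_toy = fit_krls(X_toy, y_toy, sigma2=1.0, lam=1e-8)
fitted_toy = [predict_krls(X_toy, coefs_toy, x, sigma2=1.0) for x in X_toy]
```

With a very small penalty the fitted values nearly reproduce the training outcomes; a larger `lam` shrinks the fit toward zero, which is the trade-off the regularization term controls.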
To implement the procedure we used the KRLS R software package [6]. KRLS is appropriate when linear regression assumptions, such as linearity and additivity, are not met and the precise functional association between the predictors and criterion is unknown.

Because there is no prior knowledge on the use of these composite variables, no pre-established method or criterion was chosen to analyze the data. Instead, we relied on the recognition of patterns observed after the data were collected. When distinct patterns were observed, such as L-shaped data distributions [9], thresholds were selected to match the upper limit of a linearly distributed data segment, so that the intersection of two orthogonal lines would identify three groups of data.

A public source was used to collect the overall US and state-specific data on the COVID-19 pandemic, which was complemented with state-specific population data [7, 8]. All analyses included data from each state of the US (Supplemental Table 1).

The six predictors accounted for 93.5% of the variance in the number of deaths and 86.7% of the variance in deaths/million citizens (Supplemental Tables 2A, 2B). Of the six predictors, two were statistically significant: the cumulative number of confirmed cases and the apparent prevalence rate. These two variables were comparable predictors of mortality count. However, for predicting deaths per million citizens, the apparent prevalence rate was a 3.5 times stronger predictor than the number of confirmed cases (Supplemental Table 2B).

In addition, the number of tests administered during week one of testing/million citizens/population density distinguished three groups of states when the number of deaths/million citizens was the outcome variable (Fig. 1A). Two of these groups exhibited
statistically significantly different medians (p < 0.001, Mann-Whitney test, Fig. 1B).

Whether cases or fatalities are considered, findings indicate that reporting COVID-19 data as counts is not as informative as reporting metrics that consider two or more interacting quantities, such as the apparent prevalence rate and the number of deaths/million citizens. While isolated metrics (e.g., counts) ignore dynamics as well as geographical factors (including population density), composite metrics integrate numerous dimensions that facilitate geographically-specific interventions [3].

Although the KRLS regression method is a powerful and flexible approach to modeling predictive associations, it was used here only to provide a snapshot-like assessment, so that results could be generated rapidly. If shorter time intervals were used, the KRLS approach could capture epidemic dynamics.

As evidenced by our nonparametric regression results, the variables analyzed offer a combinatorial template that highlights the importance of investigating metrics consisting of interacting quantities. For example, a recombination of those variables (the number of tests performed in week I/million citizens/population density) empirically demonstrates that massive and early testing may save lives (Figs. 1A and B). Such a finding is likely to also be influenced by several factors, including, but not limited to: (i) availability of diagnostic kits, equipment, reagents, and trained personnel, (ii) availability of hospital beds and/or Intensive Care Units, and (iii) local and regional demographic and geographical interactions.
For example, regions with a higher population density (more abundant and closer contacts among infected and susceptible citizens) tend to be associated with a higher connectivity (more highways, ports and/or airports), which fosters epidemic spread [3].

While composite metrics could address pandemics as a group of local and regional interacting processes, the COVID-19 related information currently found in the press, as well as that provided by national and international governmental agencies, tends to lack point-based (high-resolution), geo-referenced information. While surface-based data are usually provided (e.g., state-related data), this type of data is an aggregate of geographical points and lines and, consequently, [...]
Tests wk I: number of tests performed in the first 7 days of testing.
Total tested: total number of people tested.
Wk I / all tests: tests wk I / total tested, i.e., the proportion of all tests that were conducted during the first week of testing, expressed as a percentage.
Tested/mill: number of tests performed per 1 million inhabitants.
Cases: cumulative number of confirmed (test-positive) infections.
Cases/mill inh: the apparent prevalence, calculated by dividing the number of cases by the population (expressed in million inhabitants).

Outcomes (cumulative values through 3-31-2020)
Mortality count: number of deaths.
Deaths/mill: number of deaths per 1 million citizens.
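
The definitions above translate directly into a few ratios. The sketch below computes them for a single state. The function name, the argument names, the use of square miles for area, and the numbers in the usage example are hypothetical. Reading the composite early-testing metric as week-I tests divided by population (in millions) and then by population density is an assumption about how the text's "tests in week I/million citizens/population density" is constructed.

```python
def derived_metrics(tests_week1, total_tested, cases, deaths, population, area_sq_mi):
    """Compute the per-capita and composite metrics defined above for one state."""
    pop_millions = population / 1_000_000
    density = population / area_sq_mi  # inhabitants per square mile (assumed unit)
    return {
        # Share of all tests that were run in the first week, as a percentage.
        "wk1_share_pct": 100 * tests_week1 / total_tested,
        # Tests per 1 million inhabitants.
        "tested_per_million": total_tested / pop_millions,
        # Apparent prevalence: cases per 1 million inhabitants.
        "cases_per_million": cases / pop_millions,
        # Outcome: deaths per 1 million citizens.
        "deaths_per_million": deaths / pop_millions,
        # Composite early-testing metric (assumed construction, see lead-in).
        "wk1_tests_per_million_per_density": tests_week1 / pop_millions / density,
    }


# Hypothetical state, not taken from the study's data.
example = derived_metrics(
    tests_week1=500,
    total_tested=10_000,
    cases=1_000,
    deaths=20,
    population=2_000_000,
    area_sq_mi=10_000,
)
```

Dividing by population and by density lets states of very different size and crowding be placed on a common scale, which is the motivation the text gives for preferring composite metrics over raw counts.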