key: cord-0932705-orayejqe
authors: Revell, L. J.
title: covid19.Explorer: A web application and R package to explore United States COVID-19 data
date: 2021-02-19
journal: nan
DOI: 10.1101/2021.02.15.21251782
sha: 0bd7a05cadf364b74fce1ed417c6415acaf3caf3
doc_id: 932705
cord_uid: orayejqe

Beginning at the end of 2019, a novel virus (later identified as SARS-CoV-2) was characterized in the city of Wuhan in Hubei Province, China. As of the time of writing, the disease caused by this virus (known as COVID-19) has already resulted in over 2 million deaths worldwide. SARS-CoV-2 infections and deaths, however, have been highly unevenly distributed among age groups, sexes, countries, and jurisdictions over the course of the pandemic. Herein, I present a tool (the covid19.Explorer R package and web application) that has been designed to explore and analyze publicly available United States COVID-19 infection and death data from the 2020/21 U.S. SARS-CoV-2 pandemic. The analyses and visualizations that this R package and web application facilitate can help users better comprehend the geographic progress of the pandemic, the effectiveness of non-pharmaceutical interventions (such as lockdowns and other measures, which have varied widely among U.S. states), and the relative risks posed by COVID-19 to different age groups within the U.S. population. The end result, I hope, is an interactive tool that will help its users develop an improved understanding of the temporal and geographic dynamics of the SARS-CoV-2 pandemic, accessible to lay people and scientists alike.

In 2019, a novel infectious disease was first identified in Wuhan, a city of approximately 11 million residents located in the Hubei Province of China. This infectious disease, called Coronavirus disease [...]

[...] dictates that this ratio must have a value between 0 and 1.

I decided on a sigmoidal relationship because it seemed reasonable to assume the ratio was very low early in the pandemic, when confirming a new infection was limited primarily by testing capacity, but has probably risen (in many localities) to a more or less consistent value as testing capacity increased. Since getting tested is voluntary, and since many infections of SARS-CoV-2 are asymptomatic or only mildly symptomatic, this ratio seems unlikely to rise to very near 1.0 in the U.S. regardless of testing.

[...] the reader should keep in mind that in practice this value is estimated separately for each jurisdiction that is being analyzed, and as such might be lower in some states and higher in others, even for a constant IFR value or function.

After fitting this sigmoidal curve to our observed and estimated cases through now − k days, we then turn to the last period. To obtain estimated infections for these days, we simply divide our observed cases from the last k days of data by our fitted values from the sigmoid curve. Figure 2 shows the result of this analysis applied to data for the U.S. state of Massachusetts.
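The division step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the package's own code: the logistic parameterization (`upper`, `rate`, `midpoint`), the function names, and all numbers are hypothetical, and the curve is treated as if it had already been fitted.

```python
import math

def sigmoid_ratio(t, upper, rate, midpoint):
    """Hypothetical logistic curve for the ratio of confirmed to estimated
    infections on day t; `upper` (< 1) is the plateau the ratio approaches."""
    return upper / (1.0 + math.exp(-rate * (t - midpoint)))

def infections_last_k_days(observed_cases, first_day, upper, rate, midpoint):
    """Divide each day's observed cases by the fitted ratio for that day."""
    estimates = []
    for offset, cases in enumerate(observed_cases):
        ratio = sigmoid_ratio(first_day + offset, upper, rate, midpoint)
        estimates.append(cases / ratio)
    return estimates

# Made-up inputs: three recent days of confirmed cases, starting on day 300,
# with assumed (not fitted) curve parameters.
recent_cases = [1200, 1350, 1100]
estimated = infections_last_k_days(recent_cases, first_day=300,
                                   upper=0.4, rate=0.05, midpoint=150)
```

Because the fitted ratio is always below 1, each estimate exceeds the corresponding confirmed count; the lower the plateau assumed for a jurisdiction, the larger the implied number of unobserved infections.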

In addition to computing the raw number of daily infections, this method can also be used to compute [...]

This preprint is made available under a CC-BY 4.0 International license; the copyright holder is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

exactly the opposite effect and thus cause us to overestimate the number of infections that have occurred.

We assume a homogeneous value of k at any particular time. In fact, literature sources report lag-times between two and eight weeks (e.g., Yang et al. 2020). Nonetheless, we assert that inferences by our method should not be badly off, so long as IFR does not swing about wildly from day to day, and so long as the number of deaths is not extremely small for any reporting period. We likewise assume a constant lag-period, k, through time. This assumption is perhaps a bit more dubious, as it seems quite reasonable to suppose that, for a specific state or jurisdiction, k might also increase as IFR falls. This is a complexity that we explicitly chose to ignore in our model.

We assume that a more or less consistent fraction of COVID-19 deaths are reported as such, that is, [...]

In estimating the number of daily infections from k days ago to today, we assume that the relationship between time (since the first infections) and the ratio of confirmed to estimated infections is sigmoidal in shape (Figure 1). This is a testable assumption that seems to hold fairly well across the entire U.S. (Figure 1) and for some jurisdictions, but less well for others. It is equally plausible to suspect that this ratio could shift not only as a function of time, but also as demands on testing capacity rise and fall with case numbers. This should be the subject of additional study, but our suspicion is that this would not be likely to have a large effect on our model compared to other simplifications.
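To make the role of the lag k and the IFR concrete, the following Python sketch assumes (as the discussion of k and IFR suggests, though the formula is not spelled out in this section) that infections on day t are estimated as the deaths reported on day t + k divided by the IFR on day t. The function name, the constant IFR, and the data are hypothetical.

```python
def infections_from_deaths(daily_deaths, ifr, k):
    """Estimate infections on day t as deaths on day t + k divided by IFR(t).

    daily_deaths: list of reported deaths per day
    ifr: function mapping a day index to an infection fatality ratio in (0, 1]
    k: assumed constant lag, in days, between infection and death
    """
    n = len(daily_deaths)
    # The last k days cannot be estimated this way, because the deaths
    # resulting from those infections have not yet been reported.
    return [daily_deaths[t + k] / ifr(t) for t in range(n - k)]

# Made-up example: a constant IFR of 0.5% and a 3-day lag (far shorter than
# the two to eight weeks reported in the literature, to keep the lists small).
deaths = [0, 0, 0, 5, 10, 20]
estimates = infections_from_deaths(deaths, ifr=lambda t: 0.005, k=3)
```

The sketch also shows why the final k days need separate treatment: the list of estimates is k entries shorter than the list of deaths.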

Finally we assume no or limited reporting delay. This is obviously incorrect. There are two main [...]

medRxiv preprint, this version posted February 19, 2021; doi: https://doi.org/10.1101/2021.02.15.21251782

1.5 Showing observed and estimated unobserved infections using an 'iceberg plot'

As noted above, it has long been well understood that the number of daily confirmed COVID-19 cases [...]

The purpose of this article is to describe a software tool, which I have largely done in the preceding section. Here, I will attempt to highlight some results and insights that can be obtained by users via interaction with the covid19.Explorer R package or web application.


The covid19.Explorer R package and web application can be used to evaluate the proportion of individuals in the total population that have been potentially infected with SARS-CoV-2, given our model for COVID-19 IFR through time. The following graph (Figure 6) shows estimated infections as a fraction of the total population for the U.S. state of Texas, using the same IFR model as in Figures 1, 2, 3 [...]

Though Figure 6 suggests that perhaps around 25% of the population in Texas has already been infected, users should keep in mind that this is entirely dependent on how we decided to specify our model of IFR through time! Likewise, though this fraction is considerable, it is still well below the level of infection (e.g., 67%) required to achieve herd immunity given the majority of published estimates for R0 of SARS-CoV-2. It may be worth noting that some authors have pointed out that the herd immunity threshold from a natural epidemic could be considerably lower than the 1 − 1/R0 level expected for random vaccination (e.g., Britton et al. 2020; Gomes et al. 2020). This is an intriguing possibility, and one that could be qualitatively examined with some of the tools of the covid19.Explorer package.
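The 1 − 1/R0 threshold mentioned above is simple to compute. The short Python sketch below is illustrative only: the function name is hypothetical, and R0 = 3 is an assumed value chosen because it reproduces a threshold of roughly 67%; it is not an estimate taken from this paper.

```python
def herd_immunity_threshold(r0):
    """Classical threshold 1 - 1/R0 for random vaccination (needs R0 > 1)."""
    if r0 <= 1:
        raise ValueError("R0 must exceed 1 for a positive threshold")
    return 1.0 - 1.0 / r0

# Assumed, illustrative R0 of 3, which gives a threshold of about two thirds.
threshold = herd_immunity_threshold(3.0)

# Compare with the roughly 25% infected fraction suggested for Texas.
estimated_fraction = 0.25
below_threshold = estimated_fraction < threshold
```

Under this classical formula, larger values of R0 push the threshold higher, so the gap between an estimated infected fraction of about 25% and the threshold only widens for more transmissible scenarios.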

[...] package can be used to analyze and graph COVID-19 deaths by age and sex, as well as excess mortality by age and jurisdiction.

This functionality too can sometimes lead to valuable insights.


In fact, CDC mortality data show exactly the opposite pattern. Figure 9 gives [...]

The SARS-CoV-2 global pandemic of 2020 and 2021 has upended economies and civil society worldwide.

With widespread vaccination campaigns underway in many countries, the pandemic may finally be in [...]

[...] allows the scientists and lay people who interact with the software to design their own parameter function (based on a hypothesis or external information) that will then be used to estimate infections under the model.

Lastly, the covid19.Explorer package is completely transparent and open source. It pulls the data directly from public, government repositories. All model assumptions (even those not explicitly described in this paper) are readily identified from the software source code.

Even if COVID-19 soon becomes a distant memory, I hope that this tool (which I plan to make available indefinitely) will continue to be of use to scientists and educated lay people interested in learning from the successes and failures of policy during the 2020/21 pandemic, perhaps to ensure that there are more of the former and fewer of the latter in our next global infectious disease pandemic.

This research was funded in part by grants from the National Science Foundation (DBI-1759940) and FONDECYT, Chile (1201869).

This preprint was not certified by peer review.


[...] Methods in Ecology and Evolution, 3:217-223.

The incidence of the novel coronavirus SARS-CoV-2 among asymptomatic patients: A systematic review

An empirical estimate of the infection fatality rate of COVID-19 from the first Italian outbreak

Using early data to estimate the