key: cord-328826-guqc5866
authors: Wissel, Benjamin D; Van Camp, P J; Kouril, Michal; Weis, Chad; Glauser, Tracy A; White, Peter S; Kohane, Isaac S; Dexheimer, Judith W
title: An Interactive Online Dashboard for Tracking COVID-19 in U.S. Counties, Cities, and States in Real Time
date: 2020-04-25
journal: J Am Med Inform Assoc
DOI: 10.1093/jamia/ocaa071
sha:
doc_id: 328826
cord_uid: guqc5866
OBJECTIVE: To create an online resource that informs the public of COVID-19 outbreaks in their area.
MATERIALS AND METHODS: This R Shiny application aggregates data from multiple resources that track COVID-19 and visualizes them through an interactive, online dashboard.
RESULTS: The web resource, called the COVID-19 Watcher, can be accessed at https://covid19watcher.research.cchmc.org/. It displays COVID-19 data from every county and 188 metropolitan areas in the U.S. Features include rankings of the worst affected areas and auto-generating plots that depict temporal changes in testing capacity, cases, and deaths.
DISCUSSION: The Centers for Disease Control and Prevention (CDC) do not publish COVID-19 data for local municipalities, so it is critical that academic resources fill this void so the public can stay informed. The data used have limitations and likely underestimate the scale of the outbreak.
CONCLUSIONS: The COVID-19 Watcher can provide the public with real-time updates of outbreaks in their area.
As of April 13th, the United States of America (U.S.A.) had 30% of novel coronavirus disease 2019 (COVID-19) cases worldwide, the most of any country. [1] At this date, New York City was the epicenter of cases in the U.S., but large outbreaks were present in several other major metropolitan areas, including New Orleans, Detroit, Chicago, and Boston. Several online tools track COVID-19 outbreaks at the county, state, and national level.
[1] [2] [3] [4] However, it has become apparent that tracking outbreaks at the city level is critical, because national outbreaks have centered on specific cities and regions: in and around Wuhan in China, Lombardy in Italy, Madrid in Spain, and London in the United Kingdom. Our team developed a methodology to aggregate county-level COVID-19 data into metropolitan areas and display these data in an interactive dashboard that updates in real time. The purpose of this website was to make this information more accessible to the public and to allow for a more granular assessment of infection spread and impact.
We assessed three publicly available datasets that are updated daily and include county- and/or state-level counts of COVID-19 confirmed cases and deaths in the U.S.A.
The New York Times (NYT) began tracking COVID-19 cases and deaths on the county level in [...] cases as individuals who tested positive for COVID-19. Cases were attributed to the county in which the person was treated and were counted on the date that the case was announced to the public. If it was not possible to attribute a case to a specific county, it was still counted for the state in which the person was treated.
The Johns Hopkins University Center for Systems Science and Engineering was the first group to aggregate COVID-19 data and release it to the public in an accessible and sizable manner. [1] This group publishes total cases, recovered cases, and deaths at the national, state, and, as of [...]
COVID Tracking Project Data. The COVID Tracking Project is a grassroots effort incubated by The Atlantic that tracks COVID-19 testing in U.S. states. [7] This group releases daily updates for the number of positive tests, negative tests, pending tests, hospitalizations, patients in the intensive care unit, and deaths. Because state reporting varies widely, some of these data are not available for every state.
These three data resources use different strategies to aggregate COVID-19 data from multiple sources. Since a gold standard has not been established, we compared the consistency of these sources with the Centers for Disease Control and Prevention (CDC). [8] The CDC only releases data for confirmed cases for the entire country, so that was the only metric that could be compared between all four sources.
All 50 states, the District of Columbia, and five U.S. territories were included. We used the U.S. Census Bureau's lists of counties comprising major metropolitan areas [9] [...] To track the proportion of each area's residents that became infected or died of COVID-19, we used the U.S. Census Bureau's 2019 population estimate for each county to normalize data to tests, cases, and deaths per 10,000 residents. [10]
Code. The application, referred to as the COVID-19 Watcher, checks for data updates from The NYT and the COVID Tracking Project every hour. When data updates are released, they are automatically downloaded onto the server and incorporated into the web resource. New data must pass a quality control check that ensures updated data files are the anticipated size and format. Data visualizations are generated using the ggplot2 package [11] in R statistical software version 3.6.1, [12] and the application was developed using R Shiny. [13] The web resource is hosted in an Amazon Web Services (AWS) environment behind a scalable load balancer to accommodate user load. The source code was placed in a public GitHub repository and can be accessed at https://github.com/wisselbd/COVID-Tracker. The site is maintained by the Cincinnati Children's Hospital Medical Center Division of Biomedical Informatics.
The COVID-19 Watcher dashboard can be accessed at https://covid19watcher.research.cchmc.org/. The resource includes all U.S. counties, as well as 188 metropolitan areas that are collectively inhabited by over 277 million Americans (83.3% of the population).
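The data-handling steps described in the Methods (a quality-control check on each download, aggregation of counties into metropolitan areas, and normalization per 10,000 residents) can be sketched as follows. This is a minimal illustration in Python rather than the authors' R code; the column layout, the function names, the FIPS codes, the metropolitan mapping, and all numbers are assumptions made for the example, not values from the published pipeline.

```python
import csv
import io
from collections import defaultdict

# Assumed column layout of a county-level file (hypothetical for this sketch).
EXPECTED_COLUMNS = ["date", "county", "state", "fips", "cases", "deaths"]


def passes_quality_check(csv_text, min_rows):
    """Return True if the file has the expected header, every row has the
    expected number of fields, and there are at least `min_rows` data rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != EXPECTED_COLUMNS:
        return False
    n_rows = 0
    for row in reader:
        if len(row) != len(EXPECTED_COLUMNS):
            return False
        n_rows += 1
    return n_rows >= min_rows


def aggregate_to_metro(csv_text, county_to_metro):
    """Sum county cases and deaths into (date, metro) totals.
    Counties without a metro assignment are skipped."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for row in csv.DictReader(io.StringIO(csv_text)):
        metro = county_to_metro.get(row["fips"])
        if metro is None:
            continue
        key = (row["date"], metro)
        totals[key]["cases"] += int(row["cases"])
        totals[key]["deaths"] += int(row["deaths"])
    return dict(totals)


def per_10k(count, population):
    """Normalize a raw count to a rate per 10,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 10_000 * count / population


# Hypothetical toy data: three counties, two of which form "Metro A".
SAMPLE = (
    "date,county,state,fips,cases,deaths\n"
    "2020-04-13,Alpha,XX,00001,120,3\n"
    "2020-04-13,Beta,XX,00002,80,1\n"
    "2020-04-13,Gamma,XX,00003,50,0\n"
)
COUNTY_TO_METRO = {"00001": "Metro A", "00002": "Metro A"}
COUNTY_POPULATION = {"00001": 300_000, "00002": 100_000}

if passes_quality_check(SAMPLE, min_rows=3):
    metro_totals = aggregate_to_metro(SAMPLE, COUNTY_TO_METRO)
    metro_population = sum(COUNTY_POPULATION[f] for f in COUNTY_TO_METRO)
    for (date, metro), counts in metro_totals.items():
        rate = per_10k(counts["cases"], metro_population)
        print(date, metro, counts["cases"], round(rate, 2))
```

Summing the county populations before dividing matters here: averaging per-county rates would weight small and large counties equally and would not give the metropolitan rate.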
A screenshot of the web resource is shown in Figure 1. Users can view COVID-19 cases and deaths from The NYT at the county, city, state, or national level. The total number of tests reported by the COVID Tracking Project, including the breakdown between positive and negative tests, is shown for each state. Multiple areas can be selected at once, and plots autogenerate after each selection. Options include normalizing counts by population size, linear and logarithmic axes, and a button to download a screenshot of the plots. Users can search tables that display rankings of the least and most affected areas.
A summary of the COVID-19 data sources is shown in Table 1. Data are updated at the end of each day in all cases except for The NYT, where they are released the following day. The NYT, Johns Hopkins, and COVID Tracking Project provide easy-to-access download portals, while the CDC only provides a dashboard without an option to download the data. A comparison of confirmed cases reported in each data source is shown in Figure 2. The sources were highly consistent at the national level.
In the absence of a uniform government standard for tracking COVID-19 outbreaks in the U.S.A., academic and newsgroup-based data repositories have become the de facto standard. While these datasets are publicly available, they require informatics and data visualization to extract and display information due to their complexity and continual updates. Visualizing COVID-19 data in real time through online dashboards is a pragmatic way to meet the medical community's demand for up-to-date information.
The data displayed by the COVID-19 Watcher can be used to evaluate the effectiveness of mitigation efforts. Normalizing data by an area's population shows the relative proportion of the population that has been infected. The logarithmic scale shows the rate of spread, and flattening the exponential curve indicates the spread of the virus is slowing.
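The point about the logarithmic scale can be made concrete: on a log axis, exponential growth appears as a straight line whose slope is the growth rate, so a shrinking slope (equivalently, a longer doubling time) signals that spread is slowing. The sketch below is an illustration only and is not part of the COVID-19 Watcher; the function names and the case counts are hypothetical.

```python
import math


def log_slope(counts):
    """Least-squares slope of ln(count) against the day index.
    Requires at least two strictly positive daily cumulative counts."""
    if len(counts) < 2 or any(c <= 0 for c in counts):
        raise ValueError("need at least two positive counts")
    ys = [math.log(c) for c in counts]
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def doubling_time(counts):
    """Days needed for counts to double at the fitted growth rate.
    Returns infinity when the fitted curve is flat or declining."""
    slope = log_slope(counts)
    return math.log(2) / slope if slope > 0 else math.inf


# Hypothetical cumulative case counts for two four-day windows.
early_window = [100, 200, 400, 800]    # doubling every day
later_window = [800, 900, 1000, 1100]  # growth has slowed

print(doubling_time(early_window))
print(doubling_time(later_window))
```

A longer doubling time in the later window corresponds to the visibly flatter line a user would see on the dashboard's logarithmic axis.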
Users should take caution in using these data to forecast future events. To make projections, these data should be used in conjunction with the University of Washington Institute for Health Metrics and Evaluation (IHME) model, [14] the University of Pennsylvania's COVID-19 Hospital Impact Model for Epidemics (CHIME), [15] or other SIR models.
The authors welcome community feedback, ideas for further development, and contributions. The GitHub repository has a section for issue tracking where users can submit comments about the web resource. [16] Alternatively, contributors can improve the code itself by forking the repository, modifying their copy of the code, and submitting pull requests back to the authors. These modifications will be reviewed and, if judged to be suitable, merged into the main code. In particular, the authors would like to see community contributions related to geo-personalization of the website's visualizations, analytic modeling, additional data such as other countries, and timeline augmentation.
Although these datasets reviewed in Table 1 [...]
In conclusion, we developed the COVID-19 Watcher to communicate up-to-date COVID-19 information to the medical community and general public. The web application's pipeline was developed to be extendable, and additional data sources will be added as they become available. We hope that by making the code used by this web resource available to the public, developers will submit ideas for improvement. Since it is possible that public data releases will be interrupted in the future, we recommend that the CDC immediately begin public releases of their entire COVID-19 data so academia can drive further innovation.
These tools could not have been developed without many individual and selfless efforts to create resources for the public good. Special thanks to Danny T.Y. Wu, PhD and Sander Su for their help launching the site.
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors.
[...] application's design, submitted feedback on the manuscript for intellectual content, and approved the final version. B.D.W. and J.W.D. have full access to the data and source code and take responsibility for the integrity and accuracy of the report.
Figure 1. Screenshot of the COVID-19 Watcher web resource. Users can view data from The New York Times at the county, city, state, or national level. Multiple areas can be compared at once. Plots for the selected regions automatically generate and have options to view on logarithmic scale or normalize data by the population size.
[1] An interactive web-based dashboard to track COVID-19 in real time
[2] An interactive visualization of the exponential spread of COVID-19
[3] Coronavirus in the U.S.: How Fast It's Growing
[4] COVID-19 County Tracker
[5] An ongoing repository of data on coronavirus cases and deaths in the U
[6] Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE
[7] The COVID Tracking Project
[8] Cumulative total number of COVID-19 cases in the United States by report date
[9] Core based statistical areas (CBSAs), metropolitan divisions, and combined statistical areas (CSAs
[10] County Population Totals
[11] ggplot2: elegant graphics for data analysis
[12] R: A language and environment for statistical computing
[13] Shiny: web application framework for R. R package version
[14] Forecasting COVID-19 impact on hospital bed-days, ICU-days, ventilator-days and deaths by US state in the next 4 months
[15] Locally Informed Simulation to Predict Hospital Capacity Needs During the COVID-19 Pandemic
[16] COVID-19 Watcher
The authors have no competing interests to declare.