key: cord-1025541-w4f3msl5
authors: Cobb, J. S.; Seale, M. A.
title: Examining the Effect of Social Distancing on the Compound Growth Rate of SARS-CoV-2 at the County Level (United States) Using Statistical Analyses and a Random Forest Machine Learning Model
date: 2020-04-28
journal: Public Health
DOI: 10.1016/j.puhe.2020.04.016
sha: 2e398a8f899d004693cc0bf38222f451e038958a
doc_id: 1025541
cord_uid: w4f3msl5

Abstract
Objectives: The goal of the present work is to investigate trends among US counties and COVID-19 growth rate in relation to the existence of shelter in place (SIP) orders in that county.
Study Design: Prospective cohort study.
Methods: Compound growth rates were calculated using cumulative confirmed COVID-19 cases from January 21, 2020, to March 31, 2020 in all 3,139 US counties. Compound growth was chosen as it gives a single number that can be used in machine learning to represent speed of virus spread during defined time intervals. Statistical analyses and a random forest machine learning model were used to analyze the data for differences in counties with and without shelter in place orders.
Results: Statistical analyses revealed that the March 16 presidential recommendation (limiting gatherings to < 10 people) lowered the compound growth rate of COVID-19 for all counties in the US by 6.6%, and the counties that implemented SIP after March 16 had a further reduction of 7.8% over the counties with no SIP after March 16. A random forest machine learning model was built to predict compound growth rate after a SIP order and was found to have an accuracy of 92.3%. The random forest found that population, longitude, and population per square mile were the most important features when predicting the effect of SIP.
Conclusions: Shelter in place orders were found to be effective at reducing the growth rate of COVID-19 cases in the US. Counties with a large population or a high population density were found to benefit the most from a shelter in place order.
Keywords: Shelter in place, social distancing, COVID-19, SARS-CoV-2, Machine Learning, Statistics

Novel coronavirus (SARS-CoV-2 or COVID-19) originated in the province of Wuhan, China in December 2019, after which it spread rapidly across the globe due to infected persons exhibiting little to no symptoms within the first five days of contracting the virus.
[1] The devastation and infection rate triggered by the virus caused the World Health Organization to declare it a global pandemic. Over 170 countries currently have confirmed cases of COVID-19, and all 50 states in the United States (US) have confirmed cases according to the Centers for Disease Control and Prevention (CDC). In the US, community transmission has become the predominant mode of transmission of the virus. [2] It has therefore become imperative to evaluate the effectiveness of the primary measures that local and national governments have used to limit social contact. Enough county-level data are now available to provide a fair assessment of the efficacy of the presidential guidelines of March 16, 2020, which instituted a form of "social distancing" by limiting gatherings to 10 or fewer people. The data are also sufficient to compare counties that issued SIP orders after the March 16 guidelines with counties that did not issue any SIP order.

County metrics were obtained from the US Census Bureau, USA Counties (2011) datasets from the 2010 census. [3] The metrics used in this study are latitude, longitude, population, median age, number of physicians, median income, population/sq. mile, and water use per capita. Counties were placed into one of two bins: (1) counties that had confirmed cases of COVID-19 before the March 16 guidelines and experienced a SIP order on or after March 19 (186 counties, referred to as wSIP); (2) counties that had confirmed cases before March 16 and experienced no SIP order (60 counties, referred to as noSIP).

A Student t-test was used to compare two groups for significance. ANOVA with a Tukey post-hoc test was used to compare multiple groups. Significance was defined as p < 0.05. All data are reported as mean ± 95% confidence interval. There were no statistically significant differences in the US census data between wSIP and noSIP apart from latitude (p < 0.0001) and the number of physicians (p = 0.04).
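As an illustration of the two-group comparison described above, the following is a minimal sketch of a pooled-variance (Student) two-sample t statistic using only the Python standard library. The function name `student_t` and the latitude values are hypothetical and are not the study's code or data. The sketch stops at the test statistic and degrees of freedom; obtaining a p-value requires the t distribution, which a library routine such as `scipy.stats.ttest_ind` provides.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Return the pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighting each group by its degrees of freedom.
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    # Standard error of the difference between the two group means.
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(a) - mean(b)) / se, df


# Made-up county latitudes for illustration only (not the study's data).
wsip_latitudes = [40.1, 38.9, 41.2, 39.5, 37.8]
nosip_latitudes = [34.0, 35.2, 33.1, 36.0]

t_stat, df = student_t(wsip_latitudes, nosip_latitudes)
print(f"t = {t_stat:.2f}, df = {df}")
```

A large absolute t value relative to the degrees of freedom corresponds to a small p-value, which is how a difference such as the one reported for latitude would show up.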
wSIP had a mean latitude of 39.47 ± 0.75, which places it in the northern US, versus noSIP with a mean latitude of 34.6 ± 1.16, placing it further south. The difference in the number of physicians is a function of latitude, with a lower mean number of physicians in the south (1,697 ± 500) than in the north (2,677 ± 538).

The number of confirmed COVID-19 cases in each county was collected from local health department data and county/state press releases from January 21 to March 25. Confirmed cases from March 26 to 31 were obtained from the New York Times coronavirus data repository. [4] The two datasets were compared to ensure consistency between the collected values. Data collection stopped on March 31 because, at that point, the mean number of days with confirmed cases was approximately the same in each of the three intervals: before the March 16 guidelines, after March 16 but before the institution of a SIP order, and after March 16 with a SIP order. This allowed these time intervals to be compared over an equal number of days (7.62 ± 0.35). Compound growth was calculated using the equation: (final confirmed cases/first confirmed cases)^(1/number of days).

Figure 1 shows the compound growth rate for wSIP (1.39 ± 0.044) and noSIP (1.30 ± 0.059) before the presidential guidelines on March 16, the compound growth rate after March 16 for wSIP (1.30 ± 0.023) and noSIP (1.21 ± 0.016), and the compound growth rate for wSIP (1.19 ± 0.011) after the SIP orders went into effect. The lower compound growth rates seen in the noSIP data are attributed to the difference in latitude between the two datasets and suggest that southern states experienced a slower spread of the virus at the onset. This makes sense, given that before March 16, the hotspots for COVID-19 were northern states such as Washington, New York, and Illinois. The noSIP compound growth rates were normalized to the compound growth rate of the wSIP data before March 16 to account for geographical differences.
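The compound growth equation and the normalization step can be sketched in a few lines of Python. The function names are hypothetical. The text does not state exactly how the normalization was carried out, so the ratio scaling in `normalize_to_baseline` is an assumption made for illustration, not the authors' confirmed procedure.

```python
def compound_growth(first_cases, final_cases, days):
    """(final confirmed cases / first confirmed cases) ** (1 / number of days)."""
    if first_cases <= 0 or days <= 0:
        raise ValueError("first_cases and days must be positive")
    return (final_cases / first_cases) ** (1 / days)


def normalize_to_baseline(rate, own_baseline, reference_baseline):
    """Rescale a rate so that own_baseline maps onto reference_baseline.

    Assumption: simple ratio scaling; the source does not specify the method.
    """
    return rate * reference_baseline / own_baseline


# Hypothetical county: cases grow from 10 to 100 over 7 days, roughly 1.39 per day.
print(round(compound_growth(10, 100, 7), 2))

# Reported group means: noSIP 1.30 before and 1.21 after March 16; wSIP 1.39 before.
print(round(normalize_to_baseline(1.21, own_baseline=1.30, reference_baseline=1.39), 2))
```

Under this assumed scaling, the noSIP mean after March 16 comes out at about 1.29, close to the reported wSIP mean of 1.30 for the same interval.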
The normalized compound growth rates after March 16 were statistically similar between the wSIP and noSIP groups (p > 0.05, Figure 1). This indicates that the presidential guidelines had the same magnitude of effect in both groups, reducing the compound growth rate by 6.6% ± 1.4% before the wSIP group instituted a SIP. After instituting a SIP order, the wSIP group's compound growth rate decreased by an additional 7.8%, for a total decrease of 14.4% ± 1.6% from the compound growth rate before March 16. This indicates that the effects of the presidential guidelines and SIP orders were additive in the US. This is reasonable considering that the virus is thought to spread by virus-containing airborne droplets and that orders for social distancing limit the interaction of people who could potentially be infected. [5] A study from China modeling the effect of social distancing indicated a strong association between the implementation of social distancing and a decrease in the rate at which the virus spreads. [6]

A random forest machine learning model was trained to predict the compound growth rate after a SIP order was given in a county. Random forest was chosen as it has been shown to have the highest accuracy in disease prediction. [7] The model achieved an accuracy of 92.3% on the test dataset, as assessed by mean absolute percentage error (MAPE). The three most important features in predicting the compound growth rate after a SIP order was instituted in a county were population, longitude, and population per square mile. The data for each of these features were split into four equal groups to examine how the feature matters for the model in predicting the compound growth rate after a SIP was issued. Counties that instituted a SIP with a longitude between -79.7102° and -97.2363° had the largest decrease in compound growth rate, at 10.4%, compared with 8.2% for counties outside that longitudinal range.
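The four-group split of a feature can be sketched as follows, using only the Python standard library (`statistics.quantiles` requires Python 3.8 or later). The function name and the sample values are hypothetical. Whether the authors used these exact cut points or this interpolation method is not stated, so this shows one reasonable way to form four equal-sized groups and compare the mean reduction in each.

```python
from statistics import mean, quantiles


def mean_reduction_by_quartile(feature, reduction):
    """Group counties into quartiles of `feature`; return the mean reduction per group."""
    cuts = quantiles(feature, n=4)  # three cut points separating four groups
    groups = [[] for _ in range(4)]
    for value, pct in zip(feature, reduction):
        # Group index = number of cut points lying strictly below the value.
        groups[sum(value > cut for cut in cuts)].append(pct)
    return [mean(group) if group else None for group in groups]


# Made-up county populations and percent reductions, for illustration only.
population = [8_000, 25_000, 60_000, 110_000, 150_000, 400_000, 900_000, 2_500_000]
reduction_pct = [8.0, 8.4, 8.1, 8.3, 9.0, 9.5, 10.2, 10.8]

for index, value in enumerate(mean_reduction_by_quartile(population, reduction_pct), start=1):
    print(f"quartile {index}: mean reduction {value:.2f}%")
```

The same helper applies unchanged to longitude or population per square mile by passing a different `feature` list.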
Counties with the highest populations, between 143,962 and 9,848,011, saw the largest percent reduction after instituting a SIP, at 10.5%, compared with 8.2% for counties with a lower population, between 7,457 and 142,151. Similarly, counties with a population/sq. mile between 405.8 and 1,755.5 showed the largest reduction in compound growth rate, at 11.6%, compared with 8.9% for counties with a lower population density of 2.1 to 405.6.

In conclusion, the data suggest that at the county level in the US, shelter in place is effective at decreasing the compound growth rate of COVID-19 (Figure 1). The counties that benefit most from a SIP are those with a large population or a high population density, as indicated by the random forest feature importance.

Figure 1. Mean compound growth rate of counties that had confirmed cases (1) before the presidential guidelines issued on March 16, (2) from March 16 to a SIP, and (3) from a SIP to March 31 (black bars), compared to counties with confirmed cases (1) before March 16 and (2) after March 16 with no SIP (white bars). Statistical analyses indicated no differences in summary data for the two groups except for latitude, indicating that counties with a SIP were further north and those without a SIP were predominantly located in the south. The counties without SIP were normalized to those with SIP before March 16 to account for this difference (gray bars).
Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia
COVID-19 and the Otolaryngologist - Preliminary Evidence-Based Review
Air, Surface Environmental, and Personal Protective Equipment Contamination by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) From a Symptomatic Patient
First Two Months of the 2019 Coronavirus Disease (COVID-19) Epidemic in China: Real-Time Surveillance and Evaluation with a Second Derivative Model
Comparing Different Supervised Machine Learning Algorithms for Disease Prediction

The authors wish to thank William Glenn Bond and Meredith B. Cobb for their input and edits of the manuscript.
Ethical approval was not required as this study made use of publicly available data.
None.
None Declared.
Datasets: Counties with SIP.csv, Counties without SIP.csv

• The presidential guidelines for COVID-19 issued on March 16 reduced the compound growth rate of confirmed cases by 6.6%.
• Counties that issued a shelter in place order after March 16 saw a further reduction in compound growth rate of COVID-19 cases by 7.8%.
• The random forest model indicated that population, population/sq. mile, and longitude were the most important features for determining how much a county will benefit from instituting a shelter in place order.