key: cord-1007468-6wqxcshu authors: Rajan, K.; Dhana, K.; Barnes, L. L.; Aggarwal, N. T.; Evans, L.; Wilson, R. S.; Weuve, J.; Evans, D. A. title: Strong Effects of Population Density and Social Characteristics on Distribution of COVID-19 Infections in the United States date: 2020-05-12 journal: nan DOI: 10.1101/2020.05.08.20073239 sha: effb18785ccab4aafdd6817f3e3a98b873105e9d doc_id: 1007468 cord_uid: 6wqxcshu
Coronavirus disease 2019 (Covid-19) has devastated global populations and has had a large impact in the United States, with the number of infections and deaths growing exponentially. Using a smooth generalized additive model with quasipoisson counts for total infections and deaths, we developed a county-level predictive model that included population demographics, social characteristics, social distancing, and testing data. This model strongly predicted the actual US distribution of Covid-19, accounting for 94.8% of spatial-temporal variation in total infections and 99.3% in Covid-19-related fatalities from March 15, 2020. US counties with higher population density, poverty index, civilian population, and minorities, especially African Americans, had a higher number of confirmed infections adjusted for county population. Social distancing, measured by the change in the rate of human encounters per km² relative to the pre-Covid-19 national average, was associated with a slower rate of Covid-19 infections, whereas higher testing was associated with a higher number of infections. The number of people infected was increasing; however, the rate of increase in new infections began to show signs of plateauing from the second week of April. Our model projects 2.11 million people to test positive for Covid-19 and 122,951 fatalities by June 1, 2020.
Importantly, our model suggests strong social differences in infections and deaths across US communities, with inequities in areas with larger African American minorities and a higher poverty index, which are expected to show higher rates of Covid-19 infections and deaths. Preventive steps, including social distancing and community closures, have been a cornerstone in stopping the transmission and potentially reducing the spread of the disease. Knowledge of the role of social characteristics in disease transmission is essential to understand the current disease distribution, predict the future distribution, and plan additional preventive steps.
Covid-19 is a global pandemic affecting 187 countries, with over 3.84 million confirmed infections, 269,000 deaths, and a staggering fatality rate of 7.0% [1]. In the US alone, there are over 1.25 million infections with 76,000 deaths and a fatality rate of 6.0% that has remained steady. However, there are considerable variations in Covid-19 infection and death rates across US communities over time. Hence, understanding the geospatial and temporal variation in infections and deaths needs serious and urgent attention. Many of these US communities show large variations in chronic health conditions, population density, and socio-economic status, with poor access to essentials and lower social distancing, all of which could lead to higher rates of infections and deaths. The objective of this study is to develop a social transmission model to study the geospatial and temporal variation in infections and deaths across US counties. The social transmission model will utilize county-level population demographics, focusing on population density, number of minorities, and age distribution; social characteristics [2], such as the poverty index [3] and the size of the non-professional civilian population; and social distancing, measured as the rate of unique human encounters per km² relative to the US national pre-COVID baseline.
This social transmission model will allow us to study the contribution of population density and social characteristics to the distribution of Covid-19 in US communities. Importantly, studying Covid-19 infections and deaths using our social transmission model will allow us to better understand the predictors of the current disease distribution across US communities, to predict the future distribution across US communities and develop a national-level estimate, and, most significantly, to identify the US communities that require the most resources to slow infection and death rates.
All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 12, 2020. . https://doi.org/10.1101/2020.05.08.20073239 doi: medRxiv preprint
The data for this project come from several compiled sources: testing data, daily infections and daily deaths, 2010 US census data, and data on social distancing. More details for each of these sources are provided below.
Testing Statistics: The total number of tests for Covid-19 came from the COVID tracking project [4] and the US CDC [5]. The COVID tracking project aggregates the testing data by individual states and reports the number of people tested, including those tested by private labs. However, not all states report their figures, and these data should be considered a general indication of testing output. The CDC reports the specimens tested in the CDC labs and public health labs in 49 states, New York City, Puerto Rico, USAF, and 15 California counties. With these two sources, we can obtain a general count of total tests performed in the US, with the counts lagging by up to 7 days while specimens are accessioned, tested, and summarized.
Test Cases and Deaths: Several Covid-19 data sets have been made available for research purposes. We use county-level epidemiological data on confirmed cases and deaths starting from March 1, 2020, which are available from Johns Hopkins University and updated daily as a time series [6]. Other epidemiological data, including WHO situational reports and the Atlantic Covid-19 tracking project, were also considered to check the accuracy of the reports from these three sources. Data downloads from the source were automated, and a daily update was performed to obtain the most recent data.
The U.S. Census Bureau is the leading source of statistical information about the people living in the US, in the form of a decennial census, which counts the entire U.S. population every ten years (a combination of long and short forms), along with several other surveys [7]. The US Census Bureau collects several pieces of information from the population and has several hundred identified population and housing tables down to the block level. The 2010 US census data were downloaded, curated, and integrated with county-level infections and deaths. The American Community Survey (ACS) is an ongoing monthly survey sent to 3.5 million addresses to produce detailed population and housing estimates each year [8]. The ACS is designed to produce critical information on small geographic areas and releases annual estimates for over 35,000 communities. The ACS collects several pieces of economic and community data that are relevant to this project. The ACS is also conducted by the Census Bureau, but more detailed data were collected only from 2000 onward.
We use economic data from the 2008 ACS on the poverty index and the non-professional civilian population for each county.
Social Distancing: According to the CDC and WHO, social distancing is currently the most effective way to slow the spread of Covid-19 through US communities. Unacast has developed a social distancing data program that consists of daily encounters, daily visitation, and daily nonessential visits compared to pre-COVID levels and averaged for the US population [9]. We used the encounter rate since it provides the most appropriate measure for studying the change in human encounters per square km among residents of each US county.
Descriptive plots for infections and deaths summarized over all US counties provided information on the cumulative infections and deaths. New daily infections and deaths were estimated as a lagged difference of cumulative infections and deaths between the current and previous days. Similar characteristics were estimated for testing, hospitalization, and encounter rates across all US counties.
For our social transmission model, we used a smooth generalized additive model [10] with quasipoisson counts for total infections and deaths that included population density, poverty index, proportion of non-Hispanic Whites, Blacks, and Hispanics, proportion of females and non-professional civilians, age distributions (below 20, 20-40, 40-60, above 60), and social distancing for each US county. A county-level model was developed in several steps; the first step, using time since March 1, 2020 and the latitude and longitude coordinates of counties, explained about 16% of the variation in the rate of confirmed infections.
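The lagged-difference calculation of new daily counts described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the toy cumulative series are hypothetical, and reporting corrections that make a cumulative series decrease are left untouched.

```python
def daily_new_counts(cumulative):
    """Lagged difference: new counts on day t = cumulative[t] - cumulative[t-1].

    The first day has no previous value, so the result has one fewer entry
    than the input.
    """
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative confirmed infections for one county on five days.
cumulative_infections = [3, 7, 12, 12, 20]
new_infections = daily_new_counts(cumulative_infections)
print(new_infections)  # [4, 5, 0, 8]
```

The same function applies unchanged to cumulative deaths or cumulative tests.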
The addition of population demographics, social characteristics, and social distancing explained around 98% of the variation in Covid-19 infections [11]. This model also included splines for time since March 1, 2020, population density, and latitude and longitude. In a separate model, we included testing characteristics and found the predictive models to be unstable due to large county-level missing data, underreporting, and severe lags; these additional variables were therefore excluded from our [...] Intel MKL for parallel mathematical computing using 18 cores [12].
According to the most recent estimate, 1.25 million US residents are infected with Covid-19, [...] downward trend. A similar increasing pattern in cumulative deaths was observed; however, the rate of deaths peaked on April 15, 2020, with a steady downward trend since then (Figure 1B).
We developed a social contagion model to predict the distribution, within the US, of Covid-19 infections and deaths from population demographics, social characteristics, and social distancing. The model for infections accounted for 94.8% of the variation in the data with 92.2% deviance explained, and the model for deaths accounted for 99.2% of the variation with 96.8% deviance explained, across 3,364 US counties. The population demographics and social characteristics were also strongly associated with the rate of increase in confirmed Covid-19 infections.
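To make the quasipoisson count model and the "deviance explained" figures more concrete, the sketch below fits a much simpler model than the one in the paper: a single linear covariate with a log link and a log-population offset, with no splines and no spatial terms. The log link, the offset, the function name, and the toy county data are all assumptions made for illustration; none of them is taken from the study.

```python
import math


def fit_quasipoisson(x, y, offset, max_iter=100, tol=1e-10):
    """Fit log E[y] = offset + b0 + b1 * x by Newton-Raphson (Fisher scoring).

    Returns (b0, b1, dispersion, deviance_explained). Quasi-Poisson point
    estimates equal the Poisson ones; the dispersion only rescales variances.
    """
    exposure = [math.exp(o) for o in offset]
    b0 = math.log(sum(y) / sum(exposure))  # start from the pooled rate
    b1 = 0.0
    for _ in range(max_iter):
        mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(exposure, x)]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information (2 x 2), inverted in closed form.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(exposure, x)]
    # Pearson estimate of the overdispersion parameter.
    dispersion = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu)) / (len(y) - 2)

    def deviance(fitted):
        total = 0.0
        for yi, mi in zip(y, fitted):
            log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
            total += 2.0 * (log_term - (yi - mi))
        return total

    # Null model: one pooled rate applied to every county's exposure.
    pooled_rate = sum(y) / sum(exposure)
    null_mu = [e * pooled_rate for e in exposure]
    deviance_explained = 1.0 - deviance(mu) / deviance(null_mu)
    return b0, b1, dispersion, deviance_explained


# Hypothetical counties: density (thousands per square mile), population, cases.
density = [0.05, 0.2, 0.5, 1.0, 2.0, 4.0]
population = [20000, 50000, 80000, 150000, 400000, 900000]
cases = [8, 25, 60, 160, 700, 3500]
log_pop = [math.log(p) for p in population]

b0, b1, phi, dev_expl = fit_quasipoisson(density, cases, log_pop)
print(f"slope={b1:.3f} dispersion={phi:.2f} deviance explained={dev_expl:.1%}")
```

A dispersion well above 1 signals overdispersion relative to a Poisson model, which is the usual reason for choosing a quasipoisson family for county-level counts.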
According to the social contagion model, we predict that 2.11 million US residents will have confirmed Covid-19 infections and that there will be 122,951 deaths by June 1, 2020. The actual (red line), estimated (blue line), and predicted (brown line) numbers of Covid-19 infections are shown in Figure 2A, and those for deaths in Figure 2B.
In US counties with higher proportions of African Americans, the rate of Covid-19 infections increased by 5.6% for each one-unit increase in the percentage of Blacks (Figure 5), whereas the rate of increase was 2.6% for Whites and 4.9% for Hispanics. Additionally, in US counties with a higher poverty index, the rate of infections was 4.8% higher for each one-unit increase in the US census poverty index. In areas with a higher non-professional civilian population, the rate of infection was also higher. In US counties with larger young (20-40) and older (60-80) populations, the rate of infections was higher, whereas in counties with larger populations below 20 and above 80, the rate of infections was lower. The rate of death increased by 0.6% for each unit increase in the poverty index and by 1% for each percent higher proportion of non-Hispanic Blacks (Figure 6), whereas the rate decreased by 1% for each percent higher proportion of non-Hispanic Whites. Ages over 40 were associated with higher death rates, whereas ages below 20 were associated with lower death rates. These findings suggest strong effects of population demographics and social characteristics on confirmed Covid-19 infections.
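The per-unit percentages above can be read as multiplicative effects if the model uses a log link and each of these covariates enters linearly. Both are assumptions here, since the text does not state them. Under those assumptions, a change of p% per unit corresponds to a coefficient of ln(1 + p/100), and effects compound over larger covariate differences instead of adding. The helper names below are hypothetical.

```python
import math


def log_rate_coefficient(percent_per_unit):
    """Coefficient on the log scale implied by a percent change per unit."""
    return math.log(1.0 + percent_per_unit / 100.0)


def percent_change(percent_per_unit, units):
    """Percent change in the rate for a covariate difference of `units`."""
    beta = log_rate_coefficient(percent_per_unit)
    return (math.exp(beta * units) - 1.0) * 100.0


# Reported one-unit effects on the infection rate, taken from the text above.
reported = {
    "percent Black": 5.6,
    "percent White": 2.6,
    "percent Hispanic": 4.9,
    "poverty index": 4.8,
}
for name, pct in reported.items():
    print(f"{name}: +{pct}% per unit, "
          f"about +{percent_change(pct, 10):.1f}% per 10 units")
```

Under these assumptions, a 10-point difference in a covariate implies a larger change than ten times the one-unit figure, because the per-unit ratios multiply.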
Population density, social distancing, time, and geospatial variation were also associated with the number of confirmed Covid-19 infections (Figure 5) and deaths (Figure 6). In US counties with higher population density, the rate of increase in confirmed Covid-19 infections was exponentially higher. About 10% of US counties (N=311) have a population density higher than 500 residents per square mile and account for 80% of total infections and deaths in the US. As social distancing increased, the rate of Covid-19 infections decreased. The time trend showed a steep increase in the rate of infections from March 15, 2020 to about April 15, 2020, with the rate of infections leveling off and then slowing between April 15, 2020 and May 5, 2020.
Social distancing has a strong and consistent association with Covid-19 infections and deaths in US communities. Our social transmission model predicts that by June 1, 2020, the US will have 2,113,073 confirmed Covid-19 cases if social distancing across all counties remains the same. If we see a 20% decline in social distancing, we project 46,433 additional Covid-19 infections, and 66,764 additional infections if social distancing were to decrease by 30%. Social distancing will also have a substantial influence on deaths. We project that by June 1, 2020, the US will have 122,951 deaths attributed to Covid-19 if social distancing across all counties remains the same. If we see a 20% decline in social distancing, we project 2,785 additional deaths, and 4,006 additional deaths if social distancing were to decrease by 40%.
Community-level transmission was slower in communities with higher social distancing.
As social distancing increased, the rate of increase in confirmed infections and deaths started to decline, suggesting that a substantial increase in confirmed infections and deaths may be attributable to a reduction in social distancing. The high numbers of Covid-19 infections and deaths in densely populated areas were seen despite higher social distancing. Also of significance, communities with a high poverty index and social characteristics in general had lower social distancing compared to geographical areas with a low poverty index and similar social distancing. If social distancing restrictions were to be reduced in these densely populated lower socio-economic areas, we may be more likely to see a higher number of confirmed infections and deaths in these communities. If social distancing can be improved in densely populated areas with high poverty indices, we may be likely to see substantial reductions in confirmed infections and deaths. However, areas with a higher proportion of non-Hispanic Blacks show significantly higher rates of infection and death, and these effects were larger than the social distancing effects, which in general were more protective in areas with a higher proportion of non-Hispanic Whites. These findings suggest strong effects of race/ethnicity on infections and deaths in US communities.
Age distribution plays a significant role in the rate of increase in confirmed Covid-19 infections and deaths. It is noteworthy that areas with high infections and deaths also had larger numbers of younger residents, 20-40 years old, and larger numbers of older residents, 60-80 years old.
This population dynamic suggests that young residents may be more likely to be asymptomatic carriers of the coronavirus. Areas with a higher number of middle-aged adults, those 40-60 years old, had a lower rate of infection, perhaps suggesting that more social distancing is being maintained in the middle age groups than in the younger age groups, with those of older ages showing a high susceptibility to infection due to the higher number of chronic health conditions associated with age.
The rate of change in the number of infections and deaths increased exponentially in late March and early to mid-April. However, the rate of new infections has stabilized over time, reaching a plateau, where it continues to remain steady. The new infection and death rates across US communities have started to decline, perhaps mostly due to social distancing; however, evidence of a continued decline over 14 days, as mandated by the US government, is yet to be observed in most of the densely populated areas. The stable rate of new infections and the lack of data on Covid-19 deaths from many counties are troublesome, since even a phased re-opening in densely populated US communities may cause a large increase in infections and deaths unless more precautions and preventive measures are put in place.
The social transmission model provides a framework for incorporating population demographics and social characteristics, in addition to temporal and geospatial patterns, as predictors of Covid-19 infections and deaths in US communities. Even if testing were to be dramatically increased, this approach alone does not address the highly infectious character of Covid-19. The major roles of population demographics and social characteristics may be more effectively reduced through social distancing. Focusing our preventive efforts on population centers with higher numbers of non-Hispanic Blacks and on high-poverty and low-socioeconomic areas, and improving both social distancing and testing in those areas, might offer a better chance of reducing the spread of Covid-19 and the deaths associated with Covid-19 across US communities.
References:
Socioeconomic disparities in health in the United States: What the patterns tell us.
Racism and psychological and emotional injury: Recognizing and assessing race-based traumatic stress. The Counseling Psychologist.
COVID tracking project.
The public health laboratory testing for COVID-19.
Johns Hopkins data repository daily reports.