key: cord-270238-5esd2eti authors: Tian, T.; Zhang, J.; Hu, L.; Jiang, Y.; Duan, C.; Wang, X.; Zhang, H. title: Risk factors associated with mortality of COVID-19 in 2692 counties of the United States date: 2020-05-21 journal: nan DOI: 10.1101/2020.05.18.20105544 sha: doc_id: 270238 cord_uid: 5esd2eti Background: The number of cumulative confirmed cases of COVID-19 in the United States has risen sharply since March. A county health ranking and roadmaps program has been established to identify factors associated with disparity in mobility and mortality of COVID-19 in all counties in the United States. Objective: To find out the risk factors associated with mortality of COVID-19 with various levels of prevalence. Design: A negative binomial design was applied to the county-level mortality counts of COVID-19 on April 15, 2020 in the United States. In this design, the infected counties were categorized into three levels of infections using clustering analysis based on time-variant cumulative confirmed cases from March 1 to April 15, 2020. Setting: United States Participants: COVID-19 patients in various counties of the United States from March 1 to April 15, 2020. Measurements: The county-level cumulative confirmed cases and mortality of COVID-19. Results. 2692 infected counties were assigned into three classes where the mild, moderate, and severe prevalence of infections were identified, respectively. Several risk factors are significantly associated with the mortality of COVID-19, where Hispanic (0.024, P=0.002), female (0.253, P=0.027), elder (0.218, P=0.017) and Native Hawaiian or other Pacific islander (2.032, P=0.027) individuals are more vulnerable to the mortality of COVID-19. More locations open to exercise (0.030, P=0.004), higher levels of air pollution (0.184, P=0.044), and segregation between non-White and White increased the mortality rate. Limitation: The study relied on mortality data on April 15, 2020. Conclusion. 
The mortality of COVID-19 depends on sex, ethnicity, and outdoor environment. Increasing awareness of these significant factors may lead to a reduction in the mortality rate of COVID-19.

COVID-19 is an infectious disease caused by a novel coronavirus with an estimated average incubation period of 5.1 days (1). It spreads through person-to-person transmission and had reached 210 countries and regions, with over 2 million total confirmed cases, as of April 15 (2). The United States has the highest number of infections, accounting for approximately one-third of the total confirmed cases in the world. Its cumulative confirmed cases were 652,474 on April 15, 2020, about 9,456 times the 69 confirmed cases on March 1, 2020 (3). Currently, the entire United States is suffering from a rapidly growing epidemic, with deaths resulting from COVID-19 occurring all over the country. For instance, New York City had the largest number of total deaths, accounting for the vast majority of deaths in the country, while no one in Madison County, North Carolina, had been infected (3). Therefore, it is of great interest to find out the risk factors that influence the mortality of COVID-19.

It is known that infectious diseases are affected by factors other than medical treatments (4, 5). For example, influenza A is associated with obesity (6), and the spread of SARS depended on seasonal temperature changes (7). The County Health Rankings and Roadmaps program was launched by the Robert Wood Johnson Foundation and the University of Wisconsin Population Health Institute (8). Since 2010, this program has provided annual data on health outcomes, health behaviors, clinical care, social and economic factors, physical environment, and demographics, incorporating a total of 64 factors
possibly influencing health across all counties in 50 states. The details about those factors are available on the official website of the County Health Rankings and Roadmaps program (8). This paper aims to explore putative risk factors that may affect the mortality of COVID-19 (excluding deaths from causes other than COVID-19) in different areas of the United States, to increase awareness of the disparity, and to inform risk reduction strategies.

medRxiv preprint doi: https://doi.org/10.1101/2020.05.18.20105544; this version posted May 21, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

We collected the number of cumulative confirmed cases and total deaths from March 1 to April 15, 2020, for counties in the United States from the New York Times (9). The county health rankings reports for the year 2020 were compiled from the official website of the County Health Rankings and Roadmaps program (8). The reports contain 77 measures for each of 3142 counties, covering health outcomes, health behaviors, clinical care, social and economic factors, physical environment, and demographics. Of these counties, 2692 had reported confirmed cases by April 15, 2020. The remaining 450 counties had no confirmed cases of COVID-19 and were not considered in this study. To find out the relationship between the risk factors and the mortality of COVID-19, we considered the total number of deaths on April 15, 2020 as the outcome.
Putative risk factors (8) were categorized into 5 types of measures: health behaviors, clinical care, social and economic factors, physical environment, and demographics. Health behaviors covered tobacco, alcohol and drug use, diet and exercise, sexual activity, and insufficient sleep. Clinical care measures covered access to care and quality of care. Social and economic factors included education, employment, income, family and social support, and community safety. For the physical environment, air and water quality and housing and transit were considered. Overall, 56 possible risk variables were included in the study. All deaths counted occurred as a result of COVID-19.

The trend of the total number of confirmed cases varied greatly across areas of the United States. We standardized the time series of total confirmed cases from March 1 to April 15, 2020 and applied the PAM (partitioning around medoids) clustering algorithm (10, 11) so that counties with similar trends were assigned to the same homogeneous class. Based on the clustering results, we used the Kruskal-Wallis test (12) and the Chi-square test (13) to detect risk factors that differed significantly across the classes of counties. The most important risk factors in each class were identified using random forest (14), and the top 15 factors were then used to build a negative binomial model (15, 16) for each class of counties. All analyses were conducted in R version 3.6.1.

The funder of this study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.
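The clustering step described in the methods above can be illustrated with a minimal sketch. It is not the authors' R code: the county series are invented toy values, the standardization is assumed to be a z-score across counties for each day (the text does not say how the series were standardized), and the swap search is a simplified first-improvement variant of PAM rather than the full BUILD/SWAP procedure.

```python
import math
import random

def standardize_by_day(matrix):
    """Z-score each day (column) across counties; an assumed standardization."""
    n_days = len(matrix[0])
    out = [[0.0] * n_days for _ in matrix]
    for d in range(n_days):
        col = [row[d] for row in matrix]
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        for i, x in enumerate(col):
            out[i][d] = (x - mean) / sd if sd > 0 else 0.0
    return out

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)

def pam(points, k, seed=0):
    """Simplified PAM: random initial medoids, then swaps that lower the cost."""
    n = len(points)
    dist = [[euclidean(p, q) for q in points] for p in points]
    medoids = random.Random(seed).sample(range(n), k)
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(dist, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
    return medoids, labels

# Hypothetical cumulative case counts for six toy counties on five dates.
cases = [
    [0, 1, 2, 4, 8],         # toy county 0 (low counts)
    [0, 1, 3, 5, 9],         # toy county 1 (low counts)
    [0, 10, 30, 60, 100],    # toy county 2 (medium counts)
    [0, 11, 31, 62, 103],    # toy county 3 (medium counts)
    [0, 20, 60, 120, 200],   # toy county 4 (high counts)
    [0, 21, 62, 123, 204],   # toy county 5 (high counts)
]
medoids, labels = pam(standardize_by_day(cases), k=3)
print("medoid indices:", medoids)
print("class labels:", labels)
```

Each medoid is an actual county from the data, which is why a clustering of this kind can name a representative county for every class.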
The corresponding authors had full access to study data and final responsibility for the decision to submit for publication.

According to the clustering, the 2692 counties were assigned to 3 classes. The first class contained 2523 counties with the lowest overall cumulative confirmed cases. It is referred to as the mild class of infections. Its medoid is Austin County in Texas. The second class contained 141 counties with moderate overall cumulative confirmed cases. We call it the moderate class of infections. Its medoid is Monroe County in Pennsylvania. The third class contained 28 counties with the highest overall cumulative confirmed cases, which we named the severe class of infections. Its medoid is Fairfield County in Connecticut. The geographical distribution of the counties in the different classes is shown in Figure 1, where the size of a circle indicates the total confirmed cases on April 15, 2020. Note that the east and west coasts were the areas most severely hit by COVID-19. Most counties in New York and New Jersey belonged to the third class of counties (9).

Table 1 shows significant differences in demographic distribution among the three classes of counties (P < 0.001). The average population in the mild class was 63,438, which is 8% and 4% of the average populations in the moderate class and severe class, respectively. The average proportion of rural residents in the mild class was 57.58%, vs 2.5% in the severe class. The average proportion of Black residents in the mild class was 9.75%, as opposed to 16.52% in the severe class. There were differences in race, ethnicity, and geographical location among the three classes of counties.
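A between-class comparison of a continuous measure, such as the one reported above for the three classes, can be sketched with a hand-written Kruskal-Wallis test. The rural-percentage values below are invented for illustration and are not the study data, and the shortcut p-value holds only for three groups, where the chi-square approximation has 2 degrees of freedom.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    # Undefined if every value is identical (correction would be zero).
    correction = 1 - sum(c ** 3 - c for c in counts.values()) / (n ** 3 - n)
    return h / correction

def p_value_three_groups(h):
    """Chi-square upper tail with 2 degrees of freedom: exp(-h / 2)."""
    return math.exp(-h / 2)

# Hypothetical percentages of rural residents in a few counties per class.
mild = [57.0, 62.5, 48.0, 70.1]
moderate = [20.3, 15.2, 25.8, 18.0]
severe = [2.5, 1.0, 4.2, 3.3]

h = kruskal_wallis([mild, moderate, severe])
p = p_value_three_groups(h)
print(f"H = {h:.3f}, p = {p:.4f}")
```

The test uses only ranks, so it does not assume the measure is normally distributed within a class, which suits skewed county-level quantities such as population.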
The importance scores of the risk factors were obtained by random forest, and one common factor, namely residential segregation between non-White and White, was identified in each of the three classes of counties. A negative binomial model with this single covariate was used to explore its association with the mortality of COVID-19. Residential segregation between non-White and White was significantly associated with the mortality of COVID-19 across the three classes of counties, as shown in Figure 3. Note that the higher the value of residential segregation between non-White and White, the higher the mortality of COVID-19. In the severe class of counties, an increase in residential segregation between non-White and White was associated with more deaths than in the other two classes of counties (Table 2).

Using the time trends of cumulative confirmed cases in 2692 counties in the United States, we categorized those counties into three levels of infection. The mild class accounted for 93.7% of all counties. Its resident population was remarkably smaller than that of the other two classes of counties. Thus, the resident population appeared to be a significant contributor to the mortality of COVID-19. A larger population may lead to more contacts under social distancing (17), leading to a higher risk of death from COVID-19. Conversely, the higher percentage of residents living in rural areas in the mild class of counties may reduce the mortality.
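To make the size of a negative binomial association more concrete, a coefficient can be converted into a rate ratio. The sketch below assumes the model uses a log link, so that a coefficient is a log rate ratio; the text does not state this explicitly. The two example coefficients are the values reported in the abstract for the female and elder percentages, and the function name is hypothetical.

```python
import math

def rate_ratio(beta, delta=1.0):
    """Multiplicative change in expected deaths for a delta-unit increase
    in a covariate, assuming a log-link count model (an assumption here)."""
    return math.exp(beta * delta)

# Coefficients as reported in the abstract; the interpretation is an assumption.
reported = {"female": 0.253, "elder": 0.218}

for name, beta in reported.items():
    rr = rate_ratio(beta)
    print(f"{name}: rate ratio per unit increase = {rr:.3f}")
```

Under this assumption, a positive coefficient gives a rate ratio above 1, meaning a higher expected death count as the covariate increases with the other covariates held fixed.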
The segregation index between non-White and White revealed the disparity in health between non-White and White, leading to differences in health status not only at the individual level but also at the community level (18). Higher values in the

For the severe class of counties, there is an age structure difference in the mortality of COVID-19. Because the resident population in the severe class of counties is remarkably large, a higher percentage of elderly residents indicates a larger population of individuals aged over 65, which increased the deaths from COVID-19 (25). Sleep time was reported to be associated with the health system (26). We found a higher percentage of people
2. National Health Commission of the People's Republic of China
Influenza-related hospitalizations and poverty levels - United States
Individual- and neighborhood-level contextual factors are associated with Mycobacterium tuberculosis transmission: genotypic clustering of cases in Michigan
Obesity increases the duration of influenza A virus shedding in adults
Environmental factors on the SARS
An improved PAM clustering algorithm based on initial clustering centers
Automatic PAM Clustering Algorithm for Outlier Detection
Ranks and Pseudo-Ranks - Paradoxical Results of Rank Tests
The chi-square test of independence
Classification and regression by randomForest. R News
Negative binomial regression
Regression models for count data in R

We would like to thank all individuals who are collecting epidemiological data on the COVID-19 outbreak, and the people collecting county health ranking data in the County Health Rankings and Roadmaps program.