key: cord-0818427-3anst3os
authors: Kiaghadi, A.; Rifai, H. S.; Liaw, W.
title: Assessing COVID-19 Risk, Vulnerability and Infection Prevalence in Communities
date: 2020-05-08
journal: nan
DOI: 10.1101/2020.05.03.20089839
sha: d14949a62f87234eac8dc3c07bd395993db605d7
doc_id: 818427
cord_uid: 3anst3os
Background: The spread of coronavirus in the United States, with nearly one million confirmed cases and over 53,000 deaths, has strained public health and health care systems. While many have focused on clinical outcomes, less attention has been paid to vulnerability and risk of infection. In this study, we developed a planning tool that examines factors that affect vulnerability to COVID-19.
Methods: Across 46 variables, we defined five broad categories: 1) access to medical care, 2) underlying health conditions, 3) environmental exposures, 4) vulnerability to natural disasters, and 5) sociodemographic, behavioral, and lifestyle factors. We also used reported rates for morbidity, hospitalization, and mortality in other regions to estimate risk at the county (Harris County) and census tract levels.
Analysis: A principal component analysis was undertaken to reduce the dimensions. Then, to identify vulnerable census tracts, we conducted rank-based exceedance and K-means cluster analyses.
Results: Our study showed that a total of 722,357 people (~17% of the County population), including 171,403 between the ages of 45 and 65 (~4% of the County population) and 76,719 seniors (~2% of the County population), are at a higher risk based on the aforementioned categories. The exceedance and K-means cluster analyses demonstrated that census tracts in the northeastern, eastern, southeastern and northwestern regions of the County are at highest risk. The results of age-based estimations of hospitalization rates showed the western part of the County might be in greater need of hospital beds.
However, cross-referencing the vulnerability model with the estimation of potential hospitalized patients showed that part of the County has the least access to medical facilities.
Conclusion: Policy makers can use this planning tool to identify neighborhoods at high risk for becoming hot spots, efficiently match community resources with needs, and ensure that the most vulnerable have access to equipment, personnel, and medical interventions.
Much research has focused on clinical outcomes, epidemiological modeling, and transmission dynamics of the novel coronavirus (see, for example, [9-12]), but less focus has been placed on risk and vulnerability to contracting the disease. Emerging studies have begun to report on the impacts of social vulnerability on COVID-19 from an incidence and outcome standpoint [2-7,13]. However, the spatial resolution of most studies to date has been at the global or country level, and less attention has been paid to finer spatial resolutions such as the census tract scale within a county. A finer spatial resolution is important from a vulnerability and risk standpoint, as demonstrated in a recent study showing that the poorest neighborhoods in Houston, Texas, might be at a higher risk of hospitalization from COVID-19 [14]. That study was based on an analysis of the Centers for Disease Control and Prevention (CDC) underlying risk factors for severe COVID-19 cases [4], which include asthma, Chronic Obstructive Pulmonary Disease (COPD), heart disease, hypertension, diabetes, and a history of heart attacks or strokes.
While the aforementioned underlying medical conditions are important risk factors, they weigh in on the risk of hospitalization but not necessarily on the risk of contracting the disease.
As such, underlying medical conditions and sociodemographic variables may not fully represent [...] that include: 1) access to medical care, 2) underlying medical conditions, 3) environmental exposures, 4) vulnerability to natural disasters and 5) sociodemographic, behavioral, and lifestyle factors. However, understanding the vulnerability of a population to COVID-19 is only one aspect of planning for such a pandemic. Other aspects include expected morbidities, mortalities, and hospitalization rates. Thus, the goals for developing the planning tool are to better understand medical access gaps and demands for hospitalization, identify parts of the county where more protective measures and response actions need to be put in place, and have a data-driven framework for estimating case numbers, hospitalizations, and deaths by census tract. Another goal is to have a better sense of the number of persons that may be affected, broadly and more specifically, as authorities lift or modify current policies such as the Stay Home Work Safe policy in place for Harris County and the City of Houston.
Such a planning tool is critical in order to mitigate the impact of COVID-19 and prepare for future pandemics. Using this tool, policymakers can identify neighborhoods with a higher potential for becoming the next hot spots, efficiently match community resources with community needs, and ensure that equipment, personnel, medications, and support are available to everyone, particularly the most vulnerable and those in greatest need. This strategy is essential to address historical trends that have preferentially delivered resources to those with means, resulting in gaps in quality [30-32].
The planning framework developed in the study is readily transferable to other counties in the US and can be expanded to the state level for decision-making on a short-term or long-term basis towards improving the overall health of communities in each state.
Harris County, located in the southeastern part of Texas (Fig 1), is the third-most populous county in the U.S., with more than 4.7 million residents [33]. While ranked number 2 in the nation in terms of Gross Domestic Product (GDP) growth, the County exhibits geospatial socioeconomic disparities among its population. The County is experiencing fewer cases and lower rates of transmission relative to the rest of the U.S. Fig 2 shows [...] showed that none of the datasets were normally distributed.
All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
[...] Normalization) was conducted in IBM SPSS as the first step to reduce the dimensions. Due to limited data availability, as noted before, the PCA was performed on the 584 tracts for which all data were available. Eigenvalues from random values were generated and compared with the values in this study using a parallel analysis engine [46]. This comparison was made to determine the number of components that should be retained in the analysis; components with eigenvalues greater than those from the randomized method were kept. The first five components, which together explained ~80% of the variability in the 46 variables, showed eigenvalues larger than the ones generated by the engine. S1 Table in [...]
The choice of variables for the study (Table 1) was based on the results of the PCA in addition to findings reported in previous studies and data availability.
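The retention rule described above (keep the components whose eigenvalues exceed those obtained from random data) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the IBM SPSS workflow or the parallel analysis engine [46] used in the study. The function name `parallel_analysis`, the number of random draws, the use of the mean random eigenvalue as the threshold, and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def parallel_analysis(data, n_iter=200, seed=0):
    """Compare observed correlation-matrix eigenvalues with random ones.

    Returns (n_keep, observed, threshold), where n_keep is the number of
    leading components whose eigenvalue exceeds the mean eigenvalue from
    random normal data of the same shape (an assumed threshold choice).
    """
    rng = np.random.default_rng(seed)
    n_obs, n_vars = data.shape

    # Eigenvalues of the observed correlation matrix, largest first.
    corr = np.corrcoef(data, rowvar=False)
    observed = np.sort(np.linalg.eigvalsh(corr))[::-1]

    # Eigenvalues from uncorrelated random data with the same dimensions.
    random_eigs = np.empty((n_iter, n_vars))
    for i in range(n_iter):
        noise = rng.standard_normal((n_obs, n_vars))
        noise_corr = np.corrcoef(noise, rowvar=False)
        random_eigs[i] = np.sort(np.linalg.eigvalsh(noise_corr))[::-1]
    threshold = random_eigs.mean(axis=0)

    # Keep leading components up to the first one that fails the test.
    exceeds = observed > threshold
    n_keep = n_vars if exceeds.all() else int(np.argmin(exceeds))
    return n_keep, observed, threshold


# Hypothetical stand-in for the tract-by-variable matrix: 584 rows and
# 12 variables driven by 3 latent factors plus noise (not the study data).
rng = np.random.default_rng(42)
factors = rng.standard_normal((584, 3))
loadings = np.repeat(np.eye(3), 4, axis=1)  # each factor drives 4 variables
data = factors @ loadings + 0.5 * rng.standard_normal((584, 12))

n_keep, observed, threshold = parallel_analysis(data)
explained = observed[:n_keep].sum() / observed.sum()
print(n_keep, round(float(explained), 2))
```

Because the eigenvalues of a correlation matrix sum to the number of variables, the share of variability explained by the retained components is simply their eigenvalue sum divided by that total, which is how a figure such as the ~80% reported above would be obtained.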
Category 1 includes access to medical care, including medical facilities, medications, insurance coverage, routine checkups, and physical exams, as well as household density as a surrogate for interaction among individuals within each tract (e.g., how crowded grocery stores could be in the tract). Category 2 includes chronic diseases, medical conditions, and disabilities that could potentially affect the vulnerability to COVID-19, as well as age distribution. For environmental exposure (Category 3), pollution events from various sources, the three air quality indicators, and the presence of hazardous sites were included. Flooding from Hurricane Harvey was the only metric in Category 4, although this could be expanded in future work to include heat, drought, wildfires, and other natural disasters. Finally, for Category 5, a combination of social, economic, behavioral, and lifestyle factors that could potentially threaten the health of individuals during the COVID-19 pandemic was considered.
[...] of any developed models for the vulnerability was not possible due to lack of data at the desired spatial resolution and the fact that the pandemic is still developing. Thus, the second model (K-means) was used as a benchmark for the first model for comparison purposes.
In the rank-based exceedance method, the data for each variable were sorted in Microsoft Excel to obtain the rank of each census tract relative to the other tracts within Harris County. The exceedance rate (percentile) was calculated as follows: [...] where m is the rank and n is the total number of tracts (786 in Harris County).
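The ranking step can be illustrated with a short Python sketch. The exceedance expression itself is not reproduced in this text, so the example assumes the simple ratio m/n expressed as a percentage, with rank 1 assigned to the lowest value and tied tracts sharing the lowest applicable rank. That convention, the function name `exceedance_percent`, and the sample values are illustrative assumptions rather than the authors' exact formula or data.

```python
import numpy as np


def exceedance_percent(values):
    """Rank-based exceedance for one variable, in percent.

    Assumed convention (not taken from the paper): rank m = 1 for the
    lowest value, ties share the lowest rank, exceedance = 100 * m / n.
    """
    values = np.asarray(values, dtype=float)
    n = values.size
    sorted_vals = np.sort(values)
    # Number of tracts with a strictly lower value, plus one, gives rank m.
    m = np.searchsorted(sorted_vals, values, side="left") + 1
    return 100.0 * m / n


# Hypothetical values of one variable for five tracts.
rates = [12.0, 30.5, 18.2, 30.5, 7.1]
exceedance = exceedance_percent(rates)
print(exceedance)
```

Under this convention a tract's exceedance grows with its rank, so with n = 786 tracts the values would run from 100/786 percent for the first-ranked tract up to 100 percent for the last.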
The calculated exceedance for a given tract represents the percent of tracts that have a better condition than the selected one. To ensure that the direction of exceedance is the same among all variables, (1 - exceedance) was used for variables of a positive nature, such as insurance coverage, education, access to medication, and preventive tests. For each category, the average value of exceedance for all of the variables within that category was calculated and reported. In addition to classifying the tracts for each of the aforementioned categories, an overall vulnerability was defined by averaging the exceedance rates of the five defined categories. The percentile associated with each averaged value (for each category and for the overall vulnerability) was calculated and exported to ArcMap to generate decision-support level maps.
In the K-means cluster analysis (K-means is an unsupervised machine learning algorithm), three classes were defined for each category. The output classes were ordered as high (severe), average, and low depending on the order of the final cluster centers. An ANOVA test was conducted on the clusters to ensure that the values of the different variables were significantly different between clusters. Similar to the exceedance method, an overall vulnerability for each census tract was determined by averaging the five output class [...]
Both studies in China and Italy reported the mortality rates among confirmed COVID-19 cases, while the New York study reported the total number of deaths per 100,000 persons within each age interval.
In this study, rates from all three studies, as shown in Table 2, were used to calculate the mortality rates associated with COVID-19 in Harris County. The rates in each study were multiplied by the associated number and percentage of the population in each age group to calculate the risk of morbidity, hospitalization, and mortality for each census tract.
An important caveat of the approach used in this study is the emerging realization that positive cases are underreported in the US and that deaths are potentially undercounted because not all persons who have died in the US since December 2019 were tested. A second important caveat is that the [...]
[...] ozone is a result of industrial activities, the ozone-NO2 relationship, and the wind pattern in Houston [54, 55]. In the case of PM2.5, the higher concentrations in Harris County have been associated with regional aerosols, biomass burning, and gasoline combustion [56] [...]
[...] Harris County, are living in the eastern part of the County, specifically areas next to the HSC and GB, and areas identified as opportunity zones [58].
The residents in these neighborhoods are individuals with the least favorable sociodemographics, are exposed to several chemicals (with industrial sources), and are subject to flooding from both rainfall and storm surge (such as what was [...]
[...] Table. S9-S13 Figs show the class of each tract (i.e., high/severe, average, and low) for Category 1 through Category 5, respectively. The results in all categories were similar to those of the exceedance method, validating the choice of methodology. The overall vulnerability generated by the K-means method led to a map (S14 Fig) very similar to that of the exceedance approach (Fig 7).
For each category, the total population and the distribution of the population in two age intervals, 45-65 (the age group with the highest number of reported COVID-19 cases) and 65+ (the age group with the highest mortality rate), over different percentiles (from low to high with regard to the severity of conditions within each category) are shown in Table 3. Using the vulnerability findings presented above for Harris County (Fig 6 and yellow highlighted values in Table 3), a total of 59,307, 98,702, 78,723, 105,431, and 59,624 seniors (65+ years), who are at most risk of COVID-19 mortality, are living in areas with the highest vulnerability in Category 1 through 5, respectively.
Considering that Harris County is prone to flooding and that the hurricane season is in progress from May through the end of November, a potential hurricane combined with the COVID-19 pandemic could lead to a compound natural disaster event affecting significant numbers of senior citizens, as shown in Table 3. To prepare for the worst-case pandemic scenario, and for the occurrence of a hurricane in particular, decision-makers could use the numbers in Categories 1 and 4 for planning response and recovery measures that take into account flooding and increased vulnerability to COVID-19. For overall vulnerability (Fig 7 and cyan highlights in Table 3 [...]
Total Population   998,996   927,584   906,212   819,833   722,357
45-65 years        249,385   241,270   213,158   191,118   171,403
> 65 years          97,587   108,035    88,496   81, [...]
[...] the morbidity (age-based) and vulnerability results, as shown in Fig 8A and B, it could be concluded that ~10.0% of the total population and ~16% of seniors at the highest risk level (80%-100% percentile overall vulnerability) could contract COVID-19. While the actual morbidity rates in Harris County, to date, are lower than those in New York, China, and Italy, the specific reasons for this are unknown. This could be due to lower population density, relatively higher temperatures, or the fact that New York, China, and Italy had earlier reported and imported cases than Harris County, and social distancing was not deployed immediately or soon after. As mentioned previously, one caveat that places greater uncertainty on the reported rates for Harris County [...]
[...] Hygiene [49]. A total of 4,020 deaths with a rate of 92.65 per 100,000 persons were estimated [...]
Fig 8. A) Overall vulnerability based on determinants in all 5 categories (see Fig 7), B) Morbidity rates in Harris County census tracts based on a worst-case scenario using NYC data [49], C) Fig S16. The mortality rate for each census tract solely based on the age distribution. Rates associated with each age interval were extracted from [51], and D) The number of potential deaths for each census tract in Harris County. The calculation was based on the age distribution within each tract and rates reported by the New York Department of Health and Mental Hygiene [49]
Johns Hopkins Centers for Civic Impact. Coronavirus resource center
Population based estimates of comorbidities affecting risk for complications from COVID-19 in the US
U.S. county-level characteristics to inform equitable COVID-19 response
Hospitalization Rates and Characteristics of Patients Hospitalized with Laboratory-Confirmed Coronavirus Disease 2019 - COVID-NET, 14 States
Is the spread of COVID-19 across countries influenced by environmental, economic and social factors?
Vulnerability on COVID-19 Incidence and Outcomes in the United States
Chemical, and microbial quality of floodwaters in Houston following Hurricane Harvey
Toxic trajectories under future climate conditions
Impacts of an extreme weather-related episodic event on the Hudson River and estuary
Smoking is Associated with COVID-19 Progression: A Meta-
CDC COVID-19 Response Team. Geographic Differences in COVID-19 Cases, Deaths, and Incidence - United States
Concise Communication: Covid-19 and the N95 Respirator Shortage: Closing the Gap
Hospital surge capacity in a tertiary emergency referral centre during the COVID-19 outbreak
30. Nelson A. Unequal treatment: Confronting racial and ethnic disparities in health care
Crossing the Quality Chasm: A New Health System for the 21st [...] DC: The National Academies Press
The association between income and life expectancy in the United States
United States Census Bureau
IPUMS national historical geographic information system: Version
Federal Emergency Management Agency (FEMA)
Characterization of vulnerability of road networks to fluvial flooding using SIS network diffusion model
HRSA. Health Resources and Services Administration (HRSA) query data explorer tool
Harris County Health System. Harris health system locations
The Department of Homeland Security. Homeland Infrastructure Foundation-Level Data (HIFLD) - Urgent Care Facilities
Microsoft Bing Maps Platform APIs
Texas Air Monitoring Information System (TAMIS) database
Texas Commission on Environmental Quality. TCEQ GIS data
Superfund: national priorities list (NPL)
Parallel analysis engine to aid determining number of factors to retain
COVID-19 Dashboard by the Center for Systems Science and
COVID-19, Coronavirus Disease
New York City Department of Health and Mental Hygiene
CDC COVID-19 Response Team
Centers for Disease Control and Prevention
Estimates of the severity of coronavirus disease 2019: a model-based analysis
Stefano Boros, Fortunato (Paolo) D'Ancona MCR COVID-19 outbreak, National update
Constraining NOx emissions using satellite NO2 measurements during 2013 DISCOVER-AQ Texas campaign
A 15-year climatology of wind pattern impacts on surface
The relation between ozone, NOx and hydrocarbons in urban and polluted rural environments
The characterization of fine particulate matter downwind of Houston: Using integrated factor analysis to identify anthropogenic and natural sources
COVID-19 mortality in the United States
Opportunity Zones
Spatio-Temporal Patterns of the 2019-nCoV Epidemic at the County Level in Hubei Province, China. International Journal of Environmental Research and Public Health
The changing pattern of COVID-19 in China: A tempo-geographic analysis of the SARS-CoV-2 epidemic
A Social Vulnerability Index for Disaster Management
Predict Health Care Access and Need within a Rational Area of Primary Care Service