key: cord-334184-2zjbwmqn
authors: Weinstein, B.; da Silva, A. R.; Kouzoukas, D. E.; Bose, T.; Kim, G.-J.; Correa, P. A.; Pondugula, S.; Kim, J.; Carpenter, D. O.
title: A methodological blueprint to identify COVID-19 vulnerable locales by socioeconomic factors, developed using South Korean data
date: 2020-10-27
journal: nan
DOI: 10.1101/2020.10.26.20218842
sha:
doc_id: 334184
cord_uid: 2zjbwmqn

COVID-19 has more severely impacted socioeconomically (SES) disadvantaged populations. Lack of SES measurements and inaccurate identification of high-risk locales can hamper COVID-19 mitigation efforts. Using South Korean COVID-19 incidence data (January 20 through July 1, 2020) and established social theoretical approaches, we identified COVID-19-specific SES factors. Principal component analysis created composite indexes for each SES factor, while Geographically Weighted Negative Binomial Regressions mapped a continuous surface of COVID-19 risk for South Korea. High area morbidity, risky health behaviors, crowding, and population mobility elevated area risk for COVID-19, while improved social distancing, healthcare access, and education decreased it. Our results indicated that falling SES-related COVID-19 risks and spatial shift patterns over three consecutive time periods reflected the implementation of reportedly effective public health interventions. While validating earlier studies, this study introduced a methodological blueprint for precision targeting of high-risk locales that is globally applicable for COVID-19 and future pandemics.

disease rates 6, we recommend Geographically Weighted Negative Binomial Regression (GWNBR) to improve accuracy. This directly takes discrete count data without further transformation, and is robust to overdispersion, spatial/temporal clustering, and false-positives 7, 8. Globally, the COVID-19 pandemic emerged in waves, with country-specific mitigation strategies producing sharp declines.
We chose South Korean COVID-19 incidence data because they presented extremely high overdispersion and spatial clustering, making them more complex than typical infectious disease data. South Korea, as a real extreme-case scenario, allowed us to check our framework's functionality. Our study's goals were to 1) provide methodological guidance for identifying COVID-19-vulnerable locales associated with SES factors; and 2) operationalize a framework using South Korean data to demonstrate its value in practice and in the interpretation of the results.

We used COVID-19 incidence data from January 20 through July 1, 2020, released by the Korea Centers for Disease Control and Prevention (KCDC) 9 and prepared by the DS4C project 10. Analytical data consisted of 11,811 COVID-19 cases aggregated by 250 districts (Table S1) aligned to SAS's South Korean geographic matrix. Since data were unavailable for Daegu's subparts, we estimated the incidence from KCDC's press release cluster reports.

Conceptual model

Figure 1 shows the refined Coleman-Blumenshine Framework (CBF) approach, based on Coleman's Social Theory and Blumenshine's mechanistic framework 2, 11. The model defines SES as a function of social and human capital 11, 12 and emphasizes the pathways by which SES indicators differentially increase SARS-CoV-2 exposure and susceptibility to developing COVID-19 2. Based on the CBF model and the COVID-19 risk factor literature 13-17, we identified seven area-level health and SES factors that determined the SARS-CoV-2 exposure level and the likelihood of developing COVID-19 after exposure.

medRxiv preprint doi: https://doi.org/10.1101/2020.10.26.20218842; this version posted October 27, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
All SES-related data were retrieved from the Korean Statistical Information Service's (KOSIS) online data archive 18. Table S3 presents the reviewed data sources used for SES measurement. Table 1 shows the 24 data items, selected out of 124 candidates, relevant to the seven health/SE areas. We used an individual variable as a proxy for education and, by Principal Component Analysis (PCA), created six thematic composite indices: healthcare access, health behavior, crowding, area morbidity, difficulty to social distancing, and population mobility. Factors were computed as linear combinations of the original variables selected for each health/SE theme. We used the first component scores 19 in calculating the composite scores since they explained the largest share of the data variation. We then computed each variable's weight by dividing its factor score by the sum of all variable factor scores:

$$w_i = \frac{F_i}{\sum_{j=1}^{p} F_j} \qquad (1)$$

where $w_i$ is the weight and $F_i$ the factor score of variable $i$, $i$ indexes each theme's variables, and $p$ is the number of variables in the theme. Each thematic composite index was computed as the weighted average of its variables for each of the 250 districts. For example, the composite index for health behavior was calculated as:

$$\text{Health behavior}_k = 0.438 \times \text{obesity by measurement}_k + 0.429 \times \text{alcohol drinking}_k + 0.100 \times \text{current smoking}_k + 0.033 \times \text{self-reported obesity}_k \qquad (2)$$

where the subscript $k$ denotes the original variable's value for district $k$. Note that the weights sum to 1 (0.438 + 0.429 + 0.100 + 0.033 = 1).

Six thematic composite indices and an individual proxy for education [...]
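The weighting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the factor scores are hypothetical placeholders chosen so that the normalized weights reproduce those in Equation (2), the district values are invented, and the function names `weights_from_scores` and `composite_index` are not taken from the authors' SAS code.

```python
# Hypothetical first-component factor scores for the health-behavior theme.
# They are placeholders chosen so the normalized weights match Equation (2).
factor_scores = {
    "obesity_by_measurement": 4.38,
    "alcohol_drinking": 4.29,
    "current_smoking": 1.00,
    "self_reported_obesity": 0.33,
}


def weights_from_scores(scores):
    """Divide each factor score by the sum of all scores in the theme."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("factor scores sum to zero; weights are undefined")
    return {name: score / total for name, score in scores.items()}


def composite_index(weights, district_values):
    """Weighted combination of one district's variable values."""
    return sum(weights[name] * district_values[name] for name in weights)


weights = weights_from_scores(factor_scores)

# Invented variable values for a single district (illustration only).
district_k = {
    "obesity_by_measurement": 30.0,
    "alcohol_drinking": 60.0,
    "current_smoking": 20.0,
    "self_reported_obesity": 25.0,
}
health_behavior_index = composite_index(weights, district_k)
print(weights)
print(round(health_behavior_index, 3))
```

Because the weights are normalized within each theme, the composite stays on the scale of the input variables; in practice the inputs would presumably be standardized first, which the text does not specify.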
[...] is the estimated value for point j, omitting the observation j, and b is the bandwidth. The likelihood of false-positives was corrected by the method of da Silva and Fotheringham 20. All statistical analyses, including specific macro programs for spatial weight matrices and GWNBR models, were implemented using SAS (version 9.4). Missing data (2%) were excluded from the analyses.

Figure 2 compares the spatial COVID-19 distribution across pandemic phases. The initial outbreak wave occurred in Daegu and then spread to Gyeongsangbuk-do and surrounding provinces in the early phase 16. The second wave occurred in Seoul and its surrounding metropolises, Ulsan and Busan, and Gyeonggi-do province in the late phase of the pandemic.

Global and local spatial models

In the model for the entire study period using global negative binomial regression (GNBR), COVID-19 risk increased with risky health behavior, area morbidity, and difficulty to social distancing (Table 2). Inverse associations indicated increased COVID-19 risk with reduced healthcare access, lower education, and increased outflux in population mobility. The crowding-associated risk was not statistically significant. We implemented separate global and local spatial models for the early, middle, and late pandemic phases. Figure 3 presents the relative COVID-19 risk with its 95% CI from the GNBR models, and Figure 4 presents the spatial distribution of relative risk from GWNBR for the seven thematic areas by pandemic phase. Supplementary Table S4 provides more details on the stratified GNBR models.
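As a rough illustration of how a relative risk and its 95% CI can be derived from a regression coefficient, the sketch below assumes a log link, as is usual for negative binomial regression, so that the relative risk is the exponentiated coefficient. The coefficient and standard error are hypothetical, and the function name `relative_risk` is an assumption for illustration; the text does not describe how the authors' SAS programs compute these quantities.

```python
import math


def relative_risk(beta, se, z=1.96):
    """Return (RR, lower, upper) for a coefficient from a log-link count model.

    Assumes a Wald-type interval: exp(beta +/- z * se).
    """
    rr = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return rr, lower, upper


# Hypothetical coefficient and standard error, for illustration only.
rr, lower, upper = relative_risk(beta=0.05, se=0.02)
print(f"RR = {rr:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

An interval that excludes 1 corresponds to a coefficient that is significant at the 5% level under this Wald approximation.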
GWNBR fit the data better than the global model, given its smaller AIC, for the middle and late phases (AIC for GWNBR ~1034 vs. GNBR ~1044, and AIC for GWNBR ~1038 vs. GNBR ~1074, respectively), but not for the early phase of the pandemic (AIC for GWNBR ~3533 vs. GNBR ~1527). This reflects the large spatial case cluster emerging from Daegu during the early phase that subsequently spread to its neighboring districts. The GNBR and GWNBR model results agreed across all pandemic phases. In the early phase, lower healthcare access and education, and increased risky health behavior, area morbidity, difficulty to social distancing, and population mobility were associated with higher COVID-19 risk. Crowding-associated risk was not significant in GNBR. In the middle phase, healthcare access, area morbidity, education, and difficulty to social distancing remained significant factors. In the late phase, only healthcare access, health behavior, and increased crowding remained significant.

Early phase maps showed elevated risk in noncontiguous districts, which may reflect virus transmission in the initially affected districts before spreading over expanded areas. Relative risks associated with the healthcare access, health behavior, and crowding indices were significant in the early phase, where each index varied in spatial coverage (Figure 4). In the middle phase, only healthcare access remained significant, with a northwest (capital region) and southwest spatial shift from the south and east. In the late phase, the risk associated with these three factors became significant and concentrated around the capital region, Chungcheongbuk-do, and Jeollabuk-do provinces.
The period and observed spatial pattern here are consistent with the second wave that emerged in Seoul and its surrounding areas. Other health/SE themes were not significant risk factors in the late phase. Difficulty to social distancing increased COVID-19 risk in the capital and middle regions in the early phase, and the risk then shifted to the country's southeast in the middle phase. Area morbidity-associated risk was concentrated in the west and then gradually shifted north in the middle phase. Education-associated risk was higher in the west in the early phase until it shifted southwest in the middle phase. Population mobility elevated COVID-19 risk only in the early phase, in South Korea's northern, eastern, and western parts.

We investigated the correlations between all pairs of composite indices (Table S5). The largest Pearson's correlation coefficient was 0.603, between healthcare access and area morbidity. We found no evidence of multicollinearity, given that the model standard error of area morbidity was the smallest (0.0064) and that of healthcare access was small (0.025) compared with the largest standard error, 0.09, associated with crowding.

For comparison, GNBR and GWNBR models were fitted with the same variables and stratified by the same periods (GNBR: Figure 3, and GWNBR: Figure 4). AIC and dispersion coefficients were used to compare the models' performance in computing COVID-19 risk associated with multiple health/SE themes across the country. GWNBR created a continuous surface of relative COVID-19 risk for all 250 districts associated with area-health and socioeconomic determinants by pandemic phase (Figure 4).
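The AIC-based comparison can be made concrete with a short Python sketch. The AIC values are the approximate figures reported in the text; the helper names `aic` and `preferred_model` are illustrative assumptions, and the generic AIC formula shown here is the textbook definition (for GWNBR an effective number of parameters would normally take the place of a simple parameter count, a detail the text does not spell out).

```python
def aic(log_likelihood, n_params):
    """Textbook Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def preferred_model(aic_by_model):
    """Return the name of the model with the smallest AIC."""
    return min(aic_by_model, key=aic_by_model.get)


# Approximate AIC values as reported in the text (lower is better).
reported_aic = {
    "early": {"GNBR": 1527, "GWNBR": 3533},
    "middle": {"GNBR": 1044, "GWNBR": 1034},
    "late": {"GNBR": 1074, "GWNBR": 1038},
}

preferred = {phase: preferred_model(values) for phase, values in reported_aic.items()}
for phase, model in preferred.items():
    gap = abs(reported_aic[phase]["GNBR"] - reported_aic[phase]["GWNBR"])
    print(f"{phase}: {model} preferred (AIC difference of about {gap})")
```

With these figures the local model is preferred in the middle and late phases and the global model in the early phase, matching the comparison described in the text.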
Our findings are consistent with individual and population-level studies that reported elevated COVID-19 risk associated with less healthcare access 21 and education 22, 23, and with more risky health behavior, crowding, specific comorbidities 13, 14, 17, difficulty to social distancing 15, 24, and population mobility 25. Our study showed high internal validity, since the GNBR and GWNBR results agreed except for crowding in the early phase. Our approach captured significant spatial variation by pandemic phase for all themes, consistent with the reported pattern of COVID-19 distribution in the country. Since its first confirmed case on January 20, 2020, South Korea experienced two major outbreak waves: in Daegu in February (early phase), and in Seoul and the surrounding Gyeonggi-do province in May 2020 (late phase). The country responded to the first wave with nationwide directives that included mass testing based on contact tracing, self-quarantine/isolation, strengthening medical centers for rapid diagnostics, emergency medical responses, and treatment aids 16.

The types of significant SES risk factors varied over the pandemic phases. In the early phase, all health/SE themes were significant risk factors. In the middle phase, all of the previous risk factors except for risky health behaviors, population mobility, and crowding were significant. In the late phase, only increased risky health behavior, increased crowding, and reduced healthcare access remained significantly associated with COVID-19 incidence.
The decrease in risk associated with risky health behaviors, population mobility, and crowding in the middle phase could reflect the impact of the Prime Minister's declaration implementing active interventions for social distancing, community health education, and testing with local contact tracing and screening.

Spatial variation in the SES-related risk factors across the pandemic phases potentially reflects the geography-specific control measures and/or the differential public response to the measures. GWNBR models revealed pandemic phase-specific spatial variation for all health/SE themes except for population mobility, which was not significant beyond the early phase. This may indicate that the effectiveness of the control measures varied over time, potentially due to differential interventions or public response across the municipal districts. Our findings may also indicate a dynamic change in population vulnerability throughout the pandemic; as a Lancet editorial stated, "a person not considered vulnerable at the outset of a pandemic can become vulnerable depending on the policy response" 31.

The factors increasing our recommended framework's robustness include: 1) SES measurement and relationship conceptualization of the exposure (health/SE themes) and outcome (COVID-19 incidence) based on the refined conceptual framework; 2) joint use of conceptual and statistical modeling; 3) complementary use of global and local spatial statistics; and 4) stratified analysis by pandemic phase, which enables us to capture the spatial variation over pandemic phases. However, this methodological framework relies on carefully collected country-specific data.
Bias due to differential testing rates across the country is likely low, since testing was based on contact tracing and was government-supported (free). Our study is subject to the ecological fallacy inherent to the study design. However, our empty hierarchical mixed model accounting for the individual and district-level data showed that 61% of the variation in COVID-19 incidence was explained by district-level factors, leaving 39% of the variability to be explained by individual factors. We verified that the data estimation for Daegu city subparts did not affect the study results significantly. The comparison of the intercept, standard error, relative risk, and P-value between the models with and without the estimated data showed that the intercept and [...] for crowding. The P-value changed from ~0.06 to ~0.04 when the estimated data were excluded. However, the crowding-associated risk remained significant at the 0.1 level. Model details are provided in Table S2.

To assess the periodic trend in the relative COVID-19 risk associated with SES factors, we conducted stratified analyses by the early, middle, and late phases, corresponding to January 20-March 20, March 21-April 15, and April 16-July 1, 2020.

We intended this framework to improve international knowledge exchange and enable rapid pandemic responses in high-risk populations. To illustrate a practical application of this framework in South Korea, decision-makers could have prioritized improved healthcare access and promoted protective health behaviors, focusing more on the crowded areas in the capital and surrounding regions in the second wave of the pandemic. Such precision targeting can bolster preventive measures to reduce the healthcare burden and economic damage.
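The phase stratification defined above (January 20-March 20, March 21-April 15, and April 16-July 1, 2020) amounts to assigning each case to a date window before fitting phase-specific models. A minimal Python sketch follows; the names `PHASES` and `pandemic_phase` are hypothetical, and the authors' actual SAS implementation is not described in the text.

```python
from datetime import date

# Phase boundaries as stated in the text; both end dates are inclusive.
PHASES = [
    ("early", date(2020, 1, 20), date(2020, 3, 20)),
    ("middle", date(2020, 3, 21), date(2020, 4, 15)),
    ("late", date(2020, 4, 16), date(2020, 7, 1)),
]


def pandemic_phase(confirmed_on):
    """Return the phase name for a confirmation date, or None if out of range."""
    for name, start, end in PHASES:
        if start <= confirmed_on <= end:
            return name
    return None


# Example dates chosen for illustration only.
for example in (date(2020, 2, 18), date(2020, 4, 1), date(2020, 5, 10)):
    print(example.isoformat(), pandemic_phase(example))
```

Cases dated outside the study period return `None`, so they can be excluded explicitly rather than silently assigned to a phase.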
The work described here was not funded by any source. Dr. Kouzoukas receives research grant support from the US Department of Veterans Affairs (I21 RX003170 to DEK). The views expressed here are those of the authors and do not necessarily reflect those of any government agency or institution.

Data availability: The data and SAS code underlying this article will be shared on reasonable request to the corresponding author.

a Data for the entire study period (January 20 - July 1, 2020)
b The variance of a negative binomial distribution
c Akaike information criterion (AIC), a measure of goodness of model fit
Clinical transplantation of a tissue-engineered airway
Pandemic influenza planning in the United States from a health disparities perspective
Deprivation indices
Surgo Foundation. The COVID-19 Community Vulnerability Index (CCVI)
GIS-based spatial modeling of COVID-19 incidence rate in the continental United States
On a Statistical Transmission Model in Analysis of the Early Phase of COVID-19 Outbreak
Geographically Weighted Negative Binomial Regression - incorporating overdispersion
Estimating epidemic exponential growth rate and basic reproduction number
Coronavirus Infectious Disease-19 Outbreak in Korea (Regular Briefing on
DS4C: Data Science for COVID-19 in South Korea
Foundations of Social Theory
The measurement of SES in health research: current practice and steps toward a new approach
Covid-19 National Emergency Response Center E, Case Management Team KCfDC, Prevention. Coronavirus Disease-19: The First 7,755 Cases in the Republic of Korea
Risk factors for SARS-CoV-2 among patients in the Oxford Royal College of General Practitioners Research and Surveillance Centre primary care network: a cross-sectional study
Occupational risks for COVID-19 infection
National Response to COVID-19 in the Republic of Korea and Lessons Learned for Other Countries