key: cord-0859765-5w70yhvy authors: Kim, Y.; Jiang, X. title: Evolving Transmission Network Dynamics of COVID-19 Cluster Infections in South Korea: a descriptive study date: 2020-05-12 journal: medRxiv : the preprint server for health sciences DOI: 10.1101/2020.05.07.20091769 sha: 6cab26f503af61365106a1be2eaeba79f1096381 doc_id: 859765 cord_uid: 5w70yhvy Background. Extensive contact tracing and testing in South Korea allows us to investigate the transmission dynamics of the COVID-19 into diverse local communities. Objective. Understand the critical aspects of transmission dynamics in a different age, sex, and clusters with various activities. Methods. We conducted a retrospective observational study with 3,127 confirmed cases' contact tracing data from the Center for Disease and Prevention (CDC) of South Korea. We investigated network property concerning infected persons' demographics and different infection clusters. Findings. Overall, women had higher centrality scores than men after week four, when the confirmed cases rapidly increased. Older adults have higher centrality than young/middle-aged adults after week 9. In the infection clusters, young/middle-aged adults' infection clusters (such as religious gatherings and gym facilities) have higher average path lengths and diameter than older adults' nursing home infection clusters. Interpretation. Some women had higher reproduction numbers and bridged successive transmission than men when the confirmed cases rapidly increased. Similarly, some older adults (who were not residents of nursing homes) had higher reproduction numbers and bridged successive transmission than young/middle-aged adults after the peak has passed. The young/middle-aged adults' religious gatherings and group workout have caused long successive transmissions. In contrast, the older adults' nursing homes were a small world where the transmissions within a few steps can reach out to many persons. Amid the global pandemic on coronavirus disease 2019 (COVID- 19) , epidemiologists promote containment strategies to prevent community transmission.(1) Contact tracing of susceptible persons and extensive testing in South Korea demonstrated as an example of successful containment.(1) South Korea reported the first confirmed cases on January 20, 2020, and the instances spiked from February 20 to 29, 2020.(2,3) As of mid-March, 2020, such containment strategies helped flatten the curve of new confirmed cases. (2) The contact tracing and testing allow us to investigate different transmission dynamics into various local communities. Person-to-person transmissions can be represented as a transmission network, and this network facilitates investigation of topological dynamics in disease transmission and detection of super-spreaders. (4, 5) Also, the transmission network can be divided into individual infection clusters that have different characteristics in terms of demographics and network topology. The network analysis has been widely studied in various infectious diseases such as Middle East Respiratory Syndrome (MERS), (4, 5) Several Acute Respiratory Syndrome (SARS),(6)(6,7) tuberculosis, (8, 9) and cholera.(10) Some COVID-19 studies have revealed evolving person-to-person transmission dynamics in China in a longitudinal view. (11, 12) A genomic mutation study has investigated country-to-country transmission dynamics on a global scale. (13) A piece of critical missing knowledge is how person-to-person transmission differs across different demographics and also different super-spreading environments in cluster infections. Thus, the objective of our study was to investigate network characteristics of person-to-person transmission in the different granularity: entire country and individual cluster infections. We provided interactive visualization in the code repository (https://github.com/yejinjkim/covid19-transmission-network) for an intuitive understanding of the evolving transmission network. To the best of our knowledge, our work is first to compare transmission dynamics in cluster-wise. We conducted a retrospective observational study with contact tracing data from the Center for Disease Controls and Prevention (CDC) of South Korea. The South Korea CDC actively performed testing, monitoring, and tracing contacts and disseminated the information to the public. We collected the contact tracing data prepared in the Data Science for COVID-19 Project in South Korea,(14) which enables researchers to access the data publicly. This data contains demographics, exposure type or travel history, confirmed/release date, underlying disease, deceased status. This study used 3,127 patient's transmission data from January 20, 2020 to April 7, 2020, retrieved on April 8, 2020. The evolving transmission network. We built a longitudinal transmission network using the transmission data. The transmission network consisted of nodes for the infected persons and directed edges from infectious nodes (infector) to infected nodes (infectee). The nodes and edges had timestamps at which the infectee's infections are confirmed. This longitudinal network evolved over time as accumulating all previous nodes and edges. Transmission network characteristics. We presented network characteristics in terms of centrality and path lengths: -Centrality is a useful measure to investigate individual patients who shed the virus most (i.e., superspreaders) over time. We analyzed network centrality with degree and betweenness. Node degree is the number of edges linked to the node in the network. Betweenness is the extent to which a node lies on the shortest paths between other nodes.(15) Nodes with high betweenness may have high control over transmission passing between nodes. We used betweenness to measure how much a patient contributed to the transmission as intermediate hosts. We defined a super-spreading person as a person with a degree > 3.5 for degree or top 5% betweenness values.(16) Note that the node degree was calculated based on all accumulated transmissions (whereas the reproduction number is generally calculated within the observation period). -Path length can measure how far the transmission chain of successive cases can go. The path length is the number of nodes or steps in the shortest path from a source node to a target node. The average path length is the average value of the shortest path lengths between all pairs of nodes in the network. The longest path length (also known as diameter) is the maximum amount of the shortest path length. Infection clusters. We divided the transmission network into subgraphs with connected nodes (i.e., the connected components). The connected components are an extended concept of infection cluster, as the connected components include all successive infected persons who are infected from the infection cluster. For briefly, we denoted the connected components as the infection cluster. During the contact tracing, it is expected that the index patient was unidentifiable in each infection cluster. To represent links among the nodes in the same cluster with an unknown index patient, we set a virtual index patient and we made edges from the virtual index patient to infected patients for each infection cluster. The evolving transmission network. From January 20, 2020 to April 7, 2020, we identified a total 1,611 transmission from 3,127 patients (727 person to person; 884 persons in infection clusters with unknown index patient). We presented demographic characteristics of patients and their transmission ( Fig.1 , Table 1 ) when newly confirmed cases rapidly increase (weeks 4, 5, 6) and after the recently confirmed cases decrease (after peak, week 10). Note that the mass cluster infection occurred at week 4 with an outbreak in religious gathering in Daegu. (17) The demographic characteristics of patients and their transmission varied in different observation periods. Until week 4 (the confirmed positive cases were at a low level), male to female ratio was similar (53% male; 47% female). After week 4 South Korea reported more female cases than male ones. Also, female-to-female transmissions were more frequent than male-to-male transmissions (but there was no statistical power as chi-squared value of the contingency table=1.36, p-value=0.2430). Throughout all observations, the largest age group was age 25-49. Although there was no reported transmission among similar aged children and adolescents (presumably due to strict school closure), this young age group sometimes infected their middle-aged parents. In week 6, South Korea reported an increased number of new confirmed cases from older adults, and the transmissions occurred mostly among the older adults. We also presented the network property of the transmission network. The transmission network had a mean degree of 1.88 after aggregating all time windows. The system was getting sparse as new confirmed cases exponentially increased with unknown transmission paths (complete network visualization in Supplement 1). The centrality of individual patients. We further investigated the distribution of the network centrality concerning infected persons' sex and race (animated plot in Supplement 2). We computed the statistical significance on the difference of centrality distribution using the Mann-Whitney-U test. (18) As a result, we found that women have higher centrality scores than men as the confirmed cases exponentially increase around week 5 to 6 (Fig. 3) . During the first four weeks after the first case, the number of instances slowly increased, and men tend to have higher degrees and betweenness. Women, however, started to have higher degrees and betweenness after week 5. We used the Mann-Whitney-U for hypothesis testing on whether super-spreading women have higher centrality values than super-spreading men. The p-values were statistically significant after week 6 for degree (p-value=0.0156) and week 11 for betweenness (p-value=0.0402). We found that older adults have higher centrality than young/middle-aged adults after the peak has passed around week 9 (Fig. 3) . After week 9, more older adults had higher degree and betweenness values than young/middle-aged adults. We used the Mann-Whitney-U for hypothesis testing on whether the super-spreading older adults have higher centrality than super-spreading young adults. The p-values were statistically significant after week 9 for degree (p-value=0.0413). Note that there was a time lag between degree and betweenness as the betweenness increases after successive transmission occurs. There were a total of 147 clusters (defined as connected components) and we selected 12 clusters that have at least 20 patients involved. These clusters turned out to be related to religious gatherings, gym facilities, nursing homes, and customer call center office (Table 3 , Fig. 4) . The clusters' network visualization is available at Supplement1. Overall, the top four largest clusters were related to Guro customer call center, Shincheonji Church, River of Grace Community Church, and gym facilities in Cheonan. A common characteristic of the four largest clusters was that they all had higher women rates (72%, 55%, 60%, and 72%, respectively). Clusters with the smallest path lengths were all related to nursing homes (Gunpo Nursing Home, Cheongdo Daenam Hospital Psychiatric Ward, and Bongwha Nursing Home). Clusters with the most considerable path length were related to Gym facilities in Cheonan and Shincheonji Church. We observed that similar activities share similar path lengths (three nursing homes had the path length of 1.91-1.94; four churches had the path length of 2.42-2.94). All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 12, 2020. . The objective of this study was to provide the evolving transmission dynamics of COVID-19 in South Korea. We investigated network property and demographic features from the whole transmission network and also clustered level. The demographics of newly confirmed patients and their transmission provided an overview of how various demographic groups contribute to transmission. The main observation includes: i) In early days until week four most confirmed positives were imported cases who were male in their 40s; ii) women were more than 50% of newly confirmed cases after week 4; iii) there were more female-to-female transmissions than male-to-male transmissions; iv) children and adolescents infected their parents; and v) older adults infected other older adults. Further investigation is needed to obtain statistical power. We expect these observations can provide situational awareness to future contact tracing efforts. The network characteristics of the entire system found the transmission network's density and successive transmission lengths evolved over time. The transmission network was getting sparse as the number of newly confirmed cases rapidly increased. The sparsity limitation was inherited from the contact tracing containing the increasing number of unknown transmission paths in local communities. Thus our estimate on network property (degree, betweenness, diameter, and path length) should be interpreted as a lower bound. The main finding from network centrality analysis was that women had higher centrality (in both degree and betweenness) than men had as the instances rapidly increased. This implies not only that women's reproduction number was larger than men's, but also that some women bridged the long successive transmissions. A possible explanation might be that men are more likely to die or critically ill than women, and consequently men have less opportunity to shed the virus. (19) Note that the higher portion of confirmed cases does not necessarily cause higher centrality. The second finding is that older adults had high centrality after the peak has passed around week 9. Note that these older adults with high reproduction numbers were not related to nursing homes. Older adults are more vulnerable to critical illness.(20) Based on our findings, they were not active super-spreaders in the early days. Still, after the virus propagated into local communities, the older adults were more likely to be infected by the virus and be superspreaders. A breakdown into smaller infection clusters allowed us to find common network characteristics within similar activities. There were three major types of clusters: religious gatherings, gym facilities, and nursing homes. The religious gathering and gym facilities clusters had high path length and diameter. These clusters mainly consisted of young/middle-aged adults. This observation may imply that the active behavior (e.g., physically close gatherings in church, active group workout in closed facilities) of young/middle-aged adults can spread further away to other communities. Thus, restricting trips to other cities would be an adequate policy to prevent the spread effectively. In contrast, the nursing home type clusters consist of older adults with low path lengths and low diameter. Each nursing home cluster grew fast with relatively short path length, implying that the clusters might have a small-world property that spread the virus rapidly as the majority of nodes were reachable in a small number of steps. Among the three nursing homes, Cheongdo Daenam Hospital Psychiatric Ward had the highest fatality, 31.82%. This closed psychiatric ward has many long-term patients with underlying diseases. (17) Apart from the above local community infections, imported cases were also consistently increasing during all observation periods. The mean age of the imported patients was 42 years old at week two and consistently decreased to 35.6 years old until week 11. Men took more than 50% of the imported cases for all observation periods (from 71% to 59% at week two and week 11). This study's limitation mainly comes from the sample bias in contact tracing data. The contact tracing of the rapidly evolving infectious disease inevitably contains case ascertainment biases, non-homogenous sampling over time and location, and uncontrolled correlation.(11) A substantial number of cases were missing (officially confirmed cases from Korea CDC was more than 8,000 as of April). Each local province government performs and reports the contact tracing, but the difficulty of the instances or granularity of reporting varies from each local government. For example, the import cases were more likely to be tracked well, but the mass infection cluster in ShinCheonji Church has not been carefully investigated. Despite the limitations, our descriptive study can provide an overview of evolving transmission dynamics in the country and infection clusters for the rapidly emerging infectious disease. Our findings suggest a different level of reproduction and successive transmission chain over sex, age, and cluster activities. Our analysis code is publicly available to adapt to newly reported cases. All rights reserved. No reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. The copyright holder for this preprint this version posted May 12, 2020. . https://doi.org/10.1101/2020.05.07.20091769 doi: medRxiv preprint From Mitigation to Containment of the COVID-19 Pandemic: Putting the SARS-CoV-2 Genie Back in the Bottle Regular briefing: COVID-19 domestic outbreak status Information Technology-Based Tracing Strategy in Response to COVID-19 in South Korea-Privacy Controversies Topological dynamics of the 2015 South Korea MERS-CoV spread-on-contact networks. Sci Rep Network Analysis of MERS Coronavirus within Households, Communities, and Hospitals to Identify Most Centralized and Super-Spreading in the Arabian Peninsula Network theory and SARS: predicting outbreak diversity Model parameters and outbreak control for SARS. Emerg Infect Dis Transmission network analysis of infectious disease: a study of tuberculosis transmission in Portugal Transmission network analysis to complement routine tuberculosis contact investigations Integration of Spatial and Social Network Analysis in Disease Transmission Studies Evolving epidemiology and transmission dynamics of coronavirus disease 2019 outside Hubei province, China: a descriptive and modelling study Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study [Internet]. The Lancet Infectious Diseases Global transmission network of SARS-CoV-2: from outbreak to pandemic Korean Society for Antimicrobial Therapy, Korean Society for Healthcare-associated Infection Control and Prevention The Mann-Whitney U: A Test for Assessing Whether Two Independent Samples Come from the Same Distribution Tutorials in Quantitative Methods for Psychology COVID-19: the gendered impacts of the outbreak Clinical course and risk factors for mortality of adult inpatients with COVID-19 in Wuhan, China: a retrospective cohort study The Lancet We collected the contact tracing data prepared in the Data Science for COVID-19 Project in South Korea. Y.K. and X.J. were partly supported by the UT Startup award, UT STARs award, and Cancer Prevention Research in Texas (CPRIT RR180012), and National Institute of Health (NIGMS R01GM124111). Supplement 1: Network visualization for cluster infections Supplement 2: Animated degree distribution concerning age and sex Supplement 3: Animated evolving transmission dynamics of clusters Author Contributions YK collected data; YK and XJ performed statistical analysis; and YK and XJ prepared the manuscript. The sponsor of the study had no role in study design, data collection, data analysis, data interpretation, or writing manuscript. ** * ** * ** ** ** * * ** ** ** Religious gatherings Imported casesNursing homes