key: cord-0758450-trv04gok
authors: Luo, Tianyi; Wang, Jiaojiao; Wang, Quanyi; Wang, Xiaoli; Zhao, Pengfei; Zeng, Daniel Dajun; Zhang, Qingpeng; Cao, Zhidong
title: Reconstruction of the Transmission Chain of COVID-19 Outbreak in Beijing's Xinfadi Market, China
date: 2022-01-21
journal: Int J Infect Dis
DOI: 10.1016/j.ijid.2022.01.035
sha: dc1fd4075da1cc958dabf5a5bb3ab94f42a4ba85
doc_id: 758450
cord_uid: trv04gok

Objectives: To reconstruct the complete transmission chain of the COVID-19 outbreak in Beijing's Xinfadi Market using data from epidemiological investigations, in order to characterize transmission dynamics and transmission risk factors.

Methods: We set up a transmission model and estimated its parameters from the survey data via Markov chain Monte Carlo sampling. Bayesian data augmentation was used to account for uncertainty in the source of infection and in unobserved onset and infection dates.

Results: The rate of transmission of COVID-19 within households is 9.2%. Older people are more susceptible to infection. The accuracy of our reconstructed transmission chain was 67.26%. In the Beef and Mutton Trading Hall of Xinfadi Market, where this outbreak clustered, most transmission occurs within 20 m; only 19.61% of transmission occurs over a wider area (>20 m), with an overall average transmission distance of 13.00 m. The deepest transmission generation is 9. There were two abnormally high transmission events in this outbreak.

Conclusions: Statistical reconstruction of transmission trees from incomplete epidemic data is a valuable tool for understanding complex transmission factors, and it offers practical guidance for investigating how epidemics develop and for formulating control measures.

Between June 15 and July 10, more than 10 million citizens and 5342 environmental samples were screened.
Eventually, 368 qRT-PCR positive cases were confirmed (Pang et al., 2020). The Beijing Center for Disease Control and Prevention conducted an epidemiological investigation of the 368 cases and obtained valuable data, including incidence data, contact tracing data, and spatial data on patients. Many studies have been carried out based on this dataset. Some have focused on the effect of the interventions deployed in Beijing after the outbreak in XFD market (Cui et al., 2021; Wang et al., 2021; Wei, Guan, Zhao, Shen, & Chen, 2020). Such studies usually use susceptible-exposed-infectious-recovered (SEIR) transmission models modified for the specific scenario. Han et al. (2021) used spatial autocorrelation analysis and Spearman correlation analysis to study the spatial clustering of the COVID-19 pandemic in Beijing and the impact of environmental factors. In addition, many biological and medical scholars (Chen, Shi, Zhang, Wang, & Sun, 2021; Pang et al., 2020) analyzed the epidemiological characteristics and clinical manifestations of this outbreak. However, the transmission link data available from the investigation are limited, so the complete transmission process cannot be recovered from them directly.

The complete transmission tree offers many potential benefits. In particular, it can 1) lead to an improved understanding of transmission dynamics and 2) generate valuable intelligence on key epidemiological parameters and risk factors for transmission, which paves the way for more targeted and cost-effective interventions.

Obtaining and reconstructing transmission link data is challenging. First, and most important, transmission dynamics are usually not observed. There is great uncertainty in the course of infection, because it is impossible to directly measure an individual's exposure to a potential source of infection (Salje et al., 2016).
This uncertainty about the source of infection makes link data scarce. Second, many key time points in the course of a patient's illness depend on the patient's recollection (Cauchemez et al., 2011), and recall uncertainty biases the data. Third, the factors that affect the risk of infection are multiple and complex. They often intertwine features of individuals, e.g., age, behavior, mobility, the places they visited, and their social network. This intertwinement complicates the modeling needed to reconstruct the transmission chain.

Depending on the data types, there are two common approaches to inferring the transmission chain. The 'pairwise approach' is based on onset time and genetic data; it builds a disease transmission model and incorporates a genetic model that describes the pairwise genetic distance between putative transmission pairs (Jombart et al., 2014; Lau, Marion, Streftaris, & Gibson, 2015; Worby et al., 2016). The 'phylogenetic approach' uses genetic data to infer the unobserved history of coalescent events between sampled pathogen genomes in the form of a phylogenetic tree, and then uses epidemiological data to infer transmission trees consistent with this phylogeny (Klinkenberg, Backer, Didelot, Colijn, & Wallinga, 2017). But gene sequence data do not always carry information about the spread of an epidemic. Genetic diversity across most outbreaks is low, and a significant portion of genetic sequences is expected to be identical. The informativeness of genetic sequence data is also limited by complex evolutionary behavior. Gradually, patients' attributes and behavioral data came into use, including symptom onset time, contact tracing data, spatial data, and location data of visits (Campbell, Cori, Ferguson, & Jombart, 2019; Cauchemez et al., 2011).

We take the outbreak of novel coronavirus in Beijing in June 2020 as the case study. We analyzed detailed data describing this second outbreak in Beijing, which was centered on the Xinfadi market.
We built a transmission model and used established Bayesian data-augmentation techniques (Cauchemez et al., 2006; Salje et al., 2016) to account for uncertainty in the sources of infection and in the unobserved dates of onset and infection. We then reconstructed the transmission chain of the outbreak and assessed the influence of spatial distance, family relationship, visiting relationship, and other factors on transmission risk. The accuracy of the reconstructed chain was also verified.

The outbreak investigation was performed by the Beijing Center for Disease Prevention and Control. The research team investigated the cluster of infections linked to the Xinfadi agricultural products market (XFD) between June 11, 2020, and July 12, 2020. A total of 368 cases of COVID-19 infection were reported in this outbreak: 335 confirmed cases and 33 asymptomatic cases. There are 14 trading halls in XFD market. The beef and mutton trading hall (referred to as the B1 Hall) was identified as the place from which the virus spread within XFD market (Pang et al., 2020). In addition, the basic personal information and the onset information of each case were collected. Personal information includes sex, age, native place, and address. Onset information includes the type of infection, the date of onset, the date of diagnosis, the type of exposure to XFD, and, for staff, the location of their booth in XFD. Asymptomatic infections were defined as individuals who had not developed any symptoms but tested positive for SARS-CoV-2 by nucleic acid test.

Assuming that individuals with a positive nucleic acid test are infected with SARS-CoV-2, we build a statistical model (Salje et al., 2016) to ascertain risk factors for transmission. In particular, the model is used to estimate the roles that social relationship, location, sex, age, and exposure type had in transmission dynamics.
We build a transmission model for the force of infection exerted on individual i at time t (Salje et al., 2016), expressed through the instantaneous hazard of transmission from individual j to individual i at time t. The model includes a transmission rate between individuals i and j, where i and j are in the same aggregation relationship. The transmission rate is estimated for five types of pairs of individuals. The set of pairs of individuals is partitioned into five types of interaction: "household" (i.e., individuals from the same household or the same stall in the XFD market), "colleagues" (i.e., individuals who work for the same company), "the B1 hall" (i.e., individuals from different stalls in the B1 Hall of the XFD market), "exposed" (i.e., individuals who were exposed together in the same place), and "others" (i.e., pairs of individuals without any of the above relationships). A hierarchy is set up so that each pair has one and only one type of interaction: household > colleagues > the B1 hall > exposed > others. For example, a pair of two colleagues from the same household is defined as a household pair. A further parameter characterizes the role of sex on risk of infection (male is the ...).

Here, we use a Bayesian data augmentation framework to tackle the missing data problem (Cauchemez et al., 2011; Cauchemez, Carrat, Viboud, Valleron, & Boelle, 2004; Cauchemez et al., 2006). This approach has previously been used successfully to deal with similar problems (Cauchemez et al., 2004; Cori, Boëlle, Thomas, Leung, & Valleron, 2009; Walker et al., 2010). The dataset is "augmented" with missing dates of infection and a few missing dates of symptom onset.
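The pair-type hierarchy described above (household > colleagues > the B1 hall > exposed > others) can be illustrated with a minimal sketch. The `Case` record and its field names are hypothetical and are not taken from the survey data; the sketch only assumes that each case carries optional household, stall, and company identifiers, a B1 Hall flag, and a set of shared exposure sites.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Case:
    # All field names are illustrative assumptions, not the paper's schema.
    case_id: int
    household_id: Optional[str] = None
    stall_id: Optional[str] = None      # stall in the XFD market
    company_id: Optional[str] = None
    in_b1_hall: bool = False
    exposure_sites: FrozenSet[str] = frozenset()


def _same(x: Optional[str], y: Optional[str]) -> bool:
    """True only when both identifiers are known and equal."""
    return x is not None and x == y


def pair_type(a: Case, b: Case) -> str:
    """Return the single interaction type of a pair, checked in hierarchy order."""
    # "household": same household, or same stall in the XFD market.
    if _same(a.household_id, b.household_id) or _same(a.stall_id, b.stall_id):
        return "household"
    if _same(a.company_id, b.company_id):
        return "colleagues"
    # Same-stall pairs were caught above, so these are different stalls.
    if a.in_b1_hall and b.in_b1_hall:
        return "the B1 hall"
    if a.exposure_sites & b.exposure_sites:
        return "exposed"
    return "others"
```

Because the checks run in hierarchy order and return at the first match, a pair that satisfies several relationships (for example, two colleagues from the same household) receives only the highest-ranked type, as in the example given in the text.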
If we schematically denote by $y$ the observed data, $z$ the augmented data, and $\theta$ the parameter vector, the joint posterior distribution of augmented data and model parameters is proportional to $P(y, z \mid \theta)\,P(\theta)$, that is, the joint likelihood of the observed and augmented data multiplied by the prior.

For all parameters except the transmission kernel parameter, we used a lognormal prior distribution with a log(mean) equal to zero and a log(variance) equal to one. For the transmission kernel parameter, we used an exponential prior distribution with a parameter of 0.0001. At every iteration of the MCMC sampling scheme, we undertook the following:

1) Metropolis-Hastings update for the parameters in the model: At every iteration, all parameters were updated once. Metropolis-Hastings updates were performed on a log scale, with the step size adjusted to achieve an acceptance probability between 20% and 30%.
2) Independent sampler for the days of infection: At each iteration, the day of infection was updated with an independent sampler for 50 randomly selected cases. Candidate values for the length of the incubation period were drawn from the incubation period distribution.
3) Independent sampler for the unobserved days of onset: For cases with unobserved dates of onset, the augmented date of onset was updated with an independent sampler.

Outbreak Investigation. In June 2020, one of the largest cluster outbreaks of COVID-19 broke out in Beijing. Fig. 1A shows the number of cases by date of confirmation for people with and without a history of exposure to XFD. The majority of the 169 XFD employees worked in the B1 Hall (119 cases, 70.4%). Fig. 1B shows the spatial layout of the B1 Hall and the diagnoses. The B1 Hall is the main exposure site. The resting place of XFD also carries a high risk of exposure. Fig. 1C shows the number of cases in the B1 Hall by date of symptom onset and sex.

Establish the transmission model and determine the transmission risk factors. All individuals with positive nucleic acid tests were included in the analysis as case studies.
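The Metropolis-Hastings update described in the sampling scheme (proposals on a log scale, step size tuned toward a 20% to 30% acceptance rate) can be sketched for a single positive parameter as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the target is just the lognormal prior with log-mean 0 and log-variance 1, the tuning rule (rescale the step by 1.1 every 100 burn-in iterations) is an assumed heuristic, and all function names are hypothetical.

```python
import math
import random


def log_lognormal_prior(theta):
    """Unnormalized log density of a lognormal with log-mean 0 and log-variance 1."""
    return -math.log(theta) - 0.5 * math.log(theta) ** 2


def mh_log_scale(log_target, theta0, n_burn=5000, n_keep=20000, step=0.5, seed=1):
    """Random-walk Metropolis-Hastings on log(theta) with burn-in step tuning."""
    rng = random.Random(seed)
    theta, lp = theta0, log_target(theta0)
    samples, kept_accepts, window_accepts = [], 0, 0
    for it in range(1, n_burn + n_keep + 1):
        # Multiplicative proposal: a Gaussian random walk on the log scale.
        proposal = theta * math.exp(step * rng.gauss(0.0, 1.0))
        lp_prop = log_target(proposal)
        # Hastings correction for the log-scale proposal: q(old|new)/q(new|old) = new/old.
        log_ratio = lp_prop - lp + math.log(proposal) - math.log(theta)
        accepted = rng.random() < math.exp(min(0.0, log_ratio))
        if accepted:
            theta, lp = proposal, lp_prop
        if it <= n_burn:
            window_accepts += accepted
            if it % 100 == 0:
                # Assumed tuning heuristic: nudge the step toward the 20-30% band.
                rate = window_accepts / 100
                if rate > 0.30:
                    step *= 1.1
                elif rate < 0.20:
                    step /= 1.1
                window_accepts = 0
        else:
            samples.append(theta)
            kept_accepts += accepted
    return samples, kept_accepts / n_keep, step
```

Calling `mh_log_scale(log_lognormal_prior, theta0=1.0)` returns the retained draws, the post-burn-in acceptance rate, and the tuned step size. Freezing the step after burn-in keeps the retained chain a standard (non-adaptive) Metropolis-Hastings sampler; in the full scheme, `log_target` would be the log posterior of the transmission model rather than the prior alone.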
Data augmentation techniques were used to account for uncertainty in the dates of onset and the unobserved dates of infection. Table 1 shows all the estimated parameters. We estimated the transmission probability between family members (the same family and the same stall are both treated as family relationships), colleagues, different stalls in the B1 Hall, individuals with a shared exposure history, and individuals with no apparent relationship. An exponential kernel was used to characterize the effect of distance between different booths in the B1 Hall (i.e., pairs of individuals operating at different booths). We found a 9.2% probability of transmission between family members and between individuals at the same booth in the B1 Hall [95% confidence interval (CI): 8.5-9.9%] (Fig. 2A). Within the B1 Hall, however, the probability of transmission at a distance of 15 meters was 0.5% (95% CI: 0.2-0.5%), falling to 0.02% (95% CI: 0.1-0.2%) at a distance of 50 meters (Fig. 2B), indicating that transmission is spatially highly concentrated. Women and men were similarly likely to be infected, with women slightly less so, by a factor of 0.91 (95% CI: 0.89-0.94) (Fig. 2C). The risk of infection in children and adults was 0.70 times that in the elderly (95% CI: 0.68-0.71) (Fig. 2C).

After updating the infection times and correcting the onset times in the model, the incubation periods from 100 runs were analyzed statistically; the resulting incubation period distribution is shown in Fig. 2D. An incubation period of 5 days was the most common, accounting for 18.2% of all cases. The incubation period was less than 5 days in 50.2% of cases and more than 7 days in 20.5% of cases.

According to the transmission risk between each pair, we reconstructed 100 transmission trees. Cases with more than five secondary cases accounted for 3.77% of the total number of cases. On average, only two cases had more than 16 secondary cases.
According to the reconstructed transmission tree, these two patients were both staff at booths in the B1 Hall of XFD Market. 24.5% of cases contributed to 80% of transmission events. Fig. 4B shows the age and sex transmission matrix of this outbreak, constructed from the transmission tree. In this outbreak, male-to-male transmission accounted for 32.06% of events, and female-to-female transmission was the lowest, at 17.28%. We divided the 368 cases into four age groups: 0-20, 20-40, 40-60, and 60-80. As can be seen from the figure, transmission mainly occurred between people aged 20 to 60. Fig. 4C shows the cumulative number of infections over time. The average depth of the reconstructed transmission tree is 9.2 (Fig. 4D); that is, the longest transmission chain in this outbreak spans about nine generations. Fig. 4E is an example of a reconstructed transmission tree.

After constructing the entire transmission tree, we can conclude that this outbreak is made up of multiple clustered transmission events. The spread of the virus is largely driven by the movement of people in XFD market. The virus then continues to spread in households, companies, and other public places. The B1 Hall of the XFD market is where the outbreak originated and clustered. The transmission rate within this space is closely related to the distance between booths. When the distance between booths exceeds 40 meters, the transmission probability becomes very small. In this outbreak, only 2.83% of transmission events spanned more than 40 m. Transmission between booths more than 40 m apart depends mainly on the movement of people. Therefore, during an epidemic, vendors should pay special attention to prevention and control with respect to nearby stalls and mobile visitors.

Household transmission is a typical form of cluster transmission, and the transmission probability between family members is the highest among the relationships we defined.
Family transmission events accounted for approximately 23.67% of all transmission events, second only to the number of joint-exposure events in XFD market. Family transmission is most common among people aged 20 to 60. Older people are significantly more susceptible than young people. Therefore, more attention should be paid to prevention and control within families and to priority protection measures for the elderly.

Two abnormally high transmission events occurred during this outbreak. One involved an employee of the booth where the infection originated; the other involved an asymptomatic infected person who was also an employee of the B1 Hall. These two cases accounted for up to 10% of the transmission events. According to the investigation and analysis, the asymptomatic patient had been infected as early as June 7, but the positive nucleic acid result was detected only on June 26. This shows that asymptomatic patients can remain strongly infectious during the asymptomatic stage. Because there are no significant symptoms, early prevention and control for asymptomatic patients is difficult.

In general, we reconstructed the complete transmission chain of the second outbreak of COVID-19 in Beijing through Bayesian data augmentation and revealed the transmission characteristics of this epidemic through analysis of the transmission chain. The conclusions from our study can guide the design of more targeted and sustainable mitigation strategies. Our reconstructed transmission models will help analyze risk factors for outbreaks and help calibrate future modeling efforts.

This work was supported by the National Natural ...

All authors declared no conflict of interests.

The survey was discussed with the Beijing Center for Disease Prevention and Control and the Institute of Automation, Chinese Academy of Sciences, who reviewed the content and advised that it was exempt from ethics committee review.
This study was considered a continuation of the public health investigation associated with an emerging infectious disease.

Bayesian inference of transmission chains using timing of symptoms, pathogen genomes and contact data.
Role of social networks in shaping disease transmission during a community outbreak of 2009 H1N1 pandemic influenza.
A Bayesian MCMC approach to study transmission of influenza: application to household longitudinal data.
Investigating heterogeneity in pneumococcal transmission: a Bayesian MCMC approach applied to a follow-up of schools.
The re-emergence from the COVID-19 epidemic of Beijing Xinfadi Market.
Temporal variability and social heterogeneity in disease transmission: the case of SARS in Hong Kong.
Transmission dynamics and the effects of non-pharmaceutical interventions in the COVID-19 outbreak resurged in Beijing, China: a descriptive and modelling study.
Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science.
Spatial distribution characteristics of the COVID-19 pandemic in Beijing and its relationship with environmental factors.
Bayesian reconstruction of disease outbreaks by combining epidemiologic and genomic data.
Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks.
A systematic Bayesian integration of epidemiological and genetic data.
Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia.
Cold-chain food contamination as the possible origin of COVID-19 resurgence in Beijing.
How social structures, space, and behaviors shape the spread of infectious diseases using chikungunya as a case study.
A Bayesian approach to quantifying the effects of mass poultry vaccination upon the spatial and temporal dynamics of H5N1 in Northern Vietnam.
A novel coronavirus outbreak of global health concern. The Lancet.
Coronavirus disease 2019 outbreak in Beijing's Xinfadi Market, China: a modeling study to inform future resurgence response. Infectious Diseases of Poverty.
Inference of start time of resurgent COVID-19 epidemic in Beijing with SEIR dynamics model and evaluation of control measure effect. Zhonghua Liu Xing Bing Xue Za Zhi (Zhonghua Liuxingbingxue Zazhi).
Reconstructing transmission trees for communicable diseases using densely sampled genetic data. The Annals of Applied Statistics.