key: cord-0994391-afek8kgr
authors: Li, Xingguang; Wang, Wei; Zhao, Xiaofang; Zai, Junjie; Zhao, Qiang; Li, Yi; Chaillon, Antoine
title: Transmission dynamics and evolutionary history of 2019-nCoV
date: 2020-02-14
journal: J Med Virol
DOI: 10.1002/jmv.25701
sha: f80b74f407527e7be87f2b27090978bcde3d1d06
doc_id: 994391
cord_uid: afek8kgr

To investigate the time origin, genetic diversity, and transmission dynamics of the recent 2019-nCoV outbreak in China and beyond, a total of 32 genomes of virus strains sampled from China, Thailand, and the USA with sampling dates between 24 December 2019 and 23 January 2020 were analyzed. Phylogenetic, transmission network, and likelihood-mapping analyses of the genome sequences were performed. On the basis of the likelihood-mapping analysis, the increasing tree-like signals (from 0% to 8.2%, 18.2%, and 25.4%) over time may be indicative of increasing genetic diversity of 2019-nCoV in human hosts. We identified three phylogenetic clusters using the Bayesian inference framework and three transmission clusters using transmission network analysis, with only one cluster identified by both methods using the above genome sequences of 2019-nCoV strains. The estimated mean evolutionary rate for 2019-nCoV ranged from 1.7926 × 10^−3 to 1.8266 × 10^−3 substitutions per site per year. On the basis of our study, undertaking epidemiological investigations and genomic data surveillance could positively impact public health in terms of guiding prevention efforts to reduce 2019-nCoV transmission in real time.

Previous studies have confirmed that this virus can spread from person to person after identifying clusters of cases among families, as well as transmission from patients to healthcare workers [1]. [...] in origin, with prior studies revealing bats to be the animal host source [9-12], and masked palm civets [13-15] and camels [16,17] to be the intermediate animal hosts (between bats and humans) of the two diseases, respectively.
Recent research has also reported that 2019-nCoV is 96% identical at the genome level to a previously detected bat coronavirus, which belongs to a SARS-related coronavirus species (ie, SARS-CoV) [18]. Like SARS-CoV, MERS-CoV, and many other coronaviruses, 2019-nCoV likely originated in bats, but it remains unclear whether an intermediary animal host was involved before the virus jumped to humans. Earlier research suggests, however, that although bats could be the original host of 2019-nCoV, the virus may have initially been transmitted to an intermediate animal host sold at the Wuhan Huanan Seafood Wholesale Market, thus facilitating the emergence of 2019-nCoV in humans [19].

In the present study, we investigated the time origin and genetic diversity of 2019-nCoV in humans based on 32 genomes of virus strains sampled from China, Thailand, and the USA with known sampling dates between 24 December 2019 and 23 January 2020. We conducted a comprehensive genetic analysis of four 2019-nCoV genome sequence datasets (ie, "dataset_14," "dataset_24," "data- [...]

To assess recombination in the full dataset (ie, "dataset_32"), we employed the pairwise homoplasy index (PHI) test to measure the similarity between closely linked sites using SplitsTree v4.15.1 [23]. The best-fit nucleotide substitution model for "dataset_32" was identified [...]

The HIV TRAnsmission Cluster Engine (www.hivtrace.org) [35] was employed to infer transmission network clusters for the full dataset (ie, "dataset_32"). All pairwise distances were calculated and a putative linkage between each pair of genomes was considered whenever their divergence was less than or equal to 0.0001 [...]

FIGURE 1 Likelihood-mapping analyses of 2019-nCoV. Likelihoods of three tree topologies for each possible quartet (or for a random sample of quartets) are denoted by a data point in an equilateral triangle. The distribution of points in seven areas of the triangle reflects the tree-likeness of the data.
Specifically, the three corners represent fully resolved tree topologies; the center represents an unresolved (star) phylogeny; and the sides represent support for conflicting tree topologies. Results of likelihood-mapping analyses of four datasets (A, "dataset_14"; B, "dataset_24"; C, "dataset_30"; and D, "dataset_32") are shown.

For "dataset_32," the HKY model provided the best fit across the four different methods (ie, AIC, AICc, BIC, and DT) and two different substitution schemes (ie, 24 and 88 candidate models), and was thus used in subsequent likelihood-mapping and phylogenetic analyses for the four datasets. The PHI test of "dataset_32" did not find statistically significant evidence for recombination (P = 1.0).

Likelihood-mapping analysis of "dataset_14" revealed that 100% of the quartets were distributed in the center of the triangle, indicating a strong star-like topology signal reflecting a novel virus, which may be due to exponential epidemic spread (Figure 1A). Likewise, 91.9%, 81.8%, and 74.7% of the quartets from "dataset_24," "dataset_30," and "dataset_32," respectively, were distributed in the center of the triangle, indicating relatively more phylogenetic signal as additional sequences were analyzed over time (Figure 1B-D). ML phylogenetic analysis of the four datasets also showed star-like topologies, in accordance with the likelihood-mapping results (Figure 2).

Root-to-tip regression analyses between genetic divergence and sampling date using the best-fitting root showed that "dataset_14" had a relatively strong positive temporal signal (R² = .2967; correlation coefficient = .5446) (Figure 3A).
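The relationship between the two statistics reported for each root-to-tip regression can be made concrete with a short sketch. The Python below is a minimal illustration and not the TempEst implementation: it fits an ordinary least-squares line of divergence against sampling time and returns the slope, the correlation coefficient, and R². The function name `root_to_tip_regression` and the sample values are hypothetical and are not data from this study.

```python
from math import sqrt


def root_to_tip_regression(days, divergences):
    """Least-squares fit of root-to-tip divergence against sampling time.

    Returns (slope, intercept, r, r_squared).
    """
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(divergences) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in days)
    syy = sum((y - mean_y) ** 2 for y in divergences)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, divergences))
    slope = sxy / sxx                  # substitutions/site per day
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)          # Pearson correlation coefficient
    return slope, intercept, r, r * r


# Hypothetical values for illustration only (not data from this study):
# days since the first sample, and root-to-tip divergence in substitutions/site.
days = [0, 3, 7, 10, 17, 25]
divergences = [1e-5, 3e-5, 2e-5, 6e-5, 5e-5, 9e-5]

slope, intercept, r, r_squared = root_to_tip_regression(days, divergences)
print(f"slope={slope:.3e}  r={r:.4f}  R^2={r_squared:.4f}")
```

Because R² is the square of the correlation coefficient, it carries no sign; the coefficient itself is needed to distinguish a positive temporal signal from a negative one. The values reported for "dataset_14" are consistent with this identity (.5446² ≈ .2966).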
In contrast to "dataset_14," "dataset_24" had a minor negative temporal signal (R² = 4.4428 × 10^−2; correlation coefficient = −.2108) (Figure 3B), whereas "dataset_30" and "dataset_32" both had minor positive temporal signals (R² = 1.2155 × 10^−2; correlation coefficient = .1102 and [...]

[...] 17 January 2020) for "dataset_14," "dataset_24," "dataset_30," and "dataset_32," respectively (Table 1). Furthermore, based on Bayesian time-scaled phylogenetic analysis using the tip-dating method, we also estimated the TMRCA (time to the most recent common ancestor) dates and evolutionary rates from "dataset_30" and [...] (Table 1). Due to poor convergence in the MCMC chains, we did not obtain the TMRCA date and evolutionary rate from "dataset_14" and [...]

FIGURE 2 Estimated maximum-likelihood phylogenies of 2019-nCoV. Colors indicate different sampling locations. The tree is midpoint rooted. Results of maximum-likelihood phylogenetic analyses of four datasets (A, "dataset_14"; B, "dataset_24"; C, "dataset_30"; and D, "dataset_32") are shown.

We considered individuals as genetically linked when the genetic distance between 2019-nCoV strains was less than 0.01% substitutions/site. This allowed us to identify a single large transmission cluster that included 30 of 32 (93.75%) genomes, thus suggesting low genetic divergence for "dataset_32" (Figure 6A). Under a stricter threshold of 0.001% substitutions/site, we identified three transmission clusters that included 15 of 32 (46.875%) genomes for "dataset_32" (Figure 6B). Clusters ranged in size from two to nine genomes. Two clusters, which contained two [...]

[...] January 2020, and subsampled "dataset_14," "dataset_24," and "da- [...] (Figure 1).
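The distance-threshold linkage used in the transmission network analysis can be sketched as follows. This is a minimal illustration under stated assumptions, not the HIV-TRACE implementation: pairwise distance is taken as the plain proportion of differing sites (a stand-in for whatever distance the tool itself computes), any pair at or below the threshold is linked, and clusters are the connected components with at least two members. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Sites with a gap or N in either sequence are skipped (an assumption).
    """
    compared = diffs = 0
    for x, y in zip(a, b):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared if compared else float("nan")


def transmission_clusters(seqs, threshold):
    """Link pairs with distance <= threshold; return components of size >= 2."""
    names = list(seqs)
    parent = {name: name for name in names}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(names, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    clusters = [sorted(g) for g in groups.values() if len(g) >= 2]
    return sorted(clusters, key=len, reverse=True)


# Toy aligned sequences, hypothetical and far shorter than a real genome.
toy = {
    "A": "ACGTACGTACGTACGTACGT",
    "B": "ACGTACGTACGTACGTACGT",  # identical to A
    "C": "ACGTACGTACGTACGTACGA",  # 1 of 20 sites differs from A
    "D": "TTTTTCGTACGTACGTACGT",  # 4 of 20 sites differ from A
}

loose = transmission_clusters(toy, threshold=0.06)
strict = transmission_clusters(toy, threshold=0.01)
print(loose, strict)
```

With real genome alignments, the two thresholds given in the text (0.01% and 0.001% substitutions/site) correspond to 0.0001 and 0.00001 on this scale. As in the toy example, tightening the threshold removes links, so a single large cluster can break into smaller ones.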
Of note, the strong star-like signal (100% of quartets distributed in the center of the triangle) from "dataset_14" at the beginning of the outbreak suggests that 2019-nCoV initially exhibited low genetic divergence, with recent and rapid human-to-human transmission. This result is consistent with the ML phylogenetic analyses, which showed a polytomy topology for "dataset_14" (Figure 2A). The genetic divergence from "dataset_32" and "dataset_30" was higher than that for "dataset_14," but still demon- [...] (Table 1). This is considered reasonable given the limited genetic divergence and strong star-like signals and is also consistent with our previous study [36]. Using the tip-dating method, the mean TMRCA date and [...] Table 1). The TMRCA range estimated by the tip-dating method was narrower than that determined by the constrained evolutionary rate method.

We identified three phylogenetic clusters with posterior probabilities between .99 and 1.0 using Bayesian inference (Figures 4 and 5). We also identified three transmission clusters when the genetic distance between the 2019-nCoV strains was less than 0.001% substitutions/site (Figure 6). Intriguingly, only one cluster (Guangdong/20SF028/2020 and Guangdong/20SF040/2020 from Zhuhai) was identified by both phylogenetic and network-based methods. This illustrates the difference between phylogenetic methods, which define clusters by posterior probability or bootstrap value, and network-based methods, which define them by genetic distance. However, our conclusions should be considered preliminary and interpreted with caution due to the limited number of 2019-nCoV genome sequences presented in this study.

The first genome sequence of 2019-nCoV was made public in early January 2020, with several dozen, taken from various people, now available. The genome sequences of 2019-nCoV have already led to diagnostic tests, as well as efforts to study its dispersal and evolution.
As the outbreak continues, we will require multiple gen- [...] Therefore, we predict that one or more mutations may be selected and sustained during the 2019-nCoV outbreak as the virus adapts to human hosts and possibly reduces its virulence, as reported in a previous study [37]. However, we are uncertain whether this will influence its transmissibility.

In conclusion, our results emphasize the importance of likelihood-mapping, transmission network, and phylogenetic analyses in providing insights into the time origin, genetic diversity, and transmission dynamics of 2019-nCoV. Improving the linkage between patient records and genome sequence data would also allow large-scale studies to be undertaken. Such research could directly influence public health in terms of prevention efforts introduced to reduce virus transmission in real time.

This study was supported by a grant from the National Natural Science [...]

1. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
2. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia
3. Epidemiology, genetic recombination, and pathogenesis of coronaviruses
4. Identification of a novel coronavirus in patients with severe acute respiratory syndrome
5. A novel coronavirus associated with severe acute respiratory syndrome
6. Epidemiology and cause of severe acute respiratory syndrome (SARS) in Guangdong, People's Republic of China
7. Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
8. Middle East respiratory syndrome coronavirus (MERS-CoV): announcement of the Coronavirus Study Group
9. Ecoepidemiology and complete genome comparison of different strains of severe acute respiratory syndrome-related Rhinolophus bat coronavirus in China reveal bats as a reservoir for acute, self-limiting infection that allows recombination events
10. Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China
11. Severe acute respiratory syndrome coronavirus-like virus in Chinese horseshoe bats
12. Bats are natural reservoirs of SARS-like coronaviruses
13. Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human
14. Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China
15. SARS-CoV infection in a restaurant from palm civet
16. MERS coronavirus neutralizing antibodies in camels
17. MERS coronaviruses in dromedary camels
18. Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin
19. Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
20. disease and diplomacy: GISAID's innovative contribution to global health
21. MAFFT multiple sequence alignment software version 7: improvements in performance and usability
22. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT
23. Application of phylogenetic networks in evolutionary studies
24. jModelTest 2: more models, new heuristics and parallel computing
25. Maximum-likelihood analysis using TREE-PUZZLE
26. TREE-PUZZLE: maximum likelihood phylogenetic analysis using quartets and parallel computing
27. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0
28. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen)
29. Bayesian phylogenetics with BEAUti and the BEAST 1.7
30. Many-core algorithms for statistical phylogenetics
31. Moderate mutation rate in the SARS coronavirus genome and its implications
32. Transmission and evolution of the Middle East respiratory syndrome coronavirus in Saudi Arabia: a descriptive genomic study
33. Spread, circulation, and evolution of the Middle East respiratory syndrome coronavirus
34. Posterior summarization in Bayesian phylogenetics using Tracer 1.7
35. HIV-TRACE (TRAnsmission Cluster Engine): a tool for large scale molecular epidemiology of HIV-1 and other rapidly evolving pathogens
36. Potential of large 'first generation' human-to-human transmission of 2019-nCoV
37. Attenuation of replication by a 29 nucleotide deletion in SARS-coronavirus acquired during the early stages of human-to-human transmission

The authors declare that there are no conflicts of interest.

XL conceived and designed the study and drafted the manuscript. XL and AC analyzed the data. XL, WW, XZ, JZ, QZ, YL, and AC interpreted the data and provided critical comments. All authors reviewed and approved the final manuscript.

http://orcid.org/0000-0002-3470-2196