key: cord-0893815-sphwclzs
authors: Zhan, Shing Hei; Deverman, Benjamin E.; Chan, Yujia Alina
title: SARS-CoV-2 is well adapted for humans. What does this mean for re-emergence?
date: 2020-05-02
journal: bioRxiv
DOI: 10.1101/2020.05.01.073262
sha: 43f54ba1037a7c771b86d300e993ced4bbc86248
doc_id: 893815
cord_uid: sphwclzs

In a side-by-side comparison of evolutionary dynamics between the 2019/2020 SARS-CoV-2 and the 2003 SARS-CoV, we were surprised to find that SARS-CoV-2 resembles SARS-CoV in the late phase of the 2003 epidemic after SARS-CoV had developed several advantageous adaptations for human transmission. Our observations suggest that by the time SARS-CoV-2 was first detected in late 2019, it was already pre-adapted to human transmission to an extent similar to late epidemic SARS-CoV. However, no precursors or branches of evolution stemming from a less human-adapted SARS-CoV-2-like virus have been detected. The sudden appearance of a highly infectious SARS-CoV-2 presents a major cause for concern that should motivate stronger international efforts to identify the source and prevent near future re-emergence. Any existing pools of SARS-CoV-2 progenitors would be particularly dangerous if similarly well adapted for human transmission. To look for clues regarding intermediate hosts, we analyze recent key findings relating to how SARS-CoV-2 could have evolved and adapted for human transmission, and examine the environmental samples from the Wuhan Huanan seafood market. Importantly, the market samples are genetically identical to human SARS-CoV-2 isolates and were therefore most likely from human sources. We conclude by describing and advocating for measured and effective approaches implemented in the 2002-2004 SARS outbreaks to identify lingering population(s) of progenitor virus.

Several reports have noted that SARS-CoV-2 appears genetically stable and not under much pressure to adapt, which bodes well for diagnostics, vaccine, and therapeutics development (1) (2) (3) (4). How long a particular antiviral, antibody, or vaccine will be effective against SARS-CoV-2 depends greatly on how fast and how extensively the target gene or protein is evolving. To identify effective therapies, immense efforts have already been directed towards elucidating the precise structure of SARS-CoV-2 proteins, ideally in complex with potential drug candidates, some of which are already undergoing clinical trials (5) (6) (7) (8) (9) (10) (11). Any new SARS-CoV-2 variant that can escape or confound these highly precise approaches will subvert efforts to control the pandemic. Therefore, it is very important to ascertain that the SARS-CoV-2 genes targeted in these efforts have stabilized and to identify novel virulent variants as soon as possible.

To gain a better understanding of how stable the SARS-CoV-2 genome is, we first performed a side-by-side comparison of evolutionary dynamics between SARS-CoV-2 and SARS-CoV. For this analysis, we curated high-quality genomes spanning ~3-month periods for the following groups: 11 genomes for early-to-mid epidemic SARS-CoV, 32 genomes for late epidemic SARS-CoV, and 46 genomes for SARS-CoV-2 that included an early December, 2019 isolate, Wuhan-Hu-1, and 15 randomly selected genomes from each month of January through March, 2020 sampled from diverse geographical regions (methods in Supplementary Materials). We were surprised to find that SARS-CoV-2 exhibits low genetic diversity in contrast to SARS-CoV, which harbored considerable genetic diversity in its early-to-mid epidemic phase (Figure 1) (12); nucleotide diversity estimates (13, 14) across all sites, non-synonymous and synonymous, for each locus examined are provided in the Supplementary Table. SARS-CoV was observed to adapt under selective pressure that was highest as it crossed from Himalayan palm civets (intermediate host species) to humans and diminished towards the end of the epidemic (15) (16) (17) (18); this series of adaptations between species and in humans culminated in a highly infectious SARS-CoV that dominated the late epidemic phase. In comparison, SARS-CoV-2 exhibits genetic diversity that is more similar to that of late epidemic SARS-CoV (Figure 1, Supplementary Table). In fact, the exceedingly high level of identity shared among SARS-CoV-2 isolates makes it impractical to model site-wise selection pressure. As more mutations occur and, ideally, when SARS-CoV-2-like viruses from an intermediate host species are identified, it will become possible to model selection pressure as was done for SARS-CoV.
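As an illustration of the nucleotide diversity statistic mentioned above, the following is a minimal sketch that computes the average proportion of pairwise differences per site from an alignment. It is not the pipeline used in this study; the function name `nucleotide_diversity` and the choice to skip gapped or ambiguous positions are assumptions made for the example.

```python
from itertools import combinations


def nucleotide_diversity(aligned_seqs):
    """Average proportion of differing sites over all sequence pairs.

    Assumption for this sketch: positions where either sequence has a gap
    or a non-ACGT character are skipped for that pair.
    """
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")

    valid = set("ACGT")
    total = 0.0
    n_pairs = 0
    for a, b in combinations(aligned_seqs, 2):
        compared = 0
        diffs = 0
        for x, y in zip(a.upper(), b.upper()):
            if x in valid and y in valid:
                compared += 1
                diffs += x != y
        if compared:
            total += diffs / compared
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Toy alignment (hypothetical sequences, not real genomes).
toy = ["ACGT", "ACGA", "ACGT"]
print(nucleotide_diversity(toy))
```

A lower value indicates that randomly chosen pairs of genomes differ at fewer sites, which is the sense in which SARS-CoV-2 is described here as less diverse than early-to-mid epidemic SARS-CoV.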

Figure 1. (A) Maximum likelihood trees built with IQtree (19). We curated 11 early-to-mid epidemic SARS-CoV genomes, 32 late epidemic SARS-CoV genomes, and 46 SARS-CoV-2 genomes consisting of a December, 2019 Wuhan-Hu-1 isolate and 15 isolates from each month of January, February, and March, 2020. (B) Tip-to-tip distance of each tree: SARS-CoV-2 (red) is less polymorphic than early-to-mid epidemic (blue) SARS-CoV over similar 3-month periods based on the current sampling approach (resampling test, p < 0.01). (C) Distribution of pairwise non-synonymous (dN) and synonymous (dS) substitution rates in the Spike, S RBD, Orf1a, Orf1b, Orf3a, and N genes across 151 SARS-CoV-2 genomes: 50 from each month of January, February, and March, 2020, in addition to the Wuhan-Hu-1 isolate.

An examination of 43 SARS-CoV and 46 SARS-CoV-2 genomes revealed a striking difference in the number of substitutions over similar 3-month periods (Figure 1A), with more genetic polymorphism in early-to-mid epidemic SARS-CoV compared to late epidemic SARS-CoV or SARS-CoV-2 (Figure 1B). To rule out a subsampling artefact, we performed one hundred sampling experiments from all high-quality SARS-CoV-2 sequences on GISAID; IQtree phylogenies were built from the one hundred 46-taxon subsampled sequence sets, i.e., 15 randomly selected samples from each month of January through March, 2020 from diverse geographical regions, in addition to Wuhan-Hu-1. The maximum tip-to-tip distance (number of substitutions between two genomes) of the early-to-mid epidemic SARS-CoV tree (~85 substitutions) was greater than that of all one hundred resampled SARS-CoV-2 trees (~15-25 substitutions; p < 0.01). Even by April 28, 2020, the SARS-CoV-2 genomes available on GISAID spanning 4 months exhibited modest genetic diversity (Supplementary Figure 1) as compared to early-to-mid epidemic SARS-CoV.
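The logic of the resampling comparison can be sketched as follows. This is a hedged illustration rather than the authors' code: the helper names `max_tip_to_tip` and `empirical_p_value`, the add-one correction, and the randomly generated placeholder values are all assumptions for the example; only the approximate figures (~85 and ~15-25 substitutions, one hundred resamples) come from the text.

```python
import random


def max_tip_to_tip(distance_matrix):
    """Largest off-diagonal entry of a symmetric tip-to-tip distance matrix."""
    n = len(distance_matrix)
    return max(distance_matrix[i][j] for i in range(n) for j in range(i + 1, n))


def empirical_p_value(reference_value, resampled_values):
    """One-sided empirical p-value with an add-one correction (an assumption).

    Counts how many resampled values are at least as large as the reference.
    """
    exceed = sum(1 for v in resampled_values if v >= reference_value)
    return (exceed + 1) / (len(resampled_values) + 1)


# Approximate maximum distance for early-to-mid epidemic SARS-CoV (from the text).
reference_max = 85

# Placeholder values standing in for the one hundred resampled SARS-CoV-2 trees.
# These are NOT real data; they only mimic the ~15-25 range quoted in the text.
rng = random.Random(0)
resampled_max = [rng.randint(15, 25) for _ in range(100)]

p = empirical_p_value(reference_max, resampled_max)
print(p < 0.01)  # True: no resampled maximum reaches the reference value
```

With one hundred resamples and no resampled tree reaching the reference distance, the corrected estimate is 1/101, which is consistent with reporting p < 0.01.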

A caveat of this analysis is that genetic diversity can be inflated by sampling from diverse or high traffic locations and be skewed by factors such as effective population size and virus transmission rate, whereas a sampling bias towards isolates sharing a more recent common ancestor will underestimate genetic diversity (20) (21) (22). For this reason, we sampled within similar 3-month periods and ensured that there was geographic spread in the sampling. Unfortunately, due to the scarcity of 2003 SARS-CoV samples and information, we cannot ensure that the early-to-mid epidemic samples did not straddle deep splits in the tree. Nonetheless, the SARS-CoV genomes used in our analysis have been used in dozens of studies to examine SARS-CoV adaptive evolution. The division of SARS-CoV cases into early-to-mid versus late phase epidemiologically linked clusters has been validated (17). Furthermore, the late epidemic SARS-CoV and SARS-CoV-2 samples are from more numerous, international locations compared to early-to-mid epidemic SARS-CoV, which consists of infections only in China. The SARS-CoV outbreak spanning November, 2002 to August, 2003 was estimated to result in 8,422 cases (23) as compared to the more than 850,000 known SARS-CoV-2 cases by the end of March (and more than 3 million cases by the end of April). A considerable portion of SARS-CoV-2 cases are asymptomatic or mild, leading to an underestimation of the virus population size. Yet, early-to-mid epidemic SARS-CoV, which was sampled from a more limited population in a more limited location, exhibits the most genetic diversity.

We proceeded to compare the evolutionary dynamics of SARS-CoV and SARS-CoV-2 in terms of the non-synonymous and synonymous substitution rates (dN and dS) in each gene. Non-synonymous substitutions, as compared to synonymous substitutions, are generally more likely to result in functionally distinct variants. Therefore, dN and dS have commonly been used to model selective pressure on each gene, and were used to determine that the spike (S), Orf3a, and Orf1a genes experienced strong selective pressure in the SARS-CoV epidemic (15) (16) (17) (24). Importantly, the S protein binds to host receptors and influences host specificity, while the Orf3a-encoded accessory protein facilitates the endocytosis of S (25, 26). Orf3a and S have been proposed to share a co-evolutionary relationship (27, 28). We sampled 50 SARS-CoV-2 genomes from each month of January, February, and March, 2020 from diverse locations in addition to Wuhan-Hu-1.

For the spike (S), Orf3a, and Orf1a genes, the dN and dS in SARS-CoV-2 are more similar to those of late epidemic than early-to-mid epidemic SARS-CoV (Figure 1C, Figure 2). In comparison, the highly conserved Orf1b (encodes RNA-dependent RNA polymerase RdRp and helicase Hel), which did not undergo strong positive selection in SARS-CoV (15), exhibits similarly low dN across the three CoV groups (Figure 1C, Figure 2).
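To make the distinction between non-synonymous and synonymous substitutions concrete, here is a minimal sketch that classifies codon differences between two aligned coding sequences using the standard genetic code. It only counts raw differences; it does not estimate dN and dS rates, which additionally normalize by the number of non-synonymous and synonymous sites and correct for multiple substitutions. The function name `count_codon_changes` and the toy sequences are assumptions for the example.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def count_codon_changes(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts of differing codons.

    Assumptions for this sketch: both sequences are in frame and aligned,
    and codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    synonymous = 0
    non_synonymous = 0
    for pos in range(0, len(seq_a) - len(seq_a) % 3, 3):
        codon_a = seq_a[pos:pos + 3].upper()
        codon_b = seq_b[pos:pos + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Toy example: GAT -> GGT changes Asp to Gly; CTT -> CTC keeps Leu.
print(count_codon_changes("ATGGATCTT", "ATGGGTCTC"))  # (1, 1)
```

A gene under strong positive selection would be expected to accumulate proportionally more changes of the second kind than a conserved gene such as Orf1b.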

Given that several therapies and antibodies in development target the SARS-CoV-2 S, it is important to track non-synonymous substitutions and predict the evolution of resistance. We analyzed the non-synonymous substitutions that occurred in the S of SARS-CoV and SARS-CoV-2 over the course of each epidemic. Numerous adaptive mutations that evolved in SARS-CoV S RBD have been experimentally demonstrated to enhance binding to the human ACE2 receptor and facilitate cross-species transmission, e.g., residues N479 and T487 (29, 30), as well as K390, R426, D429, T431, I455, N473, F483, Q492, Y494, R495 (31); or predicted to have been positively selected, e.g., residues 239, 244, 311, 479, 778 (17) (Figure 3). In contrast, the majority of the non-synonymous substitutions in SARS-CoV-2 S are distributed across the gene at low frequency and have not been reported to confer adaptive benefit (Figure 4). Yet, the SARS-CoV-2 S has been demonstrated to bind more strongly to human ACE2 and has a superior plasma membrane fusion capacity compared to the SARS-CoV S (32, 33). The only site of notable entropy in the SARS-CoV-2 S, D614G, lies outside of the RBD and is not predicted to impact the structure or function of the protein (34). Its prevalence in international COVID-19 cases has been attributed to the substitution occurring early in the pandemic, leading to a founder effect. There is no evidence of a more virulent strain of SARS-CoV-2 emerging despite passage through more than 

Speculations that pangolins are the likely intermediate animal host stemmed from the discovery of a pangolin CoV that shares 95.4% S amino acid identity and six key RBD residues with SARS-CoV-2 (40). Since then, another closely related lineage of pangolin CoVs has been identified (41).

However, the unique polybasic furin cleavage site in the SARS-CoV-2 S is not found in pangolin CoVs (42), and SARS-CoV-2 is not a recent recombinant involving any of the CoVs sampled to date (41, 43, 44). The CoV that is most closely related to SARS-CoV-2 is RaTG13, a bat CoV that was identified at the Wuhan Institute of Virology and originally isolated from the Yunnan Province of China (45). RaTG13 shares 96.2% genome identity with the Wuhan-Hu-1 SARS-CoV-2 isolate.

In comparison, the most closely related pangolin CoV MP789 shares only 84.1% and 84.0% genome identity with Wuhan-Hu-1 and RaTG13, respectively. No evidence as yet points to the adaptation of SARS-CoV-2 for human infection in pangolins or the transmission of SARS-CoV-2 from pangolins to humans.

In addition, it is plausible for SARS-CoV-2 S to have evolved its broad species tropism naturally in bats or a wide range of intermediate species. The SARS-CoV-2 S is predicted to bind to ACE2 from potentially more than 100 diverse species (46) (47) (48), and was demonstrated to bind more strongly than the SARS-CoV S to ACE2 from both bat and human (33). The S of RaTG13 is also capable of binding to human ACE2 although the virus does not infect humans (49). Similarly, the S of human MERS-CoV was found to bind to receptors from humans, camels, and bats, and could adapt to semi-permissive host receptors within three passages in cell culture (50). Therefore, although no sampled bat CoVs have been found to possess a SARS-CoV-2-like S RBD, these findings collectively suggest that some CoVs in nature are evolving S that can bind at an optimal level to the same receptor across diverse species (43), potentially by interfacing with highly conserved parts of the receptor. As other groups have recommended, CoV sampling from more species (to avoid bias stemming from the focused scrutiny of Malayan pangolins) will provide us with a better grasp of the range of species that harbor CoVs with similar RBDs to SARS-CoV-2, as well as the natural diversity of bat CoVs (43).

There has been considerable debate among scientists and the public on whether SARS-CoV-2 originated from the Wuhan Huanan seafood market (2) . According to the Chinese CDC's website, 

The lack of definitive evidence to verify or rule out adaptation in an intermediate host species, humans, or a laboratory means that we need to take precautions against each scenario to prevent re-emergence. We would like to advocate for measured and effective approaches to identify any lingering population(s) of SARS-CoV-2 progenitor virus, particularly if these are similarly adept at human transmission. The response to the first SARS-CoV outbreak deployed the following strategies that were key to detecting SARS-CoV adaptation to humans and cross-species transmission, and could be re-applied in today's outbreak to swiftly eliminate progenitor pools:

(i) Sampling animals from markets, farms, and wild populations for SARS-CoV-2-like viruses (38).

(ii) Checking human samples banked months before late 2019 for SARS-CoV-2-like viruses or SARS-CoV-2-reactive antibodies to detect precursors circulating in humans (56). In addition, sequencing more SARS-CoV-2 isolates from Wuhan, particularly early isolates if they still exist, could identify branches originating from a less human-adapted progenitor as was seen in the 2003 SARS-CoV outbreak. It would be curious if no precursors or branches of SARS-CoV-2 evolution are discovered in humans or animals.

(iii) Evaluating the over- or underrepresentation of food handlers and animal traders among the index cases to determine if SARS-CoV-2 precursors may have been circulating in the animal trading community (57).

While these investigations are conducted, it would be safer to more extensively limit human activity that leads to frequent or prolonged contact with wild animals and their habitats.

We thank contributors of SARS-CoV-2 genomes (GISAID) and SARS-CoV genomes (ViPR); 

Shing Hei Zhan is a Co-founder and lead bioinformatics scientist at Fusion Genomics Corporation, which develops molecular diagnostic assays for infectious diseases.

We analyzed genome sequences at least 29,000 bases in length of SARS-CoV from the 2002-2003 outbreak (early-to-mid phase and late phase) and SARS-CoV-2 from the 2019-2020 pandemic. The sequences of SARS-CoV and SARS-CoV-2 were downloaded from ViPR (1) and GISAID (2), respectively. Contributors of the SARS-CoV-2 genome sequences are listed in our GISAID Acknowledgments file. The sequences from cruise ship patients were excluded.

The sequences from the earliest part of the COVID-19 pandemic (December, 2019 in Wuhan) were excluded because they were of poor quality (3). Additionally, SARS-CoV-2 sequences were removed from the selection pool if they contained undetermined or ambiguous bases after the ends were trimmed. The SARS-CoV-2 sequences were randomly selected such that the same number of sequences were taken from each month of collection and, within each month, the sequences were evenly sampled from each geographic region; the probability of each particular sequence being sampled was 1/(number of sequences in country X * total number of countries). Using this sampling strategy, we derived sequence sets that are temporally and spatially well represented.
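The per-sequence sampling probability stated above can be sketched as follows for a single month of collection. This is an illustrative reading of the formula, not the authors' script: the function names, the record format, and the choice to draw without replacement by removing each pick from the pool are assumptions for the example, and the sequence and country labels are hypothetical.

```python
import random
from collections import defaultdict


def sampling_weights(records):
    """Map each sequence id to 1 / (sequences in its country * number of countries).

    `records` is a list of (sequence_id, country) pairs for one month.
    """
    by_country = defaultdict(list)
    for seq_id, country in records:
        by_country[country].append(seq_id)
    n_countries = len(by_country)
    return {
        seq_id: 1.0 / (len(ids) * n_countries)
        for ids in by_country.values()
        for seq_id in ids
    }


def sample_month(records, k, rng):
    """Draw up to k distinct sequences using the weights above.

    Assumption: each pick is removed from the pool and the remaining
    weights are reused unchanged (random.choices rescales them).
    """
    weights = sampling_weights(records)
    pool = list(weights)
    chosen = []
    for _ in range(min(k, len(pool))):
        pick = rng.choices(pool, weights=[weights[s] for s in pool], k=1)[0]
        chosen.append(pick)
        pool.remove(pick)
    return chosen


# Hypothetical records: three sequences from country A, one from country B.
records = [("seq1", "A"), ("seq2", "A"), ("seq3", "A"), ("seq4", "B")]
print(sampling_weights(records))
print(sample_month(records, 2, random.Random(0)))
```

Under this weighting, the single sequence from country B is as likely to be drawn as all three sequences from country A combined, which prevents heavily sequenced countries from dominating the sample.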

Multiple sequence alignments were built for the SARS-CoV sequences and SARS-CoV-2 sequences separately using MAFFT version 7.453 (4) with the '--auto' option. Then, phylogenetic trees were reconstructed using IQtree version 1.6.12 (5) with automatic model selection using ModelFinder (6). The GD01 (GenBank: AY278489) and Wuhan-Hu-1 (GISAID: EPI_ISL_402125) sequences were used to root the SARS-CoV and SARS-CoV-2 genome phylogenies, respectively. The GD01 sequence was obtained from one of the early patients of the 2003 outbreak and carries the 29-nt Orf8 region that is found in SARS-CoV from palm civet hosts but was lost during human-to-human transmission of the virus (7). The Wuhan-Hu-1 sequence is a high-quality genome sequence from an early patient of the COVID-19 pandemic (8). The phylogenetic trees were processed and visualized using the R packages ape version 5.3 (9), adephylo version 1.1-11 (10), and ggtree version 2.0.4 (11).

Nonsynonymous and synonymous substitution rates (dN and dS) were estimated under pairwise maximum-likelihood comparisons (12) using codeml, which is part of the PAML version 4.8 package (13). The SARS-CoV sequence set was split into two sets corresponding to the early-to-mid phase and the late phase of the 2003 outbreak. Indels were not considered in our analysis (see Supplementary Discussion on Indels and Orf8 deletion). It is currently impractical to assess the site-wise selective pressure on SARS-CoV-2 genes due to the low genetic diversity. In particular, smaller genes are less amenable to dN and dS analysis within short time frames with limited genetic diversity. Nonetheless, we have plotted additional dN and dS plots in Supplementary Figures 2-4, but these should be interpreted with caution. Mainly, adding the most closely related animal CoVs to the analysis did not change our findings and potentially added a skew in the case of bat CoV RaTG13, which shares only ~96.2% genome identity with SARS-CoV-2 isolates (Supplementary Figures 2-4).

In Supplementary Figures 2 and 4 

We found available sequence data for five environmental samples collected on January 1, 2020, from the Huanan seafood market. Unfortunately, the quality of these samples cannot be ascertained because the raw sequencing data is not available. Single nucleotide differences or differences near the ends of fragments could stem from low sequencing quality or coverage. We compared the environmental sample sequences to the human Wuhan-Hu-1 SARS-CoV-2 genome reference from December, 2019 (8):

Sample EPI_ISL_408511, with sequences totaling 28,557-nt, shared 99.97% genome identity and 99.97% S gene identity with Wuhan-Hu-1. Sample EPI_ISL_408512, with sequences totaling 25,342-nt, shared 99.87% genome identity with Wuhan-Hu-1. The fragments covering 3585-nt of the 3822-nt S gene were ~99.9% identical. Sample EPI_ISL_408513 had a 439-nt sequence from its S gene, which was 100% identical to the Wuhan genome reference. The other ~600-nt in fragments were visibly low in quality and mapped to Orf1b, a highly conserved region in SARS-like-CoVs. It was not useful to analyze these low-quality short sequences.

Samples EPI_ISL_408514 and 408515 each had sequences totaling 29,891-nt sharing 99.99% genome identity and 100% S gene identity with Wuhan-Hu-1. The two samples were 99.99% identical to each other.

Many of the single nucleotide differences came from ambiguous base calls, for example, R or Y, in the environmental sample sequences. If these were excluded from our alignments, the percent identity between the environmental samples and Wuhan-Hu-1 would increase.
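The effect of ambiguous base calls on percent identity can be illustrated with a short sketch. The function name `percent_identity`, the `skip_ambiguous` flag, and the decision to skip gapped positions are assumptions for the example, and the toy sequences are hypothetical rather than taken from the environmental samples.

```python
def percent_identity(seq_a, seq_b, skip_ambiguous=False):
    """Percent identity between two aligned sequences.

    Gapped positions are always skipped (an assumption for fragmentary data).
    If skip_ambiguous is True, positions where either sequence has a
    non-ACGT code (for example R or Y) are excluded from the comparison.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    unambiguous = set("ACGT")
    compared = 0
    matches = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue
        if skip_ambiguous and (x not in unambiguous or y not in unambiguous):
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Toy example: R is an ambiguous purine call (A or G).
sample = "ACGTRCGT"
reference = "ACGTACGT"
print(percent_identity(sample, reference))                       # 87.5
print(percent_identity(sample, reference, skip_ambiguous=True))  # 100.0
```

Counting the ambiguous call as a mismatch lowers the identity, while excluding it removes the apparent difference, which is the direction of change described above.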

However, the lack of access to the raw sequencing data prevents us from assessing the reliability of the sequence similarities or disparities. There is also no information about what these environmental samples were derived from, e.g., type of shop (live-animal, meat), species of animals in that area, type of surface the sample was collected from, or whether COVID-19 human cases frequented that area before the samples were collected on January 1, 2020.

The SARS-CoV 29-nt deletion or the deletion of Orf8 was found to be detrimental to virus replication across mammalian cell lines, including primate and human cell lines (14) . Muth et al. (16) . However, in consideration of the complexity associated with deletions in Orf8 of SARS-CoVs, we opted to exclude the analysis of indels in our work due to the inability to discern whether these indels could be perpetuated due to founder effects rather than a true adaptive benefit. 

But That May Not Be A Problem For Humans

How did coronavirus start and where did it come from? Was it really Wuhan's animal market? The Guardian

SARS-CoV-2 Genomes Let Researchers Retrace Viral Spread, Mitigation Effects

How Coronavirus Mutates and Spreads. The New York Times

Structure of the RNA-dependent RNA polymerase from COVID-19 virus

Structure of Mpro from COVID-19 virus and discovery of its inhibitors

Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors

Structure of the SARS-CoV-2 spike receptor-binding domain bound to the ACE2 receptor

Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2

A highly conserved cryptic epitope in the receptor-binding domains of SARS-CoV-2 and SARS-CoV

Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation

Moderate mutation rate in the SARS coronavirus genome and its implications

Molecular Evolutionary Genetics. Columbia University Press

PopGenome: an efficient Swiss army knife for population genomic analyses in R

Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China

Differential stepwise evolution of SARS coronavirus functional proteins in different host species

Adaptive evolution of the spike gene of SARS coronavirus: changes in positively selected sites in different epidemic groups

Cross-host evolution of severe acute respiratory syndrome coronavirus in palm civet and human

IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies

Ecological and Evolutionary Processes Shaping Viral Genetic Diversity

González-Candelas F. The population genetics and evolutionary epidemiology of RNA viruses

Virus population bottlenecks during within-host progression and host-to-host transmission

WHO | Summary table of SARS cases by country

Characterization of severe acute respiratory syndrome coronavirus genomes in Taiwan: molecular epidemiology and genome evolution

The Severe Acute Respiratory Syndrome (SARS)-coronavirus 3a protein may function as a modulator of the trafficking properties of the spike protein

A novel severe acute respiratory syndrome coronavirus protein, U274, is transported to the cell surface and undergoes endocytosis

Strong evolutionary convergence of receptor-binding protein spike between COVID-19 and SARS-related coronaviruses

Characterization of the 3a protein of SARS-associated coronavirus in infected vero E6 cells and SARS patients

Receptor and viral determinants of SARS-coronavirus adaptation to human ACE2

Identification of two critical amino acid residues of the severe acute respiratory syndrome coronavirus spike protein for its variation in zoonotic tropism transition via a double substitution strategy

The SARS coronavirus S glycoprotein receptor binding domain: fine mapping and functional characterization

Inhibition of SARS-CoV-2 (previously 2019-nCoV) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion

Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine

Whole-Genome Sequence of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) obtained from a South African Coronavirus Disease 2019 (COVID-19) Patient

Robust statistics, ser

WHO | Update 95 -SARS: Chronology of a serial killer

SARS-CoV infection in a restaurant from palm civet. Emerg Infect Dis

Review of bats and SARS. Emerg Infect Dis

The Washington Post

Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 Outbreak

Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins

Evolutionary history, potential intermediate animal host, and cross-species analyses of SARS-CoV-2

Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic

Full-genome evolutionary analysis of the novel corona virus (2019-nCoV) rejects the hypothesis of emergence as a result of a recent recombination event

A pneumonia outbreak associated with a new coronavirus of probable bat origin

Predicting the angiotensin converting enzyme 2 (ACE2) utilizing capability as the receptor of SARS-CoV-2. Microbes Infect

Exceptional diversity and selection pressure on SARS-CoV and SARS-CoV-2 host receptor in bats compared to other mammals

Broad Host Range of SARS-CoV-2 Predicted by Comparative and Structural Analysis of ACE2 in Vertebrates

Structural basis of receptor recognition by SARS-CoV-2

Adaptive Evolution of MERS-CoV to Species Variation in DPP4

Isolation and characterization of viruses related to the SARS coronavirus from animals in southern China

Phylodynamic Analysis | 176 genomes | 6

Clock and TMRCA based on 27 genomes

Epidemiology and cause of severe acute respiratory syndrome (SARS) in Guangdong, People's Republic of China

Prevalence of IgG antibody to SARS-associated coronavirus in animal traders

No intermediate animal host CoVs have been identified for SARS-CoV-2. The dN/dS calculations were performed using the most closely related animal CoVs: palm civet SARS-CoVs from 2003 (99.76% genome identity) alongside the human SARS-CoVs, and bat CoV RaTG13 (96.2% genome identity) alongside the human SARS-CoV-2s.

Supplementary References

Virus Pathogen Database and Analysis Resource (ViPR): Genome database with visualization and analysis tools

Data, disease and diplomacy: GISAID's innovative contribution to global health. Glob Chall

Phylodynamic Analysis | 176 genomes | 6

Parallelization of MAFFT for large-scale multiple sequence alignments

IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies

ModelFinder: fast model selection for accurate phylogenetic estimates

A genome sequence of novel SARS-CoV isolates: the genotype, GD-Ins29, leads to a hypothesis of viral transmission in South China

A new coronavirus associated with human respiratory disease in China

APE: Analyses of Phylogenetics and Evolution in R language

adephylo: new tools for investigating the phylogenetic signal in biological traits

ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data

A codon-based model of nucleotide substitution for protein-coding DNA sequences

PAML 4: phylogenetic analysis by maximum likelihood

Attenuation of replication by a 29 nucleotide deletion in SARS-coronavirus acquired during the early stages of human-to-human transmission

Molecular evolution of the SARS coronavirus during the course of the SARS epidemic in China

Discovery of a 382-nt deletion during the early evolution of SARS-CoV-2

Nextstrain: real-time tracking of pathogen evolution