key: cord-0053528-pusyqtbz
authors: Farah, Sameera; Atkulwar, Ashwin; Praharaj, Manas Ranjan; Khan, Raja; Gandham, Ravikumar; Baig, Mumtaz
title: Phylogenomics and phylodynamics of SARS-CoV-2 genomes retrieved from India
date: 2020-11-30
journal: nan
DOI: 10.2217/fvl-2020-0243
sha: 87af9ffacde6aa93c2aa29e2252c584a920f5d67
doc_id: 53528
cord_uid: pusyqtbz

Background: This is the first phylodynamic study attempted on SARS-CoV-2 genomes from India to infer the current state of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) evolution using phylogenetic network and growth trends.
Materials & Methods: Out of 286 retrieved whole genomes from India, 138 haplotypes were used to build a phylogenetic network. The birth–death serial model (BDSIR) package of BEAST2 was used to calculate the reproduction number of SARS-CoV-2. Population dynamics were investigated using the stamp date method as implemented in BEAST2 and BEAST 1.10.4.
Results: A median-joining network revealed two ancestral clusters. A high basic reproduction number of SARS-CoV-2 was found. An exponential rise in the effective population size of Indian isolates was detected.
Conclusion: The phylogenetic network reveals dual ancestry and the possibility of community transmission of SARS-CoV-2 in India.

the origin and events of community transmission of the virus in their countries [21]. Genomic studies have already proved crucial in contact tracing of infection and would become more important in the second wave of infection after the release of lockdown in the ongoing COVID-19 pandemic [22]. In this study, we analyzed 121 whole genomes of SARS-CoV-2 from India to infer the phylogeography of SARS-CoV-2 within India, to estimate the reproduction number (R0) and infection rate through infection genomics, and to trace the past and present evolutionary trajectory of SARS-CoV-2.

Whole-genome sequence information on SARS-CoV-2 isolates from India, available in the GISAID database until 4 May 2020, was retrieved.
Out of 286 SARS-CoV-2 genomes, samples with missing information and gaps were discarded to arrive at a total of 219 Indian SARS-CoV-2 genomes. These genomes were aligned using Clustal Omega [23], and the number of haplotypes was determined using DnaSP v6 [24]. The 138 haplotypes that emerged formed the dataset used for downstream analyses.

In evolutionary studies, phylogenetic networking is becoming a method of choice for reconstructing evolutionary pathways in many species. The Median-Joining Network (MJN) is one such algorithm, developed to reconstruct the unambiguous evolutionary history of species [25]. An MJN was constructed with 138 genomes covering all major states of India (Supplementary Table 1).

The birth-death serial model (BDSIR), as implemented in the BEAST2 package, was used to estimate the effective R0 of SARS-CoV-2 in India [26,27]. In BEAST2, an HKY nucleotide substitution model with a gamma category count of 4 and a relaxed lognormal clock with a clock rate of 8.3E-5 subs/site/month, corresponding to 1 × 10⁻³ subs/site/year [25], were applied. In the Markov Chain Monte Carlo (MCMC) analysis, parameters were sampled every 1000 generations over a total of 10 million generations. The basic R0 and becomeUninfectiousRate were set with the mean of the distribution at 18.00, assuming a mean recovery time between 18 and 20 days.

BEAST v1.10.4 was used with similar settings (an HKY nucleotide substitution model with a gamma category count of 4, a relaxed lognormal clock, and a clock rate of 8.3E-5 subs/site/month, corresponding to 1 × 10⁻³ subs/site/year [28]) to reconstruct the evolutionary dynamics of SARS-CoV-2. The tree prior was set to coalescent exponential growth to calculate the growth rates of the virus in India.
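The unit conversions behind these settings can be checked with a few lines of arithmetic. The following Python sketch is illustrative only and is not part of the BEAST workflow; all function names are hypothetical. It converts the yearly clock rate to a monthly one, counts the states logged by a chain of 10 million generations sampled every 1000, and shows the infectious period implied by a rate of 18.00 under the assumption (not stated in the text) that this rate is expressed per year.

```python
MONTHS_PER_YEAR = 12
DAYS_PER_YEAR = 365.0


def per_year_to_per_month(rate_per_year):
    """Convert a substitution rate from subs/site/year to subs/site/month."""
    return rate_per_year / MONTHS_PER_YEAR


def logged_states(chain_length, sample_every, burnin_percent=0):
    """Number of logged MCMC states left after discarding a burn-in percentage."""
    total = chain_length // sample_every
    discarded = total * burnin_percent // 100  # integer arithmetic avoids float rounding
    return total - discarded


def implied_infectious_days(rate_per_year):
    """Mean infectious period in days, ASSUMING the rate is given per year."""
    return DAYS_PER_YEAR / rate_per_year


if __name__ == "__main__":
    # 1e-3 subs/site/year expressed per month (the text rounds this to 8.3E-5).
    print(f"clock rate per month: {per_year_to_per_month(1e-3):.3e}")
    # 10 million generations, sampled every 1000, before and after 10% burn-in.
    print("logged states:", logged_states(10_000_000, 1000))
    print("after 10% burn-in:", logged_states(10_000_000, 1000, burnin_percent=10))
    # Infectious period implied by a rate of 18.00 per year (assumed unit).
    print(f"implied infectious period: {implied_infectious_days(18.0):.1f} days")
```

Under the per-year assumption, a rate of 18.00 corresponds to an infectious period of roughly 20 days, which is in line with the 18 to 20 day recovery time assumed above.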
The effective sample sizes and 95% highest posterior density intervals for parameters such as the basic R0, becomeUninfectiousRate and growth rate, together with the demographic reconstruction of growth rates under exponential growth priors, were inspected using Tracer v1.7.0. The trees file was summarized in TreeAnnotator by setting the burn-in percentage to ten and the target tree type to maximum clade credibility tree, while the node heights were set to mean heights.

Table 2 ). For the analysis, the stamped-date method was used with an HKY nucleotide substitution model, four gamma categories and a coalescent Bayesian skyline tree prior. An MCMC chain length of 10 million steps was applied, of which the first 10% were discarded as burn-in, and a strict clock rate of 8.33E-5 subs/site/month was used. The log file and tree log file were analyzed to draw the Bayesian skyline plot (BSP) in Tracer v1.7.0.

The MJN showed the occurrence of ancestral clusters alongside their newly mutated daughter clusters and haplotypes. The network was defined by two main clusters marked as 'A' and 'B' (Figure 1). Both clusters showed linkage with the Wuhan, China outbreak haplotype (EPI_ISL_406798) of 26 December 2019. Cluster A, which was dominant in Gujarat, differed from the Wuhan haplotype by a single median vector, whereas cluster B, dominant in south India, differed by two median vectors. In a biological network, median vectors are interpreted as either unsampled or extinct individuals. Based on this relationship, cluster A was considered the ancestral node. Genomes of unknown origin contributed by the National Institute of Virology, Pune showed greater affinity with SARS-CoV-2 genomes from Wuhan and Ladakh and were linked to ancestral cluster A. However, genome isolates from West Bengal showed affinities with both clusters. Of note, haplotypes from the worst-hit state of Maharashtra, including Mumbai, exhibited closer relatedness to cluster B.
Further, the SARS-CoV-2 haplotypes from Delhi were more widespread in distribution, while those from Madhya Pradesh displayed closer relatedness to ancestral cluster A. Many daughter haplotypes derived from clusters A and B had accumulated 1-4 mutations. In one instance in the network, a daughter cluster derived from cluster B contained haplotypes shared among Delhi, Telangana and Assam and can potentially be considered a third minor subcluster. This subcluster further showed the emergence of newer haplotypes by accumulating mutations in the range of 2-3

The population dynamics of Indian isolates exhibited a sigmoidal type of distribution, with exponential growth of the effective population size (Ne) starting from the last week of January and continuing to the second week of February (Figure 3A). The pandemic peaked in India around the second week of February and plateaued between the last week of February and 4 May 2020. However, growth rate curve reconstruction using an exponential growth tree prior depicted a continuous increase in the effective population from the last week of January to 4 May 2020 (Figure 3B).

Our phylodynamic analysis confirms the high effective R0 and deaths recorded in India during this period. Most probably, the plateau phase in the BSP after the last week of February resulted from the nationwide lockdown and social distancing measures (Figure 3A). Likewise, some findings suggest that exposure to high temperatures contributed to lowering the activity and lifespan of SARS-CoV-2 [29] [30] [31]. Another recently published study reported that physical distancing measures taken in Wuhan, China beginning in April played a significant role in lowering the reproductive number of the virus [32]. Exponential growth was also confirmed in the population dynamics, which was congruent with the continuous rise of SARS-CoV-2 infections in India.
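Under a simple exponential growth model, a growth rate translates directly into a doubling time for the effective population size. A minimal Python sketch of that relationship follows. The growth rate used here is a placeholder chosen for illustration, not an estimate from this study, and the function names are hypothetical.

```python
import math


def doubling_time(growth_rate):
    """Time for an exponentially growing population to double: ln(2) / r."""
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive for a doubling time to exist")
    return math.log(2) / growth_rate


def effective_size(initial_size, growth_rate, elapsed):
    """Population size after `elapsed` time units: N(t) = N0 * exp(r * t)."""
    return initial_size * math.exp(growth_rate * elapsed)


if __name__ == "__main__":
    r = 0.1  # placeholder growth rate per time unit (NOT a value from the study)
    t2 = doubling_time(r)
    print(f"doubling time: {t2:.2f} time units")
    # Sanity check: after one doubling time the population is twice its initial size.
    print(f"size after one doubling time: {effective_size(1.0, r, t2):.3f}")
```

The time unit of the result is whatever unit the growth rate is expressed in, so a rate estimated per month yields a doubling time in months.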
To conclude, our study provides baseline genome-based phylodynamic information, highlighting genetic affinities between viral isolates sequenced from the major states of India. In the coming days, sequencing and analysis of greater numbers of SARS-CoV-2 genomes from India would help in dealing with the second wave of community transmission after relaxation of the lockdown. At the same time, genomic information produced through such studies can also be used to fill the gaps created by unrealistic assumptions, lack of contact tracing, sampling errors and limited diagnostic testing.

• The research presented in this study casts light on the phylogenomics and phylodynamics of SARS-CoV-2 genomes retrieved from India.
• A total of 286 SARS-CoV-2 whole genomes deposited from 26 December 2019 to 4 May 2020, representing all major regions of India, were analyzed.
• Out of 286 retrieved whole genomes, a total of 138 haplotypes were identified and used to build a phylogenetic network using the birth-death serial model (BDSIR) package of BEAST2. The reproduction number (R0) was also calculated using the same dataset.
• The population dynamics were also investigated using the stamp date method of constant coalescence as well as exponential growth models as implemented in BEAST2 and BEAST 1.

To view the supplementary data that accompany this paper please visit the journal website at: www.futuremedicine.com/doi/suppl/10.2217/fvl-2020-0243

Author contributions
S Farah and A Atkulwar retrieved and analyzed the data; MR Praharaj and R Khan assisted in data analysis; R Gandham and M Baig wrote the manuscript; the study was conceptualized by M Baig.
The geography and mortality of the 1918 influenza pandemic
Reassessing the Global Mortality Burden of the 1918 Influenza Pandemic
SARS-CoV-2 and COVID-19: facing the pandemic together as citizens and cardiovascular practitioners
Corona virus genomics and bioinformatics analysis
Genomic characterisation and epidemiology of 2019 novel corona virus: implications for virus origins and receptor binding
Genomic characterization of the 2019 novel human-pathogenic corona virus isolated from a patient with atypical pneumonia after visiting Wuhan
Identifying SARS-CoV-2 related corona viruses in Malayan pangolins
Isolation of a porcine respiratory, non-enteric corona virus related to transmissible gastroenteritis
Bats are natural reservoirs of SARS-like corona viruses
Bat origin of human corona viruses
Recombination, reservoirs, and the modular spike: mechanisms of corona virus cross-species transmission
Structure, function, and evolution of corona virus spike proteins
From SARS to MERS: 10 years of research on highly pathogenic human coronaviruses
From SARS to MERS: evidence and speculation
Emergence of the Middle East respiratory syndrome coronavirus
A pneumonia outbreak associated with a new corona virus of probable bat origin
A new coronavirus associated with human respiratory disease in China
Evidence of recombination in coronaviruses implicating pangolin origins of nCoV-2019
Isolation and characterization of 2019-nCoV-like coronavirus from Malayan pangolins
Identification of 2019-nCoV related coronaviruses in Malayan pangolins in southern China
A genomic perspective on the origin and emergence of SARS-CoV-2
The public health impact of a publicly available, environmental database of microbial genomes
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
DNA sequence polymorphism analysis of large datasets
Median-joining networks for inferring intraspecific phylogenies
Simultaneous reconstruction of evolutionary history and epidemiological dynamics from viral sequences with the birth-death SIR model
Birth-death skyline plot reveals temporal changes of epidemic spread in HIV and hepatitis C virus (HCV)
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (COVID-19)
Early dynamics of transmission and control COVID-19: a mathematical modelling study
Temperature significant change COVID-19 Transmission in 429 cities
Effects of temperature on COVID-19 transmission
The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study

We gratefully acknowledge all authors and submitting laboratories of the sequences from GISAID. S Farah and M Baig are thankful to Prof. Elizabeth Boulding and Compute Canada for providing access to high computation facilities at www.cedar.computecanada.ca

The authors have no relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript. This includes employment, consultancies, honoraria, stock ownership or options, expert testimony, grants or patents received or pending, or royalties.

No writing assistance was utilized in the production of this manuscript.