key: cord-0775222-wim5q9a5 authors: Nour, Islam; Alanazi, Ibrahim O.; Hanif, Atif; Kohl, Alain; Eifan, Saleh title: Insights into molecular evolution recombination of pandemic SARS-CoV-2 using Saudi Arabian sequences date: 2020-05-13 journal: bioRxiv DOI: 10.1101/2020.05.13.093971 sha: 66b16f35496816d1091009e6df08945485e57ffc doc_id: 775222 cord_uid: wim5q9a5

The recently emerged SARS-CoV-2 (Coronaviridae; Betacoronavirus) is the underlying cause of COVID-19 disease. Here we assessed SARS-CoV-2 sequences from the Kingdom of Saudi Arabia alongside sequences of SARS-CoV, bat SARS-like CoVs and MERS-CoV, the latter currently detected in this region. Phylogenetic analysis, natural selection investigation and genome recombination analysis were performed. Our analysis showed that all Saudi SARS-CoV-2 sequences share the same origin and are closest to bat SARS-like CoVs, followed by SARS-CoVs, but are quite distant from MERS-CoV. Moreover, genome recombination analysis revealed two recombination events between SARS-CoV-2 and bat SARS-like CoVs. This was further assessed by S gene recombination analysis. These recombination events may be relevant to the emergence of this novel virus. Moreover, positive selection pressure was detected between SARS-CoV-2, bat SL-CoV isolates and human SARS-CoV isolates. However, the highest positive selection occurred between SARS-CoV-2 isolates and 2 bat-SL-CoV isolates (Bat-SL-RsSHC014 and Bat-SL-CoVZC45). This further indicates that SARS-CoV-2 isolates evolved adaptively from bat SARS-like isolates, and that a virus originating from bats triggered this pandemic. This study thus sheds further light on the origin of this virus.

AUTHOR SUMMARY

The emergence and subsequent pandemic of SARS-CoV-2 is a unique challenge to countries all over the world, including Saudi Arabia, where cases of the related MERS are still being reported. Saudi SARS-CoV-2 sequences were found to be likely of the same or similar origin.
In our analysis, SARS-CoV-2 sequences were more closely related to bat SARS-like CoVs than to MERS-CoV (which originated in Saudi Arabia) or SARS-CoV, confirming other phylogenetic efforts on this pathogen. Recombination and positive selection analyses further suggest that bat coronaviruses may be at the origin of SARS-CoV-2 sequences. The data shown here give hints about the origin of this virus and may inform efforts on transmissibility, host adaptation and other biological aspects of this virus.

16, 17]. Structurally, S is composed of two functional subunits essential for binding to the host cell receptor (S1 subunit) and virus-host cell fusion (S2 subunit) [18].
The S1 breakpoints existed in a single sequence; the sequence region between the breakpoints was denoted the "minor" region, contributed by the minor parent, while the remaining part was called the "major" region, contributed by the major parent. Neighbor-joining phylogenetic trees were then generated to display the probable topological shifts of specific sequences. Phylogenetic discrepancy is revealed by a putative recombinant that, for each sequence segment, lies close to one parent in the phylogeny while far from the other [27].

Since both recombination events appeared in the S gene region, the S gene sequences of the 13 CoVs were extracted and aligned using ClustalW, after which the best-fitting substitution model was identified for use in the phylogenetic analysis. The GTR and TN93 models fitted best, with the lowest BIC values of 51289.85 and 51325.49, respectively. Consequently, the phylogenetic tree was constructed using the TN93 model and, although it was constructed using the NJ method (Fig 4), the obtained tree was consistent with the tree

Next, a molecular clock analysis was carried out using the ML method to examine whether the S genes of the 13 isolates used in the current study have the same evolutionary rate throughout the tree. The strains were found not to be evolving at a similar rate, as indicated by rejection of the null hypothesis of equal evolutionary rate throughout the tree at a 5% significance level (P = 0.000E+000), as shown in Table 5.

Table 5. Molecular clock analysis of S gene using the ML method.
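The clock test described above is, in its general form, a likelihood-ratio test: the tree is fitted once with a strict clock enforced and once without, and twice the difference in log-likelihood is compared with a chi-square distribution. The Python sketch below illustrates that calculation using only the standard library. The log-likelihood values are hypothetical placeholders, not the values from Table 5, and the degrees of freedom (number of sequences minus 2) are the usual assumption for this comparison rather than a setting confirmed by the text.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail probability P(X >= stat) for a chi-square variable with df degrees of freedom."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if stat <= 0.0:
        return 1.0
    x = stat / 2.0
    # Base case of the regularized upper incomplete gamma function Q(a, x):
    # Q(1/2, x) = erfc(sqrt(x)) for odd df, Q(1, x) = exp(-x) for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(x))
    else:
        a, q = 1.0, math.exp(-x)
    # Recurrence: Q(a + 1, x) = Q(a, x) + x**a * exp(-x) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(x) - x - math.lgamma(a + 1.0))
        a += 1.0
    return min(q, 1.0)


def clock_lrt(lnl_clock, lnl_no_clock, n_sequences):
    """Likelihood-ratio test of a strict clock against an unconstrained tree."""
    stat = 2.0 * (lnl_no_clock - lnl_clock)
    df = n_sequences - 2  # assumed difference in free parameters
    return stat, df, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a 13-sequence tree (illustration only).
stat, df, p = clock_lrt(lnl_clock=-25890.0, lnl_no_clock=-25610.0, n_sequences=13)
print(f"LRT statistic = {stat:.1f}, df = {df}, P = {p:.3E}")
```

A P value below 0.05 rejects the null hypothesis of an equal evolutionary rate throughout the tree, which is the decision rule stated above.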
A novel coronavirus outbreak of global health concern
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
Severe acute respiratory syndrome
Coronavirus: covid-19 has killed more people than SARS and MERS combined, despite lower case fatality rate
Middle East respiratory syndrome coronavirus infection in dromedary camels in Saudi Arabia
Transmission and evolution of the Middle East respiratory syndrome coronavirus in Saudi Arabia: a descriptive genomic study
Commentary: Middle East respiratory syndrome coronavirus (MERS-CoV): announcement of the Coronavirus Study Group
Severe acute respiratory syndrome coronavirus phylogeny: toward consensus
The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-
Emerging coronaviruses: genome structure, replication, and pathogenesis
Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding
A new coronavirus associated with human respiratory disease in China
In: Roles of Host Gene and Non-coding RNA Expression in Virus Infection
Origin and evolution of pathogenic coronaviruses. Nature reviews Microbiology
Inhibition of SARS-CoV-2 infections in engineered human tissues using clinical-grade soluble human ACE2
Epidemiological and genetic analysis of severe acute respiratory syndrome
Analysis of angiotensin-converting enzyme 2 (ACE2) from different species sheds some light on cross-species receptor usage of a novel coronavirus
CoV-2 (previously 2019-nCoV) infection by a highly potent pan-coronavirus fusion inhibitor targeting its spike protein that harbors a high capacity to mediate membrane fusion
A pneumonia outbreak associated with a new coronavirus of probable bat origin
Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak
Coronavirus Disease 2019 (COVID-19): what we know
MEGA X: molecular evolutionary genetics analysis across computing platforms. Molecular biology and evolution
RDP4: Detection and analysis of recombination patterns in virus genomes. Virus evolution
Codon usage bias and recombination events for neuraminidase and hemagglutinin genes in Chinese isolates of influenza A virus subtype H9N2. Archives of virology
A modified bootscan algorithm for automated identification of recombinant sequences and recombination breakpoints. AIDS Research & Human Retroviruses
Homologous recombination is very rare or absent in human influenza A virus
Extensive homologous recombination in classical swine fever virus: a re-evaluation of homologous recombination events in the strain AF407339. Saudi journal of biological sciences
Molecular Evolution and Phylogenetics
SARS-CoV sequence characteristics and evolutionary rate estimate from maximum likelihood analysis
Discovery of a novel coronavirus associated with the recent pneumonia outbreak in humans and its potential bat origin
The global spread of 2019-nCoV: a molecular evolutionary analysis. Pathogens and Global Health
The proximal origin of SARS-CoV-2. Nature medicine
Efficiencies of fast algorithms of phylogenetic inference under the criteria of maximum parsimony, minimum evolution, and maximum likelihood when a large number of sequences are used
Relative efficiencies of the Fitch-Margoliash, maximum-parsimony, maximum-likelihood, minimum-evolution, and neighbor-joining methods of phylogenetic tree construction in obtaining the correct tree
Emergence of a group 3 coronavirus through recombination
Epidemiology, genetic recombination, and pathogenesis of coronaviruses
Evidence for the Selective Basis of Transition-to-Transversion Substitution Bias in Two RNA Viruses
Zhang YJ. Isolation and characterization of a bat SARS
Emergence of SARS-CoV-2 through Recombination and Strong Purifying Selection. bioRxiv
Structure, function, and evolution of coronavirus spike proteins
Bat-to-human: spike features determining 'host jump' of coronaviruses SARS-CoV, MERS-CoV, and beyond
The heptad repeat region is a major selection target in MERS-CoV and related coronaviruses. Scientific reports
Re-insights into origin and adaptation of SARS-CoV-2. bioRxiv

bat-SL-CoV_YNLF_34C, Bat-SL-CoV_As6: bat-SL

Phylogenetic tree of recombination event in Saudi SARS-CoV-2 S sequences. Phylogenies of the major parental region (1-2094 and 2349-4075) and (B) minor parental region (2095-2348). Phylogenies were estimated using UPGMA. The scale bar represents the number of substitutions per site.

Bat-SL-CoV_As6: bat-SL-CoV_As6526, bat-SL-CoV_Lon: bat-SL-CoV_Longquan-140, Bat-SL-CoV_YNL: bat-SL-CoV_YNLF_34C Saudi Arabia/SCDC-3324/2020 (SARS-CoV-2)
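The region boundaries quoted in the figure legend above can be applied to an aligned S gene sequence to obtain the two segments whose phylogenies are compared. The Python sketch below assumes 1-based, inclusive alignment coordinates and uses a hypothetical helper name (`split_by_breakpoints`) with a dummy sequence. It is not the RDP4 procedure itself, only an illustration of how the major and minor regions partition the alignment.

```python
def split_by_breakpoints(seq, start, end):
    """Return (major, minor) regions for breakpoints given as 1-based inclusive positions."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("breakpoints must satisfy 1 <= start <= end <= len(seq)")
    minor = seq[start - 1:end]            # region between the breakpoints
    major = seq[:start - 1] + seq[end:]   # everything outside the breakpoints
    return major, minor


# Dummy 4075-column sequence standing in for one row of the S gene alignment.
aligned = "ACGT" * 1018 + "ACG"
major, minor = split_by_breakpoints(aligned, start=2095, end=2348)
print(len(major), len(minor))
```

Applying the same split to every row of the alignment gives the two sub-alignments from which separate trees for the major and minor regions can be built.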