key: cord-0860898-ebpquw9d
authors: Obajuluwa, Adejoke Olukayode; Okiki, Pius Abimbola; Obajuluwa, Tiwalola Madoc; Afolabi, Olakunle Bamikole
title: In-silico nucleotide and protein analyses of S-gene region in selected zoonotic coronaviruses reveal conserved domains and evolutionary emergence with trajectory course of viral entry from SARS-CoV-2 genomic data
date: 2020-11-30
journal: Pan Afr Med J
DOI: 10.11604/pamj.2020.37.285.24663
sha: 90ddbafb62d97b6732fef745a5ce5d640d592b96
doc_id: 860898
cord_uid: ebpquw9d

INTRODUCTION: the recent outbreak of a novel zoonotic coronavirus (COVID-19) has made it necessary to understand the evolutionary pathway of zoonotic viruses that adversely affect human populations, so that therapeutic constructs can be developed to combat the pandemic now and in the future.

METHODS: we analyzed conserved domains of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) for possible targets of viral entry inhibition in host cells, the evolutionary relationship of human coronavirus (229E) and zoonotic coronaviruses with SARS-CoV-2, and the evolutionary relationship between selected SARS-CoV-2 genomic data.

RESULTS: conserved domains with antagonistic action on host innate antiviral cellular mechanisms in SARS-CoV-2 include nsp 11 and nsp 13, among others. Multiple sequence alignment of the spike (S) protein of selected candidate zoonotic coronaviruses alongside the S protein of SARS-CoV-2 revealed the closest evolutionary relationship (95.6%) with the pangolin coronavirus S protein. Clades formed between Wuhan SARS-CoV-2 phylogeny data and five others suggest a viral entry trajectory, while genomic and protein SARS-CoV-2 data from the Philippines appear as early ancestors.
CONCLUSION: the phylogeny of SARS-CoV-2 genomic data suggests that profiling in diverse populations with and without the outbreak, alongside migration history and racial background, is needed for mutation tracking and dating of viral subtype divergence, which is essential for effective management of present and future zoonotic coronavirus outbreaks.

Coronaviruses (CoVs) are enveloped viruses with a positive-sense, single-stranded RNA genome belonging to the Coronaviridae family [1]. CoVs are divided into alpha, beta, gamma and delta groups, and the beta group is further composed of A, B, C and D subgroups [2]. SARS-CoV-2 belongs to the 2B group of the beta-coronaviruses, which also include SARS-CoV and Middle East respiratory syndrome coronavirus (MERS-CoV) [3]. Their entry through respiratory and oesophageal routes accounts for mild to severe acute respiratory syndromes, which have led to global epidemics with high morbidity, mortality and immense economic losses in affected human populations [4, 5]. Encoded within the 3' end of the viral genome are the four main structural proteins of coronavirus particles: spike (S), membrane (M), envelope (E) and nucleocapsid (N) [6], as shown in Figure 1. Phylogenetic analyses of 15 human CoV whole genomes revealed that the 2019 novel CoV (2019-nCoV) genome shares the highest nucleotide sequence identity with SARS-CoV (79.7%), while its two evolutionarily conserved regions (envelope and nucleocapsid proteins) had sequence homology of 96% and 89.6% with the same, respectively [3]. Hence the nomenclature for the novel coronavirus behind the outbreak. Surface proteins, which stick out like crown tips (spikes) on coronaviruses, bind to the host cell receptor angiotensin-converting enzyme 2 (ACE2) in host epithelial cells.
The S1 subunit (N-terminal) of the surface protein facilitates binding to the ACE2 receptor, while the S2 subunit (C-terminal) mediates host cell entry, marking the onset of infection; in MERS-CoV, the S protein instead binds human dipeptidyl peptidase 4 (DPP4) [7, 8]. Interestingly, conserved domains of CoVs have been indicated in the literature as vital entry targets in vaccine and drug development [9, 10]. However, growing variability and mutational changes in viruses can cause a lack of specificity and reduce the efficiency of therapeutic measures. Recombination plays a central role in virus replication and evolution in viral infections such as HIV, Ebola and MERS [11, 12], while molecular mechanisms (RNA fragmentation and trans-esterification reactions) are possible causes of the ligation of RNA fragments and the subsequent increase in novel recombination frequency observed among various RNA viruses [13]. Diverse host factors account for a great deal of genome variability in viral recombinants, which ranges from multi-resistance to evolutionary novelties [14]. The emergence of novel viral variants trafficked by humans and animals alike through global travel has remained a constant threat to public health and adds to the complexity of host-viral interactivity in viral adaptation and evolution [15].

Comparison and analyses of conserved domains of the 2019-nCoV/SARS-CoV-2 protein: the SARS-CoV-2 reference sequence (initial entry with RefSeq number NC_045512.1) was retrieved from the National Center for Biotechnology Information (NCBI) database, and a query for its conserved domains was launched using affiliated resources. Proteins with similar conserved domains were included in the subsequent multiple sequence alignment of the spike gene of the zoonotic coronaviruses investigated in this study. Their nucleotide and S protein sequences were pooled using NCBI resource tools, and analysis was done using EMBOSS Needle, ClustalW2 and Clustal Omega.
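To make the identity figures reported later more concrete, the following is a minimal sketch of a global (Needleman-Wunsch) pairwise alignment and a percent-identity calculation of the kind that EMBOSS Needle reports. It is an illustration under simplifying assumptions, not the pipeline used in this study: the scoring scheme (match +1, mismatch -1, linear gap -1) and the function names `global_align` and `percent_identity` are hypothetical choices, whereas EMBOSS Needle by default scores proteins with a substitution matrix (EBLOSUM62) and affine gap penalties. Identity is taken here as identical aligned positions divided by alignment length, gaps included; percent similarity would additionally count aligned pairs that score positively under the substitution matrix.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment with a simple linear gap penalty.

    The scoring values are illustrative assumptions, not EMBOSS defaults.
    Returns the two aligned strings, padded with '-' for gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical, non-gap columns as a percentage of alignment length."""
    if len(aln_a) != len(aln_b) or not aln_a:
        raise ValueError("aligned sequences must be non-empty and equal length")
    matches = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * matches / len(aln_a)


if __name__ == "__main__":
    # Toy sequences, purely illustrative (not data from this study).
    aligned_a, aligned_b = global_align("ACGT", "AGT")
    print(aligned_a, aligned_b, percent_identity(aligned_a, aligned_b))
```

With the toy inputs above, the two sequences align over four columns with a single gap, so three of four columns are identical (75% identity). Real S protein comparisons would use a protein substitution matrix and affine gaps, which can change both the alignment and the reported percentages.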
Homology and phylogeny analysis of the S-protein genes in candidate zoonotic viruses: the identified spike protein sequences of animal coronaviruses were retrieved from submitted protein entries in the NCBI database. Homology analysis of the sequences was carried out using Clustal Omega and EMBOSS Needle, while phylogenetic trees were constructed using the neighbor-joining method in the CLUSTAL X software. In total, we culled the respective genomic and protein data of eight (8) 2019-nCoV/SARS-CoV-2 clinical isolates from the beta coronaviruses database in NCBI, and these are: [...] (7) MT308703; QIV64975.1 (USA, April 2020) and (8) MT308704 (USA, April 2020). Whole-genome alignment and protein sequence identity calculation were performed using multiple sequence alignment in the EMBL-EBI database with default parameters in ClustalW2 and Clustal Omega, respectively.

Conserved domains in SARS-CoV-2: four out of 29 domain hits generated from the 2019-nCoV/SARS-CoV-2 CDD query were selected based on the E-value scores (Table 1). These are: non-structural protein 11 (nsp 11), coronavirus RPolN terminus, non-structural protein 13 (nsp 13) and the corona S2 superfamily.

Protein phylogeny assembly of SARS-CoV-2 isolates: protein sequence alignment analyses reveal the closest evolutionary conservation between 2019-nCoV/SARS-CoV-2 and pangolin S protein, with 95.6% similarity and 92.1% identity, while 46.8% similarity and 31.2% identity were observed between SARS-CoV-2 and bat S protein (Figure 2). An increased level of evolutionary divergence was observed in submitted entries of the recent SARS-CoV-2 genomic data during the time of the study (entries from December 2019 until 4th April), as seen in Figure 3 and Figure 4, while evolutionary patterns observed between Wuhan SARS-CoV-2 data and the other five geographical locations reveal the trajectory of infection from the reported source of the outbreak.

The region of the 2019-nCoV genome that encodes the nsp 11 domain spans from about 18046 to 19824 bp.
The nsp 11 domain has been implicated in countering the host innate antiviral response through inhibition of type I interferon (IFN) production by NendoU activity-dependent mechanisms in porcine reproductive and respiratory syndrome viruses [16]. RNA microarray analysis has also associated nsp 11 with pathways such as programmed cell death evasion, mitogen-activated protein kinase signaling, histone-related processes, cell cycle and DNA replication, and the ubiquitin-proteasome system [17-20], and the few nsp 11 inhibitors indicated include papain-like proteinase (plPRO) and 3C-like main protease (3CLpro) [21].

The coronavirus RNA-directed RNA polymerase (RdRp) terminus domain covers the N-terminal region of the coronavirus polymerase. It spans from about 13480 to 14538 bp in SARS-CoV-2, and its interaction with nsp3 has been indicated in viral replication, especially during the early onset of infection [22]. The inhibitors of coronavirus RdRp include ATP inhibitors with mfScores lower than 110 [21].

The nsp 13 domain is regarded as a highly conserved and multifunctional helicase unit, and it spans from about 20662 to 21537 bp in the SARS-CoV-2 isolate [23]. SARS-CoV helicases of this kind are chiefly concerned with RNA processing, DNA replication, recombination and repair, transcription and translation [24]. A few potential inhibitors of nsp13 have been identified [25, 26]; they act by interfering with its unwinding and ATPase activities.

The coronavirus S2 superfamily domain spans from 23546 to 25372 bp and forms the characteristic 'corona' after which the group is named. CoV diversity is reflected in the variable spike (S) proteins, which have evolved into forms differing in receptor interactions and in their response to various environmental triggers of virus-cell membrane fusion [27]. The C-terminal (S2) domain directs ectodomain fusion of all CoV spike proteins following receptor binding [28, 29]. The level of interaction between the S protein and the virus receptor controls the host cell range [30].
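As a small worked example, the approximate genomic spans quoted above for the four selected domains can be tabulated and their lengths checked. The sketch below is illustrative only: it assumes the quoted coordinates are 1-based and inclusive (the text does not state the convention), and the names `DOMAIN_SPANS`, `span_length`, `extract_region` and `summarize` are hypothetical helpers rather than part of any tool used in the study.

```python
# Approximate nucleotide spans quoted in the text for the SARS-CoV-2
# reference genome. ASSUMPTION: coordinates are 1-based and inclusive.
DOMAIN_SPANS = {
    "nsp 11": (18046, 19824),
    "RdRp N-terminus": (13480, 14538),
    "nsp 13": (20662, 21537),
    "corona S2 superfamily": (23546, 25372),
}


def span_length(start, end):
    """Length in nucleotides of a 1-based, inclusive span."""
    return end - start + 1


def extract_region(genome, start, end):
    """Slice a genome string using 1-based, inclusive coordinates."""
    return genome[start - 1:end]


def summarize(spans):
    """Return (name, start, end, length_nt, whole_codons) sorted by start."""
    rows = []
    for name, (start, end) in sorted(spans.items(), key=lambda kv: kv[1][0]):
        length = span_length(start, end)
        rows.append((name, start, end, length, length // 3))
    return rows


if __name__ == "__main__":
    for name, start, end, length, codons in summarize(DOMAIN_SPANS):
        print(f"{name:24s} {start:6d}-{end:<6d} {length:5d} nt  {codons:4d} codons")
```

Under the inclusive-coordinate assumption, the four spans come to 1059, 1779, 876 and 1827 nucleotides, each a whole number of codons. Since the spans are given as approximate in the text, these lengths should be read as indicative.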
One study showed a switch of species specificity using a mutant mouse hepatitis virus (MHV) construct, which conferred horizontal gene transfer and the ability to infect feline cells, both of which were absent in wild-type MHV [30]. This was achieved through substitution of the spike glycoprotein ectodomain. Another study [31] indicated that natural mutations in the receptor binding domain of the spike protein determine the reactivity of cross-neutralization between palm civet coronavirus and SARS-CoVs. Identification of the origin, natural host(s) and evolutionary pathway of viruses that cause pandemics is essential for understanding the molecular mechanism of their cross-species interactivity and for implementing proper control measures [32].

Protein sequence alignment analyses reveal the closest evolutionary conservation between 2019-nCoV/SARS-CoV-2 and pangolin S protein, with 95.6% similarity and 92.1% identity, while 46.8% similarity and 31.2% identity were observed between SARS-CoV-2 and bat S protein (supplementary data). This finding agrees with reports indicating pangolin as a more recent ancestor of SARS-CoV-2 than bats [33, 34]. SARS-CoV-2 could have arisen through recombination (a chimera) or interactions between a pangolin-CoV-like virus and a bat-CoV-RaTG13-like virus, going by the homology of the SARS-CoV-2 and pangolin S genes and the subclade they form apart from the bat S gene in this study (Figure 2). Although some computational analyses predicting the improbability of direct binding between the receptor binding domain (RBD) of SARS-CoV-2 and human ACE2 suggest otherwise [35, 36], studies have demonstrated cross-species interactivity through structural (in-silico), in-vitro and in-vivo mechanisms [31, 37-39]. A series of in-vivo and in-vitro RNA recombination events leading to vast genetic variability of positive-strand RNA viruses has also been reported [13].
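The homology and subclade relationships discussed above rest on pairwise distances between aligned sequences. As a rough illustration, the sketch below computes the p-distance (the proportion of differing sites) for each pair in a set of pre-aligned sequences, which is the kind of distance matrix that the neighbor-joining method takes as input. The function names and the toy sequences are hypothetical and are not data from this study; Clustal programs may also apply distance corrections that this sketch omits.

```python
from itertools import combinations


def p_distance(aln_a, aln_b):
    """Proportion of differing sites, ignoring columns with a gap in either row."""
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable (gap-free) columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Symmetric nested-dict matrix of p-distances for named aligned sequences."""
    dist = {name: {name: 0.0} for name in aligned}
    for a, b in combinations(aligned, 2):
        d = p_distance(aligned[a], aligned[b])
        dist[a][b] = d
        dist[b][a] = d
    return dist


def closest_pair(dist):
    """Pair of names with the smallest distance (the first pair a
    neighbor-joining-style method would tend to group on clean data)."""
    return min(combinations(sorted(dist), 2), key=lambda p: dist[p[0]][p[1]])


if __name__ == "__main__":
    # Toy aligned sequences, purely illustrative.
    toy = {
        "seqA": "ACGTACGT",
        "seqB": "ACGTACGA",
        "seqC": "ACG-ACGT",
    }
    matrix = distance_matrix(toy)
    for name in sorted(matrix):
        print(name, [round(matrix[name][other], 3) for other in sorted(matrix)])
    print("closest:", closest_pair(matrix))
```

A p-distance is roughly the complement of percent identity over gap-free columns, so an identity of 92.1% corresponds to a distance near 0.079. Tree-building tools then cluster the sequences from the full matrix rather than from any single pair.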
Domestication, consumption and wildlife activities that result in natural selection on a human or human-like ACE2 receptor [33, 36] raise the possibility of SARS-CoV-2 emergence from pangolin. The features at issue are the receptor-binding domain (RBD) in the spike protein and the functional polybasic (furin) cleavage site at the S1-S2 boundary [33, 40]. This, amongst others, necessitated the strict travel bans, laws and confinement strategies adopted in different countries to curb its spread. Surprisingly, genomic and protein data from the Philippines suggest otherwise (Figure 3 and Figure 4). Despite the limited data used for SARS-CoV-2 genomic profiling in this study, we found viral subtype divergence (considering the distance metrics of SARS-CoV-2 entries) (Figure 3 and Figure 4), suggesting a population-specific post-translational modification that could have been influenced by genetic makeup. This is presumed based on the subclades formed between protein sequence data from the Philippines (BCA37476.1 and BCA37477.1), and between China (QHD43415.1) and the Philippines (BBZ90167.1), countries in the same continent. Also, empirical data point to genetic and epigenetic factors in SARS-CoV evolution, incidence and infection rates amongst diverse populations and across different racial backgrounds [44].

Viral cellular mechanisms are vital factors necessary for replication during infection. Hence, identification of the domains involved in viral entry and in evasion of antiviral mechanisms in host cells is essential for the development of effective therapeutic measures. Conserved domains found in this study that are vital target sites for inhibition of SARS-CoV-2 entry and replication in host cells include nsp 11, nsp 13, RdRp and the corona S2 superfamily, while compounds such as RNA aptamers, ATP inhibitors, papain-like proteinase (plPRO) and 3C-like main protease (3CLpro) are indicated as viable inhibitors of these domains. Understanding the evolutionary pathway of novel coronavirus transmission will not only help combat the current pandemic but also assist in mutation tracking for identifying future zoonotic coronavirus threats. The phylogenetic analyses of the candidate zoonotic coronavirus S gene with SARS-CoV-2 revealed pangolin as the most recent ancestor, forming a sub-clade with the bat S gene, which suggests interspecies recombination of CoVs in bats and pangolins. The evolutionary pattern observed between SARS-CoV-2 genomic data from the source of the outbreak and the recent entries analyzed in this study showed a relative trajectory of infection from the source to other places, except for protein data from the Philippines, which suggest an earlier existence of SARS-CoV-2 that should be further investigated. Genomic and protein data also revealed racial viral subtype divergence and a rapid rate of mutation despite the novelty of the outbreak. Precise dating of viral subtype divergence will enable researchers to correlate divergence with epidemics and pandemics through viral sequence sampling, for proper time-scale measurement of zoonotic threats in human populations. Therefore, there is an urgent need for large-scale analysis and profiling of SARS-CoV-2 genetic data in affected populations, especially in Africa, where there is a paucity of genomic SARS-CoV data, for effective therapeutic measures.
Coronavirus envelope protein: current knowledge
A pan-coronavirus fusion inhibitor targeting the HR1 domain of human coronavirus spike
Network-based drug repurposing for novel coronavirus 2019-nCoV/SARS-CoV-2
Coronavirus infections: epidemiological, clinical and immunological features and hypotheses
The impacts on health, society and economy of SARS and H7N9 outbreaks in China: a case comparison study
Coronaviruses: an overview of their replication and pathogenesis
Dipeptidyl peptidase 4 is a functional receptor for the emerging human coronavirus-EMC
Structure of MERS-CoV spike receptor-binding domain complexed with human receptor DPP4
Epitope-based peptide vaccine design and target site depiction against Middle East Respiratory Syndrome Coronavirus: an immune-informatics study
Evolution of the novel coronavirus from the ongoing Wuhan outbreak and modeling of its spike protein for risk of human transmission
Recombination every day: abundant recombination in a virus during a single multi-cellular host infection
Genetic recombination in plant-infecting messenger-sense RNA viruses: overview and research perspectives
Antiviral drug resistance as an adaptive process
Porcine reproductive and respiratory syndrome virus nsp11 antagonizes type I interferon signaling by targeting IRF9
Structural biology of the arterivirus nsp11 endoribonucleases
Both Nsp1 beta and Nsp11 are responsible for differential TNF-alpha production induced by porcine reproductive and respiratory syndrome virus strains with different pathogenicity in vitro
Differential host cell gene expression and regulation of cell cycle progression by nonstructural protein 11 of porcine reproductive and respiratory syndrome virus
Modulation of host cell responses and evasion strategies for porcine reproductive and respiratory syndrome virus
Analysis of therapeutic targets for SARS-CoV-2 and discovery of potential drugs by computational methods
Structures and functions of coronavirus proteins: molecular modeling of viral nucleoprotein
Isolation of inhibitory RNA aptamers against severe acute respiratory syndrome (SARS) coronavirus NTPase/Helicase
The nonstructural proteins directing coronavirus RNA synthesis and processing
Antiviral drugs specific for coronaviruses in preclinical development
6-Bisarylmethyloxy-5-hydroxy chromones with antiviral activity against both hepatitis C virus (HCV) and SARS-associated coronavirus (SCV)
Ready, set, fuse: the coronavirus spike protein and acquisition of fusion competence
Pre-fusion structure of a human coronavirus spike protein
Structural basis for human coronavirus attachment to sialic acid receptors
Retargeting of coronavirus by substitution of the spike glycoprotein ectodomain: crossing the host cell species barrier
Natural mutations in the receptor binding domain of spike glycoprotein determine the reactivity of cross-neutralization between palm civet coronavirus and severe acute respiratory syndrome coronavirus
CoV-2: an emerging coronavirus that causes a global threat
Probable pangolin origin of SARS-CoV-2 associated with the COVID-19 outbreak
Isolation and characterization of 2019-nCoV-like coronavirus from Malayan pangolins
Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus
The proximal origin of SARS-CoV-2
Role of changes in SARS-CoV-2 spike protein in the interaction with the human ACE2 receptor: an in silico analysis
Structure analysis of the receptor binding of 2019-nCoV
Effects of human anti-spike protein receptor binding domain antibodies on severe acute respiratory syndrome coronavirus neutralization escape and fitness

The authors declare no competing interests.

The conception and design was achieved by Pius A Okiki and Adejoke O Obajuluwa; acquisition of data, analysis and interpretation of data was done by Adejoke O Obajuluwa and Tiwalola M Obajuluwa; Adejoke O Obajuluwa and Olakunle B Afolabi drafted the article and revised it for important intellectual content; Pius A Okiki approved the final version to be published. All the authors have read and agreed to the final manuscript.