key: cord-0253257-qjxdys4c authors: Alrashedi, Hasan; Jones, Ian; Mulley, Geraldine; Neuman, Benjamin W title: Translated Blast of L Polymerase as a Hit for Novel Arenaviruses Species date: 2017-08-18 journal: bioRxiv DOI: 10.1101/178012 sha: 94b430ecd03a79c734a9cacccc740a7244beb7b1 doc_id: 253257 cord_uid: qjxdys4c Many pathogenic viruses are zoonotic: they pass between humans and animals and cause dangerous diseases with overt clinical signs worldwide. Such viruses receive serious attention once infection is confirmed in humans or animals and classified as zoonotic (Lal et al. 2005). Zoonotic RNA viruses found in many countries include Ebolavirus, Marburgvirus, the SARS and MERS coronaviruses, Hendra virus, Nipah virus and the arenaviruses that cause haemorrhagic fever, some of which cause epidemics in particular regions such as African countries (Fichet-Calvet & Rogers 2009) (Ehichioya et al. 2010). Structural bioinformatics of viral proteins such as the arenavirus L polymerase was therefore used here as a way to monitor for future outbreaks caused by new virus species. In this study, sequences with significant similarity to haemorrhagic fever viruses, including arenaviruses, were found in the GenBank database. Translated BLAST (tBLASTn), available at https://blast.ncbi.nlm.nih.gov/Blast.cgi, was used to search translated nucleotide databases with an arenavirus L polymerase protein query (McGinnis & Madden 2004). New and archival metazoan transcriptome sequence data from the Transcriptome Shotgun Assembly (TSA) database at NCBI were screened for similarity to arenavirus genes. Structural bioinformatics was thus used to better understand and predict the evolution and natural history of the pool of uncharacterized viruses in GenBank that could lead to emerging haemorrhagic fevers around the world in the near future.
The family Arenaviridae contains 27 species in two genera, Mammarenavirus and Reptarenavirus, according to the most recent International Committee on Taxonomy of Viruses (ICTV) report (Adams et al. 2016). These viruses are distributed worldwide and cause febrile disease in both humans and animals (Buchmeier et al. 2007). Mammalian arenaviruses of the genus Mammarenavirus are traditionally divided into two serogroups: the Old World arenaviruses, which include Lassa virus and lymphocytic choriomeningitis virus (LCMV), and the New World arenaviruses, which form the Tacaribe virus serogroup (Clegg 2002). The reptarenaviruses include some arenaviruses still awaiting formal classification, such as Golden Gate virus (GGV). Arenaviruses have a segmented, ambisense, single-stranded RNA genome and are typically grouped with the negative-sense RNA viruses on the basis of polymerase homology and replication strategy (Buchmeier et al. 2007) (Torre 2009). The large segment is named L and the small segment S (Bodewes et al. 2013). The L segment encodes the viral RNA-dependent RNA polymerase (RdRp), or L polymerase, and a small zinc finger motif protein (Z), whereas the S segment encodes the nucleoprotein (NP) and the glycoprotein precursor (GPC) (Salvato et al. 1989). The L polymerase is the largest arenavirus protein; the L segment that encodes it is around 7.2 kb long (Zapata & Salvato 2013), and the protein contains an endonuclease domain that plays a significant role in mRNA cap-snatching during viral transcription (Morin et al. 2010). In this study, online structural bioinformatics tools and other software were used to find gene sequences from living species that show homology to arenavirus species in the GenBank database (Zhang et al. 2005) (Altman & Dugan 2005) (Clote & Backofen 2000) (Pevzner 2000) (Liljas, A. et al.
2001) (Gopakumar 2012), including Tacaribe virus (TCRV), lymphocytic choriomeningitis virus (LCMV), Golden Gate virus (GGV) and California Academy of Sciences virus (CASV), as shown in table 1. The L proteins of these four arenaviruses (TCRV, LCMV, GGV and CASV) were used as queries against translated nucleotide sequences: the Transcriptome Shotgun Assembly (TSA) database was searched with translated BLAST (tBLASTn) on the NCBI server (McGinnis & Madden 2004). The ExPASy Translate tool was then used to identify open reading frames (ORFs) and the corresponding translated proteins in the TSA transcripts (Gasteiger et al. 2003). Subsequently, the HHpred toolkit, available at https://toolkit.tuebingen.mpg.de/#/tools/hhpred, was used for homology detection and structure prediction of the target protein sequences (Söding et al. 2005). Finally, Bayesian trees were built with MrBayes version 3.2.6 from the amino acid sequences of the novel and known arenaviruses together with some other viruses, in order to evaluate the evolutionary history of these viruses (Ronquist et al. 2011) (Hall 2011).
The E-value is the number of hits expected by chance during a database search when two or more sequences are aligned. A hit is more significant the closer its E-value is to zero; hits with an E-value of 1 were not considered, because a value of 1 means that one match of that quality is expected by chance alone (Pearson 1995). The target RNA transcript sequences (table 1) were therefore used for ORF detection and then for protein translation, homology detection and structure prediction against other proteins in online databases. Finally, all sequence data, including the translated proteins of the TSA species and the sequences of the mammarenaviruses (TCRV and LCMV) and reptarenaviruses (GGV and CASV), were used for multiple sequence alignment (MSA) and then for building Bayesian phylogenetic trees.
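The ORF detection step described above can be illustrated with a minimal sketch. It is not the ExPASy Translate tool used in the study; it only shows the same idea (six-frame translation with the standard genetic code, then picking the longest methionine-initiated stretch). The function names and the toy transcript are hypothetical, and the choice to accept a final segment without a stop codon is an assumption made because assembled transcripts may be partial.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse-complement a DNA string made of A, C, G and T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(transcript):
    """Return (protein, frame) for the longest M-initiated segment.

    Frames are +1..+3 on the given strand and -1..-3 on the reverse
    complement. Assumption: a segment that runs to the end of the
    sequence without a stop codon is still accepted.
    """
    nt = transcript.upper().replace("U", "T")
    best = ("", 0)
    for strand, seq in ((1, nt), (-1, reverse_complement(nt))):
        for offset in range(3):
            for segment in translate(seq[offset:]).split("*"):
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best[0]):
                    best = (segment[start:], strand * (offset + 1))
    return best


# Toy transcript (hypothetical, not from table 1).
protein, frame = longest_orf("CCATGGCTAAATGACC")
print(protein, frame)
```

The translated protein returned by such a routine is what would then be passed on to homology detection and alignment.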
The tBLASTn search thus indicated that many living species have sequences homologous to the L gene of Arenaviridae. These species could carry, or be infected with, arenaviruses, which poses a risk of zoonotic transmission to humans. It is therefore important to use the translated proteins of the TSA species to work out the protein homology between the viral genes.
[Figs. 1-4: homology maps of the RNA transcripts of the TSA species and of the mammarenaviruses and reptarenaviruses against the HHpred structural hits. Microsoft PowerPoint was used to draw the scale bars. The colour and length of each bar depend on the gene homology and gene size of the viruses: 5amr-A (pink) is the structure of the La Crosse Bunyavirus polymerase in complex with the 3' viral RNA, 5D98-B (blue) is the Influenza C Virus RNA-dependent RNA Polymerase, and 5a22-A (green) is the structure of the L protein of vesicular stomatitis virus. Talitrus saltator shows homology with the structure of the L protein of vesicular stomatitis virus (5a22-A), shown in green.]
[...] and the translated proteins of the TSA species found in this study. [...] and arenaviruses, which can help in studying the evolutionary relationship between them, for example when the alignment is used for homology detection and for 2D and 3D protein structure prediction, or when the data are used to build a phylogenetic tree (Orobitg et al. 2015). In theory, primers could be designed from the identical regions to amplify, by reverse transcription PCR (RT-PCR), cDNA prepared from both the TSA species and the arenavirus species (Lozano et al. 1997) (Vieth et al. 2007).
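One way to locate the identical regions mentioned above is to scan an existing multiple sequence alignment for runs of columns in which every sequence carries the same residue. The sketch below is only an illustration under stated assumptions: the function name, the `min_len` threshold and the toy alignment are hypothetical, and the input is assumed to be already aligned (equal-length strings with `-` for gaps). For a protein alignment, any region found this way would still have to be mapped back to the nucleotide sequences before primers could be designed.

```python
def conserved_windows(aligned, min_len=6):
    """Find gap-free runs of fully identical alignment columns.

    Returns a list of (start, end, residues) tuples with 0-based,
    end-exclusive column indices; runs shorter than min_len are dropped.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs, start = [], None
    # Iterate one position past the end so a run touching the last
    # column is closed as well.
    for i in range(length + 1):
        identical = (
            i < length
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if identical and start is None:
            start = i
        elif not identical and start is not None:
            if i - start >= min_len:
                runs.append((start, i, aligned[0][start:i]))
            start = None
    return runs


# Toy alignment (hypothetical sequences, not data from this study).
toy = [
    "MKSDDGWTA-LQ",
    "MRSDDGWTAELQ",
    "MKSDDGWSAELQ",
]
print(conserved_windows(toy, min_len=5))
```

Requiring strict identity is the simplest criterion; a real primer design would probably tolerate some degeneracy, as the degenerate primers cited in the text do.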
There are thus significantly conserved regions between the translated proteins of the TSA species and the L protein of arenaviruses, which may provide important details about the molecular characteristics shared by Arenaviridae and the TSA species. To present a clear phylogenetic analysis of the TSA species and the arenaviruses used in this study, a phylogenetic tree was built to show the evolutionary history of these species. MrBayes version 3.2.6 (Ronquist et al. 2011) was used to build the tree by Bayesian inference. First, the viruses listed in table 2 were used to root the phylogeny, and the alignment was saved as a PHYLIP (.phylip) file on the local computer. Second, the file was converted to a NEXUS (.nxs) file so that MrBayes could read it. MrBayes was then started from the location of the mb program file, and the .nxs file was dragged onto the MrBayes prompt (>) to supply the file location; the location can also be typed at the prompt instead of dragging. Next, the mcmc command was entered. This command tells MrBayes how to analyse the data, and the analysis starts once the mcmc statement has been processed and runs until the average standard deviation of split frequencies reaches 0.01-0.05 or less. When the average standard deviation of split frequencies fell below 0.01, the run was interrupted with Ctrl-C (pressing the C key while holding the Ctrl key) after the yes option had been chosen. Next, the sumt command was entered, which tells the program to summarize the consensus phylogenetic tree once convergence has been reached. Finally, the sump command was entered, which tells MrBayes to summarize the saved parameter values. The resulting tree is shown in Fig. 6. As demonstrated in Fig. 6 [...] Moreover, the tBLASTn search with the L polymerase points to putative virus genera that could be classified within Arenaviridae.
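The file conversion and the mcmc, sumt and sump commands described above could be scripted along the following lines. This is a sketch under assumptions, not the procedure used in the study: the function name is hypothetical, the input is assumed to be relaxed sequential PHYLIP (one taxon name without spaces followed by its sequence on each line), and the appended MrBayes block uses the `stoprule` and `stopval` options of `mcmc` in place of the manual Ctrl-C interruption. No substitution model is set because the text does not state one.

```python
def phylip_to_nexus(phylip_text, stopval=0.01):
    """Convert a relaxed sequential PHYLIP protein alignment to NEXUS.

    Assumption: each record is 'name sequence' on a single line.
    The returned text ends with a MrBayes block that runs mcmc until
    the average standard deviation of split frequencies drops below
    stopval, then calls sumt and sump.
    """
    lines = [ln.strip() for ln in phylip_text.splitlines() if ln.strip()]
    ntax, nchar = (int(x) for x in lines[0].split()[:2])

    records = []
    for ln in lines[1:]:
        name, seq = ln.split(None, 1)
        records.append((name, seq.replace(" ", "")))

    if len(records) != ntax or any(len(s) != nchar for _, s in records):
        raise ValueError("PHYLIP header does not match the alignment")

    matrix = "\n".join(f"  {name} {seq}" for name, seq in records)
    return (
        "#NEXUS\n"
        "begin data;\n"
        f"  dimensions ntax={ntax} nchar={nchar};\n"
        "  format datatype=protein gap=- missing=?;\n"
        "  matrix\n"
        f"{matrix}\n"
        "  ;\n"
        "end;\n"
        "begin mrbayes;\n"
        f"  mcmc stoprule=yes stopval={stopval};\n"
        "  sumt;\n"
        "  sump;\n"
        "end;\n"
    )


# Toy alignment (hypothetical taxa and sequences).
example = "2 5\ntaxA MKSDD\ntaxB MRSD-\n"
print(phylip_to_nexus(example))
```

Writing the returned text to a .nxs file gives a single input that MrBayes can read with its data and commands together, which makes the run easier to repeat than typing each command at the prompt.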
The putative genera recovered by the tBLASTn search include two expected fish-associated species (from Channa and Catostomus) that are closer to the reptarenaviruses, including Golden Gate virus (GGV) and California Academy of Sciences virus (CASV), than to the mammarenaviruses. The tree also shows that the fish arenaviruses lie between the mammarenaviruses and the reptarenaviruses, while Rhizarenavirus lies between the arenaviruses and Orthomyxoviridae (RSV). These viruses potentially share characteristic features with arenaviruses and could be transmitted to humans and animals by several routes, such as bite, sting and consumption, respectively (Ter Meulen et al. 1996) (Kernéis et al. 2009). [...] Open reading frames (ORFs) were obtained from the TSA data with a standard translation tool, ExPASy (Gasteiger et al. 2003). HHpred version HHsuite-2.0.16mod (Söding et al. 2005) showed that the translated proteins of the TSA species have homology with the structure of the La Crosse Bunyavirus polymerase in complex with the 3' viral RNA (5amr-A), with the Influenza C Virus RNA-dependent RNA Polymerase (5d98-B) and with the L protein of vesicular stomatitis virus (5a22-A). These findings support the prediction that the TSA species could carry viral genes that might cause disease in both humans and animals in the areas where these species are found, such as countries where murine animals are used for breeding and/or feeding. Moreover, an alignment made with Clustal Omega (1.2.4) shows sequence similarity between the L protein of arenaviruses and the translated proteins of the TSA species at some positions. This could help to confirm the protein homology through the design of gene-specific primers able to amplify cDNA synthesized from the TSA species and from the L gene of arenaviruses (Blazewicz et al. 2013) (Ortuño et al. 2015), which could be used in further experiments in vitro. Indeed, reverse transcription PCR (Paweska et al. 2009) and real-time PCR (Cordey et al.
2011) have been used to find novel and known arenaviruses, with degenerate oligonucleotide primers designed against the L (Vieth et al. 2007), NP and GPC (Strigl et al. 1998) genes of arenaviruses and used for virus investigation. The alignment of all protein sequences used and obtained in this study was used to build a phylogenetic tree by Bayesian inference (BI) with MrBayes v3.2.6 (Ronquist et al. 2011), and FigTree v1.4.3, available at http://tree.bio.ed.ac.uk/software/figtree/, was used for the final presentation of the tree. The analysis points to an expected novel genus of fish arenaviruses from the TSA data, found in Channa punctata and Catostomus commersonii, which potentially belongs within the arenaviruses and lies between Mammarenavirus and Reptarenavirus. The tree also shows that Rhizarenavirus lies between Mammarenavirus and Rice Stripe virus (RSV) of Orthomyxoviridae. In addition, several viruses were used to root the phylogenetic tree, such as viruses of the families Orthomyxoviridae and Bunyaviridae and of the order Mononegavirales (table 2). Thus, bioinformatics tools, software and online databases provide evidence that sequenced genes of living TSA species have homology with the L protein of arenaviruses, which can help in monitoring for future outbreaks in humans and animals caused by these viruses.
Ratification vote on taxonomic proposals to the International Committee on Taxonomy of Viruses
Defining bioinformatics and structural bioinformatics
BlastAlign: A program that uses blast to align problematic nucleotide sequences
G-MSA - A GPU-based, fast and accurate algorithm for multiple sequence alignment
Detection of novel divergent arenaviruses in boid snakes with inclusion body disease in The Netherlands
Arenaviridae: the viruses and their replication
Molecular phylogeny of the arenaviruses. Current Topics in Microbiology and Immunology
Computational Molecular Biology: An Introduction
Analytical validation of a lymphocytic choriomeningitis virus real-time RT-PCR assay
Lassa fever
Risk maps of Lassa fever in West Africa
ExPASy: The proteomics server for in-depth protein knowledge and analysis
Bioinformatics: Sequence and Structural Analysis
Phylogenetic Trees Made Easy Fourth Edi
Multiple sequence alignment with affine gap by using multi-objective genetic algorithm
Prevalence and risk factors of Lassa seropositivity in inhabitants of the Forest Region of Guinea: A cross-sectional study
Zoonotic Diseases of Public Health Importance
Characterization of arenaviruses using a family-specific primer set for RT-PCR amplification and RFLP analysis its potential use for detection of uncharacterized arenaviruses
BLAST: At the core of a powerful and diverse set of sequence analysis tools
Hunting of peridomestic rodents and consumption of their meat as possible risk factors for rodent-to-human transmission of Lassa virus in the Republic of Guinea
The N-terminal domain of the arenavirus L protein is an RNA endonuclease essential in mRNA transcription
High Performance computing improvements on bioinformatics consistency-based multiple sequence alignment tools
Comparing different machine learning and mathematical regression models to evaluate multiple sequence alignments
Nosocomial Outbreak of Novel Arenavirus Infection, Southern Africa
Comparison of methods for searching protein sequence databases
JavaScript DNA Translator: DNA-Aligned Protein Translations
Computational Molecular Biology: An Algorithmic Approach
Multiple sequence alignment using multiobjective based bacterial foraging optimization algorithm
MrBayes Version 3.2 Manual: Tutorials and Model Summaries. Manual MrBayes
The primary structure of the lymphocytic choriomeningitis virus L gene encodes a putative RNA polymerase
The HHpred interactive server for protein homology detection and structure prediction
Detection of Lassa Virus Antinucleoprotein Immunoglobulin G (IgG) and IgM Antibodies by a Simple Recombinant Immunoblot Assay for Field Use
Molecular and Cell Biology of the Prototypic Arenavirus LCMV: Implications for Understanding and Combating Hemorrhagic Fever Arenaviruses
RT-PCR assay for detection of Lassa virus and related Old World arenaviruses targeting the L gene
Jalview Version 2 - A multiple sequence alignment editor and analysis workbench
Arenavirus variations due to host-specific adaptation
Bioinformatics Technologies