key: cord-0022525-1d5451ik
authors: Khan, Taimoor; Khan, Abbas; Wei, Dong-Qing
title: MMV-db: vaccinomics and RNA-based therapeutics database for infectious hemorrhagic fever-causing mammarenaviruses
date: 2021-10-22
journal: Database (Oxford)
DOI: 10.1093/database/baab063
sha: cbff028e9f790aeee1caa1eb49fd54a3d0d6b77e
doc_id: 22525
cord_uid: 1d5451ik

The recent viral outbreaks and the current pandemic situation urge us to address emerging viral infections in a timely manner by designing therapeutic strategies. Multi-omics and therapeutic data are of great interest for developing early remedial interventions. This work provides a therapeutic data platform (Mammarenavirus (MMV)-db) for pathogenic mammarenaviruses with potentially catastrophic effects on human health around the world. The database integrates vaccinomics and RNA-based therapeutics data for seven human pathogenic MMVs associated with severe viral hemorrhagic fever and lethality in humans. Protein-specific cytotoxic T lymphocyte, B lymphocyte, helper T-cell and interferon-inducing epitopes were mapped using a cluster of immune-omics-based algorithms and tools for the seven human pathogenic viral species. Furthermore, the physicochemical and antigenic properties were explored to guide the design of a protein-specific multi-epitope subunit vaccine for each species. Moreover, highly efficacious RNAs (small interfering RNA (siRNA), microRNA and single guide RNA (sgRNA)) with therapeutic relevance were identified through extensive genome-based analysis. All the therapeutic RNAs were further classified and listed on the basis of predicted higher efficacy. The online platform (http://www.mmvdb.dqweilab-sjtu.com/index.php) contains easily accessible data sets and vaccine designs with potential utility in further computational and experimental work. In conclusion, the current study provides a baseline data platform to secure better future therapeutic interventions against the hemorrhagic fever-causing mammarenaviruses.
Database URL: http://www.mmvdb.dqweilab-sjtu.com/index.php

The seven human-infecting mammarenaviruses associated with viral hemorrhagic fever are Lassa virus (LASV), Chapare virus (CHAPV), Lujo virus (LUJV), Guanarito virus (GTOV), Junín virus (JUNV), Machupo virus (MACV) and Sabiá virus (SABV). Geographically, LASV and LUJV are considered indigenous to Africa, whereas CHAPV, GTOV, JUNV, MACV and SABV are common in American countries (1, 2). Mammarenavirus (mammalian arenaviruses) is an important genus of animal viruses that belongs to the family Arenaviridae. These are enveloped, spherical viral particles with a diameter of 50-300 nm (3, 4). The genome consists of two single-stranded RNA molecules known as the L (large) and S (small) segments. Each genomic segment encodes two different proteins: the L segment encodes the zinc-binding matrix protein (Z) and the viral RNA-dependent RNA polymerase (L), whereas the S segment encodes an envelope glycoprotein precursor (GPC) and a nucleoprotein (NP) (5, 6). Additionally, two glycoprotein subunits of the spike, GP1 and GP2, are obtained after posttranslational cleavage during GPC synthesis (7, 8). The virions utilize the GP1 subunit to facilitate cell-surface receptor binding and enter the cell through endocytosis (9, 10). The GP2 subunit mediates pH-dependent membrane fusion, followed by uncoating and release of viral ribonucleoprotein complexes inside the cell (11). Mammarenaviral hemorrhagic fevers are associated with severe outcomes and a high fatality rate (12, 13). These viruses can be transmitted through aerosol or contact with an infected person (3). The virus then gains systemic entry into the host lymphoid system with undetected pneumonic symptoms (14).
The prominent targeting of macrophages (15, 16) and liver damage are considered hallmarks of pathogenicity (17, 18) during these human mammarenavirus infections. Similarly, compromised immune response function with secondary bacterial infections (19, 20) and leukocyte dysfunction in polymorphonuclear cells causing leukopenia (21, 22) are also associated with mammarenaviral infections. Other abnormalities, including the defective function of (23-25), have also been linked with hemorrhagic diseases. The therapeutic approaches adopted for hemorrhagic fevers caused by different mammarenaviruses with related symptoms (26-28) vary with pathological conditions. To date, very few effective treatment options are available to combat hemorrhagic fever in clinical settings. These treatment regimens include administering an adequate dosage of neutralizing antibodies during immune serum treatment (29), with the related complication of transient cerebellar-cranial nerve syndrome (30, 31). Such passive antibody therapy options are also hampered by the risk of transfusion-borne diseases and require alternate treatments (32). Current antimammarenaviral therapy also includes the use of ribavirin (1-β-D-ribofuranosyl-1H-1,2,4-triazole-3-carboxamide), which has partial efficacy against some mammarenavirus infections. However, the use of ribavirin is limited by associated toxicity and adverse side effects, including severe anemia, thrombocytosis and birth-related defects in humans (33-35). Another candidate drug, T-705 (favipiravir), which inhibits viral RNA synthesis and has broad antiviral activity against RNA viruses (36, 37), is also used as a treatment option. Furthermore, no Food and Drug Administration (FDA)-licensed vaccines are currently available to prevent mammarenavirus infections. The only designated live-attenuated vaccine that advanced to human clinical trials is called Candid 1 (Candidate no.
1), with efficacy against JUNV-mediated infections (38, 39). Still, the continued search for potential vaccines, which has expanded to several recombinant viruses, inactivated mammarenaviruses or virus-like particles (40) and other candidates tested in various animal models (41, 42), needs further evaluation of their potential therapeutic significance as human vaccines. The development of biological web databases for different diseases, e.g. breast cancer and cytomegaloviruses, is of great interest to researchers (43, 44). In this study, annotated data sets based on the genome and proteome analysis of seven species of human-infecting mammarenaviruses are presented. The analysis comprised genome/proteome collection, immune-based epitope prediction, vaccine design and RNA-based therapeutics analysis. Additionally, the comprehensive therapeutic information is curated in the form of data sets freely accessible to researchers. The extensive genomic and protein-specific investigation provides putative vaccine designs and RNA therapeutics options for use in both advanced computational and experimental research. The novel platform, with protein-specific vaccine designs for each species and shortlisted potential siRNAs, microRNAs (miRNAs) and sgRNAs with all the necessary information, will aid future therapeutic strategies against mammarenavirus infections. The whole-genome sequence information (L and S segments) used in this study was retrieved from the National Center for Biotechnology Information (https://www.ncbi.nlm.nih.gov/) (45), whereas the protein sequences (Z, L, G, N) of human-infecting mammarena species were collected from UniProt (https://www.uniprot.org/) (46). Further analysis was performed for the shortlisted human hemorrhagic fever-causing mammarena species. The accession and basic information of the genomic data set comprising the seven MMV species included in the study are given in Table 1.
The protein sequences and accession IDs used in the study are also listed (Supplementary Sheet S1). The basic data information and sequences were then subjected to further analysis. All the protein sequences of each mammarena species were initially scanned for immunogenic cytotoxic T lymphocyte (CTL) epitopes, B lymphocyte (B-cell) epitopes, helper T lymphocyte (HTL) epitopes and interferon-gamma (IFN-γ)-inducing peptides. The obtained epitope sequences for each species were then used to design highly immunogenic and antigenic epitope-based in silico vaccines against each strain. CTL epitopes for each protein of all species were predicted with the NetCTL 1.2 server (http://www.cbs.dtu.dk/services/NetCTL/) (47) and further characterized on the basis of the combined score. The cut-off value used to predict CTL epitopes was set at 0.75. Similarly, B-cell epitope prediction was carried out with the ABCPred server (http://crdd.osdd.net/raghava/abcpred/) (48). The predicted linear B-cell epitopes were further filtered with a defined cut-off score of 0.5. Epitopes were ranked by binding score: the higher the score, the higher the probability that the peptide induces an immune response. Next, 15-mer HTL epitopes that showed good affinity for human Major Histocompatibility Complex (MHC) molecules (HLA-DRB1*01:02, HLA-DRB1*01:01, HLA-DRB1*01:04, HLA-DRB1*01:03 and HLA-DRB1*01:05) were obtained from the Immune Epitope Database (IEDB) server (http://tools.iedb.org/mhcii/) (49). The percentile rank is inversely related to binding affinity, so a lower percentile rank indicates higher binding affinity (49). Furthermore, IFN-γ-inducing peptides were filtered from these positive MHC-II peptides with the IFNepitope web server (http://crdd.osdd.net/raghava/ifnepitope/) (50).
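The cut-offs described above can be illustrated with a short sketch. It is only a schematic of the filtering and ranking logic, not the pipeline used in the study: the `Epitope` record, the function name and the toy peptides are hypothetical, and treating each cut-off as inclusive (>=) is an assumption. HTL epitopes are only ranked, because the text gives no numeric percentile-rank cut-off.

```python
from dataclasses import dataclass

# Cut-offs quoted in the text. Treating them as inclusive (>=) is an assumption.
CTL_COMBINED_CUTOFF = 0.75   # NetCTL combined score
BCELL_SCORE_CUTOFF = 0.5     # ABCPred score


@dataclass(frozen=True)
class Epitope:
    """Hypothetical record for one predicted epitope."""
    peptide: str
    kind: str     # "CTL", "B" or "HTL"
    score: float  # combined score (CTL), ABCPred score (B) or percentile rank (HTL)


def filter_and_rank(epitopes, kind):
    """Return the epitopes of one kind that pass the cut-off, best first."""
    selected = [e for e in epitopes if e.kind == kind]
    if kind == "CTL":
        kept = [e for e in selected if e.score >= CTL_COMBINED_CUTOFF]
        return sorted(kept, key=lambda e: e.score, reverse=True)
    if kind == "B":
        kept = [e for e in selected if e.score >= BCELL_SCORE_CUTOFF]
        return sorted(kept, key=lambda e: e.score, reverse=True)
    if kind == "HTL":
        # Lower percentile rank means higher predicted affinity; the text
        # states no numeric cut-off here, so HTL epitopes are only ranked.
        return sorted(selected, key=lambda e: e.score)
    raise ValueError(f"unknown epitope kind: {kind}")


# Placeholder peptides and scores, not real predictions.
toy = [
    Epitope("AAAAAAAAA", "CTL", 0.91),
    Epitope("CCCCCCCCC", "CTL", 0.60),
    Epitope("DDDDDDDDDDDDDDDD", "B", 0.72),
    Epitope("EEEEEEEEEEEEEEE", "HTL", 4.2),
    Epitope("FFFFFFFFFFFFFFF", "HTL", 0.8),
]
print([e.peptide for e in filter_and_rank(toy, "CTL")])
print([e.peptide for e in filter_and_rank(toy, "HTL")])
```

Keeping the score semantics in one function makes the opposite sort directions explicit: higher is better for the CTL and B-cell scores, lower is better for the HTL percentile rank.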
Next, to select the best combination of epitopes that meets all the screening criteria, antigenic epitopes were screened among the predicted epitopes by using VaxiJen v2.0 (51) with a default threshold of 0.4. To discriminate between allergens and nonallergens, AllerTOP v.2.0 (52), which is based on the k-nearest neighbors approach, was used. The shortlisted peptides for each target protein with increased potential efficacy were included in the vaccine constructs. Computational methods are of great interest for understanding the molecular mechanisms of pathogenesis, drug resistance and the development of novel therapeutics (44, 53-57). All the predicted epitopes for each protein were ranked by binding affinity, highest first. The final vaccine candidates were composed of an adjuvant and CTL, HTL (IFN-positive) and B-cell epitopes joined together by AAY, GPGPG and KK linkers (58, 59), respectively. The vaccine sequences were further stabilized with an added N-terminal human beta defensin-2 (hBD-2) sequence to ensure an enhanced immunogenic response (60). The vaccine construct also needed to be antigenic to elicit a proper immune response. For this purpose, the VaxiJen server (http://www.ddgpharmfac.net/vaxijen/VaxiJen/VaxiJen.html) (51) was employed to predict the vaccine's antigenicity while keeping the threshold at the default 0.4. Another critical parameter, allergenicity, was predicted with the AlgPred server (http://crdd.osdd.net/raghava/algpred/) (61) at an accuracy of around 85%. A sequence is classified as allergenic when its score exceeds the threshold (>−0.4). Physicochemical properties such as amino acid composition, molecular weight, theoretical pI, in vivo and in vitro half-life, instability index, aliphatic index and grand average of hydropathicity (GRAVY), which are relevant to experimental processing, were also assessed to verify the vaccine.
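The construct layout described above (adjuvant, then CTL, HTL and B-cell epitopes joined by AAY, GPGPG and KK linkers) can be sketched as simple string assembly. This is a minimal illustration under stated assumptions: the text does not say how the adjuvant is attached to the first epitope or how one epitope group is bridged to the next, so the `adjuvant_linker` parameter (empty by default) and the choice to prefix every HTL and B-cell epitope with its own group linker are hypothetical. The sequences are placeholders, not real epitopes or the hBD-2 sequence.

```python
# Linkers named in the text for each epitope class.
CTL_LINKER = "AAY"
HTL_LINKER = "GPGPG"
BCELL_LINKER = "KK"


def build_construct(adjuvant, ctl, htl, bcell, adjuvant_linker=""):
    """Assemble adjuvant + CTL + HTL + B-cell epitopes into one sequence.

    Assumptions (not stated in the text): `adjuvant_linker` joins the
    adjuvant to the first CTL epitope, and each HTL / B-cell epitope is
    preceded by its own group linker, which also bridges the groups.
    """
    construct = adjuvant + adjuvant_linker + CTL_LINKER.join(ctl)
    for linker, group in ((HTL_LINKER, htl), (BCELL_LINKER, bcell)):
        for epitope in group:
            construct += linker + epitope
    return construct


# Placeholder sequences only; real inputs would be the shortlisted epitopes.
example = build_construct(
    adjuvant="MMMM",
    ctl=["CCCC", "DDDD"],
    htl=["HHHH", "IIII"],
    bcell=["NNNN", "QQQQ"],
)
print(example)
```

The resulting string could then be submitted to the antigenicity, allergenicity and physicochemical checks that the text describes for each construct.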
These properties were computed for each vaccine construct with the online web server ProtParam (https://web.expasy.org/protparam/) (62). Furthermore, the 3D structures of all the vaccine constructs were predicted using the Robetta web server (https://robetta.bakerlab.org/) (63). In this procedure, the submitted sequences first undergo domain-based recognition. This is followed by 3D modeling of the submitted sequences, depending on the type of templates available in the database. If matching templates are available, comparative modeling is performed; otherwise, de novo modeling of the 3D structures is performed. Finally, to address therapeutic implications, all the developed vaccine designs were listed and included as a separate data set for all the proteins of each MMV species. The genome sequences (L and S segments) of each species were further analyzed to predict siRNAs against each virus. Here, the virus-specific VIRsiRNApred server was employed using model 2 (64). The model is built on integrated variable features, including hybrid nucleotide frequencies, thermodynamic properties and binary patterns of 1725 previously identified viral siRNAs. Only highly efficacious siRNAs with predicted inhibition ≥50% were included. To evaluate immunomodulatory (IM) impact, the imRNA tool (65) was used to distinguish IM and non-IM siRNAs. Similarly, putative miRNAs for Mammarena (MM) viruses were predicted using a two-step approach. First, the VMir algorithm (66) was used with default parameters to predict precursor miRNA (pre-miRNA) hairpins. Second, the Mature Bayes tool (67) was used to identify mature miRNAs. All possible sgRNAs for MM viruses were also predicted using the ge-CRISPR tool (68) based on the Protospacer Adjacent Motif. This algorithm scans for 'NGG' motifs in both the forward and reverse strands of the genome and picks up putative sgRNAs consisting of the 20 nucleotides upstream of the motif.
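The 'NGG' scan described above can be sketched in a few lines. This is a simplified illustration of the motif search only, not the ge-CRISPR implementation: the function names are hypothetical, the input is assumed to be a plain nucleotide string (U is converted to T), and positions reported for the reverse strand are given in reverse-complement coordinates. No efficiency scoring is attempted.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def scan_strand(seq, strand, guide_len=20):
    """Find every NGG motif that has `guide_len` nucleotides upstream of it."""
    hits = []
    # pam_start is the index of the 'N' in NGG; it must leave room for the guide.
    for pam_start in range(guide_len, len(seq) - 2):
        if seq[pam_start + 1:pam_start + 3] == "GG":
            hits.append({
                "strand": strand,
                "guide": seq[pam_start - guide_len:pam_start],
                "pam": seq[pam_start:pam_start + 3],
                "pam_start": pam_start,  # coordinate on the scanned strand
            })
    return hits


def find_sgrnas(genome, guide_len=20):
    """Scan both strands of a genome segment for candidate sgRNAs."""
    forward = genome.upper().replace("U", "T")
    reverse = reverse_complement(forward)
    return (scan_strand(forward, "+", guide_len)
            + scan_strand(reverse, "-", guide_len))


# Toy sequence: 20 placeholder bases followed by a single TGG motif.
candidates = find_sgrnas("A" * 20 + "TGG")
print(candidates)
```

Scanning the reverse complement as a second forward pass keeps the motif logic in one place, at the cost of reporting reverse-strand positions in their own coordinate system.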
A regression-based algorithm was then run on the ge-CRISPR predictions to score each sgRNA with a predicted efficiency between 0% and 100%. The 'MMV-db' database was developed and deployed online using Apache HTTP (Hypertext Transfer Protocol) server v2.2.1.7 on open-source Linux, together with MySQL (My Structured Query Language) and PHP (Hypertext Preprocessor). The front end and user interaction interface were designed using CSS (Cascading Style Sheets), HTML (Hypertext Markup Language), PHP and JavaScript, which also provide the search and download functions. For back-end development of the database, a WAMP (Windows, Apache, MySQL, PHP) server accompanied by scripting in HTML and PHP was used. Data storage, manipulation and retrieval were managed through MySQL to give complete control over the web contents. MMV-db spans basic protein feature profiling through advanced epitope-based vaccine designs and RNA-based therapeutics for all human-infecting MM viruses. The database is a collective platform for a total of seven hemorrhagic fever-causing mammarenaviruses. It includes multiple-feature profiling, including genome and protein sequences, vaccine designs and therapeutic RNA information, presented in different tabs of the online platform. The overall workflow of the strategy, including the data sources used in the design of this database, is given in Figure 1. The antigenic and nonantigenic proteins for each species were identified with the VaxiJen threshold scoring system (51). The server uses an alignment-free, covariance-based approach that focuses on the properties of amino acids (51). We chose 'virus' as the target organism and ran the analysis with a sequence-based output and default criteria. Antigenicity profiling was performed for all the proteins of all the mammarenaviruses.
The cut-off value of 0.4 was used as an indicative threshold to differentiate between potential viral antigenic and nonantigenic proteins (51). Proteins were further subjected to allergenicity prediction analysis. The allergenicity check helps to prevent possible allergic responses in the host (69). The AlgPred v. 2.0 server (70) was used to predict the allergenicity of the proteins, where a score greater than the threshold (>−0.4) represents an allergenic sequence (49). The input sequence was supplied in single-letter amino acid code, and the selected prediction approach was the amino acid composition-based Support Vector Machine (SVM) module (70). The immune-based analysis of antigenicity and allergenicity was performed to profile each species-specific protein. The output data were arranged on the basis of the obtained scores to differentiate between antigenic, nonantigenic, allergenic and nonallergenic proteins. The antigenicity and allergenicity profiles for each of the four proteins (Z, L, G and N) of all species are shown in Figure 2. Similarly, a total of 639 CTL epitopes, 2275 B-cell epitopes, 116 746 HTL epitopes and 9945 IFN epitopes were analyzed for all the mammarenavirus species. The predicted epitopes were further classified on the basis of species-specific proteomes. The total counts of whole proteome-specific T-cell, B-cell, HTL and IFN-inducing epitopes were calculated for each studied species and are presented in Figure 3A-D, respectively. The screening of immunogenic and potential vaccine epitopes was performed sequentially for all seven mammarenaviruses, i.e. LASV, CHAPV, LUJV, GTOV, JUNV, MACV and SABV. Moreover, the order of proteins in the results is presented as Z, L, G (representing GPC) and N (representing NP) for each species.
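The protein-level profiling described above amounts to applying two thresholds and sorting by score, which the following sketch illustrates. The function names and the toy scores are hypothetical, and treating the 0.4 antigenicity cut-off as inclusive is an assumption; the strict comparison for allergenicity follows the '>−0.4' wording in the text.

```python
VAXIJEN_THRESHOLD = 0.4    # antigenic at or above this score (inclusive: assumption)
ALGPRED_THRESHOLD = -0.4   # allergenic strictly above this score, per the text


def profile_protein(name, vaxijen_score, algpred_score):
    """Classify one protein from its two prediction scores."""
    return {
        "protein": name,
        "vaxijen_score": vaxijen_score,
        "algpred_score": algpred_score,
        "antigenic": vaxijen_score >= VAXIJEN_THRESHOLD,
        "allergenic": algpred_score > ALGPRED_THRESHOLD,
    }


def rank_by_antigenicity(profiles):
    """Order profiles from the highest to the lowest antigenicity score."""
    return sorted(profiles, key=lambda p: p["vaxijen_score"], reverse=True)


# Made-up scores for the four protein labels used in the text (Z, L, G, N).
profiles = rank_by_antigenicity([
    profile_protein("Z", 0.35, -0.90),
    profile_protein("L", 0.48, -0.55),
    profile_protein("G", 0.61, -0.10),
    profile_protein("N", 0.52, -0.70),
])
for p in profiles:
    print(p["protein"], p["antigenic"], p["allergenic"])
```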
The T-cell, B-cell and HTL epitopes predicted from the protein sequence-based analysis are presented with the total number of each type of epitope for a specific protein in individual species (Figure 4). First, potential CTL epitopes were predicted for the four proteins (Z, L, G and N). For this purpose, NetCTL v1.2 was used (47), and predictions were performed using 12 different supertypes of human leucocyte antigen with the remaining parameters at their defaults. The sequences of the peptides having a % Rank <1% (