key: cord-0979141-exnkl805
authors: Bala, Sumit; Ghosh, Ambarnil; Pradhan, Subhra
title: Development of Web Application for the Comparison of Segment Variability with Sequence Evolution and Immunogenic Properties for Highly Variable Proteins: An Application to Viruses
date: 2021-12-03
journal: bioRxiv
DOI: 10.1101/2021.12.01.470810
sha: 3691a88a9611905c835c367c07a3435af6733770
doc_id: 979141
cord_uid: exnkl805

High mutation rates and structural flexibility allow viral proteins to quickly become resistant to the host immune system and to existing antiviral strategies. For most pathogenic viruses, the key survival strategy lies in the ability to evolve rapidly through mutations that affect protein structure and function. Alongside experimental research on antiviral development, computational data mining plays an important role in deciphering the molecular and genomic signatures of viral adaptability. Uncovering conserved regions in viral proteins with diverse chemical and biological properties is an important area of research for developing antiviral therapeutics, though assigning those regions is not trivial. Advances in protein structural information databases and repositories, driven by experimental research, have accelerated in-silico mining of the data to generate more integrative information. Despite the considerable effort spent on correlating protein structural information with sequence, overcoming the high mutability and adaptability of viral genomic structure remains a challenge. In the current study, the authors have developed a user-friendly web application interface that allows users to study and visualize protein segment variability in viral proteins and may help in finding antiviral strategies.
The web application presented here allows thorough mining of the surface properties and variability of viral proteins, which, combined with immunogenicity and evolutionary properties, makes the visualization robust. In combination with previous research on a 20-dimensional Euclidean geometry-based sequence variability characterization algorithm, four other parameters have been considered for this platform: [1] predicted solvent accessibility, [2] B-cell epitopic potential, [3] T-cell epitopic potential, and [4] coevolving regions of the viral protein. The uniqueness of this study lies in the fact that a protein sequence stretch is characterized rather than single-residue information, which helps to compare the properties of protein segments with their variability. In the current work, as an example, besides presenting the web application platform, five proteins of SARS-CoV2 are presented, with a focus on protein-S. The current web-application database contains 29 proteins from 7 viruses, along with a GitHub repository of the raw data used in this study. The web application is up and running at the following address: http://www.protsegvar.com.
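To make the idea of stretch-based (rather than single-residue) variability concrete, the following is a minimal sketch, not the authors' algorithm. It assumes that each nine-residue window of an aligned sequence is mapped to a point in 20-dimensional space by counting the 20 standard amino acids, and that the variability of a window is the mean pairwise Euclidean distance between those points across sequences. The function names (`composition_vector`, `segment_variability`) and the composition-based descriptor are illustrative assumptions; the published 20-dimensional characterization may differ in its details.

```python
from itertools import combinations
from math import sqrt

# The 20 standard amino acids; each one is an axis of the 20-D space.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def composition_vector(segment):
    """Map a segment to a 20-D point by counting each standard amino acid.

    Assumption for illustration: gaps and ambiguous residues are ignored.
    """
    vec = [0] * 20
    for aa in segment:
        if aa in INDEX:
            vec[INDEX[aa]] += 1
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def segment_variability(aligned_seqs, window=9):
    """Return one score per window start: the mean pairwise 20-D distance.

    A score of 0.0 means the window has identical composition in all sequences.
    """
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to equal length")
    scores = []
    for start in range(length - window + 1):
        vecs = [composition_vector(s[start:start + window]) for s in aligned_seqs]
        pairs = list(combinations(vecs, 2))
        if pairs:
            scores.append(sum(euclidean(u, v) for u, v in pairs) / len(pairs))
        else:
            scores.append(0.0)  # a single sequence has no variability
    return scores


if __name__ == "__main__":
    # Two toy aligned sequences that differ only at the last position.
    print(segment_variability(["ACDEFGHIKL", "ACDEFGHIKV"], window=9))
```

In this toy input the first window is identical in both sequences and scores zero, while the second window contains the single substitution and scores above zero. A composition count is insensitive to residue order within a window, which is one reason this should be read only as an illustration of the windowing idea.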
Later, a similar method was applied to the VP7 protein of rotavirus, and four such surface-exposed conserved regions were found (Ghosh et al., 2012).

In the current work, a comprehensive platform (web application) for visualizing protein stretch variability is developed and applied to viral proteins. This approach stands out from others in that it mines the variability of a stretch rather than of a single amino acid position. In addition, the graphical interface has scope that will allow users to browse through different [...] currently working on updating the server database with more viruses and application features. Currently the server runs only with a window of nine amino acids, but a stretch-length selection method is under development, and the feature will be deployed soon along with further development of the interface.

Why do HIV-1 and HIV-2 use different pathways to develop AZT resistance
Hepatitis C virus genome to identify the structural and functional dependency network of viral proteins
HotSpot3D web server: an integrated resource for mutation analysis in protein 3D structures
Clinical significance of hepatitis B surface antigen mutants
Antibodies against MERS coronavirus in dromedary camels
MUSCLE: multiple sequence alignment with high accuracy and high throughput
Multiple sequence alignment. Current Opinion in Structural Biology
Coronaviruses, a new group of animal RNA viruses. Avian Diseases
PVS: a web server for protein sequence variability analysis tuned to facilitate conserved epitope discovery
Characterization of Conserved Regions in Rotaviral VP7 Proteins: A Graphical Representation Approach towards Epitope Prediction. Annual Meeting of the Indian Biophysical Society (IBS). Indian Habitat Center
In Silico Study of Rotavirus VP7 Surface Accessible Conserved Regions for Antiviral Drug/Vaccine Design
Graphical representation and mathematical characterization of protein sequences and applications to viral proteins
Computational analysis and determination of a highly conserved surface exposed segment in H5N1 avian flu and H1N1 swine flu neuraminidase
Computational study of dispersion and extent of mutated and duplicated sequences of the H5N1 influenza neuraminidase over the period
Antibody response to influenza vaccination in the elderly: a quantitative review
Virus Variation Resource-improved response to emergent viral outbreaks
History and recent advances in coronavirus discovery. The Pediatric Infectious Disease Journal
NetSurfP-2.0: Improved prediction of protein structural features by integrated deep learning
Viral zoonotic risk is homogenous among taxonomic orders of mammalian and avian reservoir hosts
Numerical characterization of protein sequences and application to voltage-gated sodium channel α subunit phylogeny
Complexities of viral mutation rates
Mechanisms of viral mutation. Cellular and Molecular Life Sciences
SeqKit: a cross-platform and ultrafast toolkit for FASTA/Q file manipulation
The immune epitope database (IEDB): 2018 update. Nucleic Acids Research
A new coronavirus associated with human respiratory disease in China
An evolutionary NS1 mutation enhances Zika virus evasion of host interferon induction
Direct Coupling Analysis of RNA and Protein Sequences
A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature