key: cord-1016181-43irxe9y
authors: Srivastava, Vijay Kumar; Kaushik, Sanket; Bhargava, Gazal; Jain, Ajay; Saxena, Juhi; Jyoti, Anupam
title: A Bioinformatics Approach for the Prediction of Immunogenic Properties and Structure of the SARS-COV-2 B.1.617.1 Variant Spike Protein
date: 2021-10-05
journal: Biomed Res Int
DOI: 10.1155/2021/7251119
sha: 876260f7f33dbb57ea7a791c71a65175db334bf8
doc_id: 1016181
cord_uid: 43irxe9y

BACKGROUND: B.1.617.1, a variant of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) that causes respiratory illness, is responsible for the second wave of COVID-19 and is associated with high infectivity and mortality. To mitigate the B.1.617.1 variant of SARS-CoV-2, it is pivotal to decipher the protein structure and immunological responses by employing bioinformatics tools for data mining and analysis.

OBJECTIVES: Here, an in silico approach was employed to decipher the structure and immune function of the subunit of the spike (S) protein of the SARS-CoV-2 B.1.617.1 variant.

METHODS: The partial amino acid sequence of the SARS-CoV-2 B.1.617.1 variant S protein was analyzed, and its putative secondary and tertiary structures were predicted. Immunogenic properties of the SARS-CoV-2 S protein, including B- and T-cell epitopes, interferon-gamma (IFN-γ) response, chemokines, and protective antigens, were predicted using appropriate tools.

RESULTS: The B.1.617.1 variant S protein sequence was found to be highly stable and amphipathic. ABCpred and CTLpred analyses led to the identification of two potential antigenic B-cell epitopes and two potential T-cell epitopes, with starting amino acid positions at 60 and 82 (B-cell epitopes) and 54 and 98 (T-cell epitopes) and prediction scores > 0.8. Further, the RAMPAGE tool was used to determine the allowed and disallowed regions of the predicted three-dimensional structure of the SARS-CoV-2 B.1.617.1 variant S protein.
CONCLUSION: Together, the in silico analysis revealed the predicted structure of the partial S protein, its immunogenic properties, and the possible regions of the S protein of SARS-CoV-2, and it provides a valuable prelude to engineering a targeted vaccine or drug against the B.1.617.1 variant of SARS-CoV-2.

Coronaviruses (CoVs), belonging to the family Coronaviridae, are enveloped, nonsegmented, single-stranded, positive-sense RNA viruses that infect humans and various animals (bats, birds, camels, cats, dogs, and mice) [1]. Based on the genome sequence, CoVs have been further categorized into four genera, i.e., alpha, beta, gamma, and delta [2]. Six different species of CoVs infecting humans, belonging to the alpha- and betacoronavirus genera, have been identified, i.e., human coronavirus (HCoV) 229E, HCoV-OC43, SARS-CoV, HCoV-NL63, HCoV-HKU1, and Middle East respiratory syndrome coronavirus (MERS-CoV). Except for SARS-CoV and MERS-CoV, the other four viruses cause the common cold in immunocompromised subjects [3]. In 2003, SARS-CoV, which causes severe acute respiratory syndrome, emerged in Guangdong province in South China [4]. In 2012, Middle East respiratory syndrome (MERS) was first identified in Saudi Arabia; it infected about 2500 people, of whom more than 800 died, and spread rapidly to 27 countries across the globe [5]. Both SARS-CoV and MERS-CoV are zoonotic, and in humans, they infect the upper respiratory tract, causing the common cold, as well as the lower respiratory tract, resulting in bronchitis, whooping cough, and pneumonia [4, 5]; to date, there is no approved therapeutic molecule for their treatment.

In December 2019, a large number of pneumonia cases were reported and epidemiologically linked with the seafood market in Wuhan, Hubei province, China [6]. The causative agent was identified as a novel CoV by state-of-the-art next-generation sequencing of a specimen isolated from a patient.
It was subsequently named SARS-CoV-2 owing to its 87% sequence similarity with two bat-derived SARS-like CoV strains (bat-SL-CoVZC45 and bat-SL-CoVZXC21); it has a single-stranded RNA genome of 29 to 30 kb [7, 8]. On 11 February 2020, the World Health Organization (WHO) named the new disease caused by SARS-CoV-2 COVID-19 (https://www.who.int/emergencies/diseases/novel-coronavirus-2019/technical-guidance/naming-the-coronavirus-disease-(covid-2019)-and-the-virus-that-causes-it). COVID-19 rapidly spread across Asia (India, Iran, Japan, Pakistan, Saudi Arabia, South Korea, and Turkey), Europe (France, Germany, Italy, Netherlands, Switzerland, and the UK), North America (Mexico and the USA), South America (Brazil, Chile, and Peru), Africa (Algeria, Egypt, Ghana, Nigeria, and South Africa), and Oceania (Australia, New Zealand, and French Polynesia) (https://covid19.who.int/). The pandemic remains unabated and continues to ravage other parts of the world. As of September 16, 2021, there have been 225,680,357 confirmed cases of COVID-19 globally, including 4,644,740 deaths (https://covid19.who.int/). Human-to-human transmission of SARS-CoV-2 has been reported, and infected patients present with fever, cough, fatigue, and difficulty breathing [9, 10].

The SARS-CoV-2 genome encodes several nonstructural, structural, and accessory proteins [11]. There has been a global endeavour by researchers to decipher the structure-function relationships of the important proteins of SARS-CoV-2 and to gain insight into the mechanistic details of their binding targets on human cells [11-15]. The S protein of the virus interacts with the angiotensin-converting enzyme 2 (ACE2) receptor on human cells, leading to the internalization of SARS-CoV-2 within the cells [16]. Mutations in the S protein have been reported to enhance binding to ACE2 [17].
Therapeutic strategies that target the S protein to inhibit host recognition and attachment of the virus to the host could be an attractive paradigm for developing anti-SARS-CoV-2 drugs. The ongoing global spread of SARS-CoV-2 has led to the emergence of new strains with profound and stable mutations. Among these strains, the B.1.617.1 lineage, first identified in India and subsequently spread to other parts of the world, is characterized by mutations in the S protein as well as in other proteins. This lineage has been categorized as a variant of interest by the CDC, which underscores the importance of studying its physicochemical properties, immunogenic potential, and predicted protein structure. Both patient/host response and virus-specific information are pivotal in the clinical management of the disease, including diagnosis and therapeutics. Recognition of key pathogen proteins by host cells to induce the immune system is of paramount importance, as it helps in vaccine design. Characterizing key pathogenic proteins using homology modeling, a state-of-the-art bioinformatics technique, is a viable strategy for designing vaccines and therapeutic molecules. Further, identification of the permissible and nonpermissible regions is critical for identifying potential drug targets with therapeutic efficacy. Here, an in silico approach was employed to decipher the structure and function of the partial S protein of the SARS-CoV-2 B.1.617.1 variant.

The ProtParam tool (http://web.expasy.org/protparam/) on the ExPASy server was used to determine the physicochemical properties, i.e., molecular weight (Mw), isoelectric point (pI), amino acid composition, extinction coefficient (EC), instability index (II), aliphatic index (AI), and grand average of hydropathicity (GRAVY), of the QUX03874.1 S protein [18].
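Two of the ProtParam quantities listed above have simple closed-form definitions and can be illustrated in a few lines of plain Python. The sketch below is not the ProtParam implementation. It assumes the standard Kyte-Doolittle hydropathy scale for GRAVY and the usual aliphatic index formula, AI = X(Ala) + 2.9 X(Val) + 3.9 [X(Ile) + X(Leu)], where X is the mole percent of each residue. The function names are hypothetical, and the example input is one of the 9-residue peptides reported in the Results, used only because the full QUX03874.1 sequence is not reproduced in this text.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathicity: mean hydropathy per residue."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def aliphatic_index(sequence):
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    if not sequence:
        raise ValueError("sequence must not be empty")

    def mole_percent(aa):
        return 100.0 * sequence.count(aa) / len(sequence)

    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )


if __name__ == "__main__":
    peptide = "YRVVVLSFE"  # illustrative input only, not the full protein
    print("GRAVY:", round(gravy(peptide), 3))
    print("Aliphatic index:", round(aliphatic_index(peptide), 1))
```

A positive GRAVY value indicates a hydrophobic sequence on average and a negative value a hydrophilic one. Values for a short peptide say nothing about the full-length protein and are shown here only to make the definitions concrete.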
Online tools including ABCpred [19], CTLpred [20], ChemoPred [21], and the VaxiJen server [22] were used to predict B-cell epitopes, T-cell epitopes, chemokines, and protective antigens, respectively. Further, the interferon-gamma (IFN-γ) response for the predicted epitopes was evaluated using IFNepitope (http://crdd.osdd.net/raghava/ifnepitope/index.php) [23]. For the prediction of chemokines, ChemoPred, a support vector machine-based approach (https://webs.iiitd.edu.in/raghava/chemopred/index.html), was used with default parameters.

2.3. Alignment of the Sequence. The primary S protein sequence from the SARS-CoV-2 B.1.617.1 variant, QUX03874.1, was identified from the ExPASy database [24]. A BLASTP search against the Protein Data Bank (PDB) was carried out to identify a protein template for QUX03874.1 and for subsequent prediction of the model. The search revealed an identical sequence from a Homo sapiens viral protein (human SARS coronavirus) with PDB entry 7KQE [25]. This sequence was then used for in silico modeling. The ClustalW tool [26] was used for the equivalent sequence alignment with 7KQE as a template.

Validation. SWISS-MODEL, a fully automated protein structure homology-modeling server [27], was used to predict the 3D structure of QUX03874.1 (partial S protein) from the SARS-CoV-2 B.1.617.1 variant. The procedure comprises three steps, i.e., (i) obtaining the PDB file of the template structure (7KQE), (ii) aligning the target sequence with the recognized structure, and (iii) visualizing the predicted structure using PyMOL (http://www.pymol.org/). The Qualitative Model Energy ANalysis (QMEAN) and Global Model Quality Estimation (GMQE) values of the SWISS-MODEL server were used to assess the fidelity of the structure. PROCHECK determines the stereochemical quality of a protein structure (http://www.ebi.ac.uk/thornton-srv/software/PROCHECK) [28] and was thus used to determine the attributes of the predicted 3D structure of the SARS-CoV-2 B.1.617.1 variant S protein (QUX03874.1).
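Stereochemical checks of this kind rest on backbone dihedral angles. As a minimal sketch, and not the code used by PROCHECK or any other server named here, the function below computes a signed dihedral angle from four atomic positions in plain Python. For residue i, phi is the dihedral defined by C(i-1), N(i), CA(i), C(i), and psi by N(i), CA(i), C(i), N(i+1). The function name and the toy coordinates are assumptions made for illustration, and the sign convention should be checked against a reference structure before any real use.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)

    norm = math.sqrt(_dot(b1, b1))
    if norm == 0.0:
        raise ValueError("p1 and p2 must be distinct points")
    b1 = tuple(c / norm for c in b1)

    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))

    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy coordinates (not taken from any PDB entry): a planar trans arrangement.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))
```

Plotting phi against psi for every residue of a model gives the scatter that a Ramachandran analysis then compares with empirically allowed regions.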
Further, RAMPAGE was used for Ramachandran plot analysis of the model, which gives the phi versus psi dihedral angles for each residue in the input PDB file and shows the allowed and disallowed regions for the in silico model based on density-dependent smoothing.

[...] (Table 2). B-cell epitope prediction showed 12 sequences with a probability of being epitopes, with scores > 0.51. Among these, two peptide sequences, TEIYQAGSTPCNGVQG and LQSYGFQPTNGVGYQP, at positions 60 and 82, were highly antigenic, with scores of 0.93 and 0.9, respectively. T-cell epitope prediction displayed 46 sequences as probable CTL epitopes with scores > 0.51. Among these, two peptide sequences, FERDISTEI and YRVVVLSFE, at positions 54 and 98, were highly antigenic, with scores of 1 and 0.99, respectively. Further, QUX03874.1 of the SARS-CoV-2 S protein displayed an antigenic response with a score of 0.56 and no [...] (Table 3); some of the residues, marked with stars, differ from the known structure (Figure 1). The predicted topology showed that the QUX03874.1 protein comprises two helices and five beta sheets (Figure 2). For the subsequent analysis, 7KQE was used as a reference for modeling the QUX03874.1 protein, based on the already recognized electron microscopic structure of the Homo sapiens viral protein (human SARS coronavirus). The model generated was accurate with respect to bond angles and bond lengths. SWISS-MODEL was then employed to generate a single model from the ClustalX files produced by the sequence alignment; the model was examined qualitatively and quantitatively, which revealed the lowest root mean square deviation (RMSD) value with the template (Table 3; Figures 3(a) and 3(b)).
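For readers unfamiliar with the RMSD measure, the following minimal sketch shows how it is defined for two equally sized sets of matched atomic coordinates. It assumes that the two structures have already been superimposed (no optimal rotation or translation is computed here), and it is not the procedure used by SWISS-MODEL or PyMOL. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between matched (x, y, z) coordinates.

    Both inputs must list the same atoms in the same order and are
    assumed to be superimposed already.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")

    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates in angstroms (not taken from 7KQE or the model).
    model = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
    template = [(0.0, 0.0, 0.1), (1.5, 0.1, 0.0), (3.0, 0.0, -0.1)]
    print(round(rmsd(model, template), 3))
```

A lower RMSD means the matched atoms of the model lie closer to those of the template. In practice, the value also depends on which atoms are compared (for example, C-alpha only or all backbone atoms) and on how the superposition is performed.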
Subsequently, PROCHECK was employed to assess the stereochemistry (psi and phi angles) of the model; it produced several files comprising detailed data on the amino acids and the stringency of the generated structure (Table 4; Figure 4), in agreement with structures of similar resolution [31]. The Ramachandran plot analysis of the recognized structure of QUX03874.1 revealed that 84.2% of amino acids are in the most favored regions and 15.8% are in additionally allowed regions, and no amino acid was detected in the generously allowed or disallowed regions. Overall, the analysis supported the fidelity of the predicted model and concurred with 7KQE.

The COVID-19 pandemic has resulted in the loss of more than 4 million human lives, with the highest casualties in the USA, Brazil, India, Mexico, Peru, the Russian Federation, the UK, and Italy (as of September 16, 2021; https://www.worldometers.info/coronavirus/#countries). Further, mutations and the emergence of new variants of SARS-CoV-2 led to the second and third waves of COVID-19, which have cost many lives. Among the different variants, B.1.617.1 has spread rapidly in India and to several countries throughout the world. A recent report has suggested that this variant is 6.8-fold less susceptible to neutralization by sera from COVID-19 convalescent individuals and from Moderna- and Pfizer-vaccinated individuals [32]. The number of deaths continues to increase across the globe, and there seems to be no respite from this menace. Therefore, there has been an unprecedented global endeavor by researchers, almost on a war footing, to design and develop a potent vaccine against the SARS-CoV-2 B.1.617.1 variant to mitigate the highly contagious and life-threatening COVID-19. In this context, an in silico approach for [...] In the current paper, we have predicted the structure and functions of the SARS-CoV-2 B.1.617.1 variant partial S protein using state-of-the-art bioinformatics approaches.
The validity of the predicted structure was also studied. Further, the immunogenic properties of the B.1.617.1 variant S protein were assessed using B-cell epitope, T-cell epitope, chemokine, antigen, and IFN-γ response prediction tools. The 3D structures of some of the important proteins of SARS-CoV-2 have now been predicted [33-36]. [...] (Table 1), and its hydrophilic nature suggested its high stability; it is thus deemed to be a potential candidate for engineering a vaccine against COVID-19. The specific residues present in the protein act as antigenic epitopes [19]. The in silico tools ABCpred, CTLpred, ChemoPred, and the VaxiJen server were used to predict the immunogenic properties of the B.1.617.1 variant S protein (Table 2). Two potential B-cell linear epitopes were predicted in QUX03874.1 with scores of 0.8 or higher. This is in agreement with a recent study in which B-cell epitopes were predicted using Bepipred 2.0 [13, 37]. We also predicted and analyzed T-cell epitopes in the B.1.617.1 variant S protein and found two potential T-cell epitopes in QUX03874.1 with scores of 0.99 or higher. Hence, upon SARS-CoV-2 infection, both arms of adaptive immunity (B and T cells) are likely to elicit immunological responses. The predicted immunoepitopes may play an important role in the initiation of the immune response.

The topology of the B.1.617.1 variant S protein revealed a fold comprising α-helices and β-sheets (Figure 2). It is the most prominent protein structure spanning the plasma membrane and can form hydrogen bonds, which confer stability [38]. Superimposition of QUX03874.1 on 7KQE revealed a high degree of structural overlap and sequence similarity, which was corroborated by the lowest RMSD (Table 3; Figures 3(a) and 3(b)). An earlier study also reported an inverse correlation between structural and sequence identity and the RMSD value [39].
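Sequence identity between a target and its template is usually read off a pairwise alignment. The sketch below illustrates one common way of computing it, under the assumption that identity is counted over alignment columns in which both sequences have a residue. Other conventions divide by the full alignment length or by the shorter sequence, and the source does not state which was used. The function name and the aligned strings are hypothetical and are not taken from the QUX03874.1 and 7KQE alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Keep only columns in which both sequences carry a residue.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b)
        if a != gap and b != gap
    ]
    if not columns:
        return 0.0

    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Toy alignment: one substitution and one gapped column.
    print(percent_identity("ACDE-F", "ACDQGF"))
```

Reporting identity together with the convention used for the denominator makes values comparable across studies, which matters when identity is related to RMSD.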
The QMEAN, Z-score, and Ramachandran plot analysis validated the high quality of the 3D structure of QUX03874.1 (Table 4; Figure 4) and concurred with an earlier study on the hypothetical protein MG_377 in Mycoplasma genitalium [41]. The predicted model of QUX03874.1 could be used as a template for identifying protein interactions and for docking with ligands and putative drugs, which may aid in the discovery of novel drug molecules for fighting the viral disease. However, at present, the function of QUX03874.1 remains enigmatic and merits in-depth studies involving its three-dimensional X-ray structural analysis and posttranslational modifications.

An in silico approach was employed to decipher the structure and key immunogenic properties of the partial S protein of the SARS-CoV-2 B.1.617.1 variant. The study provides valuable insights that could be useful in the near future for the development of monoclonal antibodies, inhibitors, or vaccines targeting the S protein of the SARS-CoV-2 B.1.617.1 variant, as well as diagnostic tools; these insights warrant empirical validation by rigorous and stringent wet-lab experiments.

All the data in this manuscript are available from the corresponding author upon formal request.
Coronavirus pathogenesis
Epidemiology, genetic recombination, and pathogenesis of coronaviruses
Origin and evolution of pathogenic coronaviruses
Epidemiology and cause of severe acute respiratory syndrome (SARS) in Guangdong, People's Republic of China
Isolation of a novel coronavirus from a man with pneumonia in Saudi Arabia
Outbreak of pneumonia of unknown etiology in Wuhan, China: the mystery and the miracle
Pathogenicity and transmissibility of 2019-nCoV-A quick overview and comparison with other emerging viruses
A novel coronavirus from patients with pneumonia in China
A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster
Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China
Structural genomics of SARS-CoV-2 indicates evolutionary conserved functional regions of viral proteins
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
A sequence homology and bioinformatic approach can predict candidate targets for immune responses to SARS-CoV-2
Discovery of potential multi-target-directed ligands by targeting host-specific SARS-CoV-2 structurally conserved main protease
Structural basis of receptor recognition by SARS-CoV-2
Structure of SARS coronavirus spike receptor-binding domain complexed with receptor
Receptor recognition by the novel coronavirus from Wuhan: an analysis based on decade-long structural studies of SARS coronavirus
Protein identification and analysis tools on the ExPASy server
Prediction of continuous B-cell epitopes in an antigen using recurrent neural network
Prediction and classification of chemokines and their receptors
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines
Designing of interferon-gamma inducing MHC class-II binders
ExPASy: SIB bioinformatics resource portal
Bi-paratopic and multivalent VH domains block ACE2 binding and neutralize SARS-CoV-2
Clustal W and Clustal X version 2.0, Bioinformatics
Comparative protein structure modeling using MODELLER
PROCHECK: a program to check the stereochemical quality of protein structures
Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence
Thermostability and aliphatic index of globular proteins
A live attenuated severe acute respiratory syndrome coronavirus is immunogenic and efficacious in golden Syrian hamsters
Infection and vaccine-induced neutralizing antibody responses to the SARS-CoV-2
Crystal structure of SARS-CoV-2 main protease provides a basis for design of improved α-ketoamide inhibitors
Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein
Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2
Angiotensin-converting enzyme 2 (ACE2) as a SARS-CoV-2 receptor: molecular mechanisms and potential therapeutic target
IFN-gamma down-regulates MHC expression and antigen processing in a human B cell line
Expression and membrane integration of SARS-CoV E protein and its interaction with M protein
Relation between sequence and structure in membrane proteins
In silico structural and functional annotation of Mycoplasma genitalium hypothetical protein MG_377

The authors declare no conflict of interest.

Vijay Kumar Srivastava, Sanket Kaushik, and Gazal Bhargava contributed equally to this work.