key: cord-0692321-fytc8op9 authors: Saha, Priyanka; Banerjee, Arup Kumar; Tripathi, Prem Prakash; Srivastava, Amit Kumar; Ray, Upasana title: A virus that has gone viral: Amino acid mutation in S protein of Indian isolate of Coronavirus COVID-19 might impact receptor binding and thus infectivity date: 2020-04-11 journal: bioRxiv DOI: 10.1101/2020.04.07.029132 sha: 13ccf8271af099cb86ef23b4441e0ef9f77bb4ec doc_id: 692321 cord_uid: fytc8op9 Since 2002, beta coronaviruses (CoV) have caused three zoonotic outbreaks: SARS-CoV in 2002, MERS-CoV in 2012, and the recent outbreak of SARS-CoV-2 late in 2019 (also named COVID-19, novel coronavirus 2019 or nCoV2019). Spike (S) protein, one of the structural proteins of this virus, plays a key role in receptor (ACE2) binding and thus virus entry. This protein has therefore attracted scientists for detailed study and therapeutic targeting. As the 2019 novel coronavirus takes its course throughout the world, more and more sequence analyses are being done and genome sequences are being deposited in various databases. From India, two clinical isolates have been sequenced and their full genomes deposited in GenBank. We have performed sequence analyses of the spike protein of the Indian isolates and compared them with that of isolates from Wuhan, China (where the outbreak was first reported). While all the sequences of Wuhan isolates are identical, we found point mutations in the Indian isolates. Of the two isolates, one was found to harbour a mutation in its receptor binding domain (RBD) at position 407. At this site arginine (a positively charged amino acid) was replaced by isoleucine (a hydrophobic amino acid that is also a C-beta branched amino acid). This mutation was seen to change the secondary structure of the protein in that region, and this can potentially alter receptor binding of the virus. Although this finding needs further validation and more sequencing, the information might be useful in rational drug design and vaccine engineering.
A virus gone viral. The first case of COVID-19 was reported in December 2019 in Wuhan (China). The disease has since spread worldwide and become a pandemic, with the highest number of deaths in Italy, although initially the highest mortality was reported from China [1]. According to a WHO report, as of 02.04.2020 there were 823,626 confirmed COVID-19 cases and 40,598 deaths, including cases that were either locally transmitted or imported [2]. Published reports suggest that SARS-CoV-2 shares the highest similarity with bat SARS-CoV [3]. Scientists across the globe are trying to elucidate the genome characteristics using phylogenetic, structural and mutational studies [4]. Spike protein, one of the key proteins of SARS-CoV-2, is directly involved in virus infection through receptor recognition, attachment, binding and entry [5] [6] [7]. Sequence analyses of the spike protein can provide information that is instrumental in drug and vaccine development. In the present work, we retrieved S protein sequences of SARS-CoV-2 from different geographical locations to identify notable features of the S protein, especially in Indian isolates. These analyses include identification of mutational signatures and their correlation with virus infection. Our analyses show unique point mutations in the spike protein of the Indian subtypes. Since COVID-19 (SARS-CoV-2) started from Wuhan, China, we began our analyses with spike protein sequences from Wuhan. For our study we considered all the full-length sequences that were available in GenBank. We first compared 17 available S protein sequences from Wuhan. Since they showed 100% sequence similarity, we used one of them for our further analyses. Since Italy has also been severely affected by COVID-19, we included a sequence from Italy in our study. In this paper we have focussed on COVID-19 isolates from India.
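The step of confirming that a set of sequences is identical before choosing one as the reference can be sketched as below. This is only a minimal illustration, not the procedure the authors ran (they used BLAST and Clustal tools); the function names, the isolate labels and the short toy sequences are all hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of matching positions between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


def pick_representative(sequences):
    """Return (id, sequence) of one representative if all sequences are identical.

    `sequences` maps an isolate label to its protein sequence.
    Raises ValueError if any sequence differs from the first one.
    """
    if not sequences:
        raise ValueError("no sequences given")
    ids = sorted(sequences)
    reference = sequences[ids[0]]
    for seq_id in ids[1:]:
        if sequences[seq_id] != reference:
            raise ValueError(f"{seq_id} differs from {ids[0]}")
    return ids[0], reference


# Hypothetical toy data: three short, identical fragments standing in for
# the full-length S protein sequences.
toy_wuhan = {
    "wuhan_toy_1": "MFVFLVLLPLVSSQ",
    "wuhan_toy_2": "MFVFLVLLPLVSSQ",
    "wuhan_toy_3": "MFVFLVLLPLVSSQ",
}

rep_id, rep_seq = pick_representative(toy_wuhan)
print(rep_id, percent_identity(rep_seq, toy_wuhan["wuhan_toy_3"]))
```

Raising an error when any sequence differs makes the choice of a single representative explicit rather than silent.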
To date, only two complete genomes of Indian COVID-19 isolates have been submitted to the database. For our sequence alignments we used NCBI BLAST, CLUSTAL W and CLUSTAL OMEGA. To predict secondary structure, we used the CFSSP (Chou and Fasman secondary structure prediction) server. The Mutprep server was used to analyse the mutation. Jmol and ConSurf tools were used to predict the structure of the proteins. PyMOL standalone software was used to visualize the structure and understand the pattern of bonding. Further kinetics and structure analyses were performed with the DynaMut server and Chimera version 11.
SARS-CoV-2 sequence data is expanding rapidly in the databases as the virus spreads worldwide. Although many sequences from various countries have been deposited, only a limited number of full genome sequences are available from most countries. This virus has infected people in many countries, including China, Italy, Spain, USA, Germany, France, United Kingdom and India, and the data are updated almost regularly by the World Health Organization (WHO). As of now, compared to many countries, the rate of transmission is comparatively controlled in India.
To compare the Indian isolates, we aligned the S protein sequences of these isolates with Wuhan isolates and a sequence from Italy. While the Wuhan and Italian isolates matched completely, we found a few mutations in Indian isolates 29 and 166, as shown in Figure 2. Tertiary structure analyses showed that the mutation introduces an additional oxygen molecule to the next residue. The protein stability score drops sharply (-4.08), and with it the electrostatic force. Such a condition makes the protein flexible and might affect interaction with the receptor. Alteration of the structure will cause a shift in the hydrogen bonds and in the bond angle, two main prerequisites for strong interaction with the receptor.
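The comparison of aligned S protein sequences to find point mutations can be illustrated with a short sketch. It assumes the two sequences have already been aligned (for example by one of the Clustal tools) and are given as equal-length strings with `-` for gaps. The function names, the simplified side-chain groupings and the toy sequences are hypothetical and are not taken from the paper's pipeline.

```python
# Simplified side-chain groupings, an assumption made for illustration only.
POSITIVE = set("KRH")
NEGATIVE = set("DE")
HYDROPHOBIC = set("AVILMFWC")
BETA_BRANCHED = set("VIT")  # valine, isoleucine, threonine


def side_chain_class(residue):
    """Coarse class of an amino acid given its one-letter code."""
    if residue in POSITIVE:
        return "positively charged"
    if residue in NEGATIVE:
        return "negatively charged"
    if residue in HYDROPHOBIC:
        return "hydrophobic"
    return "polar/other"


def call_substitutions(ref_aligned, query_aligned):
    """List (ref_residue, ref_position, query_residue) for each substitution.

    Positions are 1-based and count only reference residues, so gap
    columns in the reference do not shift the numbering.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have equal length")
    substitutions = []
    position = 0
    for ref_res, qry_res in zip(ref_aligned, query_aligned):
        if ref_res != "-":
            position += 1
        if ref_res == "-" or qry_res == "-":
            continue  # insertions and deletions are not substitutions
        if ref_res != qry_res:
            substitutions.append((ref_res, position, qry_res))
    return substitutions


def describe(substitution):
    """Format a substitution such as R407I with its side-chain classes."""
    ref_res, position, qry_res = substitution
    label = (f"{ref_res}{position}{qry_res}: "
             f"{side_chain_class(ref_res)} -> {side_chain_class(qry_res)}")
    if qry_res in BETA_BRANCHED:
        label += " (C-beta branched)"
    return label


# Hypothetical toy alignment: one substitution and one insertion in the query.
toy_ref = "MKRA-LV"
toy_qry = "MKIAGLV"
toy_subs = call_substitutions(toy_ref, toy_qry)
for sub in toy_subs:
    print(describe(sub))
```

On real data the reference would be the chosen Wuhan sequence and the query an Indian isolate, so an arginine to isoleucine change at position 407 would be reported in the same `R407I` style.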
With the mutation, the hydrogen potential tends to increase from 10 to 13.2. For isolate 166 from India, we found a different mutation at position 930 of the spike protein (Figure 2C). Here alanine (A) was substituted by valine (V) at position 930 (A930V). Since both amino acids are hydrophobic in nature, any change caused by this mutation might be masked upon tertiary structure formation and thus might not impose a functional change in the protein, i.e. it would be a conservative mutation. Despite this possibility, valine has some unique characteristics. Valine is one of the C-beta branched amino acids, like threonine and isoleucine. C-beta branched amino acids are bulkier towards the main chain, and it is difficult for them to attain alpha helical conformations. Such amino acids have restricted conformations and are destabilizing in nature, causing distortion in the local helix backbone [9]. The S protein of SARS-CoV-2 has two domains: S1 and S2 [8]. While S1 has the RBD and is involved in receptor binding,
Authors declare no conflicts of interest.
The origin, transmission and clinical therapies on coronavirus disease 2019 (COVID-19) outbreak-an update on the status
Fractal kinetics of Covid-19 pandemics (with update 3/1/20), MedRxiv preprint
Repurposing therapeutics for COVID-19: Supercomputer-based docking to the SARS-CoV-2 viral spike protein and viral spike protein-human ACE2 interface
Preliminary identification of potential vaccine targets for the COVID-19 coronavirus (SARS-CoV-2) based on SARS-CoV immunological studies
Strong evolutionary convergence of receptor-binding protein spike between COVID-19 and SARS-related coronaviruses, bioRxiv
Role of changes in SARS-CoV-2 spike protein in the interaction with the human ACE2 receptor: An in silico analysis
Composition and divergence of coronavirus spike proteins and host ACE2 receptors predict potential intermediate hosts of SARS-CoV-2
Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: implication for development of RBD protein as a viral attachment inhibitor and vaccine
Stabilizing and Destabilizing Effects of Placing Beta-Branched Amino Acids in Protein Alpha-Helices
We would like to thank Dr. Anupam Das Talukdar, Department of Life Sciences and Bioinformatics, Assam University for sharing server. We also thank Department of Biotechnology, DBT for funding to PS. CSIR is also acknowledged.