key: cord-0852920-kpnb9ldy authors: Kumar, Ballamoole Krishna; Rohit, Anusha; Prithvisagar, Kattapuni Suresh; Rai, Praveen; Karunasagar, Indrani; Karunasagar, Iddya title: Deletion in the C-terminal region of the envelope glycoprotein in some of the Indian SARS-CoV-2 genome date: 2020-11-06 journal: Virus Res DOI: 10.1016/j.virusres.2020.198222 sha: 4e0e7ec8d6e91d824f7e8c2137485c0493b2dd48 doc_id: 852920 cord_uid: kpnb9ldy

The envelope glycoprotein (E) is the smallest structural component of SARS-CoVs and plays an essential role in viral replication, from envelope formation to assembly. The in silico analysis of 2086 whole genome sequences from India performed in this study provides the first observation of extensive deletion of amino acid residues in the C-terminal region of the envelope glycoprotein in 34 Indian SARS-CoV-2 genomes. These amino acid deletions map to the homopentameric interface and the PDZ-binding motif (PBM) in the C-terminal region of the E protein. In 26 of these genomes the deletions lie immediately after the reverse primer binding region of the Charité protocol; hence their detection through RT-qPCR may not be hampered, and E gene-based RT-qPCR would still detect these isolates. Eight genomes from the State of Odisha had deletions even in the primer binding site. It is possible that the deletions in the C-terminal region of the E protein of these genomes are a result of adaptation to a new geographical area and host. Information on clinical status was available for only 9 of the 34 cases, and all of these were asymptomatic. However, further studies are essential to understand the functional consequences of amino acid deletions in the C-terminal region of the SARS-CoV-2 envelope protein for viral pathogenesis and host adaptation.

Coronavirus disease 2019 (COVID-19), caused by SARS-CoV-2, has become a pandemic affecting over 200 countries, with more than 33.8 million cases and over 1.01 million deaths worldwide.
The disease spreads mostly through droplets released when an infected person coughs, sneezes or speaks, and it spreads rapidly among contacts. SARS-CoV-2 is an enveloped, single-stranded, positive-sense RNA virus belonging to the genus Betacoronavirus (Malik et al., 2020). Like SARS-CoV and MERS-CoV, the 29.9 kb genome of SARS-CoV-2 encodes four major structural proteins, spike (S), envelope (E), membrane (M) and nucleocapsid (N), as well as 16 non-structural proteins (nsp1-16) and five to eight accessory proteins. The evidence currently available shows that the severity of COVID-19 differs significantly between populations and geographical locations. Asymptomatic infection was reported in more than 80% of the COVID-19-positive cases in India (Acharya and Porwal, 2020). Genomic data on Indian SARS-CoV-2 isolates are coming from different laboratories, and interim analyses show that the virus was introduced into India from multiple sources, such as China, Europe, the USA, Canada and the Middle East (Potdar et al., 2020; Singh and Sharma, 2020; Somasundaram et al., 2020). Analysis of 361 SARS-CoV-2 genomes from India revealed five clusters, four of which were known clades identified by Nextstrain: A2a, A3, B and B4. Of these genomes, 62% belonged to clade A2a, but 29% belonged to a distinct cluster, designated clade I/A3i, that has not been reported outside India (Banu et al., 2020). It has been suggested that the evolution of clade I/A3i is primarily determined by changes in the structural proteins, N and Table 1). Detailed information on the sequences used and their metadata is given in Supplementary Table S1.
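The screening step of this study, locating the E gene deletions in each genome and checking whether they fall inside the Charité RT-qPCR primer binding region, can be sketched as below. This is not the authors' published workflow; it is a minimal illustration assuming that each genome's E gene has already been aligned to the reference (gaps written as `-`) and that primer and probe sites are given as 1-based, inclusive intervals on the reference E gene. The function names and the toy coordinates are hypothetical and are not the real E_Sarbeco primer positions.

```python
from typing import Dict, List, Optional, Tuple

# 1-based, inclusive coordinates on the reference E gene.
Interval = Tuple[int, int]


def deletion_runs(ref_aln: str, qry_aln: str) -> List[Interval]:
    """Return runs of gap characters ('-') in the query, in reference coordinates.

    Both strings are assumed to come from the same alignment, so they must
    have equal length.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    runs: List[Interval] = []
    ref_pos = 0
    start: Optional[int] = None
    end = 0
    for r, q in zip(ref_aln, qry_aln):
        if r == "-":
            # Column with no reference base (query insertion or all-gap column):
            # it has no reference coordinate, so it neither extends nor ends a run.
            continue
        ref_pos += 1
        if q == "-":
            if start is None:
                start = ref_pos
            end = ref_pos
        elif start is not None:
            runs.append((start, end))
            start = None
    if start is not None:
        # Deletion extends to the end of the alignment (e.g. a C-terminal loss).
        runs.append((start, end))
    return runs


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed intervals share at least one position."""
    return a[0] <= b[1] and b[0] <= a[1]


def primer_hits(runs: List[Interval], sites: Dict[str, Interval]) -> Dict[Interval, List[str]]:
    """Map each deletion run to the names of the primer/probe sites it overlaps."""
    return {run: [name for name, site in sites.items() if overlaps(run, site)] for run in runs}


if __name__ == "__main__":
    # Toy alignment and HYPOTHETICAL primer coordinates, for illustration only.
    ref = "ACGTACGTACGTACGT"
    qry = "ACGTACGTAC------"
    sites = {"fwd": (1, 4), "rev": (7, 10)}
    runs = deletion_runs(ref, qry)
    print(runs)                      # [(11, 16)]
    print(primer_hits(runs, sites))  # {(11, 16): []}
```

In the toy run the deletion starts immediately after the hypothetical reverse site, so no primer site is hit, which is the situation described for 26 of the 34 genomes; a deletion starting at position 9 would instead be reported as overlapping `rev`, as for the eight Odisha genomes.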
As the success of PCR-based molecular diagnostics mainly depends on efficient primers and/or probes that specifically amplify the target gene, any genetic variations, especially in in 26 genomes (except the ones from the State of Odisha discussed above), hence their detection through RT-qPCR may not be hampered, and E gene-based RT-qPCR would still detect these isolates. Among them, 9 individuals were asymptomatic (Table 1) Table S1, of the 34 sequences with E gene deletions, 15 belonged to clade 19A, one to 19B and four each to 20A and 20B, while the remaining sequences could not be assigned to any of the clades defined by Nextstrain. Our data suggest that the C-terminal deletion in the E gene of SARS-CoV-2 is spread across different lineages and that this deletion event may have occurred independently in different lineages and geographical locations. Further, sequential B cell epitopes on the E protein were predicted using BepiPred-2.0 (http://www.cbs.dtu.dk/services/BepiPred/index.php). This revealed highly conserved antigenic determinant regions in the N-terminal region (SEET) and the C-terminal region (YVYSRVKNLNSSRVP) of the E protein (Fig. 1). It is also important to highlight that we were unable to map the C-terminal antigenic determinant region in the E proteins of those 34 SARS-CoV-2 isolates, as 25-55 amino acid residues were deleted from their C-terminal region. Based on PROSITE analyses (https://prosite.expasy.org/), the SARS-CoV-2 E protein was predicted to have two N-glycosylation sites, at N48 and N66, but both were absent in all 34 isolates with extensive deletions/gaps in the C-terminal region. Recent study of The in silico analysis performed in this study provides the first observation of extensive deletion of amino acid residues in the C-terminal region of the envelope glycoprotein in some of the Indian SARS-CoV-2 genomes.
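The PROSITE prediction can be approximated with a simple pattern scan. The sketch below is an assumption, not the authors' procedure: it translates PROSITE's N-glycosylation pattern (PS00001, N-{P}-[ST]-{P}) into a Python regular expression and applies it to a hard-coded E protein sequence. The `E_REF` string is assumed to be the reference SARS-CoV-2 E protein and should be checked against the annotation of the reference genome NC_045512.2 before use; the names `N_GLYC` and `n_glyc_sites` are hypothetical.

```python
import re
from typing import List

# PROSITE PS00001 (N-glycosylation site) pattern N-{P}-[ST]-{P}, written as a
# regular expression; the zero-width lookahead also reports overlapping motifs.
N_GLYC = re.compile(r"(?=N[^P][ST][^P])")

# ASSUMED reference SARS-CoV-2 E protein (75 aa), split into 10-residue chunks;
# verify against the E annotation of NC_045512.2 before relying on it.
E_REF = (
    "MYSFVSEETG" "TLIVNSVLLF" "LAFVVFLLVT" "LAILTALRLC"
    "AYCCNIVNVS" "LVKPSFYVYS" "RVKNLNSSRV" "PDLLV"
)


def n_glyc_sites(protein: str) -> List[int]:
    """Return 1-based positions of the Asn in each N-{P}-[ST]-{P} motif."""
    return [m.start() + 1 for m in N_GLYC.finditer(protein.upper())]


if __name__ == "__main__":
    print(n_glyc_sites(E_REF))       # [48, 66]
    print(n_glyc_sites(E_REF[:50]))  # [] after removing the last 25 residues
```

Because the motif spans four residues, the toy 50-residue truncation also loses the N48 site (its motif needs residue 51), which is in line with both sites being absent from the 34 truncated proteins.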
It is possible that the deletions in the C-terminal region of the E protein of these genomes are a result of adaptation to a new geographical area and host. Sequencing samples with low viral titres can sometimes lead to assemblies with low coverage and/or spurious gaps in the genome, but this is unlikely here, since gaps in the same region were found in 34 isolates from different geographical locations, sequenced by different laboratories. It is also possible that these variants have reduced virulence, since nine of the 34 individuals from whom isolates were obtained were asymptomatic; clinical information for the others was not available. The lack of travel history in the infected individuals suggests that the virus isolates might have been circulating in this region for some time. However, further studies are essential to understand the functional consequences of amino acid deletions in the C-terminal region of the SARS-CoV-2 envelope protein for viral pathogenesis and host adaptation.
Nothing to declare

A vulnerability index for the management of and response to the COVID-19 epidemic in India: an ecological study.
Evaluation of RdRp & ORF-1b-nsp14-based real-time RT-PCR assays for confirmation of SARS-CoV-2 infection: An observational study.
A distinct phylogenetic cluster of Indian SARS-CoV-2 isolates.
Identification of a Golgi complex-targeting signal in the cytoplasmic tail of the severe acute respiratory syndrome coronavirus envelope protein.
Pathogenicity of severe acute respiratory coronavirus deletion mutants in hACE-2 transgenic mice.
The PDZ-binding motif of severe acute respiratory syndrome coronavirus envelope protein is a determinant of viral pathogenesis.
Biochemical and functional characterization of the membrane association and membrane permeabilizing activity of the severe acute respiratory syndrome coronavirus envelope protein.
Genomic characterization and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding.
Emerging novel coronavirus (2019-nCoV)-current scenario, evolutionary perspective based on genome analysis and recent developments.
Genomic epidemiology reveals multiple introductions and spread of SARS-CoV-2 in the Indian state of Karnataka.
Genomic analysis of SARS-CoV-2 strains among Indians returning from Italy, Iran & China, & Italian tourists in India.
Coronavirus envelope protein: current knowledge.
Severe acute respiratory syndrome-coronavirus 2 and novel coronavirus disease 2019: An extraordinary pandemic.
Genomics of Indian SARS-CoV-2: Implications in genetic diversity, possible origin and spread of virus. medRxiv.
A SARS-CoV-2 variant with the 12-bp deletion at E gene. Emerging Microbes & Infections.
The SARS coronavirus E protein interacts with PALS1 and alters tight junction formation and epithelial morphogenesis.
Diagnostic detection of 2019-nCoV by real-time RT-PCR accessed on
Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China.

The authors are grateful to Nitte (Deemed to be University) for providing computing infrastructure for the execution of this research.