key: cord-0865218-i9wjpbe7 authors: Cherian, Sarah; Potdar, Varsha; Jadhav, Santosh; Yadav, Pragya; Gupta, Nivedita; Das, Mousmi; Rakshit, Partha; Singh, Sujeet; Abraham, Priya; Panda, Samiran title: Convergent evolution of SARS-CoV-2 spike mutations, L452R, E484Q and P681R, in the second wave of COVID-19 in Maharashtra, India date: 2021-05-03 journal: bioRxiv DOI: 10.1101/2021.04.22.440932 sha: c107688b49bcaf15d65822e540dc63141d8db7cb doc_id: 865218 cord_uid: i9wjpbe7 As the global severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic expands, genomic epidemiology and whole genome sequencing are being constantly used to investigate its transmissions and evolution. In the backdrop of the global emergence of “variants of concern” (VOCs) during December 2020 and an upsurge in a state in the western part of India since January 2021, whole genome sequencing and analysis of spike protein mutations using sequence and structural approaches was undertaken to identify possible new variants and gauge the fitness of current circulating strains. Phylogenetic analysis revealed that the predominant clade in circulation was a distinct newly identified lineage B.1.617 possessing common signature mutations D111D, G142D, L452R, E484Q, D614G and P681R, in the spike protein including within the receptor binding domain (RBD). Of these, the mutations at residue positions 452, 484 and 681 have been reported in other globally circulating lineages. The structural analysis of RBD mutations L452R and E484Q along with P681R in the furin cleavage site, revealed that these may possibly result in increased ACE2 binding and rate of S1-S2 cleavage resulting in better transmissibility. The same two RBD mutations indicated decreased binding to select monoclonal antibodies (mAbs) and may affect their neutralization potential. Experimental validation against a wider panel of mAbs, sera from vaccinees and those that recovered from natural infection needs to be studied. The emergence of such local variants through the accumulation of convergent mutations during the COVID-19 second wave needs to be further investigated for their public health impact in the rest of the country and its possibility of becoming a VOC. The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) surface spike (S) protein mediates entry into host cells by binding to the host receptor angiotensin-converting enzyme 2 (ACE2) via its receptor-binding domain (RBD). Crystal structures of SARS-CoV-2 S protein or its RBD complexed with ACE2 from different hosts reveal that the RBD contains a core and a receptor-binding motif (RBM) which forms contacts with ACE2 1 . A number of naturally selected mutations in the RBM have been shown to affect infectivity, human-to-human transmission, pathogenesis and immune escape 2 . As the global SARS-CoV-2 pandemic expands, genomic epidemiology and whole genome sequencing are being constantly used to investigate the transmissions and evolution. The expansion of the genome-wide diversity resulted in delineation of the viral strains into clades, lineages, and sub-lineages. Clades G/GH/GR/GV/GRY as per the Global Initiative on Sharing All Influenza Data (GISAID) database (https://www.gisaid.org/) 3 India, enhanced sequencing was undertaken. This was to identify possible new variants and analyze spike protein mutants, in particular, to gauge the fitness of current circulating strains, using bioinformatics sequence and molecular structural approaches. The State of Maharashtra (latitude 19°39' N, longitude 75°18'E) is located in west central India, in the north western part of the Indian Subcontinent. Since the end of January 2021, a concentrated spurt in COVID-19 cases was noted in several districts of Maharashtra (Suppl. Fig. 1 ). As per the State Government directives, samples from international travelers and 5% of surveillance samples from the positive cases with Ct value <25 including those from clusters, long haulers, mild/moderate/severe and deceased cases, were referred to the ICMR-NIV, Pune for whole genome sequencing. Nasopharyngeal swabs (n=733, November 25, 2020-March 31, 2021) were processed for whole genome sequencing. Briefly, nucleic acid extraction was performed using 280 μ L of each sample in duplicate by Qiagen viral RNA extraction protocol and quantified RNA was further processed for template preparation using the Ion Chef System. Purified template beads were submitted to meta transcriptome next-generation sequencing (NGS) in the Ion S5 platform (ThermoFisher Scientific) using an Ion 540™ chip and the Ion Total RNA-Seq kit v2.0, as per the manufacturer's protocol (ThermoFisher Scientific) 8 . Sequence data was processed using the Torrent Suite Software (TSS) v5. 10 .1 (ThermoFisher Scientific, USA). Coverage analysis plugins were utilized to generate coverage analysis report for each of the samples. Referencebased reads gathering, and assembly were performed for all the samples using Iterative refinement meta-assembler (IRMA) 9 . A subset of the samples (were also sequenced on the Illumina machine and analyzed using on the CLC Genomics Workbench version 20 (CLC, Qiagen) as described elsewhere 10 . Of 733 whole genomes obtained, 598 were considered for different analyses based on coverage of ≥ 93% of the genome. For each whole genome sequence, the GISAID Clade assignment, was done using CoVsurver: Mutation Analysis of hCoV-19 (https://www.gisaid.org/epiflu-applications/covsurver-mutations-app/). For lineage assignment, the web application ie., Phylogenetic Assignment of Named Global Outbreak LINeages (PangoLIN) COVID-19 Lineage Assigner (https://pangolin.cog-uk.io/), was implemented 7 . The 598 genomes were aligned using MUSCLE and a Maximum Likelihood phylogenetic tree was constructed using IQ-tree v1. 6 11,12 , employing the GTR as the substitution model and 1,000 bootstrap replications. The average frequency of nonsynonymous mutations in the genomes over the period from 1st December, 2020 to 31 st March, 2021 was calculated, considering Wuhan as the reference strain. For further structural characterization of the S protein mutations, the crystal structure of SARS-CoV-2 S glycoprotein complexed with ACE2 was obtained from the protein data bank (PDB ID: 7A98 13 . The top ten mutations in the S protein were mapped using Biovia Discovery studio visualizer 2020. To assess the effect of noted RBD mutations on ACE2 binding, the crystal structure of SARS-CoV-2 spike RBD domain complexed with ACE2 (PDB ID: 6LZG 14 ) was used. For assessment of the noted mutations on binding to neutralizing antibodies, the SARS-CoV-2 spike RBD domain complexed with two selected mAbs REGN10933/ P2B-2F6 were retrieved (PDB ID: 6XDG; resolution 3.90Å and 7BWJ; resolution 2.65 Å respectively) 15, 16 . Point mutations were carried out using Biovia Discovery studio visualizer 2020 and the structures of the complexes were subjected to energy minimization using macro model tool in Schrodinger 2020 using default parameters. The molecular interactions between the RBD-ACE2 interface, within the RBD and between the neutralizing mAbs-RBD were analyzed using nonbonded interactions tool in Biovia Discovery studio visualizer 2020. The genomic surveillance for the spurt in the COVID-19 cases in Maharashtra was carried out to identify the circulating lineages/VOCs and identify possible functionally-significant mutations in the spike protein. Figure 1) . Phylogenetic analysis revealed four subclusters within B.1.617 that could be linked to mutations specific to the spike region (Figure 1 ). Among these, cluster B.1.617:C (n=107), included majority of the strains from eastern part of Maharashtra while B.1.617:B (n=80) also included sequences from major cities like Pune, Thane and Aurangabad, more on the western part of the State. The frequency of mutations L452R and E484Q within the RBD and mutations G142D and P681R within the spike but outside the RBD region, increased from January 2021 (Figure 2) . The P681R is noted in the S1-S2 furincleavage site. A mutation H1101D specific to clusters B.1.617: A and B.1.617:B, was not noted in January 2021 (Figure 1) . The B.1.617:C cluster possessed a T95I-specific mutation. Notably a small cluster of B.1.617 (n=21) emerged in late March and did not possess the E484Q mutation. This clustered with lineage B.1.596 and shared common mutations T19R and D950N in the spike protein. A synonymous mutation D111D was observed to be co-occurring with the RBD mutations L452R and E484Q and it was absent in the cluster that did not possess E484Q ( Figure 1 ). Based on the monthly averages of non-synonymous mutation events when compared to the Wuhan reference strain NC_045512.2, an increased frequency of non-synonymous mutations was noted in February 2021 (Figure 3 ). The key mutations in the S protein are mapped on a furin-cleaved structure of the S protein Figure 5C ). The intermolecular interactions of the wild-type complex indicate that L452 is involved in hydrophobic interactions with residues I103 and V105 of the heavy chain of the antibody. Residue E484 forms H-bonds with N33 and Y34 of the light chain as well as a salt bridge and H-bond interaction with R112 side chain of the light chain. The L452R mutation breaks the hydrophobic interactions with both residues I103 and V105 and also disrupts the H-bond and electrostatic interaction with R112. In addition to the three global VOCs, some of the global variants of interest (VUIs) include Structural basis of receptor recognition by SARS-CoV-2. Nature Receptor Recognition by the Novel Coronavirus from Wuhan: an Analysis Based on Decade-Long Structural Studies of SARS Coronavirus disease and diplomacy: GISAID's innovative contribution to global health. Global Challenges. Wiley Online Library Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus Evaluating the Effects of SARS-CoV-2 Spike Mutation D614G on Transmissibility and Pathogenicity Early transmissibility assessment of the N501Y mutant strains of SARS-CoV-2 in the United Kingdom A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nature Microbiology Phylogenetic classification of the whole-genome sequences of SARS-CoV-2 from India & evolutionary trends Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler Fullgenome sequences of the first two SARS-CoV-2 viruses from India MUSCLE: multiple sequence alignment with high accuracy and high throughput IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies Receptor binding and priming of the spike protein of SARS-CoV-2 for membrane fusion Structural and Functional Basis of SARS-CoV-2 Entry by Using Human ACE2 Studies in humanized mice and convalescent humans yield a SARS-CoV-2 antibody cocktail Human neutralizing antibodies elicited by SARS-CoV-2 infection Comprehensive mapping of mutations in the SARS-CoV-2 receptor-binding domain that affect recognition by polyclonal human plasma antibodies Multiple SARS-CoV-2 variants escape neutralization by vaccine-induced humoral immunity Comprehensive analysis of genomic diversity of SARS-CoV-2 in different geographic regions of India: an endeavour to classify Indian SARS-CoV-2 strains on the basis of co-existing mutations Global analysis of more than 50,000 SARS-CoV-2 genomes reveals epistasis between eight viral genes SARS-CoV-2 immune evasion by variant B.1.427/B.1.429. bioRxiv [Preprint