key: cord-0913299-mmi56dye
authors: Salles, Tiago Souza; Cavalcanti, Andrea Cony; da Costa, Fábio Burack; Dias, Vanessa Zaquieu; de Souza, Leandro Magalhães; de Meneses, Marcelo Damião Ferreira; da Silva, José Antônio Suzano; Amaral, Cinthya Domingues; Felix, Jhonatan Ramos; Pereira, Duleide Alves; Boatto, Stefanella; Guimarães, Maria Angélica Arpon Marandino; Ferreira, Davis Fernandes; Azevedo, Renata Campos
title: Genomic surveillance of SARS-CoV-2 Spike gene by sanger sequencing
date: 2022-01-20
journal: PLoS One
DOI: 10.1371/journal.pone.0262170
sha: 2af5b08b0d7ac7b4a13731be940b842feff3b6e4
doc_id: 913299
cord_uid: mmi56dye
SARS-CoV-2, the virus responsible for the ongoing COVID-19 pandemic, shows particular evolutionary dynamics and extensive polymorphism, mainly in the Spike (S) gene. Monitoring S gene mutations is crucial for successful control measures and for detecting variants that can evade vaccine immunity. Even after the cost reductions that came with the pandemic, next-generation sequencing methodologies remain unavailable to a large number of scientific groups. Therefore, to support the urgent surveillance of the SARS-CoV-2 S gene, this work describes a new feasible protocol for complete nucleotide sequencing of the S gene using the Sanger technique. Such a methodology could be easily adopted by any laboratory with experience in sequencing, contributing to effective surveillance of SARS-CoV-2 spread and evolution.
The virus responsible for the atypical pneumonia first reported in China at the end of 2019 was classified among the severe acute respiratory syndrome-related coronaviruses, genus Betacoronavirus, family Coronaviridae, and was named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Coronaviruses are enveloped, positive-sense, single-stranded RNA viruses with genomes of approximately 30,000 bases, the largest RNA genomes identified to date [1].
The SARS-CoV-2 genome has several ORFs; the first, ORF1a/b, lies at the 5' end of the RNA and encodes the non-structural proteins (nsP1-nsP16). The 3' end holds the genes of the four structural proteins (E, M, N and S) and the accessory proteins. In the mature virus particle, protein S, a homotrimeric type I fusion glycoprotein, is located on the particle surface and is responsible for binding to the cell receptor. In humans, angiotensin-converting enzyme 2 (ACE-2) was identified as the primary receptor for SARS-CoV-2. Several research groups have solved the complete structure of the SARS-CoV-2 S protein, bound or not to the ACE-2 receptor [2]. This protein has approximately 1,273 amino acids, and its domains are well delimited. Because of their relevance for virus attachment and entry into susceptible cells, mutations in the receptor-binding domain (RBD) receive the most attention. In addition, mutations in other domains, such as the amino (N)-terminal domain (NTD), can also lead to conformational changes in the S protein structure and affect its function [3]. SARS-CoV-2 has particular evolutionary dynamics, and extensive polymorphism is observed. However, the frequency of mutation across the SARS-CoV-2 genome is not uniform. Single-nucleotide polymorphisms (SNPs) are mainly observed in protein S, RNA polymerase, RNA primase, and nucleoprotein [4]. According to the World Health Organization (WHO), isolates presenting amino acid changes with a suspected or confirmed phenotypic impact are considered variants of interest (VOI).
Furthermore, these variants are classified as variants of concern (VOC) when they are associated with increased transmissibility or virulence, changes in the clinical presentation of COVID-19, or reduced effectiveness of containment measures, such as escape from diagnostic tools or decreased effectiveness of vaccines and therapies [5]. Since the S protein is the primary target of neutralizing antibodies, monitoring insertions, deletions, or substitutions of amino acids can reveal variants with the potential to evade vaccine immunity. In this context, genomic information is quickly shared through initiatives like the GISAID platform, where variants are counted and georeferenced [6]. Up to June 2021, four variants had been classified as VOC: B.1.1.7 (Alpha), B.1.351 (Beta), P.1 (Gamma), and B.1.617+ (Delta), first detected in the United Kingdom, South Africa, Brazil, and India, respectively (Fig 1). Early identification of VOCs could provide valuable auxiliary information for decision making, allowing earlier action to curb the spread of the virus, such as reinforcing mobility restrictions, or relaxing such measures in areas where the variants are not present. Despite the reduction in the costs of next-generation sequencing (NGS), implementing this technology still requires a significant financial investment, and the price per sample remains high for developing countries. The discrepancy between countries in the number of sequences deposited in databases reflects these difficulties, as shown in Table 1. Unlike NGS methodologies, nucleotide sequencing based on the Sanger technique is widely available worldwide. In addition, the costs of sequencing small fragments are affordable. Therefore, to support the urgent surveillance of changes in the SARS-CoV-2 S gene, this work describes a feasible protocol for complete nucleotide sequencing of the S gene using the Sanger technique.
Thus, any laboratory with experience in sequencing can adopt this protocol.
This work was previously approved by the Ethics Committee of Clementino Fraga Filho University Hospital (HUCFF/UFRJ) (number: 4.546.307). To evaluate the protocol, three samples from patients with confirmed COVID-19 and high viral load (Ct value < 20) were randomly selected. The patients were admitted to different hospitals in Rio de Janeiro, and a nasopharyngeal swab was collected from each to confirm the clinical diagnosis at the Rio de Janeiro Public Health Reference Laboratory (LACEN-RJ). Human samples were used after the conclusion of the diagnostic investigation. All patients' personal information was anonymized; only the municipalities of residence were disclosed. Therefore, the ethics committee waived the requirement for informed consent from patients.
Viral RNA was obtained from 200 μL of respiratory secretion samples collected with nasopharyngeal swabs, using the commercial kit MagMax Viral Pathogen (Thermo Fisher, USA) on the automated equipment King Fisher Apex (Thermo Fisher, USA) according to the manufacturer's instructions. Samples from suspected COVID-19 cases were tested in the diagnostic routine of the Noel Nutels Central Public Health Laboratory (LACEN-RJ) using the SARS-CoV-2 Duplex Kit (E/RP), Bio-Manguinhos (Fiocruz, Brazil). The reactions were performed using the QuantStudio 5 (Applied Biosystems, Thermo Fisher, USA). Samples with Ct values below 20 were selected for sequencing.
Six sets of primers targeting the S segment and two sets flanking it were designed based on the sequences deposited in GISAID until September 2020. Sequences of 29 samples from 13 regions were aligned using the Clustal W program, and conserved regions were chosen. An overlap of 100 nucleotides between adjacent fragments was designed (Fig 4). Table 2 presents the primer sequences used.
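The tiling idea behind this primer design, in which consecutive amplicons share about 100 nucleotides, can be sketched as follows. This is only an illustration and not the authors' design procedure: the function name `tile_amplicons`, the amplicon length, and the coordinates in the usage line are hypothetical, and the real primer positions are those listed in Table 2.

```python
def tile_amplicons(start, end, amplicon_len, overlap):
    """Return 1-based inclusive (start, end) windows covering [start, end].

    Hypothetical helper: consecutive windows share `overlap` positions, and
    the last window is shifted back so that it ends exactly at `end`.
    """
    if overlap >= amplicon_len:
        raise ValueError("overlap must be smaller than the amplicon length")
    if end - start + 1 <= amplicon_len:
        return [(start, end)]  # region fits in a single amplicon

    step = amplicon_len - overlap
    windows = []
    pos = start
    # Add full-length windows while they end before the region does.
    while pos + amplicon_len - 1 < end:
        windows.append((pos, pos + amplicon_len - 1))
        pos += step
    # Final window anchored on the region end (overlap may exceed `overlap`).
    windows.append((end - amplicon_len + 1, end))
    return windows


# Toy region of 1,000 nt tiled with 400-nt amplicons overlapping by 100 nt.
print(tile_amplicons(1, 1000, 400, 100))
```

In practice the window boundaries would then be adjusted so that each primer falls in a conserved region of the alignment, which is why real amplicon lengths and overlaps vary.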
Standard RT-PCR was performed using the Superscript III one-step RT-PCR kit (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions, with 0.7 μM primers and the temperature conditions given in Table 3. Amplification of the fragments was visualized by 1.5% agarose gel electrophoresis. The samples were quantified for sequencing with the NanoDrop One microvolume UV-Vis spectrophotometer (Thermo Scientific). The nucleotide sequences were determined from 200 ng of amplicon using the Big Dye Terminator kit 3.1 (Applied Biosystems), following the manufacturer's procedure. Amplicons were sequenced in the ABI 3730 genetic analyzer (Applied Biosystems, USA) following the manufacturer's protocol. Raw sequence data were aligned, edited, and assembled using the BioEdit Sequence Alignment Editor, version 7.0.5.3.
The protocol described in this peer-reviewed article is published on protocols.io, https://dx.doi.org/10.17504/protocols.io.bx6kprcw, and is included for printing as S1 File with this article.
This methodology covered 100% of the S gene (3,822 bp). The sequences obtained were deposited in GISAID under accession numbers EPI_ISL_4496739, EPI_ISL_4497141, and EPI_ISL_4497286. All eight primer sets produced single amplicons for the three samples used to evaluate this protocol (Fig 4A); therefore, the sequencing reactions could be performed without extracting the bands from the agarose gel. In addition, no mismatch in the primer regions that could lead to the escape of known VOCs was observed (S2 File). The samples sequenced in this study originated from Rio de Janeiro city, Santo Antônio de Pádua, and Seropédica, in Rio de Janeiro state. The obtained sequences were aligned with reference sequences of each VOC in order to detect and compare mutations.
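The comparison step can be illustrated with a minimal sketch that translates an aligned reference and sample sequence and reports amino acid substitutions in the usual notation (for example E484K). This is not the authors' pipeline, which used BioEdit: the function names `translate` and `substitutions` are hypothetical, and the sketch assumes two gap-free, in-frame nucleotide sequences of equal length, so insertions and deletions are not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt_seq):
    """Translate an in-frame nucleotide sequence; unknown codons become 'X'."""
    nt_seq = nt_seq.upper().replace("U", "T")
    usable = len(nt_seq) - len(nt_seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(nt_seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def substitutions(ref_nt, sample_nt):
    """List amino acid substitutions (e.g. 'E2K') between aligned sequences.

    Assumption: both sequences are gap-free, in frame, and of equal length.
    Positions with ambiguous codons ('X') are skipped.
    """
    if len(ref_nt) != len(sample_nt):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(translate(ref_nt), translate(sample_nt))
    return [
        f"{ref_aa}{pos}{sample_aa}"
        for pos, (ref_aa, sample_aa) in enumerate(pairs, start=1)
        if ref_aa != sample_aa and "X" not in (ref_aa, sample_aa)
    ]


# Toy three-codon example (not real Spike sequence): M-E-D versus M-K-G.
print(substitutions("ATGGAAGAT", "ATGAAAGGT"))
```

Applied to a full S gene alignment, the reported positions would correspond to Spike residue numbers only if the reference starts at the first codon of the gene.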
The Spike protein from the Rio de Janeiro city sample displayed the same amino acid changes found in the reference sequence of the Gamma variant, suggesting that this sample probably belongs to the P.1 lineage (Table 4). According to the literature, the P.1 lineage (Gamma) emerged in Manaus, Amazonas, evolving from a B.1.1.28 clade in late November 2020, and replaced its parental lineage in less than two months [7]. We found a strain with a Spike protein similar to that of the P.1 lineage circulating in Rio de Janeiro as early as February 2021. The samples from Santo Antônio de Pádua and Seropédica did not show mutation patterns similar to the Gamma VOC (Table 4); however, they presented some important mutations, such as E484K and D614G (Fig 5). The change from glutamic acid to lysine at amino acid position 484 of the Spike protein (E484K) had already been recorded 228,871 times (4.27% of all samples with a Spike sequence) in 166 countries, according to GISAID Spike Glycoprotein Mutation Surveillance. This mutation has been reported in the literature to be related to enhanced host receptor binding [8] and antigenic drift [9], either alone or in association with other mutations [10]. The D614G mutation is widespread and had already been recorded 5,285,437 times (98.51% of all samples with a Spike sequence) in 204 countries. It has been reported to be related to increased infectivity of SARS-CoV-2, higher viral loads, increased replication fitness, and virulence [11, 12]. Apart from these mutations of high importance, the sequence from Seropédica also presented some rare mutations. The amino acid substitutions D775V, T866P, and M869K are present in fewer than three sequences in the GISAID database. The effects of these mutations are still unknown.
Due to its essential role in establishing infection, as well as in inducing the immune response, genomic surveillance of the S protein of SARS-CoV-2 is of paramount importance.
Monitoring the emergence of new variants, and the interactions between their mutations, allows the scientific community to develop better strategies to control the pandemic. The count of genomic sequences obtained in each country reveals a vast disproportion that becomes evident in surveillance platforms like GISAID. One of the reasons for this disparity is the limited access of most groups to NGS methodologies. Therefore, this work describes a protocol for complete nucleotide sequencing of the S gene using the Sanger technique, which could be helpful for continued tracking of SARS-CoV-2 protein S evolution.
1. A Novel Coronavirus from Patients with Pneumonia in China
2. Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation
3. Evolutionary Dynamics and Dissemination Pattern of the SARS-CoV-2 Lineage B.1.1.33 During the Early Pandemic Phase in Brazil
4. Genotyping coronavirus SARS-CoV-2: methods and implications
5. World Health Organization: COVID-19 weekly epidemiological update
6. disease and diplomacy: GISAID's innovative contribution to global health. Global challenges
7. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence
8. Deep Mutational Scanning of SARS-CoV-2 Receptor Binding Domain Reveals Constraints on Folding and ACE2 Binding
9. Increased Resistance of SARS-CoV-2 Variants B.1.351 and B.1.1.7 to Antibody Neutralization
10. SARS-CoV-2 501Y.V2 escapes neutralization by South African COVID-19 donor plasma. bioRxiv: the preprint server for biology
11. Tracking Changes in SARS-CoV-2 Spike: Evidence that D614G Increases Infectivity of the COVID-19 Virus
12. SARS-CoV-2 D614G variant exhibits efficient replication ex vivo and transmission in vivo
We want to thank all health professionals, especially LACEN-RJ staff, for their collaboration during the implementation of this protocol and for all efforts in facing the COVID-19 pandemic.