key: cord-0801034-sp0qbjlo
authors: Aguilar-Gamboa, Franklin Rómulo; Salcedo-Mejía, Luis Alberto; Serquén-López, Luis Miguel; Mechan-Llontop, Marco Enrique; Tullume-Vergara, Percy Omar; Bonifacio-Briceño, Juan José; Salas-Asencios, Ramsés; Silva Díaz, Heber; Cárdenas, Juan P.
title: Genomic Sequences and Analysis of Five SARS-CoV-2 Variants Obtained from Patients in Lambayeque, Peru
date: 2021-01-07
journal: Microbiol Resour Announc
DOI: 10.1128/mra.01267-20
sha: b7792182d003798f7d1bd85aaf7319c15b0eba30
doc_id: 801034
cord_uid: sp0qbjlo

Here, we report the genomic sequences of five severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains obtained from nasopharyngeal samples from five patients who tested positive for coronavirus disease 2019 (COVID-19) in the Lambayeque region of Peru during early April 2020.

Here, we present the genomes of five severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (family Coronaviridae; genus Betacoronavirus; subgenus Sarbecovirus) variants sequenced from five patients from whom samples were taken in different districts in the Lambayeque region of Peru (La Victoria, Monsefú, Chiclayo, José Leonardo Ortiz, and Lambayeque), during early April 2020.

Ethical approval for sample collection and analysis protocols was given by the ethics committee of the Regional Lambayeque Hospital (code 0212-028-20CEI). This study has been registered in Proyectos de Investigación en Salud, del Instituto de Salud del Perú (PRISA; dependent on the National Health Ministry, Peru), with code 37EBF149-F123-447F-8970-680BF549DE1D.

Nasopharyngeal swabs were collected from SARS-CoV-2-positive patients (cycle threshold [CT] values obtained by qPCR, <25), and total RNA was isolated using the GenUP total RNA kit (BiotechRabbit, Germany). rRNA depletion was carried out using the Ribozero rRNA removal kit (Illumina). The cDNA libraries were generated using the TruSeq stranded total RNA LT kit (Illumina), using a single index and a fragment size range of 388 to 523 bp. Every sample was sequenced on an Illumina NextSeq 500 instrument (midoutput, 300 cycles), generating paired-end reads (150 bases long). Illumina sequencing was performed at Genoma Mayor (Universidad Mayor, Chile).

Raw paired-end reads were processed by using Trimmomatic (1) version 0.39, generating trimmed 129-bp reads. Viral genomes were assembled using a reference-guided approach, with the Wuhan Hu-1 strain genome (accession number MN908947) as the reference. Read mapping was done using SAMtools version 1.10 (2); mapped reads were analyzed by using Trinity (3) version 2.1.1 in the mode "--genome_guided_bam", obtaining near-full-length genomes (shortest assembly length, 29,854 bp). Genomes were annotated with the RATT tool (4), using the annotation of MN908947 as a reference.

Sequence variants were checked by two approaches for each sample. The first detected variants from the consensus assemblies by comparing each assembly against the aforementioned reference with the global alignment tool needle from the EMBOSS package (5). The second relied on read mapping and single-nucleotide polymorphism (SNP) variant calling against the reference with the tool bcftools call (6) version 1.7, using the parameters "--ploidy 1 -c". Both strategies gave the same results. The incomplete 5′ and 3′ ends of the assemblies (trimmed due to very low sequence coverage) did not affect the completeness of any coding sequences (CDS). Therefore, all assemblies were technically complete according to GenBank standards.
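The assembly-based check can be illustrated with a minimal Python sketch. It assumes the two gapped sequences of a pairwise global alignment (for example, parsed from needle output) are already available as equal-length strings. The function names, the toy sequences, and the toy CDS coordinates are hypothetical and are not taken from the study.

```python
def substitutions(ref_aln, qry_aln):
    """List (ref_pos, ref_base, alt_base) for mismatched alignment columns."""
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have equal length")
    found = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), qry_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only on reference bases (1-based coordinates)
        if "-" in (r, q) or "N" in (r, q):
            continue  # skip gap columns and ambiguous bases
        if r != q:
            found.append((ref_pos, r, q))
    return found


def covered_span(ref_aln, qry_aln):
    """Return (first, last) reference positions aligned to a query base."""
    ref_pos = 0
    covered = []
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":
            ref_pos += 1
            if q != "-":
                covered.append(ref_pos)
    if not covered:
        raise ValueError("query does not cover any reference position")
    return covered[0], covered[-1]


def cds_intact(span, cds_start, cds_end):
    """True if the covered span contains the whole CDS interval."""
    return span[0] <= cds_start and cds_end <= span[1]


# Toy alignment: the query lacks two 5' bases and one 3' base (hypothetical data).
ref = "ACGTACGTAC"
qry = "--GTATGTA-"
print(substitutions(ref, qry))                 # [(6, 'C', 'T')]
print(covered_span(ref, qry))                  # (3, 9)
print(cds_intact(covered_span(ref, qry), 4, 8))  # True
```

In this toy case the trimmed ends fall outside the hypothetical CDS interval (positions 4 to 8), so the CDS is still complete, which mirrors the reasoning applied to the real assemblies.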

In order to classify these Peruvian variants, the genomes were analyzed with several tools. According to PANGOLIN (Phylogenetic Assignment of Named Global Outbreak LINeages) (7), all five variants were classified in the B.1.1.1 lineage. According to the NextClade tool from Nextstrain (8), all variants were classified in the 20B clade. Those PANGOLIN and Nextstrain clusters were previously reported as containing representative groups of sequenced Peruvian SARS-CoV-2 strains (9). According to the aforementioned sequence variant process, there are between 12 and 14 mutations for each strain (Table 1), including the previously described mutations P323L in Nsp12 (P4715L in ORF1ab) and D614G in the spike protein; variants carrying these mutations are becoming increasingly abundant worldwide (10).
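The two names given for the Nsp12 mutation reflect different numbering schemes: P323L counts residues within the mature Nsp12 protein, whereas P4715L counts residues along the ORF1ab polyprotein. The short Python sketch below shows the conversion, along with the genomic codon position of spike residue 614. The two coordinates it relies on (Nsp12 starting at polyprotein residue 4393 and the spike CDS starting at genomic position 21563) are assumptions taken from the MN908947 reference annotation rather than from the text, and the function names are hypothetical.

```python
# Assumed coordinates from the MN908947 (Wuhan Hu-1) annotation; not stated in the text.
NSP12_START_IN_ORF1AB = 4393  # ORF1ab polyprotein residue of the first Nsp12 residue
SPIKE_CDS_START = 21563       # 1-based genomic position of the spike start codon


def nsp12_to_orf1ab(residue):
    """Convert an Nsp12 residue number to ORF1ab polyprotein numbering."""
    return NSP12_START_IN_ORF1AB + residue - 1


def codon_span(cds_start, residue):
    """Return the 1-based genomic (first, last) positions of a codon.

    Valid only for a CDS without frameshifts or introns, such as spike.
    """
    first = cds_start + 3 * (residue - 1)
    return first, first + 2


print(nsp12_to_orf1ab(323))              # 4715
print(codon_span(SPIKE_CDS_START, 614))  # (23402, 23404)
```

The simple codon arithmetic is applied only to spike here because ORF1ab contains a ribosomal frameshift, so residue-to-nucleotide conversion for Nsp12 would need an extra offset.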

Table 1 abbreviations: QC, quality checked; UTR, untranslated region; NA, not applicable.

1. Trimmomatic: a flexible trimmer for Illumina sequence data

2. The Sequence Alignment/Map format and SAMtools

3. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis

4. RATT: rapid annotation transfer tool

5. EMBOSS: the European Molecular Biology Open software suite

6. BCFtools/csq: haplotype-aware variant consequences

7. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology

8. Nextstrain: real-time tracking of pathogen evolution

9. Phylogenomic reveals multiple introductions and early spread of SARS-CoV-2 into Peru

10. Epidemiologically most successful SARS-CoV-2 variant: concurrent mutations in RNA-dependent RNA polymerase and spike protein

We dedicate this work to the memory of Juan José Bonifacio-Briceño, colleague and friend, to honor his friendship and legacy as a young scientist. J.P.C. holds the role of external advisor for Umbrella Genomics Company S.A.C. (Lima, Peru). J.P.C. received financial support from this company for the genome sequencing costs of this project.