key: cord-0755023-jwl4ls80
authors: Davis, James J.; Long, S. Wesley; Christensen, Paul A.; Olsen, Randall J.; Olson, Robert; Shukla, Maulik; Subedi, Sishir; Stevens, Rick; Musser, James M.
title: Analysis of the ARTIC Version 3 and Version 4 SARS-CoV-2 Primers and Their Impact on the Detection of the G142D Amino Acid Substitution in the Spike Protein
date: 2021-12-08
journal: Microbiology spectrum
DOI: 10.1128/spectrum.01803-21
sha: 765eeb1df88550ccbaf8facbdb779e3f5e27582f
doc_id: 755023
cord_uid: jwl4ls80
The ARTIC Network provides a common resource of PCR primer sequences and recommendations for amplifying SARS-CoV-2 genomes. The initial tiling strategy was developed with the reference genome Wuhan-01, and subsequent iterations have addressed areas of low amplification and sequence drop out. Recently, a new version (V4) was released, based on new variant genome sequences, in response to the realization that some V3 primers were located in regions with key mutations. Herein, we compare the performance of the ARTIC V3 and V4 primer sets with a matched set of 663 SARS-CoV-2 clinical samples sequenced with an Illumina NovaSeq 6000 instrument. We observe general improvements in sequencing depth and quality, and improved resolution of the SNP causing the D950N variation in the spike protein. Importantly, we also find nearly universal presence of spike protein substitution G142D in Delta-lineage samples. Due to the prior release and widespread use of the ARTIC V3 primers during the initial surge of the Delta variant, it is likely that the G142D amino acid substitution is substantially underrepresented among early Delta variant genomes deposited in public repositories. In addition to the improved performance of the ARTIC V4 primer set, this study also illustrates the importance of the primer scheme in downstream analyses.
IMPORTANCE ARTIC Network primers are commonly used by laboratories worldwide to amplify and sequence SARS-CoV-2 present in clinical samples.
As new variants have evolved and spread, it was found that the V3 primer set poorly amplified several key mutations. In this report, we compare the results of sequencing a matched set of samples with the V3 and V4 primer sets. We find that adoption of the ARTIC V4 primer set is critical for accurate sequencing of the SARS-CoV-2 spike region. The absence of metadata describing the primer scheme used will negatively impact the downstream use of publicly available SARS-CoV-2 sequencing reads and assembled genomes.
primers (ARTIC V1) designed to completely sequence the SARS-CoV-2 genome with overlapping 400-bp amplicons. Shortcomings identified in the V1 protocol, primarily due to regions of amplicon dropout, led to two more iterations of primer design, which resulted in ARTIC V3 being used by many laboratories in 2020 and into 2021 (1). As the COVID-19 pandemic continued, new variants emerged with unique mutations and enhanced transmissibility (2-4). Some of these mutations occurred in primer binding sites in genes that encode key proteins such as the spike protein, resulting in amplicon dropout and poor sequence coverage in critical regions. ARTIC V4, a new set of tiling primers posted on June 18, 2021 (https://github.com/artic-network/artic-ncov2019/tree/master/primer_schemes/nCoV-2019), was designed using multiple variant sequences as input to address these issues. In particular, spike protein amino acid changes common to the Beta, Delta, and Gamma variants occurred in known V3 primer binding sites, including G142D (Delta) in the 2_Right primer, 241/243del (Beta) in the 74_Left primer, and K417N (Beta) or K417T (Gamma) in the 76_Left primer (https://community.artic.network/t/sars-cov-2-version-4-scheme-release/312).
From the beginning of the pandemic, we strived to sequence all patient samples with SARS-CoV-2 in the Houston Methodist Hospital system, a large 2,500-bed health care system in Houston, TX, USA.
In the summer of 2021, we experienced a massive surge of patients with COVID-19 that corresponded with an increase in Delta variant cases (4). Although we have been using the ARTIC primer sets throughout the pandemic, we elected to validate the ARTIC V4 primer set prior to adopting the new protocol. We chose a random set of 663 SARS-CoV-2 clinical samples isolated between July 2 and July 18, 2021, and each of the 663 samples was amplified using both the V3 and V4 primers. SARS-CoV-2 nucleic acid present in the samples was amplified by methods described previously (5, 6). Samples were sequenced with an Illumina NovaSeq 6000 instrument. Paired-end reads for both the V3 and V4 amplified samples were assembled with the assembly service of the National Institute of Allergy and Infectious Diseases (NIAID)-funded Bacterial and Viral Bioinformatics Resource Center (BV-BRC) (https://www.bv-brc.org), which follows the One-Codex workflow (https://github.com/onecodex/sars-cov-2). The workflow uses seqtk version 1.3-r116 for quality trimming (https://github.com/lh3/seqtk.git); minimap version 2.143 for aligning reads against Wuhan-Hu-1 (NC_045512.2) (7); samtools version 1.11 for sequence and file manipulation (8); and iVar version 1.2.2 (9) for primer trimming and SNP calling. Default parameters were used in all cases except that the maximum read depth in mpileup was limited to 8,000, and the minimum read depth for a variant call in iVar was set to 3. Lineages were assigned with Pangolin version 3.1.11 using pangoLearn module 2021-08-24 (https://cov-lineages.org/resources/pangolin.html) (10). All sequencing reads are available in the SRA under BioProject PRJNA767338.
Overall, we observed considerable improvement in sequence quality of the V4 assemblies relative to V3. The median read depths tended to be higher at each nucleotide position (Fig. 1A) and at each primer position (Fig. 1B and C).
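The per-position comparison of median read depth between the two primer sets can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the toy depth values, and the sample IDs are hypothetical, and it assumes that per-position depths for each sample have already been extracted into equal-length lists (for example, from a depth table produced by the alignment step). The threshold of 3 mirrors the minimum read depth for a variant call stated above.

```python
from statistics import median

MIN_CALL_DEPTH = 3  # minimum depth for a variant call, as stated in the text


def median_depth_per_position(depth_by_sample):
    """Return the per-position median depth across samples.

    depth_by_sample maps a sample ID to a list of depths, one value per
    genome position; all lists must have the same length.
    """
    profiles = list(depth_by_sample.values())
    if not profiles:
        return []
    length = len(profiles[0])
    if any(len(p) != length for p in profiles):
        raise ValueError("all depth profiles must cover the same positions")
    # zip(*profiles) yields one tuple of depths per genome position.
    return [median(column) for column in zip(*profiles)]


def low_depth_positions(profile, min_depth=MIN_CALL_DEPTH):
    """Return 1-based positions whose depth is below min_depth."""
    return [i + 1 for i, d in enumerate(profile) if d < min_depth]


# Hypothetical toy data: three samples, five positions, per primer set.
v3 = {"s1": [10, 2, 0, 40, 50], "s2": [12, 1, 3, 45, 55], "s3": [8, 0, 1, 42, 60]}
v4 = {"s1": [30, 25, 20, 41, 52], "s2": [28, 22, 18, 47, 58], "s3": [35, 30, 25, 44, 61]}

v3_median = median_depth_per_position(v3)
v4_median = median_depth_per_position(v4)

print("V3 median depth:", v3_median)
print("V4 median depth:", v4_median)
print("V3 positions below call depth:", low_depth_positions(v3_median))
print("V4 positions below call depth:", low_depth_positions(v4_median))
```

In this toy example the V3 medians fall below the calling threshold at two positions while the V4 medians do not, which is the kind of pattern that would appear as a coverage gap in one primer set but not the other.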
Notably, the V3 region of low coverage spanning approximately nucleotide positions 22,320-22,530 (corresponding to V3 primer pair 74), located in the spike gene, is corrected in V4. Consistent with previous analysis of ARTIC V4 (https://community.artic.network/t/sars-cov-2-version-4-scheme-release/312), we observe an area of slightly lower coverage in the V4 sequences at approximate nucleotide positions 26,950-27,180 (corresponding to V4 primer 90), but this was less problematic because we observed fewer assemblies with runs of ambiguous base calls in the V4 set. Among the 663 samples, 53 had different Pangolin calls in the V3 versus V4 assemblies (Table 1). None of these 53 sample pairs had identical spike protein sequences. The most common nucleotide difference occurs at nucleotide position 21,987, which is the G-to-A transition that causes the G142D amino acid substitution in spike. The second most common SNP occurs at position 24,410 and causes the D950N amino acid substitution. Three hundred sixty-eight of the V3 assemblies had an ambiguous base at this position compared with only 3 of the V4 assemblies. Except for the ends of the assembled sequences which can be jagged, and therefore ambiguous, 3 samples
Public repositories such as GISAID (11) and the INSDC resources (12) host SARS-CoV-2 genome sequences collected globally. These databases have been indispensable for epidemiological analyses, early identification of variants of concern, and downstream translational research activities such as vaccine formulation. From June 2021 through August 2021, the rapid increase in the G142D amino acid substitution present in Delta variants in public repositories appeared to indicate a rapid evolutionary sweep (Fig.
1D), bearing resemblance to previous evolutionary sweeps, including the D614G substitution in 2020 (13), B.1.1.7 (Alpha) in the fall and winter of 2020-2021 (2, 5), and Delta in the spring and summer of 2021 (3, 4) (the GISAID acknowledgment table can be found at https://doi.org/10.1101/2021.09.27.461949). However, our data lead us to conclude that the sharp uptick in spike protein G142D was caused by community adoption of the V4 primers. Indeed, when we examine 12,441 samples from Houston Methodist patients collected since April 2021, comparing the occurrence of G142D with L452R (another hallmark Delta substitution in spike), it becomes clear that the G142D uptick is an artifact that corresponds precisely with our adoption of the V4 primers in mid-July 2021 (Fig. 1E). Only 2 Delta variant genomes collected after July 1, 2021 had the ancestral glycine at position 142 (4).
Conclusion. The results of this study are consistent with those published by the ARTIC network (https://community.artic.network/t/sars-cov-2-version-4-scheme-release/312). We observe substantially improved sequence quality, including higher median read depths and fewer regions of ambiguous base calls in the V4 assemblies compared with V3. We also observe that the ancestral glycine at spike position 142 is extremely rare in Delta variants collected in Houston. This study indicates that the primer scheme used for amplifying and sequencing SARS-CoV-2 genomes is an important consideration for interpreting epidemiological data and identifying variants of concern.
1. Improvements to the ARTIC multiplex PCR method for SARS-CoV-2 genome sequencing using nanopore
2. CMMID COVID-19 Working Group. 2021. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England
3. SARS-CoV-2 B.1.617.2 Delta variant emergence and vaccine breakthrough
4. Delta variants of SARS-CoV-2 cause significantly increased vaccine breakthrough COVID-19 cases in
5. Sequence analysis of 20,453 severe acute respiratory syndrome coronavirus 2 genomes from the Houston metropolitan area identifies the emergence and widespread distribution of multiple isolates of all major variants of concern
6. Trajectory of growth of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) variants in
7. Minimap2: pairwise alignment for nucleotide sequences
8. The sequence alignment/map format and SAMtools
9. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar
10. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
11. GISAID: global initiative on sharing all influenza data: from vision to reality
12. The international nucleotide sequence database collaboration
13. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus
We thank Liliana Brown, Elodie Ghedin, Wiriya Rutvisuttinunt, and many other colleagues for persistently encouraging this project. We thank Emily Dietrich and Heather McConnell for help with manuscript and figure preparation. We thank our dedicated SARS-CoV-2 genomic sequencing team, including Akanksha Batajoo, Jessica Cambric, Ryan Gadd, Regan Mangham, Matthew Ojeda Saavedra, Sindy Pena, Layne Pruitt, Kristina Reppond, Madison N. Shyer, Rashi M. Thakur, Trina Trinh, and Prasanti Yerramilli for their tireless efforts throughout the pandemic in generating the sequencing data. This work was supported by the Houston Methodist Academic Institute Infectious Diseases Fund and many generous Houston philanthropists. JJD, RO, and MS were funded in whole or in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. 75N93019C00076 to principal investigator Rick Stevens.
We declare no conflicts of interest.