key: cord-0884356-etjhuhut authors: Liu, Hongjie; Li, Jinhui; Lin, Yanfeng; Bo, Xiaochen; Song, Hongbin; Li, Kuibiao; Li, Peng; Ni, Ming title: Assessment of two‐pool multiplex long‐amplicon nanopore sequencing of SARS‐CoV‐2 date: 2021-09-23 journal: J Med Virol DOI: 10.1002/jmv.27336 sha: c06d538cf2744570b4a9fd771f27a7a864087461 doc_id: 884356 cord_uid: etjhuhut Genomic surveillance of severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) plays an important role in COVID‐19 pandemic control and elimination efforts, especially by elucidating its global transmission network and illustrating its viral evolution. The deployment of multiplex PCR assays that target SARS‐CoV‐2 followed by either massively parallel or nanopore sequencing is a widely‐used strategy to obtain genome sequences from primary samples. However, multiplex PCR‐based sequencing carries an inherent bias of sequencing depth among different amplicons, which may cause uneven coverage. Here we developed a two‐pool, long‐amplicon 36‐plex PCR primer panel with ~1000‐bp amplicon lengths for full‐genome sequencing of SARS‐CoV‐2. We validated the panel by assessing nasopharyngeal swab samples with a <30 quantitative reverse transcription PCR cycle threshold value and found that ≥90% of viral genomes could be covered with high sequencing depths (≥20% mean depth). In comparison, the widely‐used ARTIC panel yielded 79%–88% high‐depth genome regions. We estimated that ~5 Mbp nanopore sequencing data may ensure a >95% viral genome coverage with a ≥10‐fold depth and may generate reliable genomes at consensus sequence levels. Nanopore sequencing yielded false‐positive variations with frequencies of supporting reads <0.8, and the sequencing errors mostly occurred on the 5′ or 3′ ends of reads. Thus, nanopore sequencing could not elucidate intra‐host viral diversity. PCR followed by sequencing is a widely-used strategy to obtain viral genomes. PCR panels for SARS-CoV-2 usually have short amplicons. 
For example, the ARTIC network (https://artic.network/ncov-2019) proposed a SARS-CoV-2 panel with ~400-bp amplicons. For amplicons that are shorter than the read lengths obtained by massively parallel sequencing (MPS) devices such as Illumina MiSeq, it is unnecessary to fragment the PCR products in MPS library preparation. However, because PCR efficiency inevitably differs among amplicons, multiplex PCR pools are likely to generate uneven coverage. For full-genome sequencing of SARS-CoV-2, distributing primers in multiple pools 19, 20 or amplifying each amplicon separately 21 might reduce bias, but increases labor and cost. One approach to improve the coverage evenness of multiplex PCR assays is to reduce the number of primers per panel. In our previous study, to recover Ebola virus genomes from clinical samples, two panels with ~1000-bp and ~500-bp amplicon sizes were implemented. 22, 23 A long-amplicon panel is preferred as it may confer higher coverage and evenness, and if it fails on highly degraded samples, a short-amplicon panel may be used instead. Long-amplicon panels are well-suited to the Oxford Nanopore MinION apparatus, which can generate ultra-long reads and can be implemented outside conventional laboratories. 24, 25 It provides an important supplement to MPS devices and has been used to sequence the SARS-CoV-2 genome. 20, 26 In this study, we developed a new two-pool 36

The bioinformatics analysis of MPS data was similar to that used in our previous studies. 22, 27, 28 Primer trimming of reads was performed with iVar v1.0. 29 Reads were then aligned to the reference genome (Wuhan-hu-1 strain, GenBank accession MN908947.3) by using BWA mem v0.7.17. 30 The alignments were then analyzed with SAMtools v1.9 31 to obtain a sequencing depth file and "mpileup" formatted files. 
A previously developed in-house workflow named "iSNV-calling" (http://github.com/generality/iSNV-calling) was implemented to identify viral single nucleotide variations (SNVs) with requirements of ≥Q20 base quality, ≥100-fold depth, and ≥20% of reads supporting each SNV. Based on our previous assessment, 32 these bioinformatics workflows and filters can identify viral SNVs reliably during amplicon sequencing. For MinION nanopore sequencing data, reads with the desired length (between 750 and 1100 bases for the 1K-panel amplification) were selected and trimmed by 30 bases at both the start and the end. We used both NGMLR v0.2.7 33 and Minimap2 v2.21 34 for alignment with the settings "-x ont" and "-ax map-ont", respectively. For each of the NGMLR and Minimap2 alignments, depth profiles of sequencing reads and mpileup-format files were generated with SAMtools v1.9. SNVs were identified by using the in-house "iSNV-calling" workflow with requirements of ≥Q20 base quality, ≥100-fold depth, and ≥20% of reads supporting the SNVs. For comparison, the bioinformatics workflow proposed by Bull et al. 21 was also performed in parallel. Briefly, nanopore sequencing reads were aligned by using Minimap2 v2.21. VarScan2 v2.4.4 35 was employed for identification of SNVs with requirements of ≥Q20 base quality, ≥100-fold depth, and ≥20% of reads supporting the variant.

We sought to improve the coverage evenness of the multiplex PCR panel targeting SARS-CoV-2 by increasing amplicon sizes. To find a favorable amplicon size, we designed and synthesized three two-pool panels consisting of ~1000-bp, ~2000-bp, and ~3000-bp amplicons (Tables S2-S4). One nasopharyngeal swab sample with a Ct value of 25 was used for preliminary validation of the three panels. The size distribution of multiplex PCR products was analyzed with CE. The expected PCR product lengths were obtained for the 1000-bp and 2000-bp panels, but not for the 3000-bp panel (Figure S1). PCR products were then sequenced with MinION. 
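
The nanopore read selection and the SNV filters described in the methods above can be sketched as follows. This is a minimal illustration, not the authors' "iSNV-calling" code: the function and class names are hypothetical, the length window is assumed to be inclusive and applied before trimming, and the depth is assumed to count only bases that already pass the ≥Q20 filter.

```python
from dataclasses import dataclass

# Thresholds taken from the text; everything else here is illustrative.
MIN_LEN, MAX_LEN = 750, 1100   # accepted read lengths for the 1K-panel
TRIM = 30                      # bases removed from each read end
MIN_DEPTH = 100                # minimum depth at a site
MIN_FRACTION = 0.20            # minimum fraction of reads supporting an SNV


def select_and_trim(seq, qual, min_len=MIN_LEN, max_len=MAX_LEN, trim=TRIM):
    """Return the trimmed (seq, qual) pair, or None if the read length
    falls outside the accepted window (assumed inclusive)."""
    if not (min_len <= len(seq) <= max_len):
        return None
    return seq[trim:len(seq) - trim], qual[trim:len(qual) - trim]


@dataclass
class SiteCounts:
    """Per-site counts; both fields are assumed to include only bases
    with quality >= Q20."""
    depth: int       # reads covering the site
    alt_reads: int   # reads supporting the variant allele


def passes_snv_filter(site, min_depth=MIN_DEPTH, min_fraction=MIN_FRACTION):
    """Apply the depth and supporting-read-fraction thresholds."""
    if site.depth < min_depth:
        return False
    return site.alt_reads / site.depth >= min_fraction


if __name__ == "__main__":
    trimmed = select_and_trim("A" * 800, "I" * 800)
    print(len(trimmed[0]))                          # length after trimming
    print(passes_snv_filter(SiteCounts(250, 60)))   # 24% support at 250x
```

Trimming both ends is relevant here because, as reported later in the paper, artificial SNVs were enriched at the 5′ and 3′ ends of nanopore reads.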
Based on the alignment of sequencing reads with the reference genome (Wuhan-hu-1 strain), both the 1000-bp and 2000-bp amplicon panels generated full coverage of the viral genome. However, the 3000-bp panel failed to generate full coverage of the viral genome, which was consistent with the CE analysis and could be ascribed to RNA/cDNA degradation. The 2000-bp panel had a much larger coverage bias among amplicons than the 1000-bp panel (Figure S2). For the 2000-bp panel, 30.6% of sequencing data were assigned to one amplicon. Therefore, we determined that the ~1000-bp amplicons were preferable for the multiplex amplification panel of the SARS-CoV-2 genome. The 1000-bp panel contained 36 primer pairs in two pools (18 pairs each). Amplicon sizes ranged from 880 bp to 1027 bp, with an average overlap of 112 bp. We hereafter refer to the panel with ~1000-bp amplicons as the 1K-panel.

We next compared coverage evenness by using the 1K-panel and a widely used 98-plex primer panel provided by the ARTIC network (http://artic.network/ncov-2019, version V3). RNA was extracted from six nasopharyngeal swabs with varied SARS-CoV-2 titers (Ct values, 22.9-31.0). Aliquots were amplified with the 1K- and ARTIC panels, followed by sequencing using MiSeq. We defined viral genome regions with a depth of ≥20% of the mean depth as high-coverage regions. The proportion of the high-coverage regions was used to quantify coverage evenness. We found that the 1K-panel generated a more even sequence coverage than the ARTIC panel (Figure 1A). With the 1K-panel, the proportion of high-coverage regions averaged 93.0% (SD = 1.7%) for the six samples, compared with 80.6% (SD = 8.7%) for the ARTIC panel. Coverage evenness was dependent on sample viral titers. We found that the 1K-panel was less affected by low viral titers than the ARTIC panel (Figure 1B). 
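
The evenness metric defined above (the proportion of genome positions whose depth is at least 20% of the mean depth) can be computed from a per-position depth profile. The sketch below is illustrative only: the function names are hypothetical, and the input is assumed to be the three-column, tab-separated text (reference name, position, depth) that `samtools depth` writes, with every genome position present.

```python
def parse_depth_lines(lines):
    """Extract the depth column from `samtools depth`-style lines
    (reference name, 1-based position, depth; tab-separated)."""
    depths = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        _ref, _pos, depth = line.split("\t")
        depths.append(int(depth))
    return depths


def high_coverage_proportion(depths, cutoff=0.20):
    """Fraction of positions with depth >= cutoff * mean depth.

    cutoff=0.20 follows the definition in the text; 0.30 gives the
    stricter variant that was also evaluated.
    """
    if not depths:
        raise ValueError("empty depth profile")
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        return 0.0  # no reads at all: nothing counts as high coverage
    threshold = cutoff * mean_depth
    n_high = sum(1 for d in depths if d >= threshold)
    return n_high / len(depths)


if __name__ == "__main__":
    toy = ["MN908947.3\t1\t100", "MN908947.3\t2\t100",
           "MN908947.3\t3\t5", "MN908947.3\t4\t100"]
    print(high_coverage_proportion(parse_depth_lines(toy)))
```

Because the threshold scales with the sample's own mean depth, the metric compares the shape of the coverage profile between panels rather than the total amount of sequencing data.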
As Ct values increased, the proportion of high-coverage regions decreased slightly from 95.3% to 90.5% for the 1K-panel, compared with a decrease from 88.1% to 62.4% for the ARTIC panel. We also evaluated a higher cutoff (≥30% of the mean depth) to define high-coverage regions and found that the 1K-panel maintained its advantage (Figure 1B). Next, we included another 49 nasopharyngeal swabs to further evaluate the performance of the 1K-panel in recovering the viral genome. We observed a comparable efficiency of viral amplification and genome coverage evenness (Figure 1C). Among the 29 samples with Ct values <30, all except one had a viral genome recovery with ≥90% high-coverage regions (mean 93.8%, SD = 8.1%). Evenness decreased for samples with Ct values >30, but five of six samples with a Ct value of 30-33 had >70% high-coverage regions.

Recent studies have shown that nanopore sequencing can generate accurate consensus genomes of SARS-CoV-2, but is error-prone to

We obtained the receiver operating characteristic curves based on the NGMLR and Minimap2 alignments, respectively (Figure 2A

In countries where the outbreak appears to be leveling off, such as China, continual regional resurgences of COVID-19 have been observed. 38 To improve coverage evenness, we adopted ~1000-bp amplicons, which are longer than those of several widely used primer panels, such as ARTIC. Moore et al. 19 recently proposed a multiplex primer panel with amplicons ranging from 956 bp to 1450 bp. However, their primer pairs were distributed in six multiplex pools, with an obvious bias toward certain amplicons during amplification, which resulted in a more uneven coverage of the viral genome than the 1K-panel. Another major concern of the long-amplicon panel is its utility

were prevalent, and many were shared among samples, reflecting the characteristic systematic nature of nanopore sequencing errors. 44, 45

FIGURE 3 Nanopore sequencing of eight samples with a MinION Flongle flow cell. 
The bar plot denotes the proportions of viral genomes with a ≥5-fold depth. The sequence data quantity per sample is shown by the dashed line and dot plot. The number of identified SNVs is shown below the sample identifiers (e.g., 2/4 means two of four benchmarked SNVs were found). SNV, single nucleotide variation

The occurrence of artificial SNVs was also related to the alignment software, and artificial SNVs were enriched at the 5′ and 3′ ends of nanopore sequencing reads. This provides a clue for further improvement of bioinformatic analyses for sub-clonal SNV identification. Currently, nanopore sequencing might not be applicable to studies that aim to analyze the intra-host diversity of viral genomes; MPS would be necessary.

In sum, we developed and validated a new two-pool, long-amplicon 36-plex PCR primer panel for the full-genome sequencing of SARS-CoV-2 from primary samples. The panel may generate a more even coverage of SARS-CoV-2 genomes than the ARTIC short-amplicon panel and may consequently require less sequencing data. For the samples with a Ct <30, ~5 Mbp of MinION data provided ≥95% genome coverage with a ≥10-fold depth. Meanwhile, our assessment showed that nanopore sequencing with MinION reliably identified dominant viral populations (consensus viral genomes) within samples, but was highly error-prone for the discovery of minor viral populations. SNVs with a MuAF <0.8 should be regarded as unreliable.
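
The two practical rules in this summary, a ≥10-fold depth over ≥95% of the genome and a MuAF cutoff of 0.8, can be expressed as simple checks. The sketch below is a hedged illustration: the function names are hypothetical, and MuAF is read here as the fraction of reads supporting the mutated allele, in line with the abstract's "frequencies of supporting reads".

```python
def genome_fraction_at_depth(depths, min_depth=10):
    """Fraction of genome positions covered at >= min_depth."""
    if not depths:
        raise ValueError("empty depth profile")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def meets_coverage_target(depths, min_depth=10, min_fraction=0.95):
    """True if the sample reaches the >=95% at >=10-fold target."""
    return genome_fraction_at_depth(depths, min_depth) >= min_fraction


def classify_nanopore_snv(alt_reads, depth, muaf_cutoff=0.8):
    """Label a nanopore SNV by its mutated allele frequency (MuAF).

    MuAF is assumed to be alt_reads / depth; calls below the cutoff
    are treated as unreliable, following the text.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    muaf = alt_reads / depth
    return "consensus-level" if muaf >= muaf_cutoff else "unreliable"


if __name__ == "__main__":
    profile = [25] * 97 + [3] * 3          # toy 100-position profile
    print(meets_coverage_target(profile))
    print(classify_nanopore_snv(alt_reads=92, depth=100))
    print(classify_nanopore_snv(alt_reads=35, depth=100))
```

A call labeled "unreliable" here is not necessarily false; the point from the text is only that nanopore data alone cannot distinguish it from a systematic sequencing error.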

1. A new coronavirus associated with human respiratory disease in China
2. GISAID: global initiative on sharing all influenza data - from vision to reality
3. Cryptic transmission of SARS-CoV-2 in Washington state
4. An emergent clade of SARS-CoV-2 linked to returned travellers from Iran
5. Introductions and early spread of SARS-CoV-2 in France
6. Nextstrain: real-time tracking of pathogen evolution
7. Rapid SARS-CoV-2 whole-genome sequencing and analysis for informed public health decision-making in the Netherlands
8. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
9. On the origin and continuing evolution of SARS-CoV-2
10. Emergence of a highly fit SARS-CoV-2 variant
11. Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus
12. Tracking SARS-CoV-2 variants
13. Contribution of TGF-beta-mediated NLRP3-HMGB1 activation to tubulointerstitial fibrosis in rat with angiotensin II-induced chronic kidney disease
14. Assay techniques and test development for COVID-19 diagnosis
15. Single-strand RPA for rapid and sensitive detection of SARS-CoV-2 RNA
16. Single-copy sensitive, field-deployable, and simultaneous dual-gene detection of SARS-CoV-2 RNA via modified RT-RPA
17. Reverse-transcription recombinase-aided amplification assay for rapid detection of the 2019 novel coronavirus (SARS-CoV-2)
18. Investigation of a COVID-19 outbreak in Germany resulting from a single travel-associated primary case: a case series
19. Amplicon-based detection and sequencing of SARS-CoV-2 in nasopharyngeal swabs from patients with COVID-19 and identification of deletions in the viral genome that encode proteins involved in interferon antagonism
20. Rapid, sensitive, full-genome sequencing of severe acute respiratory syndrome coronavirus 2
21. Analytical validity of nanopore sequencing for rapid SARS-CoV-2 genome analysis
22. Intra-host dynamics of Ebola virus during 2014
23. Genetic diversity and evolutionary dynamics of Ebola virus in Sierra Leone
24. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples
25. Real-time, portable genome sequencing for Ebola surveillance
26. Genomic epidemiology of SARS-CoV-2 in Guangdong Province
27. Dynamics of HIV-1 quasispecies diversity of participants on long-term antiretroviral therapy based on intrahost single-nucleotide variations
28. Phylogenomic analysis unravels evolution of yellow fever virus within hosts
29. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar
30. Fast and accurate short read alignment with Burrows-Wheeler transform
31. The sequence alignment/map format and SAMtools
32. An assessment of amplicon-sequencing based method for viral intrahost
33. Accurate detection of complex structural variations using single-molecule sequencing
34. Minimap2: pairwise alignment for nucleotide sequences
35. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing
36. Genomic epidemiology of SARS-CoV-2 in Guangdong Province
37. Insight into biases and sequencing errors for amplicon sequencing with the Illumina MiSeq platform
38. Genomic surveillance of COVID-19 cases in Beijing
39. Genomic elucidation of a COVID-19 resurgence and local transmission of SARS-CoV-2 in Guangzhou
40. SARS-CoV-2 within-host diversity and transmission
41. Genomic characterization of SARS-CoV-2 identified in a reemerging COVID-19 outbreak in Beijing's Xinfadi market in 2020
42. Cold-chain food contamination as the possible origin of COVID-19 resurgence in Beijing
43. Identifying the risk of SARS-CoV-2 infection and environmental monitoring in airborne infectious isolation rooms (AIIRs)
44. Nanopore sequencing: review of potential applications in functional genomics
45. The Oxford Nanopore MinION: delivery of nanopore sequencing to the genomics community

We would like to thank Ms. Yan Zhang for revising this manuscript. 
The authors declare that there are no conflicts of interest. The code for the bioinformatics analyses is available at https://github.com/Ming-Ni-Lab/1K-amplicon-ONT-sequencing-of-SARS-CoV-2. The consensus genomes of SARS-CoV-2 are available from the GISAID database (https://gisaid.org), GenBank (https://www.ncbi.nlm.nih.gov/genbank), and the China National Microbiology Data Center (http://www.nmdc.cn/coronavirus) through the accession numbers provided in Table S1. http://orcid.org/0000-0001-9465-2787