key: cord-0818263-l9rca3eo authors: Charre, Caroline; Ginevra, Christophe; Sabatier, Marina; Regue, Hadrien; Destras, Grégory; Brun, Solenne; Burfin, Gwendolyne; Scholtes, Caroline; Morfin, Florence; Valette, Martine; Lina, Bruno; Bal, Antonin; Josset, Laurence title: Evaluation of NGS-based approaches for SARS-CoV-2 whole genome characterisation date: 2020-07-15 journal: bioRxiv DOI: 10.1101/2020.07.14.201947 sha: d9b83fc7f2119e6f0331c01d90b8db71ca6993af doc_id: 818263 cord_uid: l9rca3eo
Since the beginning of the COVID-19 outbreak, SARS-CoV-2 whole-genome sequencing (WGS) has been performed at an unprecedented rate worldwide, using a wide variety of Next Generation Sequencing (NGS) methods. Herein, we compare the performance of four NGS-based approaches for SARS-CoV-2 WGS. Twenty-four clinical respiratory samples spanning a wide range of Ct values (from 10.7 to 33.9) were sequenced with each of the four methods. Three used Illumina sequencing: an in-house metagenomic NGS (mNGS) protocol and two newly commercialized kits, namely a hybridization capture method developed by Illumina (DNA Prep with Enrichment kit and Respiratory Virus Oligo Panel, RVOP) and an amplicon sequencing method developed by Paragon Genomics (CleanPlex SARS-CoV-2 kit). We also evaluated the widely used amplicon sequencing protocol developed by the ARTIC Network, combined with Oxford Nanopore Technologies (ONT) sequencing. All four methods yielded near-complete genomes (>99%) for high viral load samples, with mNGS and RVOP producing the most complete genomes. For mid viral loads, 2/8 genomes were incomplete (<99%) with mNGS, and 1/8 with each of CleanPlex and RVOP. For low viral loads (Ct ≥25), amplicon-based enrichment methods were the most sensitive techniques, yielding complete genomes for 7/8 samples. All methods were highly concordant in terms of identity of the complete consensus sequences.
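The completeness figures above (>99% for near-complete, <99% for incomplete) are a breadth-of-coverage measure. The sketch below shows one way to compute it from a per-position depth profile; it assumes that a position counts as covered at 10x, the depth at which the in-house pipeline calls its final consensus, and that exactly 99% counts as complete. The function names and the depth profile are hypothetical illustrations, not the authors' code or data.

```python
from typing import Sequence

def genome_completeness(depths: Sequence[int], min_depth: int = 10) -> float:
    """Percentage of reference positions covered by at least min_depth reads.

    Assumption: 10x is used as the coverage threshold (hypothetical choice).
    """
    if len(depths) == 0:
        raise ValueError("empty depth profile")
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)

def is_near_complete(depths: Sequence[int], min_depth: int = 10,
                     cutoff: float = 99.0) -> bool:
    """True when completeness reaches the cutoff (assumed: exactly 99% passes)."""
    return genome_completeness(depths, min_depth) >= cutoff

# Illustrative, made-up profile: 29,903 positions (the Wuhan-Hu-1 length)
# with uncovered genome ends, as expected for tiling-amplicon designs.
depths = [0] * 55 + [50] * 29_778 + [0] * 70
print(round(genome_completeness(depths), 2))  # 99.58
print(is_near_complete(depths))               # True
```

Because the denominator is the full reference length, uncovered genome ends lower completeness even when every interior position is well covered, which is why the amplicon methods top out just below 100%.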
Just one mismatch, in two samples, was observed between CleanPlex and the other methods; it was due to the dedicated CleanPlex bioinformatics pipeline applying a high threshold for calling a SNP relative to the reference sequence. Importantly, all methods correctly identified a newly observed 34-nt deletion in ORF6, although RVOP required specific bioinformatic validation. Finally, as a major warning for targeted techniques, a lack of coverage in any given region of the genome should alert to a potential rearrangement or a SNP in primer-annealing or probe-hybridizing regions, and these techniques will require regular updates as SARS-CoV-2 evolves.
We implemented the CleanPlex SARS-CoV-2 panel (Paragon Genomics, Inc, Hayward, CA, USA) protocol for target enrichment and library preparation (17). In short, multiplex PCR reactions were performed on reverse-transcribed RNA using 343 primer pairs, split into two pools, covering the entire SARS-CoV-2 genome with amplicons ranging from 116 bp to 196 bp (median size 149 bp). Illumina indexes were introduced by PCR.
For these three approaches, the prepared libraries were sequenced on an Illumina NextSeq™ 550 with mid-output 2x150 bp (mNGS and CleanPlex) or 2x75 bp (RVOP) flow cells.
We tested a multiplexed PCR amplicon approach implementing the ARTIC Network nCoV-2019 sequencing protocol, slightly modified by ONT (Oxford Nanopore Technologies, Oxford, UK) for better performance. The multiplex PCR primer set v3 was used to span the whole genome (12). Briefly, synthesized cDNA was used as template, and tiling 400-nt amplicons with 20-bp overlaps (not including primers) were generated using two primer pools for 35 cycles. Samples were multiplexed using the ONT native barcoding kits (EXP-NBD104 and EXP-NBD114). The library was prepared using the SQK-LSK109 kit and sequenced on a FLO-MIN106 (R9.4.1) flow cell, multiplexing 24 samples per run.
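The warning above, that a lack of coverage in a region should prompt a check for a rearrangement or a SNP in a primer or probe region, can be made operational by listing every run of low-depth positions. The sketch below assumes a per-position depth profile that includes zero-depth positions (for example, the kind reported by `samtools depth -a`); the function name, the 10x threshold and the `min_length` filter are hypothetical choices, not part of any of the evaluated pipelines.

```python
from typing import List, Optional, Sequence, Tuple

def low_coverage_intervals(depths: Sequence[int], min_depth: int = 10,
                           min_length: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) runs where depth < min_depth.

    Hypothetical helper: runs shorter than min_length are ignored.
    """
    intervals: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for pos, depth in enumerate(depths, start=1):
        if depth < min_depth:
            if start is None:
                start = pos  # a low-coverage run begins here
        elif start is not None:
            if pos - start >= min_length:  # run covers start..pos-1
                intervals.append((start, pos - 1))
            start = None
    # Close a run that extends to the last position of the genome.
    if start is not None and len(depths) - start + 1 >= min_length:
        intervals.append((start, len(depths)))
    return intervals

# Made-up profile: uncovered ends plus a 34-position interior dropout.
depths = [0] * 5 + [30] * 20 + [2] * 34 + [30] * 20 + [0] * 3
print(low_coverage_intervals(depths))                 # [(1, 5), (26, 59), (80, 82)]
print(low_coverage_intervals(depths, min_length=10))  # [(26, 59)]
```

Runs touching the genome ends are expected for tiling-amplicon designs, so in practice the interior intervals are the ones worth cross-checking against primer or probe coordinates, or investigating with another method.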
Sequencing data analysis
mNGS data were analysed with an in-house pipeline. Briefly, low-quality and human reads were filtered out (dehosting), and the remaining reads were aligned to the SARS-CoV-2 reference genome (isolate Wuhan-Hu-1, EPI_ISL_402125) using the BWA-MEM algorithm (v0.7.15-r1140). Consensus sequences were generated by a simple majority rule using a custom Perl script. These sequences were then used as each patient's own mapping reference for a further realignment of the reads. The final consensus sequence was called at a 10x depth threshold using no-clip alignment.
While RVOP and mNGS generated complete genomes (maximum coverage was 100% for the highest viral loads), amplicon-based target enrichment did not cover the SARS-CoV-2 genome ends, as expected given the design of the ARTIC-ONT and CleanPlex protocols. The highest coverage was 99.6% and 99.7% for ARTIC-ONT and CleanPlex, respectively.
Furthermore, a novel 34-nt deletion in ORF6, previously observed with our in-house mNGS method, was detected by the two evaluated tiling amplicon sequencing methods (ARTIC and CleanPlex). However, larger deletions spanning two primer-annealing regions may arise and would be difficult to characterize using amplicon-based target enrichment methods. Missing regions with no coverage should be carefully investigated using other methods. The 34-nt deletion was also detected using RVOP despite the adjacent oligoprobe design. As the oligoprobe set is larger than the primer panels used in tiling amplicon protocols, it provides a more comprehensive profiling of genomic regions, independently of genomic rearrangements.
We would like to thank all the patients, laboratory technicians and clinicians who contributed to this investigation. We are also grateful to Philip Robinson (DRCI, Hospices Civils de Lyon) for help in manuscript preparation.
Data will be deposited on the SRA database prior to publication.
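The consensus step described under Sequencing data analysis (simple majority rule, 10x depth), and the abstract's explanation for the single CleanPlex mismatch (a high SNP-calling threshold), can be contrasted in a small sketch. It works on per-position base counts. "Majority rule" is read here as "the most frequent base wins", and `snp_min_freq` models a pipeline that keeps the reference base unless the alternative allele is frequent enough. The function, its parameters and the 0.75 threshold are hypothetical; this reproduces neither the in-house Perl script nor the CleanPlex pipeline.

```python
from typing import Dict, List

def call_consensus(pileup: List[Dict[str, int]], reference: str,
                   min_depth: int = 10, snp_min_freq: float = 0.0) -> str:
    """Hypothetical consensus caller over per-position base counts.

    - depth < min_depth -> 'N' (position not called)
    - otherwise the most frequent base wins, unless it differs from the
      reference and its frequency is below snp_min_freq, in which case
      the reference base is kept.
    """
    if len(pileup) != len(reference):
        raise ValueError("pileup and reference must have the same length")
    called = []
    for ref_base, counts in zip(reference, pileup):
        depth = sum(counts.values())
        if depth == 0 or depth < min_depth:
            called.append("N")
            continue
        # Ties are broken by dict insertion order (first maximum wins).
        base, n = max(counts.items(), key=lambda item: item[1])
        if base != ref_base and n / depth < snp_min_freq:
            base = ref_base  # alternative allele too rare: keep reference
        called.append(base)
    return "".join(called)

reference = "ACGT"
# Made-up counts: position 2 carries a T variant at 70% frequency.
pileup = [{"A": 20}, {"C": 6, "T": 14}, {"G": 5}, {"T": 30}]
print(call_consensus(pileup, reference))                     # ATNT
print(call_consensus(pileup, reference, snp_min_freq=0.75))  # ACNT
```

The same reads give a different consensus at position 2 depending only on the threshold. This is the kind of single-base discrepancy the comparison attributes to the CleanPlex pipeline's settings rather than to the wet-lab method.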
References
A Novel Coronavirus from Patients with Pneumonia in China.
Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR. Eurosurveillance.
Structure-based design of antiviral drug candidates targeting the SARS-CoV-2 main protease.
Sequence analysis of SARS-CoV-2 genome reveals features important for vaccine design. bioRxiv.
Tracking virus outbreaks in the twenty-first century.
Rapid SARS-CoV-2 whole genome sequencing for informed public health decision making in the Netherlands.
Consortium TC-19 GU. genomic-surveillance-network(d537cd78-4d18-4935-90c3-2645ddf9b34f).html
Preliminary analysis of SARS-CoV-2 importation & establishment of UK transmission lineages. Virological. 2020.
Genomic surveillance of SARS-CoV-2 in Thailand reveals mixed imported populations, a local lineage expansion and a virus with truncated ORF7a. medRxiv.
Molecular characterization of SARS-CoV-2 in the first COVID-19 cluster in France reveals an amino acid deletion in nsp2 (Asp268del).
An 81 nucleotide deletion in SARS-CoV-2 ORF7a identified from sentinel surveillance in Arizona.
Quality control implementation for universal characterization of DNA and RNA viruses in clinical respiratory samples using single metagenomic next-generation sequencing workflow.
Early phylodynamics analysis of the COVID-19 epidemics in France. medRxiv.
Comprehensive viral enrichment enables sensitive respiratory virus genomic identification and analysis by next generation sequencing.
Integrative genomics viewer.
Metagenomics and future perspectives in virus discovery.
Comparing library preparation methods for SARS-CoV-2 multiplex amplicon sequencing on the Illumina MiSeq platform. bioRxiv.
Genome Sequencing of SARS-CoV2 using 1200 bp Tiled Amplicons and Oxford Nanopore Rapid Barcoding. bioRxiv.
Full-Genome Sequencing of Severe Acute Respiratory Syndrome Coronavirus 2 -