key: cord-1050908-fulol6hn
authors: Bhoyar, Rahul C.; Senthivel, Vigneshwar; Jolly, Bani; Imran, Mohamed; Jain, Abhinav; Divakar, Mohit Kumar; Scaria, Vinod; Sivasubbu, Sridhar
title: An optimized, amplicon-based approach for sequencing of SARS-CoV-2 from patient samples using COVIDSeq assay on Illumina MiSeq sequencing platforms
date: 2021-08-02
journal: STAR Protoc
DOI: 10.1016/j.xpro.2021.100755
sha: 772d805389814febb940c709c6d31487d9c83b6b
doc_id: 1050908
cord_uid: fulol6hn
Sequencing of SARS-CoV-2 genomes is crucial for understanding the genetic epidemiology of the COVID-19 pandemic, for following the evolution of the virus, and for the rapid development of diagnostic tools. The present protocol is a modification of the Illumina COVIDSeq test. We describe an amplicon-based next-generation sequencing approach with a short turnaround time, adapted for bench-top sequencers such as the MiSeq, iSeq and MiniSeq.
The Illumina COVIDSeq assay is an amplicon-based next-generation sequencing (NGS) approach for the detection of SARS-CoV-2 RNA isolated from nasopharyngeal, oropharyngeal, and mid-turbinate nasal swabs from patients. The assay is one of the first NGS-based SARS-CoV-2 detection assays approved for use under the U.S. Food and Drug Administration's Emergency Use Authorization (EUA). The COVIDSeq test was first standardized and validated for the Illumina NovaSeq sequencer and can accommodate 384 to 3072 samples, depending on the configuration of flow cells and the instrument (Bhoyar et al. 2021).
RNA required for SARS-CoV-2 sequencing can be extracted from decontaminated nasopharyngeal swabs using the methods mentioned in the EUA approval, which include the Quick-DNA/RNA Viral MagBead Kit (Zymo Research, # R2141) and the QIAamp Viral RNA Mini Kit (Qiagen, part # 52906). However, the protocol is compatible with all commonly used viral RNA isolation methods, including Nextractor® NX-48S (GENOLUTION), Trueprep (Molbio Diagnostics Pvt. Ltd.) and TRIzol (Invitrogen).
Although the assay was originally adapted and optimised for larger sequencers, it can also be optimised for low-throughput bench-top equipment and smaller batch sizes. This manuscript details the optimised protocol, with marginal modifications, along with an open-source pipeline for the analysis and interpretation of data. The protocol has been standardized for the Illumina MiSeq sequencing platform; by adjusting the final loading concentration of the library and accounting for the data output, it can be adopted for any bench-top sequencer from Illumina Inc.
This protocol consists of the following key steps:
RNA extraction: Prior to RNA extraction, the samples in viral transport medium (VTM) are subjected to heat inactivation at 50°C for 30 minutes. RNA is extracted from the inactivated specimen using any of the following methods: QIAamp (QIAGEN), Nextractor® NX-48S (GENOLUTION), Trueprep (Molbio Diagnostics Pvt. Ltd.) or TRIzol (Invitrogen).
cDNA synthesis: Complementary DNA is generated from the RNA by reverse transcriptase using random hexamers.
Target amplification: The synthesized cDNA undergoes two separate PCR reactions to amplify the virus genome present in the sample.
Library preparation, pooling and quantification: The pooled amplified products undergo bead-based tagmentation, in which they are fragmented and tagged with adapter sequences. The adapter-tagged fragments undergo another round of PCR amplification, after which the indexed libraries are pooled and cleaned using purification beads. The pooled library is quantified using the Qubit High Sensitivity dsDNA quantification kit (Invitrogen).
Sequencing: The sequencing-ready libraries are clustered onto a flow cell and sequenced using sequencing by synthesis (SBS) chemistry on the Illumina MiSeq sequencing system.
Analysis: This step includes genome assembly, variant calling and phylogenetic analysis using established protocols (available in the GitHub repository: https://github.com/banijolly/Genepi).
cDNA preparation
During this process, the extracted SARS-CoV-2 RNA is annealed with random hexamers, and the primed RNA fragments are reverse-transcribed into first-strand cDNA by reverse transcriptase.
2. Annealing of RNA
   a. Label a new 96-well PCR plate as the cDNA plate (Figure 1).
   b. Add 8.5 μl of the isolated SARS-CoV-2 sample into the wells of the cDNA plate.
   c. Add 8.5 μl of EPH3 HT to each well.
   d. Mix well with a P10 or P20 multichannel pipette. If a plate shaker is available within the laminar hood, shake the plate at 1600 rpm for 1 minute.
   e. Seal and spin the plate at 1000 x g for 1 minute.
   f. Place the cDNA plate on a thermal cycler and run the following program:
      i. Choose the preheat lid option
      ii. Set the final volume of the reaction to 17 μl
      iii. 65°C for 3 minutes
      iv. Hold
Step-by-Step Method Details
In this step, the prepared cDNA of the SARS-CoV-2 genome is tagmented (tagged and fragmented), indexed and amplified to become sequencing-ready libraries.
      ii. Add 100 μl of TWB HT to each well.
      iii. Seal and shake at 1600 rpm for 1 minute and centrifuge at 500 x g for 1 minute.
      iv. Remove the seal and place the plate on the magnetic stand. Wait for 3 minutes or until the liquid is clear.
      v. Remove and discard the supernatant from each well.
   j. Repeat the wash steps (step i; i-v) one more time.
CRITICAL: Do not discard the supernatant after the second wash, to prevent the beads from over-drying.
3. Amplification of tagmented amplicons
   a. In a 15 ml tube, prepare the Enhanced PCR Mix:
      i. Add 2304 μl of EPM HT (24 μl x 96 samples).
      ii. Add 2304 μl of nuclease-free water (24 μl x 96 samples).
      iii. Vortex the tube to mix.
   b. Remove and discard the supernatant from the TAG1 plate.
   c. Use a 10 μl or 20 μl pipette to remove any remaining TWB HT from the TAG1 plate.
   d. Add 40 μl of Enhanced PCR Mix to each well of the TAG1 plate.
   e. Add 10 μl of index adapters to each well of the TAG1 plate.
   f. Seal and shake the plate at 1600 rpm for 1 minute and centrifuge the plate at 500 x g for 1 minute.
CRITICAL: Inspect the tubes to make sure the beads are fully resuspended and are not found at the bottom of the tubes.
   g. Place on a thermal cycler and run the following PCR program:
      i. Choose the preheat lid option and set the temperature to 100°C.
      ii. Set the total volume of the reaction to 50 μl.
   b. Remove the seal, place the plate on a magnetic stand and wait for 3 minutes or until the liquid is clear.
   c. Transfer 5 μl of library from each well into an 8-tube strip. This results in a pooled volume of 60 μl in each tube of the 8-tube strip.
CRITICAL: Discard and change tips for each column of samples while transferring.
   d. Transfer 55 μl of pooled library from each tube to a 1.5 ml microcentrifuge tube. This results in a total of 440 μl of pooled libraries (from 96 samples).
   e. Add 396 μl of ITB to the tube (the ITB volume is 0.9 times the total volume of the pooled libraries; for 96 samples: 440 μl x 0.9 = 396 μl).
   f. Vortex the tube to mix well.
   g. Incubate the tube at room temperature for 5 minutes.
   h. Spin briefly, place the tube on a 1.5 ml tube magnetic stand and wait for 5 minutes.
   i. Remove and discard all of the supernatant.
   j. Wash the beads as follows:
      i. Add 1000 μl of freshly prepared 80% ethanol to the tube and wait for 30 seconds.
      ii. Remove and discard the supernatant.
      iii. Repeat the above wash steps one more time, for a total of 2 washes with 80% ethanol.
   k. Remove any residual ethanol left in the tube.
   l. Air dry the beads for 2 minutes.
   m. Add 55 μl of Resuspension Buffer to the tube and vortex to mix.
   n. Incubate at room temperature for 2 minutes.
   o. Spin briefly, place the tube on the magnetic stand and wait for 2 minutes.
   p. Label a new 1.5 ml tube as Final Pool.
   q. Transfer 50 μl of the supernatant into the Final Pool tube.
Pause Point: The procedure can be paused here and the Final Pool tube can be stored at -20°C for 30 days.
5. Quantification and Normalization of the library pool
   a. Using the Qubit High Sensitivity dsDNA quantification kit (Invitrogen), quantify the library pool:
      i. Dilute 1 μl of the pooled library sample in the ratio of 1:3.
      ii. Use 2 μl of the diluted library for quantification.
      iii. Calculate the actual concentration of the library by factoring in the dilution.
   b. Load 2 μl of the pooled library (undiluted) onto a 2% agarose gel.
      ii. Add 180 μl of the 20 pM library from the Library-Final tube.
      iii. Add 420 μl of pre-chilled HT1 and mix well.
      iv. Discard 6 μl of the mixture.
      v. Add 6 μl of the 20 pM PhiX library for a 1% spike-in, as recommended by Illumina Inc.
      vi. Mix well with a pipette, spin briefly and place the tube on ice. The library is ready to be loaded onto the flow cell.
Upon successful tagmentation and size selection, the final library pool will appear as a band of ~300 bp on a 2% agarose gel (Figure 2). Any deviation in the fragment size may be the result of erroneous tagmentation or size selection, which can be minimised with proper handling and pipetting practices.
This step describes the basic bioinformatics pipeline for genome assembly and identification of variants in the SARS-CoV-2 genome. The raw sequencing data generated by Illumina sequencing platforms are in the form of Binary Base Call (BCL) files and require conversion to FASTQ format before further processing.
1. The BCL files generated by the sequencer are demultiplexed to FASTQ files using the bcl2fastq conversion software provided by Illumina, using the command:
   bcl2fastq --runfolder-dir --sample-sheet --output-dir
2. The raw FASTQ files need to undergo additional processing steps, including quality control, trimming, alignment against the SARS-CoV-2 reference genome and sorting. The resulting BAM file, containing mapped and sorted reads, is then used to call variants and generate a consensus sequence of the genome in FASTA format. The steps to assemble SARS-CoV-2 genomes from raw FASTQ files are detailed in a recent protocol (Poojary et al 2020). Sample parameters and metrics that may be used to assess the quality of the assembled genomes are given in Table 1.
3. For assignment of lineages to the assembled genome FASTA sequences, we use the Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) command-line tool, using the command:
   pangolin .fasta
   Alternatively, the web version of PANGOLIN (available at https://pangolin.cog-uk.io/) can be used to assign lineages to the generated FASTA sequences (Rambaut et al 2020).
4. The FASTA sequence files can also be used to analyse and visualise evolutionary relationships through phylogenetic clustering. For the phylogenetic analysis, we use the open-source toolkit Nextstrain, following a recently detailed protocol (Jolly and Scaria 2021).
The comparison of data output, coverage and time taken for analysis between MiSeq and NovaSeq 6000 is provided in the supplementary Tables S1 and S2.
Users may find that the genome coverage in certain samples is low and that the samples do not return any variants after the analysis.
Potential Solution: Users should ensure that the starting material has a high viral load; we recommend samples with a Ct value of <25 for 99% genome coverage.
Figure 1: Layout of the sample plate, denoting the samples, positive control (PC) and negative control (NC). Up to 94 samples can be processed at once. A positive control (CPC HT) and a negative control (ELB HT) are included in every plate that is processed using the COVIDSeq protocol.
Figure 2: Quality of the final library pool analyzed by agarose gel electrophoresis (2%); the expected fragment of ~300 bp was observed.
High throughput detection and genetic epidemiology of SARS-CoV-2 using COVIDSeq next-generation sequencing
Computational protocol for assembly and analysis of SARS-nCoV-2 genomes
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology
Nextstrain: real-time tracking of pathogen evolution
Computational Analysis and Phylogenetic Clustering of SARS-CoV-2
1. An amplicon-based next-generation sequencing approach for detection of SARS-CoV-2 RNA
2. Optimized for low-throughput bench-top equipment for smaller batch sizes
3. The protocol is compatible with all commonly used viral RNA isolation methods
4. An accelerated SARS-CoV-2 sequencing protocol with turnaround time < 30 hours
The authors acknowledge funding from the Council of Scientific and Industrial Research (CSIR), India, through grants CODEST and MLP2005. The funders had no role in the design of the experiments, the analysis or the decision to publish.
All of the authors contributed substantially to the conception and design of the study, the acquisition of data, and the analysis and interpretation. S.S.B. and V.S.: Funding Acquisition.
The authors declare no competing interests.
Further information and requests for resources and reagents should be directed to, and will be fulfilled by, the Lead Contact, Dr. Sridhar Sivasubbu (sridhar@igib.in).
All the materials used in the protocol are available without any restrictions; the details of the kits used in the protocol are given in the "key resource table".
All the scripts used for the SARS-CoV-2 sequencing analysis and interpretation are adopted from Poojary et al 2020 (GitHub repository: https://github.com/banijolly/Genepi).
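The volume arithmetic in the library preparation, pooling and dilution steps of this protocol (24 μl x 96 samples, 440 μl x 0.9, 180 μl of 20 pM library in 420 μl of HT1) can be re-derived for other batch sizes with a few lines of Python. The following is a minimal sketch, not part of the authors' pipeline: all function names are hypothetical, the 1:3 dilution is left as a user-supplied factor because the text does not say whether it means 1 part in 3 or 1 part plus 3, and the ng/μl to nM conversion assumes the commonly used average mass of 660 g/mol per base pair and the ~300 bp fragment size observed on the gel.

```python
# Illustrative helpers for the bench arithmetic in the protocol.
# All names are hypothetical; none of this comes from the Genepi repository.


def enhanced_pcr_mix(n_samples, per_sample_ul=24.0):
    """Return (EPM HT, nuclease-free water) volumes in ul for n_samples."""
    volume = per_sample_ul * n_samples
    return volume, volume


def pooled_library_volume(n_columns=12, per_well_ul=5.0,
                          n_tubes=8, per_tube_ul=55.0):
    """Return (volume per strip tube, total volume carried forward) in ul."""
    per_strip_tube = n_columns * per_well_ul  # one well per plate column
    total = n_tubes * per_tube_ul             # 55 ul taken from each tube
    return per_strip_tube, total


def itb_volume(pool_ul, ratio=0.9):
    """ITB bead volume: pool volume multiplied by the 0.9 ratio in the text."""
    return pool_ul * ratio


def stock_concentration(measured_ng_per_ul, dilution_factor):
    """Back-calculate the undiluted concentration from a diluted reading.

    dilution_factor is an assumption the user must supply (3 or 4 for
    a "1:3" dilution, depending on how it was prepared).
    """
    return measured_ng_per_ul * dilution_factor


def ng_per_ul_to_nm(conc_ng_per_ul, mean_fragment_bp=300,
                    g_per_mol_per_bp=660.0):
    """Convert dsDNA concentration to nM (assumed 660 g/mol per bp)."""
    return conc_ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_fragment_bp)


def diluted_pm(stock_pm, stock_ul, diluent_ul):
    """Concentration in pM after mixing stock with diluent (C1V1 = C2V2)."""
    return stock_pm * stock_ul / (stock_ul + diluent_ul)


def spike_in_percent(spike_ul, total_ul):
    """Spike-in as a percentage of the final volume."""
    return 100.0 * spike_ul / total_ul


if __name__ == "__main__":
    print("Enhanced PCR Mix (EPM HT, water):", enhanced_pcr_mix(96))
    print("Pool (per strip tube, total):", pooled_library_volume())
    print("ITB volume:", itb_volume(440.0))
    print("Loading concentration (pM):", diluted_pm(20.0, 180.0, 420.0))
    print("PhiX spike-in (% by volume):", spike_in_percent(6.0, 600.0))
```

With the values given in the protocol for 96 samples, these helpers reproduce the stated 2304 μl of each Enhanced PCR Mix component, the 440 μl pool and the 396 μl of ITB. The resulting loading concentration of 6 pM is derived here from the stated volumes and is not quoted in the protocol itself.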