key: cord-0784987-mo5tnv0w authors: James, P.; Stoddart, D.; Harrington, E. D.; Beaulaurier, J.; Ly, L.; Reid, S.; Turner, D. J.; Juul, S. title: LamPORE: rapid, accurate and highly scalable molecular screening for SARS-CoV-2 infection, based on nanopore sequencing date: 2020-08-11 journal: nan DOI: 10.1101/2020.08.07.20161737 sha: 9f8da542d7608fc9bccb8a62b5872ecdffdff267 doc_id: 784987 cord_uid: mo5tnv0w LamPORE is a rapid way of testing/screening large numbers of samples for the presence or absence of SARS-CoV-2, the virus causing COVID-19. It combines barcoded multi-target amplification, 15-minute barcoded library preparation and real-time nanopore sequencing. Starting with extracted RNA, results can be obtained from 12 samples in approximately an hour and from 96 samples in under 2 hours. High scalability is achieved by combinatorial barcoding. Performance characteristics are currently being established and regulatory clearance to market is underway. scale screening to be worthwhile, it is important to have assays that are high throughput, accurate and very fast. Epidemiological models show that testing frequency and time-to-result are important components of a surveillance system (3) . There is little benefit in being able to screen a large number of samples if the results are not made available quickly enough to inform quarantine decisions or contact tracing. In the US, many labs are taking 5-7 days or more to turn tests around (4) . LamPORE addresses this by combining multiplexed barcoded loop mediated amplification (RT-LAMP), rapid barcoded library preparation and real-time nanopore sequencing to create a way of rapidly testing and screening large numbers of samples for the presence or absence of SARS-CoV-2 RNA. The method fits both large-scale and small-scale laboratory environments. LAMP is a method of targeted isothermal amplification (5) which can generate micrograms of product from tens of copies of the target region, within 30 minutes at 65°C. Successful amplification is often inferred from a proxy measurement, such as increased turbidity, a colour change or changes in fluorescence. However, although the LAMP reaction itself is very robust, these proxy measurements are less robust and can be affected by substances present in biological samples. It is also not uncommon to see a colour change or increase in turbidity in no-template controls, arising from amplification of primer artefacts, which would lead to a false positive call. Instead of relying on proxy measurements, sequencing can be used as a readout (6) . On-target amplification events contain sequences that are not present in the primers and can be identified without ambiguity by alignment. In addition, sequencing provides an opportunity to amplify and detect multiple targets in a single tube. During nanopore sequencing, an electrical current is measured as template strands pass through each pore on the flow cell array. Conversion of this current into basecalls can start while a strand is translocating. In addition, there is no fixed run time with nanopore sequencing, meaning that the sequencing run can be matched to the data requirements. As a consequence, data analysis can be performed in real time, and results can be returned very rapidly. To maintain these speed advantages, it is necessary to use a correspondingly rapid method of library preparation to convert the amplified products into a form that is compatible with sequencing. Oxford Nanopore multiplexing would be insufficient to satisfy the throughput requirements of frequent testing. It has been shown that a short barcode can be added to one of the primers used in the LAMP reaction (6, 7) , and during the course of the reaction the barcode becomes copied multiple times into the products. This raises the possibility of performing dual/combinatorial barcoding without adding additional library preparation steps, thus creating a rapid method of targeted sequencing that is capable of high levels of multiplexing. In this manuscript, we show that performing a multiplexed amplification reaction, in which three separate regions of the SARS-CoV-2 genome are targeted, performs with high sensitivity (to around Ct37 as measured by RT-qPCR). In addition, the inclusion of a fourth primer set, targeting human actin mRNA allows true negatives to be distinguished from invalid results, where the initial sample was not taken or processed adequately. Suboptimal sampling is suspected to be responsible for false negative results in many SARS-CoV-2 tests (8, 9) . Starting with RNA extracted from swabs, results can be obtained from a small number of samples in approximately an hour, and from 96 samples in under 2 hours. This assay is simple to scale from a small number of samples to thousands, with greater degrees of multiplexing achievable by increasing the numbers of LAMP barcodes and / or ONT rapid barcodes. Primer sequences for the amplification of three SARS-CoV-2 targets and human actin mRNA were obtained from New England Biolabs (Ipswich, MA, and (10)), and short barcodes were added to the forward inner primers (FIP) as described (6, 7) . Primers were synthesised and HPLC-purified by IDT (Coralville, IA). The concentration of actin primers was intentionally lower than for the SARS-CoV-2 primers to prevent amplification of the human target overwhelming any SARS-CoV-2 amplification. For each FIP barcode, a 10x primer pool was prepared containing each oligonucleotide at the appropriate concentration. Reactions were performed in 96-well plates in such a way that each well in a row received the same barcoded FIP oligo mix, with different barcoded FIPs being used in the different rows. Each LAMP reaction consisted of 25 µl 2x LAMP Master Mix (NEB E1700), 5 µl 10x primer pool and 20 µl RNA sample (or no-template control). Reactions were incubated at 65°C for 35 minutes, followed by 80°C for 5 minutes. Following amplification, reactions were pooled by column, giving 12 pools, each consisting of 8 separate reactions (Fig. 1 ). All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 11, 2020. . Library preparation was performed separately on each of the 12 pools, in a volume of 10 µl per reaction. Each reaction consisted of 6.5 µl nuclease-free water, 1 µl of pooled LAMP product and 2.5 µl of the appropriate rapid barcode (Oxford Nanopore Technologies, SQK-RBK004). Reactions were mixed and spun down, before being incubated at 30°C for 2 minutes and then 80°C for 2 minutes. All reactions were then pooled into a single 1.5 ml Eppendorf LoBind tube. The pooled products were purified using 0.8x AMPure beads, were washed in fresh 80% ethanol and were eluted in 15 µl EB buffer. 11 µl of eluate was transferred to a clean 1.5 ml Eppendorf LoBind tube, along with 1 µl rapid adapter (RAP). Reactions were incubated for 5 minutes at room temperature, before being sequenced on a single MinION flow cell for 1 hour, following the manufacturer's instructions. All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 11, 2020. . https://doi.org/10.1101/2020.08.07.20161737 doi: medRxiv preprint In order to call the presence or absence of virus in the sample, the number of reads from each LAMP target must be counted for each sample in the sequencing run. This requires the accurate identification of i) the barcode added during library preparation by the Rapid Barcoding Kit (RBK), ii) the barcode added as part of the FIP oligo during the LAMP reaction and iii) the sequence of the LAMP product associated with each target region. The RBK barcodes are identified using the guppy_barcoder software (version 4.0.11; command line options "--barcode_kits SQK-RBK004 --detect_mid_strand_barcodes --min_score_mid_barcodes 40"). The FIP barcode is identified by sequence alignment with the barcode-containing FIP sequence. The LAMP product associated with each read is identified by alignment with the sequence spanning the F2-B1 primer locations. A valid LAMP product is detected if the alignment length is greater than 80 nucleotides and greater than 80% identity. The multimeric nature of the LamPORE reads allows an additional layer of quality control. Each read should only contain sequence from a single LAMP target for a single sample, therefore reads with multiple rapid barcodes, conflicting FIP barcodes or incompatible FIP-product pairings are removed from further consideration. The specific nature of the sequencing analysis allows nonspecific amplification, for example primer artefact, to be measured and excluded. Reads with RBK and FIP classifications, but which fail product classification or contain conflicting product regions, are counted as "unclassified". ii) Determining presence/absence of SARS-CoV-2 Per-sample results of the assay are returned as either positive, negative, inconclusive, or invalid. The calls are made based on the aggregated read counts for each sample across the various targets (i.e. human actin and the three SARS-CoV-2 target regions) and cutoffs were chosen based on 1 hour of sequencing. An invalid call is returned if <50 total classified reads are obtained from across all targets (including both human actin and SARS-CoV-2). A negative call is returned if a sum of <20 reads are obtained from the three SARS-CoV-2 targets (and >= 50 reads in total). An inconclusive call is returned if a sum of >=20 and <50 reads are obtained from the three SARS-CoV-2 targets. A positive call is returned if a sum of >=50 reads are obtained from the three SARS-CoV-2 targets. All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 11, 2020. . https://doi.org/10.1101/2020.08.07.20161737 doi: medRxiv preprint iii) ROC and F1 score curves To evaluate the sensitivity and specificity of the assay against the known status of 80 COVID-19 positive clinical RNA samples and a similar number of human RNA-only negatives, we generated receiver operating characteristic (ROC) curves using the metrics.roc_curve function from the scikitlearn package (11) . The sum of read counts across each of the three SARS-CoV-2 targets (AS1, E1, and N2) serves as the scoring metric for calling the results positive, negative, inconclusive, or invalid. The ROC curve therefore reveals the sensitivity and specificity of the assay at various thresholding values of that scoring metric. In addition to the curve generated for the SARS-CoV-2 read count sum, we also generated curves for read counts from each individual SARS-CoV-2 target. The F1 score represents the harmonic mean of the sensitivity and specificity of the assay, defined as 2 * where TPR is the true positive rate and FPR is the false positive rate. We chose the read count threshold (>= 50 total SARS-CoV-2 target reads) in order to maximize the F1 score. An assay that targeted a single locus from the SARS-CoV-2 genome would potentially lack robustness to sequence variants that occur as the virus evolves. To overcome this, we chose to target three different regions in the SARS-CoV-2 genome, in a single multiplexed reaction. These are ORF1a and the envelope (E) and nucleocapsid (N) genes, with primer sets AS1 (10), E1 and N2 (12), respectively. In addition, as a control for the quality of the initial sample preparation, RNA extraction, reverse transcription and LAMP amplification, we included a set of primers to amplify the human actin mRNA (12). The primers target either side of a splice junction and do not amplify from genomic DNA. As long as the sample has been taken and prepared correctly, actin RNA should be present in all the swab samples, regardless of their SARS-CoV-2 status, and so this provides a way to differentiate between true negatives and invalid samples. To assess the inclusivity of the triplex SARS-CoV-2 assay, we aligned all primer sequences to the 46,872 human SARS-CoV-2 genomes deposited at GISAID on June 16, 2020 (13, 14). Since not all genomes are high coverage or complete, we excluded 2,105 sequences belonging to 1,939 samples from analysis of at least one primer set because they covered fewer than 90% of all bases in that region. Of the 44,933 genomes with sufficient coverage in all three regions, 2,554 (5.68%) genomes and 179 (0.40%) genomes had a mismatch in one or two primer sets respectively, but a full match for the others. Only 2 (0.004451%) genomes had a mismatch in all three primer sets. The primer sets All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in In order to assess the potential for cross-reactivity with other viruses, we aligned the LAMP primer sequences against sequences of common viruses as well as coronaviruses related to SARS-CoV-2. We determined sequence identity by dividing the sum of aligned primer bases by the sum of primer lengths ( Table 2) . SARS-CoV, which is closely related to SARS-CoV-2, was the sole virus to have a match against the total sequence length of the SARS-CoV-2 primers greater than the recommended threshold of 80%. The E-gene primer set has a match >90% with SARS-CoV, but the AS1 and N2 primer sets differ significantly, matching at only 44.5% and 74%, respectively. The likelihood of a false positive is low since SARS-CoV is not known to be in active circulation at present (16) . Furthermore, should this situation change, the presence/absence stage of the analysis can be easily modified to identify positive results that are dependent entirely on amplification of the E-gene primer. All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in Table 2 . Organisms assessed in silico for potential cross-reactivity to the SARS-CoV-2 LamPORE Assay. All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 11, 2020. . https://doi.org/10.1101/2020.08.07.20161737 doi: medRxiv preprint ii) Barcode demultiplexing LAMP products contain multiple copies of each ~150 bp target region joined end-to-end, forming strands of up to approximately 5 kb, with consecutive copies of the target region in alternating orientation (Fig. 2a) . Following library preparation with the ONT RBK, fragments are reduced to a modal length of around 500 bp, so still typically contain several copies of the target region. More than one forward and reverse primer is used in each LAMP reaction at each target region, so the repeating units are not of a uniform length (Fig. 2b) , and because of the location of the barcodes within the FIP oligo, not all copies of the repeating unit contain the LAMP barcodes. This makes it necessary to select reads that do contain LAMP barcodes (Fig. 2c) . All LamPORE reads contain an ONT barcode at one of the ends, and by selecting for LAMP barcodes, we typically retain the approximately 70% of reads which thus contain both barcodes and the target region. Primer artefacts can accumulate during the LAMP reaction, and as a result, the consequence of judging successful amplification by a proxy measurement, such as a colour change or increase in turbidity, can be a false positive call. This is avoided when sequencing is used as a readout: reads are All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 11, 2020. . aligned to a reference sequence, and for a read to be considered valid, it must consist of inverted repeats of large stretches of the target region, including target-specific sequences present that do not exist in the primers. Alignments of valid reads are contiguous across the majority of the target region (Fig. 3a) . In contrast, primer artefacts consist entirely of sequences covered by the primers, and these tend to align as short segments interspersed with gaps. Figure 3 . Distinguishing between valid reads and primer artefacts by alignment a) valid reads consist of inverted repeats that align across the majority of the target region, whereas b) primer artefacts align as short segments interspersed with gaps. iv) FIP barcode optimisation Verification of the FIP barcodes for each target was carried out using a dilution series of the Twist Synthetic RNA Control 2 (Twist Biosciences) for the SARS-CoV-2 loci and total human RNA extracted from GM12878 (Coriell) for the actin control. Template quantities ranged from 20-250 copies per reaction. We observed that not only does the presence of the barcode influence the sensitivity of the reaction, the sequence of the barcode also affects performance, with some barcoded FIPs working with higher sensitivity than others. We excluded the worst-performing barcoded FIPs and in this way we reduced our initial 12 barcodes to the best-performing 8, all of which were capable of amplifying from 20 copies in a 50 µl LAMP reaction (Fig. 4) . When used in combination with 12 rapid barcodes, 96 combinations are produced. Performance characteristics are currently being established and regulatory clearance to market is underway. All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 11, 2020. . (Table 3 and Supplementary Table 1 ). Performance characteristics are currently being established and regulatory clearance to market is underway. All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 11, 2020 Fig. 5a ). We correctly called 79 samples positive across the 80 COVID-19 positive samples at our optimal read count threshold. Read count results suggest that contamination led to the amplification of the SARS-CoV-2 targets E1 or N2 in the four samples that generated false positive calls. The optimal read count threshold of >=50 reads for a positive call was selected by maximizing the F1 score corresponding to the ROC curves (Fig. 5b) . Performance characteristics are currently being established and regulatory clearance to market is underway. All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 11, 2020. . https://doi.org/10.1101/2020.08.07.20161737 doi: medRxiv preprint The F1 score was therefore used to identify the optimal read count threshold for calling a SARS-CoV-2 positive sample. In this manuscript we present a method which combines the rapid target-specific amplification provided by LAMP, a method of transposase-based library preparation, and real-time nanopore sequencing and data analysis. The resulting combination, LamPORE, is rapid, sensitive and highly scalable and here we demonstrate LamPORE's efficacy for detecting the presence or absence of SARS-CoV-2 RNA in clinical samples. The end-to-end procedure, beginning with 96 RNA extracts, and ending with positive and negative calls, can be performed in under 2 hours when sequencing on a MinION™ or GridION™. The number of samples that can be sequenced in parallel can be increased by expanding either the number of LAMP barcodes or the number of ONT rapid barcodes. In these circumstances, it is necessary to extend the length of the sequencing run. When using 12 different LAMP barcodes combined with 96 rapid barcodes (= 1,152 samples) we found that 4 hours of MinION sequencing was sufficient. After sequencing it is possible to remove the sample strands from the flow cell with a nuclease flush, and to load a fresh set of samples. Having a larger number of pores per flow cell, the length of the corresponding sequencing run is shorter on the PromethION, but alternatively, it is possible to use larger multiplexes. The remaining bottleneck in the current end-to-end workflow is that of extracting RNA from the biological samples. Recent publications have indicated that saliva is a suitable source of SARS-CoV-2 RNA in infected patients (19, 20) , and we have found that following heat-treatment, saliva with All rights reserved. No reuse allowed without permission. perpetuity. preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in The copyright holder for this this version posted August 11, 2020. . https://doi.org/10.1101/2020.08.07.20161737 doi: medRxiv preprint spiked-in inactivated SARS-Cov-2 virions can be amplified and sequenced successfully. This is currently being investigated further. We have seen in the work presented here that LAMP is capable of amplifying several targets simultaneously. LamPORE relies on sequencing, as opposed to a colour change, which raises the possibility of using a single multiplexed LamPORE reaction to detect many different pathogens. In the case of co-infection, it should also be possible to identify which combination of pathogens is present. We are currently developing a LamPORE assay to cover several viral respiratory illnesses including influenza, in multiplex. The nucleotide sequences of the SARS-CoV-2 genomes used in the primer inclusivity and crossreactivity analysis are available, upon free registration, from the GISAID database (https://www.gisaid.org/). Virological assessment of hospitalized patients with COVID-2019 Test sensitivity is secondary to frequency and turnaround time for COVID-19 surveillance Long delays in getting test results hobble coronavirus response Loop-mediated isothermal amplification of DNA LAMP-Seq: population-scale COVID-19 diagnostics using a compressed barcode space Serotyping dengue virus with isothermal amplification and a portable sequencer Suboptimal biological sampling as a probable cause of false-negative COVID-19 diagnostic test results False-netagive results of initial RT-PCR assays for COVID-19: a systematic review SARS-CoV-2 detection using an isothermal amplification reaction and a rapid, inexpensive protocol for sample inactivation and purification Scikit-learn: Machine Learning in Python GISAID: Global initiative on sharing all influenza data -from vision to reality Detection of Middle East respiratory syndrome coronavirus using reverse transcription loop-mediated isothermal amplification (RT-LAMP) World Health Organisation (2020) SARS (Severe Acute Respiratory Syndrome) A SARS-CoV-2 protein interaction map reveals targets for drug repurposing CDC 2019-Novel Coronavirus Saliva is a reliable tool to detect SARS-CoV-2 Saliva is more sensitive for SARS-CoV-2 detection in COVID-19 patients We would like to thank Dr. Thushan de Silva, University of Sheffield, UK and Dr. Cariad Evans, Sheffield Teaching Hospitals NHS Foundation Trust, UK for providing the positive, RT-qPCR-verified SARS-CoV-2 samples. We are also grateful to Adam Peltan and Nathan Tanner at New England Biolabs for discussions about LAMP assays and their help with primer sequences.We gratefully acknowledge the authors from the originating laboratories responsible for obtaining the specimens, and the submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which some of this research is based. All authors are employees of Oxford Nanopore Technologies and are shareholders and/or share option holders. For the version of the product intended for IVD use the company will be pursuing regulatory clearance to market. Performance characteristics of the IVD product are currently being established.