key: cord-0742214-0hxan9rw authors: Li, Chenyu; Debruyne, David N.; Spencer, Julia; Kapoor, Vidushi; Liu, Lily Y.; Zhou, Bo; Pandey, Utsav; Bootwalla, Moiz; Ostrow, Dejerianne; Maglinte, Dennis T; Ruble, David; Ryutov, Alex; Shen, Lishuang; Lee, Lucie; Feigelman, Rounak; Burdon, Grayson; Liu, Jeffrey; Oliva, Alejandra; Borcherding, Adam; Tan, Hongdong; Urban, Alexander E.; Gai, Xiaowu; Bard, Jennifer Dien; Liu, Guoying; Liu, Zhitong title: Highly sensitive and full-genome interrogation of SARS-CoV-2 using multiplexed PCR enrichment followed by next-generation sequencing date: 2020-05-18 journal: bioRxiv DOI: 10.1101/2020.03.12.988246 sha: d8e8c924795d799685bd231df5e144481ae64491 doc_id: 742214 cord_uid: 0hxan9rw Many detection methods have been used or reported for the diagnosis and/or surveillance of COVID-19. Among them, reverse transcription polymerase chain reaction (RT-PCR) is the most commonly used because of its high sensitivity, typically claiming detection of about 5 copies of viruses. However, it has been reported that only 47-59% of the positive cases were identified by some RT-PCR methods, probably due to low viral load, timing of sampling, degradation of virus RNA in the sampling process, or possible mutations spanning the primer binding sites. Therefore, alternative and highly sensitive methods are imperative. With the goal of improving sensitivity and accommodating various application settings, we developed a multiplex-PCR-based method comprised of 343 pairs of specific primers, and demonstrated its efficiency to detect SARS-CoV-2 at low copy numbers. The assay produced clean characteristic target peaks of defined sizes, which allowed for direct identification of positives by electrophoresis. We further amplified the entire SARS-CoV-2 genome from 8 to half a million viral copies purified from 13 COVID-19 positive specimens, and detected mutations through next generation sequencing. Finally, we developed a multiplex-PCR-based metagenomic method in parallel, that required modest sequencing depth for uncovering SARS-CoV-2 mutational diversity and potentially novel or emerging isolates. A variety of methods for detecting SARS-CoV-2 have been reported and discussed 1,2 , including RT-PCR, serological testing 3 and reverse transcription-loop-mediated isothermal amplification 4, 5 . Currently, RT-PCR is considered the gold standard for diagnosing SARS-CoV-2 infections because of its ease of use and high sensitivity. RT-PCR has been reported to detect SARS-CoV-2 in saliva 6 , pharyngeal swab, blood, rectal swab 7 , urine, stool 8 , and sputum 9 . In laboratory conditions, the RT-PCR methodology has been shown to be capable of detecting 4-8 copies of virus or lower, through amplification of targets in the Orf1ab, E and N viral genes, at 95% confidence intervals [10] [11] [12] . However, only about 47-59% of the true positive cases were identified by RT-PCR, and 75% of RT-PCR negative results were actually later found to be positive with other assays, hence mandating repeated testing 8, [13] [14] [15] . In addition, there is evidence suggesting that heat inactivation of clinical samples causes loss of viral particles, thereby hindering the efficiency of downstream diagnosis 16 . Therefore, it is necessary to develop robust, sensitive, specific and highly quantitative methods for reliable diagnostics 17, 18 . The urgency to develop an effective surveillance method that can be easily used in a variety of laboratory settings is underlined by the wide and rapid spreading of SARS-CoV-2 [19] [20] [21] . In addition, such method should also distinguish SARS-CoV-2 from other respiratory pathogens such as influenza virus, parainfluenza virus, adenovirus, respiratory syncytial virus, rhinovirus, human metapneumovirus, SARS-CoV, etc., as well as Mycoplasma pneumoniae, Chlamydia pneumoniae and other causes of bacterial pneumonia [22] [23] [24] [25] . Furthermore, obtaining full-length viral genome sequence through next generation sequencing (NGS) prove to be essential for the surveillance of SARS-CoV-2's evolution and for the containment of community spread [26] [27] [28] [29] . Indeed, SARS-CoV-2 phylogenetic studies through genome sequence analysis have provided better understanding of the transmission origin, time and routes, which has guided policy-making and management procedures 27, 28, [30] [31] [32] [33] . Here, we describe the development of a highly sensitive and robust detection assay incorporating the use of multiplex PCR technology to identify SARS-CoV-2 infections. Theoretically, the multiplex PCR strategy, by simultaneously targeting and amplifying hundreds of targets, has significantly higher sensitivity than RT-PCR and may even detect nucleotide fragments resulting from degraded viral genomes. Multiplex PCR has been shown to be an efficient and low-cost method to detect Hantaan orthohantavirus and Plasmodium falciparum infections 34, 35 , with high coverage (median 99%), specificity (99.8%) and sensitivity. Moreover, this solution can be tailored to simultaneously address multiple questions of interest within various epidemiological settings 35 . Similar to a recently described metagenomic approach for SARS-CoV-2 identification 36 , we also establish a user-friendly multiplex-PCR-based metagenomic method that is capable of detecting SARS-CoV-2, and could also be applied for the identification of novel pathogens with a moderate sequencing depth of approximately 1 million reads. Several RT-PCR methods for detecting SARS-CoV-2 have been reported to date 6,10-12 . Among them, two groups reported the detection of 4-5 copies of the virus 10,12 . To investigate the opportunity for further improvement upon the sensitivity of RT-PCR, we built a mathematical model to estimate the limit of detection (LOD) for SARS-CoV-2. The reported RT-PCR amplicon lengths are around 78-158bp, and the SARS-CoV-2 genome is 29,903bp (NC_045512.2). Thus, we chose 100bp amplicon length and 30kb SARS-CoV-2 genome size for mathematical modeling. With the assumption of 99% RT-PCR efficiency 11 , we found that RT-PCR assays could only detect 4.8 copies of SARS-CoV-2 at 95% probability (Fig. 1A) , which is consistent with the experimental results previously reported 10 . In this model, the predicted probability of RT-PCR assays to detect one copy of SARS-CoV-2 is only 26% (Supplemental Fig. 1 ). This finding may explain, at least in part, the reported 47-56% detection rates of SARS-CoV-2 with known positive samples by RT-PCR 8, 13 . We further discovered that the LOD appears to be independent of the viral genome size. For genomes of 4 to 100kb, the detection limit remains 4.8 copies at 95% probability. One way to elevate the sensitivity is to simultaneously target and amplify multiple regions of the viral genome in a multiplex PCR reaction, thereby increasing the frequency of occurrence in the mathematical model. Amplifying multiple targets has the advantage of potentially detecting fragments of degraded viral nucleotide fragments while tolerating genomic variations, thus allowing for the detection of new and ever-evolving viral strains. The amplification efficiency of multiplex PCR is critical for LOD. We estimated that the efficiency of our multiplex PCR technology is about 26% if using Unique Molecular Identifier (UMI)-labeled primers to count the amplified products after NGS sequencing (Supplemental Fig. 2 and Supplemental Table 1 ). However, the amplification efficiency could be lower, and amplicons would not be equally amplified if the template used is one single strand of cDNA. Thus, more amplicons are potentially required for multiplex PCR to detect limited copies of the virus. We designed a panel of 172 pairs of multiplex PCR primers in order to increase the sensitivity of detecting SARS-CoV-2 (Fig. 1B) . The average amplicon length is 99bp. The amplicons span across the entire SARS-CoV-2 genome with an average 76bp gap (76±10bp) between adjacent amplicons. Since the observed efficiency of multiplex PCR is about 26% in amplifying the four DNA strands of a pair of human chromosomes, we assumed an efficiency of 6% in amplifying the single-strand of a cDNA molecule. In addition, it has already been reported that 79% of variants are recovered when directly amplifying 600 amplicons from a single cell using our technology 37 . Therefore, we assumed that 80% of targeted regions would be amplified successfully. Using the same mathematical model described above, we estimated that our SARS-CoV-2 panel can detect 1.15 copies of the virus at 95% probability (Fig. 1C) . Again, the LOD is independent of virus genome size. We also designed a second pool of 171 multiplex PCR primer pairs. The target regions of these primer pairs overlap with the gaps between target regions of the previous pool of 172 primer pairs (Fig. 1B) . Together, these two overlapping pools of primers provide full coverage of the entire viral genome. Most importantly, using both pools in detection would lead to a calculated detection limit of 0.29 copies at 95% probability. The workflow was designed so that the multiplex PCR products are further amplified in a secondary PCR, during which sample indexes and NGS sequencing primers are added (Fig. 1D ). The PCR products were first analyzed by electrophoresis to visualize potential positives. Since dozens of target regions could be amplified from a single copy of SARS-CoV-2, electrophoresis peaks with a defined distribution of peak sizes were expected. Multiplex PCR could potentially amplify not only SARS-CoV-2, but also other coronaviruses, due to shared sequence similarities despite the fact that we designed primers specific to the SARS-CoV-2 genome to avoid cross amplification. In that context, electrophoresis analysis provides a fast and sensitive indication of infection from at least that family of viruses. For specificity, the generated NGS sequencing library can be interrogated for definitive identification of the specific virus. Two plasmids, containing the full sequence of the S and N genes of SARS-CoV-2, respectively, were used to validate our multiplex PCR method. A total of 28 targets are expected to be amplified within our 172-amplicon panel. To simulate the use of real clinical samples, these two plasmids were spiked into cDNA generated from human total RNA. The copy number of each plasmid was precisely determined by droplet-based digital PCR (ddPCR) with a QX200 system from Bio-Rad 38 . The two plasmids were diluted from approximately 9,000 copies to below one copy, and were amplified in multiplex PCR reactions. The library peaks of expected sizes were obtained from 8,900 to 2.8 copies of plasmids ( Fig. 2A) . Quantification of peaks demonstrated a wide dynamic range from 1 to about 1,000 copies of plasmids (Fig. 2B ). The yield of the libraries started to saturate when the copy number reached 1,700. It is possible that the saturation point could be even lower when all of the 172 amplicons are amplified from positive clinical COVID-19 samples, and the library peaks could be observed with even fewer viral copies. In contrast, the detected quantities of a single target on S gene by RT-PCR rapidly dropped when using 2.85 copies ( Fig. 2B and Supplemental Fig. 3 ). Estimated from the mathematical model described above, employing 28 amplicons provides a 16% chance to detect one single copy of the virus. We tested this predicted probability using one copy of plasmid in multiplex PCR reactions. The theoretical calculation gives a 66% probability to sample 1.1 copies, and a 12% chance to detect them based on a multiplex PCR efficiency of 6%. In practice, we experimentally observed a significantly higher 56% probability to detect 1.1 copies (Fig. 2C ). These results suggest that the efficiency of multiplex PCR is actually higher than the previously estimated 6% when a single-stranded cDNA molecule is amplified. When the amplified products were sequenced, we found that the recovered reads were within a range of about 20-fold relative depth with about 1.4 to 2.8 plasmids, and were uniformly distributed across the GC range (Fig. 2D ). When detecting down to 1.4 copies of plasmids, only the reads from one amplicon were about 100-fold lower than the average. Approximately 96% of the amplicons were recovered with 14 copies of plasmids, 77% with 2.8 copies, and 37% with 0.6 copies (Fig. 2E ). The above two pools of primers were used to amplify the full SARS-CoV-2 genomes from a total of 13 nasopharyngeal swab specimens. These specimens were previously diagnosed to be SARS-CoV-2 positive by using RT-PCR. The viral load was found to be from 8 to 675,885 RNA copies/uL (Supplemental Table 2 ). These 13 viral genomes were successfully amplified and subsequently sequenced on the Illumina MiSeq. We found a correlation between genome coverage and viral copy number, as expected. While about 95% of the genome was covered at 100X for 8 copies of virus, 98-99% of the genomes were covered at 100X for 22 to 675,885 copies of virus at an average sequencing depth of 5,000 reads per amplicon (Fig. 3A ). One genome, from the sample with 5,000 estimated viral copies, was covered at 96% for 100X. This coverage was lower than expected, and might have been caused by the poor sample quality resulting from processing or handling the viral RNA or library. CleanPlex libraries are usually sequenced at about 1,000 paired-end reads while still generating sufficient data for detecting mutations. To confirm SARS-CoV-2 libraries could be sequenced at about 1,000 reads per amplicon, we sub-sampled the data of 5 genomes to 150,000 and 100,000 total reads per library, which were equivalent to 437 and 291 reads per amplicon, respectively. At least 93% of the genome was covered at 100X with 150,000 total reads, and 86% of the genome was covered at 100X with 100,000 total reads. Even at 50,000 total reads per library, at least 58% of the genome was covered at 100X (Fig. 3B) . The high coverage was also manifested by the superior uniformity of the number of amplicons amplified in the multiplex PCR reaction and recovered in the sequencing (Fig. 3C) , and by the log10 distance of the number of each amplicon to the mean number of all amplicons in the library (Fig. 3D ). The mutations in the SARS-CoV-2 genome from the 13 specimens were detected by two independently developed algorithms. Only those mutations that were detected by both methods were reported. Assuming all viral particles from a single patient contained identical mutations, mutations with frequencies (%AF) > 60% were considered to be empirically true. The majority of the mutations identified in these 13 SARS-CoV-2 genomes clustered around 7 loci, probably reflecting the collection of these specimens in close communities or the transmission of the virus (Fig. 3E) . According to the similarity of these mutations, samples were categorized into three groups. The majority of the mutations in the first group showed >98% mutation frequency. Groups 1 and 2 shared a considerable number of identical mutations, while Group 3 was significantly different, suggesting that the origin of this isolate might be traced back to a distinct lineage (Supplemental Table 3 ). Of note, all 13 strains contained at least one mutation that has been reported to be associated with SARS-CoV-2 virulence 39 . The D614G mutation (a G-to-A base change at position 23,403 in the reference strain NC_045512.2), which began spreading in Europe in early February, was then transmitted to new geographic regions and became a dominant form 40 , was found in 11 of these 13 specimens. We considered mutations with %AF ≤ 60% as a likely reflection of intra-host heterogeneity 41 . To eliminate noise from PCR amplification and sequencing, only mutations with %AF ≥ 20% were considered. The 20% cutoff was selected based on the mutation profile from sequencing the synthetic SARS-CoV-2 RNA controls from Twist Bioscience. Some true intra-host mutations with %AF < 20% might be missed. Since no UMI was used in the multiplex PCR amplification, intra-host mutations might still be contaminated with noise originating from PCR amplification, even though a 20% cutoff was applied. The occurrence of such noise may be exacerbated by low viral copy inputs in the multiplex PCR, or low read depth per amplicon in sequencing. In groups 1 and 3, where both viral copy numbers and read depth per amplicon were high, only one intra-host mutation per genome was found in some of the specimens. In contrast, group 2 had low copy numbers, and low read depth per amplicon, and the numbers of apparent intra-host mutations were considerably higher ( Fig. 3E and Supplemental Table 3 ). Some of these intrahost mutations occurred at the aforementioned 7 loci, or were found at other loci and were identical among the 4 specimens in group 2, while the remaining ones appeared random. Our findings suggested that these recurring mutations might not be true intra-host mutations. Indeed, it is possible that the low copy number inputs, as well as sequencing depth, caused a reduction in the %AF, and additionally introduced false intra-host mutations. Therefore, copy number and sequencing depth should be cautiously considered when a mutation is found to have low %AF. In order to characterize highly mutated viruses that would otherwise not be amplified by the predesigned primer pairs, and to discover unknown pathogens, we subsequently developed a user-friendly multiplex-PCR-based metagenomic method. In this method, random hexamer-adapters were used to amplify DNA or cDNA targets in a multiplex PCR reaction. The large amounts of non-specific amplification products were removed by using Paragon Genomics' background removal reagent, thus resolving a library suitable for sequencing. For RNA samples, Paragon Genomics' reverse transcription reagents were used to convert RNA into cDNA, resulting in significantly reduced amount of human ribosomal RNA species. We sequenced a library made with 4,500 copies of N and S gene-containing plasmids spiked into 10 ng of human gDNA, which roughly represents 3,300 haploid genomes. Even though the molar ratios of viral targets and human haploid genomes were comparable, the N and S genes, which encompass about 4kb of targets, were a negligible fraction of the 3 billion base pairs of a human genome. If every region of the human genome were amplified and sequenced at 0.6 million reads per sample, only one read of viral target would be recovered. In fact, our results showed that 16% of the recovered bases, or 13% of the recovered reads, were within the viral N and S genes ( Fig. 4A and Supplemental Table 4 ). 80% of SARS-CoV-2 and 78% of mitochondrial targets were covered (Fig. 4B) , and their base coverage was significantly higher than for human targets (Fig. 4C) . In contrast, only 0.08% of human chromosomal regions were amplified. Furthermore, the human exonic regions were preferentially amplified (Fig. 4D ). This suggested that the random hexamers deselected a large portion of the human genome, while favorably amplifying regions that were more "random" in base composition. Indeed, long gaps and lack of coverage in very large repetitive regions were observed in human chromosomes (Fig. 4E) . On the contrary, the gaps in SARS-CoV-2 and mitochondrial regions were significantly shorter (Fig. 4F) . We further optimized this method so that 96% of the SARS-CoV-2 genome was recovered (Fig. 4G) . The depth of the recovered bases was within a 10-fold range on average. This 10-fold difference in coverage has been routinely observed with our multiplex PCR technology (Supplemental Fig. 4 and Supplemental Table 5 ). Therefore, increasing sequencing depth alone might not improve the coverage of the targeted regions further. This study provides a highly sensitive and robust multiplex PCR method for the detection of SARS-CoV-2. By amplifying hundreds of targets simultaneously, our multiplex PCR method is more sensitive than RT-PCR, and tolerates the presence of mutations in SARS-CoV-2. For the purpose of diagnosis, only one of two primer pools could be used, or the two pools could be alternatively used in adjacent samples to prevent cross contamination. While the amplification products from positive samples are mainly viral amplicons, low quantities of primer dimers are produced in the negative samples. Therefore, a simple measurement of the dsDNA concentration by fluorometry or spectrophotometry would not be sufficient. High-resolution electrophoresis is required to resolve the length of the amplification products in order to differentiate the target amplicons from the primer-dimers. Alternatively, a low-depth sequencing in the range of 50K reads per sample would provide definitive diagnostic results. When both primer pools are used, the entire genome of SARS-CoV-2 can be enriched, sequenced and interrogated for the presence of any mutations. We demonstrated that mutations were detected from samples with viral loads ranging from 8 to half a million copies. For accurate sequencing and phylogenetic studies, a high-depth sequencing in the range of 300K reads per sample, along with an input of high viral load (>100 copies), are deemed necessary. We caution the interpretation of intra-host mutations obtained with a low input number of viral particles and in low sequencing depth data. The current SARS-CoV-2 panel was demonstrated to specifically amplify the entire SARS-CoV-2 genome, and sequencing data obtained from the 13 COVID-19 RT-PCR positive samples clearly differentiate SARS-CoV-2 from other human coronaviruses, such as MERS-CoV, CoV 229E, CoV OC43, CoV NL63, CoV HKU1. 0% of the obtained sequencing reads from each of the 13 samples were aligned to the genome sequence of any of the above viral species. This is in contrast to what was reported for a similar SARS-CoV-2 panel 42 . Such high specificity argues strongly that this panel could be further expanded to include simultaneous detection of other respiratory viruses including influenza virus. Metagenomic method is a powerful technology that can theoretically detect any sequences in the specimens. However, metagenomic methods usually require very high sequencing depth in order to find the target sequences, and hence are economically prohibitive as a diagnostic assay. To overcome this constraint, we developed a multiplex-PCR-based metagenomic method that achieved >96% coverage of the S and N genes of SARS-CoV-2 in the contest of human gDNA, while only required ~0.6M of total reads per library. This coverage was superior given the recommended 50% threshold of coverage for drafting a genome 43 . The results were obtained with no additional means of host depletion to remove human gDNA and rRNA. The viral bases were 16% of the total recovered bases in the sequencing. Yet it still necessary to verify and validate the detection of SARS-CoV-2 and the other coronaviruses and respiratory viruses by this metagenomic method. Clinical SARS-CoV-2 samples were collected at Children's Hospital of Los Angeles. Samples and ancillary clinical and epidemiological data were de-identified before analysis, and are thus considered exempt from human subject regulations, with a waiver of informed consent according to 45 CFR 46.101(b) of the US Department of Health and Human Services. Analysis of the nasopharyngeal swab samples from patients with COVID-19 disease was approved by the Ministry of Health in the US. Patients in the 2020 COVID-19 outbreak from 1 January 2020 to 30 August 2020 provided oral consent for study enrolment and the collection and analysis of their nasopharyngeal swab. Consent was obtained at the homes of patients or in hospital isolation wards by a team that included staff members of Children's Hospital of Los Angeles. The Universal Human Reference RNA was from Agilent Technologies, Inc. (Cat#74000). The plasmids containing either S or N gene of SARS-CoV-2 (pUC-S and pUC-N, respectively) were purchased from Sangon Biotech, Shanghai, China. The PCR primers used in ddPCR and RT-PCR reactions for S gene are 5'-TGTACTTGGACAATCAAAAAGAGTTGAT and 5'-AGGAGCAGTTGTGAAGTTCTTTTC; for N gene are 5'-GGGGAACTTCTCCTGCTAGAAT and 5'-CAGACATTTTGCTCTCAAGCTG, respectively. Panel design is based on the SARS-CoV-2 sequence NC_045512.2 (https://www.ncbi.nlm.nih.gov/ nuccore/NC_045512.2/). In total, 343 primer pairs, distributed into two separate pools, were selected by a proprietary panel design pipeline to cover the whole viral genome except for 92 bases at its ends. Primers were optimized to preferentially amplify the SARS-CoV-2 cDNA versus background human cDNA or genomic DNA. They were also optimized to amplify the covered genome uniformly. 50ng of Universal Human Reference RNA was converted into cDNA using random primers and SuperScript IV Reverse Transcriptase, following the supplier recommended method (Thermo Fisher Scientific, Cat# 18090050). After reverse transcription, cDNA was purified with 2.4X volume of magnetic beads, and washed twice with 70% ethanol. Finally, the purified cDNA was dissolved in 1X TE buffer and used per multiplex PCR reaction. Plasmids pUC-S and pUC-N, in combination with human cDNA, were used in each reaction. Paragon Genomics' CleanPlex secondary PCR mix was used with 100nM of each PCR primers in 10ul reactions. The PCR thermal cycling protocol used was 95°C for 10min, then 98°C for 15sec, 60°C for 30sec for 45 cycles. ddPCR ddPCR was performed on QX200 from Bio-Rad. Plasmids pUC-S and pUC-N at the estimated copy numbers 1 (6 repeats), 2 (3 repeats), and 100 (3 repeats) were tested. In each reaction, the ddPCR thermal cycling protocol used was 95°C for 5min, then 95°C for 30sec, 60°C for 1min with 60 cycles, 4°C for 5min and 90°C for 5min, 4°C hold. The resulting data were analyzed by following the supplier recommended method. Paragon Genomics' CleanPlex multiplex PCR reagents and protocol were used. Briefly, a 10µl multiplex PCR reaction was made by combining 5X mPCR mix, 10X Pool 1 of the panel, water and viral template cDNA. The reaction was run in a thermal cycler (95°C for 10min, then 98°C for 15sec, 60°C for 5min for 10 cycles), then terminated by the addition of 2µl of stop buffer. The reaction was then purified by 29µl of magnetic beads, followed by a secondary PCR with a pair of primers for 25 cycles. The secondary PCR added sample indexes and sequencing adapters, allowing for sequencing of the resulting products by high throughput sequencing. A final bead purification was performed after the secondary PCR, followed by library interrogation using a Bioanalyzer 2100 instrument with Agilent High Sensitivity DNA Kit (Agilent Technologies, Inc. Part# 5067-4626). A cumulative Poisson probability was used to build the mathematical model. In Microsoft Excel, the following function was used: 80% of targets were assumed to be successfully amplified in multiplex PCR. For RT-PCR, = f × n × m. q = number of detected virus genomes. q is used to plot the graph reported in the paper, e.g., the copies of virus against the matching probabilities. For multiplex PCR, q = n/a, a = the number of amplicons used in the multiplex PCR. For RT-PCR, q = n × f. Paragon Genomics' CleanPlex metagenomic reagents and protocol were used. Briefly, a 10µl multiplex PCR reaction was made by combining 5X mPCR mix, 10X random hexamer-adapters, water and the viral template cDNA. The PCR thermal cycling protocol used was 95°C for 10min, then 98°C for 15sec, 25°C for 2min, 60°C for 5min for 10 cycles. The reaction was then terminated by the addition of 2µl of stop buffer, and purified by 29µl of magnetic beads. The resulting solution was treated with 2µl of CleanPlex reagent at 37°C for 10min to remove non-specific amplification products. After a magnetic bead purification, the product was further amplified in a secondary PCR with a pair of primers for 25 cycles to produce the metagenomic library. This metagenomic library was further purified by magnetic beads before sequencing. High throughput NGS sequencing was performed using Illumina iSeq 100, MiSeq and MGI sequencers (DNBSEQ-G400 and its research-grade CoolMPS sequencing kits). Detailed information for the samples sequenced and used in this manuscript can be found in Supplemental Table 4 . Raw sequencing data were trimmed for adaptors using cutadapt version 1.14. The sequences obtained were mapped to the SARS-CoV-2 genome (NC_045512.2) with bwa-mem using Sentieon version 201808.01. Duplicate read marking was skipped. Base-quality recalibration, re-alignment of indels and quality metrics was accomplished with Sentieon. The resulting BAM files were then used to calculate depth and coverage metrics using Samtools version 1.3.1. Algorithms developed independently at Children's Hospital of Los Angeles and paragon Genomics were sued to detect the mutations in the genome of SARS-CoV-2. All sequencing data used in this publication are available for downloading at NCBI's Sequence Read Archive (https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA614546). The workflow of the multiplex PCR method. The prepared libraries can be detected using high-resolution electrophoresis, and sequenced together with other samples using high-throughput sequencing. (A) Two plasmids, containing SARS-CoV-2 S and N genes, respectively, were diluted in human cDNA and amplified in multiplex PCR with pool 1 (172 pairs of primers). The number of plasmid copies per reaction, determined by ddPCR, were from 8,900 to 0.6. The resulting products obtained after multiplex PCR were resolved by electrophoresis. The specific amplification products (the library) can still be seen with 2.8 copies of plasmids. (B) The library yields can be detected down to 0.6 copies of plasmids (n=4) by multiplex PCR (black line), while only down to 2.8 copies by RT-PCR (> 4.5-fold difference) (red line). (C) Poisson process was used to estimate the chance of sampling around 1 copy of viral particles, and the mathematical model was used to estimate the chance of detecting them (red line). There is 12% of probability to detect 1.1 copies with a multiplex PCR efficiency of 6%. In reality, we observed a significantly higher 56% probability for 1.1 copies and 100% probability for 1.4 copies. (D) After sequencing 1.4 and 2.8 copies of plasmids, the reads of all 28 amplicons spanning both N and S genes were clustered within a 20fold range of coverage (n=3). With 1.4 copies, only the reads of one amplicon were about 100-fold lower than the average (n=3). (E) About 96% of amplicons were recovered with 14 copies of plasmids, 77% with 2.8 copies, and 37% with 0.6 copies (n=3). . The entire genomes of SARS-CoV-2 were amplified and sequenced from a cohort of 13 COVID-19 positive patients. The copies of virus that were used in the multiplex PCR reaction range from 8 to 675,885. 98.3-99.9% of the genomes were covered at 100X with an average of 5,000 reads per amplicon from 22 to 675,885 copies. The coverage slightly decreased to 95% with 8 copies of virus. (B) Sub-sampling of 5 samples to 150,000 total reads slightly reduced the 100X coverage to 93-97%. (C) The amplification performance of the 343 amplicons for each sample was measured by the uniformity of 0.2X mean of the reads. Each circle represents one sample. (D) In one sample, the performance of each amplicon was evaluated by their log10 distance to the mean reads. Each circle represents one amplicon. (E). The mutations in each SARS-CoV-2 genome were detected and the SARS-CoV-2 samples were segregated into 3 groups based on similarities. The majority of the mutations were shared by groups 1 and 2. Group 2 contained low viral copy numbers in multiplex PCR, low reads per amplicon in sequencing and a considerable amount of apparent intra-host mutations. Mutations in group 3 were different from the other 2 groups. The mutations that associate with virulence 39 are in bold. A23403G, which associates with high transmission 40 , is in red. The reference genome used was NC_045512.2. Black vertical lines represent point mutations, green vertical lines represent intra-host mutations, red vertical lines represent deletions. (A) Random hexamer-adapters were used in multiplex PCR to amplify 4,500 copies of plasmids in the background of 3,300 haploid human gDNA molecules. The resulting libraries were sequenced at an average of 0.6 million total reads. Of the total bases recovered, 16% were mapped to SARS-CoV-2 S and N genes. (B) 80% of S and N genes, and 78% of the human mitochondrial chromosome were amplified with >= 1X coverage, while only 0.08% of the human chromosomes were. (C) On average, S and N genes were covered at 2,346X, mitochondrial and human chromosomes at 77X and 20X, respectively. (D) Human exons were relatively over-amplified, at about 4-fold higher compared to their actual ratio within the genome. (E) Gaps and long regions of absence of amplification were observed for human chromosomes. An example shown here is chromosome Y. Small gaps were additionally found in the enlarged cluster of amplification. The long absent region (red double arrow) overlapped with the repetitive regions on Y chromosome. (F) Representation of the recovered regions in S and N genes and the human mitochondrial chromosome. The coverage was from 1,000-to 10,000-fold for S and N genes, and 30-to 500-fold for the mitochondrial chromosome. (G) This metagenomic method was subsequently improved, resulting in 95.7% of the SARS-CoV-2 S gene covered at 22.5X and 96.2% of the N gene covered at 12.8X, with a sequencing depth of 0.53M total reads. Measures for diagnosing and treating infections by a novel coronavirus responsible for a pneumonia outbreak originating in Wuhan Recent advances in the detection of respiratory virus infection in humans Molecular and serological investigation of 2019-nCoV infected patients: implication of multiple shedding routes Rapid Detection of Novel Coronavirus (COVID-19) by Reverse Transcription-Loop-Mediated Isothermal Amplification. medRxiv Rapid colorimetric detection of COVID-19 coronavirus using a reverse tran-scriptional loop-mediated isothermal amplification (RT-LAMP) diagnostic plat-form: iLACO. medRxiv Consistent detection of 2019 novel coronavirus in saliva Detectable 2019-nCoV viral RNA in blood is a strong indicator for the further clinical severity Comparison of different samples for 2019 novel coronavirus detection by nucleic acid amplification tests Comparison of throat swabs and sputum specimens for viral nucleic acid detection in 52 cases of novel coronavirus (SARS-Cov-2) infected pneumonia (COVID-19). medRxiv Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR Molecular Diagnosis of a Novel Coronavirus (2019-nCoV) Causing an Outbreak of Pneumonia Development of Genetic Diagnostic Methods for Novel Coronavirus 2019 (nCoV-2019) in Japan Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases Chest CT for Typical 2019-nCoV Pneumonia: Relationship to Negative RT-PCR Testing Comparison of the clinical characteristics between RNA positive and negative patients clinically diagnosed with Racing Towards the Development of Diagnostics for a Novel Coronavirus (2019-nCoV) Clinical and biochemical indexes from 2019-nCoV infected patients linked to viral loads and lung injury First cases of coronavirus disease 2019 (COVID-19) in France: surveillance, investigations and control measures Novel Coronavirus Outbreak in Wuhan, China, 2020: Intense Surveillance Is Vital for Preventing Sustained Transmission in New Locations Laboratory readiness and response for novel coronavirus (2019-nCoV) in expert laboratories in 30 EU/EEA countries Interpretation of "Guidelines for the Diagnosis and Treatment of Novel Coronavirus (2019-nCoV) Infection by the National Health Commission (Trial Version 5 Diagnosis and clinical management of 2019 novel coronavirus infection: an operational recommendation of Peking Union Medical College Hospital Genome Detective Coronavirus Typing Tool for rapid identification and characterization of novel coronavirus genomes Diagnosis, treatment, and prevention of 2019 novel coronavirus infection in children: experts' consensus statement Emerging novel coronavirus (2019-nCoV)-current scenario, evolutionary perspective based on genome analysis and recent developments Potential of large "first generation" human-to-human transmission of 2019-nCoV Transmission dynamics and evolutionary history of 2019-nCoV Cross-species transmission of the newly identified coronavirus 2019-nCoV Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding The 2019-new coronavirus epidemic: Evidence for virus evolution The global spread of 2019-nCoV: a molecular evolutionary analysis The first two cases of 2019-nCoV in Italy: Where they come from Comparison of targeted next-generation sequencing for whole-genome sequencing of Hantaan orthohantavirus in Apodemus agrarius lung tissues Sensitive, highly multiplexed sequencing of microhaplotypes from the Plasmodium falciparum heterozygome. bioRxiv RNA based mNGS approach identifies a novel human coronavirus from two individual pneumonia cases in 2019 Wuhan outbreak Abstract 439: Targeted single cell DNA sequencing without prior whole genome amplification for mutational analysis of circulating tumor cells Comparison of four digital PCR platforms for accurate quantification of DNA copy number of a certified plasmid DNA reference material Patient-derived mutations impact pathogenicity of SARS-CoV-2. medRxiv Spike mutation pipeline reveals the emergence of a more transmissible form of SARS-CoV-2. bioRxiv Characterization of intra-host SARS-CoV-2 variants improves phylogenomic reconstruction and may reveal functionally convergent mutations. bioRxiv A rapid, low cost, and highly sensitive SARS-CoV-2 diagnostic based on whole genome sequencing. bioRxiv Supplemental Information Highly sensitive and full-genome interrogation of SARS-CoV-2 using multiplexed PCR enrichment followed by next-generation sequencing Utsav Pandey 5 , Moiz Bootwalla 5 , Dejerianne Ostrow 5 , Dennis Maglinte 5 CA 94545 USA 2 Department of Psychiatry and Behavioral Sciences, Department of Genetics Children's Hospital Los Angeles The authors would like to thank Jian Xu of MGI, BGI-Shenzhen for technical support in MGI sequencing, Dr. Jin Billy Li of Stanford University for the coordination of academic support, Dr. Alexander E. Urban of Stanford University for suggestions in preparing the manuscript, and Dr. Zihuai He of Stanford University for discussions on a potential mathematical model of the metagenomic method. The authors declare no competing interests.