key: cord-0997914-s9scrjkb authors: Hoff, Kendall; Ding, Xun; Carter, Lucas; Duque, John; Lin, Ju-Yu; Dung, Samantha; Singh, Priyanka; Sun, Jiayi; Crnogorac, Filip; Swaminathan, Radha; Alden, Emily N.; Zhu, Xuechen; Shimada, Ryota; Posavi, Marijan; Hull, Noah; Dinwiddie, Darrell; Halasz, Adam M.; McGall, Glenn; Zhou, Wei; Edwards, Jeremy S. title: Highly Accurate Chip-Based Resequencing of SARS-CoV-2 Clinical Samples date: 2021-04-13 journal: Langmuir DOI: 10.1021/acs.langmuir.0c02927 sha: 52b7c5f615eeb07e1cb21a5642e88733680b11ce doc_id: 997914 cord_uid: s9scrjkb [Image: see text] SARS-CoV-2 has infected over 128 million people worldwide, and until a vaccine is developed and widely disseminated, vigilant testing and contact tracing are the most effective ways to slow the spread of COVID-19. Typical clinical testing only confirms the presence or absence of the virus, but rather, a simple and rapid testing procedure that sequences the entire genome would be impactful and allow for tracing the spread of the virus and variants, as well as the appearance of new variants. However, traditional short read sequencing methods are time consuming and expensive. Herein, we describe a tiled genome array that we developed for rapid and inexpensive full viral genome resequencing, and we have applied our SARS-CoV-2-specific genome tiling array to rapidly and accurately resequence the viral genome from eight clinical samples. We have resequenced eight samples acquired from patients in Wyoming that tested positive for SARS-CoV-2. We were ultimately able to sequence over 95% of the genome of each sample with greater than 99.9% average accuracy. To date, there have been over 128 million confirmed cases and nearly 2.8 million deaths worldwide due to COVID-19, caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV2). 1 COVID-19 is highly contagious and rapidly spread within the human population and was defined as a global pandemic in March 2020 by the World Health Organization. Vigilant testing and tracing are essential for controlling the SARS-CoV-2 virus, so a technology to monitor the evolution of the viral genome and the emergence of virus variants and detect possible transmission chains is highly desirable. 2,3 Such a technique requires a rapid, inexpensive, and accurate tool that can detect genetic variants within the SARS-CoV-2 genome with single base resolution. Next-generation sequencing (NGS) methods for sequencing the viral genome are established and accurate. However, it is hard to take advantage of the throughput of NGS by multiplexing a large number of samples, and major costs are associated with library preparation. Therefore, the cost of NGS sequencing of viral genomes is relatively high (cost per assembled base of a viral genome is ∼10 000 times more expensive than the human genome). To remedy this problem, a number of studies have demonstrated the capability of DNA arrays in the detection, 4 surveillance, 5, 6 and screening of multiple viral strains, including coronavirus. 7, 8 However, DNA arrays originally only used a limited number of oligonucleotide features, leading to a bias in genome coverage. However, improvements in DNA array technology and decreasing production costs led to the development of whole genome tiling arrays with high-density oligonucleotide features that cover each base in the genome with sense and antisense probes to the genome of interest. 9, 10 These tiling arrays can be applied to resequence the genome (identify single nucleotide variants) from clinical samples at a very low cost. 11, 12 Here, we describe a full genome tiling array with more than 240 000 features that provide 2× coverage of the entire SARS-CoV-2 genome and the use of such a genome tiling array to sequence the genome from eight clinical samples from SARS-CoV-2-positive subjects. Our results indicate that we can sequence at least 95% of the viral genome with on average greater than 99.9% accuracy. Sample Preparation for Illumina Sequencing. Samples were prepared as previously described using the ARTIC sequencing methods. In brief, cDNA was prepared from total RNA extracted from clinical samples using SuperScript IV (SSIV, Thermo Scientific) and random hexamer priming. The resultant cDNA was amplified in two polymerase chain reaction (PCR) reactions using the ARTIC Pool1 and Pool2 SARS-CoV-2 v3 primer sets and Q5 high fidelity DNA polymerase (NEB). Following PCR, samples were purified using AMPure XP SPRI beads (Beckman Coulter). Illumina adaptors were added using the NEBNext Ultra II DNA Library Prep Kit (NEB), and SPRI bead purification was repeated. Sample Preparation for Sequencing on Chips. To prepare samples for hybridization to the chips, 0.05 μL of the purified PCR product was amplified using the ARTIC protocol and Pool1 and Pool2 v3 primer sets for 35 cycles with 50 μM biotin-11-dUTP (Jena Biosciences) added to the reaction mixture. Pool1 and Pool2 were combined for each sample and fragmented using DNase I (D4263, Sigma-Aldrich). Two thousand Kunitz units of lyophilized DNase I was resuspended on ice using 2 mL of 1× DNase I Buffer (10 mM Tris−HCl pH 7.5, 2.5 mM MgCl 2 , 0.1 mM CaCl 2 ). The resuspended enzyme was diluted 1000-fold using 1x DNase I Buffer, and an equal volume was added to samples prewarmed to 37°C. Samples were incubated for 30 min at 37°C, and the reactions were stopped by adding ethylenediaminetetraacetic acid (EDTA) to a final concentration of 12.5 mM and incubating for 20 min at 75°C. Hybridization. Forty-five microliters of the fragmented sample was hybridized overnight at 45°C to the chip in a 60 μL final volume containing 5 mM EDTA, 6.25 mM HEPES pH 8.0, 312.5 mM NaCl, 1.25% Ficoll 400, 0.5 nM Cy3-AM1 (GCTGTATCGGCTGAATCG-TA). Following hybridization, chips were washed for 10 min at room temperature in Wash A (2× SSC, 0.1% TWEEN-20) and then for 10 min at 39°C in Wash B (0.5×SSC, 0.1% TWEEN-20). Chips were stained for 15 min at room temperature using 0.02 mg/mL Cy3-Streptavidin (Thermo) in 4× SSC and washed for 5 min at room temperature using 4× SSC. Chips were scanned using a custom-built confocal scanner for 0.5, 1, 4, and 8 s in the green (Cy3) channel in 4× SSC. Reverse Transcription Polymerase Chain Reaction (RT-PCR) Using the CDC N1 and N2 Primer Sets. RT-PCR was performed using the CDC N1 and N2 primer sets. Five hundred copies of the SARS-CoV-2 genome (Twist Biosciences) were amplified in a 25 μL reaction volume using the SuperScript IV One-Step kit (Thermo Fisher) containing 250 nM of each primer (Primer mix 1−152), 50 μM biotin-11-dUTP (Jena Biosciences), and 0.5 μL of RT enzyme mix. Cycling was performed as follows: 12 min at 45°C, 2 min at 98°C , 40 cycles of 10 s at 98°C, 11 s at 61°C, and 11 s at 72°C, followed by a final extension of 2 min at 72°C. The PCR product was hybridized to the array as described above with one modification: hybridization was performed for 90 min instead of overnight. Base Calling Approach. Base Calling for N1 and N2 Amplicon Experiments. RT-PCR products are hybridized to chips and imaged on the custom-built confocal scanner. A synthetic alignment marker sequence, "Cy3-AM1", is added to the hybridization mixture containing the RT-PCR products and hybridization buffer. This sequence hybridizes in a square pattern at predetermined regularly spaced locations across the chip, as illustrated in Figure 1 . The images are stitched together and gridded to create a composite image and using the positional information from the Cy3-AM1 sequences, and intensities for each feature on the chip are extracted from the image and stored in a. csv text file. Each base has two corresponding probe sets: one for the sense strand and one for the antisense strand. Each probe set consists of four features, one for each base, ATCG; thus, Figure 2 . Development of the maximum likelihood base caller for SARS-CoV-2 genome sequencing using full genome tiling arrays. (A) Density plot derived from a two-dimensional (2D) histogram of the incorrect calls from all tiling array probe sets including sense and antisense data for a single exposure. This image was constructed by "calling" each base in the genome using all probe sets. With this approach, each base is called twice, once from the sense probe sets and once from the antisense probe sets. The difference and differential of a call are included in the histogram if the base call does not match the reference. Contours indicate a likelihood function proportional to the two-dimensional cumulative sum of the density; the sum is normalized to indicate the fraction of wrong calls whose quality parameters are higher than the given point; higher values indicate a higher likelihood that a call is "wrong". (B) Same as A, except the 2D histogram is for the correct calls from all tiling array probe sets. Contours indicate a cumulative sum of the density, normalized to indicate the likelihood that a call is correct. It can be observed that the distribution of the difference and differential of "correct" calls is very different from the "incorrect" calls. (C) Using the observation from panels A and B, we constructed a function to assign the likelihood that a probe set is calling the correct base for a given position. The dotted contours define the (combined) likelihood that a probe set is correctly calling the correct base, based on the difference and differential score for that probe set. The triangle points on the plot illustrate the different and differential values for probe sets for all variant sites that have been reported in Wyoming samples in the GISAID database as of August 2020. The green triangles indicate that the base call from this scan suggests a reference call, whereas a red triangle indicates that the call suggests a nonreference base at this position. If the triangle points up, this is from the sense probe set, whereas a downward pointed triangle indicates the data is for the antisense probe set. The triangle outline is filled in if this probe set from this scan resulted in the highest likelihood for the correct call among all scans for this position. This image is the 4 s scan of the WY64 sample. To call all of the bases, we construct a similar likelihood function for each scan, and this information was combined as described in the methods to make the final base call. are calculated for the sense and antisense probe sets separately. A putative base is called for the probe set with the highest intensity for the sense set and separately for the antisense set. If the differential is smaller than a threshold (2% default) and/or if the difference is smaller than a threshold (20 for low-intensity chips or 100 for highintensity chips), the base is called "N" for unknown. If a base is not called N, the differentials for sense or antisense set are compared, and the set with a larger differential is used to call the base. FASTA files are generated from these parameters and aligned to references. Whole Viral Genome Base Calling Using Likelihood Maps for Multiple Exposures. Multiple successive images (0.5, 1, 4, and 8 s exposures) are taken for one chip, resulting in a set of N E measures (typically N E = 3 or 4). For each position within the genome, we obtain 2N E sets (sense and antisense) of four intensities {I ⃗ j } j = 1···2N E , I ⃗ = [I A ,I T ,I C ,I G ]. The intensities within each set are sorted I max ≡ I 3 ≥ I 2 ≥ I 1 ≥ I 0 ≡ I min , and the base corresponding to the highest one is tentatively called for the respective location. Ideally, the call from each of the sets would indicate the same base, but this is not always the case. We use the difference D ≡ I max − I min , differential d rel ≡ (I max − I 2 )/D, and signal magnitude I max = I 3 to assign a credibility score to each base call from the sense and antisense probe sets at each exposure. We follow a Bayesian-inspired approach, where we rely on a reference genome to indicate whether a preliminary call is likely After identifying the base with the highest intensity for each of 2N G sets for a given exposure time, we group the sets into two groups, those that match the reference call and those that do not ( Figure 2 ). The distributions of the "quality parameter" values (D,d rel ) for these groups follow different geometric patterns, as illustrated in Figure 2 . Each of the parameters taken separately is informative, in that higher values indicate higher confidence that a call is correct. However, the geometry of the distributions in Figure 2A ,B illustrates the difficulties associated with comparing two calls, where one has a higher absolute difference and the other has a higher relative differential. To illustrate our methods, the likelihood score using one quality parameter is described. If we rely on one quality parameter x, which could be one of D,d rel ,I max to infer the likelihood that a read is correct, we can make the following argument. Without knowledge of the quality parameter x, the probability that a call is correct is approximated by the fraction of calls that are correct, P correct ≈ N correct /N total . Regarding the dependence on x, first note that the likelihood that a call is correct increases with x, that is, a call with parameter value x′ > x is more likely correct than the one with x. Second, given that a call is correct, its parameter x follows some (continuous) distribution ρ correct (x). This local density of correct calls is not a good estimate that a call is correct since it will decrease as x increases beyond the location of the peak density. The local probability that a call is correct should follow from the ratio of the local density of correct calls to that of all calls (correct and incorrect); however, these are too small to estimate far away from the center of the respective distributions. We assign a likelihood that a call with parameter value x = λ is correct by comparing the fraction of correct calls with lower or higher x values. If the likelihood of a call being correct increases with x, then all of the calls with x < λ are less likely and those with x > λ are more likely to be correct. A call whose x exceeds that of all correct calls is assigned the full probability P correct . Thus, the likelihood that a call with parameter x is correct is (conservatively) estimated by A similar argument can be made for the likelihood that the call is wrong, except this will decrease as x increases, working out to We obtain a normalized score by combining the two likelihoods: To use two parameters, we construct a credibility score that maps a pair of parameters (D,d rel ) to a continuous value between [0,1] as = Without knowledge of its quality parameters (x,y) = (D,d rel ), the probability that a given call is correct is approximated by N correct /N total . We assume that a call with parameters (x,y) is more likely to be correct than a call with lower parameters x′ ≤ x,y′ ≤ x. Let ρ correct (x,y) denote the normalized probability density for correct calls in two dimensions. Given that a read is correct, the probability that its quality parameters are both below (x,y) is Thus, w correct (D,d rel )/N total approximates the probability that a call is correct and has quality parameters equal or worse than (D,d rel ). Effectively, this approach assumes that the likelihood that a read with D,d rel is correct is proportional to the number of reference-matching reads with both D′ ≤ D and d′ ref ≤ d ref . Similarly, w incorrect (D,d rel )/ N total approximates the probability that a call is wrong and has equal or better (higher) quality parameters. For calls whose parameters are above those of every nonmatching call, w incorrect = 0. We added the offset term in the denominator of S to avoid a "perfect" score S = 1 for such calls, so their rank among correct calls is still taken into account. The score S(D,d rel ) described above is calculated efficiently for every read of every base using a 2D histogram of the distributions of matching and nonmatching calls. The raw bin counts are converted into a smoothened local density obtained as a moving average over neighboring bins, and the averages are used to compute cumulative sums in both directions. Each read is assigned a score corresponding to the 2D bin into which it falls. Reads for a given genome location are sorted by their score, and the final call is made consistent with the top-scoring read. The S score is interpreted as the likelihood that the read is correct. FASTA files for sequencing results generated from microarray images are aligned to the reference sequences for SARS-CoV-2 and human RNase P, NC_045512.2 Wuhan seafood market pneumonia virus isolate Wuhan-Hu-1, complete genome (GISAID accession EPI_ISL_402125). ■ RESULTS AND DISCUSSION SARS-CoV-2 Genome Tiling Arrays. We have constructed a genome tiling array of the SARS-CoV-2 genome (NC_045512) 13 ( Figure 1A) . The 3 mm × 3 mm array area is divided into 250 000 "features". Each feature contains densely packed identical single strands of DNA, each 25 bases long and The number of "noncalls" refers to the number of base positions that were not called (or removed) because they have a Phred score lower than 20 or reside in one of the six lowest short read Illumina sequencing coverage regions. "Variants" is the number of putative variants from the sample called by the genome tiling array sequencing. "Accuracy" is the number of correctly called positions divided by the number of base calls. "Coverage" refers to the percentage of the SARS-CoV-2 genome for which we have called a base. The "illumina coverage" column refers to the sequencing coverage of the samples in the GISAID database. Langmuir pubs.acs.org/Langmuir Article serves as a probe. Each column of the array is divided into probe sets of four features. Within each probe set, each feature consists of ssDNA with a sequence that matches the same portion of the genome except for the 13th base, which always contains an A in the first feature, T in the second, and C and G in the third and fourth, respectively (Figure 1) . The probe sets are arranged in rows such that in one row the sequences in successive probe sets tile across the genome and match the genome except for the 13th base (for which only one feature within the set has the matching base) ( Figure 1E ). The feature within the set that matches the genome sequence will hybridize with the genome fragment. The genome fragments are each tagged with a fluorescent molecule so that in a fluorescence image ( Figure 1F ) the positions of the hybridized genome molecule "light up" and the base can be identified. There is the possibility of an error if there is more than one region within the genome with identical 25-mer sequences; however, the probability of this occurring is vanishingly small for a sequence of 30 000 bases. Although the SARS-CoV-2 genome is a single-stranded RNA, during the processing of the clinical sample, the genome is reverse-transcribed and amplified generating a doublestranded DNA. Therefore, the genome tiling arrays were constructed to assay both the sense and antisense strands, and hence, there are two measurements for each base in the SARS-CoV-2 genome. Since the genome consists of ∼30 000 bases, In addition to the genome tiling features, the arrays also contain alignment marks and several control features for human sequences. Illustration of the Approach by Sequencing of the N1 Gene Sequence. As an initial test, we amplified a region of the N1 and N2 genes in the SARS-CoV-2 genome (synthesized by Twist Biosciences) using the N1 and N2 CDC primer pairs obtained from Integrated DNA Technologies (Supporting Information, Table S1 ). The PCR products were labeled and hybridized to the tiling array, as described in the Experimental Section. The arrays were imaged using a custom-automated fluorescent scanning confocal microscope. The fluorescence intensity and x,y coordinates of each feature were stored in a. csv file and mapped to each base in the N1 and N2 gene regions of the SARS-CoV-2 genome. The intensities from the features containing each of the four features for each base position were ranked {I 0 ≤ I 1 ≤ I 2 ≤ I 3 }. To call the base, we define two parameters that characterize the quality or confidence in the call: the difference (D = I 3 − I 0 ) and differential ( = − d I I D rel 3 2 ). Each base is called by identifying the feature with the highest intensity for both the sense and antisense features. The base call was typically consistent between the sense and antisense probe sets. However, if they are inconsistent, the probe set with the higher differential is selected as the final call. Using this approach, we were able to sequence the N1 and N2 regions of the SARS-CoV-2 genome with 100% accuracy. Resequencing the Full SARS-COV-2 Genome. We extended the above technique to sequence all ∼30 000 bases of the SARS-CoV-2 genome. We obtained eight SARS-CoV-2positive clinical samples from the Wyoming Public Health Laboratory, which were blinded prior to resequencing (the samples were unblinding after the base calling was complete). The viral genome in these clinical samples was previously fully sequenced on an Illumina MiSeq instrument, and the results were deposited in the GISAID database. Thus, the sequences obtained from our genome tiling arrays can be directly compared with those obtained from Illumina sequencing. The samples we analyzed were USA/WY-WYPHL-00024/2020, USA/WY-WYPHL-00026/2020, USA/WY-WYPHL-00032/ 2020, USA/WY-WYPHL-00036/2020, USA/WY-WYPHL-00041/2020, USA/WY-WYPHL-00044/2020, USA/WY-WYPHL-00059/2020, and USA/WY-WYPHL-00064/2020. We will simply refer to these samples as WY24, WY26, WY32, WY36, WY41, WY44, WY59, and WY64, respectively. By the time we acquired these samples, there was a limited amount of material remaining because the samples had already been tested via a standard qPCR clinical test at the Wyoming Public Health Laboratory and had the complete genome sequenced at the University of New Mexico. Therefore, to resequence the genome from these samples using our genome tiling array, we started with the remaining PCR products that were amplified with the ARTIC SARS-CoV-2 v3 primer sets (the sample names were blinded) (Table S1 ). We further amplified the samples with labeled dUTP and then hybridized the purified PCR products to the SARS-CoV-2 genome tiling arrays, as described in the Experimental Section. Maximum Likelihood Method to Accurately Sequence the SARS-CoV-2 Genome. The method described above was sufficient for analyzing the N1 and N2 PCR products but was only able to resequence the full SARS-CoV-2 genome to ∼98−99% accuracy. Therefore, we developed a more sophisticated maximum likelihood method for calling each base in the SARS-CoV-2 genome. We also noticed that the intensity of the probe sets varied across the entire genome, so we scanned the tiling array with three different exposure times (0.5, 1, and 4 s). The longer exposure times enabled us to accurately call bases in the "weak" intensity regions; however, the "brighter" regions were fully saturated by the longer exposures. In principle, one could combine the three different exposures of the same base position into a single, normalized intensity and call the base with the highest value. However, the resulting integrated reads would still have to be combined for the sense and antisense probe sets, and conflicting calls would require a likelihood-based criterion. This prompted us to adopt an approach that treats the N E = 3 exposures as quasi-independent "reads". We assigned a credibility score to each read based on the difference, D, and differential, d rel , of the intensities, as illustrated in Figure 2 . Briefly, for each of the probe sets for a given base location in the genome, we sorted the intensities from the four bases (or features), identified a tentative call (the feature with the highest intensity), and computed D,d rel . We grouped the base calls or reads for all sense and antisense probe sets for each base in the genome for each exposure into likely "correct" and incorrect by comparing the tentative call with the SARS-CoV-2 genome reference sequence and constructed separate density maps in the D,d rel plane for the correct (reference matching) and incorrect (nonmatching) calls (Figure 2A,B) . While D and d rel correlate individually with a higher likelihood of correct calls, the two-dimensional densities reveal a pattern where the correct and incorrect calls are concentrated in largely nonoverlapping regions. We developed an approach that takes advantage of this feature. A read with parameters (D,d rel ) is assigned a score, S = w correct /(w correct + w incorrect ), where w correct is based on the number of reads with D′ < D and d′ rel < d rel , and w incorrect is the number of likely incorrect reads with higher D′,d′ rel . The score is calculated efficiently for every read of every base using a 2D histogram of the distributions of correct and incorrect calls and then computing the cumulative sums for each bin. Reads for a given location are sorted by their score, and the final call is made consistent with the topscoring read. Analysis of Resequencing Accuracy from USA/WY-WYPHL-00064/2020. Using the maximum likelihood method described above, we resequenced the SARS-CoV-2 genome from eight clinical samples from the Wyoming Public Health Laboratory. The probability, P, of an incorrect call was determined for each base in the genome. The "Phred scores", Q (Q =−10 log 10 P), versus genome coordinates for WY64 are shown in Figure 3 (similar figures for all samples are in the Supporting Information, Figures S25−S32) . In WY64, there were 183 bases with a Phred score less than 20 (81% of these calls were correct). The short read Illumina sequencing assembly of this sample consisted of 246 Ns or uncalled bases. We filtered the 181 uncalled bases from the assembly and therefore called 29 663 bases (Table 1) Next, we further investigated the sequencing errors and found that seven errors were made in the 100 base region beginning at position 19 417. This region is within the larger region that contained the bases that were not called from the Illumina short read sequencing data due to low sequencing coverage (Figure 3 ). Since we started with the same original PCR products, we filtered variants called by our tiling array within the six regions in the genome with the lowest short read coverage, with the reasoning that the ARTIC SARS-CoV-2 v3 primer sets may not be providing adequate amplification of the respective amplicons in these regions. After filtering calls in these regions, we still called all eight of the variants identified from the Illumina sequencing data, as well as nine additional variants that are presumably incorrect, which resulted in 99.97% sequencing accuracy spanning 95.8% of the SARS-CoV-2 genome. Equivalent results for all eight sequenced strains are shown in Table 1 (figures equivalent to Figure 3 for all samples are available in the Supporting Information, Figures S25−S32) . Low-Quality Assembled Regions from the Tiling Array. In the GISAID database, the WY64-assembled FASTA file contains "Ns" in the region between 19 300 and 19 547. This is consistent with the region of the genome that had the lowest short read sequencing coverage. In this region, after filtering the base calls with a Phred score below 20, our genome tiling array called 167 bases, and 160 of these bases matched the reference base. We also detected seven putative variants called by the tiling array with a Q-score higher than 20. Next, we blasted the 248 base consensus sequence (the region unassembled from the Illumina data, without filtering any bases) assembled from the tiling array against the SARS-CoV-2 reference (NC_045512) and found our sequence to be 94% similar to the reference, with 24 mismatches. This indicates that our genome tiling array can accurately identify the sequences as SARS-CoV-2 even with very little starting material. We further explored the impact of low amplicon abundance (measured by low coverage in the Illumina short read data) by analyzing the Phred score we obtained from the genome tiling array. In Figure 4A , we plot positions within the genome, where variants are called within the SARS-CoV-2 genome. It can be seen in this figure that variants are called at a high density at a few discrete locations within the genome. Some of these locations are known to have low coverage in the Illumina short read sequencing data and are thus presumed to have poor amplification of the respective amplicons ( Figure 4B ). To further explore the amplification of these regions of the genome, we plotted the maximum signal intensity from each probe set for each position in the genome. Figure 4C focuses on the region discussed above (between positions 19 300 and 19 547), and it can be seen that the signal intensity is reduced in this region of the genome, which is consistent with the low short read sequencing coverage in this region. As the novel SARS-CoV-2 virus continues to impact the world, it is imperative that viral genome evolution and the tracing of viral spread are closely monitored. With researchers working toward minimizing the fatalities due to COVID-19, there is a need for rapid and cost-effective monitoring of viral variants. We believe our technology meets this need. We have designed and constructed a tiled genome array for rapid and inexpensive SARS-CoV-2 genome resequencing. We have resequenced eight clinical samples to demonstrate the ability of this array to accurately sequence over 95% of the viral genome. Additionally, we have shown that the primary variable limiting our accuracy and sequencing coverage is the ARTIC multiplex PCR primer sets, which do not amplify all amplicons with sufficient efficiency. Therefore, with improved amplification of the viral genome, we anticipate that over 99% of the genome can be sequenced with an accuracy greater than 99.9% using a genome tiling array. The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.langmuir.0c02927. List of primer sequences used (Table S1) United States Adam M. Halasz − Department of Mathematics ) WHO. Coronavirus Disease (COVID-19) Dashboard. World Health Organization: Geneva Temporal dynamics in viral shedding and transmissibility of COVID-19 Towards a genomics-informed, realtime, global pathogen surveillance system DNA Sequencing by Hybridization with Arrays of Samples or Probes Advanced sequencing technologies: methods and goals Basic concepts of microarrays and potential applications in clinical microbiology Experimental Design and Analysis of Microarray Data Parallel human genome analysis: Microarray-based expression monitoring of 1000 genes Characterization of genetic diversity and population structure in wheat using array based SNP markers Applications of DNA tiling arrays for whole-genome analysis Rapid genome sequencing with short universal tiling probes Identification of Genome-Wide Mutations in Ciprofloxacin-Resistant F. tularensis LVS Using Whole Genome Tiling Arrays and Next Generation Sequencing A new coronavirus associated with human respiratory disease in China The authors declare the following competing financial interest(s): The Centrillion affiliated authors are employees and the company may commercialize the work described herein. The authors thank Dr. Ori Sargsyan for assistance with bioinformatics. This research was partially supported by the UNM Comprehensive Cancer Center Support Grant NCI (P30CA118100) and an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health (P20GM103451).