Title: REMBRANDT: A high-throughput barcoded sequencing approach for COVID-19 screening
Authors: Palmieri, Dario; Siddiqui, Jalal; Gardner, Anne; Fishel, Richard; Miles, Wayne O.
Journal: bioRxiv. Date: 2020-05-17. DOI: 10.1101/2020.05.16.099747

The Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), also known as 2019 novel coronavirus (2019-nCoV), is a highly infectious RNA virus. A still-debated percentage of patients develop coronavirus disease 2019 (COVID-19) after infection, with symptoms that include fever, cough, shortness of breath and fatigue. Acute and life-threatening respiratory symptoms are experienced by 10-20% of symptomatic patients, particularly those with underlying conditions, including diabetes, COPD and pregnancy. One of the main challenges in the containment of COVID-19 is the identification and isolation of asymptomatic/pre-symptomatic individuals. As communities re-open, large numbers of people will need to be tested, and the contacts of positive patients will need to be traced. This is required to prevent additional waves of infection and to enable continuous monitoring of viral loads in COVID-19-positive patients. A number of molecular assays are currently in clinical use to detect SARS-CoV-2. Many of them can accurately test hundreds or even thousands of patients every day. However, there are presently no testing platforms that enable more than 10,000 tests per day. Here, we describe the foundation for the REcombinase Mediated BaRcoding and AmplificatioN Diagnostic Tool (REMBRANDT). This is a high-throughput, Next Generation Sequencing-based approach for the simultaneous screening of over 100,000 samples per day. The REMBRANDT protocol includes direct dual-barcoded amplification of SARS-CoV-2 and control amplicons using an isothermal reaction. This is followed by library preparation for Illumina sequencing and bioinformatic analysis.
This protocol represents a potentially powerful approach for community screening, which is currently a major bottleneck in testing large patient populations for COVID-19.

Introduction: COVID-19 is an infectious disease whose etiological agent is the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) RNA virus (1). To date, this viral infection has affected over 4.5 million patients globally and claimed more than 300,000 lives (2). The enormous volume of COVID-19 patients has placed significant strain on healthcare systems around the world and led to the confinement of over half of the world's population. Different regions of the world are at different stages in the battle against COVID-19, ranging from the initial identification of acute COVID-19 cases to preparing to re-open schools and businesses. Regardless of stage, rapid large-scale screening for COVID-19 is essential. At present, the efficacy of seroconversion tests is still to be determined. Moreover, the extent of protective immunity conferred by seroconversion following SARS-CoV-2 infection is as yet unknown (3). Thus, detection of the viral RNA genome remains the best predictor of early infection, including pre- or asymptomatic stages. It is also the best indicator of viral clearance and of reduced or eliminated transmissibility. State- and country-wide populations will require serial monitoring of 1-100 million residents. To accommodate them, a scalable diagnostic test is required that can be established with widely available, existing equipment. Here, we describe a mass COVID-19 screening platform, the Recombinase Mediated BaRcoding/AmplificatioN Diagnostic Tool (REMBRANDT; Figure 1), and provide reduction-to-practice data for it. This protocol contains a number of key advances that maximize output and sensitivity whilst retaining speed and efficiency. Specifically, REMBRANDT uses recombination and repair enzymes to detect and amplify the viral genomic RNA (4).
The amplification simultaneously tags each sample with dual barcoded primers (Figure 2). This critical step rapidly generates DNA products that are significantly more stable than RNA. Using two independent barcoded primers per well allows each patient sample within a 96-well plate to be marked with both a well-specific barcode and a plate-specific barcode. This combinatorial scheme reduces the number of barcodes 100-fold compared with conventional barcoding methods. Once barcoded, patient samples from multiple 96-well plates can be pooled and purified together, minimizing reagent usage, time and sample-to-sample variation. While the system is further scalable, we provide barcoded primers sufficient to distinguish individual samples across ninety-six 96-well plates, or 9,216 patient samples. Processing these samples together enables rapid library construction using any of the barcoded Illumina kits. Using 12 Illumina barcodes allows further multiplexing: twelve different pools of 9,216 samples can be mixed together for Next Generation (NextGen) Sequencing. This allows 110,592 patient samples to be analyzed in a single run on a number of Illumina-based platforms. The smaller number of barcodes also means less time-consuming computational processing. Trimmed sequences can be quickly divided by barcode and mapped to the SARS-CoV-2 N gene or to a control human gene (RNase P) sequence. Importantly, this protocol introduces a unique, synthetic SARS-CoV-2 N gene sequence into every 96-well plate. This control has primer annealing regions identical to those of the SARS-CoV-2 N gene but contains 6 engineered base-pair substitutions that distinguish it from the native sequence. It therefore provides an internal quality control for the process and a measure of batch effects. The laboratory and computational framework is designed to maximize SARS-CoV-2 detection efficiency while minimizing reagent usage, processing time and turn-around time.
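The synthetic N control differs from the native N amplicon at 6 engineered positions, so reads from the two can in principle be told apart by counting mismatches. The Python sketch below illustrates this with a Hamming-distance rule. The 20-nt sequences and substitution positions are hypothetical placeholders, not the real N gene or control sequences. The rule itself is our illustration, not the published REMBRANDT pipeline, which aligns reads in R.

```python
"""Sketch: telling native SARS-CoV-2 N reads from a synthetic N control.

Assumption: both amplicons have the same length and differ only at the
engineered substitution sites. All sequences here are toy placeholders.
"""

# Hypothetical toy "native" amplicon (NOT the real SARS-CoV-2 N sequence).
NATIVE = "AAAAACCCCCGGGGGTTTTT"
# Hypothetical engineered substitutions: position -> base in the control.
SUBS = {1: "G", 4: "T", 8: "A", 11: "C", 15: "A", 18: "C"}
CONTROL = "".join(SUBS.get(i, base) for i, base in enumerate(NATIVE))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def classify_read(read: str, native: str = NATIVE, control: str = CONTROL,
                  max_mismatch: int = 2) -> str:
    """Assign a read to 'native_N', 'synthetic_N' or 'unassigned'.

    With 6 substitutions between the references, a read within 2 mismatches
    of one reference is at least 4 mismatches from the other (triangle
    inequality), so an accepted read is always clearly closer to one of them.
    """
    d_native = hamming(read, native)
    d_control = hamming(read, control)
    if d_native <= max_mismatch and d_native < d_control:
        return "native_N"
    if d_control <= max_mismatch and d_control < d_native:
        return "synthetic_N"
    return "unassigned"


if __name__ == "__main__":
    print(hamming(NATIVE, CONTROL))          # engineered differences
    print(classify_read("C" + NATIVE[1:]))   # native read with one sequencing error
    print(classify_read(CONTROL))            # synthetic control read
```

The strict `<` comparisons send reads equidistant from both references to "unassigned". Keeping the tolerance below half the number of substitutions ensures that any accepted read sits much closer to one reference than to the other.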
Starting Considerations: 1: This protocol can analyze 110,592 patient samples using 192 barcoding primers in total (96 forward and 96 reverse) and 12 different adapters for high-throughput sequencing. For ease of description, however, the steps outlined here are defined for 9,216 samples, or ninety-six 96-well plates. To achieve this number, the protocol (steps RS1-8) should be performed 96 times. Every 96-well plate uses the same set of barcoded reverse primers to identify the well (patient sample) and a separate barcoded forward primer to identify the plate. Each plate of 96 samples therefore carries a different barcoded forward primer. This ultimately enables 12 different Illumina libraries to be combined and analyzed together by Next Generation Sequencing, reaching the maximum capacity of over 110,000 samples. To reach maximum throughput, a robotic aliquoting/pipetting system is recommended.

For this method, RNA extraction from patient samples is not required. Based on previous reports, the protocol may be performed without anticipated issues using RNA extracted from nasopharyngeal swabs following the CDC guidelines. However, recent manuscripts suggest that nasopharyngeal swab, throat swab and saliva samples can be processed directly, skipping RNA extraction (5). These direct detection approaches may reduce the number of viral RNAs available for amplification. Results comparable to those of commercial RNA extraction kits have been obtained with a 5-min direct preparation method for nasopharyngeal samples. In this method, samples are diluted 1:1 with QuickExtract DNA Extraction Solution (Lucigen) (6). Moreover, other specimen types, such as saliva, may be more sensitive for the detection of SARS-CoV-2 (7). We currently advise nasopharyngeal swabs, as recommended by the US CDC. However, based on the observations described above, saliva would appear to be the least invasive specimen and the easiest to collect.
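The capacity figures in consideration 1 follow from combinatorial barcoding. The number of distinguishable samples is the product of the barcode set sizes, whereas the number of barcoded reagents is their sum. Below is a minimal Python check of that arithmetic; the constant names are ours, not the protocol's.

```python
"""Arithmetic behind the dual-barcode plus Illumina-index capacity."""
from math import prod

WELL_REVERSE_PRIMERS = 96    # well-specific reverse primers, shared by all plates
PLATE_FORWARD_PRIMERS = 96   # one plate-specific forward primer per plate
ILLUMINA_INDEXES = 12        # barcoded Illumina adapters, one per pooled library

# Every combination of barcodes is unique, so sample capacity multiplies...
per_library = prod([WELL_REVERSE_PRIMERS, PLATE_FORWARD_PRIMERS])
per_run = prod([WELL_REVERSE_PRIMERS, PLATE_FORWARD_PRIMERS, ILLUMINA_INDEXES])
# ...while the number of barcoded primers to synthesize only adds up.
barcoding_primers = WELL_REVERSE_PRIMERS + PLATE_FORWARD_PRIMERS

print(f"{per_library:,} samples per Illumina library")    # 9,216
print(f"{per_run:,} samples per sequencing run")          # 110,592
print(f"{barcoding_primers} barcoding primers + {ILLUMINA_INDEXES} adapters")
```

By contrast, a one-barcode-per-sample design would need a distinct barcoded primer for each of the 110,592 samples.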
3: RNA isothermal amplification can provide high sensitivity but is also prone to contamination. We recommend the use of appropriate PPE and the decontamination of benches to prevent contamination and RNA degradation. Sterile barrier tips are recommended throughout, as is the separation of work areas for the amplification, purification and library preparation steps. Personnel safety must be a priority when performing any COVID-19 diagnostic approach. Please refer to CDC guidelines for the most appropriate use of equipment and PPE when working with SARS-CoV-2 and other respiratory viruses.

PP2: Aliquot 1 µl of each well-specific reverse primer into the corresponding well of a 96-well plate. Repeat this step for ninety-six individual 96-well plates. This generates ninety-six plates that contain the same well-specific reverse barcoding primer in each respective well.

PP3: Aliquot 1 µl of a single plate-specific forward barcoding primer into every well of one 96-well plate from PP2. Repeat this step for each plate from PP2, using a unique plate-specific forward barcoding primer per plate. These preparation steps generate ninety-six 96-well plates (9,216 unique wells), with each well containing one plate-specific forward primer and one well-specific reverse primer. PP2 and PP3 can be repeated multiple times to optimize the procedure, and the aliquoted plates can be stored at -20 °C before use.

RS1: Prepare Isothermal Amplification Buffer 2X (IAB2X).

Figure 3B). The optimized RT-RPA conditions were then evaluated for the SARS-CoV-2 N amplicon and were found to amplify this region efficiently (Figure 3C). These results demonstrate that the REMBRANDT approach can rapidly and directly amplify RNA from the CDC-recommended control gene and the SARS-CoV-2 N gene region. We then examined different combinations of barcoded SARS-CoV-2 N gene primers. We found that 8 unique barcoded forward and 12 unique barcoded reverse primers efficiently amplified the SARS-CoV-2 N region.
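Plate-preparation steps PP2 and PP3 define a simple plate map: the well determines the reverse primer and the plate determines the forward primer. The Python sketch below builds that map and checks that all 9,216 primer pairs are unique. The primer names (`FWD_P01`, `REV_A01`, ...) are hypothetical placeholders, not the names used in the Supplementary Information.

```python
"""Sketch of the PP2/PP3 plate map (primer names are hypothetical)."""
from itertools import product

ROWS = "ABCDEFGH"
COLUMNS = range(1, 13)


def well_ids() -> list[str]:
    """Standard 96-well IDs in row-major order: A01, A02, ..., H12."""
    return [f"{row}{col:02d}" for row, col in product(ROWS, COLUMNS)]


def build_plate_map(n_plates: int = 96) -> dict[tuple[int, str], tuple[str, str]]:
    """Map (plate number, well ID) -> (forward primer, reverse primer)."""
    plate_map = {}
    for plate in range(1, n_plates + 1):
        forward = f"FWD_P{plate:02d}"       # PP3: one forward primer per plate
        for well in well_ids():
            reverse = f"REV_{well}"          # PP2: same reverse primer in this well on every plate
            plate_map[(plate, well)] = (forward, reverse)
    return plate_map


plate_map = build_plate_map()
# Inverse lookup: decode a (forward, reverse) barcode pair back to its sample position.
position_of = {pair: position for position, pair in plate_map.items()}

assert len(plate_map) == 96 * 96
assert len(position_of) == len(plate_map)   # no two wells share a primer pair
print(plate_map[(1, "A01")], plate_map[(96, "H12")])
```

The inverse dictionary is what makes pooling safe. A read carrying a known forward/reverse pair maps back to exactly one plate and well.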
From this analysis, we determined that the tested primer combinations could be used to detect SARS-CoV-2 RNA (Figure 3D). Following the protocol described above, these barcoded DNA fragments can be rapidly assembled into a library for standard RNA sequencing (RNA-Seq). Typical RNA-Seq runs map reads from a small number of barcodes to a large number of genes. Our computational strategy instead demultiplexes a large number of barcodes and then maps the reads to only three sequences (Figure 4). Annotation Format (SAF) file is also generated for these sequences. Using the align() function with our assembled indexes, we align the demultiplexed FASTQ files to the synthetic control SARS-CoV-2 N, human RPP30, and native SARS-CoV-2 N gene sequences. Counts per sequence are then summarized from the resulting BAM alignment files using the featureCounts() function with our SAF annotation file. To maximize efficiency, we use the parLapply() function from the snow R package to process large numbers of samples in parallel (11). The steps of this pipeline are detailed in Figure 4.

Although SARS-CoV-2 is a highly infectious single-stranded RNA virus, increasing evidence suggests that the vast majority of infected individuals display few or very mild symptoms (12). These individuals may still spread the virus for over 20 days after initial infection (13). For this reason, key countermeasures against COVID-19 have been social distancing and city- or countrywide lockdowns. These measures have significantly slowed the rate of new cases in the populations that practice them. Unfortunately, these lockdowns place real burdens on residents and are designed to be temporary measures. The rapid identification and quarantine of infected individuals is therefore key to containing the spread of SARS-CoV-2.
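The pipeline described above is written in R, using Rsubread's align() and featureCounts() and snow's parLapply(). The Python sketch below illustrates only the demultiplex-then-count idea behind it. It assigns each read pair to a sample by exact match of a barcode at the start of each read, then tallies reads per target sequence. Several details are assumptions, not the published read structure: the 8-nt barcode length, the barcode positions and the target labels. Each pair is also assumed to carry the name of the reference it aligned to, standing in for the align() and featureCounts() steps.

```python
"""Sketch: demultiplex read pairs by dual barcode and count reads per target."""
from collections import Counter, defaultdict
from typing import Iterable

BARCODE_LEN = 8  # hypothetical barcode length at the 5' end of each read


def demultiplex_and_count(
    records: Iterable[tuple[str, str, str]],
    forward_barcodes: dict[str, int],
    reverse_barcodes: dict[str, str],
) -> tuple[dict[tuple[int, str], Counter], int]:
    """Tally reads per (plate, well) and per target.

    records: (read1, read2, target) triples; read1 is assumed to start with the
        plate (forward) barcode and read2 with the well (reverse) barcode.
    forward_barcodes: barcode sequence -> plate number.
    reverse_barcodes: barcode sequence -> well ID.
    Returns per-sample target counts and the number of undetermined pairs.
    """
    counts = defaultdict(Counter)
    undetermined = 0
    for read1, read2, target in records:
        plate = forward_barcodes.get(read1[:BARCODE_LEN])
        well = reverse_barcodes.get(read2[:BARCODE_LEN])
        if plate is None or well is None:
            undetermined += 1  # barcode not in the expected set
            continue
        counts[(plate, well)][target] += 1
    return dict(counts), undetermined


# Toy example with hypothetical barcodes and target labels.
FWD = {"AAAAAAAA": 1, "CCCCCCCC": 2}
REV = {"GGGGGGGG": "A01", "TTTTTTTT": "A02"}
READS = [
    ("AAAAAAAA" + "ACGT", "GGGGGGGG" + "ACGT", "RPP30"),
    ("AAAAAAAA" + "ACGT", "GGGGGGGG" + "ACGT", "SARS-CoV-2_N"),
    ("CCCCCCCC" + "ACGT", "TTTTTTTT" + "ACGT", "RPP30"),
    ("NNNNNNNN" + "ACGT", "TTTTTTTT" + "ACGT", "RPP30"),
]
sample_counts, n_undetermined = demultiplex_and_count(READS, FWD, REV)
print(sample_counts, n_undetermined)
```

Each (plate, well) tally is independent, so per-sample processing parallelizes naturally. This is the role that parLapply() plays in the described R pipeline.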
Areas of the world such as South Korea and northeastern Italy that quickly tested their populations en masse for SARS-CoV-2 have significantly reduced the spread of the disease. Incidence and mortality in these regions were relatively low compared with adjacent countries and regions. This suggests that extensive population testing will remain important for the foreseeable future, as cities and countries re-open their businesses and borders, to assess COVID-19 status in real time at the community level (14). One of the major current limitations of population testing is the availability of reagents (swabs, viral transport medium, RNA isolation kits, probes and other molecular biology reagents). The current clinical standard for COVID-19 diagnosis, qRT-PCR, requires suitable equipment for viral RNA amplification and detection. Each of these steps significantly slows sample processing and limits the number of tests that can be performed in a day. New diagnostic tools are therefore needed that are fast, efficient, scalable to population-sized numbers and based on readily available reagents. To address this need, the scientific community has delivered a remarkable and unprecedented number of COVID-19 diagnostic assays, several of which provide fast and inexpensive tests. One of the leading approaches uses blood tests to identify IgG and IgM antibodies against SARS-CoV-2 proteins (15). These tests show great promise. However, they can only assess the immunological status of the patient. They cannot determine the viral load in exhaled respiratory particles, the major source of disease spread. These diagnostic kits will therefore be important for identifying patients who may have developed immunity against the virus. However, they are not appropriate for assessing whether a patient can infect others.
A number of groups have been developing novel tools to detect SARS-CoV-2 genetic material within patient airways. These approaches allow the preliminary detection of potentially infected asymptomatic patients, whose diagnosis should then be confirmed by standard qRT-PCR. The tests range from quick, portable kits (16, 17), with the potential to become home-testing kits, to high-throughput multiplexed RT-PCR reactions. One particularly promising approach uses reverse transcription of viral RNA and patient-specific barcoding of the single-stranded cDNA, followed by cDNA amplification and NGS analysis (18). This approach, although interesting, requires one barcode per patient rather than combinatorial (multiplexed) barcodes. Screening 10,000 patients would therefore require 10,000 barcodes. In addition, such long primers carrying long barcodes are likely to vary significantly in their amplification efficiency, making individual testing of each primer set essential.

construction. We are therefore confident that this system could be readily implemented in most communities, including those with limited resources, provided that partners capable of NGS analysis are available. We are currently developing approaches to scale up production of the recombinant RT-RPA enzymes to meet a potential increase in demand. Because REMBRANDT uses an isothermal RNA reverse transcription and amplification reaction, it does not require PCR amplification. Moreover, pairing of the template and primer during the amplification step relies on the recombinase UvsX and its loading factor UvsY. The reaction is therefore minimally affected by differences in primer melting (Tm) or annealing (Ta) temperatures. The REMBRANDT pipeline is also flexible and can be readily adapted to detect other viral genes, viruses and/or other pathogenic species by changing the design of the amplification primers.
The protocol described here provides experimental evidence that the designed RT-RPA effectively amplifies synthetic SARS-CoV-2 N-gene RNA. Although the clinical efficacy of this approach remains untested, isothermal amplification has previously been used to detect SARS-CoV-2 in a number of patient bodily fluids (19). We fully describe the detailed protocol and experimental results for the fast (less than 24 h) concurrent screening of over 100,000 potential COVID-19 samples, as well as the framework for analyzing the NGS data. Once clinically validated, REMBRANDT may represent a useful tool for screening easy-to-access bodily fluids for COVID-19 diagnosis.

References:
COVID-19 pathophysiology: A review
Antibody responses to SARS-CoV-2 in patients with COVID-19
DNA detection using recombination proteins
Saliva is more sensitive for SARS-CoV-2 detection in COVID-19 patients than nasopharyngeal swabs
Comparative analysis of mRNA targets for human PUF-family proteins suggests extensive interaction with the miRNA regulatory system
ShortRead: a bioconductor package for input, quality assessment and exploration of high-throughput sequence data
The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads
Snow: A parallel computing framework for the R system
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus
Temporal dynamics in viral shedding and transmissibility of COVID-19
Reopening Society and the Need for Real-Time Assessment of COVID-19 at the Community Level
Towards effective diagnostic assays for COVID-19: a review
CRISPR-Cas12-based detection of SARS-CoV-2
Point-of-care testing for COVID-19 using SHERLOCK diagnostics
Diagnostic Assay for Simultaneous Testing of 19200 Patient Samples
Rapid Detection of Novel Coronavirus (COVID-19) by Reverse Transcription-Loop-Mediated Isothermal Amplification

This work was supported by The Ohio State University Comprehensive Cancer Center (RF). The authors thank Dr. Kristine Yoder, Laura Miles and Anna Tessari for their critical support. Primer and oligonucleotide sequences used in this manuscript are detailed in the Supplementary Information. Barcoded oligonucleotides for performing RT-RPA/barcoding on 100,000 individual samples simultaneously are also reported. The REMBRANDT analysis code is deposited on GitHub (https://github.com/MilesLab/Rembrandt) and is publicly available.