key: cord-0704092-bby9hls2 authors: Shental, Noam; Levy, Shlomia; Skorniakov, Shosh; Wuvshet, Vered; Shemer-Avni, Yonat; Porgador, Angel; Hertz, Tomer title: Efficient high throughput SARS-CoV-2 testing to detect asymptomatic carriers date: 2020-04-20 journal: nan DOI: 10.1101/2020.04.14.20064618 sha: 571504fb5bc0d85bd92d609590e8fbc332569db3 doc_id: 704092 cord_uid: bby9hls2 The COVID-19 pandemic is rapidly spreading throughout the world. Recent reports suggest that 10-30% of SARS-CoV-2 infected patients are asymptomatic. Other studies report that some subjects have significant viral shedding prior to symptom onset. Since both asymptomatic and pre-symptomatic subjects can spread the disease, identifying such individuals is critical for effective control of the SARS-CoV-2 pandemic. Therefore, there is an urgent need to increase diagnostic testing capabilities in order to also screen asymptomatic carriers. In fact, such tests will be routinely required until a vaccine is developed. Yet, a major bottleneck of managing the COVID-19 pandemic in many countries is diagnostic testing, due to limited laboratory capabilities as well as limited access to genome-extraction and Polymerase Chain Reaction (PCR) reagents. We developed P-BEST - a method for Pooling-Based Efficient SARS-CoV-2 Testing, using a non-adaptive group-testing approach, which significantly reduces the number of tests required to identify all positive subjects within a large set of samples. Instead of testing each sample separately, samples are pooled into groups and each pool is tested for SARS-CoV-2 using the standard clinically approved PCR-based diagnostic assay. Each sample is part of multiple pools, using a combinatorial pooling strategy based on compressed sensing designed for maximizing the ability to identify all positive individuals. We evaluated P-BEST using leftover samples that were previously clinically tested for COVID-19. 
In our current proof-of-concept study we pooled 384 patient samples into 48 pools, providing an 8-fold increase in testing efficiency. Five sets of 384 samples, containing 1-5 positive carriers, were screened and all positive carriers in each set were correctly identified. P-BEST provides an efficient and easy-to-implement solution for increasing testing capacity that will work with any clinically approved genome-extraction and PCR-based diagnostic methodologies.

The COVID-19 pandemic is rapidly spreading throughout the world. Recent reports suggest that 10-30% of SARS-CoV-2 infected patients are asymptomatic [1-3]. Other studies report that some subjects have significant viral shedding prior to symptom onset [4]. Since both asymptomatic and presymptomatic subjects can spread the disease [1,2], identifying such individuals is critical for effective control of the SARS-CoV-2 pandemic. A major bottleneck in managing the COVID-19 pandemic in many countries is diagnostic testing, which is primarily performed on symptomatic patients due to limited laboratory capabilities as well as limited access to genome-extraction and Polymerase Chain Reaction (PCR) reagents. Hence, there is an urgent need to increase diagnostic testing capabilities in order to allow screening of asymptomatic populations, which contribute to disease spread. In fact, such tests will be routinely required until a vaccine is developed. We developed P-BEST, a method for Pooling-Based Efficient SARS-CoV-2 Testing, using a group-testing approach, which significantly reduces the number of tests required to identify all positive subjects within a large set of samples (Figure 1).
Instead of testing each sample separately, samples are pooled into groups and each pool is tested for SARS-CoV-2 using the standard clinically approved PCR-based diagnostic assay. Each sample is part of multiple pools, using a combinatorial pooling strategy designed to maximize the ability to identify all positive individuals (Figure 1) [5,6]. If the percentage of carriers in the tested set of samples is sufficiently low (~1%), the method can correctly identify all positive individuals using a much smaller number of diagnostic tests than testing each individual sample separately.

In our current proof-of-concept study of P-BEST, we pooled 384 patient samples into 48 pools, each containing 48 samples. Each sample was added to six different pools. Pools were designed based on a Reed-Solomon error-correcting code, and were generated using an automated liquid dispensing robot. Following genome extraction and PCR amplification of all pools, a decoding algorithm was used to identify carriers, which were then tested individually for verification. The total time for pooling 384 samples into 48 pools using a basic liquid dispensing robot (Arise EZMate-601) was <5 hours, and pooling was performed in a standard BSL-2 laboratory, since individual samples were diluted in a lysis buffer that inactivates all viral particles.

We evaluated P-BEST using leftover samples that were previously clinically tested for COVID-19. Samples diluted in lysis buffer were pooled into 48 pools, each containing a set of 48 unique samples. Pooled samples were then tested by the clinical diagnostic laboratory of the Soroka University Medical Center using a clinically approved COVID-19 PCR-based diagnostic protocol that included an RNA extraction stage. We tested P-BEST using four sets of 384 samples, each containing an increasing number of positive carriers ranging from two to five. We found that P-BEST was able to correctly identify all positive carriers within these four sets of 384 samples using only 48 tests per set, providing an 8-fold increase in testing efficiency. Simulations demonstrated that the method can correctly identify up to 5/384 (1.3%) of carriers, with an average number of false positives that was less than 2.75, and an average number of false negatives that was less than 0.33.

The copyright holder for this preprint (which was not peer-reviewed) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license. https://doi.org/10.1101/2020.04.14.20064618

P-BEST provides an efficient and easy-to-implement solution for increasing testing capacity that will work with any clinically approved genome-extraction and PCR-based diagnostic methodologies. Its implementation only requires the use of a widely available automated liquid dispensing robot. Importantly, our results demonstrate that P-BEST can use non-infectious samples (diluted in lysis buffer), allowing the automated pooling to be performed in BSL-2 laboratories. While our current pooling design provides an 8-fold increase in testing efficiency, this can be further improved by increasing the number of samples per pool. Preliminary experiments have demonstrated that a positive sample can be detected even within a pool of 128 subjects, which would allow efficient testing of >1000 samples. P-BEST is optimized for efficiently testing populations with low carrier frequencies (1.3% in this case). It is therefore best suited for screening asymptomatic populations that are not at high risk of being infected, excluding for example subjects who were in close contact with confirmed SARS-CoV-2 cases. Importantly, if the carrier frequency is above 1.3%, P-BEST will lose efficiency, but in turn may help identify SARS-CoV-2 hotspots.
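To illustrate how a design of this shape can be decoded, the following minimal Python sketch builds a 48-pool layout for 384 samples in which every sample enters six pools and every pool holds 48 samples, and then recovers carriers from binary pool results. It is a sketch under stated assumptions, not the P-BEST implementation: the layout is a generic Reed-Solomon-style construction over GF(8) chosen here for illustration, the per-sample score is a simple stand-in for the sparse-recovery step, pool outcomes are modeled as a noiseless logical OR, and the function names (`gf8_mul`, `build_design`, `pool_results`, `decode`) are hypothetical.

```python
from itertools import combinations


def gf8_mul(a, b):
    """Multiply two GF(8) elements (3-bit integers) modulo x^3 + x + 1."""
    result = 0
    for _ in range(3):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:          # reduce when the degree reaches 3
            a ^= 0b1011
    return result


def build_design(n_samples=384, n_blocks=6):
    """Return pools_of[s], the set of pool indices that receive sample s.

    Illustrative assumption: sample s is encoded as a polynomial of degree
    at most 2 over GF(8), evaluated at n_blocks points. Each evaluation
    selects one of the 8 pools in that block.
    """
    pools_of = []
    for s in range(n_samples):
        a0, a1, a2 = s % 8, (s // 8) % 8, s // 64
        pools = set()
        for x in range(n_blocks):
            value = a0 ^ gf8_mul(a1, x) ^ gf8_mul(a2, gf8_mul(x, x))
            pools.add(8 * x + value)
        pools_of.append(pools)
    return pools_of


def pool_results(pools_of, carriers, n_pools=48):
    """Noiseless binary outcome per pool: 1 if the pool holds any carrier."""
    positive = set()
    for s in carriers:
        positive |= pools_of[s]
    return [1 if p in positive else 0 for p in range(n_pools)]


def decode(pools_of, observed, n_candidates=12):
    """Shortlist the best-scoring samples, then search all their subsets."""
    positive = {p for p, r in enumerate(observed) if r}
    # Score = fraction of a sample's pools that tested positive
    # (a simple stand-in for a sparse-recovery score).
    scores = [len(pools & positive) / len(pools) for pools in pools_of]
    ranked = sorted(range(len(pools_of)), key=lambda s: -scores[s])
    candidates = ranked[:n_candidates]
    best, best_distance = (), len(positive)   # empty set as the baseline
    for size in range(1, n_candidates + 1):
        for subset in combinations(candidates, size):
            predicted = set().union(*(pools_of[s] for s in subset))
            distance = len(predicted ^ positive)   # mismatching pools
            if distance < best_distance:
                best, best_distance = subset, distance
    return sorted(best)


if __name__ == "__main__":
    design = build_design()
    observed = pool_results(design, {17, 250})
    print(decode(design, observed))
```

In this illustrative construction any two samples share at most two pools, so the pool pattern produced by two carriers cannot be reproduced by any other set of samples. With more carriers, or with failed pools, the subset search may return ties or false positives, which is the kind of behavior the simulations reported here quantify for the actual design.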
Code and protocols required for implementation of P-BEST can be found at https://github.com/NoamShental/PBEST.

We thank Avishai Edri, Aner Ottolenghi and Yariv Greenshpan for helping with sample preparation and RNA measurements. We thank Dr. Rachel Steinberg for help with diagnostic testing. We thank Jenny Racah and Shelly Levy-Tzedek for constructive discussions and manuscript editing. Noam [...]

[...] following 20 min of incubation, 350 µl were used for nucleic acid extraction into a volume of 100 µl extracted genome, from which 8 µl were taken for the 2019-nCoV PCR assay. Frozen leftover media-lysis buffer (nearly 500 µl) from clinically tested samples was thawed and then used to re-test the samples using the P-BEST approach. To test the sensitivity of the method, we generated several sets of 384 samples, each containing 2-5 positive samples (Set1-4, respectively). Pools were prepared using a liquid handling robot (Arise EZMate-601) driven by code written in Python. The code automatically generates a command file for the robot to use. Samples were manually pipetted into 96-well plates, from which the robot assembled a set of 48 pools, each containing 48 distinct samples.
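As an illustration of what generating such a command file might involve, the sketch below turns a pooling design into a list of liquid transfers and serializes it as CSV. The command format of the Arise EZMate-601 is not described here, so the column names, the plate naming (`Plate1`, `Plate2`, ...), the row-major well labels and all function names are hypothetical. Only the 96-well source plates and the 11 µl per-sample volume come from the protocol in this section.

```python
import csv
import io

ROW_LETTERS = "ABCDEFGH"      # 8 rows x 12 columns = 96 wells
COLUMNS_PER_ROW = 12
FIELDS = ["source_plate", "source_well", "dest_well", "volume_ul"]


def well_name(index):
    """Map a 0-based well index to a row-major label (0 -> 'A1', 12 -> 'B1')."""
    row, column = divmod(index, COLUMNS_PER_ROW)
    return f"{ROW_LETTERS[row]}{column + 1}"


def sample_location(sample):
    """Return (source plate name, source well) for a 0-based sample index."""
    plate, well = divmod(sample, 96)
    return f"Plate{plate + 1}", well_name(well)


def make_worklist(pools_of, volume_ul=11):
    """One transfer per (sample, pool) pair; pools_of[s] is a set of pool indices."""
    rows = []
    for sample, pools in enumerate(pools_of):
        plate, well = sample_location(sample)
        for pool in sorted(pools):
            rows.append({
                "source_plate": plate,
                "source_well": well,
                "dest_well": well_name(pool),   # pools laid out on one plate
                "volume_ul": volume_ul,
            })
    return rows


def to_csv(rows):
    """Serialize the worklist in a generic (hypothetical) CSV layout."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Toy design: three samples, each placed in two of three pools.
    toy_design = [{0, 1}, {1, 2}, {0, 2}]
    print(to_csv(make_worklist(toy_design)))
```

For a full design of 384 samples with six pools per sample, this yields 2304 transfers, and 48 samples at 11 µl each give the 528 µl pool volume used in the protocol.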
We generated 48 pools for each sample set. Each pool contained equal volumes from 48 samples (11 µl per sample, 528 µl per pool). Each individual sample was represented in six different pools. Analogously to single samples, 350 µl from each pool were then used for nucleic acid extraction into a volume of 100 µl, using the clinically approved STARMag kit and the STARlet robot. Then, 8 µl were used for qRT-PCR to detect the E gene of SARS-CoV-2, based on a method that was clinically validated and employed in the SUMC clinical virology laboratory prior to the introduction of the 2019-nCoV SeeGene kit. We used this PCR method due to a shortage of the SeeGene 2019-nCoV Assay kits; yet, to verify compatibility with the newly approved 2019-nCoV Assay kit, we re-validated positive pools with the SeeGene kit.

Since in P-BEST samples are diluted into pools of 48 subjects, there is an inherent drop in PCR sensitivity of about 5-6 cycles (a dilution factor of 2^5-2^6), which was indeed observed experimentally and may result in false negative pools. In retrospective analysis, we found that only a single pool that included a carrier yielded a negative PCR result (out of an expected 70 positive pools across four experiments). Similarly, only 1 of the 122 negative pools was called positive by the PCR-based assay. These two errors had no effect on the detection capabilities of P-BEST, which is robust to both types of error. Since our proof-of-concept experiments used samples from positive individuals that were previously identified, we observed an additional 5-cycle drop in PCR sensitivity that was likely caused by RNA degradation of samples following freeze-thaw cycles.
To confirm this, we tested several individual samples before and after a freeze-thaw cycle and observed a similar drop in PCR sensitivity. However, P-BEST was still able to correctly identify all positive subjects within each of the sample sets. We note that this reduction will not occur when P-BEST is implemented on fresh samples collected directly into lysis buffer, per our current experimental protocol. To further address the sensitivity issues we propose to (1) modify the sample collection protocol to collect samples directly into lysis buffer, thereby increasing the initial concentration of RNA in the collected samples; and (2) test P-BEST pooled samples using the SeeGene SARS-CoV-2 PCR kit, which also includes primers for the NP gene of SARS-CoV-2. In our preliminary experiments, these primers had higher sensitivity than those for the other genes included in the kit, and gave a positive result even when a single positive sample was diluted into a pool of 120 subjects (Supplemental Figure 1A).

P-BEST is best suited for screening asymptomatic populations, in which the carrier frequency may be sufficiently low to allow efficient detection. Such scenarios include routine screens of healthcare workers and staff in nursing homes, as well as population screens in different regions to identify new hotspots of SARS-CoV-2 spread as early as possible. We note that additional pooling designs may be implemented to estimate the carrier frequency of SARS-CoV-2 in a given population, which may identify the specific scenarios in which P-BEST will be useful.
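As a rough check on the dilution figures quoted above (a drop of about 5-6 cycles for pools of 48, and detection within a pool of 120 subjects), the expected shift in PCR cycle threshold can be computed from the pool size. This is a minimal sketch that assumes idealized amplification in which the product doubles every cycle and a single positive sample mixed in equal volume with negatives. The function name and the `efficiency` parameter are introduced here for illustration.

```python
import math


def expected_ct_shift(pool_size, efficiency=1.0):
    """Extra cycles needed when one positive sample is diluted pool_size-fold.

    efficiency=1.0 means perfect doubling per cycle; lower values model
    less efficient amplification (product grows by 1 + efficiency per cycle).
    """
    return math.log(pool_size) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    for size in (48, 120, 128):
        print(size, round(expected_ct_shift(size), 2))
```

Under these assumptions a pool of 48 costs log2(48), or roughly 5.6 cycles, consistent with the range stated above, and a pool of 128 costs 7 cycles. Losses from RNA degradation, such as the freeze-thaw effect described here, would come on top of this.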
The mathematical field of group testing [7] aims to tackle the problem of identifying individuals carrying a certain rare trait out of a large population by designing an efficient set of pools and measuring each pool as if it were a single sample. In general, pooling is designed in such a way that each individual has a unique 'footprint' on the set of pools, thus allowing carrier identification. Group testing dates back to the mid-20th century [8] and since then many intricate pooling designs have been described and rigorously analyzed. Group testing has been successfully applied in data compression [9], computation in the data stream model [10] and in molecular biology [11,12]. In our previous studies we described a combination of group testing and next generation sequencing for detecting carriers of rare genetic mutations. We detected all individual carriers of rare mutations out of a set of 1024 mutagenized Sorghum bicolor plants using a set of 48 pools [5,6].

The efficiency of group testing, often measured by the ratio of the number of screened individuals to the number of pools, generally increases as the frequency of the observed trait decreases. For example, screening for a trait that appears in 0.1% of the population can be done more efficiently than screening for a trait that appears in 1% of the population. Moreover, when the carrier rate exceeds ~5%, group testing is no longer effective, since the required number of pools would be comparable to the number of samples tested. To optimize efficiency, the pooling design needs to be tailored to the expected carrier rate.
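The dependence of efficiency on carrier rate can be made concrete with a simple counting argument: t binary pool results can distinguish at most 2^t outcomes, so identifying which k of n samples are carriers needs at least log2 C(n, k) pools. The Python sketch below computes this lower bound. It is a general information-theoretic limit rather than a property of the P-BEST design, it ignores measurement noise, and the function name is introduced here for illustration.

```python
import math


def min_binary_tests(n_samples, n_carriers):
    """Smallest t such that 2**t >= C(n_samples, n_carriers)."""
    outcomes = math.comb(n_samples, n_carriers)
    # For an integer m >= 1, (m - 1).bit_length() equals ceil(log2(m)).
    return (outcomes - 1).bit_length()


if __name__ == "__main__":
    for carriers in (1, 2, 5, 19):      # 19 of 384 is roughly 5%
        print(carriers, min_binary_tests(384, carriers))
```

For 384 samples with exactly five carriers the bound is 36 pools, so a 48-pool design operates within a small factor of the counting limit. The bound grows with the number of carriers, and practical designs that must also tolerate errors need more pools than this minimum.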
If the true carrier rate exceeds the expected rate, the method will identify larger sets of suspected carriers, which may include false positives, and in some cases may also fail to identify some of the carriers (false negatives). Therefore it is imperative to evaluate the robustness of a specific pooling design to higher carrier rates (e.g., Figure 2 and Supplemental Figures 2-3). Our current detection strategy is based on binary results from each pool, i.e., a pool is either amplified or not. Since PCR results are quantitative, it may be possible to use this information in the P-BEST reconstruction algorithm to also estimate the C(t) value of each carrier (which is inversely related to the individual's viral load and may be clinically relevant).

Carrier detection was performed using the Gradient Projection for Sparse Reconstruction (GPSR) algorithm [13] as in our former studies [5,6]. The transformation from fractional to discrete results was done using the following algorithm: the 20 samples with the highest scores were selected and only subsets of these 20 were further considered. In total, 2^20 subsets of samples were tested. Each subset corresponds to a vector x of length 384, in which the entries of the selected samples are equal to 1 and all others are set to zero. The product Mx, where M is the pooling matrix, was compared to the measurement vector y. The vector x for which ‖Mx − y‖ achieved its minimum was selected.

To test the robustness of P-BEST we considered two types of potential noise factors. First, variation in initial RNA levels may cause samples to 'disappear' from all or part of the pools.
Variation in RNA levels was estimated from Qubit measurements of 48 samples. The average RNA concentration was 15 ng/µl, with a standard deviation of 7 ng/µl (Supplemental Figure 1B). These values were used in our simulations. A second possible source of noise is PCR amplification, which may fail in a certain number of pools.

Supplemental Figure 2: To assess the effects of variations in RNA levels, we measured the average number of false positive and false negative detections as a function of the true number of carriers across 3000 simulations in two scenarios: (1) no noise in RNA levels (black square) and (2) RNA noise based on the measured variation of RNA levels across 48 samples (see Supplemental Figure 1B). The false positive (left panel) and false negative (right panel) detection rates for the two scenarios show that RNA variation does not significantly degrade P-BEST performance. All simulations considered one dropped pool.

Supplemental Figure 3: Evaluating the effect of dropped pools on P-BEST performance. To assess the effects of dropped pools due to PCR amplification failures, we measured the average number of false positive (left panel) and false negative (right panel) detections as a function of the true number of carriers across 3000 simulations for zero, one or two randomly dropped pools. P-BEST seems to be robust to 1-2 dropped pools. All simulations considered the experimental level of RNA variation as measured across 48 samples (Supplemental Figure 1B).

There have been multiple recent studies that identified asymptomatic carriers and have attempted to estimate their rate and contribution to disease spread.
A study of 2685 tourists in the New York area conducted over two seasons found that 6.2% of subjects tested positive for at least one respiratory virus, and 38.7% of these were infected with circulating human coronaviruses [14]. Rothe et al. described transmission of SARS-CoV-2 from a German patient who was infected by a Chinese businesswoman who visited Germany [15]. Importantly, she was asymptomatic during her visit to Germany, and only developed symptoms after returning to China. Two other German co-workers were infected, although they only came in contact with the German patient, who was asymptomatic at the time. Mizumoto et al. [3] reported that 50.5% of infected patients on board the Diamond Princess cruise ship were asymptomatic at the time of diagnosis. Using a model, they estimated that the asymptomatic proportion (among all infected cases) was 17.9% (95% CrI: 15.5-20.2%). Another study, of Japanese nationals evacuated from Wuhan, China, estimated that 30.8% of subjects were asymptomatic (95% CI: 7.7%-53.8%) [16].
Presumed Asymptomatic Carrier Transmission of COVID-19
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV2)
Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship
SARS-CoV-2 Viral Load in Upper Respiratory Specimens of Infected Patients
Identification of rare alleles and their carriers using compressed se(que)nsing
Highly efficient de novo mutant identification in a Sorghum bicolor TILLING population using the ComSeq approach
The Detection of Defective Members of Large Populations
Nonrandom binary superimposed codes
What's hot and what's not: tracking most frequent items dynamically
Overlapping pools for high-throughput targeted resequencing
DNA Sudoku--harnessing high-throughput sequencing for multiplexed specimen analysis
Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE Journal of Selected Topics in Signal Processing
Asymptomatic Shedding of Respiratory Virus among an Ambulatory Population across Seasons. mSphere
Transmission of 2019-nCoV Infection from an Asymptomatic Contact in Germany
Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19)