key: cord-0719971-zi0rfidc
authors: Aragón‐Caqueo, Diego; Fernández‐Salinas, Javier; Laroze, David
title: Optimization of group size in pool testing strategy for SARS‐CoV‐2: A simple mathematical model
date: 2020-05-03
journal: J Med Virol
DOI: 10.1002/jmv.25929
sha: be2cbae6252f2c4774121961ec9bbbe780531f41
doc_id: 719971
cord_uid: zi0rfidc

Coronavirus disease 2019 (COVID-19) has reached unprecedented pandemic levels and is affecting almost every country in the world. Ramping up a country's testing capacity is an essential public health response to this new outbreak. A pool testing strategy, in which multiple samples are tested with a single reverse transcriptase-polymerase chain reaction (RT-PCR) kit, could potentially increase a country's testing capacity. The aim of this study is to propose a simple mathematical model to estimate the optimum number of pooled samples according to the relative prevalence of positive tests in a particular healthcare context, assuming that if a group tests negative, no further testing is done, whereas if a group tests positive, all the subjects of the group are retested individually. The model predicts group sizes ranging from 11 to 3 subjects. For a prevalence of 10% positive tests, 40.6% of tests can be saved using groups of four subjects. For a 20% prevalence, 17.9% of tests can be saved using groups of three subjects. At higher prevalences, the benefit flattens and the strategy loses effectiveness. Pool testing individuals for severe acute respiratory syndrome coronavirus 2 is a valuable strategy that could considerably boost a country's testing capacity. However, further studies are needed to determine how large these groups can be without losing RT-PCR sensitivity. The strategy works best in settings with a low prevalence of positive tests and is best implemented in subgroups with low clinical suspicion. The model can be adapted to specific prevalences, enabling a context-tailored implementation of the pool testing strategy.

In late December 2019, several cases of pneumonia of apparent viral origin were reported in Wuhan, China. 1,2 Subsequently, a novel coronavirus was identified as the causative pathogen 3 and named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The disease (coronavirus disease 2019 [COVID-19]) rapidly spread to neighboring countries and overseas, reaching pandemic proportions, and was declared by the World Health Organization (WHO) a Public Health Emergency of International Concern on 30 January 2020. 4 As of 19 April 2020, the WHO had reported 2 241 359 confirmed cases and 152 551 deaths worldwide, 5 with a total of 185 countries affected and 10 still without reported cases. 6 The main diagnostic test implemented worldwide to confirm infection by this novel coronavirus is real-time reverse transcriptase-polymerase chain reaction (RT-PCR) on respiratory samples, which has satisfactory sensitivity and specificity. 7 However, the virus may also be detectable in other clinical specimens using the same technique. [8] [9] [10] The procedure takes about a day to yield a result, 11 although more efficient methods are being developed as the pandemic progresses. A crucial part of the public health response to this new threat is to rapidly diagnose and isolate infected individuals to prevent further spread. 12, 13 Therefore, amplifying the testing capacity of a country experiencing a massive outbreak is a key strategy for facing this public health emergency. 14 Currently, the United States is the country with the largest number of confirmed cases worldwide; as of 19 April 2020, it performs 167 330 tests daily, with a total of 3 865 864 tests performed since the beginning of the outbreak 15 and all states currently testing. 16 Other heavily affected countries are also performing thousands of confirmatory tests daily. 17

However, because of the overwhelming and rapidly growing number of cases, a considerable number of suspected cases cannot be properly tested and isolated owing to the logistical limitations of progressively collapsing healthcare systems. It has therefore become urgent to optimize the standard operating procedures for confirming SARS-CoV-2 infection. 18 Since the disease often presents mildly or asymptomatically, 19, 20 and asymptomatic individuals have been reported to transmit the virus, 21, 22 it is crucial to implement an efficient testing strategy to screen this population and properly isolate infected individuals to prevent further spread of the virus. However, as healthcare systems around the world progressively collapse under the growing number of moderate to severe patients presenting to emergency rooms every day, testing of individuals with low clinical suspicion has been set aside to prioritize the available resources for patients with moderate to severe symptoms. Although prioritizing testing for patients with higher clinical suspicion is logical, a considerable segment of the population is not being screened and may become vectors of the virus, contributing to further spread of the disease and further collapse of the healthcare system with the new cases yet to come. 23 On the other hand, as proposed by Seifried and Ciesek, 24 pooling samples from several individuals into a single RT-PCR test could increase testing capacity many times over. 25 However, some studies suggest that the number of pooled samples should be kept as low as possible to reduce dilution and maintain the sensitivity of the test. 26, 27 Since this strategy could potentially multiply a country's testing capacity, it is prudent to explore how to optimize its implementation in the healthcare setting.
Therefore, the aim of this study is to provide a mathematical model to estimate the optimum number of pooled samples according to the prevalence of positive tests in a particular country, in order to save as many tests and cover as many people as possible, given that if a group tests positive, all the individuals in it must be tested individually. It is important to highlight that this model is based on the prevalence of positive tests and can be adapted to each country's specific prevalence. However, it is best implemented in countries with a large number of confirmed cases and a relatively large number of tests performed daily, where more data on the prevalence of positive results are available and more accurate estimates can be made, rather than in countries with few confirmed cases or where population testing has not been adequately implemented.

The manuscript is organized as follows: Section 2 introduces the materials and methods, Section 3 presents the results together with the discussion, and Section 4 gives the final remarks.

This section describes the process and reasoning used to obtain a formula representing the benefit of performing a pool test of optimal size, assuming in advance that if a group tests positive, all subjects in the group are individually tested to track down the positive case or cases, whereas if a group tests negative, no further testing of that group is needed.

All the computations were performed with the software Wolfram Mathematica. 28 

The RT-PCR test on the sample of each individual suspected of SARS-CoV-2 infection can yield either a negative or a positive result. A pooled test yields a negative result only when all the samples included in the pool are negative, and a positive result when at least one of the individual samples is positive. The possible diagnostic scenarios for the pool test can therefore be expressed by the binomial expression

$$ D = (x + y)^{n} = \sum_{k=0}^{n} \binom{n}{k} x^{k} y^{n-k} \qquad (1) $$

where x represents the probability that a subject has an individual positive test (prevalence of positives), y represents the probability that a subject has an individual negative test (prevalence of negatives), and n is the size of the pool group, with n > 1, 0 < x < 1, and y = 1 − x.

Under these assumptions, D = 1. Note that the expansion of this expression contains all possible events, one per addend; in each addend, the exponents of x and y indicate the number of subjects with a positive and a negative sample, respectively. The distribution of these possibilities depends on the prevalence of the disease, here taken as the percentage of positive test results in the recent historical data available. For this reason, the probability of each event is obtained by substituting x and y with the respective prevalences of positive and negative tests. Now, let us separate Equation (1) into two parts:

$$ (x + y)^{n} = \left[(x + y)^{n} - y^{n}\right] + y^{n} = 1 \qquad (2) $$
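To make the expansion of Equation (1) concrete, the short Python sketch below lists its addends for a pool of n samples, assuming independent samples that share a common prevalence x (the function name `binomial_terms` is hypothetical, not part of the original study). The addends sum to D = 1, the k = 0 addend is the probability y^n that the pool tests negative, and its complement is the probability that the pool tests positive.

```python
from __future__ import annotations

from math import comb


def binomial_terms(x: float, n: int) -> list[float]:
    """Addends of (x + y)^n with y = 1 - x.

    Term k is C(n, k) * x^k * y^(n - k): the probability that a pool of n
    independent samples, each positive with probability x, contains exactly
    k positive samples.
    """
    y = 1.0 - x
    return [comb(n, k) * x**k * y ** (n - k) for k in range(n + 1)]


terms = binomial_terms(0.1, 4)
print(round(sum(terms), 12))   # D = 1: the addends cover every possible event
print(round(terms[0], 4))      # k = 0 addend, y^n: probability the pool tests negative
print(round(1 - terms[0], 4))  # 1 - y^n: probability the pool tests positive
```

For x = 0.1 and n = 4 this prints 1.0, 0.6561 and 0.3439.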

Here, the probability that a pool tests negative is represented by y^n, whereas pools that test positive correspond to all the other cases, in which at least one individual sample in the pool is positive; these occur with probability 1 − y^n. To facilitate the use of Equation (3), it is expressed as a function of x, the prevalence of positive tests in each country's historical testing data, so that this value can be entered directly. Therefore, considering that no further testing is performed on a group whenever its pool test is negative, the saved tests of the otherwise individually tested subjects will be ex-
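As a hedged illustration of where this derivation leads, assume the standard two-stage scheme described above: one pooled test per group of n, plus n individual retests whenever the pool is positive, with a perfect test. The expected number of tests per group and the average number of tests per subject, z, would then be as follows. This form is an assumption rather than a verbatim copy of the authors' Equation (5), although it reproduces the savings reported in the abstract.

```latex
% Expected tests for one pool of n samples (two-stage scheme, perfect test assumed)
E[T] = \underbrace{1}_{\text{pooled test}} + \underbrace{n\left[1 - (1 - x)^{n}\right]}_{\text{individual retests}}
% Average tests per subject; minimizing over n gives the optimum group size
z(x, n) = \frac{E[T]}{n} = \frac{1}{n} + 1 - (1 - x)^{n}
% Worked example: x = 0.1, n = 4
z(0.1, 4) = 0.25 + 1 - 0.6561 = 0.5939 \quad (\approx 40.6\% \text{ of tests saved})
```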

To obtain the optimum group size for a given prevalence of positive tests (x) in a particular setting, the global minimum of Equation (5) with respect to n must be found. This minimum is calculated using x as the input, because x is a continuous variable, whereas n is a discrete one. Note that, knowing z, the average minimum number of tests per subject needed to diagnose one subject, the population covered by one test using a pool testing strategy with the optimal pool size calculated above (and accounting for the fact that when a group yields a positive result, the whole group has to be individually tested) can be expressed as

$$ \text{subjects covered per test} = \frac{1}{z} $$

With the model proposed above, different scenarios were analyzed for different prevalences of positive tests, since each country presents a unique distribution of daily tests performed and positive results. Accordingly, the optimum group size and the average minimum number of tests per subject were calculated for each chosen prevalence scenario.

The results were then compared with the individual testing strategy by estimating how many more positive results could be detected using pool testing with the same number of tests, thus addressing the efficiency of the strategy over individual testing, as shown in Table 1.

FIGURE 1 Contour plot of the average minimum number of tests per subject to diagnose one subject. Horizontal axis: prevalence of positive tests, x, ranging from 0 to 0.4. Vertical axis: group size, n, ranging from 2 to 100. The colors represent the average minimum number of tests per subject to diagnose one subject, with better values going from green to orange, orange being closest to the optimum.

On the other hand, given the optimum group sizes calculated for the chosen prevalence scenarios, the population covered by 100 tests was calculated from the average minimum number of tests per subject and compared with the 100 subjects that an individual testing strategy would cover, as shown in Table 2.

As shown in the results, the lower the prevalence of positive tests in a particular country, the more tests can be saved and the larger the pool groups become. For prevalences from 0.03 to 0.07, a pool testing strategy increases a country's testing capacity by a factor of two to three compared with individual testing. This could bring unprecedented advances in understanding the disease and how it is distributed in a particular population. For prevalences from 0.08 to 0.2, the net saving of test kits with pool testing is still significant, ranging from about 46.6% to 17.9% of the tests that individual testing would require for the same number of subjects, thus covering a greater portion of the population. However, as prevalence rises, the efficiency of the strategy flattens. Above a prevalence of 0.25, the net saving of tests remains significant; however, separating the samples, creating pool groups, tracking individuals in groups that yielded a positive result, and retesting all those subjects individually pose logistical challenges that every healthcare center must weigh before adopting this strategy over the individual testing strategy most likely already in place. Finally, as prevalence approaches 0.3, the pool testing strategy becomes similar to individual testing, thus losing its effectiveness and becoming a logistical problem rather than an optimization of the testing protocols. This is mainly because, in the proposed model, whenever a group tests positive, all individuals in the group must be retested to track down the positive subject or subjects in the pooled sample. Therefore, the more positive individuals there are in the population, the more positive pool tests there will be; thus, fewer tests are saved and more tests are used in retesting the positive pools.
Note that for large positive groups, further subgrouping and pool testing of those subgroups could be implemented. This could save even more tests; however, we believe this approach might pose a logistical challenge that the progressively collapsing healthcare systems worldwide may not be able to cope with for now.
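The study does not model subgrouping. As a hedged illustration only, the sketch below assumes a perfect test and a hypothetical three-stage scheme: a pool of n samples is tested; if it is positive, it is split into n/m subgroups of size m (m divides n), each subgroup pool is tested, and only members of positive subgroups are retested individually. With y = 1 − x, the expected tests per subject are then 1/n + (1 − y^n)/m + (1 − y^m). All function names are hypothetical.

```python
from __future__ import annotations


def two_stage(x: float, n: int) -> float:
    """Expected tests per subject: pool of n, all n retested if the pool is positive."""
    return 1.0 / n + 1.0 - (1.0 - x) ** n


def three_stage(x: float, n: int, m: int) -> float:
    """Expected tests per subject: pool of n; if positive, n // m subgroup pools of size m;
    members of positive subgroups are retested individually (perfect test assumed)."""
    y = 1.0 - x
    return 1.0 / n + (1.0 - y ** n) / m + (1.0 - y ** m)


def best_three_stage(x: float, n_max: int = 100) -> tuple[float, int, int]:
    """Return (tests per subject, n, m) minimizing three_stage over divisors m of n, 2 <= m < n."""
    return min(
        (three_stage(x, n, m), n, m)
        for n in range(4, n_max + 1)
        for m in range(2, n)
        if n % m == 0
    )


x = 0.01
z2 = min(two_stage(x, n) for n in range(2, 101))
z3, n, m = best_three_stage(x)
print(f"two-stage: {z2:.3f} tests/subject; three-stage (n={n}, m={m}): {z3:.3f}")
```

At x = 0.01 the best three-stage design uses fewer tests per subject than the best two-stage one, at the price of an extra pooling round and more sample tracking, which is the logistical burden noted above.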

As of 19 April 2020, most countries report prevalences of positive tests of around 0.1 to 0.2 of all tests performed daily, 29 so pool testing remains a plausible strategy to implement at the national level. For this analysis, the overall historical prevalence was used as a country scenario; however, when subjects are further stratified according to clinical suspicion, a lower prevalence of positive tests is expected in the groups with lower clinical suspicion, so the pool testing strategy could be best implemented in these subgroups rather than in the whole population. As shown above, lower prevalences of positive tests yield greater efficiency in test use, albeit with larger group sizes. One of the main criticisms of pool testing is the dilution that occurs when samples are pooled together, and how this dilution might affect test sensitivity. Previous studies have shown no decrease in RT-PCR sensitivity for detecting other viruses with pools of 10 and 20 subjects, 30 and, as far as the available evidence on SARS-CoV-2 shows, pools of five subjects do not affect the sensitivity of RT-PCR for detecting the virus. 24 The model proposes optimum group sizes ranging from 11 to 3 subjects, depending on the prevalence of positive tests. This fits well with the concern that larger groups might decrease RT-PCR sensitivity owing to dilution of the pooled sample, and with the proposal that, to implement pool testing effectively, pool sizes should be kept as small as possible. 26, 27 Moreover, the model predicts optimum groups of four and three subjects for prevalences of positive tests from 0.1 to 0.2, which are the prevalences most countries are currently reporting. The model therefore adapts to the clinical reality that frontline workers all over the world experience daily.

This article proposed a simple, practical model to estimate the optimal group size for implementing a pool testing strategy for SARS-CoV-2, according to the historical prevalence of positive tests in a given healthcare context. The model is intended for use at different levels of healthcare facilities fighting the pandemic, given its flexibility to estimate the optimal group size for a specific prevalence. These prevalences may differ from one healthcare facility to another and from one city to another, and may also differ from the country's overall outbreak status. The model therefore supports a context-tailored implementation of the pool testing strategy for testing individuals with suspected SARS-CoV-2 infection.

One of the main limitations of this study is that it assumes that RT-PCR for detecting SARS-CoV-2 has 100% sensitivity to the viral RNA, whereas the available evidence shows a sensitivity of around 70%. 31 However, considerable work is currently under way to improve test sensitivity, and incorporating imperfect sensitivity would greatly increase the complexity of the model.

Finally, it is worth mentioning the social implications of pool testing. As the pandemic grows and more people get tested, this strategy might not be well received by the general public, since patients will most likely want to know as soon as possible whether their own test is positive or negative, and may not accept having their sample mixed with other samples. It is therefore crucial to develop a strong public health policy to inform the population, secure equal access, and implement the strategy for the greater good.

The authors are thankful to Dr Ricardo Segovia, MD (Hospital 

A pneumonia outbreak associated with a new coronavirus of probable bat origin

The proximal origin of SARS-CoV-2

Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding

World Health Organization. Statement on the second meeting of the International Health Regulations (2005) Emergency Committee regarding the outbreak of novel coronavirus (2019-NCoV)

Situation report -90. Geneva: WHO

Johns Hopkins Coronavirus Resource Center. COVID-19 Map

Detection of 2019 novel coronavirus (2019-nCoV) by real-time RT-PCR

Detection of SARS-CoV-2 in different types of clinical specimens

Detection of SARS-CoV-2 by RT-PCR in anal from patients who have recovered from coronavirus disease 2019

The presence of SARS-CoV-2 RNA in feces of COVID-19 patients

Reverse-transcription PCR (RT-PCR)

Disease control, civil liberties, and mass testing-Calibrating restrictions during the Covid-19 pandemic

Improved early recognition of coronavirus disease-2019 (COVID-19): single-center data from a Shanghai screening hospital

Laboratory testing strategy recommendations for COVID-19

Centers for Disease Control and Prevention, CDC. Testing in the U.S. Available from

To Understand the Global Pandemic, We Need Global Testing - Our World in Data

Our World in Data

Combination of RT-qPCR testing and clinical features for diagnosis of COVID-19 facilitates management of SARS-CoV-2 outbreak

Characteristics of and important lessons from the coronavirus disease 2019 (COVID-19) outbreak in China: Summary of a report of 72314 cases from the Chinese Center for Disease Control and Prevention

The clinical feature of silent infections of novel coronavirus infection (COVID-19) in Wenzhou

Transmission of 2019-nCoV infection from an asymptomatic contact in Germany

Epidemiological analysis of COVID-19 and practical experience from China

Impact of nonpharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand

Pool testing of SARS-CoV-02 samples increases worldwide test capacities many times over. Aktuelles aus der Goethe-Universität Frankfurt

Pooling RT-PCR or NGS samples has the potential to cost-effectively generate estimates of COVID-19 prevalence in resource limited environments

Optimizing screening for acute human immunodeficiency virus infection with pooled nucleic acid amplification tests

Data. COVID-19: Confirmed cases vs. tests conducted

Evaluation of saliva pools method for detection of congenital human cytomegalovirus infection

Antibody responses to SARS-CoV-2 in patients of novel coronavirus disease 2019


The authors declare that there are no conflicts of interest.

http://orcid.org/0000-0001-7233-960X