key: cord-1045622-8gztbxvl
authors: Chatzimanolakis, M.; Weber, P.; Arampatzis, G.; Wälchli, D.; Karnakov, P.; Kicic, I.; Papadimitriou, C.; Koumoutsakos, P.
title: Optimal Testing Strategy for the Identification of COVID-19 Infections
date: 2020-07-26
journal: nan
DOI: 10.1101/2020.07.20.20157818
sha: 06d79a39a755169a7958135fbebbf709cb2a674c
doc_id: 1045622
cord_uid: 8gztbxvl

The systematic identification of infectious, yet unreported, individuals is critical for the containment of the COVID-19 pandemic. We present a strategy for identifying the location, timing and extent of testing that maximizes information gain for such infections. The optimal testing strategy relies on Bayesian experimental design and epidemic forecasting models that account for time-dependent interventions. It is applicable at the onset and spreading of the epidemic and can forewarn of a possible recurrence of the disease after relaxation of interventions. We examine its application in Switzerland and show that it can provide timely and systematic guidance for the effective identification of infectious individuals with finite testing resources. The methodology and the open source code are readily adaptable to countries around the world.

Introduction

all the parameters are assumed to follow uniform prior distributions (see Table S5 for details).

In Switzerland the first infectious person was reported on February 25th in the canton of Ticino.

Exponential spreading and optimal testing strategy during non-pharmaceutical interventions

When the spreading of the coronavirus entered an exponential growth stage, several governments decided to take non-pharmaceutical interventions such as requesting social distancing, closing schools and restaurants, or even ordering a complete lockdown in order to contain the epidemic. Here, the goal of the OpTS is to obtain measurements that help to better assess the effectiveness of these interventions.
In this case, priors for the model parameters are informed using data from the spread of

Table S2.

If only a single canton is to be selected, tests in the canton of Vaud carried out on the 30th

(Tables S3 and S4).

Given that tests should be carried out at four locations and times, the methodology promotes optimal tests at two different times, within a week, in the cantons of Zurich and Vaud. First, tests should be performed in Zurich, providing high information gain for both considered cases. The next two tests are to be performed in Zurich and Vaud, with a rank that depends on the considered case, while the fourth test should be performed in Vaud. We find that the information gain from the last test is approximately 10% of the cumulative information gain from the first three tests. The number of tests can then be selected according to the available resources.

Effectiveness of Optimal Testing

We demonstrate the importance of following an OpTS by comparing it with a non-specific testing campaign. We first re-examine the situation at the start of an epidemic and assume that the available resources allow for two randomized tests.

All rights reserved. No reuse allowed without permission. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. This version posted July 26, 2020. medRxiv preprint doi: https://doi.org/10.1101/2020.07.20.20157818

We introduced a systematic approach to identify optimal times and locations for randomized tests in order to quantify infectious individuals in a country's population during the COVID-19 epidemic. The proposed OpTS exploits prior information and available data to maximize the expected information gain in parameters of interest and to minimize uncertainties in the forecasts of epidemiological models.
In turn, improved forecasts for the virus spread provide rational guidelines for optimal allocation of finite testing resources.

The proposed method is demonstrated by focusing on the outbreak of the epidemic in Switzerland. The methodology relies on Bayesian experimental design using prior information and available data of reported infections along with forecasts from the SEI^rI^uR model. We compute the optimal testing strategy for three phases of the epidemic. First, we quantify

Finally, the OpTS can assist in monitoring for a recurrence of the disease after preventive measures have been relaxed and help guide further planning of interventions.

We remark that the proposed OpTS is not tied to a particular type of data or model, nor to the country of Switzerland. The open source code is modular, scalable and readily adaptable to different scenarios for the epidemic and to countries around the world. We believe that the present work can be a valuable tool for decision makers to allocate resources efficiently for testing
Figure 2: Expected information gain during the start of the epidemic. The blue curve corresponds to the utility of taking one measurement. The green curve is the utility when a second measurement is added, provided the location and time of the first measurement correspond to the maximum of the blue curve (found in the canton of Zurich, on March 2nd). Similarly, the yellow and red curves show the utilities for a third and a fourth measurement, when the locations and times of the previous measurements are fixed to their optimal values. The fixed dates and locations of each measurement are plotted with black dashed lines. The shaded areas indicate the difference from the expected information gain of the previous measurement; they become thinner because additional measurements yield no further significant information gain.

Here blue corresponds to taking one measurement, green to adding a second, yellow to a third and red to a fourth. Below the map we plot the magnitude of the expected information gain of each measurement, along with the optimal measurement dates per canton.

Figure 4: Optimal testing strategy to monitor a second outbreak. We use Bayesian inference to determine the parameters of the first infection wave using the data (black dots) of the daily new reported infections up to the 6th of June (upper plot) and up to the 9th of July (lower plot).
The 99% confidence intervals are plotted in gray. The proposed testing strategy is plotted with vertical bars at the optimal days found. Here blue indicates the utilities for the first measurement. The green bars correspond to the gain in utility when adding a second measurement, assuming the first was chosen in the optimal location, while yellow and red correspond to adding a third and a fourth measurement.

The diagonal shows the histogram of the marginal distribution of every parameter. Purple indicates the posterior for the measurement following the optimal testing strategy, gray the one for the non-specific strategy. The lower half and upper half show the samples of the joint distribution of two parameters for the optimal and the non-specific strategy, respectively. Here black indicates low density and yellow high density.

Prediction uncertainty for different testing strategies. The black dots show the actual unreported infectious for an artificial spread in Switzerland. The error bounds show the 99% confidence intervals of the model output for samples of the parameters with data obtained by optimal (purple) and non-specific testing (gray).
The optimal time (day) and location (canton) for testing a population to detect infectious individuals are determined via Bayesian optimal experimental design (1-3). This optimal testing strategy (OpTS) relies on combining Bayesian inference and utility theory with forecasting models of the epidemic. We remark that the OpTS does not depend on a particular epidemiological model or type of data. The methodology is applicable at all stages of the epidemic (inception to re-occurrence). It can operate without data at the early stages of the pandemic and takes advantage of data available at later stages. The methodology is rendered computationally efficient using a sequential optimization algorithm (4).

Bayesian Inference from randomized testing

We consider a testing campaign including a set $s$ of randomized tests $s_i = (k_i, t_i)$, $i = 1, \dots, M_y$, performed in location $k_i$ and on day $t_i$. These tests measure a quantity of interest (QoI) that is denoted by $y(s) = (y_1, \dots, y_{M_y})$. Here, $y_i$ is the number of unreported infectious individuals, measured through test $s_i$. The QoI can be predicted by a model $g(s, \vartheta, \tilde{\vartheta})$ (here the SEI^rI^uR epidemiological model) that depends on parameters of interest $\vartheta \in \mathbb{R}^{N}$ and nuisance parameters $\tilde{\vartheta} \in \mathbb{R}^{\tilde{N}}$. The distinction between model and nuisance parameters is discussed in later sections. We note that both sets

The error $\varepsilon(s)$ is assumed to follow a zero-mean multivariate normal distribution $\mathcal{N}(0, \Sigma)$ with covariance matrix $\Sigma \in \mathbb{R}^{M_y \times M_y}$. The elements of the covariance matrix ($\Sigma_{s,s'}$) correspond to
where $\delta_{kk'}$ is the Kronecker delta, which is 1 for $k = k'$ and 0 otherwise. The correlation

where $s_i = (i, t)$. The parameter $c \in [0, 0.25]$ is considered a model parameter. The expectation is taken with respect to all parameters $\vartheta$ and $\tilde{\vartheta}$ that follow the prior probability distribution with density $p(\vartheta, \tilde{\vartheta}) = p(\vartheta)\, p(\tilde{\vartheta})$. Under these assumptions, the conditional probability of $y$ on $\vartheta$, $\tilde{\vartheta}$ and $s$ is given by

$$p(y \mid \vartheta, \tilde{\vartheta}, s) = \frac{1}{\sqrt{(2\pi)^{M_y}\, |\Sigma(s)|}} \exp\left( -\frac{1}{2} \big(y - g(s, \vartheta, \tilde{\vartheta})\big)^{\top} \Sigma^{-1}(s) \big(y - g(s, \vartheta, \tilde{\vartheta})\big) \right),$$

where $|\Sigma(s)|$ is the determinant of the covariance matrix.

lead to higher information gain. Therefore, the most informative data $y$ correspond to the testing strategy (measurement locations and times) with the highest information gain (20, 21).

The OpTS is identified by maximizing a utility function (1, 22). One choice is the KL divergence $u(y, \tilde{\vartheta}, s) = D_{KL}\big( p(\vartheta \mid y, \tilde{\vartheta}, s) \,\|\, p(\vartheta) \big)$, quantifying the information gain from the data (1). However, since data are not available in the experimental design phase, the utility function is selected here to be the expected KL divergence $\mathbb{E}_{y \mid \tilde{\vartheta}, s}\big[ u(y, \tilde{\vartheta}, s) \big]$ over all data generated by the model prediction error equation (1). Also, to account for the uncertainty in the nuisance parameters $\tilde{\vartheta}$, encoded in the prior distribution $p(\tilde{\vartheta})$, the expectation is also taken with respect to $\tilde{\vartheta}$, which results in the utility function (22)

$$U(s) = \mathbb{E}_{\tilde{\vartheta}}\Big[ \mathbb{E}_{y \mid \tilde{\vartheta}, s}\big[ u(y, \tilde{\vartheta}, s) \big] \Big] = \int \int \int \log \frac{p(\vartheta \mid y, \tilde{\vartheta}, s)}{p(\vartheta)}\, p(\vartheta \mid y, \tilde{\vartheta}, s)\, d\vartheta \; p(y \mid \tilde{\vartheta}, s)\, dy \; p(\tilde{\vartheta})\, d\tilde{\vartheta} .$$
the utility function can be simplified to

$$U(s) = \int \int \int \log \frac{p(y \mid \vartheta, \tilde{\vartheta}, s)}{p(y \mid \tilde{\vartheta}, s)}\, p(y \mid \vartheta, \tilde{\vartheta}, s)\, p(\vartheta)\, p(\tilde{\vartheta})\, dy\, d\vartheta\, d\tilde{\vartheta} .$$

This form follows from Bayes' theorem, $p(\vartheta \mid y, \tilde{\vartheta}, s)\, p(y \mid \tilde{\vartheta}, s) = p(y \mid \vartheta, \tilde{\vartheta}, s)\, p(\vartheta)$. Note that the expected utility depends only on the locations and times of the measurements via $s$. The term $p(y \mid \tilde{\vartheta}, s)$ is the model evidence given by

$$p(y \mid \tilde{\vartheta}, s) = \int p(y \mid \vartheta, \tilde{\vartheta}, s)\, p(\vartheta)\, d\vartheta .$$

The choice of the prior distribution $p(\vartheta)$ for the parameters allows prior knowledge from epidemiology to be incorporated. If no information is available from data, a case encountered at the beginning of the infection, a uniform prior distribution can be assumed.

form the prior distribution, as described later on. In this case, the prior $p(\vartheta)$ in equation (7) is replaced by the distribution $p(\vartheta \mid d)$ informed from the data $d$.

In the present work, the assumed nuisance parameters are the correlation time $\tau$ and the initial

where $S_k$, $E_k$, $I^r_k$ and $I^u_k$ denote the number of individuals in canton $k \in \{1, \dots, K\}$ that are susceptible, exposed, reported infectious and unreported infectious, respectively. We denote by

where $b_0$, $b_1$, $\theta_0$ and $\theta_1$ are the infection rates and mobility factors before and after the inter-

As in equation (10) with $\lambda \in [0, 0.03]$, while the mobility factor regains its initial value of $\theta_0$.

where $s = (k, t)$ is the location and time to be estimated sequentially starting with $n = 1$ and

$$\hat{U}_n(s) = \hat{U}(\mathbf{s}), \qquad \mathbf{s} = (s^*_1, \dots, s^*_{n-1}, s) .$$
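The sequential procedure described above, in which the n-th test is chosen while the previous n-1 optimal tests are held fixed, can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not taken from the authors' code: a toy exponential-growth forecast replaces the SEI^rI^uR model, the single parameter `theta` has a hypothetical uniform prior, nuisance parameters are ignored, the error covariance uses an assumed exponential correlation in time within a location with noise proportional to the forecast, and the expected information gain is estimated with a plain nested Monte Carlo estimator. The names `forecast`, `covariance`, `expected_gain` and `sequential_design` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
I0 = np.array([5.0, 20.0, 10.0])  # hypothetical initial infections in 3 locations


def forecast(design, theta):
    """Toy stand-in for the epidemic model g: exponential growth with rate theta."""
    k = np.array([s[0] for s in design])
    t = np.array([s[1] for s in design], dtype=float)
    return I0[k] * np.exp(theta * t)


def covariance(design, mean, c=0.1, tau=2.0):
    """Assumed error model: noise scale c * mean, correlated in time within a
    location (correlation time tau) and uncorrelated across locations."""
    k = np.array([s[0] for s in design])
    t = np.array([s[1] for s in design], dtype=float)
    same_location = (k[:, None] == k[None, :]).astype(float)
    corr = same_location * np.exp(-np.abs(t[:, None] - t[None, :]) / tau)
    sigma = c * mean
    return corr * np.outer(sigma, sigma)


def log_gauss(y, mean, chol):
    """Log density of N(mean, chol @ chol.T) evaluated at each row of y."""
    z = np.linalg.solve(chol, (y - mean).T)  # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (np.sum(z**2, axis=0) + log_det + len(mean) * np.log(2 * np.pi))


def expected_gain(design, n_outer=100, n_inner=100, prior=(0.05, 0.3)):
    """Nested Monte Carlo estimate of E[log p(y | theta, s) - log p(y | s)]."""
    d = len(design)
    y = np.empty((n_outer, d))
    log_lik = np.empty(n_outer)
    for i, theta in enumerate(rng.uniform(*prior, n_outer)):
        mean = forecast(design, theta)
        chol = np.linalg.cholesky(covariance(design, mean))
        y[i] = mean + chol @ rng.standard_normal(d)  # synthetic measurement
        log_lik[i] = log_gauss(y[i : i + 1], mean, chol)[0]
    log_inner = np.empty((n_outer, n_inner))
    for j, theta in enumerate(rng.uniform(*prior, n_inner)):
        mean = forecast(design, theta)
        chol = np.linalg.cholesky(covariance(design, mean))
        log_inner[:, j] = log_gauss(y, mean, chol)
    # Monte Carlo estimate of the log evidence for each synthetic measurement
    log_evidence = np.logaddexp.reduce(log_inner, axis=1) - np.log(n_inner)
    return float(np.mean(log_lik - log_evidence))


def sequential_design(candidates, n_tests):
    """Greedy search: pick the n-th test with the first n-1 tests held fixed."""
    chosen, gains = [], []
    for _ in range(n_tests):
        remaining = [s for s in candidates if s not in chosen]
        utilities = [expected_gain(chosen + [s]) for s in remaining]
        best = int(np.argmax(utilities))
        chosen.append(remaining[best])
        gains.append(utilities[best])
    return chosen, gains


# Candidate tests are (location index, day) pairs on a coarse grid.
candidates = [(k, t) for k in range(3) for t in (1, 4, 7, 10)]
design, gains = sequential_design(candidates, n_tests=2)
print(design, gains)
```

The greedy search evaluates the utility once per remaining candidate at every step, so its cost grows linearly with the number of tests rather than with the number of all possible test combinations. Because each utility value is a noisy Monte Carlo estimate, the ranking of candidates with similar utilities may change with the sample sizes or the random seed.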
The posterior distribution that will be used subsequently as a data-informed prior is obtained using Bayes' theorem and is sampled with a nested sampling algorithm (19).

Note the difference to equation (6) and the optimal testing methodology, where we are interested in reducing the uncertainty in

Figure S1: Marginal posterior distributions with data up to the 17th of March 2020. The data used are the daily reported infectious persons in the cantons of Switzerland. The marginals with a canton label XY correspond to the initial condition $I^u_{XY}(t = 0)$ for the unreported cases in that canton.

Figure S2: Marginal posterior distributions with data up to the 6th of June 2020. The data used are the daily reported infectious persons in the cantons of Switzerland. The marginals with a canton label XY correspond to the initial condition $I^u_{XY}(t = 0)$ for the unreported cases in that canton.
Figure S3: Marginal posterior distributions with data up to the 9th of July 2020. The data used are the daily reported infectious persons in the cantons of Switzerland. The marginals with a canton label XY correspond to the initial condition $I^u_{XY}(t = 0)$ for the unreported cases in that canton.

Figure S4: Maximum a-posteriori prediction with data up to the 9th of July 2020. The red points correspond to the daily reported cases per canton and the blue curve shows the maximum a-posteriori prediction. The 99% confidence interval is plotted in green and is based on the samples shown in Figure S3.

Figure S5: Comparison of prediction uncertainty per canton. The predictions are based on optimal strategies and non-specific testing for collection of data. They are also based on the SEI^rI^uR model output. The error bounds show the 99% confidence intervals of the unreported infectious model output for samples of the parameters with data obtained by optimal (purple) and standard testing (gray). The black dots show the actual unreported infectious for an artificial spread in Switzerland.

Figure S6: Comparison of propagated uncertainty per canton. The predictions are based on optimal strategies and non-specific testing.
The SEI^rI^uR model output with added model error for the unreported infectious is shown. The error bounds show the 99% confidence intervals of the model output with added model error for samples of the parameters with data obtained by optimal (purple) and standard testing (gray). The black dots show the actual unreported infectious for an artificial spread in Switzerland.

Table S1: Maximum expected information gain for the outbreak of a new disease. The expected information gain shown per measurement is defined in equation (17). The corresponding optimal dates are shown in parentheses.

Table S3: Maximum expected information gain for monitoring a second outbreak with uninformed $b_3$. The corresponding optimal dates are shown in parentheses.

Table S4: Maximum expected information gain for monitoring a second outbreak with informed $b_3$. The corresponding optimal dates are shown in parentheses.
References

A Review of Modern Computational Algorithms for Bayesian Optimal
A Bayesian experimental autonomous researcher for mechanical design
Bayesian clinical trials
Optimal Sensor Placement Methodology for Parametric Identification of
A Novel Coronavirus from Patients with Pneumonia in China
Outbreak of a novel coronavirus
The infection fatality rate of COVID-19 inferred from seroprevalence data
Which interventions work best in a pandemic?
Overcoming the bottleneck to widespread testing: A rapid review of nucleic acid testing approaches for COVID-19 detection
Special report: The simulations driving the world's response to COVID-19. Nature
Multi-Stage Group Testing Improves Efficiency of Large-Scale COVID-19 Screening
Inferring change points in the spread of COVID-19 reveals the effectiveness of interventions
Bayesian Experimental Design: A Review
Simulation-based optimal Bayesian experimental design for nonlinear systems
Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2)
Probability Theory: The Logic of Science
Data-driven inference of the reproduction number for COVID-19 before and after interventions for 51 European countries
Dynesty: a dynamic nested sampling package for estimating Bayesian posteriors and evidences
On a measure of the information provided by an experiment
Handbook of Mathematical Economics
Estimating Expected Information Gains for Experimental Designs With Application to the Random Fatigue-Limit Model
Optimal sensor placement for artificial swimmers
The reproductive number of COVID-19 is higher compared to SARS coronavirus
SARS-CoV-2 Cases communicated by Swiss Cantons and Principality of Liechtenstein (FL). Online
Optimal flow sensing for schooling swimmers
Employed persons by commune of residence and work. Online
On prediction error correlation in Bayesian model updating
The effect of prediction error correlation on optimal sensor placement in structural dynamics

Table S5: Parameters and prior distributions used in Bayesian inference. Here the data correspond to the daily reported infections. In all cases, data are used from the 25th of February 2020, when the first reported case was found in the canton of Ticino. Inference I uses data up to the day non-pharmaceutical interventions were announced (17th of March 2020). Inference II uses data up to the day measures were relaxed (6th of June 2020). Inference III uses data up to the 9th of July 2020.
The choice of prior distributions is consistent with the choice found in (16); the ranges used in our study are slightly extended.