key: cord-0326234-fl7jmq20
authors: Reike, T.; Golumbeanu, M.; Shattock, A.; Burgert, L.; Smith, T. A.; Filippi, S.; Cameron, E.; Penny, M. A.
title: Machine learning approaches to calibrate individual-based infectious disease models
date: 2021-01-29
journal: nan
DOI: 10.1101/2021.01.27.21250484
sha: c7fb7884de7003e169cf184441d203076dbbf4d2
doc_id: 326234
cord_uid: fl7jmq20

Individual-based models have become important tools in the global battle against infectious diseases, yet model complexity can make calibration to biological and epidemiological data challenging. We propose a novel approach to calibrate disease transmission models via a Bayesian optimization framework employing machine learning emulator functions to guide a global search over a multi-objective landscape. We demonstrate our approach by application to an established individual-based model of malaria, optimizing over a high-dimensional parameter space with respect to a portfolio of multiple fitting objectives built from datasets capturing the natural history of malaria transmission and disease progression. Outperforming other calibration methodologies, the new approach quickly reaches an improved final goodness of fit. Per-objective parameter importance and sensitivity diagnostics provided by our approach offer epidemiological insights and enhance trust in predictions through greater interpretability.

Background 30 Over the last century, mathematical modelling has become an important tool to analyze and understand disease-and intervention-dynamics for many infectious diseases. Individual-based models (IBMs), where each person is simulated as an autonomous agent, are now widely used.

These mathematical models capture heterogeneous characteristics and behaviors of individuals, and are often stochastic in nature. This bottom-up approach of simulating individuals and 35 transmission events enables detailed, robust and realistic predictions on population epidemic trajectories as well as the impact of interventions such as vaccines or new drugs (1, 2) . Going beyond simpler (compartmental) models to capture stochasticity and heterogeneity in populations, disease progression, and transmission, IBMs can additionally account for contact networks, individual care seeking behavior, immunity effects, or within-human dynamics (1) (2) (3) . As such, (4) and guiding the public health response to the Covid-19 pandemic in multiple countries (5).

IBMs have also been applied to tuberculosis (6), influenza (7), dengue, and many other infectious 45 diseases (2). Within the field of malaria, several IBMs have been developed over the last 15 years and have been used to support understanding disease and mosquito dynamics (8-10), predict the public health impact or carry out economic analyses of (new) interventions (11) (12) (13) (14) ; and investigate drug resistance (15). Many have had wide-reaching impact, influencing WHO policy recommendations (11, (16) (17) (18) or strategies of national malaria control programs (19) .

For model predictions to be meaningful, modelers need to ensure their models accurately capture abstractions of the real world. The potential complexity and realism of IBMs often come at the cost of long simulation times and potentially large numbers of input parameters, whose exact values are often unknown. Parameters may be unknown because they represent derived 55 mathematical quantities that cannot be directly measured or require elaborate, costly experiments (for example shape parameters in decay functions (20)), because the data required to derive them in isolation is incomplete or accompanied by inherent biases, or because they interact with other parameters. 4 step selection of the fittest individuals evolves the population towards the nearest fitness peak 90 making valleys of neutral or lower fitness difficult to cross (25).

Other solutions to fit similarly detailed IBMs of malaria employ a combination of directly extracting parameter values from the literature where information is available, and fitting the remainder using multi-stage, modular Bayesian Markov Chain Monte Carlo (MCMC)-based methods (26) (27) (28) (29) (30) (31) . For these models, multiple fitting objectives are often not addressed 95 simultaneously. Rather, to our knowledge, most other malaria IBMs are divided into functional modules (such as the human transmissibility model, within-host parasite dynamics model, and the mosquito or vector model), which are assumed to be influenced by only a limited number of parameters each. The modules are then fit independently and in a sequential manner (27) (28) (29) (30) (31) .

Modular approaches reduce the dimensionality of the problem, allowing for the use of relatively 100 straightforward MCMC algorithms. However, these struggle with efficiency in high dimensions as their Markovian nature requires many sequential function evaluations (10 4 -10 7 even for simple models), driving up computing time and computational requirements (32). Additionally, whilst allowing for the generation of posterior probability distributions of the parameters (30), the modular nature makes sequential approaches generally unable to account for interdependencies 105 between parameters assigned to different modules and how their co-variation may affect disease dynamics.

Progress in recent years on numerical methods for supervised, regularized learning of smooth functions from discrete training data allows us to revisit calibration of detailed mathematical 110 models. In particular, Bayesian optimization with Gaussian processes has gained popularity as an efficient approach to tackle expensive optimization problems, for example in hyperparameter search problems in machine learning (33, 34) . Assuming that the parameter-solution space exhibits a modest degree of regularity, a prior distribution is defined over a computationally expensive objective function by the means of a light-weight probabilistic emulator such as a Gaussian 115 process. The constructed emulator is sequentially refined by adaptively sampling the next training points based on acquisition functions derived from the posterior distribution. The trained emulator model is used to make predictions over the objective functions from the input space with minimum evaluation of the expensive true (simulator) function. Purely sampling-based iterative approaches 5 (like genetic algorithms) are usually limited to drawing sparse random samples from proposals 120 located nearby existing samples in the parameter space. In contrast, the use of predictive emulators permits exploration of the entire parameter space at higher resolution. This increases the chances of finding the true global optimum of the complex objective function in question and avoiding local optima.

Here, we introduce a novel approach for calibrating IBMs (Fig. 1 ) based on Bayesian optimization 125 and incorporating machine learning algorithms. We prove the strength and versatility of our approach by calibrating the OpenMalaria model and optimizing its 23 input parameters using realworld data on 11 epidemiological outcomes in parallel. To emulate the solution space, we explore and compare two prior distributions, namely a Gaussian process (GP) emulator and a superlearning algorithm in form of a Gaussian process stacked generalization (GPSG) emulator. 130 We first use a Gaussian process (GP) emulator to emulate the solution space. Whilst GP emulators provide flexibility whilst retaining relative simplicity (34) and have been used previously as priors in Bayesian optimization (33), stacked generalization algorithms have not. They provide a potentially attractive alternative as they have been shown to outperform GPs and other machine learning algorithms in capturing complex spaces (13, 35) . The stacked generalization algorithm 135 (35) builds on the idea of creating ensemble predictions from multiple learning algorithms (level 0 learners). The cross-validated predictions of the level 0 learners are incorporated into a general learning system (level 1 meta-learner). This allows for the combination of memory-efficient and probabilistic algorithms in order to reduce computational time, whilst retaining probabilistic elements required for adaptive sampling. We prove the superior performance and speed of the 140 Bayesian optimization calibration scheme to classical genetic algorithm approaches and thus propose a novel modus operandi to parameterize complex mathematical models. is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. 9 loss functions, ! ( Fig. 2A ). This is sufficient as the aim of both emulators within the Bayesian 195 optimization framework is to find minimal loss function values rather than an overall optimal predictive performance for all outcome values. Examples of truth vs predicted estimates on a 10% holdout set are provided in Figure 2A (additional plots for all objectives can be found in supplementary figures S2-S5). A satisfactory fit of the simulator was previously defined by a loss function value of ! = 73.2 (20) . The previous best model fit derived using the GA had a weighted 200 sum of the loss functions of ! = 63.7 (20). Satisfactory fit was achieved by our approach in the first iteration of the GPSG-based Bayesian optimization algorithm (GPSG-BO), and after six iterations for the GP-based algorithm (GP-BO) (Fig. 2B) . The current best fit was approximately retrieved after six iterations for the GPSG-BO algorithm and after nine iterations for GP-BO, and was improved by both algorithms after ten iterations (returning final values ! = 58.3 for GP-BO 205 and 59.6 for GPSG-BO). This shows that the Bayesian optimization approach with either of our emulators very quickly achieves a better simulator fit than obtained with a classical GA approach that was previously employed to calibrate OpenMalaria. Of the two emulators, the GP approach finds a parameter set associated with a better overall accuracy and the GPSG reaches satisfactory values faster (both in terms of iterations and CPU time). A likely explanation for this is that the 210 GPSG-BO is unable to propagate its full predictive variance into the acquisition function. Only uncertainty stemming from the level 1 probabilistic learner (GP) is therefore captured in the final prediction. This leads to underestimation of the full predictive variance, and a bias towards exploitation in the early stages of the GPSG-BO algorithm (as illustrated by early narrow sampling, see supplementary figures S6-7). 215 Figure 2C shows examples of the posterior estimates returned by the optimization algorithms in context of the log prior distributions for the parameters with the greatest effects on ! (see also figure 3C). All algorithms return parameter values within the same range and (apart from parameter 4), clearly distinct from the prior mean. The fact that highly similar parameter values are identified by multiple algorithms strengthens confidence in the final parameter sets yielded by 220 the algorithms.

The best fit parameter sets yielded by our approach are provided in the supplement (Table S2) .

Importantly, after ten iterations of the GPSG-BO algorithm (approximately 7 days), and 20 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 iterations for the GP-BO algorithm (approximately 12 days), both approaches yielded similar 225 values of the 11 objective loss functions, along with similar weighted total loss function values, and qualitatively similar visual fits and predicted trends to the data (Fig. 3A -B and supplement).

We found this to be an unexpectedly fast result of the two algorithms. Details of the algorithm's best fits to the disease and epidemiological data are shown in Figures S8-S18. Overall, several objectives had visual and reduced loss-function improvements, for example to the objective on the 230 multiplicity of infection (Fig. 3A) . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. infection prevalence in individuals aged 2-10 years (36)). Figure 3C shows Sobol total effect indices quantifying the importance of individual parameters and describing each parameter's contributions to the outcome variance for each objective. Our results indicate that most objectives are influenced by multiple parameters from different groups, albeit to varying degrees, thus highlighting the importance of simultaneous multi-objective fitting.

Clusters of influential parameters can be observed for most objectives; for example, parameters associated with incidence of acute disease influence clinical incidence and pyrogenic threshold objectives. Some parameters have strong influence on multiple objectives, such as parameter 4, the critical value of cumulative number of infections and influences immunity acquisition; and parameter 10, a factor required to determine the pyrogenic threshold, which we find to be a key 255 parameter determining infections progressing to clinical illness.

In order to test if our algorithms can recover a known solution, the final parameter sets for both approaches were used to generate synthetic field data sets, and our approaches were subsequently applied to recover the known parameter set. For the GP, 13 of the 23 parameters were recovered 260 ( Figure S19A ). Those not recovered largely represented parameters to which the weighted loss function was found to be insensitive ( Figure 3C ). Thus, rather than showing a shortcoming of the calibration algorithm, this suggests a potential for dimensionality reduction of the simulator and re-evaluation of its structure. However, this was beyond the scope of the current work to develop a fast and powerful calibration methodology for IBMs.

The new parameterizations for OpenMalaria were further explored to assess key epidemiological relationships, in an approach similar multiple-model comparison in Penny et al. 2016 (11) . We 12 examined incidence and prevalence of disease, as well as incidence of mortality for multiple archetypical settings, considering a range of perennial and seasonal transmission intensity and 270 patterns. The results are presented in Figure 3B and Figures S20-30. The new parameterizations result in increased predicted incidence of severe episodes and decreased prevalence for all transmission intensities (thus also slightly modifying the prevalence-incidence relationship).

While we found that the overall implications for the other simulated epidemiological relationship were small, the differences in predictions for severe disease may carry important implications for 275 public health decision making. We conclude that our new parameterizations do not fundamentally bring into question previous research conducted using OpenMalaria, but we do suggest reevaluation of adverse downstream events such as severe disease and mortality.

Calibrating individual-based models can be challenging as many techniques struggle with high 280 dimensionality, or become infeasible with long model simulation times and multiple calibration objectives. However, ensuring adequate model fit to key data is vital, as this impacts the weighting we should give model predictions in the public health decision making process. Our machine learning and Bayesian optimization approaches provide fast solutions to calibrating individualbased models while improving model accuracy, and by extension prediction accuracy.

Using our novel Bayesian optimization approach, we calibrated a detailed simulator of malaria transmission and epidemiology dynamics with 23 input parameters simultaneously to 11 epidemiological outcomes, including age-incidence and -prevalence patterns. The use of a probabilistic emulator to predict goodness-of-fit, rather than conducting sparse sampling, allows for cheap evaluation of the simulator at many locations and increases our confidence that the final 290 parameter set represents a global optimum. Our approach provides a fast calibration whilst also providing a better fit compared to the previous parameterization. We are further able to define formal endpoints to assess calibration alongside visual confirmation of goodness of fit (20, 27), such as the emulator's predictive variance approaching the observed simulator variance. The emulator's ability to quantify the input stochasticity of the simulator also enables simulation at 295 small population sizes, contributing to fast overall computation times.

Despite the demonstrated strong performance of stacked generalization in other contexts such as geospatial mapping (13, 35, (37) (38) (39) (40) , we found that using a superlearning emulator for Bayesian . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 13 optimization was not superior to traditional GP-based methods. In our context using GPSG sped up convergence of the algorithm, but both approaches, GP and GPSG, led to equally good fits. Our methodology constitutes a highly flexible framework for individual based model calibration.

Both algorithms can be applied to other parameterization and optimization problems in disease modelling and also in other modelling fields, such as physical or mobility and transport models.

Furthermore, in the GPSG approach, additional or alternative level 0 can be easily incorporated.

Possible extensions to our approach include combination with methods to adaptively reduce the input space for constrained optimization problems (42), or other emulators may be chosen 315 depending on the application. For example, homoscedastic GPs, which are faster than the heteroscedastic approach presented here, may be sufficient for many applications (but not for our IBM in which heteroscedastic was required due to the stochastic nature of the model).

Alternatively, the computational power required by neural net algorithms scales only linearly (compared with a nominal cubic scaling for GPs) with the sample size, and we envisage wide 320 applications for neural net-based Bayesian optimization in high dimensions. In our example, the bilayer neural net algorithm completed training and prediction within seconds whilst maintaining very high predictive performance. Unfortunately, estimating the uncertainty required for good acquisition functions is difficult in neural networks, but solutions are being developed (34, 43).

These promising approaches should be explored as they become more widely available in high-325 level programming languages. With the increased availability of code libraries and algorithms,

Bayesian optimization with a range of emulators is also becoming easier to implement.

The probabilistic, emulator-based calibration approach is accompanied by many benefits, including relatively quick global sensitivity analysis. As explored in this work, GP-based methods 14 are easily coupled with sensitivity analyses, which provide detailed insights into a model's 330 structural dependencies and the sensitivity of its goodness of fit to the input parameters. To the best of our knowledge, no other individual-based model calibration study has addressed this. In the case of malaria models, we have shown the interdependence of all OpenMalaria model components and a relative lack of modularity. In particular, within-host immunity-related parameters were shown to influence all fitting objectives, including downstream events such as 335 severe disease and mortality when an infection progresses to clinical disease. Thus, calibrating within-host immunity in the absence of key epidemiology and population outcomes can lead to suboptimal calibration and ultimate failure of the model to adequately capture disease biology and epidemiology.

We have employed a new approach to calibrating malaria models compared with previous 340 methods, but reach broadly similar comparisons to the natural history of disease. We also attainted a slightly improved but similar goodness of fit, the main benefit being improved fitting times and the ability to measure parameter importance. Given the high number of influential parameters for each epidemiological objective in our parameter importance investigations, and the overlap between parameter-objective associations, we argue that, where possible, multi-objective fitting 345 should be preferred over purely sequential approaches. Our approach confirms that using a parallel approach to parameterization rather than a modular, sequential, one captures the joint effects of all parameters and ensures that all outcomes are simultaneously accounted for. To the best of our knowledge, no model of malaria transmission of comparable complexity and a comparable number of fitting objectives was simultaneously calibrated to all its fitting objectives. Disregarding the 350 joint influence of all parameters on the simulated outcomes may negatively impact the accuracy of model predictions, in particular on policy-relevant outcomes of severe disease and mortality.

Despite providing relatively fast calibration towards a better fitting parameter set, several limitations remain in our work. We have not systematically tested that a global optimum has been reached in our new approach, but assume it is close to a global minimum for the current loss-355 functions defined, as further iterations did not yield changes, and both the GP and GPSG achieved similar weighted loss function and parameter sets. We aimed to improve the algorithm to calibrate detailed IBM, but we did not incorporate new data, which will be important moving forward as our parameter importance and validation analysis highlights several key epidemiological outcomes on severe disease and mortality are sensitive to results.

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint

The key limitations of Bayesian optimization, particularly when using a Gaussian process emulator, are the high computational requirements in terms of memory and parallel computing nodes due to increasing runtimes and cubically scaling memory requirements of GPs. Memory limits may thus be reached before the predictive variance approached its limit. Furthermore, we chose an acquisition function with high probability to be no regret (41), but this likely 365 overemphasizes exploration in the early stages of the algorithm considering the dimensionality of the problem and finite runtime. We opted here for pure exploitation every 5 iterations, but a more formal optimization of the acquisition function should be explored. The GPSG approach presented here can partially alleviate this challenge, depending on the choice of learning algorithms, but the iterative nature and need for many simulations remain. Memory-and time-saving extensions are 370 thus worth exploring, such as incorporating GPU computing or adaptively constraining the prior parameter space, dimensionality reduction, or addressing alternative acquisition functions.

Additionally, as with all calibration methodologies, many choices are left to the user, such as the size of the initial set of simulations, the number of points added per iteration, or the number of replicates simulated at each location. There is no general solution to this as the optimal choices are 375 highly dependent on the problem at hand, and we did not aim to optimize these. Performance might be optimized further through a formal analysis of all these variables, however the methodology here is already fast, effective, and highly generalizable to different types of simulation models and associated optimization problems. Improving the loss-functions or employing alternative Pareto front efficiency algorithms was not the focus of our current study but would be a natural extension 380 of our work, as would be alternative approaches to the weighting of objectives, which remains a subjective component of multi-objective optimization problems (44).

A model's calibration to known input data forms the backbone of its predictions. Our workflow presented here provides large advances in the calibration of detailed mathematical models of infectious disease. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. R 2002-2003 (2004) . . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. Tables S1-S6

References . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. Figs. S1 to S32

Tables S1 to S6

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. 

Disease transmission models generally have two types of parameter inputs: core parameters, inherent to the disease and determining how its natural history is captured, and simulation options characterizing the specific setting and the interventions in place ( Figure 1A in the main manuscript).

The simulation options specify the simulation context such as population demographics, transmission intensity, seasonality patterns and interventions and typically vary depending on the simulation experiment. In contrast, the core parameters determine how its epidemiology and aetiopathogenesis are captured. These include parameters for the description of immunity (e.g. decay of maternal protection), or for defining clinical severe episodes (e.g. parasitemia threshold). To inform the estimation of core parameters, epidemiological data on the natural history of malaria extracted from published literature and collated in previous calibrations of OpenMalaria (3, 20, 23) were re-used in this calibration round. These include demographic data such as age-stratified numbers of host individuals which are used to derive a range of epidemiological outcomes such as age-specific prevalence and incidence patterns, mortality rates and hospitalization rates.

Site-specific OpenMalaria simulations were prepared, representing the studies that yielded these epidemiological data in terms of transmission intensity, seasonal patterns, vector species, intervention history, case management, and diagnostics (23). The mirroring of field study characteristics in the simulation options ensured that any deviation between simulation outputs and data could be attributed to the core parameters. Age-stratified simulation outputs to match to the data include numbers of host individuals, patent infections, and administered treatments. A summary of the data is provided in the supplementary text 2.

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint

In our proposed Bayesian optimization framework ( Figure 1 ) we evaluated the deviation between simulation outputs and the epidemiological data by training probabilistic emulator functions that approximate the relationship between core parameter sets and goodness of fit. To test the optimization approach in this study we considered the original goodness of fit metrics for OpenMalaria detailed in (20) and in supplementary text 2, which uses either Residual Sum of Squares (RSS) or negative log-likelihood functions depending on the epidemiological data for each objective (20, 23).

The objective function to be optimized is a weighted sum of the individual objectives' loss functions. We adopted a Bayesian optimization framework where a probabilistic emulator function is constructed to make predictions over the loss functions for each objective from the input space, with a minimum amount of evaluations of the (computationally expensive) simulator.

We compared two emulation approaches. Firstly, a heteroskedastic Gaussian process (GP) emulator and secondly a stacked generalization emulator (35). For approach 1 (GP-BO), we fitted a heteroskedastic Gaussian process with the input noise modelled as another Gaussian process (45) with a Matérn 5/2 kernel to account for the high variability in the parameter space ( Figure 1C ) (33, 46) . For approach 2 (GPSG-BO), we selected a two-layer neural network (47-49), multivariate adaptive regression splines (50), and a random forest algorithm (51, 52) as level 0 learners.

With each iteration of the algorithm, the training was extended using adaptive sampling based on an acquisition function (lower confidence bound) that accounts for uncertainty and predicted proximity to the optimum of proposed locations ( Figure 1B ). As the emulator performance converges (as assessed by its predictive performance on the test set) we gain confidence in the currently predicted optimum. heterogeneity in humans (in exposure, immunity, and clinical progression) as well as aspects of vector ecology (e.g. seasonality and the mosquito feeding cycle). Stochasticity is featured by including between-and within-host stochastic variation in parasite densities with downstream effects on immunity (23). OpenMalaria further includes aspects of the health system context (e.g. treatment seeking behavior and standard of care) (3, 23) with additional probabilistic elements such as treatment seeking probabilities or the option for stochastic results of diagnostic tests. An ensemble of . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10. 1101 /2021 OpenMalaria model alternative variants is available defined by different assumptions about immunity decay, within-host dynamics, heterogeneity of transmission, along with more detailed sub-models that track parasite genetics, and pharmacokinetic and pharmacodynamics. The models allow for the simulation of interventions, such as the distribution of insecticide treated nets (ITNs), vaccines, or reactive case detection (53, 54), in comparatively realistic settings. Full details of the model and the history of calibration can be found in the original publications (3, 20, 23) and are summarized in supplementary texts 1 and 2. In our application, we use the term simulator to refer to the OpenMalaria base model variant (20).

Aim Let !(#) denote a vector of loss functions obtained by calculating the goodness of fit between simulation outputs and the real data (full details of loss function can be found in supplementary text 2). In order to ensure a good fit of the model, we aim to find the parameter set # that achieves the minimum of the weighted sum of 11 loss functions (corresponding to the 10 fitting objectives

is the value of objective function + at # and ) ! is the weight assigned to objective function +:

The weights are kept consistent with previous rounds of calibration and chosen such that different epidemiological quantities contributed approximately equally to &(#) (see supplementary text 2).

Step 1: Initialization. Let 5 = 23 denote the number of dimensions of the input parameter space 8 and 9 = 11 the number of objective functions * ! (;), + = 1, … , 11. Prior distributions consistent with previous fitting runs (20) were placed on the input parameters. As each parameter is measured in different units, we sampled from the 5-dimensional unit cube 8 and converted these to quantiles of the prior distributions (20) (supplementary text 2and Figure S6 ). Previous research suggests that in high-dimensional spaces quasi-Monte Carlo (qMC) sampling outperforms random or Latin Hypercube designs for most function types and leads to faster rates of convergence (55, 56). We therefore used Sobol sequences to sample 1,000 initial locations from 8. The GP can account for input stochasticity of the simulator. For each sample, we simulated 2 random seeds at a population size of 10,000 individuals. Additionally, 100 simulations were run at the centroid location of the unit cube to gain information on the simulator noise. Using small noisy simulations with small populations speeds up the fitting as the noisy simulations are less computational expensive than larger population runs.

Replicates were used to detect signals in noisy settings and estimate the pure simulation variance (45).

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10. 1101 /2021 Computational savings were later achieved through pre-averaging of replicates (45). The 2000 unique locations were randomly split into a training set (90%) and a test set (10%). All simulator realizations at the centroid were added to the training set.

Step 2: Emulation

Each emulator type for each objective function was trained in parallel to learn the relationships between the normalized input space 8, and the log-transform of the objective functions !(#). In each dimension > ∈ 5, the mean @ % and standard deviation A % of the training set were recorded, > = 1, … , 23.

We randomly sampled 500,000 test locations in 8 from a multivariate normal distribution with mean # &'( and covariance matrix B, where # &'( is the location of the current best location and B is determined based on previously all sampled locations, and scaled each dimension to mean @ % and standard deviation A % . The trained emulators were used to make predictions C(#) D of the objective functions C(#) at the test locations. Mean estimates, standard deviations, and nugget terms were recorded. The full predictive variance at each location # ∈ 8 corresponds to the sum of the standard deviation and nugget terms. From this, we derived the weighted sum (#) = ∑ ) ! * "" !#" ! (#) , using weights E consistent with previous fitting runs (Smith 2012) with greater weighting for further downstream objectives. The predicted weighted loss function at location # was denoted & F (#) with a predicted mean @)(#) and variance A H ) (#). Every 15 iterations, we increase the test location sample size to 5 Million to achieve denser predictions.

Step 3: Acquisition. We chose the lower confidence bound (LCB) acquisition function to guide the search of the global minimum. Lower acquisition corresponds to potentially low values of the weighted objective function, either because of a low mean prediction value or large uncertainty (57) where R + is the number of previous unique realisations of the simulator at iteration t, and U = 0.01 is a hyperparameter (Srinivas 2010). We choose this method as with high probability it is no regret (Brochu 2010 , Srinivas 2010 . With increasing iterations, confidence bound-based methods naturally transition from mainly exploration to exploitation of the current estimated minimum. In addition to this, we force exploitation every 10 iterations by setting N + = 0).

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10. 1101 /2021 Step 4: Simulate. The simulator was evaluated at locations identified in step 3 and the realisations were added to the training set. Steps 2-4 were run iteratively. The Euclidian distance between locations of current best realisations was recorded.

Step 5: Convergence. Convergence was defined as no improvement in the best realisation, argmin 0 C.

We compared two emulation approaches. Firstly, a heteroskedastic GP emulator and secondly a stacked generalization emulator (35) using a two-layer neural net, multivariate adaptive regression splines (MARS) and a random forest as level 0 learners and a heteroskedastic GP as level 1 learner:

We fitted a Gaussian process with the input noise modelled as another Gaussian process (45). After initial exploration of different kernels, we chose a Matérn 5/2 kernel to account for the high variability in the parameter space. A Matérn 3/2 correlation function was also tested performed equally. Each time the model was built (for each objective at each iteration), its likelihood was compared to that of a homoscedastic Gaussian process and the latter was chosen if its likelihood was higher. This resulted in a highly flexible approach, choosing the best option for the current task.

Stacked generalization was first proposed by Wolpert 1992 (35) and builds on the idea of creating ensemble predictions from multiple learning algorithms (level 0 learners). In superlearning, the crossvalidated predictions of the level 0 learners are fed into a level 1 meta-learner. We compared the 10fold cross-validated predictive performance of twelve machine learning algorithms on the test set. All algorithms were accessed through the mlr package in R (58). We compared two neural network algorithms (brnn (48) for a two layer neural network and nnet for a single-hidden-layer neural network (59)), five regression algorithms (cvglmnet (60) for a generalised linear model with LASSO or Elasticnet Regularization and 10-fold cross validated lambda, glmboost (61) for a boosted generalized linear model, glmnet (60) for a regular GLM with Lasso or Elasticnet Regularisation, mars for multivariate adaptive regression splines, and cubist for rule-and instance-based regression modelling), three random forest algorithms (randomForest (52), randomForestSRC (62) and ranger (63)), and a tree-like node harvesting algorithm (nodeHarvest (64)). Extreme gradient boosting and support vector regression were also tested but excluded from the comparison due to its long runtime. Their performance was compared with regards to runtimes, and correlation coefficients between predictions on the test set and the true values. Based on these, we selected the two-layer neural network (brnn), multivariate adaptive regression spline (mars), and random forest (randomForest) algorithms. This ensemble of machine learning models constituted the level 0 learners and was fitted to the initialization set. Out-of-sample predictions from a 10-fold cross validation of each observation . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint were used to fit the level 1 heteroskedastic Gaussian process. As in approach 1, we opted for a Matérn 5/2 kernel and retained the option of changing to a homoscedastic model where necessary.

We ascertained that both emulators captured the input-output relationship of the simulator by tracking the correlation between true values ! and predicted values ! F on the holdout set of 10% of initial simulations with each iteration (truth vs predicted R2 0.51-0.89 for GP vs 0.37-0.77 for GPSG after initialization, see supplement S1). Transition from exploration to exploitation during adaptive sampling was tracked by recording the distribution of points selected during adaptive sampling in each iteration ( Figures S2 and S3 ).

A global sensitivity analysis was conducted on a heteroskedastic GP model with Matérn 5/2 kernel that was trained on all training simulation outputs (n=5,400) from the fitting process. We used the Jansen method of Monte Carlo estimation of Sobol' sensitivity indices for variance decomposition (65, 66) with 20 000 sample points and 1000 bootstrap replicates. Sobol' indices were calculated for all loss functions ! as well as for their weighted sum C and in all dimensions. Whilst keeping the number of sample points to as low as possible for computational reasons, we ascertained that first-order indices summed to 1 and total effects >1. We further ensured that the overall results of the Sobol' analysis were consistent with the results of other global sensitivity analyses, namely the relative parameter importance derived from training a random forest ( Figure S32 ).

Synthetic field data was generated by forward simulation using the final parameter sets from each optimization process. The two optimization algorithms were run anew using the respectively generated synthetic data to calculate the goodness of fit statistics. The parameter sets retrieved by the validation were compared against the parameterization yielded by the optimization process.

We conducted a small experiment to compare key epidemiological outcomes from the new parameterizations with the original model and that detail in a four malaria model comparison in Penny et al. 2016 (11). We simulated malaria in archetypical transmission and seasonality settings using the different parameterizations. The experiments were set up in a full-factorial fashion, considering the simulation options described in Table S1 . Monitored outcomes were the incidence of uncomplicated, severe disease, hospitalizations, and indirect and direct malaria mortality over time and by age, prevalence over time and by age, the prevalence-incidence relationship, and the EIR-prevalence relationship. Simulations were conducted for a population of 10,000 individuals over 10 years.

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint 2 SUPPLEMENTARY TEXT 1: MALARIA TRANSMISSION MODEL

We test our calibration algorithm on OpenMalaria, an individual-based model of malaria dynamics. To provide context of the model's structure and the role of the fitted parameters (see supplementary text 1), we here briefly describe its main features and key equations. This description is adapted from that provided in Smith et al. 2012 (20) and Smith et al. 2006 (23) . Full details of all model components can be found in The American Journal of Tropical Medicine and Hygiene, Volume 75, Issue 2 Supplement (2006).

OpenMalaria features discrete individual-based stochastic simulations of malaria in humans in 5-day time steps. Every infection and individual are characterized by a set of continuous state variables, namely, parasite densities, infection durations, and immune status. Key processes and relationships regarding the course of infection simulated by model include the attenuation of inoculations, acquired pre-erythrocytic immunity, acquired blood-stage immunity, morbidity (acute and severe) and mortality (malaria-specific and indirect), anemia, and the infection of vectors as a function of parasite densities in the human. Other model components include a vector model and a case management system. All individual components have previously been well documented (20, 23). A visual summary of the model with references to further details on each component is provided in Figure S1 .

In our current recalibration only the original (base) model variant is used to test our new approach (20). Parameters estimated during the calibration process are highlighted and summarized in Table S2 at the end of this section. Other parameter values were drawn from the literature or were calibrated to separate data: for example, the empirical parasite density model of Maire et al. 2006 (67) was calibrated to malariatherapy (68) data and not recalibrated at the population level. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint

The seasonal pattern of entomological inoculation rate (EIR) determines seasonal pattern of transmission and thus the parasite densities in the individual, modified by natural or acquired immunity and interventions (23).

In the base model, the expected number of entomological inoculations experienced by individual + of age X at time I is where Y 123 (I) refers to the annual entomological inoculation rate (EIR) computed from human bait collections on adults and Z(), is the individual's availability to mosquitoes, assumed to be proportional to average body surface area, depending only on age . Z[X(+, I)\ increases with age up to age 20 years where it reaches a value of Z 123 (the average body surface of people ≥ 20 years old in the same population).

The biting rate in relation to human weight is based on data from The Gambia published by Port and others (76), where the proportion of mosquitoes that had fed on a host were analyzed in relation to the host's contribution to the total biomass and surface area of people sleeping in one mosquito net (69).

The number of infective bites received per unit time for each individual +, adjusted by age, is given by Eq. The susceptibility of individual + at time I, ^(+, I) is defined as:

where ^! 11 , g 6 * , Y * , l 6 and ^4 are constants representing the lower limit of success probability of inoculations in immune individuals, critical value of cumulative number of entomologic inoculations, critical value of Y 2 (+, I), steepness of relationship between success of inoculation and g 6 (+, I), and, the lower limit of success probability of inoculations at high where Y 2 (+, I), respectively. Here 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint ^4and Y * are fixed to ^4 = 0.049, and Y * = 0.032 inoculations/person-night and are detailed in (69).

The model for each individual infection u in host + comprises a time series of parasite densities. The base model for infection within humans is described in Maire et al. 2006 (67) . In brief, the duration of each infection, N 123 is sampled from parameterised against malaria therapy data (68) 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. which measures the effect of maternal immunity. g B * , g C * , 5 3 , X 1 * , and Ä 1 are all constants estimated in the fitting process. These constants are described in Table S2 , or further in Maire et al. 2006 (67) .

Variation within individuals described as A B . (+, u, N), where and A D . and g E * are constants, described in Table S2 .

The simulated density of infection u in individual + at time N is then drawn from a normal distribution:

The total density of all infections in individual + at time t is then the sum of the densities of concurrent infections u

The model infectivity of the human host is described in Ross 2006 where infectivity of individual I at time t is given by the distributed lag model:

where I is in 5-day units and

where Å " , Å . , Å F , Ç, A G . are constants representing contributions of past infections to gametocyte densities (detailed in Table S2 ), and to be calibrated at the population level. We define 5 C (+, I) = 1 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. is the total vectorial capacity)

In order to simulate the clinical state of individual + at time I, for each five-day time step 5 independent samples from the simulated parasite density distribution are drawn for each concurrent infection u.

The model for an episode of acute morbidity was originally described in (74) and occurs in individual + at time I with probability where w * is the pyrogenic threshold and w 123 is the maximum density of five daily densities sampled during the five-day interval I.

where * " [w(+, I)\ is a function describing the relationship between accrual of tolerance and the parasite density w(+, v); * . (w * (+, I)) describes the saturation of this accrual process at high values of

å M (I) = ã ∑ Z[X(+, I)\ë 1 (+, I) 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint w * and î ïw * (+, I) determines the decay threshold with first-order kinetics, ensuring that the parasite tolerance is short-lived (74).

Here * " [w(+, I)\ is defined to ensure that the stimulus is not directly proportional to w but rather that it asymptotically reaches a maximum at high values of w:

At high values of w * , a higher parasite load is required to achieve the same increase:

Thus, the pyrogenic threshold w * is defined to follow and the initial condition w * (+, 0) = w > * at the birth of the host, where Ä, î ï w > , w " * and w . * are targets of the calibration, and are defined in Table S2 .

The model for severe disease was described in Ross et al 2006 (75) and two different classes of severe episodes are considered by the model, ó " and ó . . o Q" (+, I) is the probability that an acute episode (Z) is of class ó " and where w Q" * is a constant to be calibrated and ò(+, I) is the clinical status of individual + at time I.

Class ó . of severe malaria episodes occurs when an otherwise uncomplicated episode coincides with some other insult, which occurs with risk

where & > is the limiting value of &[X(+, I)\ at birth and X ) * is the age at which it is halved and both are to be calibrated.

The probability that individual + experiences an episode belonging to class ó . at time I, conditional on there being a clinical episode at that time is

The age ant time specific risk of severe malaria morbidity conditional on a clinical episode is then given by * " [w(+, I)\ = Äw(+, I) w " * + w(+, I) .

(25)

* . [w * (+, I)\ = 1 w . * + w * (+, I) .

(26) >w * (+, I) >I = Äw(+, I) (w " * + w(+, I))(w . * + w * (+, I)) − î ïw * (+, I),

o Q . (+, I) = Pr(ò(+, I) ∈ ó " |ò(+, I) ∈ Z) = w 123 (+, I) w Q" * + w 123 (+, I) ,

o Q / (+, I) = Pr(ò(+, I) ∈ ó . | ò(+, I) ∈ Z) = &(X(+, I)). 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint

Malaria deaths in hospital are a random sample of admitted severe malaria cases, with age-dependent sampling fraction ö C (X), the hospital case fatality rate, derived from the data of Reyburn et al (2004) (77). The original model was described in Ross et al. 2006 (75) .

The severe malaria case fatality in the community for age group X, ö R (X) is estimated as where õ " the estimated odds ratio for death in the community compared to death in in-patients is an age-independent constant to be calibrated and ö C (X) is the hospital case fatality rate. The total malaria mortality is the sum of the hospital and community malaria deaths.

The risk of neonatal mortality attributable to malaria (death in class 5 " ) in first pregnancies is set equal to 0.3@ S9 where where ú S9 is related to ú T9 , the prevalence in simulated individuals 20-24 ears of age via and ú T9 * and ú S9 * are constants to be calibrated and are detailed in Table S2 . occur 30 days (six time steps) after the provoking episode.

o , / (+, I) = Pr(ò(+, I) ∈ 5 . |ò(+, I) ∈ Z),

o , / (+, I) = ö ,

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. 

A comprehensive epidemiological calibration dataset was collated in order to parameterize OpenMalaria. This calibration dataset covers a total of eleven different epidemiological relationships (or objectives for fitting) that span important aspects of the natural history of malaria. Data were collated from different settings (see Table S3 for summary) and were detailed in the original model descriptions (23, 74) and a later parameterization (20). A total of 61 simulation scenarios were setup to parameterize OpenMalaria, constructed to simulate the study surveys and study sites that yielded the calibration dataset. The study site observations were replicated in OpenMalaria by reproducing the timing of the surveys and their endpoints (such as prevalence and incidence) and matching simulation options to the setting with regards to transmission intensity and seasonality, vector species, treatment seeking behavior and anti-malarial interventions. The objectives and data are further detailed below.

The parameter estimation process is a multi-objective optimization problem with each of the epidemiological quantities in Table S3 representing one objective. The aim of the optimization is to find a parameter set that maximizes the goodness of fit by minimizing a loss statistic computed as the weighted sum of the loss functions for each objective. Building a weighted average reduces the multiple loss terms to a single overall loss statistic, defined as:

where ! !" (#) is the loss function for parameter vector #, epidemiological quantity & and dataset ', and the weights ( ! were chosen so that different epidemiological quantities contribute approximately equally to )(#).

For the current calibration, we utilised the loss functions from Smith et al. 2012 (20) , the loss function ! ! (#) for each objective & use either (negative) log-likelihoods or Residual Sum of Squares (RSS) with an unknown minimum. We did not update these loss-functions in order to compare to our previous approaches.

The likelihood functions are given by where the observed values are * # , … , * $ and the model parameters #. In practice, it is easier to work with the log likelihood, namely

The loss functions ! ! (#) used for each objective are detailed in the following sections.

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint

Below we described each fitting objective in terms of the data (setting, surveys, observations, references) along with the associated loss function and original references. Table S4 provides an overview of the 61 simulation scenarios used for calibration, and which objective they contribute to.

The data used for the calibration of objective 1 (Age pattern of incidence) consists of eight crosssectional surveys of infection rates by age and EIR in Matsari village, capturing 12 age groups each. Matsari village was monitored entomologically for four years (Nov 1970 during the Garki Project and multiple anti-malaria interventions were administered (78). From October 1970 to March 1972 (the baseline / pre-intervention phase), eight cross-sectional malariologic surveys of the whole village population and intensive entomologic surveillance (human bait collection of mosquitoes and dissections of the mosquito salivary glands for sporozoites) were carried out. The latter was used to estimate a baseline transmission intensity of 67 inoculations per person per year (EIR) and to derive seasonal transmission patterns. Mid-1972 marked the beginning of the intervention phase, during which an additional eight surveys were carried out at 10-week intervals (surveys 9-16). During this time, indoor residual spraying with Propoxur was carried out comprehensively in the village, along with mass treatment of the population with Sulfadoxine-pyrimethamine at 10 week-intervals immediately after assessment of individuals' parasitologic status. The experimental setup is summarised in Figure 3 of Smith et al 2006 (69) . Incidence data (number of patent infections and number of hosts by age) from surveys 9-16 was used for our calibration.

Sites and scenario numbers: Matsari, Nigeria (30) Original reference detailing data and model fits: Smith TA, Maire N, Dietz K, Killeen GF, Vounatsou P et al. Relationship between the entomological inoculation rate and the force of infection for Plasmodium falciparum malaria. Am J Trop Med Hyg. Volume 75, No. 2 Supplement. 2006 (69) 

We denote the Binomial log likelihood for this objective to be whereis the number of age groups, . the number of surveys, / ",& the scenario data number of parasite positive hosts and 0 ",& the scenario data number of hosts for age group 1 and survey '. 

The data used for the calibration of objective 2 (age-patterns of prevalence) consists of six crosssectional malariology surveys conducted in the Rafin Marke, Matsari, Sugungum villages in Nigeria . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 https://doi.org/10. .01.27.21250484 doi: medRxiv preprint 1970 https://doi.org/10. -1972 (12 age groups each, part of the Garki Project during the pre-intervention period) (78), Navrongo in Ghana 2000 (12 age groups) (79) and Namawala 1990 Namawala -1991 and Idete in Tanzania (11 and 6 age groups, respectively) 1992-1993 (81) . In all study sites, annual transmission intensity (EIR) and seasonal patterns were assessed using light trap or human night bait collections and dissections of the salivary glands (see Figure 2 in Maire et al. 2006 (67) ). In all sites except Idete, the health system at the time of the surveys treated only a small proportion of the clinical malaria episodes. In the Idete, the village dispensary was assumed to treat approximately 64% of clinical malaria (based on the published literature). During simulation, prevalence was defined by comparing each predicted parasite density with the limit of detection used in the actual study.

Sites and scenario numbers: Sugungum, Nigeria (24); Rafin-Marke, Nigeria (28); Matsari, Nigeria (29); Idete, Tanzania (31); Navrongo, Ghana (34); Namawala, Tanzania (35) Original reference detailing data and model fits: Maire N, Smith TA, Ross A, Owusu-Agyei S, Dietz K, et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg. Volume 75, No. 2 Supplement. 2006 (67) 

We denote the binomial log likelihood for each scenario of this objective to be whereis the number of age groups, . the number of surveys, 3 ",& the scenario data number of parasite positive hosts and 0 ",& the scenario data number of hosts for age group k and survey j. 

The same data sources as for objective 2 (age pattern of prevalence) were used for calibration of objective 3 (age pattern of parasite density). Parasite densities in sites that were part of the Garki project (Sugungum, Rafin-Make and Matsari, Nigeria) were recorded by scanning a predetermined number of microscope fields on the thick blood film and recording how many had one or more asexual parasites visible. These were converted to numbers of parasites visible by assuming Poisson distribution for the number of parasites per field and a blood volume of 0.5 mm 3 per 200 fields. In the other studies (Idete and Namawala, Tanzania and Navrongo, Ghana), parasites were counted against leukocytes and converted to nominal parasites/microliter assuming the usual standard of 8,000 leukocytes/microliter. The biases in density estimates resulting from these different techniques were accounted for by multiplying the observed parasite densities with constant values estimated for Garki (5 ) ) and non-Garki (5 # ) studies to rescale them to the values in malariatherapy patients (82).

Sites and scenario numbers: Sugungum, Nigeria (pre-intervention, 24); Rafin-Marke, Nigeria (preintervention, 28); Matsari, Nigeria (pre-intervention, 29); Idete, Tanzania (31); Navrongo, Ghana (34); Namawala, Tanzania (35) ) , (*) = log ℒ(*) = ' ' 5 ",) log:6 " , 7= + :; ",) − 5 ",) = 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10. 1101 /2021 Original reference detailing data and model fits: Maire N, Smith TA, Ross A, Owusu-Agyei S, Dietz K, et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg. Volume 75, No. 2 Supplement. 2006 (67) 

For objective 3 (age pattern of parasite densities) we denote the log-Normal log likelihood for each scenario to be where 6 is the number of observations in the data set, 7 = exp(−0.5 log(2 D)), a constant from the log-normal likelihood, RSS is the residual sum of squares given by and E is the standard deviation given by Here, 5 is the appropriate density bias, which is a fitting parameter,is the number of age groups, . is the number of surveys, 3 ",& the scenario number of parasite positive hosts, and F ",& the sum of the log densities, 3 ',& 4 the predicted number of parasite positive hosts and F ',& 4 the predicted sum of the log densities for age group 1 and survey '. The density bias are fitting parameters 5 ) and 5 # .

For objective 4 (age pattern of number of concurrent infections), the dataset from Navrongo, Ghana (also used in the calibration of objectives 2 and 3) was used to calibrate to the total numbers of distinct parasite infections in one individual in each age group, and at each survey. Distinct infections were detected by polymerase chain reaction-restriction fragment length polymorphism in the sampled individuals.

Sites and scenario numbers: Navrongo, Ghana ( 

Assuming that both the data and the simulations are Poisson distributed about the correct value and thereby also allowing for over-dispersion, we denote the Poisson log likelihood for each scenario to be for the objective of age pattern of number of concurrent infections to be whereis the number of age groups, . the number of surveys, 36 ",& the scenario data total patent infections for age group 1 and survey '. Parameter G ",& is associated with the model predictions and is given by 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10. 1101 /2021 where 36 ',& H are the predicted total of patent infections and 0 ',& 4 the predicted number of hosts for age group 1 and survey ' and 0 ",& is the scenario data number of hosts for age group 1 and survey '.

Two distinct datasets representing three study sites (Table S5) were used for the calibration of objective 5 and objective 6 (age pattern of incidence of clinical malaria). For Objective 5, the dataset contains data on the age pattern of clinical episodes in the villages of Ndiop and Dielmo in Senegal (83, 84) . During the study period of July 1990 -June 1992, the village populations were visited daily to detect and treat any clinical malaria attacks with quinine. Cases were detected by reporting of symptoms (fever) during daily active case detection and subsequent thick blood smear microscopy. Only symptomatic individuals (axillary temperature ≥ 38.0°C or rectal temperature ≥ 38.5°C). Due to the active case detection and rapid treatment all symptomatic episodes are assumed to be effectively treated in these villages during the study period. No effective treatment of clinical malaria was assumed prior to the study period. The annual patterns of transmission were replicated as reported by Charlwood et al (1998) (85). A proportion 3 * =35.75% are assumed to be treated effectively in Idete. As all individuals reporting to the village dispensary were treated presumptively with chloroquine, this proportion corresponds to the proportion of episodes reported to the village dispensary.

Sites and scenario numbers: Ndiop, Senegal (232) 

We denote a loss function based on biased residual sum of squares:

whereis the number of age groups, . the number of surveys, . # the initial survey number, and M is the residual given by 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. 3.1.6 Age pattern of incidence of clinical malaria: infants

Objective 6 (age pattern of incidence of clinical malaria in infants) is informed by a dataset on incidence that contains passive case detection data on the age-incidence in infants recorded at the health centre in Idete, Tanzania from June 1993-October 1994 (81 

The loss function for Objective 6 is the same as Objective 5. For scenario 49 (Idete, Tanzania) the bias is P = 0.357459 and represents the proportion of episodes reported to the village dispensary.

3.1.7 Age pattern of threshold parasite density for clinical attacks

Objective 7 (Age pattern of threshold parasite density for clinical attacks), uses the dataset from Dielmo, Senegal (see objective 5) for calibration. The pyrogenic threshold in the (OpenMalaria) predictions is output as the sum of the log threshold values across age groups. The pyrogenic threshold per age group is given as the parasite:leucocyte ratio for recorded incidence of disease. To adjust these densities to the same scale as that used in fitting the simulation model to other datasets, the parasite:leukocyte ratios were multiplied by a factor of 1,416 to give a notional density in parasites/microliter of blood. This number was derived as follows: Parasites were counted against leukocytes and converted to nominal parasites/microliter assuming the usual (though biased) standard of 8,000 leukocytes/microliter. The biases in density estimates resulting from these different techniques was accounted for by multiplying the observed parasite densities with constant values estimated for Garki (5 ) ) and non-Garki (5 # ) studies to rescale them to the values in malariatherapy patients (82).The value 1416 comes from where the original 5 # ≈ 0.18.

Sites and scenario numbers: Dielmo, Senegal (234)

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10. 1101 /2021 Original reference detailing data and model fits: Smith TA, Ross A, Maire N, Rogier C, Trape J-F et al. An epidemiologic model of the incidence of acute illness in Plasmodium falciparum malaria. Am J Trop Med Hyg. Volume 75, No. 2 Supplement. 2006 (74) 3. 1 

For the objective 7 (Age pattern of threshold parasite density for clinical attacks) we denote a residual sum of squares loss function given by (13) with where F * is the observed pyrogenic threshold, F * 4 are the predicted sum log pyrogenic threshold, 0 ',& 4 the predicted number of hosts for age group 1 and survey ' and is a bias related to the scenario. Here, this bias is related to the log parasite/leucocyte ratio and thus P = 1/(80005 # ) where 5 # is the non-Garki density bias.

Data on the relative incidence of severe malaria-related morbidity and mortality in children <9 years old across different transmission intensities were originally collated by Marsh and Snow (1999) (86) ( Table 4 ). Data measurements per age group were available as the relative risk of severe disease compared to age group 1 and the proportion/prevalence of severe episodes. A total of 26 entries on the relationship between severe malaria hospital admission rates and P. falciparum prevalence were used to calibrate objective 8 (Hospitalisation rate in relation to prevalence in children), each represented in a separate simulation scenario, with one observation per scenario. These are summarised in Table S6 . To obtain a continuous function relating hospital incidence rates to prevalence, linear interpolation between data points was performed. To convert hospital incidence rates to community severe malaria incidence, the hospital admission rates was divided by the assumed proportion of severe episodes representing to hospital (48%). There was assumed to be no effective treatment of uncomplicated malaria episodes or malaria mortality.

Sites and scenario numbers: Bo, Sierra Leone (501); Niakhar, Senegal (502), Farafenni, The Gambia (503); Areas I-V, The Gambia (504-508); Gihanga, Burundi (509) 

The loss function is denoted as the log of residual sum of squares

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. 3.1.9 Age pattern of hospitalization: severe malaria

For objective 9 (Age pattern of hospitalisation), a subset of the data collated by Marsh and Snow (1999) (86) (see objective 8) is used. Detailed age-specific severe hospital admission rates were available for 5 of the sites (Table S7 ). The patterns of incidence by age were summarised by age in 1-4 and 5-9 year-old children and compared with 1-11 month old infants by calculating the relative risk.

Of the five sites, four were selected for fitting objective 9 based on the predicted prevalence. Baku, The Gambia was excluded as the very low (2%) prevalence here could not be matched.

Sites and scenario number(s): Area V, The Gambia (158) 1992-95 1990-95 1992-96 1992,1994-96 . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 Person-years exposure to risk of children aged 0-9 yr 23468 52675 45967 40064 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021.

3.1.10 Malaria specific mortality in children (< 5 years old)

For objective 10 (Malaria specific mortality in children (<5 years old)), a subset of the data collated by Marsh and Snow (1999) (86) (see objective 8) was used (88). Mortality data were derived from verbal autopsy studies in sites with prospective demographic surveillance and were adjusted for the effect of malaria transmission intensity on the sensitivity and specificity of the cause of death determination. The odds ratio for death of a case in the community relative to that in hospital was estimated by fitting to the malaria-specific mortality rates in children less than five years of age assuming the published hospital case fatality rate. Nine sites for which both malaria-specific mortality rates and seasonal transmission patterns were available were included for calibration.

There is one observation per study site and simulation scenario, and predicted values are for one survey at the end of 2 years. Med Hyg. Volume 75, No. 2 Supplement. 2006 (75) 3. 1 

For objective 10 on Malaria specific mortality in children, the loss function minimizes the log sum of squares where bcM # is the observed direct mortality rate for age group 1 (0-5 years) and bcM # H is the predicted direct mortality rate for age group 1. The predicted direct mortality rate is given by where bb # H is the number of direct malaria deaths cases predicted for age group 1 and 0 # 4 the total number of predicted hosts for age group 1. The division by 2 is to convert to yearly rate as the survey was conducted at the end of 2 years.

For objective 11 (indirect malaria infant mortality rate), a subset of the data collated by Marsh and Snow (1999) (86) (see objective 8) was used. These constitute a library of sites for which entomologic data were collected at least monthly and all-cause infant mortality rates (IMR) were available. There is one observation per scenario: all cause infant mortality rate (returned as a single number over whole intervention period).

Sites and scenario number(s): Bo, Sierra Leone (401); Niakhar, Senegal (402); Area V, The Gambia (408); Karangasso, Burkina Faso (411); Manhica, Mozambique (414); Namawala, Tanzania (415);

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101 https://doi.org/10. /2021 

Bo, Sierra Leone Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Area I, The Gambia Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86) 505

Area II, The Gambia Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Area III, The Gambia Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Area IV, The Gambia Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Area V, The Gambia Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Gihanga, Burundi Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Katumba, Burundi Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Karangasso, Burkina Faso Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Kilifi North, Kenya Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Manhica, Mozambique Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86) 515 Namawala, Tanzania Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old. Preintervention

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Navrongo, Ghana Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Saradidi, Kenya Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Yombo, Tanzania Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Ziniare, Burkina Faso Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Matsari, Nigeria Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old. Preintervention

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

ITC control, Burkina Faso Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Mlomp, Senegal Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Ganvie, Benin Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence

Marsh and Snow 1999 (86) Kilifi Town, Kenya Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Chonyi, Kenya Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Bandafassi, Senegal Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86)

Kongodjan, Burkina Faso Point estimate of the severe malaria hospital admission rate and P.falciparum prevalence in children <9 years old.

Severe episodes by prevalence (7) Marsh and Snow 1999 (86) . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S3 . GP emulator performance. Emulator predictions vs true values on a holdout set compromising 10% of initial samples in iteration 30 (final iteration). w.sum is the weighted sum !, of the 11 objectives. Here, predictions are generated as the weighted sum of individual objective predictions. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S4 . GPSG emulator performance. Emulator predictions vs true values on a holdout set compromising 10% of initial samples in iteration 1. w.sum is the weighted sum !, of the 11 objectives.

Here, predictions are generated as the weighted sum of individual objective predictions. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S5 . GPSG emulator performance. Emulator predictions vs true values on a holdout set compromising 10% of initial samples in iteration 23 (final iteration). w.sum is the weighted sum !, of the 11 objectives. Here, predictions are generated as the weighted sum of individual objective predictions. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint 

36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted 36% predicted total incidence total incidence total incidence total incidence total incidence total incidence total incidence total incidence total incidence total incidence total incidence total incidence total incidence total incidence total incidence total incidence 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S14 . Objective 6. Age pattern of threshold parasite density for clinical attacks. Final simulator fit using the parameter sets yielded using GP-BO and GPSG-BO compared to the previous parameterization (derived using optimization with a genetic algorithm, GA-O). 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S17. Objective 9: Direct mortality in children <5 years old. Final simulator fit using the parameter sets yielded using GP-BO and GPSG-BO compared to the previous parameterization (derived using optimization with a genetic algorithm, GA-O). 

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint

A B Figure S19 . Data recovery validation of posterior estimates. Prior distributions of each parameter and parameter value identified by the optimization algorithm. The final parameter set was used to generate synthetic field data by simulating each of the 61 scenarios with the respective core . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint parameter sets. The simulation outputs were reformatted to match the original field data, generating a synthetic field data set. The optimization with both algorithms was repeated using this synthetic field data. The plot shows the best parameter values in each dimension identified at the end of the validation optimization compared to the values identified in the original optimization. The grey area shows the prior distribution. A. GP-BO validation. B. GPSG-BO validation . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint 8 OPENMALARIA SIMULATED EPIDEMIOLOGY Figure S20 . Seasonal pattern assumed for subsequent analyses. The monthly transmission intensity is equivalent to the annual transmission intensity (EIR) scaled by these values and forced to sum to the annual EIR. Figure S21 . Relationship between EIR and PfPR2-10 under three parameterizations. Solid lines show medians and shaded regions show 95% credible intervals. EIR denotes the entomological inoculation rate.

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S22 . Yearly incidence of clinical (uncomplicated) malaria as a function of PfPR2-10 displayed by parameterization and age group. Clinical incidence is presented in terms of the yearly number of events per person. We assume a probability of effective treatment within 14 days of uncomplicated malaria of 36% Figure S23 . Yearly incidence of total severe malaria as a function of PfPR2-10, displayed by parameterization and age group. Incidence is presented in terms of the yearly number of events in a population of 1000 individuals. It is assumed that 48% of severe malaria cases seek official care at a . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint heath care facility (hospital). We assume a probability of effective treatment within 14 days of uncomplicated malaria of 36% Figure S24 . Yearly number of malaria-related deaths as a function of PfPR2-10, displayed by parameterization and age group. Malaria mortality incidence is presented in terms of the yearly number of deaths in a population of 1000 individuals. For the OpenMalaria model both deaths directly attributed to malaria (dotted curve) and all deaths associated with malaria (including both deaths directly attributable to malaria and those associated with comorbidities) are shown (full line). See Box S1.2 for definitions of deaths attributable to malaria in the models . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S25 . Yearly incidence of clinical malaria in a seasonal transmission setting as a function of age, displayed by transmission intensity (PfPR2-10) and parameterization. Clinical incidence is presented in terms of the yearly number of events per person. The PfPR2-10 categories include simulated prevalences of 2.5-3.5%, 9-10%, 28-32%, and 47-53% labeled as 3%, 10%, 30%, and 50%, respectively.

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S26 . Yearly incidence of clinical malaria in a perennial transmission setting as a function of age, displayed by transmission intensity (PfPR2-10) and parameterization. Clinical incidence is presented in terms of the yearly number of events per person. The PfPR2-10 categories include simulated prevalences of 2.5-3.5%, 9-10%, 28-32%, and 47-53% labeled as 3%, 10%, 30%, and 50%, respectively . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S27 . Yearly incidence of total severe malaria in a seasonal transmission setting as a function of age, displayed by transmission intensity (PfPR2-10) and parameterization. Incidence is presented in terms of the yearly number of events per 1000 person-years. It is assumed that 48% of severe malaria cases seek official care at a heath care facility (hospital). The PfPR2-10 categories include simulated prevalences of 2.5-3.5%, 9-10%, 28-32%, and 47-53% labeled as 3%, 10%, 30%, and 50%, respectively . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S28 . Yearly incidence of total severe malaria in a perennial transmission setting as a function of age, displayed by transmission intensity (PfPR2-10) and parameterization. Incidence is presented in terms of the yearly number of events per 1000 person-years. It is assumed that 48% of severe malaria cases seek official care at a heath care facility (hospital). The PfPR2-10 categories include simulated prevalences of 2.5-3.5%, 9-10%, 28-32%, and 47-53% labeled as 3%, 10%, 30%, and 50%, respectively . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S29 . Yearly incidence of malaria-related deaths in a seasonal transmission setting as a function of age, displayed by transmission intensity (PfPR2-10) and parameterization. Malaria mortality incidence is presented in terms of the yearly number of deaths in a population of 1000 individuals. The dashed estimates represent direct malaria deaths, and the solid all malaria deaths (including those attributable to co-morbidities). The PfPR2-10 categories include simulated prevalences of 2.5-3.5%, 9-10%, 28-32%, and 47-53% labeled as 3%, 10%, 30%, and 50%, respectively . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10.1101/2021.01.27.21250484 doi: medRxiv preprint Figure S30 . Yearly incidence of malaria-related deaths in a perennial transmission setting as a function of age, displayed by transmission intensity (PfPR2-10) and parameterization. Malaria mortality incidence is presented in terms of the yearly number of deaths in a population of 1000 individuals. The dashed estimates represent direct malaria deaths, and the solid all malaria deaths (including those attributable to co-morbidities). The PfPR2-10 categories include simulated prevalences of 2.5-3.5%, 9-10%, 28-32%, and 47-53% labeled as 3%, 10%, 30%, and 50%, respectively . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. . CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The copyright holder for this preprint this version posted January 29, 2021. ; https://doi.org/10. 1101 /2021 

Individual-based models in ecology after four decades. Rates All-cause malaria

Defined as child admitted with primary diagnosis of malaria and Blantyre coma score of 2 or less. Defined in child with primary diagnosis of malaria and haemoglobin of 5.0g/dL or less on admission. Rates for Siaya derived from person-years exposure to risk and admissions for period

An epidemiologic model of severe morbidity and mortality caused by Plasmodium falciparum

. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review)

The loss function minimises the log sum of squares:where &bcM # the observed indirect mortality rate for age group 1 and &bcM # H is the predicted indirect mortality rate for age group 1.. CC-BY-NC 4.0 International license It is made available under a is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. (which was not certified by peer review) Table S3 : Epidemiological quantities and data sources used for parameterizing models. (a) Some scenarios are used to predict several outcomes, so the total of this column does not equal the total of 61 scenarios involved in fitting the models. (b) The number of data points is the sum over all scenarios and simulated survey periods of the number of age groups into which the data were disaggregated for comparison with the model predictions. (c) In relation to the EIR specified as a seasonal pattern. (d) Model predictions for this objective are compared with linear interpolations between the field data points. *The number of data points is the sum over all scenarios and simulated survey periods of the number of age groups into which the data were disaggregated for comparison with the model predictions. 

Rafin-Marke, Nigeria (preintervention phase) 8 cross sectional surveys of entire village population at 10week intervals (2,593 blood slides)Age-prevalence (2); Age-parasite densities (3) Molineaux and Gramiccia. 1980 (78) 29Matsari, Nigeria (preintervention phase) 8 cross sectional surveys of entire village population at 10week intervals (2,963 blood slides) Age-prevalence(2); Age-parasite densities (3) Molineaux and Gramiccia. 1980 (78) 30Matsari, Nigeria (intervention phase) 8 cross sectional surveys of entire village population at 10week intervals (2,663 blood slides) Age-incidence of patent infections (1) Molineaux and Gramiccia. 1980 (78) 

Idete, Tanzania Surveillance of a rolling cohort of infants (1,382 blood slides over 16 months 

Saradidi, Kenya 21 cohorts each of approximately 50 children between 6 months and 6 years of age whose parasites were cleared and who were then followed up with 2 weekly surveys. Hospitalisation rate by age.Age pattern of severe hospitalisation (8) Beier et al. 1999 (91) , Snow 1997 (87) 173Ganvie, Benin Hospitalisation rate by age. Age pattern of severe hospitalisation (8) Snow et al. 1997 (87) 

Bandafassi, Senegal Hospitalisation rate by age. Age pattern of severe hospitalisation (8) Snow et al. 1997 (87) 232Ndiop, Senegal Longitudinal study of 350 permanent residents over 2 years: Individual level active case detection three times a week (questionnaire + recording of symptoms) and parasitologic surveys twice a week; daily recording of new fever cases at compound level. By age group (9 groups) 

Saradidi, Kenya 21 cohorts each of approximately 50 children between 6 months and 6 years of age whose parasites were cleared and who were then followed up with 2 weekly surveys. 

Bo, Sierra Leone Point estimates of all-cause neonatal, post-neonatal, and infant mortality rates All-cause mortality (10) Barnish et al. 1993 (92)