Surrogate Assisted Strategies (The Parameterisation of an Infectious Disease Agent-Based Model)
Rylan Perumal, Terence L van Zyl
2021-08-19

Parameter calibration is a significant challenge in agent-based modelling and simulation (ABMS). An agent-based model's (ABM) complexity grows as the number of parameters required to be calibrated increases. This parameter expansion leads to the ABMS equivalent of the "curse of dimensionality": in particular, infeasible computational requirements when searching an effectively infinite parameter space. We propose a more comprehensive and adaptive ABMS Framework that can effectively swap out parameterisation strategies and surrogate models to parameterise an infectious disease ABM. This framework allows us to evaluate different strategy-surrogate combinations' performance in accuracy and efficiency (speedup). We show that we achieve better than parity in accuracy across the surrogate assisted sampling strategies and the baselines. We also identify that the Metric Stochastic Response Surface (MSRS) strategy combined with the Support Vector Machine (SVM) surrogate is the best overall at getting closest to the true synthetic parameters. Further, we show that DYnamic COOrdinate Search Using Response Surface Models (DYCORS) combined with XGBoost as a surrogate attains the highest probability of approximating the cumulative synthetic daily infection data distribution and achieves the most significant speedup in our analysis. Lastly, we show in a real-world setting that DYCORS XGBoost and MSRS SVM can approximate the real-world cumulative daily infection distribution with 97.12% and 96.75% similarity respectively. Agent-based models (ABMs) offer the possibility to model many complex real-world scenarios [Macal and North, 2009]. These scenarios range from modelling the spread of an epidemic within a population to modelling trends based on agents' behaviour in the stock market. The number of model parameters to be calibrated to match real-world data rises with increased model complexity [Miksch et al., 2019]. As a result of the larger parameter space, searching for meaningful values can become computationally prohibitive. Machine learning (ML) models, namely surrogate models (SMs), can expedite the search of any parameter space and, in so doing, improve both the accuracy and efficiency of ABMs. SMs are used to classify whether a candidate parameterisation is a good parameter combination, allowing ABMs to match real-world data [Lamperti et al., 2018]. Other uses of surrogate optimisation strategies include the global optimisation of expensive multimodal functions where derivatives are unavailable [Regis and Shoemaker, 2007, 2013]. Expanding on previous research, we present an improved Framework to facilitate the parameterisation of infectious disease ABMs in Figure 1. We further implement a new surrogate assisted sampling strategy and a surrogate assisted evolutionary parameterisation strategy within this framework. We modified and re-implemented the MSRS and DYCORS surrogate optimisation strategies to enable ABM parameterisation within our framework. Lastly, we select the best strategy-surrogate combinations in terms of accuracy, efficiency (speedup) and minimisation of the Standardised L2 Norm, and use each to parameterise the ABM towards a real-world cumulative daily infection distribution. Our results show: 1.
We obtain significant speedups of between two and four times over the baseline models using surrogate assisted data-driven optimisation. 2. We are able to achieve better than parity accuracy across multiple parameters using surrogate assisted strategies when compared to the baselines. 3. The MSRS SVM strategy-surrogate combination minimises the distance to the synthetic ABM parameters the best overall. 4. The best overall method in terms of both accuracy and speedup is DYCORS XGBoost. 5. DYCORS XGBoost and MSRS SVM, used within our ABMS Framework and tested on the real-world cumulative daily infection distribution for South Africa, achieve approximately 97.12% and 96.75% accuracy respectively. Agent-based modelling and simulation (ABMS) is an effective and natural fit for modelling infectious diseases. Agent-based models (ABMs) are capable of modelling interactions between individuals and their environment. Further, they can capture unexpected emergent patterns and trends during an epidemic that result from collective individual agent behaviours and interactions [Hunter et al., 2017]. Each agent within an ABM can have different characteristics, more closely representing the variation in human populations. The agents act autonomously, governed by the interaction of their set of pre-defined rules and distinctive characteristics. This autonomy allows ABMs to simulate many complex real-world scenarios with sufficient fidelity [Macal and North, 2009]. ABMS can also be used as a substitute for a real-world epidemiological study since it can often be infeasible or even impossible to run a real-world experiment [Bonabeau, 2002, Tracy et al., 2018]. While there are significant benefits to using ABMs for infectious disease epidemiology, there are equally limitations. ABMs ordinarily require long run times due to the increased computational complexity resulting from the agent interactions incorporated into the model [Zhang et al., 2020, Lamperti et al., 2018]. Additionally, model validation and parameterisation present significant challenges in ABMS, especially when matching real-world data [Miksch et al., 2019]. Of these two challenges, we are particularly interested in the parameterisation of ABMs. Difficulty finding correct parameter combinations of ABMs leads to extensive calibration efforts resulting in increased model development time. As more complexity is added to the model, the parameter space expands, leading to the ABMS equivalent of the "curse of dimensionality" problem. The outcome is impractical memory and computational costs when searching for meaningful parameter combinations [Grazzini et al., 2017, Walters et al., 2018]. One approach to overcoming the computational limitations of ABMS is the use of surrogate models (SMs). SMs are machine learning (ML) models that act as function approximators to an agent-based model. Additionally, SMs provide a computationally tractable solution to addressing parameter sensitivity analysis, robustness analysis and empirical validation in ABMS [van der Hoog, 2019]. These properties make SMs appealing for approximating complex ABMs that are computationally expensive to validate and calibrate. Previously, Gaussian process regression, also known as the Kriging method, has been used as a surrogate modelling approach to facilitate parameter space exploration and sensitivity analysis in ABMS [Lamperti et al., 2018, Zhang et al., 2020].
However, Kriging's performance is dependent on the model's ability to estimate the spatial continuity of the data [Bargigli et al., 2018, Dosi et al., 2018]. Lamperti et al. [2018] presented an alternate approach, overcoming some of the limitations of the Kriging method. An iterative algorithm is proposed for training a SM to effectively approximate the ABM. The novel approach is realised by combining ML and intelligent iterative sampling. They demonstrate that, with their approach, a model's parameter space can be searched effectively while utilising fewer computational resources. In their work the XGBoost ML algorithm is used, where the SM is built in a stage-wise fashion, allowing optimisation of an arbitrary differentiable loss function. This method has also been applied to the Asset Pricing Model of Brock and Hommes [1998] and the Island Growth model of Fagiolo and Dosi [2003]. The results obtained show that the SM is an accurate function approximator of the ABM. Further, the SM radically reduced the computation time for large-scale parameter space calibration and exploration. Zhang et al. [2020] improve on the work of Lamperti et al. [2018] by replacing XGBoost with the CatBoost algorithm. They show the surrogate can better approximate the ABM and, as a result, further reduce parameter calibration and exploration time. Many real-world optimisation problems involve high-dimensional black-box functions that are outputs of computationally expensive simulations. Generally, finding the global optimum of these problems is unrealistic as it requires a significant number of function evaluations [Regis and Shoemaker, 2013]. It is often the case that the derivatives of these black-box functions are not available. Therefore, derivative-free strategies have been developed to allow for their optimisation. In particular, we are interested in derivative-free optimisation and derivative-free heuristic methods. A common approach to derivative-free optimisation is the use of surrogate models (SMs) or metamodels. Regis and Shoemaker [2007] present a method for the global optimisation of expensive multimodal functions. They use a response surface model (radial basis function and neural network) as a SM. The method iteratively uses the surrogate to approximate the output of the expensive multimodal function and then selects the best potential candidate. The candidate is selected based on two criteria: the estimated response from the SM and the minimum distance to the previously evaluated points. The results presented indicate that this method is a promising solution for the optimisation of expensive high dimensional problems. Regis and Shoemaker [2013] combine radial basis function SMs with dynamic coordinate search for the global optimisation of computationally expensive functions. This is an extension of the previous work presented by Regis and Shoemaker [2007]. They present two algorithms and show that their work improves on classical approaches, especially for high dimensional problems. These surrogate optimisation approaches are appealing as they provide an abstraction towards addressing the parameterisation challenges in ABMs. Specifically, in epidemiology, the accurate and efficient parameterisation of infectious disease models is imperative. Figure 1 represents the improved ABMS framework inspired by the algorithm of Lamperti et al. [2018] and Perumal and van Zyl [2020]. Our framework starts off by setting an initial configuration for the parameterisation task.
The details of the initial configuration are as follows: selection of a sampling method, which will be used to generate candidate parameters from the parameter space; selection of a machine learning algorithm that will construct the SM; selection of a parameterisation strategy; the actual (real/synthetic) data distribution we are parameterising towards; setting the ABM MIN/MAX Budget limits, which represent the minimum/maximum number of samples required to be evaluated by the agent-based model; defining an initial level of significance for the Kolmogorov-Smirnov Test; and lastly, defining the confidence criteria for a given SM. Our proposed ABMS Framework can integrate different surrogates, parameterisation strategies and sampling methods. After initialisation, a pool of candidate parameter vectors is generated utilising the selected sampling method. During initialisation, we exhaust the ABM MIN Budget. We sample a subset of candidates, equal to the ABM MIN Budget, from the parameter pool, which the ABM then evaluates. The ABM generates a simulated data distribution based on the input candidate. We compare the similarity between the actual data distribution and the simulated data distribution using Equation 1, and the corresponding candidate is labelled accordingly. The labelled candidates are then included in the ground-truth database. We then construct the SM employing the ground-truth database at the first iteration of the Main Loop. After the SM has been constructed, we execute the strategy. If we are not at the first iteration, we check whether the SM has diverged from the Confidence Criteria. Depending on whether we have diverged from the Confidence Criteria, we either go straight to executing the strategy or update the SM using the newly evaluated batch of candidates. Once the strategy has been executed, we predict the optimal candidate. Depending on whether the Kolmogorov-Smirnov Threshold (the similarity threshold between two distributions) or the ABM MAX Budget is met, we either stop, or we go into the next iteration of the framework. During the next iteration, a batch of candidates is randomly sampled from the pool, and we continue within the Main Loop. The agent-based model (ABM) used in this framework is a pre-existing model. The ABM used is a continuous space virus spread model, where the disease transmission dynamics follow the basic Susceptible-Infected-Recovered (SIR) framework proposed by Kermack and McKendrick [1927], simulated for 41 days. The SIR framework models the ratio of susceptible, infected and recovered individuals within a population. Also, once infected individuals become aware that they are infected (i.e. they have detected the infection), they are made immovable to represent being in a state of quarantine (lockdown). The ABM takes the parameters presented in Table 1 as input, simulates an epidemic based on the input and then generates an infection data distribution based on the simulation. Table 1: Ranges for each parameter value of the ABM considered for parameterisation. Parameters 1, 2, 3 and 6 are sampled from the range (0, 1). Parameter 7 is sampled from the range (0, 0.022) to mimic real-world interaction, as the defined space is bounded by (1, 1). Parameters 4 and 5 are sampled from the range (0, 41) days.
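To make the Main Loop described above concrete, the following is a minimal, illustrative Python sketch under simplifying assumptions: run_abm is a toy stand-in for the continuous-space SIR model, plain random sampling stands in for the interchangeable strategies (MSRS, DYCORS, CMA-ES and the samplers), the labelling cut-off label_cut replaces the KS critical value, and all function and variable names are ours rather than the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC

def run_abm(candidate, n_days=41):
    """Toy stand-in for the continuous-space SIR ABM: maps a candidate parameter
    vector to a simulated daily-infection curve over 41 days."""
    peak = 1 + candidate[0] * (n_days - 2)          # only the first parameter matters here
    return np.exp(-0.5 * ((np.arange(n_days) - peak) / 5.0) ** 2)

def ksts(actual_daily, simulated_daily):
    """KS test statistic: largest gap between the two scale-invariant cumulative curves."""
    f_a = np.cumsum(actual_daily) / np.sum(actual_daily)
    f_s = np.cumsum(simulated_daily) / np.sum(simulated_daily)
    return float(np.max(np.abs(f_a - f_s)))

def parameterise(actual, n_params=7, min_budget=500, max_budget=2500,
                 batch_size=250, ks_threshold=0.005, label_cut=0.05, seed=0):
    rng = np.random.default_rng(seed)
    X, y, dist = [], [], []                         # ground-truth database

    def evaluate(batch):                            # ABM evaluation and labelling
        for c in batch:
            d = ksts(actual, run_abm(c))
            X.append(c); dist.append(d); y.append(int(d <= label_cut))

    # Initialisation: spend the ABM MIN Budget building the ground-truth database
    # (toy: all parameters sampled in [0, 1) rather than the Table 1 ranges).
    evaluate(rng.random((min_budget, n_params)))

    budget, surrogate = min_budget, None
    while budget < max_budget and min(dist) > ks_threshold:
        if len(set(y)) > 1:                         # need both classes before (re)fitting
            surrogate = SVC().fit(np.array(X), np.array(y))
        # "Execute strategy": a real strategy would query `surrogate` to choose the batch;
        # here a plain random batch stands in for MSRS / DYCORS / CMA-ES.
        evaluate(rng.random((batch_size, n_params)))
        budget += batch_size

    return np.array(X)[int(np.argmin(dist))]        # optimal predicted candidate

theta_star = np.array([0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5])   # known synthetic parameters
print(parameterise(run_abm(theta_star))[0])                   # should land close to 0.4
```

The Confidence Criteria check that decides whether the SM must be refreshed is collapsed here into an unconditional refit; the real framework only updates the SM when it has diverged.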
The two-sample Kolmogorov-Smirnov (KS) test is used to compare the similarity between the distributions of the actual and simulated data as follows:

D_{A,S} = sup_x |F_A(x) − F_S(x)|,   (1)

where x represents the feature we are measuring (the number of infected individuals) and F_A and F_S are the distribution functions of the actual and simulated data respectively. The null hypothesis is H_0: the two samples are drawn from the same distribution. The null hypothesis is rejected at a significance level α when D_{A,S} > c(α)·√(2/N), where c(α) is the standard two-sample KS critical coefficient and N = Population Size. A candidate parameter vector generates the simulated data. The vector is labelled as negative if the two distributions are not the same and positive if they are similar. The KS test compares the cumulative distributions of the samples, which must be calculated. The data distributions that we are comparing are time series. However, the empirical KS test is formulated to assess the distance between two independent and identically distributed samples. To use the KS test on our time-series data, we convert the time series to a cumulative distribution function by first performing a cumulative sum along the time dimension. We then scale the cumulative sum to arrive at a cumulative distribution that maintains the time series's integrity. This normalisation makes our problem scale-invariant to N, which allows us to measure the similarity of the epidemic trends between the two scale-invariant time-series distributions. To ensure that our implemented ABMS framework, seen in Figure 1, is reliable, we conduct a sanity check. Given a set of known optimal parameters, θ*, we use this as input to the constructed ABM. The ABM then generates a synthetic data distribution based on the known parameters, θ*, as output. We subsequently use the synthetic distribution as the actual data distribution in Figure 1. After that, we execute the ABMS Framework and observe whether we can approximate the known θ*. We use different parameter combinations of θ* within our framework to assess overall generalisation. The surrogate assisted optimisation methods, Metric Stochastic Response Surface (MSRS) and DYnamic COOrdinate Search Using Response Surface Models (DYCORS), have been modified from the literature. We have incorporated our ABM evaluation and added additional functionality that allows integration with our ABMS Framework. In addition, we implement a new surrogate assisted sampling method and a surrogate assisted evolutionary strategy for the parameterisation of the infectious disease agent-based model. The initial lower and upper bounds of the parameters are reassigned such that each new candidate is generated as a Normal(0, σ²) distributed perturbation around the current best candidate. The value of σ is increased if we have an accurate surrogate model (SM) and decreased if it is inaccurate. A sample of candidates is generated using a random sampling method. The SM predicts the Kolmogorov-Smirnov Test Statistic (KSTS) value of each newly generated candidate point, and the minimum distance from each candidate to the previously evaluated candidates is computed. The predicted KSTS value represents the expected similarity between the actual data distribution and the simulated distribution as if the candidate were evaluated by the agent-based model (ABM). The SM's predictions and the distances to the previously evaluated points are then rescaled through a linear transform onto the interval [0, 1]. A candidate that minimises the weight-distance merit function, m(x) = w·s(x) + (1 − w)·(1 − d(x)), is selected as the optimal candidate, where s(x) is the rescaled SM prediction for candidate x, d(x) is the rescaled minimum distance to previously seen candidates and 0 ≤ w ≤ 1. The weight w is commonly cycled through a finite set of values in order to encourage both exploration and exploitation; we chose w ∈ {0.3, 0.5, 0.7, 0.95}. When w is close to 0 we emphasise exploration, while w close to 1 emphasises exploitation. At the end of each iteration, the predicted best candidate is evaluated using the ABM, and that candidate is added to the ground truth database.
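A minimal sketch of the weighted selection step above, assuming min-max rescaling onto [0, 1] and Euclidean distances; the function and variable names, and the toy inputs, are ours.

```python
import numpy as np

def select_candidate(candidates, predicted_ksts, evaluated, w):
    """Return the index of the candidate minimising w*s(x) + (1 - w)*(1 - d(x)),
    where s(x) is the rescaled surrogate prediction and d(x) is the rescaled
    minimum distance to previously evaluated candidates."""
    candidates = np.asarray(candidates, dtype=float)
    evaluated = np.asarray(evaluated, dtype=float)

    # Minimum Euclidean distance from each candidate to the evaluated set.
    gaps = candidates[:, None, :] - evaluated[None, :, :]
    min_dist = np.min(np.linalg.norm(gaps, axis=-1), axis=1)

    def rescale(v):                                  # linear transform onto [0, 1]
        span = v.max() - v.min()
        return np.zeros_like(v) if span == 0 else (v - v.min()) / span

    s = rescale(np.asarray(predicted_ksts, dtype=float))   # low predicted KSTS is good
    d = rescale(min_dist)                                   # large distance favours exploration
    return int(np.argmin(w * s + (1.0 - w) * (1.0 - d)))

# Cycling the weight alternates exploration (w near 0) and exploitation (w near 1).
rng = np.random.default_rng(1)
candidates, evaluated = rng.random((1000, 7)), rng.random((500, 7))
predicted_ksts = rng.random(1000)                           # toy surrogate predictions
for w in [0.3, 0.5, 0.7, 0.95]:
    print(w, select_candidate(candidates, predicted_ksts, evaluated, w))
```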
A SM is trained and validated with 3-fold cross-validation. We use the ground truth database of evaluated parameter combinations for this task, where the best surrogate is selected based on the F1-Score. If the best surrogate's F1-Score is greater than or equal to a specified threshold, and the number of positive and negative samples seen in the database is greater than or equal to φ, where φ = (n_folds + n_parameters) + 1, we are then able to proceed with the strategy. A temporary pool of candidate parameters is generated and then classified by the surrogate as positive or negative parameter calibrations relative to the input data distribution. The new candidate pool is then generated using an ε-greedy algorithm, where ε = 0.10. We select positively predicted candidates at a rate 1 − ε and negatively predicted candidates at a rate ε from the temporary pool. The purpose of the ε-greedy algorithm is to encourage exploration whilst still maximising exploitation of the parameter space. As a baseline for this strategy, we use a Random Sampler, which generates arbitrary candidates within the ranges presented in Table 1. The Quasi-Random Sobol sampling method, proposed by Morokoff and Caflisch [1994], generates low-discrepancy sequences of evenly distributed points on an n-dimensional hypercube, where n refers to the number of parameters we are parameterising. We utilise this sampling method to generate a pool of candidate parameter vectors. As in the surrogate assisted random sampling strategy, we classify the candidate points as positive or negative and then use an ε-greedy algorithm to generate a new candidate pool. As a baseline for this strategy, we use a standard Quasi-Random Sobol sampling approach. We introduce a new surrogate assisted parameterisation strategy built upon the Covariance Matrix Adaptation Evolutionary Strategy ((µ/µ, λ)-CMA-ES). We select the parent pool size, λ, to be equal to our Batch Size. We follow the standard algorithm, using the SM to evaluate the fitness of each offspring. Once the new elite parents have been selected, we isolate them as our batch to be evaluated by the ABM. The mean of the next generation is calculated using the elite population. The next generation's covariance matrix is calculated using the elite population along with the mean value of the entire population at the current generation. A new set of candidate points is sampled using a Gaussian distribution with the mean and covariance of the next generation [Hansen and Ostermeier, 1996]. We evaluated the following machine learning algorithms for learning surrogate models (SMs):
- eXtreme Gradient Boosting (XGBoost): A decision tree ensemble machine learning algorithm, based on the gradient boosting framework of Friedman [2001], that is scalable and efficient in its implementation [Chen et al., 2015].
- Decision Tree (DT): A classification algorithm based on a tree-like structure, where internal nodes represent features/attributes, branches represent the decision rules and leaves represent the outcomes of those decisions [Swain and Hauska, 1977].
- Support Vector Machine (SVM): A classification algorithm which finds a hyperplane in an n-dimensional space in order to differentiate between different classes of data points [Vapnik, 2013].
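As a concrete illustration of the surrogate construction and the ε-greedy pool generation described above, here is a hedged Python sketch: the helper names, the toy labelling rule and the pool sizes are ours, and off-the-shelf XGBoost, Decision Tree and SVM classifiers stand in for the surrogates listed above.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

def select_surrogate(X, y):
    """Score each candidate algorithm with 3-fold cross-validated F1 and return the
    best model refitted on the full ground-truth database, with its score."""
    models = {"XGBoost": XGBClassifier(), "DT": DecisionTreeClassifier(), "SVM": SVC()}
    scores = {name: cross_val_score(m, X, y, cv=3, scoring="f1").mean()
              for name, m in models.items()}
    best = max(scores, key=scores.get)
    return models[best].fit(X, y), scores[best]

def epsilon_greedy_pool(surrogate, temporary_pool, pool_size, epsilon=0.10, seed=0):
    """Build a new candidate pool: positively predicted candidates at rate 1 - epsilon
    and negatively predicted candidates at rate epsilon."""
    rng = np.random.default_rng(seed)
    preds = surrogate.predict(temporary_pool)
    positives, negatives = temporary_pool[preds == 1], temporary_pool[preds == 0]

    def pick(group, n):                              # sample with replacement
        if len(group) == 0 or n == 0:
            return np.empty((0, temporary_pool.shape[1]))
        return group[rng.integers(0, len(group), size=n)]

    n_neg = int(round(epsilon * pool_size))
    return np.vstack([pick(positives, pool_size - n_neg), pick(negatives, n_neg)])

# Toy ground-truth database: the labelling rule is made up, for illustration only.
rng = np.random.default_rng(2)
X = rng.random((600, 7))
y = (X[:, 0] < 0.2).astype(int)
surrogate, f1 = select_surrogate(X, y)
pool = epsilon_greedy_pool(surrogate, rng.random((5000, 7)), pool_size=1000)
print(round(f1, 3), pool.shape)
```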
We use the newly sampled batch of labelled parameter vectors at each iteration of the ABMS Framework to validate the SM's performance. In order to validate a SM for classification, we use the F1 Score: F1 = 2 · (Precision · Recall) / (Precision + Recall). In order to validate a SM predicting a real value, we use the Root Mean Squared Error: RMSE = √((1/B) Σ_{i=1}^{B} (y_i − ŷ_i)²), where y_i is the predicted real value of the ith candidate from the newly evaluated batch, ŷ_i is the true value of the ith candidate and B = Batch Size. The following initial configurations are set: ABM MIN Budget = 500, ABM MAX Budget = 2500, Batch Size = 250, KS Threshold = 0.005 (≈ 99.5% similar to the input distribution). The surrogate confidence criteria are split into two cases: when the SM predicts the class label, we use the F1 Score with F1 Score Threshold = 0.90; when the SM predicts the real label (the KSTS value), we use RMSE with RMSE Threshold = 0.001. We conducted a total of 126 experiments, averaging each experiment over 20 runs, where for each average the true parameters were varied and different combinations of parameters were tested. We compared all of the strategies implemented, parameterising 1, . . . , 7 parameters as seen in Table 1. Each of the surrogate assisted optimisation strategies has been initialised to generate 1000 new samples and perform three iterations every time the strategy is executed within the framework. The real-world dataset we used comes from the COVID-19 Data Repository by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University [Dong et al., 2020]. We are interested in the number of daily infections, particularly in South Africa. However, the dataset only keeps track of the total confirmed infections per day. Therefore, we obtained the daily infections for the current day by taking the difference between the recorded infections for the next day and the current day (excluding the last day in the dataset). We used a seven-day moving average between 16/06/2020 and 06/09/2020 to smooth out any local reporting anomalies. The framework configurations for this experiment are the same as mentioned above. However, the agent-based model's simulation steps have been increased to 83 to match the number of days between the specified dates. The machine used to run our experiments consisted of an Intel Xeon CPU E5-2683 v4 @ 2.10GHz processor with 64 CPUs and 256GB of RAM using the Ubuntu 18.04.4 LTS operating system. In the following tables and figures, we present the results of the above experiment, followed by a discussion of the results. The Standardised L2 Norm values show the distance between the true (synthetic) parameter vector and the ABMS Framework's optimal prediction. The optimal prediction is the parameter vector (estimate) with the lowest Kolmogorov-Smirnov Test Statistic (KSTS) value. The sanity check assesses whether our ABMS Framework and a strategy-surrogate combination can approximate the parameters that generated the actual distribution. The values presented in Table 2 are relatively close to zero, which implies our sanity check holds.
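A small sketch of the pre-processing described above, using made-up cumulative totals rather than the actual JHU CSSE values; the helper names are ours.

```python
import numpy as np
import pandas as pd

def daily_from_cumulative(cumulative_totals):
    """Daily infections for each day: the next day's cumulative confirmed total minus
    the current day's (the last day has no next-day record and is dropped)."""
    return np.diff(np.asarray(cumulative_totals, dtype=float))

def seven_day_average(daily):
    """Seven-day moving average to smooth out local reporting anomalies."""
    return pd.Series(daily).rolling(window=7).mean().dropna().to_numpy()

# Illustrative cumulative totals only; not the real South African figures.
cumulative = [100, 130, 170, 220, 290, 340, 400, 480, 560, 650, 760, 880]
print(seven_day_average(daily_from_cumulative(cumulative)))
```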
Also, we have highlighted the top three lowest Standardised L2 Norm values for each of the strategy-surrogate combinations considered. The results show that overall the MSRS strategy can best approximate the synthetic parameters. In particular, MSRS is relatively good at attaining the synthetic parameters for three or more parameters, whereas DYCORS and CMA-ES are relatively good for 1-3 parameters. The values presented in both Tables 2 and 3 show us that the Support Vector Machine (SVM) and Decision Tree (DT) surrogates obtain the majority of the optimal values (i.e. the lowest values), whereas this is not the case for XGBoost. The KSTS values tell us how close we can get to the actual (synthetic) infected data distribution. Table 3 shows that for one and two parameters, we are still in a relatively low dimensional space, and as such, the baselines and the surrogate assisted samplers perform the best. As we increase the dimensionality of the parameter space, the CMA-ES strategy is optimal for three and four parameters. When moving towards a higher dimensional space of five or six parameters, the DYCORS strategy attains the lowest KSTS value, and for seven parameters, MSRS has the lowest value. Inspecting Table 3 more closely, we note that all techniques can reasonably approximate the synthetic distribution and that no strategy-surrogate combination stands out. When parameterising all seven parameters, we see that MSRS SVM can approximate the synthetic parameters the best, with the lowest Standardised L2 Norm value (2.3829) on average. Also, MSRS SVM attains the lowest KSTS value (0.0034). This result implies that MSRS SVM can predict a parameter vector that, when run through the ABM, generates a simulated infection distribution most similar to the synthetic infection distribution. The probabilities of reaching an optimal solution (success) within 98% and 99% of the optimal value are captured in Table 4. Success is defined as a measure of similarity to the synthetic data distribution. For example, success at 98% implies that a method can produce a simulated distribution that is 98% similar to the synthetic distribution that we try to replicate. In the same table, we also show the speedup attained by each strategy-surrogate combination. Across all implemented strategy-surrogate combinations, we obtain a significant speedup compared to the baselines. Table 4 shows that Random, Random DT and DYCORS XGBoost have the highest probability of reaching success at 98% and 99%, in that specific order. Also, the Sobol DT, MSRS XGBoost, and DYCORS XGBoost strategy-surrogate combinations achieve the most speedup. Although Random attains a high probability of reaching success, it is not efficient in that it provides no speedup compared to the other methods. The DYCORS XGBoost strategy-surrogate combination is optimal when considering both the probability of reaching success and speedup. One of the limitations is that we can only get 99% accuracy with a 75% probability. This suboptimal probability means that we would need to run the model more than once to ensure the correct outcome. Running more than once would negate some of the speedup achieved. In general, we would prefer utilising a strategy-surrogate combination that can predict an optimal parameter vector that perfectly matches the actual (real) data distribution in terms of accuracy and speedup (DYCORS XGBoost).
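The text above does not spell out the exact standardisation used for the Standardised L2 Norm; the sketch below assumes each parameter is scaled by its Table 1 range before taking the Euclidean norm, and the example parameter vectors are illustrative only.

```python
import numpy as np

def standardised_l2_norm(theta_true, theta_pred, lower, upper):
    """Euclidean distance between the true and predicted parameter vectors after
    scaling each parameter by its range (assumed standardisation)."""
    span = np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float)
    diff = (np.asarray(theta_true, dtype=float) - np.asarray(theta_pred, dtype=float)) / span
    return float(np.linalg.norm(diff))

# Table 1 ranges: parameters 1-3 and 6 in (0, 1), 4-5 in (0, 41) days, 7 in (0, 0.022).
lower = [0, 0, 0, 0, 0, 0, 0]
upper = [1, 1, 1, 41, 41, 1, 0.022]
theta_true = [0.30, 0.50, 0.20, 14.0, 7.0, 0.80, 0.010]   # illustrative values only
theta_pred = [0.28, 0.55, 0.25, 12.0, 8.0, 0.75, 0.012]
print(standardised_l2_norm(theta_true, theta_pred, lower, upper))
```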
However, we may also want to utilise a strategy-surrogate combination that gets as close as possible to the true (real) parameter vector that generated the actual data distribution (MSRS SVM). In Figures 2 and 3, we show a comparison between the actual cumulative daily infection distribution and the simulated distribution obtained by the ABMS Framework using DYCORS XGBoost and MSRS SVM, respectively. The simulated distributions are generated by taking the optimal prediction from each strategy-surrogate combination and evaluating it through the ABM. We fail to reject the Kolmogorov-Smirnov Test's null hypothesis, as both optimal predictions generated distributions that closely model the real-world distribution. MSRS SVM attains a KSTS value of 0.0325, i.e. approximately 96.75% similarity to the actual distribution. DYCORS XGBoost attains a KSTS value of 0.0288, i.e. approximately 97.12% similarity to the actual distribution. The optimal predicted parameters for each strategy-surrogate combination are shown in Table 5. The infection period and detection time values attained by DYCORS XGBoost are quite similar to the known virus characteristics of COVID-19. We have implemented a more extensive and adaptive agent-based modelling and simulation (ABMS) Framework. Our framework can effectively swap out parameterisation strategies and surrogate models (SMs) to parameterise infectious disease agent-based models (ABMs). We show that, in terms of the lowest Kolmogorov-Smirnov Test Statistic (KSTS) values, we achieve better than parity across all parameters compared to the surrogate assisted sampling strategies and the baselines. The Decision Tree (DT) and Support Vector Machine (SVM) surrogates are on par with each other in attaining the lowest KSTS values overall, whereas XGBoost is not reliable. Using the Standardised L2 Norm values, we show that MSRS SVM is the best strategy-surrogate combination overall in terms of getting closest to the true synthetic parameters. One of the significant challenges in ABMS is the cumbersome time required to parameterise an ABM. We have shown that DYCORS XGBoost attains the highest probability of replicating the actual (synthetic) data distribution at the 98% and 99% success levels and achieves the most considerable speedup in comparison to the baselines and the evaluated strategy-surrogate pairs. DYCORS XGBoost and MSRS SVM were both used to parameterise the infectious disease ABM within our ABMS Framework. DYCORS XGBoost and MSRS SVM each attained an optimal parameter vector that generated cumulative daily infection distributions that are 97.12% and 96.75% similar to the actual (real) distribution, respectively. The real distribution represents the cumulative number of daily infections between the dates 16/06/2020 and 06/09/2020 in South Africa. Moreover, we observe that DYCORS XGBoost predicts a parameter vector representing the characteristics of COVID-19 better than MSRS SVM. Our future work aims to increase the ABM's complexity and evaluate different types of ABMs within our framework to assess overall robustness. The financial assistance of the National Research Foundation (NRF) towards this research is hereby acknowledged. Opinions expressed and conclusions arrived at are those of the authors and are not necessarily to be attributed to the NRF.
References
Bargigli et al. (2018). Network calibration and metamodeling of a financial accelerator agent based model.
Bonabeau (2002). Agent-based modeling: Methods and techniques for simulating human systems.
Brock and Hommes (1998). Heterogeneous beliefs and routes to chaos in a simple asset pricing model.
Chen et al. (2015). XGBoost: extreme gradient boosting.
Dong et al. (2020). An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases.
Dosi et al. (2018). The effects of labour market reforms upon unemployment and income inequalities: an agent-based model.
Fagiolo and Dosi (2003). Exploitation, exploration and innovation in a model of endogenous growth with locally interacting agents. Structural Change and Economic Dynamics.
Friedman (2001). Greedy function approximation: a gradient boosting machine.
Hunter, Mac Namee and Kelleher (2017). A taxonomy for agent-based models in human infectious disease epidemiology.
Kermack and McKendrick (1927). A contribution to the mathematical theory of epidemics. Proceedings of the Royal Society of London, Series A, Containing Papers of a Mathematical and Physical Character.
Lamperti et al. (2018). Agent-based model calibration using machine learning surrogates.
Macal and North (2009). Agent-based modeling and simulation.
Miksch et al. (2019). Why should we apply ABM for decision analysis for infectious diseases? An example for dengue interventions.
Morokoff and Caflisch (1994). Quasi-random sequences and their discrepancies.
Perumal and van Zyl (2020). Surrogate assisted methods for the parameterisation of agent-based models.
Regis and Shoemaker (2007). A stochastic radial basis function method for the global optimization of expensive functions.
Regis and Shoemaker (2013). Combining radial basis function surrogates and dynamic coordinate search in high-dimensional expensive black-box optimization. Engineering Optimization.
Swain and Hauska (1977). The decision tree classifier: Design and potential.
Tolson and Shoemaker (2007). Dynamically dimensioned search algorithm for computationally efficient watershed model calibration. Water Resources Research.
Tracy et al. (2018). Agent-based modeling in public health: current applications and future directions. Annual Review of Public Health.
van der Hoog (2019). Surrogate modelling in (and of) agent-based models: A prospectus.
Vapnik (2013). The nature of statistical learning theory.
Walters et al. (2018). Modelling the global spread of diseases: A review of current practice and capability.
Zhang et al. (2020). Validation and calibration of an agent-based model: A surrogate approach.