ABC of the Future

Henri Pesonen, Umberto Simola, Alvaro Kohn-Luque, Henri Vuollekoski, Xiaoran Lai, Arnoldo Frigessi, Samuel Kaski, David T. Frazier, Worapree Maneesoonthorn, Gael M. Martin, Jukka Corander

December 23, 2021

Approximate Bayesian computation (ABC) has advanced in two decades from a seminal idea to a practically applicable inference tool for simulator-based statistical models, which are becoming increasingly popular in many research domains. The computational feasibility of ABC for practical applications has recently been boosted by adopting techniques from machine learning to build surrogate models for the approximate likelihood or posterior, and by the introduction of a general-purpose software platform with several advanced features, including automated parallelization. Here we demonstrate the strengths of these advances in ABC by going beyond the typical benchmark examples and considering real applications in astronomy, infectious disease epidemiology, personalised cancer therapy and financial prediction. We anticipate that the emerging success of ABC in producing actual added value and quantitative insights in the real world will continue to inspire a plethora of further applications across different fields of science, social science and technology.

From its humble beginnings over two decades ago, approximate Bayesian computation, or ABC for short, has more recently been met with ever-increasing excitement and is now regarded as one of the most transformative ideas in the statistical sciences. For in-depth reviews of the history of ABC, its theoretical underpinnings and the most recent developments, see [Marin et al., 2012, Beaumont, 2019]. In particular, deeper theoretical insights into the large-sample behavior of ABC inference have removed some of the main doubts regarding its statistical validity, and the field is moving rapidly towards a unified understanding of the key aspects that impinge on the asymptotic behavior of ABC approximations [Fearnhead and Prangle, 2012, Marin et al., 2014, Green et al., 2015, Lintusaari et al., 2017, Frazier et al., 2018, Li and Fearnhead, 2018a,b, Frazier et al., 2020].

Nevertheless, the generic application potential of ABC and other likelihood-free inference (LFI) methods has been held back by the computational requirements of its standard inference algorithms and the lack of a suitable all-purpose software implementation. With the advent of more efficient inference strategies adopted from the field of machine learning [Gutmann and Corander, 2016, Lueckmann et al., 2018, Thomas et al., 2021, Kokko et al., 2019, Cranmer et al., 2020, Grazian and Fan, 2019, Papamakarios et al., 2019] and software platforms such as the Engine for Likelihood-Free Inference (ELFI) [Lintusaari et al., 2018], ABCpy, BSL [An et al., 2019a] and sbi [Tejero-Cantero et al., 2020], to name a few, the immediate prospect of both using and updating the ABC/LFI toolkits for challenging real-world applications certainly looks brighter. For example, Bayesian optimization for likelihood-free inference (BOLFI) has been shown in several benchmark examples to speed up ABC inference by 3-4 orders of magnitude [Gutmann and Corander, 2016], and multiple successful applications of it beyond the typical benchmarks used in the statistical literature have emerged.
These include applications in very diverse research fields, such as inverse reinforcement learning for cognitive user interface models [Kangasrääsiö et al., 2017], brain task interleaving modeling [Gebhardt et al., 2020] and more general computational models of cognition [Kangasrääsiö et al., 2019], perturbation modeling and selection in bacterial populations, direct dark matter detection [Simola et al., 2019], pathogen outbreak modeling [Lintusaari et al., 2019], sound source localisation [Forbes et al., 2021], passenger flow estimation in airports [Ebert et al., 2021], and the modeling of the genetic components that control the transmissibility of pathogens [Shen et al., 2019]. To inspire further methodological development, software engineering and dissemination of ABC and other LFI methods, we present here an array of real applications and discuss both the benefits and challenges that lie ahead for this exciting subfield of statistics.

We briefly introduce the main concepts and the notation used in the following sections, using the simplest form of the ABC algorithm for the purpose of illustration. More detailed descriptions of the methods used in the examples, i.e. approximate Bayesian computation-population Monte Carlo (ABC-PMC) [Beaumont et al., 2009] and BOLFI [Gutmann and Corander, 2016], can be found in the following sections.

ABC provides an inferential framework for situations where the likelihood function is not available to be used with the commonly employed methods of statistical inference (e.g. optimization, importance sampling, Markov chain Monte Carlo (MCMC)). Instead, it is assumed that simulation of the data-generating process is possible. The rejection ABC algorithm [Tavaré et al., 1997, Pritchard et al., 1999] is the basic formulation of an ABC method. Assuming that the simulator parameter θ ∈ R^p is the target of the statistical inference, the rejection ABC algorithm produces N independent samples from an approximate ABC posterior distribution

π_ε(θ | y_obs) ∝ ∫ f(y' | θ) π(θ) I_{A_{ε,y_obs}}(y') dy',    (1)

where I_{A_{ε,y_obs}}(·) is the indicator function for the set A_{ε,y_obs} = {y' | d(y_obs, y') ≤ ε}. The ABC posterior defined by (1) does not condition on the data exactly, but on the set of artificial datasets, drawn from the generative model, that lie within a tolerance ε of the observed data. The difference between observed and artificial data is measured by a distance metric d(·, ·). Because the relative volume of A_{ε,y_obs} becomes vanishingly small as the dimension of the data y_obs increases, sampling-based algorithms such as rejection ABC reduce the sample space in order to perform adequately. Traditionally, this is done by defining a set of summary statistics s(·). Ideally, the summarising function would be a sufficient statistic, but this is rarely available in problems with intractable likelihoods. In practice, s(·) is chosen according to a number of different principles, aimed at maximizing the informativeness of the summaries in some sense [Joyce and Marjoram, 2008, Blum, 2010, Fearnhead and Prangle, 2012, Drovandi et al., 2015]. The core of the rejection ABC algorithm (Algorithm 1) is simple and straightforward to implement in most programming languages; a short Python sketch is given at the end of this section.

Algorithm 1: Rejection ABC
for i = 1, . . . , N do
    d = ∞
    while d > ε do
        Sample from the prior, θ' ∼ π(θ)
        Simulate from the generative model, y' ∼ f(y | θ')
        Calculate the distance, d = d(s(y_obs), s(y'))
    end while
    Set θ^(i) = θ'
end for

Although the rejection ABC algorithm is still frequently used as a comparison method in the ABC and LFI literature, there are few applications where it would not be beneficial to instead take advantage of more sophisticated versions of this basic algorithm [Beaumont et al., 2009, Blum, 2010, Csilléry et al., 2010, Marin et al., 2012, Moral et al., 2012, Clarté et al., 2020, Rodrigues et al., 2020]. The ABC-PMC approach by Beaumont et al. [2009] is an extension of the rejection ABC algorithm based on importance sampling, and aims to improve the efficiency of the procedure by retrieving a sequence of intermediate distributions. The steps of the method are summarized in Algorithm 2. The first iteration of the ABC-PMC algorithm corresponds to the four steps of the basic rejection ABC algorithm with tolerance ε_1. Starting from the second iteration, t ≥ 2, a particle is sampled with replacement from the set of importance-weighted particles at iteration t − 1, it is moved using e.g. a Gaussian kernel, and it is accepted or rejected based on the tolerance ε_t < ε_{t−1}. The importance weight for particle j = 1, . . . , N at iteration t is

w_t^{(j)} ∝ π(θ_t^{(j)}) / Σ_{k=1}^{N} w_{t−1}^{(k)} φ( τ_{t−1}^{−1/2} (θ_t^{(j)} − θ_{t−1}^{(k)}) ),

where φ(·) is a multivariate Gaussian kernel with mean 0 and identity covariance, and τ_{t−1} is twice the (weighted) sample covariance of the particles from iteration t − 1, as recommended in Beaumont et al. [2009].

As a final remark on the ABC-PMC algorithm, both the total number of iterations T and the series of decreasing tolerances ε_1 > ε_2 > · · · > ε_T must be selected in advance by the researcher, which might have an impact on the computational performance of the procedure, as well as on the attainability of a suitably accurate approximation of the exact posterior distribution [Silk et al., 2013]. For this reason, rather than defining the series of decreasing tolerances in advance, other choices are possible. In particular, adaptively selecting the tolerance for iteration t based on a previously defined quantile of the distances of the accepted particles from the previous iteration t − 1 improves the efficiency of the algorithm in terms of how many times the forward model is used [Lenormand et al., 2013, Weyant et al., 2013, Ishida et al., 2015, Cisewski-Kehe et al., 2019]. For these reasons a tuning step is also often necessary in order to balance the trade-off between computational efficiency and the reliability of the inferential results.

Although potentially orders of magnitude more efficient than the basic rejection ABC algorithm, ABC-PMC is based on the idea of identifying relevant regions of the parameter space by finding proposal distributions such that the corresponding simulated datasets are similar, within a tolerance, to the observed dataset, according to pre-defined summary statistics. As often only weak information is available in advance about which regions are relevant, this requires a large number of datasets to be simulated through the forward model. Lintusaari et al. [2018] note, for example, that the number of simulated datasets required in the implementation of a sequential ABC sampler like Algorithm 2 is usually at least of order 10^6. If the model simulation itself is heavy, the total computational cost of the sequential algorithm becomes very high.
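To make the implementation claim above concrete, the following is a minimal Python sketch of Algorithm 1 for a toy Gaussian model; the simulator, prior, summary statistic and tolerance are placeholders chosen purely for illustration and are not part of the case studies that follow.

import numpy as np

rng = np.random.default_rng(0)

def simulator(theta, n=100):
    # Toy generative model: Gaussian observations with unknown mean theta.
    return rng.normal(theta, 1.0, size=n)

def summary(y):
    # Placeholder summary statistic: the sample mean.
    return np.mean(y)

def rejection_abc(y_obs, n_samples=1000, eps=0.05):
    # Algorithm 1: keep prior draws whose simulated summaries fall within eps.
    s_obs = summary(y_obs)
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(-5.0, 5.0)        # sample from the prior
        y = simulator(theta)                  # simulate from the generative model
        if abs(summary(y) - s_obs) <= eps:    # distance between summaries
            accepted.append(theta)
    return np.array(accepted)

y_obs = simulator(1.5)
posterior_sample = rejection_abc(y_obs)
print(posterior_sample.mean(), posterior_sample.std())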
To reduce the need for a large number of potentially costly model simulations, active learning methods adapt the querying process according to different strategies. BOLFI uses Bayesian optimization [Gutmann and Corander, 2016] to iteratively construct a probabilistic surrogate model for the relationship between the parameters θ and the discrepancies d(θ) = ρ(s(y'), s(y_obs)), using a growing evidence set E_t which consists of the pairs {(θ_i, d(θ_i))}_{i=1}^t. The original formulation of BOLFI uses a Gaussian process (GP) as the surrogate model for the discrepancy function, and new evidence {(θ_{t+1}, d(θ_{t+1}))} is sampled from relevant regions of the parameter space, i.e. the parts of the space where the discrepancy is small. The probabilistic model is

d(θ) | E_t ∼ GP(µ_t(θ), v_t(θ) + σ²),

where GP denotes a Gaussian process with posterior mean and variance functions

µ_t(θ) = k_t(θ)^T (K_t + σ² I)^{−1} d_{1:t},    v_t(θ) = k(θ, θ) − k_t(θ)^T (K_t + σ² I)^{−1} k_t(θ),

with d_{1:t} = (d(θ_1), . . . , d(θ_t))^T. The vector k_t(θ) = (k(θ, θ_1), . . . , k(θ, θ_t))^T and the matrix K_t = [k(θ_i, θ_j)] ∈ R^{t×t} are defined via a covariance function k(θ, θ'). A common choice is the squared exponential covariance function

k(θ, θ') = σ_f² exp( − Σ_{j=1}^p (θ_j − θ'_j)² / (2λ_j²) ),

where θ_j and θ'_j are the jth elements of θ and θ'. When fitting a GP to the evidence set, the hyperparameters σ, σ_f and λ_j can be optimized iteratively [Rasmussen and Williams, 2006]. In practice, it is not necessary to update the hyperparameters after each additional evidence point; a more efficient strategy is to update them after every T_update additional points, for some specified value of T_update. Finally, from d(θ) we can retrieve a suitable pointwise approximation to the likelihood function,

L_t(θ) ∝ Φ( (h − µ_t(θ)) / sqrt(v_t(θ) + σ²) ),

where Φ(·) is the Gaussian cumulative distribution function and h is a threshold parameter, which we choose to be the minimum of the mean discrepancy µ_t(θ), although other choices are reasonable. The discrepancies in the evidence set can be log-transformed, possibly improving the GP fit, as advised in Gutmann and Corander [2016]; the rest of the steps of the algorithm remain the same.

In practice, an acquisition function is used to determine the relevant locations of the parameter space, and there are several reasonable choices for it. Some are directly based on optimizing the density fit [Järvenpää et al., 2019], while others are efficient for finding the optimum of the function, such as the lower confidence bound selection criterion (LCBSC) [Srinivas et al., 2010, Brochu et al., 2010]. The LCBSC is defined as

A_t(θ) = µ_t(θ) − η_t sqrt(v_t(θ)),    (6)

where η_t² = 2 log( t^{d/2+2} π²/(3η) ) is a coefficient depending on the iteration t, the dimension d of the parameter space and a tunable parameter η. The next evaluation location is obtained by minimizing (6) and then varying the minimizer randomly, to further balance exploration and exploitation of the probabilistic model of d(θ). Our approach is to query the simulator at

θ_{t+1} ∼ TN(θ*_t, σ²_acq, a_L, a_U),

where TN(θ*_t, σ²_acq, a_L, a_U) is a normal distribution truncated to the interval [a_L, a_U], with mean θ*_t = argmin_θ A_t(θ) and tunable variance σ²_acq. The variance σ²_acq controls the deviation from the LCBSC minimum, and the interval [a_L, a_U] is the optimization region for the parameter.

BOLFI can be thought of as an extension of either an ABC- or a synthetic likelihood (SL)-type method [Wood, 2010]. As an SL-type method, the likelihood surrogate and the prior can be used as a target for an MCMC or a Hamiltonian Monte Carlo (HMC) sampling algorithm to draw an approximate posterior sample of size N_sample.
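The following self-contained sketch illustrates the core of this construction: a GP with a squared exponential kernel is fitted to a small evidence set of (parameter, discrepancy) pairs, and the surrogate likelihood is evaluated pointwise via the Gaussian CDF as in the approximation above. The toy discrepancy function, the fixed kernel hyperparameters and the grid are assumptions made purely to keep the example short; the actual BOLFI implementation in ELFI optimizes the hyperparameters and uses the acquisition rule described above.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def sq_exp_kernel(a, b, sigma_f=1.0, length=0.5):
    # Squared exponential covariance between two sets of 1-D parameter values.
    return sigma_f**2 * np.exp(-(a[:, None] - b[None, :])**2 / (2 * length**2))

# Evidence set: parameter values and (noisy) discrepancies from a toy simulator.
theta_ev = rng.uniform(-2, 2, size=15)
d_ev = (theta_ev - 0.7)**2 + rng.normal(0, 0.05, size=15)
sigma_n = 0.05                                              # GP noise standard deviation

K = sq_exp_kernel(theta_ev, theta_ev) + sigma_n**2 * np.eye(len(theta_ev))
K_inv = np.linalg.inv(K)

def gp_posterior(theta_grid):
    # Standard GP regression: posterior mean and variance of the discrepancy.
    k_star = sq_exp_kernel(theta_grid, theta_ev)
    mu = k_star @ K_inv @ d_ev
    var = sq_exp_kernel(theta_grid, theta_grid).diagonal() - np.einsum(
        "ij,jk,ik->i", k_star, K_inv, k_star)
    return mu, np.maximum(var, 1e-12)

grid = np.linspace(-2, 2, 200)
mu, var = gp_posterior(grid)
h = mu.min()                                       # threshold: minimum mean discrepancy
lik = norm.cdf((h - mu) / np.sqrt(var + sigma_n**2))   # pointwise likelihood surrogate
print(grid[np.argmax(lik)])                        # location of highest surrogate likelihood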
A popular HMC sampling algorithm is the no-U-turn sampler (NUTS) [Hoffman and Gelman, 2014], which avoids certain sensitivity issues by setting the relevant user-defined parameters automatically. In this study we use the BOLFI implementation available in ELFI, and the specific version of it is presented in Algorithm 3.

The features of rejection ABC, i.e. the distance function, the distance threshold and the summarising statistics, are still common in many modern ABC algorithms, even though newer methods have achieved superior performance relative to the original formulation of the method. The basic rejection ABC uses samples generated from the prior, which can be extremely inefficient depending on the informativeness of the prior. ABC-PMC and ABC-sequential Monte Carlo (ABC-SMC) methods [Toni et al., 2009] improve on rejection ABC by sequentially constructing better proposal distributions. Sequential ABC methods have been improved further, e.g. by introducing adaptivity into the threshold selection and the distance function [Prangle, 2017]. Other approaches have been introduced to help with summary statistic design and selection [Fearnhead and Prangle, 2012, Thomas et al., 2021]. MCMC procedures can also be used to generate samples from the approximate posterior distributions [Marjoram et al., 2003, Marjoram, 2013, Vihola and Franks, 2020], and improving their applicability to high-dimensional problems is an ongoing research problem [Rodrigues et al., 2020, Clarté et al., 2020]. Parametric approaches such as SL trade the requirement for a distance function and a threshold for a pointwise approximation of the likelihood by a parametric distribution, such as a normal distribution. The moments of the approximating likelihood are, in turn, estimated using simulator draws that are possibly summarised [Wood, 2010, Price et al., 2018]. The parametric approximation of the posterior enables the use of a variety of methods to draw a representative sample from the posterior [An et al., 2019b]. SL methods have also been extended to high-dimensional parameter spaces [Ong et al., 2018]. Recently, density estimation techniques based on deep neural network architectures have been developed for likelihood-free inference [Grazian and Fan, 2019, Lueckmann et al., 2018, Papamakarios et al., 2019]. These approaches are similar to SL-type methods in that the likelihood is approximated with a surrogate, but in place of the parametric distribution, neural network architectures such as autoregressive flows or emulator networks are used to fit the surrogate model for the likelihood. Furthermore, these methods have also been combined with active learning approaches to minimize the required number of simulator queries.

Simulator-based inference is well suited to infectious disease epidemiology. For example, it has been used to resolve the outbreak dynamics of stochastic birth-death-mutation models [Lintusaari et al., 2019], and to infer the transmission dynamics of the Ebola haemorrhagic fever outbreak in 1995 in the Democratic Republic of Congo [McKinley et al., 2009] and the COVID-19 pandemic [Chinazzi et al., 2020]. In this case study, we demonstrate the application of simulator models to gain insight into the epidemic caused by the emergence of the Ebola virus in West Africa in 2014, as reported by the WHO Ebola Response Team (Team WER) [WHO Ebola Response Team, 2014], to infer the basic reproduction number R_0, i.e.
the mean number of secondary infections caused by a single infected individual when no control measures are in place. The computational complexity of recent, individual-based simulator models can be substantial, and the pace at which inference can be delivered is of particular importance for determining the severity of the situation and the predicted progress under different hypotheses. Therefore, active learning-based methods such as BOLFI in Algorithm 3 provide a strategy to minimize the number of simulator queries and to quickly approximate the posterior for R_0.

WHO Ebola Response Team [2014] estimated the basic reproduction number by modeling the incidence of onset I(·) at time t with a Poisson process, which is defined via the basic reproduction number R_0, the serial interval ω, i.e. the time difference between the onset of symptoms of the infector and the infectee, and the incidence history I(0), . . . , I(t − 1). The serial interval ω was determined heuristically from the contact tracing data and was approximated as gamma-distributed with mean 15 and coefficient of variation 0.66. Finally, the time interval [0, T_exp] is the initial period of exponential growth of infections. In this case, the likelihood of the observed time series is available in closed form, and the posterior distribution can be solved analytically when R_0 is modeled as gamma-distributed a priori. However, Britton and Tomba [2019] argued that even though this population-level model has good predictive power, it tends to lead to underestimation of R_0. The main causes of the underestimation are the use of serial intervals instead of generation times, which measure the time between the infections of the infector and the infectee, and the handling of the contact tracing when there are multiple possible infectors. Both of these aspects, plus other complex factors in the observation process, are simpler to take into account when using a forward simulator-based modeling approach such as the one described in the next subsection.

We simulate an outbreak of a virus in a homogeneous and infinite community. The general simulation model follows closely the description in Britton and Tomba [2019]. The main difference from the model used by WHO Ebola Response Team [2014] is that the forward simulation program generates the state of each infected individual, as described by the diagram in Figure 1, instead of using a population-level Poisson-process model. Each infected individual is in one of four possible states: latent, infectious, recovered or perished. After the initial infection, an individual enters the latent period t_lat of the infection, modeled as t_lat ∼ Γ(2, 5), where Γ(α, β) is a gamma distribution with shape α and scale β. The latent period is followed by the infectious period t_inf ∼ Γ(1, 5), during which the transfer of the infection to other individuals is possible. The probability of a new infection after s time units since the initial infection follows the infection rate R_0 f_G(s), defined by R_0 and the generation time distribution f_G(·). The individual survives the infectious period with probability p_reco = 0.3 after a recovery period of t_reco ∼ Γ(4, 3), or perishes with probability (1 − p_reco) after a period of t_die ∼ Γ(4/9, 9). For simplicity, we assume that there is no delay in reporting the infection once symptoms arise, and that all of the cases are reported. A simplified sketch of this forward simulator is given below.
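The sketch below follows the description above (gamma-distributed latent and infectious periods, step size ∆t, and infection probability ∆t/∆T per step while infectious), but it is deliberately simplified: the incubation factor, recovery and death book-keeping and the stopping rules of the full simulator are omitted, since they do not change the cumulative number of infections tracked here.

import numpy as np

rng = np.random.default_rng(2)

def new_case(t0):
    # Latent and infectious periods drawn as described in the text.
    return dict(t0=t0, lat=rng.gamma(2, 5), inf=rng.gamma(1, 5))

def simulate_outbreak(R0, days=120, dt=0.2, max_cases=100_000):
    # Individual-based outbreak simulator returning daily cumulative case counts.
    mean_t_inf = 1 * 5                         # mean of the Gamma(1, 5) infectious period
    p_inf = dt / (mean_t_inf / R0)             # per-step infection probability while infectious
    infected = [new_case(0.0)]
    daily_counts = []
    steps_per_day = int(round(1 / dt))
    for step in range(1, int(days / dt) + 1):
        t = step * dt
        for person in list(infected):          # copy so new cases are not iterated this step
            age = t - person["t0"]
            infectious = person["lat"] <= age < person["lat"] + person["inf"]
            if infectious and rng.random() < p_inf and len(infected) < max_cases:
                infected.append(new_case(t))
        if step % steps_per_day == 0:
            daily_counts.append(len(infected))
    return np.array(daily_counts)

cases = simulate_outbreak(R0=1.7)
print(cases[-5:])                              # cumulative counts for the last five days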
Assumptions concerning possible reporting bias could be built into the model in a similar fashion as for the other quantities. The time period between infection and symptoms is defined as the incubation period, which is the latent period multiplied by an incubation factor γ ∼ Uni(0.8, 1.2). The simulation always starts with a single infected individual and iterates in steps of ∆t = 0.2 days until either 104 weeks pass or 100,000 individuals have been infected. During each time step the status of each infected individual is checked, and if the current status is infectious, a new individual is infected with probability p_inf = ∆t/∆T, where ∆T = t̄_inf/R_0 is the mean time between infections and t̄_inf is the mean duration of infectivity.

The observed data consist of the daily cumulative count of confirmed cases from the beginning of the 2014 epidemic, for which we infer R_0. We used the same initial time periods [0, T_exp] for estimating R_0 as used in WHO Ebola Response Team [2014]. The inference was performed separately for Guinea and Liberia, using data from March 22 to March 30, 2014, and from June 16 to August 20, 2014, respectively. Subsequent data from the following days were used to illustrate the progression of the virus.

Figure 1. Panel A: After the initial infection, the subject is in the latent stage and becomes infectious after that. The infectious period ends either in recovery or in death. Panel B: The generative model can be used with the approximate posterior distribution of the basic reproduction number to probabilistically forecast how the infection will spread in the following weeks.

In the original study, the authors determined that the initial growth period would be over during the later time period. The fact that we do not know the true onset of the virus is taken into account by generating artificial counts until we obtain one that exceeds the first observed count, and then continuing the simulation for the same number of days as in the observed dataset. In this way the artificial datasets have a level of variability similar to the observed one. The prior for R_0 is modeled as a truncated normal distribution TN(1.7, 0.5, 1.05, 4). As the summary statistic we used the median of the slopes of the log-transformed case counts, calculated from consecutive data points, and as the discrepancy measure ρ(·, ·) we used the logarithm of the Euclidean distance; a sketch of these choices is given below. The BOLFI parameters are listed in Table 1.

To verify the R_0 inference performance, a sample of size 500 was drawn from the approximate posterior distribution obtained by BOLFI using the NUTS sampler. The approximate posterior sample was propagated using the forward simulator, which allows for the investigation of the prediction intervals as a function of time. The propagation of the sample for the following days was visually compared to the actual observed cumulative case count. The wider uncertainty about R_0 in the credible intervals of our case study reflects the more complex underlying modeling assumptions, and taking such uncertainty appropriately into account may be critical in epidemic management situations. Forward-propagated posterior prediction results, consisting of the pointwise median of the simulated trajectories and the 80% and 95% probability bands, are reported in Figure 2, along with the observations used in the inference and some out-of-sample observations that were not used in the inference but illustrate forecasting accuracy.
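As an illustration of the summary statistic and discrepancy measure described above, a possible implementation is sketched here; the exact conventions used in the case study (for example the handling of zero counts) may differ, and the case counts shown are purely illustrative.

import numpy as np

def summary(cumulative_counts):
    # Median of the slopes of log-transformed case counts over consecutive days.
    log_counts = np.log(np.asarray(cumulative_counts, dtype=float))
    return np.median(np.diff(log_counts))

def discrepancy(s_sim, s_obs):
    # Logarithm of the Euclidean distance between summaries.
    return np.log(np.linalg.norm(np.atleast_1d(s_sim) - np.atleast_1d(s_obs)))

obs = [49, 60, 66, 79, 86, 96, 103, 112, 122]       # illustrative cumulative counts
sim = [47, 55, 70, 78, 90, 99, 110, 118, 130]
print(discrepancy(summary(sim), summary(obs)))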
WHO Ebola Response Team [2014] assumed, on the basis of a visual inspection, that the initial period of exponential growth would extend to 30 March for Guinea and 24 August for Liberia; these dates are illustrated with the gray vertical lines in Figure 2. Dates are reported as counts of days since the beginning of the epidemic (March 22). Our model fit is consistent with the initial growth period assumption, as the observed counts after the assumed initial growth period diverge quickly from the probable prediction curves.

Table 1: BOLFI parameters for the Ebola case study.
    N_init      5
    N_E         100
    T_update    5
    σ²_acq      0.1
    N_sample    2000

Personalised cancer treatment is an application area where simulator-based inference has a lot of potential, given the constantly improving biological generative models for the evolution of the disease under treatment [Sottoriva and Tavaré, 2010, Kozłowska et al., 2018, Lai et al., 2019]. Realistic biological generative tumor models that are built up from the molecular and cellular processes typically result in a level of complexity that renders analytical solutions infeasible. Using simulator-based methods we are able to conduct inference on such detailed models, which could in the future be used to optimize personalised treatments to achieve the best possible therapy outcomes. However, the complexity of the biological modeling often makes the generative model computationally demanding, and only active learning-based methods such as BOLFI (Algorithm 3) enable the inference to be performed within a reasonable time.

We consider an example of breast cancer tumor modeling that is a combination of empirical data and detailed computer simulation. The system is partly initialised and tuned based on a biopsy from a real patient, but we use only simulated data to demonstrate the inference and prediction procedure. The cancer simulator we use is a multi-scale pharmacokinetic and pharmacodynamic model describing the response of a cross-section of breast tumor tissue to a combination of chemotherapeutic and anti-angiogenic agents [Lai et al., 2019]. Mathematically, it consists of a hybrid cellular automaton model [Ribba et al., 2004, Alarcón et al., 2010] that couples stochastic and discrete model formalisms with deterministic and continuous components accounting for biological processes at different spatio-temporal scales; see Figure 3. There are multiple parameters in the model influencing the evolution of cancer cells, stromal cells and tumour vessels, which are all modeled as individual agents. Many of the parameters can be inferred and fixed via various means, but some of the unknown parameters have to be inferred based on the reaction of the tumour to the treatment. Here we focus on the inference of two key parameters determining the outcome of each patient under the drug treatment: the sensitivity of cancer cells to the chemotherapy, α, and the minimal cell cycle length of cancer cells, T_c. We can write the simulator as a nonlinear time series model for the evolution of the state x_t,

x_t = f_t(α, T_c, x_{t−1}, v_t),    t = 1, 2, . . . , T,

where f_t is the nonlinear transition model at time t, the simulator being temporally discretized into 30-minute increments, and v_t is the stochastic component of the simulator. The simulated state x_t consists of cells, vessels, and extracellular concentrations of oxygen, Avastin and vascular endothelial growth factor (VEGF) within the simulation grid, which here is a 33 × 20 rectangular grid representing a specific two-dimensional cross-section of the tumor.
The time series model is not available for constant monitoring, but some of its components can be indirectly measured at sparse time points using methods such as magnetic resonance imaging. For simplicity we assume that it is possible to measure the true state of the system at the observation times, y_{t_k} = x_{t_k}. In this proof-of-concept study we assume that observations can be collected every three days, t_k = k · 3 · 48, k = 1, 2, . . . , 6. The drug is administered to the patient every 3 weeks, i.e. at times t_drug,k = k · 21 · 48, k = 0, 1, 2, 3, and we aim to infer the parameters within the first 3-week treatment period, as we would want to forecast whether or not the chosen treatment is effective before the second dose. This dose could be adjusted, given the inferred parameters, to achieve an optimal outcome. Being able to reliably infer the parameters using as little data as possible from the beginning of the cancer evolution curve would enable simulated testing of personalised treatment strategies that are anticipated to be efficient for the specific patient.

We generate the synthetic observed data using the fixed parameter values T_c = 14.69 and α = 3.0 and a predetermined treatment protocol, and investigate how well we can infer the set parameters and how accurately we are able to forecast the disease progression based on the data collected within the first treatment period. If we were able to infer the patient-specific parameter values based on the observed data, then we would be able to forecast, within the accuracy of the simulator, how the patient defined by those parameter values would react also to different treatment protocols.

In our inference framework the results are approximations to the posterior distribution of the parameters, p(α, T_c | y_{t_1}, . . . , y_{t_6}), conditioned on the states observed at t_1, . . . , t_6. The prior distributions for the unknown parameters are modelled as uniform, α ∼ Uni(1, 4) and T_c ∼ Uni(1, 21). As summary statistics we use the proportions of cancerous cells in the simulation grid at each observation time,

s_{t_k} = (1/660) Σ_{i=1}^{660} y^{(i)}_{t_k,cells},

where 660 = 33 · 20 is the size of the simulation grid and y^{(i)}_{t,cells}, i = 1, . . . , 660, is the vectorized simulation grid taking the binary value 1 when the ith grid location contains a cancer cell. The simulations are computationally very demanding, and we use BOLFI to produce the approximation of the posterior. The BOLFI parameters are listed in Table 2.

After obtaining an approximate posterior based on the likelihood surrogate provided by BOLFI, NUTS is used to draw a sample from it, and the generative model is used to propagate the posterior sample in time under the selected treatment (a generic sketch of this propagation step is given below). This results in a simulation-based estimate of the posterior predictive distribution of the state, p(x_{t_end} | s_1, . . . , s_6). Here we use the twelve-week mark, t_end = 4032, as the end point; this time interval contains four full treatment periods. The inference results are illustrated in Figure 4. We see that the posterior distributions have their probability mass peaks located closely around the true values. Importantly, we are able to use the posterior distributions to probabilistically investigate how the disease will evolve under treatment. In Figure 5 we plot the summarised state of the cancer cells over the 12-week treatment period and compare it to the trajectory simulated given the true parameter values.
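The propagation step used here (and in the epidemic case study above) follows a generic pattern: each parameter draw from the approximate posterior is pushed through the forward simulator, and pointwise quantiles of the resulting trajectories form the predictive bands. A schematic sketch with a placeholder one-parameter simulator is shown below; the actual cancer simulator is far more complex and the posterior sample would come from BOLFI/NUTS rather than the stand-in used here.

import numpy as np

rng = np.random.default_rng(3)

def forward_simulator(theta, n_steps=100):
    # Placeholder for the (expensive) generative model: returns one trajectory.
    noise = rng.normal(0, 0.02, size=n_steps).cumsum()
    return theta * np.linspace(0, 1, n_steps) + noise

def posterior_predictive(posterior_sample, n_steps=100,
                         levels=(0.025, 0.1, 0.5, 0.9, 0.975)):
    # Propagate posterior draws through the simulator and return pointwise quantiles.
    trajectories = np.array([forward_simulator(theta, n_steps)
                             for theta in posterior_sample])
    return {q: np.quantile(trajectories, q, axis=0) for q in levels}

posterior_sample = rng.normal(1.0, 0.1, size=500)   # stand-in for a BOLFI/NUTS sample
bands = posterior_predictive(posterior_sample)
print(bands[0.5][-1], bands[0.025][-1], bands[0.975][-1])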
The results indicate that the forecast is quite consistent given the inferred posterior distributions, that the current treatment most likely will not erase the cancerous growth, and that other treatments should be investigated for a better outcome.

Likelihood-free inference is becoming increasingly important in astronomy, where physical models often cannot be fully characterized in terms of a tractable likelihood function [Schafer and Freeman, 2012, Cameron and Pettitt, 2012, Weyant et al., 2013, Ishida et al., 2015, Leclercq, 2018, Picchini et al., 2020]. Here we evaluate the performance improvement arising from using BOLFI (Algorithm 3) instead of the ABC-PMC algorithm (introduced in Section 2.2) on an astronomical model from Jennings and Madigan [2017]. The ABC-PMC version used for this example is slightly different from Algorithm 2. In particular, it requires us to properly tune the final number of iterations T, the first tolerance ε_1 and the quantile q_t used for adaptively decreasing the series of tolerances. We did so by following the suggestions in Lenormand et al. [2013] and Weyant et al. [2013]. The choices for these three quantities are highlighted in Section 5.1.

Starting with the SuperNova ANAlysis (SNANA) light curve package by Kessler et al. [2009] and the corresponding implementation of the SALT-II light curve fitter presented in Guy et al. [2010], a sample of 400 supernovae with redshift range z ∈ [0.5, 1.0] has been simulated and then binned into 20 redshift bins, indexed by i = 1, . . . , 20. The model (13) describing the distance modulus as a function of redshift z is the standard luminosity-distance relation for a flat universe, with

E(z) = Ω_m (1 + z)³ + (1 − Ω_m) exp( 3 ∫_0^z d ln(1 + z') [1 + w(z')] ).

The true cosmological parameters used to generate the "observed" data µ are the matter density of the universe Ω_m = 0.3, the dark energy density of the universe Ω_Λ = (1 − Ω_m) = 0.7, the present value of the dark energy equation of state ω_0 = −1.0 and, finally, the current Hubble constant h_0 = 0.7. In the following, Ω_Λ and h_0 are considered known and fixed at their input values; the goal is to estimate the two cosmological parameters Ω_m and ω_0. The original example as presented in Jennings and Madigan [2017] is available in the astroABC Python package. Jennings and Madigan [2017] added artificial noise to the data simulated through (13) by using a skew-normal distribution [Azzalini, 1985] with location, scale and skewness parameters fixed at −0.1, 0.3 and 5.0, respectively. By doing so, the commonly used MCMC algorithm for Bayesian statistical inference is no longer applicable. In fact, in order to perform the analysis with the MCMC algorithm, Jennings and Madigan [2017] instead added artificial noise to the data simulated through Eq. (13) by using a normal distribution with location and scale parameters fixed at −0.1 and 0.3, respectively. The results obtained by the MCMC algorithm (see Section 5 in Jennings and Madigan [2017]) led to a poor estimation of the parameters of interest, and therefore an ABC-based approach is preferred [Jennings and Madigan, 2017]. The goal of the analysis presented in Jennings and Madigan [2017] was to compare this slightly more complicated model (for which the ABC-PMC algorithm is required) with a simplified version for which artificial noise is not added (and for which the likelihood function is tractable and therefore MCMC is possible).
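As a hedged illustration of the forward model just described, the sketch below computes the distance modulus on a redshift grid for a flat universe with a constant equation of state w(z) = ω_0, adds skew-normal noise with the stated parameters, and evaluates a σ_i-weighted squared distance between binned observed and simulated data. The exact formulas and conventions of Jennings and Madigan [2017] (for example the precise form of Eq. (13) and of the metric ρ) may differ in detail from this sketch.

import numpy as np
from scipy.integrate import quad
from scipy.stats import skewnorm

C_KM_S = 299792.458          # speed of light in km/s

def distance_modulus(z, omega_m=0.3, w0=-1.0, h0=0.7):
    # Luminosity distance for a flat universe with constant dark energy equation of state.
    def inv_E(zp):
        return 1.0 / np.sqrt(omega_m * (1 + zp)**3
                             + (1 - omega_m) * (1 + zp)**(3 * (1 + w0)))
    integral = np.array([quad(inv_E, 0.0, zi)[0] for zi in np.atleast_1d(z)])
    d_l = (1 + z) * (C_KM_S / (100.0 * h0)) * integral      # in Mpc
    return 25.0 + 5.0 * np.log10(d_l)

def simulate(omega_m, w0, z_bins, rng):
    mu = distance_modulus(z_bins, omega_m, w0)
    # Skew-normal noise with location -0.1, scale 0.3 and skewness 5.0.
    return mu + skewnorm.rvs(a=5.0, loc=-0.1, scale=0.3,
                             size=len(z_bins), random_state=rng)

def distance(mu_obs, mu_sim, sigma):
    # Squared differences weighted by the per-bin errors sigma_i (an assumed form of rho).
    return np.sum((mu_obs - mu_sim)**2 / sigma**2)

rng = np.random.default_rng(4)
z_bins = np.linspace(0.5, 1.0, 20)
mu_obs = simulate(0.3, -1.0, z_bins, rng)
sigma = np.full(20, 0.3)
print(distance(mu_obs, simulate(0.35, -1.1, z_bins, rng), sigma))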
The contour plot of the joint distribution of (Ω_m, ω_0) obtained by using the MCMC algorithm and the marginal posterior means for Ω_m and ω_0 are available in Jennings and Madigan [2017]. Beyond retrieving reliable summaries such as marginal posterior means and marginal highest posterior density (HPD) intervals for the parameters of interest, it is of relevance from a physics standpoint to evaluate how well a likelihood-free inference approach preserves the so-called "banana shape" [Kessler et al., 2013, Hinton et al., 2019] that describes the relation between Ω_m and ω_0. The "banana shape" is not expected to change significantly after the artificial noise is added to the data, as shown by Jennings and Madigan [2017].

In order to use BOLFI (introduced in Section 2.3) and the ABC-PMC sampler for the estimation of the matter density parameter Ω_m and the dark energy equation of state parameter ω_0, two additional quantities must be specified: as highlighted in Section 2, the distance function used to compare the observed and the simulated data and the prior distributions for the parameters of interest (i.e., for this example, Ω_m and ω_0) must be defined. Since in this section we want to compare the performance of BOLFI with that of the ABC-PMC sampler, we performed the analysis for both methods using the same specifications recommended in Jennings and Madigan [2017]. The metric ρ that compares the observed data µ with the simulated data µ_sim(z) is a weighted squared distance in which the contribution of each redshift bin is scaled by σ_i, the error on the data point µ_i, estimated by calculating the sample variance of the observations in the ith bin. We note that, in this example, Jennings and Madigan [2017] do not formally define a summary statistic for the data; the summary statistic is therefore equivalent to the 20-dimensional vector of the data. The prior distributions Ω_m ∼ N(0.3, 0.5) and ω_0 ∼ N(−1.0, 0.5) are chosen, where the prior for Ω_m is consistent with the (0, 1) range for this parameter.

The ABC-PMC sampler from the astroABC package was run using N = 1000 particles, a total number of iterations T = 20 and a quantile q_t = 0.75, which is used to reduce the ABC tolerance through the iterations. It follows that, with respect to the ABC-PMC sampler defined in Algorithm 2, the vector of ABC tolerances ε_1, . . . , ε_20 is not tuned in advance by the researcher but instead defined through the approach suggested by Lenormand et al. [2013], among others. The perturbation kernel used from the second iteration onwards in Algorithm 2 follows the recommended Gaussian distribution, with variance equal to twice the empirical covariance amongst the particles for both Ω_m and ω_0. Following the choices by Jennings and Madigan [2017], the first tolerance was fixed to ε_1 = 500 and the final tolerance was ε_20 = 29.82. We tried different combinations of T and q_t in order to identify a suitable level of the final tolerance at which to stop the ABC-PMC algorithm, resulting in a final tolerance close to 30.

Table 3: BOLFI parameters for the estimation of the supernova model.

The computational efficiency of both the ABC-PMC sampler and BOLFI was investigated. Our parameter choices for BOLFI are summarized in Table 3. We used a Metropolis-Hastings algorithm to produce the N_sample approximate posterior draws. With the selected particle sample size of N = 1000, the ABC-PMC sampler takes 90 minutes to produce the final ABC posterior distribution; in comparison, BOLFI produces the posterior distribution in 3 minutes.
The gain in computational efficiency is only one of the advantages obtained by using BOLFI over the ABC-PMC sampler. Figure 6 displays the contour plots of the joint distribution of (Ω_m, ω_0) obtained by the ABC-PMC sampler and by BOLFI, while the point estimates for Ω_m and ω_0 obtained by the ABC-PMC analysis (the weighted marginal posterior means) and the estimates retrieved by BOLFI (the marginal posterior means and marginal posterior medians) are summarized in Table 4. BOLFI provides marginal posterior means (Ω_m = 0.29 and ω_0 = −1.06) that are closer to the true values than the corresponding estimates provided by ABC-PMC (Ω_m = 0.36 and ω_0 = −1.22). Both procedures are able to reconstruct the expected "banana shape", although the contour plot obtained by BOLFI presents a smaller lower tail compared with the one retrieved by using the ABC-PMC algorithm. This observation is also confirmed by the marginal 90% HPD credible intervals for Ω_m and ω_0, reported in parentheses in Table 4, which indicate that the probability mass in the "banana shape" produced by BOLFI lies within a more compact region, with a smaller lower tail, than that retrieved by using the ABC-PMC algorithm.

Figure 6: The point estimates obtained by BOLFI are closer to the true values for Ω_m and ω_0 than the corresponding estimates provided by ABC-PMC (see Table 4). The expected "banana shape" is reconstructed by both methods, although the contour plot obtained by BOLFI presents a smaller lower tail than the one retrieved by using the ABC-PMC algorithm.

6 ABC forecasting with an application to optimal portfolio allocation

Thus far the discussion has centred primarily on the use of ABC as an inferential method, and on improving the performance of more basic versions of ABC via BOLFI. Whilst BOLFI has been used for prediction in two of the three previous illustrations, any comparison with predictions that would have been produced via exact (likelihood-based) Bayesian inference has not been possible, due simply to the fact that the likelihood function is inaccessible (or, at the very least, challenging) in the given examples. In this section, we step back from the illustration of ABC in situations where it is essential, to an artificial situation in which the exact posterior and, hence, the exact predictive, is available. The aim of the exercise is to illustrate that an ABC-based predictive distribution can be a very accurate approximation of the exact predictive and, hence, yield equally reliable forecasts. This then provides some reassurance that prediction based on an LFI method has value in those cases where it is indeed the only option, such as those illustrated in this paper. We revert here to the simplest form of rejection ABC in order to emphasize that predictive accuracy does not necessarily depend on using an optimal version of the inferential algorithm.

Let Y_{n+1} denote a scalar random variable, observed at time n + 1, and generated according to p(y_{n+1} | θ, y_obs), where y_obs = [y_1, y_2, ..., y_n]^T. The exact Bayesian predictive (or forecast distribution; we use the terms 'forecast' and 'prediction', and their variants, synonymously) is

p(y_{n+1} | y_obs) = ∫_Θ p(y_{n+1} | θ, y_obs) p(θ | y_obs) dθ,

where p(θ | y_obs) is the exact posterior, defined in the usual way, and y_{n+1} denotes a value in the support of Y_{n+1}.
In cases where p(y_obs | θ) and, hence, p(θ | y_obs), is inaccessible, p(y_{n+1} | y_obs) is also inaccessible, and a natural solution is to define the approximate Bayesian predictive,

g(y_{n+1} | y_obs) = ∫_Θ p(y_{n+1} | θ, y_obs) π_ε(θ | y_obs) dθ,

with g(y_{n+1} | y_obs) produced using the ABC posterior π_ε(θ | y_obs) in (1), which could be accessed using either Algorithm 1 or Algorithm 2. Alternatively, BOLFI (Algorithm 3) could be used to produce an approximate posterior sample, as described in Section 2.3, noting that a notational change would be required to represent this approximate posterior in the expression for g(y_{n+1} | y_obs). In an extensive exploration, it has been demonstrated, both theoretically and in practical situations, that the differences between g(y_{n+1} | y_obs) and p(y_{n+1} | y_obs) can be negligible, despite there sometimes being substantial differences between the approximate and exact posteriors. Moreover, ABC-based forecasting can produce reliable forecasts in a fraction of the time required by exact methods, owing to the speed with which π_ε(θ | y_obs) can be constructed, relative to p(θ | y_obs). The simplicity and computational speed of ABC-based forecasting is important in the sphere of economics and finance, where the need to predict the actions, and interactions, of large numbers of economic 'agents' leads to complex dynamic models that often challenge the MCMC toolkit and exact Bayesian forecasting. To illustrate this novel use of ABC, we document the performance of ABC-based forecasting, relative to exact forecasting, in a particular empirical example: the production of a utility-optimizing financial portfolio. We reiterate that our illustration is based on the simplest version of ABC, as given in Algorithm 1.

At time n, an investor chooses to allocate her wealth W_n across m possible investment choices, where at time (n + 1) the different investment choices yield random returns denoted by R_{n+1,i}, i = 1, 2, ..., m, with R_{n+1} = [R_{n+1,1}, ..., R_{n+1,m}]^T. For ∆_m := {α ∈ [0, 1]^m : Σ_{i=1}^m α_i = 1}, the individual's wealth at time (n + 1) is given by W_{n+1} = W_n[1 + α^T R_{n+1}]. The goal of portfolio analysis is to discern an "optimal" allocation rule α_opt ∈ ∆_m for the portfolio α^T R_{n+1}. The portfolio allocation problem exhibits different solutions depending on the definition of "optimality". A common approach in economics and finance is to find α_opt by maximizing the expected utility (of wealth) using von Neumann-Morgenstern expected utility (EU) theory [von Neumann and Morgenstern, 1953]: for u(·) : [0, ∞) → R a utility function, and E_n(·) denoting the expectation conditional on the information available at time n, the optimal allocation rule is

α_opt = argmax_{α ∈ ∆_m} E_n[u(W_{n+1})].

For this illustration we consider the canonical risk-averse investor with power utility function, u(W) = W^{1−γ}/(1 − γ), where γ > 1 denotes the risk aversion parameter. Given a model for conditional returns, p(R_{n+1} | R_obs), where R_obs = [R_1, R_2, ..., R_n]^T, the allocation rule that maximizes the expected utility is given by

α_opt,n+1 = argmax_{α ∈ ∆_m} ∫ u(W_n[1 + α^T R_{n+1}]) p(R_{n+1} | R_obs) dR_{n+1}.    (16)

Given that p(R_{n+1} | R_obs) is typically unavailable in closed form, we must resort to simulation to compute the integral in (16): if we can obtain M draws from p(R_{n+1} | R_obs), denoted R^{(j)}_{n+1}, j = 1, ..., M, the value of α_opt,n+1 can be approximated numerically by solving

α̂_opt,n+1 = argmax_{α ∈ ∆_m} (1/M) Σ_{j=1}^M u(W_n[1 + α^T R^{(j)}_{n+1}]).    (17)

The optimally allocated portfolio at time n + 1 is then given by W_opt,n+1 = W_n[1 + α̂_opt,n+1^T R_obs,n+1], with utility u(W_opt,n+1), where R_obs,n+1 denotes the observed value of R_{n+1}.
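The optimization in (17) reduces to a one-dimensional numerical search when there are only two assets, as in the empirical illustration that follows. The sketch below uses placeholder predictive draws of the risky return; in the actual exercise the draws R^{(j)}_{n+1} would come from the exact or ABC-based predictive distribution, and the returns would be constructed from the logarithmic return series described in the next paragraphs.

import numpy as np
from scipy.optimize import minimize_scalar

def power_utility(w, gamma=4.0):
    # Power utility for a risk-averse investor with gamma > 1.
    return w**(1 - gamma) / (1 - gamma)

def optimal_weight(risky_draws, rf, w_n=1.0, gamma=4.0):
    # Maximize the Monte Carlo estimate of expected utility over the risky weight.
    def neg_expected_utility(alpha):
        wealth = w_n * (1 + alpha * risky_draws + (1 - alpha) * rf)
        return -np.mean(power_utility(wealth, gamma))
    res = minimize_scalar(neg_expected_utility, bounds=(0.0, 1.0), method="bounded")
    return res.x

rng = np.random.default_rng(5)
# Placeholder predictive draws of the simple return on the risky asset.
risky_draws = rng.normal(0.005, 0.04, size=2000)
print(optimal_weight(risky_draws, rf=0.002))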
Repeating this exercise over an evaluation period yields a series of such utility values, which can be averaged to produce an estimate of the expected utility. In what follows, we demonstrate that, in this particular representative example, there is negligible difference between the values of E_n[u(W_opt,n+1)] produced via the exact and approximate predictives associated with a given model for R_{n+1}.

For the illustration we consider a very simple portfolio, W_{n+1} = W_n[1 + α_{n+1}^T R_{n+1}], where R_{n+1} = [exp(y_{n+1}), exp(rf_{n+1})]^T and α_{n+1} = [α_{n+1}, 1 − α_{n+1}]^T, with y_{n+1} the logarithmic return on the S&P500 market portfolio and rf_{n+1} the (known) logarithmic return on a one-month constant maturity US Treasury Bill, over the period n to (n + 1). The monthly S&P500 Total Returns index is sourced from the Chicago Board of Options Exchange (CBOE), and captures both capital gains and dividend yields. The Treasury Bill series is sourced from the Federal Reserve St Louis (FRED) database. The weight α_{n+1} ∈ [0, 1] determines the proportion of wealth allocated to the risky market portfolio, and is chosen according to (17). (See, for example, Billio et al. [2013].) The monthly data extend from June 1986 to June 2018, totaling 384 observations. The first 324 observations are reserved for inference/training and the final 60 observations (five years) are used to estimate the expected utility.

We produce the 60 one-step-ahead exact and approximate predictive distributions over the evaluation period based on a stochastic volatility model for y_t, with latent log-volatilities V_t and innovations ε_t i.i.d. ∼ N(0, 1) and v_t i.i.d. ∼ N(0, 1), where ε_t and v_t are independent for all t = 1, 2, ..., n. The unknowns comprise the static parameter vector θ = [ρ, σ, ω, µ]^T, the vector of (in-sample) latent volatilities V = [V_1, V_2, ..., V_n]^T, and the unknown V_{n+1} on which y_{n+1} is conditioned. This model is adequate for capturing the behaviour of monthly returns, and allows for a ready application of an exact algorithm for this comparative exercise.

Expanding windows are used to produce five predictives, one exact and four approximate, for the 60 time points in the evaluation period, with draws of θ updated only yearly in all cases. Draws from p(θ, V | y_obs), used to estimate the exact predictive, are produced using a particle Metropolis-Hastings (PMH) scheme [Andrieu and Doucet, 2010]. The four approximate predictives are produced using an auxiliary likelihood-based ABC approach to specify the summary statistics in Algorithm 1. Four alternative models from the generalized autoregressive conditionally heteroscedastic (GARCH) family of volatility models are chosen to define the auxiliary likelihood: a GARCH model with normal errors, a GARCH model with Student's t errors, an asymmetric GARCH model with normal errors, and an asymmetric GARCH model with Student's t errors; a schematic sketch of such an auxiliary-likelihood summary appears below. We refer the reader to Chapter 9 of Brooks [2014] for a description of these, and related, models; we simply highlight here that such conditionally deterministic volatility models are suitable for an auxiliary likelihood-based version of ABC, given the closed-form nature of their likelihoods. A nearest-neighbour version of rejection ABC is used, with N = 34992 and a selection probability of 0.86%; see Frazier et al. [2018] for an explanation of the rule adopted to determine these values for a given sample size n. For each predictive, M = 2000 draws are used to produce an optimal weight, as per (17), the associated optimal portfolio, and its utility.
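To make the idea of auxiliary likelihood-based summaries concrete, the sketch below fits a simple GARCH(1,1) model with normal errors by (quasi-)maximum likelihood and uses the fitted auxiliary parameters as the summary statistic of a return series. This is a schematic stand-in for the four auxiliary models used in the study, not the authors' exact implementation; the starting values, optimizer and stationarity constraints are assumptions made for illustration.

import numpy as np
from scipy.optimize import minimize

def garch11_negloglik(params, y):
    # Negative Gaussian log-likelihood of a GARCH(1,1):
    # sigma2_t = w + a * y_{t-1}^2 + b * sigma2_{t-1}.
    w, a, b = params
    if w <= 0 or a < 0 or b < 0 or a + b >= 1:
        return np.inf
    sigma2 = np.empty_like(y)
    sigma2[0] = np.var(y)
    for t in range(1, len(y)):
        sigma2[t] = w + a * y[t - 1]**2 + b * sigma2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + y**2 / sigma2)

def auxiliary_summary(y):
    # Summary statistic: the GARCH(1,1) quasi-MLE fitted to the demeaned returns.
    y = np.asarray(y, dtype=float) - np.mean(y)
    res = minimize(garch11_negloglik, x0=[0.1 * np.var(y), 0.05, 0.9],
                   args=(y,), method="Nelder-Mead")
    return res.x

rng = np.random.default_rng(6)
returns = rng.standard_t(df=7, size=324) * 0.04    # placeholder monthly return series
print(auxiliary_summary(returns))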
The 60 values of utility are then averaged for each predictive method, yielding five such numbers. Before discussing the resulting (estimated) expected utilities, we present representative exact and approximate predictives in Panels A and B of Figure 7, for two particular months. We label the predictive estimated using PMH as 'Exact', and the approximate predictives constructed using the four auxiliary models listed above as 'ABC1', 'ABC2', 'ABC3' and 'ABC4', respectively. The plots demonstrate that there is very little to distinguish between the approximate and exact predictive methods, in particular for June 2018. The similarity between the exact and approximate predictives is further borne out in the expected utility calculations. For a fixed risk aversion parameter of γ = 4, the exact approach yields an expected utility of −4.72, while the four approximate methods yield expected utilities of −4.69, −4.70, −4.69 and −4.69. Moreover, the computation time required to produce the exact expected utility is ten times larger than the time required to produce the expected utility via the slowest of the approximate methods, and fifteen times greater than the computation time for the fastest approximate method. A comprehensive set of expected utility estimates was produced for different degrees of risk aversion, with the same negligible difference between exact and approximate results in evidence.

ABC and similar likelihood-free inference methods are emerging as an important part of the analysis toolbox in various application areas where we are able to simulate realistic data, but where the models are too complicated for likelihood-based inference. Here we have demonstrated various aspects of using this approach in challenging real-world applications that go beyond the typical benchmark examples used in the literature. A particularly interesting combination of inference and generative modeling is the possibility of learning the parameters of a system based on observed data and then simulating its performance under various scenarios, treatments or interventions. For example, one of our case studies involved predicting a patient-specific cancer treatment outcome, which could in the future be an important part of treatment planning, especially as the mathematical modeling of disease evolution under treatment is rapidly improving [Kozłowska et al., 2018]. A similar application is found in policy planning for epidemics, as the models of infection control are improving and different 'what-if' scenarios can be explored. Such activities have recently been performed extensively across the world during the COVID-19 pandemic, and the iteratively improving understanding of the epidemiology of the virus illustrates the need for proper representations of uncertainty in the model components. As modeling proficiency increases, we will be faced with even more difficult inference tasks in higher-dimensional domains, with computationally heavier simulators and, in some applications, even requirements for carrying out the inference (and prediction) in real time. These challenges call for further advances from the statistics and computation community on both the inference algorithms and their efficient implementation via user-friendly software.
References

A multiple scale model for tumor growth
Estimating the reproduction number of ebola virus (ebov) during the 2014 outbreak in west africa
BSL: An R package for efficient parameter estimation for simulation-based models via Bayesian synthetic likelihood
Accelerating Bayesian synthetic likelihood with the graphical lasso
Particle Markov chain Monte Carlo methods
A class of distributions which includes the normal ones
Approximate Bayesian computation
Adaptive approximate Bayesian computation
Time-varying combinations of predictive densities using nonlinear filtering
Approximate Bayesian computation: a nonparametric perspective
Estimation in emerging epidemics: biases and remedies
A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
Introductory Econometrics for Finance
Approximate Bayesian computation for astronomical model analysis: a case study in galaxy demographics and morphological transformation at high redshift
The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak
A preferential attachment model for the stellar initial mass function
Componentwise approximate Bayesian computation via Gibbs-like steps
Frequency-dependent selection in vaccine-associated pneumococcal population dynamics
The frontier of simulation-based inference
Approximate Bayesian computation (ABC) in practice
Approximate Bayesian computation using indirect inference
Bayesian indirect inference using a parametric auxiliary model
Estimation of parameters for macroparasite population evolution using approximate Bayesian computation
ABCpy: a user-friendly, extensible, and parallel library for approximate Bayesian computation
Likelihood-free parameter estimation for dynamic queueing networks: case study of passenger flow in an international airport terminal
Constructing summary statistics for approximate Bayesian computation: semi-automatic approximate Bayesian computation
Approximate Bayesian computation with surrogate posteriors
Asymptotic properties of approximate Bayesian computation
Approximate Bayesian forecasting
Model misspecification in approximate Bayesian computation: consequences and diagnostics
Hierarchical reinforcement learning explains task interleaving behavior
A review of approximate Bayesian computation methods via density estimation: inference for simulator-models
Bayesian computation: a summary of the current state, and samples backwards and forwards
Bayesian optimization for likelihood-free inference of simulator-based statistical models
Likelihood-free inference via classification
The supernova legacy survey 3-year sample: type Ia supernovae photometric distances and cosmological constraints
Steve: a hierarchical Bayesian model for supernova cosmology
The no-U-turn sampler: adaptively setting path lengths in Hamiltonian Monte Carlo
cosmoabc: likelihood-free inference via population Monte Carlo approximate Bayesian computation
astroABC: an approximate Bayesian computation sequential Monte Carlo sampler for cosmological parameter estimation
Approximately sufficient statistics and Bayesian computation
Efficient acquisition rules for model-based approximate Bayesian computation
Inferring cognitive models from data using approximate Bayesian computation
Parameter inference for computational cognitive models with approximate Bayesian computation
SNANA: a public software package for supernova analysis
Testing models of intrinsic brightness variations in type Ia supernovae and their impact on measuring cosmological parameters
Pylfire: Python implementation of likelihood-free inference by ratio estimation
Mathematical modeling predicts response to chemotherapy and drug combinations in ovarian cancer
Towards personalized computer simulation of breast cancer treatment: a multiscale pharmacokinetic and pharmacodynamic model informed by multi-type patient data
A scalable solver for a stochastic, hybrid cellular automaton model of personalized breast cancer therapy
Bayesian optimization for likelihood-free cosmological inference
Adaptive approximate Bayesian computation for complex models
On the asymptotic efficiency of approximate Bayesian computation estimators
Convergence of regression-adjusted approximate Bayesian computation
Fundamentals and recent developments in approximate Bayesian computation
ELFI: Engine for Likelihood-Free Inference
Resolving outbreak dynamics using approximate Bayesian computation for stochastic birth-death models
Likelihood-free inference with emulator networks
Approximate Bayesian computational methods
Relevant statistics for Bayesian model choice
Markov chain Monte Carlo without likelihoods
Auxiliary likelihood-based approximate Bayesian computation in state space models
Inference in epidemic models without likelihoods
An adaptive sequential Monte Carlo method for approximate Bayesian computation
Likelihood-free inference in high dimensions with synthetic likelihood
Sequential neural likelihood: fast likelihood-free inference with autoregressive flows
Adaptive MCMC for synthetic likelihoods and correlated synthetic likelihoods
Adapting the ABC distance function
Bayesian synthetic likelihood
Population growth of human Y chromosomes: a study of Y chromosome microsatellites
Gaussian processes for machine learning
The use of hybrid cellular automaton models for improving cancer therapy
Likelihood-free approximate Gibbs sampling
Statistical Challenges in Modern Astronomy V, chapter 1
Pneumococcal quorum sensing drives an asymmetric owner-intruder competitive strategy during carriage via the competence regulon
Optimizing threshold-schedules for sequential approximate Bayesian computation: applications to molecular systems
Machine learning accelerated likelihood-free event reconstruction in dark matter direct detection
Adaptive approximate Bayesian computation tolerance selection
Handbook of Approximate Bayesian Computation
Integrating approximate Bayesian computation with complex agent-based models for cancer research
Gaussian process optimization in the bandit setting: no regret and experimental design
Inferring coalescence times from DNA sequence data
sbi: a toolkit for simulation-based inference
Likelihood-free inference by ratio estimation
Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems
On the use of approximate Bayesian computation Markov chain Monte Carlo with inflated tolerance and post-correction
Theory of Games and Economic Behavior
Likelihood-free cosmological inference with type Ia supernovae: approximate Bayesian computation for a complete treatment of uncertainty
WHO Ebola Response Team. Ebola virus disease in West Africa: the first 9 months of the epidemic and forward projections
Statistical inference for noisy nonlinear ecological dynamic systems