key: cord-0181917-eml6pore
authors: Nardini, John T.; Baker, Ruth E.; Simpson, Matthew J.; Flores, Kevin B.
title: Learning differential equation models from stochastic agent-based model simulations
date: 2020-11-16
journal: nan
DOI: nan
sha: 9464e7daec5ac465745ca22662de875f564732a3
doc_id: 181917
cord_uid: eml6pore

Agent-based models (ABMs) are a flexible framework that are frequently used for modeling many biological systems, including cell migration, molecular dynamics, ecology, and epidemiology. Analysis of ABM dynamics can be challenging due to their inherent stochasticity and heavy computational requirements. Common approaches to ABM analysis include extensive Monte Carlo simulation of the ABM or the derivation of coarse-grained differential equation (DE) models to predict the expected or averaged output from an ABM. Both of these approaches have limitations, however, as extensive computation of complex ABMs may be infeasible, and coarse-grained DE models can fail to accurately describe ABM dynamics in certain parameter regimes. We propose that methods from the equation learning (EQL) field provide promising novel approaches for ABM analysis. Equation learning is a recent field of research from data science that aims to infer DE models directly from data. We use this tutorial to review how methods from EQL can be used to learn DE models from ABM simulations. We demonstrate that this framework is easy to use, requires few ABM simulations, and accurately predicts ABM dynamics in parameter regions where coarse-grained DE models fail to do so. We highlight these advantages through several case studies involving two ABMs that are broadly applicable to biological phenomena: a birth-death-migration model commonly used to explore cell biology experiments and a susceptible-infected-recovered model of infectious disease spread.

Complex interactions between individuals are a crucial aspect of many biological processes: honeybees dance to direct others to food sources Couzin et al. [2005] , cells push their neighbors to promote invasion during tumorigenesis Schmidt and Friedl [2009] , and animal herds aggregate together to deter predation Binny et al. [2020] . Agent-based models (ABMs) are invaluable tools to simulate how such interactions between individuals scale to population-wide phenomena D' Orsogna et al. [2006] . In an ABM, the states and decisions of individual agents are simulated using pre-defined rules to govern the agents' interactions and behavior . The ease of construction of ABMs by domain experts and modelers allows for complex models that can capture rich dynamical behavior .

Predicting the emergent behavior of ABMs can be a challenging task, however, due to their inherent stochastic nature and analytic intractability. As such, the most straightforward and commonly-used approach to interrogate ABMs is extensive Monte Carlo simulation using well-established computational algorithms (Arrow 1 of Figure 1 ). Average ABM behaviors at fixed parameter values can be inferred from many simulations from the central limit theorem Seber and Wild [1988] , and the inverse problem of inferring model parameter distributions can be done with Markov chain Monte Carlo samplers Beheshti and Sukthankar [2013] . Unfortunately, such extensive simulation of many ABMs may not be feasible due to significant computational costs involved. An alternative method to predict the emergent behavior of an ABM consists of deriving coarse-grained differential equation (DE) models to approximate ABM output (Arrow 2 of Figure 1 ). The most commonly-used DE model approximations for ABMs are mean-field models Baker and Simpson [2010] , Fadai et al. [2019] . Such models describe the evolution of population density over time (and possibly space) and can be derived by approximating agent-agent interactions with locally-averaged agent densities Fadai et al. [2019] . DE models are often simple to solve (either analytically or numerically), so they provide an advantageous alternative to extensive simulation of the ABM. Furthermore, such DE Figure 1: An illustration of current (blue) and proposed (red) methods to predict emergent ABM behavior. Extensive simulation (Arrow 1) is performed by running many ABM simulations over a range of parameter values, and then using Monte Carlo techniques to average ABM output. While this approach will accurately predict ABM dynamics, it can be computationally intensive to perform. DE models derived using model coarse-graining approaches (Arrow 2) can be analyzed (e.g., using bifurcation analysis or perturbation methods). This technique is advantageous because such analytical methods do not require any computation. Unfortunately, coarse-grained models will provide inaccurate predictions in many parameter regimes. We propose that DE models can be learned from ABM simulation data using techniques from equation learning (Arrow 3). This method is advantageous because it may only require a small number of ABM simulations, will lead to a DE model that can predict ABM dynamics accurately, and can be informed with analytical techniques. models are amenable to analytical techniques (including bifurcation, traveling wave, perturbation analysis), which can be used to predict how ABM output will change in response to variations in parameter values J. D. Murray [2012] , Murray [2002] .

Previous ecological studies have demonstrated some of the advantages of both extensive simulation and model coarse-graining for ABM analysis Bernoff and Topaz [2013] . For example, Bernoff et al. Bernoff et al. [2020] model the foraging behavior of the Australian plaque locust with a discrete and stochastic ABM. In the model, individual locusts forage and feed on a given resource (representative of food) and, in turn, create a gradient of this resource. The model robustly shows that individual locust behavior drives the formation of this resource gradient and, in turn, determines how the averaged profile of locust density migrates and forms over time. The authors derive the mean-field partial differential equation (PDE) model for this ABM and perform a traveling wave analysis to quantify how the locust population's invasion speed depends on the total mass of locusts. The mean-field PDE model is shown to match the ABM output well in biologically consistent parameter regimes. In addition, non-mean-field models have been considered to approximate other ABMs of locust behavior. For example, the ABMs in Dkhili et al. [2017] describe self-organizing locust behaviors through rules governing locust attraction, repulsion, and alignment during foraging and invasion. By simulating the ABM over many different parameter values, Dkhili et al. Dkhili et al. [2017] discovered three distinct population patterns (spot, band, and ribbon formations). Topaz et al. Topaz et al. [2012] analyzed a continuous partial integro-differential equation as a representation of this locust flocking behavior and used a linear stability analysis to provide analytical insights into which parameter values governing agent interactions lead to the formation of such spatial patterns. There are thus many scenarios in which DE models supplement ABM simulations to aid in our understanding of emergent behavior.

Despite their wide use, coarse-grained models can provide misleading predictions of ABM dynamics in regions of parameter space in which the assumptions made during the coarse-graining process do not hold Baker and Simpson [2010] , Fadai et al. [2019] , Matsiaka et al. [2017] . Furthermore, it can be challenging to determine informative DE models for more complex ABMs. As one such example, Gallaher et al. Gallaher and Anderson [2013] constructed an ABM in which thousands of cells with different phenotypes compete for space during tumor growth. Each agent in the simulation is given an internal set of dynamic traits dictating how fast the agent moves in space and how frequently it divides. The intricate dynamics of this model allow for interesting findings of biological relevance, including how transmission of proliferation rates from parent to daughter cells alters the final trait landscape of the population and, in turn, the eventual physical clustering of the population. Formulating a DE model for this process from ABM rules would be challenging, however, due to the many different cell phenotypes and complicated rules between such cells. Instead, methods to directly infer DE models from a small number of ABM simulations may provide a useful tool for modelers to determine the salient features necessary for modeling complex ABM dynamics.

Equation learning (EQL) is a recent field from data science that aims to infer the dynamical systems model that best describes a given dataset Brunton et al. [2016] . The learned models can, in turn, be used to understand the system under study by providing a mechanistic description of observed dynamics or predicting how dynamics will change in response to different conditions. There has been much progress in this field over the past five years largely thanks to increases in computational power, and many EQL methods can accurately recover DE models from artificially simulated noisy data from DE models Lagergren et al. [2020b] , Rudy et al. [2017] , Zhang and Lin [2018, 2019] . There have been some recent studies showing that equations can be learned from noisy experimental data Lagergren et al. [2020a] , and the EQL field is now in a position where EQL can, in principle, be used to aid in the development of DE models to approximate the dynamics of complex ABMs (Arrow 3 of Figure 1 ). Similar to coarse-grained DE models, learned DE models can be analyzed using both computational algorithms and analytical techniques to infer the emergent behavior that results from a given set of ABM, or estimate mechanistic parameters from data. Furthermore, such learned equations may have fewer computational requirements than ABMs if they can learn ABM dynamics from a small number of simulations. As a result, EQL methods may be tractable for learning DE models for a broad range of ABM simulations. There are many ways in which learned DE models may be useful for ABM analysis, including the discovery of novel DE models, predicting unobserved dynamics from complex ABMs, and enabling accurate parameter estimation. Furthermore, as ABMs imitate many key features of biological systems (including stochasticity and heterogeneity), inferring DE models from ABM data is an intermediate step towards developing algorithms to aid the discovery of models from experimental data.

This article is intended to serve as a tutorial on three separate but synergistic methods (extensive simulation, coarse-grained model derivation, and EQL) to infer the emergent behavior of ABMs. The first two are frequently used for ABM analysis, and we propose that EQL will provide added insight into ABMs for modelers. We will showcase each of these methods and highlight the advantages and limitations of each. In Section 2, we discuss ABM setup and implementation as well as how simple ABM rules can be coarse grained directly into DE models. In Section 3, we discuss how methods from EQL can be used to learn DE models from ABM data directly. We then focus on several case studies in Section 4 and address specific questions on how EQL methods can be used in practice. We use two representative ABMs throughout: a birth-death-migration (BDM) model of population growth Baker and Simpson [2010] , and a susceptible-infected-recovered (SIR) model of infectious disease spread. We note, however, that the approaches discussed within this tutorial are broadly applicable to many social and biological phenomena that have been modeled by ABMs previously , Bonabeau [2002] , Cosgrove et al. [2015] , Interian et al. [2017] , Stevens [2000] , Stevens and Othmer [1997] . We make final conclusions, summarize the advantages and limitations of each method, and suggest future avenues for research in Section 5. The code for all tutorials and case studies shown in this study are publicly available at https://github.com/johnnardini/Learning-DE-models-from-stochastic-ABMs.

Coarse-grained DE models are now frequently used to investigate how rules governing individual behaviors translate to emergent behavior at the population level Anguige and Schmeiser [2009] , Simpson [2010], Fadai et al. [2019] . In this section, we illustrate this approach by introducing two simple ABMs and coarse-graining them to give DE models. We consider two ABMs: (i) a BDM process in Section 2.1, and (ii) a SIR model in Section 2.2. While we focus on ordinary differential equation (ODE) models throughout this article, the derivation of PDE models in the presence of spatial heterogeneity can also be performed using extensions of the methods presented herein, as discussed in Fadai et al. 

We consider the BDM process, a lattice-based ABM in which agents are able to give birth, die, and move Baker and Simpson [2010] . This ABM is representative of many biological phenomena, for example, agents may represent cells during the wound healing process Johnston et al. [2014] or the invasion of animals in ecology Bernoff and Topaz [2013] . We begin by introducing the ABM rules in Section 2.1.1 and then coarse grain these rules into DE models and compare to ABM output in Section 2.1.2.

We use a two-dimensional square lattice with a spacing of ∆ between adjacent sites. We arbitrarily set ∆ = 1 and assume the lattice has X lattice sites in each spatial dimension. Each lattice site is indexed by α = (i, j) ∈ N 2 , i, j = 1, . . . , X. For each interior lattice site, α, we define its neighboring sites, B(α), using the Von Neumann neighborhood B(α) = {(i, j + 1), (i, j − 1), (i + 1, j), (i − 1, j)} and adjust this definition at boundary sites to enforce no-flux conditions. We designate the occupancy of each lattice site α as µ α (t) = 0 if α is unoccupied at time t or by µ α (t) = A if α is occupied by an agent at time t. For simplicity, we will also use the notation 0 α (t) and A α (t) when α is unoccupied and occupied, respectively. To represent volume excluding effects, we assume that each lattice site can be occupied by a maximum of one agent at a time.

We next define how agents proliferate, migrate, and die. For proliferation events, we assume that agents proliferate with rate P p (formally, an agent will attempt to proliferate over an infinitesimal timestep of length dt with probability P p dt). If an agent chooses to proliferate, then it will attempt to place its daughter cell into a neighboring site β ∈ B(α) with the choice of β made uniformly at random. If the chosen site is occupied, then the event is aborted. This process may be written as a bimolecular reaction with rate P p /4:

The reaction rate here is divided by four because proliferating agents randomly choose one of their four neighboring sites to place their daughter cell into. For death events, lattice sites transition from occupied to unoccupied. We assume agents die with rate P d , and write death events as a monomolecular reaction with rate P d :

Agents attempt to move with rate P m . During migration events, agents randomly choose a neighboring lattice site β ∈ B(α) to attempt to move to. If the chosen site is already occupied, then the migration event is aborted. This process may be written as a bimolecular reaction with rate P m /4:

The reaction rate here is divided by four because migrating agents randomly choose one of their four neighboring sites to place their daughter cell into. The BDM model can be simulated using the Gillespie algorithm Gillespie [1977] , which is provided in Fadai et al. [2019] . Each ABM simulation is initialized by placing agents uniformly at random throughout the lattice so that 5% of lattice sites are occupied. Reflecting boundary conditions are used at the boundaries of the lattice, which enforces a no-flux condition in the spatial domain. We use the following notation to summarize the output from an ABM simulation. To estimate the total agent density from the n th of N identically prepared ABM simulations we compute

where C n (t) is the number of occupied sites at time t. To estimate the averaged agent density over time from N identically prepared ABM simulations, we compute

We depict snapshots of two simulations of the BDM process in Figure 2 . The blue dots in the right-most column correspond to C 1 ABM (t) from these individual simulations. Simpson et al. [2013] have demonstrated that the mean-field DE model for the BDM process described in Section 2.1 is given by the logistic DE model d dt

This model is advantageous in that it can be solved analytically to give

where r = P p −P d , K = (P p −P d )/P p , and C(0) denotes the initial condition. The full derivation of this model is provided in Appendix A. To arrive at Equation (6), we use the mean-field assumption that the occupancies of neighboring lattice sites are independent, i.e. for all sites α,

is the probability that lattice site α is occupied at time t and P[A α (t), A β (t)] is the joint occupancy probability of neighboring lattice sites α and β at time t.

Mean-field models are widely used to predict ABM dynamics, however they fail to accurately predict dynamics in regions of parameter space in which the mean-field assumption is violated Baker and Simpson [2010] , Fadai et al. [2019] , Matsiaka et al. [2017] , . We depict ABM snapshots for two simulations of the BDM process in Figure 2 . The mean-field assumption seems to be satisfied during the simulation with (P p , P d ) = (0.01, 0.0025): agents appear uniformly distributed, which indicates that Figure 2 : ABM simulation snap shots for the BDM ABM. Blue pixels denote A α (t) and white pixels denote 0 α (t). Simulations were computed with agent migration rate P m = 1. The right-most column depicts C ABM (T ) (blue line and dots) against the solution of the mean-field DE model (solid black line). Quantities are plotted against nondimensionalized time T = (P p − P d )t for ease of interpretation.

neighboring site occupancies are independent of each other. As expected, we observe close agreement between the mean-field model and C ABM (t) for this parameter combination. For the simulation with (P p , P d ) = (1, 0.25), however, the ABM simulation exhibits clusters of agents. In this case, neighboring site occupancies will be dependent within and outside of cluster regions, which violates the mean-field assumption. As a result, we observe poor agreement between the mean-field model and the ABM simulation output, C ABM (t) . Similar results have been documented previously for this model over a wide range of parameter values: the mean-field model matches ABM output well for small values of P p /P m and P d /P m (P m is fixed at unity in all simulations in this study), but this agreement worsens as either of these ratios increase Baker and Simpson [2010] .

Susceptible, infected, and recovered (SIR) models are used in epidemiology to model and predict the emergence of infection diseases Blackwood and Childs [2018] . In this section, we detail how the previous modeling framework can be extended to derive DE models for such ABMs of disease spread. We introduce model rules in Section 2.2.1 and then derive the mean-field DE model and compare it to ABM output in Section 2.2.2.

We use an equivalent lattice to that presented in Section 2.1. Each lattice site, α, can now take one of four states over time: 0 α (t), S α (t), I α (t), and R α (t) denote that α is unoccupied, or occupied by a susceptible, infected, or recovered agent at time t, respectively. We assume three rules governing how agents move, infect, and recover in our SIR model. For agent movement, we assume that each agent moves with rate P m . When an agent attempts to migrate, the agent chooses a neighboring lattice site β ∈ B(α) randomly to move to. If the chosen site is already occupied, then the migration event is aborted. We write this process as a bimolecular reaction with rate P m /4:

for Y ∈ {S, I, R}.

The second rule governs infection of agents, which occurs with rate P I . During an infection event, an infected agent at lattice site α will randomly infect an agent at a neighboring lattice site β ∈ B(α). If the chosen site β is occupied by a susceptible agent, then the susceptible agent becomes infected. Otherwise, the infection does not alter the state of lattice site β. We model this rule using a bimolecular reaction with rate P I /4:

The final rule concerns the recovery of infected agents: infected agent recover with rate P R . We model this process using a monomolecular reaction with rate P R :

Simulation of the SIR ABM proceeds as follows. We begin each simulation by randomly placing susceptible agents in 49% of the lattice sites, infected agents in 1% of the lattice sites, and leaving the remaining lattice sites unoccupied. Because there is no death or birth in the model, the proportion of occupied lattice sites is fixed at M = 0.5 for all time in simulations. We use reflecting boundary conditions, which model a no-flux condition in the spatial domain. The Gillespie Algorithm is used to simulate the model Gillespie [1977] . From the n th of N identically prepared simulations, we let S n ABM (t), I n ABM (t), and R n ABM (t), denote the fractions of susceptible, infected, and recovered agents in the model over time, respectively (e.g., S n ABM (t) is equal to the number of susceptible agents at time t divided by M X 2 ). We then estimate the averaged ABM fraction for each subpopulation by averaging over all N simulations

We depict snapshots of two simulations of the SIR model in Figure 3 . In both cases, we observe that the small initial proportion of infected agents causes an outbreak of infection. The majority of agents have become infected and then recovered by the end of both simulations.

We show in Appendix B that the mean-field model for the SIR process is given by the frequently-used system of equations:

In Equation (12), the variables S(t), I(t), R(t) denote the fraction of susceptible, infected, and recovered agents at time t, respectively. In Figure 3 , we depict snapshots from two simulations of the SIR ABM together with evolution of the corresponding ABM densities. The ABM simulation with (P I , P R ) = (0.005,0.0005) appears well-mixed for all agent-types, which would satisfy the mean-field assumption. As a result, simulations of the mean-field model predict the ABM density well. If we increase the infection and recovery rates to (P I , P R ) = (0.2,0.02), however, then the resulting ABM simulation has separate patches comprised primarily of infected agents or susceptible agents. These patches of single agent types decrease the population-wide average infection rate because infected agents in the middle of an infected cluster are unable to infect any susceptibles. As a result, the mean-field assumption is violated within these patches, and the mean-field model cannot accurately predict ABM dynamics. For both of the BDM and SIR models, we have seen that some parameter regimes lead to close agreement between the ABM output and mean-field model predictions, whereas other parameters lead to poor agreement between the two. There is thus a need to develop methods that can determine when mean-field modes will accurately predict ABM dynamics and find novel DE models that accurately predict ABM dynamics when the mean-field models fail to do so. Figure 3 : Simulation snap shots for the SIR ABM. Blue pixels denote S α (t), green pixels denote I α (t), black pixels denote R α (t), and white pixels denote 0 α (t). In the rightmost column, we compare predictions of the mean-field SIR model (Equation (12)) to the ABM outputs S ABM (T ) , I ABM (T ) , R ABM (T ) . Solid lines correspond to the mean-field (MF) model and dots correspond to ABM simulation output. Quantities are plotted against nondimensionalized time T = P R t for ease of interpretation. Step 1: Generate averaged ABM output;

Step 2: Estimate the temporal derivative of the ABM output;

Step 3: Library construction;

Step 4: Equation Inference. At Step 4, one can either perform (a) regression or (b) sparse linear regression to learn an equation for the ABM output. We will consider both the Lasso and Greedy sparse regression algorithms to perform EQL.

For many EQL studies, the goal is to infer a dynamical systems model, written as C(t), from a time-varying dataset, C d (t). This dynamical system can broadly be written as

where F describes the dynamics of C(t). When C d (t) is a time-varying scalar quantity (or a vector of scalar quantities), then an ODE model is relevant, in which case F = F(t, C). When C d (t) varies over time and a one-dimensional spatial dimension, x, then a PDE model may be more relevant, in which case F = F(t, x, C, C x , C xx , . . . ). In the following sections, we exemplify how methods from EQL may be used to learn a form of F from output ABM data.

In this section, we outline the steps one may take to learn a DE model from ABM data for the BDM model ( Figure 4 ). Code is provided for the results presented in this section in the file EQL Tutorial.ipynb.

We illustrate how to use EQL methods using data from the BDM process described in Section 2.1. In the first step, we simulate the ABM 50 times with parameter values (P p , P m , P d )=(0.01,0.005,1.0) and average the output to acquire C d (t) = C ABM (t) . The time vector, t, is sampled on an equispaced grid such that t i = (i − 1)∆t, i = 1, . . . , 100 for some small ∆t > 0.

In the second step of this process, we estimate the numerical derivative of C d (t). Finite difference computations are a simple method to approximate derivatives LeVeque [2007] . We use centered differences at the internal time points and forward and backward differences at the first and final time points, respectively:

The resulting computation is plotted in Step 2 of the EQL pipeline depicted in Figure 4 .

The third step of the EQL pipeline requires the construction of a library of potential terms for inclusion in the inferred DE model. We saw that polynomials in C(t) can describe the ABM output in Section 2.1.2, so we assume that the underlying model here is a polynomial in C(t). Recall from the rules of the BDM ABM that each agent interacts with its four neighboring sites, so we further assume that this polynomial is up to fourth order. As a nonzero constant in F would represent a constant source or sink of agents (which is not present in the ABM), we set the constant in this polynomial to be zero. Altogether, we propose the following possible model for the BDM process

for unknowns ξ i ∈ R. Given data C d , we substitute into Equation (15) to arrive at the following linear system of equations satisfied by the unknowns ξ 1 , . . . , ξ 4

For convenience, we re-write Equation (16) as

where the columns of the n × 4 matrix Θ contain the library terms evaluated at the data points C d (t i ), i = 1, . . . , n. The fourth step in the EQL pipeline is to infer the form of the DE model that best approximates the dynamics of C d (t). We do so by finding the least-squares solution of Equation (17), given bŷ

We solve Equation (18) using numpy's lstsq command from the linear algebra packagw and find ξ = [0.0048, −0.0105, 0.0031, −0.0030] T . This solution suggests that the following model best describes the ABM dynamics:

We numerically simulate Equation (19) with initial condition C(0) = C d (0). The resulting output, C(t), is depicted against against C d (t) in Figure 5 and we observe that this model accurately predicts the ABM output. By solving Equation (17) directly, we are likely to find forms of F with many terms because the system is overdetermined when n, the number of data points, satisfies n 4. We may wonder if a simpler form of F can also accurately describe the ABM dynamics. A logistic model such as that presented in Equation (6) has only two terms and may describe the data accurately, for example. Instead of solving Equation (18), we can use sparse regression methods to find a sparse vector, ξ, to solve Equation (17). There are many approaches to sparse regression, including the least absolute shrinkage and selection operator (Lasso), ridge regression, and the Greedy algorithm Tibshirani [1996] , Rudy et al. [2017] , Zhang [2009] . We utilize the Lasso algorithm in this section, but will return later to a discussion of whether alternative methods should also be considered. The Lasso method solves the regularized system ξ = arg min 

for some λ > 0, which is called a regularization parameter Tibshirani [1996] . Regularization is used to avoid extreme values in ξ and to prevent overfitting. There are many approaches to solve the Lasso problem; here we use the Fast iterative Shrinking-Thresholding Algorithm (FISTA) Beck and Teboulle [2009] . We provide the pseudo-code for this algorithm in Algorithm 1 of Appendix C. Using a regularization strength of λ = 0.0004 (see Appendix D for a discussion of hyperparameter selection) we find the resulting vector to beξ = [0.0047, −0.0095, 0, 0]. This estimate suggests that the model equation

should be a more parsimonious model for the BDM process than Equation (19). We depict the solution to Equation (21) (with initial condition C(0) = C d (0)) against the ABM output in Figure 5 and observe that this DE model accurately approximates C d (t). Furthermore, we observe that the form of Equation (21) is similar to the logistic DE,

By comparing coefficients between Equations (21) and (22) we can estimate the mechanistic ABM parameters P p and P d asP p = 0.0095 andP d = 0.0048. These estimates are very close to the true underlying values of P p = 0.01 and P d = 0.05. The proposed EQL methodology is thus able to simultaneously infer a DE model that accurately predicts ABM output and provide realistic parameter estimates when combined with the mean-field model. We have thus shown that concepts from EQL can be used to determine simple DE models that accurately describe ABM dynamics.

As mentioned previously, there are many approaches to find sparse solutions to Equation (17) sparsely. The Greedy algorithm is another popular algorithm for sparse regression and solves a similar problem to the Lasso problem from Equation (20). In the Greedy algorithm, we solvê

for some regularization parameter λ > 0. Here, ξ 0 counts the number of nonzero terms in ξ. We use the forward-backward approach to solve this system, which converts the regularization parameter λ into a tolerance hyperparameter. Pseudo-code for this algorithm is provided in Zhang [2009] . We use this algorithm on the ABM data with a tolerance of 0.0001 and find the resulting vector to beξ = [0.0047, −0.0095, 0.0001, 0], which suggests that the model equation

is able to describe the ABM dynamics. We note that this learned equation is similar in form to the learned equation from Lasso (Equation (21)). We will consider both the Lasso and Greedy algorithms for EQL in future case studies.

We use this section to explore how methods from EQL can aid modelers in performing ABM analysis with EQL methods. We will do so through five case studies pertaining to: how learned equations change with ABM parameters in Section 4. . The purpose of this case study is to determine if EQL methods can be used as a simple test to determine when mean-field models accurately predict ABM dynamics, and to propose novel models for more accurate inference. These goals are summarized in the following questions (Q1) Can EQL aid in determining when mean-field models accurately approximates ABM dynamics? (Q2) Can methods from EQL discover novel DE models for accurate ABM analysis?

To address both Questions (Q1) and (Q2), we will test the ability of the mean-field and learned DE models to predict ABM dynamics for the BDM model over a range of mechanistic ABM parameter values. The SIR model is considered in a later case study.

We now investigate the performance of the mean-field and learned DE models in describing ABM output from the BDM process over a range of proliferation rates. We take P p = 0.01, 0.05, 0.1, 0.5. For each value of P p , we fix the migration rate at P m = 1 and set the death, P d , to be half of P p . The ABM output is comprised of C d (t) = C ABM (t) , which is averaged over N = 50 ABM simulations to ensure convergence to mean behavior and reduce the impact of noise. For model learning, we use the library of right-hand side terms Θ = [C, C 2 , C 3 , C 4 ], and use the Greedy algorithm Zhang [2009] to sparsely solve the linear system dC d /dt = Θξ. The code for this case study is provided in the file Case Study 1 Varying parameters for BDM.ipynb. We depict predictions of the mean-field and learned DE models against C ABM (t) in Figure 6 as well as the model equations and their mean-squared error (MSE) in approximating C ABM (t) in Table 1 . Both the mean-field model and learned DE model provide accurate predictions of the ABM output for P p = 0.01, with a MSE between the simulated model and C ABM (t) of 0.0011 and 0.0001, respectively. As P p increases to 0.05, 0.1, and 0.5, the mean-field model does an increasingly poor job in predicting the ABM data, mean-field model for P p = 0.01, but for P p = 0.05, 0.1, 0.5 the learned model also recovers cubic terms. We observe that when the learned model resembles the mean-field model, then the mean-field model accurately predicts C ABM (t) . When the learned model deviates from the mean-field model, then the mean-field model poorly predicts the ABM data. We suggest that the mean-field model can make accurate predictions of ABM behaviors when the learned equation closely resembles the mean-field model; otherwise, the mean-field model can lead to inaccurate predictions of ABM behaviors. Consider the per-capita growth rate as one illustrative example. For a DE model of the form dC/dt = F(C), the per capita growth rate is defined by F(C) = CG(C), and it quantifies the average contribution of each individual to population growth over time. We plot G(C) for the mean-field model and each learned model in the insets of Figure 6 . The mean-field model predicts that the per-capita growth is a linear decreasing function connecting the points (0,P p − P d ) and (K,0), where K = (P p − P d )/P p is the carrying capacity predicted by the mean-field model. The learned model predictions of G(C) closely resemble the mean-field model predictions of G(C) for P p = 0.01 and 0.05. At the larger proliferation rates of P p = 0.1 and 0.5, however, the learned model per-capita growth rates are much lower than the mean-field model rates. Recall that higher rates of proliferation lead to spatial clustering of agents in the ABM: this clustering reduces the averaged per-capita growth rate, which the learned model can account for but the mean-field model does not. We also observe that the effective carrying capacity of the ABM reduces as P p increases (with P d = P p /2), which is again likely due to increased spatial clustering. All learned models accurately capture this reduction in the carrying capacity, whereas the mean-field models do not.

The EQL pipeline results for these four simulated datasets suggests that the mean-field model will accurately describe ABM data when the learned equation form matches that of the mean-field model. When the mean-field models are not able to accurately predict ABM dynamics, learned models of the form

can accurately predict ABM data instead. This form learned equation was able to provide more accurate ABM analysis than the mean-field model over several parameter values.

ABMs are inherently noisy due to the random updating of agent states that occurs during simulation. When using EQL methods to analyze such ABMs, one should take care to ensure they learn the mean dynamics and do not overfit to small trends in the data. We averaged ABM data over a large number of ABM simulations in previous sections of this study to ensure the data had converged to its mean value. Performing such extensive simulations may not be feasible for computationally intensive ABMs, however, so we now investigate how the learned DE model changes with the number of ABM simulations. This can be summarized with the following question:

(Q3) How can we determine when enough ABM simulations have been performed for accurate DE model learning?

To investigate (Q3), we consider data sets from the BDM ABM that have been averaged over different numbers of ABM simulations. All data in this case study is simulated using the parameter values

0.012 10 dC /dt = 0.005C − 0.01C 2 (0.0028) dC /dt = 0.00472C − 0.01193C 2 + 0.00439C 3 (0.0008) 0.002 25 dC /dt = 0.005C − 0.01C 2 (0.0027) dC /dt = 0.00453C − 0.01054C 2 + 0.00232C 3 (0.0005) 0.003 Table 2 : Case study 2: Learned DE models for the BDM process for various numbers of ABM simulations. We fixed P p = 0.01, P d = 0.005, and P m = 1 in each scenario and averaged ABM output over the given value of N ABM simulations ten separate times to investigate the EQL method's performance in the presence of stochastic ABM fluctuations. The presented learned DE models depicts the final learned equation whose coefficients were averaged over all the learned DE models for each realization of C ABM (t) . The rightmost column corresponds to the MSE between successiveξ estimates: e.g., for N = 5, we compute ξ5 −ξ 1 2 .

P p = 0.01, P d = 0.005, and P m = 1. ABM data is comprised of C ABM (t) , which is computed over N = 1, 5, 10, or 25 simulations, and we used 10 separate datasets for each value of N to investigate how stochastic fluctuations affect the final results. For DE learning, we use the Greedy algorithm to solve dC/dt = Θξ for Θ = [C, C 2 , C 3 , C 4 ]. We denoteξ N as the estimate of ξ that results for each value of N .

The code for this case study is provided in the file Case study 2 varying number of ABM Simulations.ipynb. For each value of N considered, we learned ten separate DE models from ten separate realizations of C ABM (t) . We computed the average learned DE model by averaging the coefficients from each of the ten learned equations. We depict model predictions from the averaged learned DE model and mean-field model against all ABM data in Figure 7 . The solution profile for each average learned DE model does not change too much as N increases, but the ABM data fluctuations decrease with larger values of N . We depict the distributions of each coefficient for each value of N in Figure 11 in Appendix E. As expected, the variance of each distribution decreases with N , so we can be more certain about learned DE models with more ABM simulations. We present the averaged learned DE models for each value of N in Table 2 as well as the averaged mean-field and learned model MSEs. The mean-field MSE maintains a nearly constant value for all N , but the learned model MSE decreases with N . The learned equation is cubic for all values of N , and the cubic coefficient appears to approach zero as N increases. We observe that the differences in MSE between successive averaged ξ N estimates (i.e., ξ5 −ξ 1 2 , etc.) is low for N > 10. The insensitivity of the learned equation above N = 10 suggests that N = 10 or 25 simulations are sufficient to accurately capture the mean BDM ABM dynamics for the parameter values we used.

A current challenge for modelers is to develop EQL methods that are able to learn DE models from real experimental or clinical data. ABMs are a useful intermediate step to test the predictions of mathematical methods because ABMs emulate the stochastic and discrete nature of many biological processes and allow researchers to alter aspects of the data. Biological data presents many practical challenges for modelers, including only partial observations of the process under consideration or sparse sampling of the data Nardini et al. [2020] . We will use this case study to consider the performance of the EQL methods in the face of both limited data sampling and partial data observations. In turn, we address the following questions:

(Q4) How can we determine the resolution needed for accurate DE model learning? (Q5) Which time scales are informative for learning predictive DE models for unobserved data?

We applied the EQL methodology to ABM data where n = 13, 25, 50 and 100 time samples were collected. For all values of n, the data were collected at equispaced time intervals between the same starting and ending time points. All ABM simulations were computed with mechanistic parameters P p = 0.05, P d = 0.0125, and P m = 1. The ABM data, C ABM (t) , was averaged over N = 50 simulations. The code for this case study is provided in the file Case Study 3a Varying data resolution.ipynb. We depict model predictions from the learned equation against ABM data in Figure 8 . The learned DE models accurately predict the ABM output for n = 100, 50, and 25 time samples. With n = 13 time samples, however, the learned equation predicts the same carrying capacity as the data but fails to accurately predict the ABM dynamics before plateauing (excluding the initial time sample, which is used as the initial condition). The learned model equations for these different scenarios, and the MSE between the learned model prediction and ABM data from each model, are summarized in Table 3 . The MSE is less than or equal to 10 −3 for n ≥ 13 but rises to 0.0154 for n = 0.13. From this case study, we suggest that n = 25 time samples provides sufficient time resolution to accurately infer the underlying dynamics for parameter values P p = 0.05, P d = 0.0125, and P m = 1. If we analyzed experimental data we had reason to believe resulted from similar underlying parameter values, we would trust our analysis of the experimental data if we had 25 or more equispaced timepoints over this same time interval.

We applied the EQL methodology to ABM data where only the first 10%, 20%, 25% or 50% of the ABM data are used for training and the remaining data are held out for testing how well the learned DE model predicts unobserved dynamics. We used 50 simulations of the BDM model with P p = 0.01, P d = 0.005, and P m = 1 and computed C ABM (t) with n = 100 data points until t = 15(P p − P d ). The code for this case study is provided in the file Case study 3b predicting unobserved dynamics.ipynb.

We depict model predictions against the testing and training data in Figure 9 . When trained on 10% of % of data Learned Model (MSE) 100 dC /dt = 0.03374C − 0.04522C 2 (0.0002) 50 dC /dt = 0.03379C − 0.04531C 2 − 0.0C 3 (0.0002) 25 dC /dt = 0.03372C − 0.0452C 2 (0.0004) 13 dC /dt = 0.02073C − 0.03733C 3 (0.0154) Table 4 : Case study 3b. Summary of models learned with the first 10%, 20%, 25%, and 50% of C ABM (t) . MSE on the remaining data is given in parentheses. Data were simulated with the BDM model using P p = 0.01, P d = 0.005, and P m = 1, averaged over N = 50 ABM simulations. Note that our implementation of the learned DE model trained on 10% of the data fails to converge because this model grows faster than exponential growth, which is not realistic of the ABM data. We depict the MSE of this learned equation against C ABM (t) as nan (not a number) due to this numerical instability.

data, the inferred model grows without bound for all simulated time and does not accurately describe the testing data. When trained on 20% of data, the learned model plateaus above the carrying capacity of the data, approximately C = 0.75. For training data comprised of 25% and 50% of the available data, the learned models closely predict the test data. We suggest that, for this case study, data should be sampled beyond the inflection point in order to accurately predict the unobserved dynamics and carrying capacity. The learned DE model for these different data percentages and their MSEs on the test data set are summarized in Table  4 . As expected, the testing data MSE decreases as more of the data is used to learn the DE model.

Our first case study used EQL methods to demonstrate that the mean-field model may not be valid for predicting the dynamics of the BDM ABM when P p > 0.1 (with other parameters fixed at P m = 1 and P d = P p /2). If one wants to use a DE model to analyze such ABM simulations, then an alternative model may be required. In Appendix A, we show the DE model given by

can be used to model output from the BDM ABM. In Equation (26), F is the occupancy correlation between neighboring lattice sites Baker and Simpson [2010] . For the lattice-based BDM model, this value is defined between two neighboring lattice sites α, β as

.

Note that if the occupancy between these sites is independent, then F (t) ≡ 1, and Equation (26) Bortz and Nelson [2006] are suited to determine which model most parsimoniously describes a given dataset from several plausible models. We may now be interested in determining which of our our two models, the mean-field model in Equation (6), or the DE model in Equation (26), best describes ABM output for the BDM process. A typical model selection study for these two models may be problematic, however, as deriving and computing the DE model for F in Equation (26) Trained on 50% of data Figure 9 : Case study 3b: Predicting BDM dynamics from partial ABM data. We applied the EQL methodology on the first 10% (top left), 20% (top right), 25% (bottom left), and 50% (bottom right) of C ABM (t) to investigate the learned equations performance in predicting unoberved ABM dynamics. The blue dots correspond to ABM output that was used to infer the learned model, the green stars denote ABM output that was used for model testing, and the red dashed line denotes the solution of the learned model. All simulations are depicted against nondimensionalized time T = t(P p − P d ). We fixed P p = 0.01, P d = 0.005, and P m = 1 in each scenario and averaged ABM output over N = 50 simulations.

as simple as the BDM model, yielding an additional set of auxiliary DEs needed to describe F Baker and Simpson [2010] . Since the derivation of such auxiliary equations is effectively intractable for more complex ABM models, we will investigate the following question:

(Q6) Can methods from EQL be used for DE model selection for ABM analysis?

In doing so, we propose an alternative strategy for model selection, i.e., for selecting additional timedependent variables, such as the occupation correlation, F (t), to increase DE model accuracy with concepts from EQL.

We use this section to demonstrate how EQL methods can be used for model selection between the logistic equation given by Equation (6) and the modified logistic equation given by Equation (26). The code for this case study is provided in the file Case study 4 model selection with EQL.ipynb. We will vary P p over the values 0.005, 0.01, 0.05, 0.1, 0.5. For each value of P p , we fix P m = 1 and set P d = P p /2. We estimate the occupancy correlation value from Equation (27) from the n th of N ABM simulations by computing

where C n, (2) ABM (t) is the number of jointly occupied neighboring pairs of lattice sites over time and χ 2 is the total number of adjacent lattice-site pairs Fadai et al. [2019] . We then average over all N simulations

We apply the model selection methodology using EQL concepts as follows. From each ABM dataset, we compute both C d = C ABM (t) and F d = F (t) and substitute these values into two separate n × 2 matrices of right hand side terms given by

where any multiplication is performed element-wise. Note that Θ 1 corresponds to the library of terms for Equation (6) while Θ 2 corresponds to the library of terms for Equation (26) . We uniformly at random place half of the elements comprising dC d (t)/dt and the corresponding rows from Θ 1 , Θ 2 into training sets given by the vector dC train d (t)/dt and the n/2 × 2 matrices Θ train 1 , Θ train 2 . The remaining elements of dC d (t)/dt and rows of Θ 1 , Θ 2 are placed into testing sets given by dC test d (t)/dt and matrices Θ test 1 , Θ test 2 . The training set is used to estimateξ 1 from Θ train 1 andξ 2 from Θ train 2 by solving the two linear regression problems

Note that the vectorξ 1 parameterizes Equation (6), andξ 2 parameterizes Equation (26). We then use the testing set to select the best model for each data set bŷ

We use 100 randomly sampled training and validation sets and select whichever of the two models minimizes Equation (31) more often in these 100 testing-validation realizations.

In Table 5 , we present the final selected models for various values of P p . The mean-field model is selected for P p = 0.005 and 0.01 with 57 and 77 of the 100 total votes, respectively. Equation (26) is selected for all larger proliferation values with at least 93 of the 100 total votes. These results are in agreement with Case Study 1, where the mean-field model predicted ABM output well at P p = 0.01, but was unable to predict these data for larger values of P p (see, e.g., Figure 6 ). In addition to more accurately matching the ABM output than the mean-field model, Equation (26) also provides accurate parameter estimates for P p and P d . By matching the forms of the modified logistic equation to the learned equation (i.e., we can estimate P p using the coefficient in front of C(1 − F C) and P d using the negative of the coefficient in front of C), we observe that these estimates appear very close to their true underlying values. Table 5 : Case study 4: Model selection with EQL. We present the average selected model equations for C ABM (t) over various values of P p . For each value of P p , we set P d = P p /2 and P m = 1. ABM output is averaged over N = 50 ABM simulations of the BDM model to ensure convergence to mean behavior. The rightmost column lists how many votes the selected equation received out of 100 total.

The EQL methods considered in this work are applicable to many ABMs, including the SIR model introduced in Section 2.2. We will now learn models for this ABM over several parameter regimes and test the performance of the mean-field and learned models in predicting ABM output Blackwood and Childs [2018] , Diekmann et al. [1990] , Viceconte and Petrosillo [2020] . We let the agent infection rate take the values P I = 0.005, 0.01, 0.05, 0.1. We fix P m = 1 for all scenarios and set the agent recovery rate, P R , to be one-tenth of the infection rate for each value of P I . The ABM output is comprised of S d = S ABM (t) , I d = I ABM (t) , and R d = R ABM (t) , all of which are averaged over N = 25 ABM simulations. Because these three quantities will always sum to unity in the model, we focus on learning equations for S(t) and I(t) and note that R(t) = 1 − S(t) − I(t). For model learning, we use the matrix of potential right-hand side terms given by Θ = [S, S 2 , I, I 2 , SI]. We use the Lasso algorithm to solve for the unknown vectors ξ 1 and ξ 2 from the linear systems dS/dt = Θξ 1 and dI/dt = Θξ 2 . We note that, even for a two-dimensional system with five library terms, the total number of possible models for the S and I variables is 5 i=0 5 i = 32 each, resulting in a total combination of 32 2 = 1024 possible DE models, which highlights the difficulty in finding a predictive model from this large suite of possibilities. The code for this case study is provided in the file Case study 5 SIR Varying params.ipynb.

We depict the predictions of the mean-field and learned DE models over time against ABM output in Figure 10 . The mean-field and learned DE model equations with corresponding MSEs are presented in Table  6 . For P I = 0.005 the mean-field model predicts the ABM output well, but as P I increases the mean-field model predictions worsen, as evidenced by increases in the MSE. The mean-field model underpredicts the susceptible agent density at all times and overpredicts the infected agent density at early times. The learned model, on the other hand, is able to predict ABM output accurately for all values considered and achieves low MSE values (with the lowest value at P I = 0.01). The learned equation forms are similar to the mean-field model for P I = 0.005, 0.01, and 0.05. For P I = 0.1, the learned equations have additional S, S 2 , and I terms in the model equations for dS/dt. These terms may be used to capture effects that are not described by the mean-field model.

The basic reproductive number, or R 0 , is defined as the expected number of secondary infections that result from a single infection in a population comprised solely of susceptible agents Blackwood and Childs [2018] , Diekmann et al. [1990] , Viceconte and Petrosillo [2020] . A disease may spread rapidly when R 0 > 1, and will die out when R 0 < 1. R 0 is now a commonly-used criterion to determine when a disease will continue to spread through a population and possibly cause an outbreak or pandemic. More details of R 0 computation are provided in Blackwood and Childs [2018] , but we note here that R 0 can be found for the mean-field SIR model by determining critical parameter values where dI/dt will exceed zero at the initial condition. From Equation (12), the mean-field model shows that dI/dt > 0 when M P I S > P R . Recalling that S(t) ≈ 1 at the start of the disease spread, this implies that dI/dt >0 when M P I /P R > 1, i.e., R 0 = M P I /P R for the mean-field model. The values of R 0 can be computed using similar methods for the learned equations detailed in Table 6 . Here we observe that the mean-field model predicts that R 0 = 5.0 for all considered scenarios (recall, M = 0.5 in all simulations), yet as P I increases from 0.005 to 0.01,0.1,0.25 (with P I /P R = 10.0), the learned equations predict R 0 = 4.68, 4.52, 3.72, 3.41, respectively. Thus, while the mean-field model suggests that a single infected individual will cause five secondary infections (in a population full of susceptible agents) at each of these infection rates, the learned models suggest that as In each figure, we depict the predictions of the mean-field model (solid blue curve for S(t) and solid green curve for I(t)), against the predictions of the learned DE models (blue dashed curve for S(t) and dashed green line for I(t)), as well as S ABM (t) (blue dots) and I ABM (T ) (green dots). All simulations are depicted against nondimensionalized time T = tP R . We fixed P m = 1 for each simulation.

P I increases (with the ratios P I /P R and P I /P m fixed), the number of secondary infections will decrease in this scenario. Recall from Figure 3 that infected agents tend to cluster with large values of P I (while P m and P I /P R are fixed). The mean-field model does not consider how such spatial clustering may affect the number of secondary infections that result from a single infection in a population comprised solely of susceptible agent. The learned DE models, on the other hand, appear to learn this information when finding a predicting DE model from the ABM data. Table 6 : Case study 5: Mean-field and learned DE models for the SIR ABM for various values of (P I , P R ), along with the computed MSE values between model and ABM output, and model R 0 calculations. We fix P m = 1 in each scenario.

In this work, we have considered three different but synergistic approaches to study the emergent behavior of ABMs. These approaches include extensive simulation, DE model derivation using coarse-graining approaches, and EQL. We demonstrated some of the strengths and weaknesses of each approach through their applications to two ABMs: a BDM model and an SIR model. We summarize some of these strengths and weaknesses in Table 7 . Extensive simulation is the most commonly-used and straightforward approach in understanding the behavior of ABMs. Extensive simulation is advantageous because, in theory, it can be used to analyze any ABM: it only requires simulating the ABM on a computer. This approach quickly becomes practically challenging, however, because it can become difficult to simulate sufficiently complex ABMs with basic computing hardware. The BDM process used in this work is simple enough to be simulated on a personal laptop (2.5 GHz Intel Core i7 processor, 16 GB RAM) for all datasets generated for this study. The SIR model is more involved, however, and was run on a computer cluster to ease implementation. Computation of a single SIR simulation with P I = 0.005, P R = 0.0005, P M = 1.0 took about 48 minutes to compute. Two further drawbacks of performing extensive simulation are that this technique is not compatible with analytical methods, and that its output may not be interpretable, i.e., we observe that the system behavior changes as parameters are varied, but we may not understand why this happens, nor can we necessarily predict this behavior a priori.

There is a wide literature demonstrating that coarse-graining approaches can be used to derive DE models that approximate ABM dynamics Baker and Simpson [2010] , Bernoff et al. [2020] , Binny et al. [2020 ], Fadai et al. [2019 ], Johnston et al. [2017 , Nardini et al. [2016] . Such DE models are advantageous because they are usually simple to solve (either analytically or numerically), interpretable, and provide insight into how changing ABM parameters can lead to emergent behaviors. Analytical techniques allow us to infer how the system will behave over many different parameter values without simulating the ABM. For example, the per-capita growth rate for the BDM model predicts the population growth rate as a function of density, and the R 0 calculation for the SIR model predicts if a disease will outbreak or die out. However, we saw throughout the presented case studies that these analyses fail to accurately predict ABM behavior for parameter regimes where the mean-field assumption is violated. Another limitation of this approach is that the derivation of DE models is only possible for simple ABMs, since it can be challenging to convert complex ABM rules into DE models.

EQL is a recent field of research that seeks to infer DE models directly from data. EQL combines many benefits of the two previously-mentioned methodologies to analyze the emergent behavior of ABMs. We highlighted many of the advantages of these approaches, and addressed several ways in which EQL can aid modelers in ABM analysis, through our investigation of Questions (Q1)-(Q6). While the mean-field model does not accurately predict ABM output for many parameter combinations, our exploration of (Q1) demonstrated that EQL provides a simple way to determine when the mean-field model can or cannot be trusted for such predictions. If the learned equation form matches the mean-field model, then the mean-field model should provide accurate insight; when it does not, then alternative models may be needed, as we discussed in (Q2). We also demonstrated that EQL methods can be used to infer novel DE models for ABMs. We observed that the mean-field model cannot predict output of the BDM process for large rates of agent proliferation and death relative to motility. Instead, Equation (25) can be used to accurately predict ABM dynamics. Further investigation needs to be carried out in order to understand how Equation (25) may result from ABM rules.

A significant challenge of ABMs is their intensive computational nature. We considered (Q3) to determine how many ABM simulations are necessary to reliably predict ABM output. We showed that the learned equation may deviate from the average ABM behavior when only a small number of ABM simulations are used. With a sufficiently large number of ABM simulations, however, the learned equation can accurately predict ABM dynamics. For the BDM process, we found that N = 10 ABM simulations was sufficient. In practice, we can determine when enough simulations have been performed to capture mean ABM dynamics by considering how much the learned vector, ξ, changes with additional ABM simulations. When this vector becomes sufficiently insensitive to increases in N , then sufficient ABM dynamics have been performed.

We used Question (Q4) to determine how finely data must be sampled in time to reliably predict ABM dynamics. In this case study, we determined a magnitude between uniformly-sampled time samples above These two investigations into (Q4) and (Q5) suggest important future work must be performed to determine strategic (and not necessarily uniform) samples of the ABM that are informative and capture all dynamic regimes of the data). Some preliminary work towards these questions for PDE models has been investigated in Nardini et al. [2020] . A limitation of EQL methods, as opposed to ABM simulation and mean-field models, is that it may not be able to accurately extrapolate to unseen data and parameters, as we observed in Case study 3b. This is possible with ABM simulation, where the ABM can be run for longer time or at different parameter values. Similarly, the mean-field model can extrapolate its predictions by solving the model over a longer time period or simulating it at different parameter values. Finally, we considered Question (Q6) to determine if EQL methods can be used to aid in model selection for ABMs. This is advantageous because DE models that are more complex than mean-field models can also be derived for ABMs. Although potentially more accurate, these models are difficult to interpret and simulate than mean-field models. Equation (26), as one example, can be challenging to implement numerically because the dynamical system for the occupancy correlation function, F (t), requires solving a high-dimensional system of equations using user-defined closure approximations Baker and Simpson [2010] . Instead, we can measure F over each ABM simulation and use these observations of F in a hybrid approach to select whether including F leads to a more accurate DE model. Applying such a hybrid approach to the simulation of DE models has been recently proposed to aid in reducing identifiability-related issues Hamilton et al. [2017] , Lagergren et al. [2018] .

EQL methods are quickly growing in popularity as a means to infer DE models from noisy data Lagergren et al. [2020b] , Rudy et al. [2017] , Nardini et al. [2020] . We have shown in this study that such methods provide a reliable and promising tool to aid modelers in interpreting and analyzing ABMs. Learning DE models from data does not require user-made assumptions on whether or not the ABM simulation satisfies certain properties, as is needed for the derivation of mean-field models. EQL methods are further advantageous because they can learn ABM dynamics from limited ABM simulations, provide accurate parameter estimates, as well as accurate predictions over a wide range of parameter values. We anticipate that through this tutorial, EQL will increasingly be used to interpret complex ABM simulations, and that this will aid in the further development of algorithms that can be used to learn DE models from real experimental data.

In this section, we will derive coarse-grained DE models of the BDM process. We begin by defining P[0 α (t)] and P[A α (t)] as the probabilities that the individual lattice site α is either vacant or occupied, respectively, at time t. We simplify notation by writing: P[A α (t)] = C α (t) and P[0 α (t)] = 1 − C α (t).

Similarly, let P[A α (t), A β (t)] denote the probability that both neighboring sites α and β are occupied at time t; we refer to this value as the neighboring lattice site occupancy probability. Along these lines, P[0 α (t), A β (t)] is the probability that α is vacant and β is occupied at time t, etc. These joint probabilities are related to the individual occupancy probabilities through their marginal probabilities:

Neighboring occupancy probabilities are also related to the individual occupancy probabilities using the joint occupancy correlation function (See Baker and Simpson [2010] ):

Note that if F (t; α, β) = 1, then P[A α (t), A β (t)] = C α (t)C β (t), and the occupancy of the neighboring sites α and β are independent. We can combine Equations (32) and (33) to write each neighboring occupancy probability in terms of the individual occupancy probabilities and the occupancy correlation function

We are now ready to convert the rules of the BDM process into a coarse-grained DE model. We begin by writing a master equation for how C α (t) will change due to the effects of agent birth, death, and migration:

We now aim to derive espressions for K birth , K death , and K migration . The The birth reaction from Equation (1) specifies that the density at lattice site α may increase when α is unoccupied and β ∈ B(α) is occupied because the agent at β may give birth and place its daughter agent in α. We can then write

because any of the neighboring lattice sites may undergo birth events. Similarly, we can convert Equation (2) as

for agent death and convert Equation (3) as

for agent migration. Substitution of these terms into Equation ( 

Equation (39) provides a DE model to describe the dynamics of C α (t), however, this equation is not "closed" because we need to know P[A α (t), A β (t)] in order to compute the right-hand side. We can use the marginal identities from Equations (32) and (34) to simplify the terms in Equation (39) and write

We proceed by making a simplification in order to close this system. Since we initiate all simulations with agents distributed uniformly at random, we assume that all individual occupancy probabilities are equally distributed 1 on average so that C α (t) = C γ (t) = C(t) for any two lattice sites α and γ. From this assumption, we have that F (t; α, β) = F (t; |α − β|), i.e., F (t; α, β) = F (t; 1) for β ∈ B(α). From Equation (33), we next write P[A α (t), A β (t)] = C 2 (t)F (t; 1) for β ∈ B(α). These observations lead to the following DE model from Equation (40):

Equation (41) is not yet closed because we still do not know F (t; 1). Our second simplification is the meanfield assumption, in that the occupancy probabilities of neighboring lattice sites are independent so that F (t; 1) ≡ 1. This assumption leads to the mean-field model for the ABM:

Note that Equation (42) can be re-formulated as the standard logistic DE model given by

where r = P p − P d , K = (P p − P d )/P p . This model is advantageous in that it is closed and can be solved analytically:

C(t) = KC(0)e rt K + C(0)(e rt − 1)

,

where C(0) denotes the initial condition.

We now derive DE models governing the dynamics for P[S α (t)], P[I α (t)], and P[R α (t)]. As for the BDM model, because the initial agent configurations are uniformly distributed in space, we assume the probability of any type of agent occupancy (S, I, or R) is independent of the lattice site and define S(t) = P[S α (t)], I(t) = P[I α (t)], R(t) = P[R α (t)]. By converting the bimolecular reactions in Equations (8) and (9) into the corresponding occupancy probability configurations that will lead to changes in S, I, or R, and converting the monomolecular reaction in Equation (10) into the individual occupancy probabilities that will lead to changes in S, I, or R, we derive the master system of equations for S(t), I(t), and R(t) to be: 

We then use the mean-field assumption to write P[Y α (t), Z β (t)] = Y (t)Z(t), where Y, Z ∈ {S, I, R, 0}. This assumption reduces Equation (45) to the commonly used SIR model given by: dS dt = −P I SI, dI dt = P I SI − P R I, dR dt = P R I.

In this equation, the variables S, I, and R denote the density of susceptible, infected, and recovered agents over time respectively, which cannot exceed 0.5 if only half of the simulation domain is occupied by agents. We can convert these variables to the fraction of susceptible, infected, and recovered agents by computing the dimensionless variables S * (t) = S(t)/M, I * (t) = I(t)/M, and R * (t) = R(t)/M , where M is the proportion of occupied lattice sites in the simulation domain. The system of equations for these variables are given by dS * dt = −M P I S * I * , dI * dt = M P I S * I * − P R I * , dR * dt = P R I * .

C Lasso algorithm using FISTA Algorithm 1: Lasso implementation using FISTA from Beck and Teboulle [2009] . Input : Matrix A, vector b, and regularization parameter λ

Output: x = 1 2 arg min Ax − b 2 2 + λ x 1 . Get d = # columns in A. 

This section provides brief notes about the practical selection of hyperparameters for the Lasso method. We used regularization parameter λ = 0.0004 for the Lasso algorithm to learn an equation for C ABM (t) in the tutorial in Section 3.1.1. The optimal value of this hyperparameter is typically not known a priori. There are several ways to select such a hyperparameter, including a grid search Lagergren et al. [2020b] , cross validation Mangan et al. [2017] , or Bayesian optimization Snoek et al. [2012] . We discuss the grid search option here due to its simplicity. In a grid search to determine an appropriate value for λ, we specify several plausible options, given by {λ 1 , λ 2 , . . . , λ n } and split the data into training (dC train d (t)/dt and Θ train ) and testing (dC test d (t)/dt and Θ test ) sets. The training and testing portions of Θ will contain all columns of Θ but only a subset of the rows. For a possible hyperparameter value, λ i , we solve the Lasso problem from Equation (20) using the training data and λ = λ i to determine the resulting ξ estimates,ξ i . The optimal value of λ is then chosen as:λ

The optimal valueλ results in the estimateξ that best generalizes to the testing data. Recent work has shown that for Lasso,ξ tends to incorporate small additional terms in the final learned equation Lagergren et al. [2020b] . To select the regularization parameter, λ, for the Lasso algorithm, we randomly split half of C ABM (t) into a training set and the remaining into a testing set. We considered 100 values of λ between 10 −5 and 10 −3 as well as λ = 0 and solved Equation (20) using the training set for all 101 potential values of λ. We can perform this operation several times (changing the training and testing set each time) and notice that sometimes the chosen hyperparameter is zero and sometimes it is nonzero. When the chosen hyperparameter is zero, then the EQL pipeline learns an equation of the form: dC dt = 0.00488C − 0.01187C 2 + 0.00806C 3 − 0.00831C 4 , 

To ensure the final learned equation is sensitive to changes in each nonzero library coefficient, we perform a round of pruning after learning which proceeds as follows. The j th nonzero term ofξ is included in the final inferred equation if dC test d (t)/dt − Θ testξ j 2 2 increases by a given pruning percentage, where hereξ j is the estimated parameter vector with the j th term manually set to zero. To ensure that the final learned equation is not an artifact of the training and testing split, we perform this entire process for ten randomized training and testing splits of the data and select the equation form arises most frequently. We set the parameters for each coefficient to be the mean of each coefficient for each time the final equation form was learned. If we set our pruning percentage to be 5%, then the majority of learned equations will be of the form:

Optimization and control of agent-based models in biology: a perspective

A one-dimensional model of cell diffusion and aggregation, incorporating volume filling and cell-to-cell adhesion

Correcting mean-field approximations for birth-death-movement processes

A fast iterative shrinkage-thresholding algorithm for linear inverse problems

Improving markov chain Monte Carlo estimation with agentbased models

Nonlocal aggregation models: a primer of swarm equilibria

Agent-based and continuous models of hopper bands for the Australian plague locust: How resource consumption mediates pulse formation and geometry

Living in groups: spatial-moment dynamics with neighbour-biased movements

An introduction to compartmental modeling for the budding infectious disease modeler

Agent-based modeling: Methods and techniques for simulating human systems

Model selection and mixed-effects modeling of HIV infection Dynamics

Discovering governing equations from data by sparse identification of nonlinear dynamical systems

Agent-based modeling in systems pharmacology

Effective leadership and decision-making in animal groups on the move

Cellular Automaton Modeling of Biological Pattern Formation. Modeling and Simulation in Science, Engineering and Technology

On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations

Selforganized spatial structures of locust groups emerging from local interaction

Self-propelled particles with soft-core interactions: patterns, stability, and collapse

Accurate and efficient discretizations for stochastic models providing near agent-based spatial resolution at low computational cost

Evolution of intratumoral phenotypic heterogeneity: the role of trait inheritance

Exact stochastic simulation of coupled chemical reactions

Hybrid modeling and prediction of dynamical systems

Tumor growth modelling by cellular automata

Asymptotic Analysis. Science & Business Media

How much information can be obtained from tracking the position of the leading edge in a scratch assay

Co-operation, competition and crowding: a discrete framework linking allee kinetics, nonlinear diffusion, shocks and sharp-fronted travelling waves

Forecasting and uncertainty quantification using a hybrid of mechanistic and non-mechanistic models for an age-structured population model

Biologicallyinformed neural networks guide mechanistic modeling from sparse experimental data

Learning partial differential equations for biological transport models from noisy spatio-temporal data

Finite difference methods for ordinary and partial differential equations: steady-state and time-dependent problems

Model selection for dynamical systems via sparse regression and information criteria

Continuum approximations for lattice-free multi-species models of collective cell migration

Mathematical biology I. an introduction

Modeling keratinocyte wound healing: cell-cell adhesions promote sustained migration

Learning equations from biological data with limited time samples

Data-driven discovery of partial differential equations

Interstitial cell migration: integrin-dependent and alternative adhesion mechanisms

Nonlinear regression. Wiley series in probability and statistics

Multi-species simple exclusion processes

Experimental and Modelling Investigation of Monolayer Development with Clustering

Distinguishing between mean-field, moment dynamics and stochastic descriptions of birth-death-movement processes

Practical Bayesian optimization of machine learning algorithms

A stochastic cellular automaton modeling gliding and aggregation of myxobacteria

Aggregation, blowup, and collapse: the ABC's of taxis in reinforced random walks

Regression shrinkage and selection via the Lasso

Locust dynamics: behavioral phase change and swarming

COVID-19 R0: Magic number or conundrum?

Robust data-driven discovery of governing physical laws with error bars

Robust subsampling-based sparse Bayesian inference to tackle four challenges (large noise, outliers, data integration, and extrapolation) in the discovery of physical laws from data

Adaptive forward-backward greedy algorithm for sparse learning with linear models