key: cord-0674476-7rufaxbz
authors: Kunwar, Prateek; Markovichenko, Oleksandr; Chyba, Monique; Mileyko, Yuriy; Koniges, Alice; Applied, Thomas Lee; Studies, computational Epidemiological; Mathematics, University of Hawai'i at Manoa Department of; Honolulu,; Hawai'i,; States, United; Institute, Hawai'i Data Science; Manoa, University of Hawai'i at; Studies, Office of Public Health; Manoa, University of HAwai'i at
title: A study of computational and conceptual complexities of compartment and agent based models
date: 2021-08-26
journal: nan
DOI: nan
sha: bb5fcc2c315f2d8570bdfe0a8198e2e19acb1146
doc_id: 674476
cord_uid: 7rufaxbz

The ongoing COVID-19 pandemic highlights the essential role of mathematical models in understanding the spread of the virus along with a quantifiable and science-based prediction of the impact of various mitigation measures. Numerous types of models have been employed with various levels of success. This leads to the question of what kind of a mathematical model is most appropriate for a given situation. We consider two widely used types of models: equation-based models (such as standard compartmental epidemiological models) and agent-based models. We assess their performance by modeling the spread of COVID-19 on the Hawaiian island of Oahu under different scenarios. We show that when it comes to information crucial to decision making, both models produce very similar results. At the same time, the two types of models exhibit very different characteristics when considering their computational and conceptual complexity. Consequently, we conclude that choosing the model should be mostly guided by available computational and human resources.

Introduction. SARS-CoV-2 (COVID- 19) has impacted not only health, but the economy and how we live daily life. However, it crept onto the world stage at the end of 2019 in Wuhan, China, where health officials reported a cluster of pneumonia cases of unknown cause. The first reported case of COVID-19 in the United States came from an asymptomatic male who returned from China on January 15, 2020. By January 19, Chinese officials closed off travel in and out of Wuhan and on January 30, 2020 the World Health Organization (WHO) declared a global health emergency.

COVID-19 was officially named on February 11, as it continued to spread across Asia and Europe. While countries in those regions started to see massive increases in cases, hospitalizations and fatalities during the initial months of the outbreak, the United States did not report its first death until February 29. By March 26, the United States led the world in confirmed cases. While many European and Latin American countries were already in full shut down due to COVID-19, the United States Government left the shut down to individual states, resulting in a surge of cases and deaths. In July 2020, the United States reached 68,000 daily cases for the first time.

During the Fall months, cases started to level off. However during this period of relative calm, a more serious variant of COVID-19 was mutating and spreading in the United Kingdom. In November 2020, the B.1.117 COVID-19 variant (UK variant) was detected in the United Kingdom and accounted for 43 percent of all COVID-19 cases by December 2020. In South Africa, another COVID-19 variant, B.1.351, emerged independently of the UK variant around the same time. The UK variant is associated with increased transmission and risk of death, while the South African variant shows evidence that it may decrease the neutralization by some monoclonal and polyclonal antibodies. With the recent surge of cases in India, it was inevitable that India would have its own variants of concern, primarily the B.1.617. The WHO has added it to its list of variants of concern and England already identified a sub strain of the Indian variant in early May.

The accelerated research, production, and distribution of the COVID-19 vaccines have had an impact on slowing the spread of COVID-19 in the areas that had access to enough supply of the vaccine. As of June 1, 2021, only 5.5 percent of the world's population have been vaccinated. While countries such as the United States, Israel, and England have at least 38 percent of their country fully vaccinated, helping to prevent future spread of COVID-19. Yet many first world countries have yet to even break 10 percent of their population fully vaccinated, including Japan, South Korea, and Canada. While the mRNA vaccines allow for rapid adjustments due to variants, the possibility of a mutation that escapes the current mRNA vaccine remains possible, though unlikely. This possibility will remain until both industrialized and under-developed countries increase their percentage of population fully vaccinated.

Mathematicians have found themselves at the front seat of this race against COVID-19. Indeed, modeling is a powerful tool to address key questions such as: when and which non-pharmaceutical mitigation measures should be implemented? how to allocate efficiently vaccines? impact from the new variants? when can we relax travel restrictions? However, there is still a lot of unanswered questions and challenges regarding the outcome of several models as well as their limitations. It is unclear at this time if there is a "better" model, and while most of the challenges in epidemiological forecasting come from incomplete data and impossibility to model people's behavior, there is still the question of what model to use when and for what purpose.

There are primarily two different types of epidemiological models: differential equation-based (EBM) and agent based (ABM). We here focus on two such models: a discrete compartmentalized SEIR model that we developed and the COVID-19

Agent-based Simulator (Covasim) [13] that we altered to include some specific attributes. The first one is deterministic while the second one is stochastic which makes them different in design, however it is expected that their forecasting converge provides similar assumptions since they do model the same pandemic. It is often stated that EBMs are simpler and faster to compute, while ABMs are more detailed and computationally expensive. [13] Our goal is to test this assumption concretely on a specific data set that we have extensive knowledge of so we can determine for future studies the benefits and pitfalls of each of these different simulation methods. In particular, we use Honolulu county in the Hawaiian archipelago as a test-bed population for our simulations. We show that rather than treating the two models as two distinct ways to obtain the same results we should exploit advantages of both throughout a pandemic simulation, particularly when the simulation is used in a predictive real-time fashion. Throughout the current COVID-19 pandemic, most results and forecasting come from one model but not a combination of both. We consider what can be learned from running both models side-by-side; taking and applying the best of each model using the measured data. We also note the conceptual and computational requirements of each of the models for this particular test-bed population.

The outline of the paper is as follows. In section 2 we introduce the concepts of compartmental and agent based models and explain how adding features such as travelers and vaccines in the models increases conceptual complexity. We also illustrate our model on data from Honolulu County. Section 3 discusses conceptual and computational complexity for both models, as well as optimization for data fitting. A benchmark example is provided to analyse performance of both models. We end the paper with a conclusion, section 4.

2. Epidemiological Models. Infectious disease modeling has been used by epidemiologists and mathematicians for years, however the COVID-19 pandemic has really highlighted the importance of understanding the purpose and functionality of these models. At its most core function, the intent of these epidemiological models is to estimate the spread of the infectious disease across a population based on some core epidemiology determinants: incubation period, duration of infectious period, population size, and R0 (the reproductive value). Our specific implementations of these two models use more variables and complex interactions than other more standard implementations [5, 19, 20, 27] . The purpose of this added functionality is to better account for the nuances and assumptions in a real-world situation.

The fundamental difference between the two types of models is that EBM captures aggregate behavior over the population while ABM captures agent interactions and progression of the disease over each individual. EBM are typically less computational expensive, and easier to use since they require less information. However, they provide only large-scale information on the spread of the disease compared to ABM models that can take into account spatial information in much more detail.

2.1. Compartmental Models. The origin of compartmental models, also called Equation Based Models (EBMs), in the study of infectious diseases is from the works of Ross (1916) [23] , Ross and Hudson (1917) [22] , Kermack and McKendrick in 1927 [17] and Kendall (1956) [12] . In those models, the major assumption is that the population is divided into compartments based on the nature of the evolution of the disease. The population size is typically assumed to be fixed and by design is assumed homogeneous within each compartment. The basic model consists of three compartments -Susceptible(S), Infected(I) and Removed(R). Variations of this model may include compartments such as Exposed(E) or a loop back into susceptible in case of diseases with no immunity against re-infection. See Fig. 1 , where the hazard rate λ(t) = β I(t) N , with β denoting the baseline transmission rate and N denoting the (fixed) total population. Variants of the SIR model can be differentiated based their treatment of vital dynamics. Vital dynamics refers to life outcomes such as births and deaths [11] . In very simple models (or with diseases that are not prolonged) the SIR model without vital dynamics is ideal. In models that are more detailed, or if the disease is present in the population for a prolonged period of time, the SIR model with vital dynamics is more suitable. However, the duration of the "prolonged period of time" is at least 10 years [1] . It is too early to tell where COVID-19 falls regarding the possibility of it being endemic in particular pockets in the world. Historically, the SIR model has been used to estimate the impact of highly infectious diseases, such as smallpox [6] .

Of the various compartmental models, we use one inspired by [15] , which is based on a standard discrete and deterministic SEIR model. Some classical SEIR models include Cooke and Driessche [4] , who introduced the SEIR model with two delays and Li et al. [14] studied global dynamics with both non-linear and standard incidence rates. We assume a given population is divided into four compartments: Susceptible (not currently infected), Exposed (infected with no symptoms), Infected (infected with symptoms), and Removed (recovered or deceased). We subdivide the entire population into two additional groups: the general community (C), healthcare workers (H). This is motivated by the fact that healthcare workers are potentially more exposed to a virus but also use better protection, and therefore should interact differently with the community during a pandemic. In addition, during the severe acute respiratory syndrome (SARS) epidemic in 2003, healthcare workers formed a large fraction of the infected population [15] . This has also been which suggested to be the case for COVID-19 [16, 24] .

In our model, Exposed and Infected (in each population group) are split into multiple stages per day to better reflect the progression of the disease. Individuals in isolation (including hospitalization) are similarly distinguished. The dynamics of the two population groups are essentially the same and are represented using the diagram in Fig. 2 with variables described in Tab. 1.

By choosing the probabilities of moving from one stage to the other, we are able to model the observation (according to the Centers for Disease Control and Prevention (CDC) as well as other sources) that about 40% of people who contract SARS-CoV-2 remain asymptomatic and that the incubation period for those who do develop symptoms is between 2 to 14 days after exposure, with the mean period being between 4 and 6 days [18] . Those who do not develop symptoms after 14 days are moved directly from the Exposed compartment to the Removed compartment. The quarantine sub-compartment E q,i is also broken down into 14 stages and is used to model the effect of contact tracing and the reduced transmission rate for Description of the variables in the EBM model. Variable Description

Number of total susceptible individuals E i (t) (resp. E q,i (t)) Number of asymptomatic infected individuals i days after exposure who are not quarantined (resp. qurantined) I j (t) (resp. I q,j (t)), i = 0, 1 Number of symptomatic infected individuals i days after the onset of symptoms who are not isolated (resp. isolated) I j (t) (resp. I q,j (t)), j = 3, 4, 5

Number of symptomatic infected individuals at the nominal stage i of the illness that has not been isolated (resp. isolated). Note that a person can stay at a given stage for several days R(t)

Number of removed (recovered or deceased) individuals quarantined individuals. The infected individuals go through 5 stages, of which the first two represent the first two days of being symptomatic whereas the last three represent the phase where the immune system is fighting the disease. Since the last three stages can go on for more than one day each, there is a variability in the number of days any given person can spend at each stage. Our model assumes that the symptomatic phase of the illness lasts at least 5 days. This can be seen in Figure 2 as the feedback loops in I 2 , I 3 and I 4 where only 20% of people in each of these compartments move on to the next one. The remaining 80% stay in the same compartment for the next iteration. The parameters in Fig. 2 are described in Tab. 2. The parameter β, the basal transmission rate, is optimized to fit the data. Addition of compartments, mask mandate, contact tracing, travelers and vaccinations modify β according to equations (14) , (16) , (25) , (27) , (37) and (39) to obtain the hazard rate λ i , (i = c, h, v) which governs the dynamics of the evolution of disease for each category. We introduce parameters p i for the probability to develop symptoms on day i, and refine them such that if symptoms do develop, it takes between 2 to 14 days, with a mean between 4 and 6 days [18] , while assuming that about 40% of all infections remain asymptomatic. The values of q s,i are chosen to reflect the prediction that symptomatic individuals are likely to quarantine, with a probability of 0.1, 0.4, 0.8, 0.9 and 0.99 for the first five days of symptoms. For healthcare population these probabilities are assumed to be slightly higher at 0.2, 0.5, 0.9, 0.98 and 0.99 for the first five days of symptoms. In addition, the parameter r is the probability of transitioning from one stage of the illness to the next (with the final stage being recovery or death). Based on prior work [3] , we chose r to yield an expected length of illness of 17 days. The equations for the dynamics are given in (1)- (13) .

I q,1 (t + 1) = I q,0 (t) + q s,0 I 0 (t) (10)

I q,j (t + 1) = r(q s,j−1 I j−1 (t) + I q,j−1 (t)) + (1 − r)(q s,j I j (t) + I q,j (t)), j = 3, 4

As we mentioned, a crucial part of the dynamics relates to the hazard rate. For the general community, group C, we have

where we suppressed the dependency on t on the right for convenience. We use sub-indices c (community), and h (healthcare workers) to indicate the appropriate group. The subscript q indicates quarantined and isolated individuals. Here p me and p mp represent mask efficiency and mask compliance, respectively. Mask efficiency is chosen to reflect a reduction in transmission of 75%. N c denotes the mixing pool for the general community, computed as

where variables E and I here represent the sum over all the stages within these compartments. For the healthcare worker group, we have

where Figure 3 represents the optimized fit obtained from the SEIR model for Honolulu County.

On October 15, 2020, the State of Hawai'i implemented the Safe travel program which brought back tourists to the islands as well as allowed more residents to travel to the mainland. Travelers are an important component of spread of a disease, especially for more isolated locations such as islands or archipelagos (among which for instance New Zealand, Iceland, Japan and Polynesia). To implement travelers, we introduce daily travelers (T) and consider two broad categories of travelerstourists and returning residents. The returning residents are assumed to behave similar to the existing community members whereas the tourists are assumed to have different behaviour and form a new group (V). For tourists, we assume a 25% higher basal transmission rate to account for their risk-taking behaviour. We also assume a 50% reduced interaction of tourists with the community. This is reflected in the parameter ρ v which can be seen in the expression for hazard rate, λ c . To account for some form of safe travel protocol for every region, we make further assumptions about the testing rate φ 1 , false negative rate φ 2 , prevalence φ 3 and fraction of untested travelers quarantining φ 4 . See Diagram 4 for a visual of our assumptions.

From these assumptions we compute the coefficients for the number of arriving travelers distributed in each compartment as v 1 , v 2 , v 3 :

For simplicity we assume the untested unexposed travelers go directly into the susceptible group even though they might quarantine for 10 days. This is to avoid introducing a new variable of susceptible quarantine individuals. The dynamics equations become where the terms in red account for travelers and T is the average number of travelers entering the region per day. With these assumptions and changes in the model, the equation for the hazard rates get modified to:

where:

The expression for λ h remains unchanged with the introduction of travelers, however, we now have a new hazard rate, λ v that governs the dynamic of the tourist group (V): Since the travelers add to the susceptible population, we have to make certain assumptions on how and when they are removed from the model once they leave the region. The number of susceptible travelers is given by v 1 e −λ(t) T , and we remove a fraction of them to reflect them leaving the region (represented by the α in Fig. 5 ). Tourists are removed from the R compartment proportionally to incoming tourists that got exposed. Figure 6 represents the optimized fit obtained from the SEIR model for Honolulu County including the period from October 15 to December 27, 2020 when travelers were re-introduced through the safe travel program. The parameters in Table 3 were used for Honolulu county to find v 1 , v 2 , v 3 as described earlier. The per day average influx of travelers and returning residents are modeled as piece-wise linear functions over two week intervals from October 15, 2020 to April 25, 2021. The data is obtained from [9] and the values are shown in Table 4 .

Then in December 2020, the world started vaccinating. For simplicity our model assumes one type of vaccine requiring two doses. Expending to multiple vaccines, possibly some requiring only one dose, follows the same conceptual framework by adding more compartments. To implement vaccination in the EBM model, we introduce new compartments N V 1, N V 2,Ē,Ē q ,Ī andĪ q . While N V 1 and N V 2 represent the number of people who have received the first and second dose of vaccination respectively, theĒ,Ī,Ē q ,Ī q keeps track of exposures and infections between vaccine doses and post vaccination. They have the exact same sub-structure as the Exposed and Infected compartments used for non-vaccinated groups but parameters might vary. We introduce the new parameter ψ = 1/δ where δ is the time gap between doses, as well as µ 1 and µ 2 to account for possible reduced susceptibility of vaccinated individuals. We also assume an 80% reduction in transmissibility for vaccinated population, which is represented by the parameter ω which multiplies the contribution of vaccinated compartment to the hazard rate as seen in Equation 37. We assume a 95% protection as a result of vaccination. This is modeled as 13 i=0 (1 −p i ) = 0.95. This reduces the probability of vaccinated individuals developing symptoms and also reduces the probability of severe infections. The flow diagram with the vaccinated compartments is shown in Fig. 7 .

In the EBM, we assume that 100% of healthcare population are completely vaccinated by December 27, which is when the community starts receiving vaccination. For this, we begin vaccinating Healthcare population on December 22 with an average of 2500 people being vaccinated everyday. The proportion of community population that is vaccinated is based on daily averages from data from [7] The introduction of new compartments results in our equations being modified to:

where the terms in red account for vaccination where N V represents the daily average of number of people receiving the first dose of vaccination. With these assumptions and changes in the model, the equation for the hazard rates get modified to:

where:

and ω is the reduction in transmissibility due to vaccination. If for simplicity we assume that by the start of the vaccination period for community population, the entire healthcare population is fully vaccinated, their hazard rate is modified to:

When combining travelers and vaccines, we assume that a proportion θ of all arriving travelers are fully vaccinated. These are moved directly into the NV2 compartment (for respective categories returning resident and tourists). The remaining 1 − θ are distributed between Susceptible, Exposed and Exposed quarantine compartment based on our prior assumptions from the model including travelers. Table 5 . The parameter values and data for travelers is the same as shown in Table 3 and Table 4 respectively. 

Compartmental models focus on directly capturing the collective behavior of groups of people, and are typically derived using estimates of aggregate (or limiting) behavior of a large number of individuals under (many) simplifying assumptions. In contrast, agent based models (ABMs) focus on capturing the behavior of a single individual, referred to as an agent. Such individual behavior can often be described using fairly simple rules, however the collective behavior may still exhibit complicated phenomena. As a related example from physics, one could imagine modeling diffusion using the standard PDE versus modeling the Brownian motion of each particle. It should be noted, however, that ABMs have been used in social, economic, and biological sciences as early as the 80s [21] , and some models could be tracked to the 70s [25, 26] . Popularity of ABMs exploded in the 90s, when the computational power significantly increased and became widely available. d S can be a product R m × Z k 2 , where R m captures continuous variables related to the disease, such as age and susceptibility to infection, and Z k 2 captures binary variables, such as presence of infection and comorbidities. Of course, one can add other categorical variables if needed.

The evolution of the system state through time can be regarded as a function g : T → A S or, equivalently, f : T × A → S, with f (t, a) = g(t)(a), where T is our time set, which we shall assume to be discrete, i.e. T = N. The function f can be defined deterministically or it can be a realization of a stochastic process. Importantly, one does not focus on the whole f and instead defines how an individual value, f (t + 1, a) is obtained when f (t, a) for all a ∈ A is known. Of course, a change in the state of an agent is unlikely to depend on all of the other agents. Typically, each agent has an associated subset of agents that may affect its state due to "interaction". We shall refer to such a subset as a contact set of an agent. In the case of pandemic modeling, a contact set consists of people who actually interact with a given individual. This example also suggests that a contact set of an agent a ∈ A, which we shall denote N (a), may have an additional structure to better reflect interactions within different contexts. For example, interactions at work may be different from those at home. Mathematically, this may be represented as a disjoint union, N (a) = n i=1 N i (a). Also, a contact set may be time dependent, thus yielding N (t, a) = n i=1 N i (t, a), t ∈ T . Considering that interactions between agents are typically symmetric, it is convenient to represent all N (a), a ∈ A, using an undirected graph, so that each separate N (a) is just a collection of adjacent vertices. More specifically, we let A be the vertex set of our graph, and let E ⊂ {{a, b}|a, b ∈ A} be the set of edges. We shall refer to such a graph as a contact network. We can then define a contact set of an agent a ∈ A as N (a) = {b ∈ A|∃{a, b} ∈ E}. Additional structure and time dependency of contact sets are obtained by defining multiple, possibly time dependent interaction networks with the same vertex set A and different edge sets E i (t), i = 1, . . . , n.

In the deterministic case, the state of an agent a ∈ A at time step t + 1 is defined as a function of time t and the states f (t, b) where b ranges over N i (t), i = 1, . . . , n. In the stochastic case, one computes the conditional probability distribution for f (t + 1, a), conditioned on the above f (t, a) (and possibly t), and then samples from it. When modeling a pandemic, this often simplifies to computing the probability of infection given an uninfected individual, or computing the probability of developing symptoms, given an infected but asymptomatic individual, etc. Then a (pseudo)random number is generated to determine the actual state transition (e.g. an individual gets infected, develops symptoms, etc). While the transition between states can be mathematically quite complicated and non-Markovian, the actual implementation is often fairly straightforward.

An example of a contact network for an agent based model of a pandemic is shown in Fig. 9 . We should note that contact networks constitute one of the most important aspects of agent based models and their construction can be a challenging problem. But once the network construction is done, handling changes to agents' states due to newly available information can be readily implemented as tweaks to the interaction network and/or states of agents. For example, Fig. 9 shows that adding vaccinated individuals is conceptually quite simple. Of course, the process governing state transitions also needs to be updated. [13] . Covasim provides several ways to construct contact networks, with the default choice resulting in a network consisting of four "social layers" (i.e. edge sets) analogous to the ones shown in Fig. 9 . The state of a Covasim agent consists of 39 variables which include demographic information, individual susceptibility, and variables representing intrahost viral dynamics (along with viralload-based transmissibility). By design, Covasim is a stochastic agent-based model. Thus, as mentioned above, each step is focused on computing multiple transition probabilities.

While Covasim comes with its own demographic data set, it can also incorporate user-supplied demographic information, and we chose to use the data from the Hawai'i Population Model developed by the Hawai'i Data Collaborative [8] . In fact, Covasim allows the user to easily customize multiple aspects of the initialization step. We customized the population size along with the number of initially infected people as well as multiple parameters affecting the simulation, such as probability of asymptomatic infection, probability of isolation upon the onset of symptoms, etc. Since Covasim is a stochastic model, output quantities of interest (e.g. new daily infections, the number of hospitalizations, etc.) are averaged over multiple simulations, typically 15 to 20. A user can supply a seed for the (pseudo)random number generator, typically keeping it fixed during the testing phase and then properly resetting the seed before each simulation. In most cases we employed the default construction of the contact network with its four social layers: household, work, school, and community. However, it is fairly straightforward to alter the default construction, for example by increasing the average number of contacts among young adults in the community layer. We actually did implement the latter to better capture the fact that young people are typically more socially active.

Customization of Covasim simulation steps is done through so called "interventions," which are procedures that can be supplied to and then called by the main simulation routine. Covasim comes pre-packages with several interventions, allowing the user to take into account changes in the baseline transmission rate due to imposed mitigation measures, such as a lockdown, as well as incorporate testing and contact tracing with customized isolation probabilities. One can also write custom interventions, which we did to implement the Hawai'i vaccination protocol. The latter was necessary because the vaccine intervention bundled with Covasim could not properly capture the necessary dynamics of administering available COVID-19 vaccines.

Once Covasim runs its initialization step, it iterates over the given number of days and for each iteration uses the constructed contact network along with the provided parameters and interventions to calculate the probability that an agent gets infected. Additionally, the state of each already infected agent gets updated (e.g. switching from an asymptomatic infection to a symptomatic one) according to the prognoses pre-calculated during the initialization step.

Incorporating the effect of travelers on the spread of the disease is important for Hawai'i, but it had to be done in an indirect way, since Covasim keeps the total number of agents fixed throughout the simulation. Hence, we added a fifth social layer of contacts representing workers in the hospitality industry and a new custom intervention which increased the baseline transmission rate within the new layer based on the number of tourists traveling to Hawai'i as well as the nationwide infection rates. Figure 11 represents the optimized fit obtained from the Covasim model for Honolulu County.

It is important to note that while both SEIR and Covasim models denote by β the parameter representing the baseline transmission rate of the disease, these two parameters are, in fact, different quantities. In Covasim, it can be regarded as the probability of a susceptible person getting infected when a contact with an infected individual occurs and no information about other modifiers is given (i.e. layer-specific transmission rate factor, individual transmissibility, and individual susceptibility are all equal to 1). In the standard discrete SEIR model, this parameter also affects the fraction of newly infected individuals, with the latter given by 1 − exp (−β I N ), but its relation to the probability of infection per contact Figure 10 . Diagram of basic Covasim simulation algorithm. Figure 11 . In green is the Covasim model fit for Honolulu county from March 6 to October 15, 2020. Included are the optimized transmission rates.

is indirect and depends on multiple simplifying assumptions. This relation becomes even more complicated in our expanded SEIR models.

3. Computational Versus Conceptual Complexity, and Data Fitting. In this section we discuss the advantages and limitations of both models.

3.1. Conceptual Complexity. While computationally efficient, compartmental models present another difficulty: if the model itself needs to be modified to take into account newly discovered features of the pandemic, such modifications can be highly non-trivial. The reason for the difficulty is the aggregate nature of the interactions. Some of these interactions are quite intricate and are based on a series of complicated assumptions. Consequently, the conceptual complexity of the model may become a hurdle. This can be observed from the diagrams in figures 2, 5 and 8 where we illustrate the incorporation of additional compartments into the model. Adding new variants to the compartmental model would drastically increase the already existing complexity of the interactions between different compartments. A compartmental model is also quite limiting when it comes to the inclusion of demographic, ethnic and other essential information, as it would again require one to introduce numerous compartments. For agent based models, incorporating new attributes for individuals, such as age, ethnicity or vaccination status, is conceptually fairly simple, since each individual is represented by an agent. Thus, a simple augmentation of the variables representing the state of each agent should be done, and the network remains unchanged. For instance, taking into account a new variant of SARS-CoV-2 would amount to simply adding variables that characterize the variant to the agent state space.

Simulation of agent based models may be a computationally expensive process if the number of agents (i.e., the population) is large and the computation is not appropriately parallelized. Each time step requires an iteration over each interaction edge, and the state of each agent needs to be evaluated and possibly updated. Thus, the total time complexity of each time step is O(|E| + |A|). Here |E| denotes the number of interaction edges and |A| denotes the number of agents. Multiplying this already substantial number by the number of time steps, N , results in a fairly large computational cost O(N (|E| + |A|)).

We should also mention that interventions may also increase the time complexity. Fortunately, in many cases such computations can be parallelized, and employing compute clusters and/or multicore nodes can provide a significant speed-up. However, this potential parallelism is not currently exploited in our simulations, although the 15-20 simulations needed to average the results (as mentioned earlier) are run parallel.

If |E| is proportional to |A|, then one can expect a linear increase in computational time with respect to the number of days and the number of agents. Figure  13 shows this for our simulations. Here, no extra parallelism is employed. These are run on a single Haswell node of the NERSC Cori system. Each node has two sockets, and each socket is populated with a 2.3 GHz 16-core Haswell processor (Intel Xeon Processor E5-2698 v3). Each core supports 2 hyper-threads, and has two 256-bit-wide vector units. To produce the results in the picture, we used Covasim without additional parallelism and performed the computations on a single core. One can see that the increase in computational time with the number of days is roughly linear, and a reference straight line is included (top figure) . The scaling of the computational time with respect to the number of agents is also close to being linear (bottom figure). In the future, we plan to investigate the role of parallelism in reducing the computational time of agent based epidemiological models. Possible speedup is crucial for large population sizes since, as shown in Fig. 13 , already for 50 million agents one has to wait more than 2 hours to perform required computations on a single node. Moreover, larger populations may require employing distributed computations, as the total state of the model may not fit into the RAM of a single node.

Computational complexity of compartmental models is typically O(N C), where N C is the number of compartments, which is significantly less computationally intensive for the same problem. As expected, Fig. 14 shows a linear increase in the computational time with respect to the number of days. For this benchmark, a basic model with community and healthcare compartments is used without any interventions, travelers, or vaccinations. It was run on single Haswell node of the NERSC Cori system. Note that scaling with respect to the population is not relevant for a compartmental model since the population is aggregated. Generally agent based models are likely easier to parallelize, however we have not explored this because of the Python-based code in Covasim and its reliance on Python specific libraries which make hands-on parallelization techniques on a node, such as OpenMP, more difficult.

3.3. Data Fitting. There are differences in data fitting methodology, implementation and results for the two models we study. Our SEIR model was fitted to available data from the State of Hawai'i based on daily infections using a classical gradient-based optimization method. The latter is possible because we can explicitly compute the gradient with respect to the parameters. Thus obtained values of the baseline transmission rate for the SEIR model allowed us to calculate appropriate values of the corresponding parameter in Covasim. Fitting an agent based model to data directly is a far more complicated task. In most cases, one needs to resort to a very general global optimization technique such as the Metropolis-Hastings algorithm or the genetic algorithm [2, 10] . Such a procedure is very computationally costly and does not always produce a good fit [10] . Figure 15 shows the optimized fit obtained from the SEIR and the Covasim model for Honolulu County. We can see that qualitatively both fits agree very well in certain places, including the spike in August. While the models agree well when fitting specific data as for Honolulu County, it is unclear if the agreement will holds as we run the model without fitting data. In Fig. 16 we show simulation of a benchmark scenario to visualize the impact of the vaccines for both models. We start with a basic model with no interventions, vaccinations or travelers. For both models, we assume a population of 1 million, with 100 infections on day 1. In EBM this is accomplished by setting E 0 (0)=100 for the community population and choosing a single beta value for the simulations. We forecast two scenarios, the first one has no vaccines and shows comparatively how the two models forecast over a year. For the second scenario, we assume 2500 individuals are vaccinated per day, a 95% vaccine protection for developing symptoms (this means thatp i are chosen such that Π 13 i=0 (1 −p i ) = 0.95), and a 80% reduction in transmissibility (which corresponds to ω = 0.2). We also assume the vaccine requires only one shot. We can observe that the model forecasting agrees even with no fitting and overall the models behave similarly.

It is important to note that the Covasim curve above is actually the mean of 20 simulation curves. In Fig. 17 you can see the mean with 4 individual simulations curves: one with the highest peak (red), one with the lowest peak (magenta), with the earliest peak (blue), and with the latest peak (green). Taking the values (and times of occurrence) of all the peaks of our 20 simulations and computing their means for the scenario without vaccines, we get the mean peak value of 520 and the mean time of occurrence of 341.05, while the value of the peak of the mean of all simulations is 463.6 occurring at time 321. For the second scenario, the mean value of the peaks 114.95 with the mean time of occurrence of 178.15, while the value of the peak of the mean of the simulations is 102.05 occurring at time 172. These observations suggest that the differences in the curves in Fig. 16 are at least partially caused by somewhat a simplistic computation of the averages in Covasim. 4 . Conclusion. In this paper, we look at two types of epidemiological models and analyze their complexity both in terms of conceptual design and computational time. Our conclusion is that the decision to use one model versus the other ones depends on the objectives, available data as well as access to resources. If used properly, Figure 16 . SEIR (blue) and Covasim(green) benchmark scenarios forecasting spread and vaccination. these two types of models offer similar outcomes for the spread of the disease at the population level. This is not surprising, it was indeed observed in [28] over data from the 1918 Influenza pandemic. Our example, see Fig. 16 is designed to understand whether or not the two models agree when no data fitting is performed. It is clearly the case, and demonstrates that overall the models behave similarly.

Reflecting upon a year of modeling in the current COVID-19 pandemic, we conclude that for the State of Hawai'i both models played an important role. Early in the pandemic the compartment model allowed us to fit the data, and run some forecasting with limited data and mitigation measures that applied to the entire population (such as stay-at-home orders). Once the pandemic advanced and actions got more sophisticated with testing, contact tracing, safe travel program, tier system for reopening strategy and eventually vaccines it was clear that shifting to an agent based model was a better strategy. It allowed us to mimic the age demographic for vaccines plan, and include other attribute in an easier way. Computationally, because of the small size population in the State of Hawai'i it was not a major issue to use the agent based model. However for a state like California with Developing an hybrid model with some aspects that are agent based but some also with aggregated population might provide in the end the ideal tool. 1. Supporting Information. The equations for the dynamics of the three population groups are essentially the same and are given below. Only the hazard rate and the parameters determining transition rates into quarantine may be different between the three groups. the symptoms (j = 0, 1), or the stage of the illness (j = 2, 3, 4).

• VariablesĪ q,j (t), j = 0, 1. The number of vaccinated quarantined symptomatic infected individuals, with j representing either the number of days after the onset of the symptoms (j = 0, 1), or the stage of the illness (j = 2, 3, 4).

• Variable R(t). The number of removed (recovered or deceased) individuals.

Splitting exposed individuals into multiple stages, E i , allows us to capture possible differences in the progression of the asymptomatic phase of the disease. Importantly, it allows us to take into account that, according to the Centers for Disease Control and Prevention (CDC) as well as other sources, about 40% of people who contract SARS-CoV-2 remain asymptomatic, and the incubation period for those who do develop symptoms is somewhere between 2 to 14 days after exposure, with the mean incubation period between 4 and 6 days [?, ?, ?]. Individuals who do not develop symptoms after 14 days are assumed recovered. The use of the quarantine sub-compartments, E q,i , allows us to capture the effect of contact tracing and the reduced transmission rate for quarantined individuals.

Similarly, having multiple stages for infected individuals better reflects progression of the symptomatic phase of the disease. The first two stages represent the first two days of symptoms, but the next three should be understood as phases of the immune system fighting the disease. There is a substantial variability (due to age as well as other factors) in the number of days any given person can spend at each stage. Our model implicitly assumes that the symptomatic phase of the illness lasts at least 5 days (in the unlikely case that each stage lasts just one day).

As we mentioned, a crucial part of the dynamics relates to the hazard rate. For the general community, group C, we have λ c (t) = β(1 − p mp (1 − p me )) (I c + εE c ) + γ((1 − ν)I c,q + εE c,q )+ 0.33(Ī c + εĒ c ) + γ((1 − ν)Ī c,q + εĒ c,q )+ ρ[(I h + εE h ) + γ((1 − ν)I h,q + εE h,q )]+ ρ v [(I v + εE v ) + γ((1 − ν)I v,q + εE v,q )] /(N c + ρ v N v ), (27) and for the tourists we have

where we suppressed the dependency on t on the right for convenience. We use sub-indices c (community), h (healthcare workers), and v (tourists) to indicate the appropriate group. Subscript q indicates quarantined individuals. Here p me and p mp represent mask efficiency and mask compliance. Mask efficiency is chosen to reflect a reduction in transmission of 75% for all regions. Mask compliance is set at 20% for all regions at the start of the pandemic, but this value is modified on the dates the regions introduce mask regulations. N v denotes the mixing pool for the visitors and N c denotes the mixing pool for the general community, computed as N c (t) = S c + E c + I c + R c + ρ(S h + E h + I h + R h ) + ρ v1 (S v + E v + I v + R v ). (29) where variables E and I here represent the sum over all the stages within these compartments. For the healthcare worker group, we have λ h (t) = ρλ c + βη (I h + εE h ) + κν(I h,q + I c,q + I v,q ) /N h ,

where N h (t) = S h + E h + I h + R h . The model fit plot the following value

Modelling a pandemic with asymptomatic patients, impact of lockdown and herd immunity, with applications to SARS-CoV-2

Understanding the metropolis-hastings algorithm

Epidemiological Model of the Spread of COVID-19 in Hawai'is Chanllenging Fight Against the Didease

Analysis of an SEIRS epidemic model with two Delays

An agent-based model to evaluate the COVID-19 transmission risks in facilities

Mathematical models of contact patterns between age groups for predicting the spread of infectious diseases

Hawaii Population Model. Hawai'i Data Collaborative

Hawaii Safe Travels Digital Platform

Introduction to nonlinear and global optimization

Three Basic Epidemiological Models

Deterministic and stochastic epidemics in closed populations

Covasim: an agent-based model of COVID-19 dynamics and interventions. medRxiv

Global stability of a SEIR epidemic model with vertical transmission

Curtailing transmission of severe acute respiratory syndrome within a community and its hospital

Risk of COVID-19 among front-line health-care workers and the general community: a prospective cohort study. The Lancet Public Health

A contribution to the mathematical theory of epidemics

A Systematic Review of COVID-19 Epidemiology Based on Current Evidence

The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study

Management strategies in a SEIR-type model of COVID 19 community spread

Flocks, herds and schools: A distributed behavioral model

An Application of the Theory of Probabilities to the Study of a priori Pathometry

An Application of the Theory of Probabilities to the Study of a priori Pathometry

COVID-19 infection among healthcare workers: a cross-sectional study in southwest Iran

Dynamic models of segregation

Micromotives and macrobehavior

High-Resolution Agent-Based Modeling of COVID-19 Spreading in a Small Town

Analyzing the impact of modeling choices and assumptions in compartmental epidemiological models