key: cord-0559938-83prpte1 authors: Saeidpour, Arash; Rohani, Pejman title: Optimal non-pharmaceutical intervention policy for Covid-19 epidemic via neuroevolution algorithm date: 2021-10-23 journal: nan DOI: nan sha: ad908d136e916f889f0540979d0d2961353fada9 doc_id: 559938 cord_uid: 83prpte1 National responses to the Covid-19 pandemic varied markedly across countries, from business-as-usual to complete shutdowns. Policies aimed at disrupting the viral transmission cycle and preventing the healthcare system from being overwhelmed, simultaneously exact an economic toll. We developed a intervention policy model that comprised the relative human, economic and healthcare costs of non-pharmaceutical epidemic intervention and arrived at the optimal strategy using the neuroevolution algorithm. The proposed model finds the minimum required reduction in contact rates to maintain the burden on the healthcare system below the maximum capacity. We find that such a policy renders a sharp increase in the control strength at the early stages of the epidemic, followed by a steady increase in the subsequent ten weeks as the epidemic approaches its peak, and finally control strength is gradually decreased as the population moves towards herd immunity. We have also shown how such a model can provide an efficient adaptive intervention policy at different stages of the epidemic without having access to the entire history of its progression in the population. This work emphasizes the importance of imposing intervention measures early and provides insights into adaptive intervention policies to minimize the economic impacts of the epidemic without putting an extra burden on the healthcare system. On March 11, 2020 the World Health Organization (WHO) announced that Covid-19, caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1] , "can be characterized as a pandemic" [2] . Within a month, most countries around the world had taken public health measures to contain the spread of the novel virus [3] . However, the type and severity of implemented measures and their subsequent success in minimizing the public health impacts of the outbreak varied greatly by country [4] . This variation in policies and their effectiveness reflects the complexity of finding the balance between two often competing policy objectives: protecting the public's health versus minimizing the economic impact of intervention measures [5] . Initially, without access to pharmaceuticals, studies focused on two distinct control approaches: mitigation and suppression [6, 7, 8] . The mitigation strategy aims to reduce transmission such that healthcare systems are not overwhelmed, while aiming to maintain the chain of transmission in order to achieve herd immunity. In contrast, the suppression strategy is aimed at virus elimination. In hindsight, countries that acted early to suppress the disease have excelled at minimizing both the public health and economic impact of the epidemic [9, 10, 11] . While early suppression measures appear to outperform the mitigation strategy both in terms of public health goals and economic costs, such policies would not necessarily be successful in countries where citizens are more averse to government-enforced control and surveillance measures [12] . Moreover, suppression measures would only be successful if implemented in the early stages of the epidemic and sufficiently strictly as to curtail transmission effectively. In a number of settings, however, suppression has been implemented in a piece-meal manner, leading to periods of drastic interventions including lockdowns punctuated by relaxation of social distancing measures and subsequent uptick in transmission [13, 14] . This prompted us to examine the optimal mitigation strategy, which aims to manage or mitigate the healthcare impacts of the epidemic while population approaches herd immunity. Characterizing immediate and long-term economic, social and human burden of Covid-19 epidemic is challenging and has led to several research efforts to examine the optimal intervention policy from various perspectives. It is unfeasible to review comprehensively this body of work, so we confine ourselves to a number of the key studies. Rowthorn and Maciejowski [15] investigated the optimal uniform lockdown in an SIR model assuming a variety of parameterizations [15] . Their objective function assigned monetary values to costs arising from infection, lockdown, and value of life. Their main finding was that in the medium term, a policy that maintains effective reproduction number value close to 1 provides the best path. Bethune and Korinek [16] contrasted the decisions made by rational, individual agents with the choices made by a social planner who is able to coordinate the choices of individuals [16] . They found that rational agents generate large externalities because they fail to internalize the effects of their economic and social activities on others' risk of infection. Alvarez et al. formalized the social planner's dynamic control using an SIR epidemiological model and a linear economy. The best strategy starts with a severe lockdown two weeks after the epidemic, covers 60% of the population after a month, and progressively decreases to 20% of the population after three months. More recently, a number of studies have broadened this exploration to identify age-specific optimal control strategies [17, 18] . Inspired by [19, 20, 21, 22, 23] , we sought to use an neuroevolution strategy to finding the optimal policy function which would dynamically determine the minimal required reduction in transmission rates at each time instant, deemed as control strength hereafter. Reductions in transmission may result from lower contacts (due to isolation-in-place ordinances, movement restrictions, or lockdown policies), or the adoption of personal protective measures that serve to curtail transmission upon contact (such as the use of face masks), with varying economic impact. The fitness function is expressed such that a strategy is rewarded for allowing the epidemic to remove individuals from the susceptible pool without overwhelming the healthcare capacity. The proposed neuroevolution strategy begins by initializing a population of random policy functions. The generated policy functions are then used to simulate the trajectory of the epidemic. The fitness of each function is then evaluated based on a reward function. The most elite policy functions are then perturbed (mutated) to generate the next generation offsprings. The new population is then evaluated and this process is repeated for a pre-defined number of iterations. We also derived the optimal control solution via Pontryagin's maximum principle (PMP) [24] and compared the results with neuroevolution optimal policy. We have chosen the United Kingdom as our target population to implement the proposed approach. The choice of the UK as our target population was largely motivated by the frequent changes in the government's strategy to contain the epidemic [25] , as summarized in Figure 1 . The UK's initial response was a mitigation policy, majorly inspired by the response to the flu pandemic, with an emphasis on protecting the most vulnerable to avoid overburdening the healthcare system in an effort to achieve herd immunity [9] . This initial policy later changed to a suppression policy by implementing lock-downs and imposing face mask-wearing requirements. Looking back at the early days of the epidemic, this study aims to understand how an effective mitigation policy could have been implemented (see [9] for a comparison of initial responses to Covid-19 by different countries including United Kingdom). Our study explores mechanisms for "flattening the curve" -it is motivated by COVID-19 pandemic but need not be restricted to precise courses of action undertaken in the response to pandemic. Our findings are intended to be informative for future epidemic control, particularly at early stages of epidemic where no effective vaccine is in sight. The ideal intervention policy results in a rapid increase in control strength early in the epidemic, followed by a sustained increase over the next ten weeks as the epidemic reaches its peak, and ultimately a progressive drop in control strength as the population achieves herd immunity. We've also shown how, without having access to the complete history of the epidemic's growth in the population, such a model may give an effective adaptive intervention policy at various stages of the epidemic. This study highlights the significance of implementing control measures as promptly as possible and offers insights into adaptive intervention strategies aimed at reducing the economic effect of epidemics while avoiding undue strain on the healthcare system. 2 Materials and methods We used a deterministic, time-varying Susceptible-Exposed-Infectious-Recovered-Hospitalized in ICU (SEIRH) model [26, 27] to characterize the transmission dynamics in the UK as described in Eqs. 1-5: where β is the transmission rate, 1/ρ and 1/γ give the mean latent and infectious periods, respectively and c(t) ∈ [0, 1] is the reduction in transmission (such that c(t) = 1 signifies complete cessation of transmission). The state variable H(t) denotes the number of occupied ICU beds and is determined by the probability that an infection is detected (P Detection ), the fraction of cases that require ICU treatment (σ ICU ) and the rate of admission to the ICU (γ ICU Delay ). The mean duration of stay in the ICU is determined by 1/γ ICU Stay . Model parameters and chosen values are presented in Table 1 . In our analyses, we examine changes in optimal intervention policy assuming policies are implemented starting at different points during the epidemic, T 0 . To identify the appropriate initial conditions at these different starting points, we used a particle filter [28] to estimate the effective retrospective daily c(t) (where t = 0, . . . , T 0 ), thus yield the epidemiological state of the population at different stages of the epidemic. The agreement between our fitted SEIRH model and data is shown in Figure S2 . We first introduce the following multi-objective reward function to account for three opposing goals: i) Sustain viral transmission to achieve herd immunity, ii) Keep the ICU occupancy below the maximum capacity, and iii) Impose the minimum possible control: We defined r 1 (t) for the sake of mathematical simplicity in deriving PMP solution and it is only used to compare the optimal NPI policies obtained from neuroevolution and PMP methods. For the remainder of this study, we use a slightly different objective function, r 2 (t), defined as follows: In both reward functions (equations (6) & (7)), the terms α 1 , α 2 and α 3 modulate the relative importance of herd immunity, healthcare burden and economic costs, respectively. The goal, therefore, is to identify the optimal intervention function c(t) that maximizes the sum of rewards, J, during the course of epidemic: In this section we first derive the necessary conditions for optimal control via Pontryagin's maximum principle, and describe the iterative numerical algorithm (the forward-backward sweep method) used to find the optimal solution. First, we form the following Hamiltonian function: λ s (t) are adjoint functions satisfying the adjoin system: λ s (T ) = 0 (Transversality condition). Expanding equation 10 yields: The necessary conditions for the optimal control is obtained by maximizing the (H) with respect to c(t): The state equations (equations 1-5) and adjoint equations (equations 10-16) together with state initial conditions and transversality conditions (equation 11) form the Optimality system. The explicit solution can not be analytically derived. Thus we turned to an iterative numerical method, Forward-backward Sweep, to solve the Optimality system. The optimal policy function, π θ , is a feed forward neural network, parameterized by θ which takes the state of the system at current time t, {S(t), E(t), I(t), R(t)} as input and returns the control strength, c(t). The neuroevolution strategy aims to find the optimal policy function, P G Most elite , with highest fitness score. Fitness score of policy function j in generation i, f i j , is equal to the sum of rewards, J (equation 8) and is obtained by running the SEIRH model with the corresponding policy function. First, M policy functions (P 1 j ) are randomly initialized. For each policy function, a trajectory is rolled out and fitness score is calculated at the end of simulation, as shown in figure 2. The L policy functions with the highest fitness scores are mutated to generate the next generation of policy functions. Mutation is implemented by adding a random Gaussian noise, scaled by the mutation rate, σ, to θ parameters of elite policy functions. The new offspring policy functions served as the parents of next generation. This process continues to find a policy function with a sufficiently high fitness score, P G Most elite . We used a fully-connected feed-forward network with 3 16-unit hidden layers and one tanh output layer to model the policy function. Pseudocode for the neuroevolution algorithm used in this study is provided in Algorithm 1. We compared the optimal intervention policies obtained from PMP and neuroevolution policies (Fig S1) . The policies are obtained using the r 1 reward function (equation 6) with α 2 = 1e − 1, α3 = 5e − 3 and same initial conditions. We found the optimal policies obtained from both methods to be very similar. In simpler problems where an analytic solution can be obtained for the optimality system, the PMP method can provide more insights about the optimal control solution and the dynamics of the system. Otherwise, a neuroevolutionary approach is computationally advantageous since the resulting policy function provides an optimal strategy for a broad range of initial conditions at a substantially smaller computations cost. That is, the PMP optimal intervention for a given initial condition is obtained by solving the boundary-value problem formulated in equations (1) (2) (3) (4) (5) and (10) (11) (12) (13) (14) (15) (16) . For a new boundary condition, the numerical solution must be repeated to solve the new boundary-value problem. In the remainder of the paper, our optimal solutions are obtained via the neuroevolutionary approach. The relative economic burden of different objectives in the reward function is determined by the weights, {α 1 , α 2 , α 3 }. Thus, we examined the effects of variation in these parameters on the resulting optimal policy (see Figure S3 ). We constrained α 1 to be 1 and changed the values of α 2 and α 3 over a logarithmic grid. For each parameter set, we trained the neuroevolution algorithm for 2000 generations with a population size of 256. The resulting policy functions (purple lines) and corresponding ICU occupancy trajectories of the 10 best-performing agents for each parameter set are depicted in Figure S3 . We found the reward function to be consistently robust to variation in the values of α 2 . That is, the tested range of α 2 values makes the cost of ICU overflow sufficiently prohibitive, leading to high-fitness strategies ensuring ICU maximum capacity is not exceeded (note that the ICU overflow reward is equal to 0 while the ICU occupancy is below the maximum capacity and negative otherwise). Evidently, making α 2 smaller would eventually deprioritize the goal of maintaining the ICU occupancy below the limit. Without loss of generality, we will use α 2 = 1e9 in the remainder of this paper. In contrast, we found the reward function to be highly sensitive to variation in α 3 . For α 3 > 10 −4 , the relative cost (negative reward) of imposing control becomes prohibitive and leads to one of the extreme intervention strategies: Suppression policy to end the endogenous transmission at the earliest possible time and avoid imposing lengthy control measures; or a no-intervention policy which plainly leads to the minimum relative control cost. In practice, the inclination for a specific intervention strategy depends on the policy maker's priorities. We observed pronounced variation in the optimal policies and resulting ICU occupancy trajectories for smaller values of α 3 (compare the first and third columns, Fig. S3 ). In Figure S4 , we demonstrate this variation for each parameter set and across the values of α 3 . As shown in Fig. S4A , values of α 3 smaller than 10 −4 result in greater Cumulative herd immunity reward. Thus, when the relative cost of control is modest, the optimal policy function will tend to maximize the reward by increasing the number of individuals removed from the susceptible pool, which in turn leads to greater Cumulative control reward (Fig. S4B ) and longer epidemic duration (Fig. S4C) . Therefore, among the tested values, α 3 = 1e − 4 represents the middle ground between no-intervention and suppression policies, and is the value that we have used in the rest of this paper. Figure 3 presents a comparison between the optimal intervention policy identified via our neuroevolution algorithm, a uniform intervention policy and no-intervention policy. The uniform intervention policy is implemented by imposing a constant reduction in transmission throughout the epidemic, c(t) = c u . The value of control strength, c u , is estimated such that the peak ICU occupancy tangents the maximum capacity. Figure 3A depicts the ICU occupancy trajectories of these three policies. As expected, the no-intervention policy leads to ICU burdens well beyond the threshold capacity for more than two months (67 days). The other notable observation is the difference between the optimal and uniform policies in managing the ICU burden: the optimal policy maintains the ICU occupancy near the maximum capacity throughout the epidemic, but not beyond it. Figure 3B depicts the implemented control strength in time for optimal and uniform policies. Except for a period of time less than 10 weeks at the onset of the epidemic, the control strength of the optimal policy is below the uniform intervention policy. The difference in the imposed control between two policies is better illustrated by Figure 3C , where a widening gap between the cumulative imposed control of the two policies emerges after day 200. In Figure 3D , we present the recovered individuals for each policy. Unlike the optimal policy, the final fraction of recovered individuals in the uniform intervention policy case is well below the theoretical herd immunity threshold. This suggests that the any reduction in the control strength, could lead to another epidemic wave given the large fraction of susceptible individuals. We have estimated the optimal intervention policy initiated at different stages of the epidemic, as shown in Figure 4 . Each scenario corresponds to a particular start date for the roll out of the optimal intervention policy. Figure 4A depicts the scenario in which optimal intervention policy starts on March 1st, which coincides with a surge in cases in the UK. The optimal intervention policy starts with c(t) = 0.33 (a 33% reduction in transmission rates) and is gradually increased to c(t) = 0.54 by mid-May. The control strength tapers off to 0 by June 2021. This scenario leads to two peaks in ICU occupancy, in November 2020 and June 2021. Figures 4B-E depict the optimal intervention policy starting at intermediate stages of the epidemic. As mentioned above, we estimated the initial conditions for each scenario by fitting our SIER model to fatality data using particle filtering, a Monte Carlo likelihood estimation algorithm for hidden state-space dynamical systems [38] . Comparing the optimal intervention policy curves in different scenarios depicts how implementing transmission reduction measures at earlier stages of the epidemic will eventually shorten the epidemic: The termination of optimal intervention policy is delayed from June 2021 (in Figure 4A ) to February 2022 (in Figure 4D ). The only exception is Figure 4E , in which the optimal intervention policy terminates slightly sooner than in Figure 4D . This is most likely due to the emergence of new variants with higher transmissibility [39] which gave rise to a faster depletion of the susceptible pool than accounted for in our model. To better illustrate the importance of implementing early control measures, we have demonstrated the Total duration of intervention policy implementation and Cumulative imposed control for different scenarios in Figure 5 . The Total duration of intervention policy implementation represents the time period between March 1st 2020 and the termination date of intervention policy for each scenario. The Cumulative imposed control is obtained by summing the daily implemented control strength (c(t)), divided by total number of days with c(t) > 0 for each scenario. As shown in Figure 5A , the Total duration of intervention policy implementation increases from 442 days in the first columns to 700 days in the last one. Figure 5B also confirms the fact that implementing the optimal intervention policy from earlier stages of epidemic would reduce the overall required control measures. Note that depicted Cumulative imposed control values do not include the actual imposed control strength (c(t)) before the start of optimal intervention policy and adding those values would only widen their differences. Also, the Cumulative imposed control is a linear measure of overall imposed control, however, the actual economic cost would not necessarily change linearly with duration and strength of imposed intervention policy. Figure 6 paints an overall picture of how the optimal policy fine tunes the transmission rates to sustain endogenous transmission in the population without overburdening the ICU capacity. Figure 6A demonstrates the variation of effective reproductive ratio (R ef f ) throughout the epidemic (black line), the control strength is also shown (blue dashed line). At the onset of the epidemic, R ef f is instantly reduced to 1.52 from 2.3 by imposing a 0.33 reduction in contact rates (c(t) = 0.33) and further decreased to R ef f ≈ 1 by mid-may (point i) to stall the epidemic growth. From point i to point ii, The R ef f is maintained close to 1 to maintain the ICU occupancy close to the maximum capacity. At this point, c(t) is slightly increased which leads to a sharp decrease of R ef f to 0.89 in point iii. This is followed by a steep decrease in c(t) to bring the R ef f above 1 to sustain the transmission. To summarize, the optimal mitigation policy is achieved by finding the balance between two extreme scenarios: Suppression policy which aims to stall the endogenous transmission in the population, and "No-intervention" which leads to exponential epidemic growth and the overburdening of healthcare capacity. More than eighteen months into the SARS-CoV-2 pandemic, it is becoming increasingly clear that countries that implemented suppression strategies early on experienced greater success in managing both the public health and economic burden of the epidemic [9, 10, 11] . However, such strategies work best when employed early in the epidemic, For each scenario, the number of susceptible, exposed, infectious and recovered individuals is estimated from a SEIRH model fitted to the UK fatality data and used as initial condition to derive the optimal intervention policy. when number of cases is relatively small. Moreover, in countries where government-imposed restrictions are not well received by the public, implementation of such policies will be challenging. Looking back at the early stages of the epidemic, our work provides a dynamic mitigation strategy that sustains the community transmission without Normalized cumulative imposed control B Figure 5 : Implementing the optimal intervention policy will reduce the overall impact of control measures the Total duration of intervention policy implementation and Cumulative imposed control for different scenarios. The Total duration of intervention policy implementation represents the time period between March 1st 2020 and termination date of intervention policy for each scenario. The Cumulative imposed control is obtained by adding up the implemented control strength (c(t)) in each day, divided by total number of days with c(t) > 0 for each scenario. overwhelming the healthcare capacity. A number of previous studies on optimal non-pharmaceutical interventions have used quadratic cost expressions for the control term in the cost function [40, 18, 41] . This is mainly because when the cost function is quadratic with respect to the control, the differential equations arising from the necessary conditions for an optimal control have a known solution. Other functional forms frequently provide difficult-to-solve systems of differential equations. To circumvent this, we employed a neuroevolution algorithm which enabled us also to explore non-quadratic functions. The neuroevolution algorithm was used to train a policy function that takes the epidemiological state of population (the numbers of susceptible, exposed, infectious and recovered individuals) on each time day and provides the corresponding control strength. We defined a multi-objective reward function to account for three conflicting goals: Sustain the transmission to achieve herd immunity when suppression is not feasible, maintaining the ICU occupancy below the maximum capacity and imposing minimum possible control measures to reduce the contact rates. A relative weighting parameter was assigned to corresponding terms of each of these objectives in the reward function. The sensitivity analysis indicated that the resulting policy function is highly sensitive relative weighting of the control term and found a optimal range of of values for it. We chose United Kingdom as our target population and fitted an SEIRH model to fatality data to estimate the initial conditions at different stages of the epidemic. S(t) Figure 6 : The optimal intervention policy maintains the effective reproductive ratio (R ef f ) close to 1: The figure displays the changes in effective reproductive ratio when implementing the optimal intervention policy. The control strength (c(t)) is sharply increased at early stages of epidemic to stall the epidemic growth and keep healthcare capacity from being overwhelmed. The R ef f is maintained close to 1 by gradually reducing the c(t) as the size of susceptible pool shrinks. Once the value of R ef f reaches below 0.9, c(t) is increased to sustain the transmission in the population, while keeping the occupied ICU beds below the maximum capacity. The optimal intervention policy confirmed the importance of early interventions to reduce the contact rates in the population, as highlighted in the previous studies [41, 15] . An initial 34% reduction in transmission at the onset of the epidemic, gradually increasing to 50% in the next 10 weeks is required to bring the R ef f near 1. After that, the restrictions are constantly decreased as the the size of susceptible pool diminishes. The association between the control strength and the size of the susceptible pool (except the first initial 10 weeks) highlights the importance of reliable and widespread serosurveys in order to inform policy decision making. A key component of our neuroevolution algorithm is the assumption that the full epidemiological state of the population is observable at each time step. In reality, however, the observable data provide an incomplete and potentially biased picture of epidemiology since they are based on reported incidence, hospitalization and fatality data in addition to seroprevalence surveys. Besides assuming complete epidemiological information, our approach also assumed that the optimal intervention policy is implemented in deterministically; that is, the output action is perfectly implemented at each time instant and the resulting new state given the corresponding action is always the same -something that is not practical [42, 43] . An important next step in this area would be to extend our novel framework to identify the optimal intervention strategies with hidden states in a stochastic setting. Furthermore, while this study addresses the optimal reduction in the contact rates over time, the economic cost and effectiveness of various non-pharmaceutical intervention mechanisms [44, 45] to achieve the optimal policy reduction requirements must also be examined. We carried out a sensitivity analysis to investigate the impact of relative weighting of each term in the reward function on the observed optimal policy outcome. This section presents the corresponding results. A novel coronavirus from patients with pneumonia in china WHO Director-General's opening remarks at the media briefing on COVID-19 -11 Inferring the effectiveness of government interventions against covid-19 Transmission dynamics reveal the impracticality of covid-19 herd immunity strategies The effect of control strategies to reduce social mixing on outcomes of the covid-19 epidemic in wuhan, china: a modelling study The impact of covid-19 and strategies for mitigation and suppression in low-and middle-income countries Effects of non-pharmaceutical interventions on covid-19 cases, deaths, and demand for hospital services in the uk: a modelling study Hindsight is 2020? lessons in global health governance one year into the pandemic An interactive web-based dashboard to track covid-19 in real time. The Lancet infectious diseases Pareto-based evaluation of national responses to covid-19 pandemic shows that saving lives and protecting economy are non-trade-off objectives How will country-based mitigation measures influence the course of the covid-19 epidemic? The lancet Genetic evidence for the association between COVID-19 epidemic severity and timing of non-pharmaceutical interventions Mitigation Strategies for Pandemic Influenza A: Balancing Conflicting Policy Objectives A cost-benefit analysis of the covid-19 disease Covid-19 infection externalities: Trading off lives vs. livelihoods Optimal targeted lockdowns in a multi-group sir model Age-structured non-pharmaceutical interventions for optimal control of covid-19 epidemic Evolution strategies as a scalable alternative to reinforcement learning Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning Combating pertussis resurgence: One booster vaccination schedule does not fit all A graph theoretic-based approach for deploying heterogeneous multi-agent systems with application in precision agriculture A reinforcement learning-based approach for modeling and coverage of an unknown field using a team of autonomous ground vehicles Mathematical theory of optimal processes Variation in the response to covid-19 across the four nations of the united kingdom. Blavatnik School of Government working paper series Modelling Infectious Diseases Anomalous influenza seasonality in the united states and the emergence of novel influenza b viruses Statistical inference for multi-pathogen systems Population estimates for the uk Early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia Evolving epidemiology and transmission dynamics of coronavirus disease 2019 outside hubei province, china: a descriptive and modelling study Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov-2) How epidemiological models of covid-19 help us estimate the true number of infections Coronavirus (covid-19) in the uk Clinical characteristics of 138 hospitalized patients with 2019 novel coronavirus-infected pneumonia in wuhan, china Baseline characteristics and outcomes of 1591 patients infected with sars-cov-2 admitted to icus of the lombardy region, italy Critical care bed capacity and urgent operations A tutorial on particle filtering and smoothing: Fifteen years later CMMID COVID-19 Working Group Optimal control for pandemic influenza: the role of limited antiviral treatment and isolation Optimal covid-19 epidemic control until vaccine deployment Probabilistic hurricane risk analysis of coastal bridges incorporating extreme wave statistics. Engineering Structures Parameterized fragility assessment of bridges subjected to hurricane events using metamodels and multiple environmental parameters The impact of non-pharmaceutical interventions on sars-cov-2 transmission across 130 countries and territories Strong social distancing measures in the united states reduced the covid-19 growth rate: Study evaluates the impact of social distancing measures on the growth rate of confirmed covid-19 cases across the united states We have derived the optimal control solution via Pontryagin's maximum principle (PMP) and compared the results with neuroevolution optimal policy in Figure S1 The model parameters are described in Table 1 in the main text and the model was fitted to estimate the control strength c(t) SEIR (median) 90th percentile Research reported in this publication was supported by the National Institute Of General Medical Sciences of the National Institutes of Health under Award Number R01GM123007. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.