key: cord-0449818-tw0qyjq2 authors: Magri, Luca; Doan, Nguyen Anh Khoa title: First-principles machine learning modelling of COVID-19 date: 2020-04-20 journal: nan DOI: nan sha: 2269c998e254a3e6bd559f4b0e5cdbcfcceea591 doc_id: 449818 cord_uid: tw0qyjq2 The coronavirus disease 2019 (COVID-19) has changed the world since the World Health Organization declared its outbreak on 30th January 2020, recognizing the outbreak as a pandemic on 11th March 2020. As often said by politicians and scientific advisors, the objective is"to flatten the curve", or"push the peak down", or similar wording, of the virus spreading. Central to the official advice are mathematical models and data, which provide estimates on the evolution of the number of infected, recovered and deaths. The accuracy of the models is improved day by day by inferring the contact, recovery, and death rates from data (confirmed cases). A data-driven model trained with {it both} data {it and} first principles is proposed. The model can quickly be re-trained any time that new data becomes available. The method can be applied to more detailed epidemic models with virtually no conceptual modification. In December 2019, a cluster of unexplained pneumonia cases in Wuhan, the capital of Hubei province in the People's Republic of China, resulted into a global pandemic by 11 March 2020, as declared by the In first-principles machine learning modelling, we need first principles and data (machine learning) to generate a model. Section 2.1 introduces the first principles and working assumptions, Sec. 2.2 describes the data, and Sec. 2.3 formulates the problem as a constrained optimization problem. The proposed solution method is presented in Sec. 2.4. The COVID-19 infectious disease is an epidemic [9] . To model an epidemic, suitable groups (also known as compartments [10] ) are defined to cover the entire population of a country. Because (i) the epidemic has a (relatively) short time scale, for which the new births can be neglected; (ii) the number of deaths is small as compared with the entire population; and (iii) travel restrictions are enforced, the population, N , is assumed to be constant. The population of a country is divided into mutually exclusive groups: susceptible (S), infected (I), deceased (D), and recovered (R) (Fig. 1 ). In this model, the deaths are due to COVID-19. Every group is assumed to have the same characteristics, i.e., the groups are homogeneous. Every susceptible person can contract the virus (the immune group is neglected). These working assumptions can be relaxed in more complex models [11, 9] . Mathematically, Equation (1) is a continuity equation. The population N , which does not vary in time, is the sum of the groups S, I, R, D, which vary in time. This compartmental approach is known as the SIR-epidemic model with vital dynamics and constant population [10, 12] . The model will be called the SIRDmodel for brevity. The working assumptions and first principles are mathematically expressed by four ordinary differential equations (ODEs) with time-varying parameters (non-autonomous dynamical system) subject to initial conditions S 0 , I 0 , R 0 and D 0 . The symbol˙denotes the time derivative, d/dt. In compact forṁ q = F(q; α) (6) q = q 0 at t = 0 (7) where F is the model (i.e., the SIRD equations), and are the column vectors of the state and parameters, respectively. I/N is the probability to come into contact with an infected individual; β is the average number of contacts per person per unit of time weighed by the transmissibility (contact rate); γ is the average number of recovered people per unit of time (recovery rate); µ is the average number of deaths due to COVID-19 per unit of time (death rate). These parameters are time dependent and depend on several variables, such as governmental policies (lockdown, school/university closures, social distancing, etc.), heterogeneity in the population (age, life style, herd immunity, hygiene standards, etc.), and properties of the epidemic (virus genome, spreading mechanisms, etc.). The SIRD parameters estimate the epidemic time scales: 1/γ is the average time to recover; 1/β is the average time between one contact (with an infected) and another; and 1/µ is the average time to decease (for those who do not recover). The basic reproduction ratio 1 , R 0 ≡ β/γ, is the expected number of secondary infections from a single infection entering a population where all members are susceptible [9] . If R 0 > 1, the number of infected increases (Eq. (3)). If R 0 < 1, the disease does not grow on average. The total number of new cases per unit of time due to the contact of S susceptible people with infected people is βI/N · S. This is the only nonlinear term of the equations. (Other nonlinearities are hidden in the time dependence of the parameters β, γ and µ.) Equations (2)-(5) are interpreted as follows. The reliable data is about the number of confirmed infected, I c , and confirmed deaths, D c , which are arranged in a vector The data on the confirmed recovered, R c , was discontinued because it was deemed inaccurate 2 . The data used here is publicly available in the CSSEGISandData/COVID-19 GitHub repository 3 , which collects data from official sources and organizations. The calculation of the groups' dynamics and time-varying epidemic parameters is a constrained optimization problem: to minimize subject to an epidemic model. The epidemic model used in this paper is provided by Eqs. (6) and (7), however, more detailed models can be used. ||·|| is a norm, λ 1 and λ 2 are user-defined normalization factors. The loss function, E d , measures the error between the candidate solution (I, D) and the data (I c , D c ). Among all the possible candidate solutions, only the solutions that fulfil the epidemic model (Eqs. (6) and (7)) will be accepted. The cumulative confirmed number of cases is the dataset used. This is a quantity to be preferred over the daily increase of confirmed cases because it is smoother, i.e., it is not significantly affected by random fluctuations, in contrast with the daily increase. The algorithm that solves this constrained optimization problem is presented in Sec. 2.4. A data-driven model combined with first principles is proposed. This is referred to as first-principles machine learning for brevity. The data-driven algorithm is an optimal interpolator, while the epidemic model helps to obtain parameters that are consistent with the model. This synergistic combination helps to reduce the uncertainty in the predictions, which are as good as the employed epidemic model and the accuracy of the data. The first-principles machine learning epidemic modelling is based on the combination of an ODEsolver, which time-advances the SIRD model in Eqs. (2)-(5) (first principles), and a feedforward neural network (machine learning), which performs the assimilation of data with the epidemic model to learn the parameters' vector α(t) ( Algorithmically, the architecture is trained as follows: 1. First guess on the parameters. From the dataset {I c (t)} Nt t=0 and {D c (t)} Nt t=0 , a set of constant parameters α 0 ≡ [β 0 , γ 0 , µ 0 ] T is obtained by nonlinear regression of the data, I c and D c , during the initial exponential growth only. This time window is [0, t = Regr] ( Table 1) . The neural network is pre-trained to output the set of constant parameters, α 0 . This set of parameters ensures that the initial state of the neural network is consistent with the initial exponential growth, which makes the time integration of the SIRD model robust. Unless otherwise specified, the neural network consists of 1 layer with 8 neurons a . The time evolution of the SIRD parameters is obtained by nonlinear combination of the neurons with a sigmoid activation. 3. Training of the neural network. The entire architecture, which consists of the neural network and the SIRD timeintegrator, is optimized by a gradient-based optimizer (L-BFGS-B optimizer [13] ) to minimize the loss function The loss function is composed of four terms, which can be interpreted as follows: • E d1 is the error in a log-scale between the prediction and the available data (infected and deaths). This removes noisy fluctuations from the solution. • E d2 is the error in a linear scale between the prediction and the available data (infected and deaths). • E r is a regularization term, which prevents large discontinuities in the time-variation of the SIRD parameters from occurring, making the evolution smoother. The regularization factor before the sum in E r is an empirical scaling factor to ensure that the orders of magnitude of E r and E d are comparable. The factor 100 before the terms with µ ensures that the parameters of the SIRD model have a comparable order of magnitude. • E 0 constrains the initial values of α to be close to the first guess obtained at step 1. This ensures that, in the early stage of the epidemic, the growth is largely exponential with parameters that are nearly constant. A typical convergence of the optimizer is shown in Fig. 3 . a Architectures with 4 to 64 neurons provide the same accuracy (result not shown). (a) Cumulative quantities. (b) Daily rates. Novel Coronavirus (COVID-19) Cases, provided by John Hopkins University CSSE Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand Mathematical models of infectious disease transmission Contributions to the mathematical theory of epidemicsâĂŤI The mathematics of infectious deseases