key: cord-0103112-qrrkcjl5
authors: Csutak, Balazs; Polcz, Peter; Szederkenyi, Gabor
title: Computation of COVID-19 epidemiological data in Hungary using dynamic model inversion
date: 2021-05-07
journal: nan
DOI: nan
sha: 4e41c7cc1a81598b39b6eba37ffb7028a6507f3a
doc_id: 103112
cord_uid: qrrkcjl5

In this paper, we estimate epidemiological data of the COVID-19 pandemic in Hungary using only the daily number of hospitalized patients, and applying well-known techniques from systems and control theory. We use a previously published and validated compartmental model for the description of epidemic spread. Exploiting the fact that an important subsystem of the model is linear, first we compute the number of latent infected persons in time. Then an estimate can be given for the number of people in other compartments. From these data, it is possible to track the time dependent reproduction numbers via a recursive least squares estimate. The credibility of the obtained results is discussed using available data from the literature.

Since the spring of 2020, the COVID-19 pandemic has put an enormous burden on the societies, economies and healthcare systems of most countries. It is of fundamental importance to track the evolution of the epidemic process in order to prepare healthcare capacities, or to design efficient measures and vaccination policies against the disease spread. It is well-known that only a fraction of the actually infected persons get registered into the official databases. This ratio varies widely between countries depending on the testing activity and related policies. E.g., as of Apr. 28., 2021, the official data for the cumulative numbers of infected in Czechia and Hungary are 1.626.033 [1] and 774.399 [2] , respectively. Considering the similarities between these two countries in terms of geographic location, area, population size, the stringency of interventions, the organization of healthcare, and the number of reported COVID-deaths, it is highly unlikely that there could be such a huge difference in the true numbers. There exist several attempts in the literature for the reconstruction of real epidemic data. In [3] a robust statistical estimate with uncertainty analysis was given for the detection rate in several countries. It turns out from the published results that during the 2020 spring wave of the pandemics, the detection rate could be as small or even lower than 10% for e.g., Sweden, Italy, Spain, or may have reached 40-50% in the case of Australia and South Korea, where huge effort was put into immediate contact tracing and testing. Somewhat tighter upper bounds were computed for the possible total number of cases in several European countries in [4] using a data-driven estimator from the distributions of daily cases and deaths. It is clear from the above that the recorded number of infected people alone is generally not a reliable source for the tracking of the epidemics. Moreover, it is also visible from the Hungarian data that recoveries were not followed and recorded precisely in the second wave of the pandemic until the middle of December, 2020 (see, e.g. at [5] ). Therefore, the epidemic curve showing the active cases cannot reflect the real situation even in a qualitative way.

After the spring wave of COVID-19, a representative serological test was designed and carried out in Hungary which estimated that exposure was approximately 0.68% among the total population by the middle of May, 2020 [6] . However, no similar study has been done since the summer of 2020, although it would certainly have huge significance in exploring the current situation and refining our models.

There is a huge literature on epidemic modeling and recently on the modeling of the COVID-19 pandemic for different purposes such as identification, control, or prediction [7] . E.g., in [8] , a hybrid machine learning approach is proposed to predict the number of infected individuals and the mortality rates of the first wave of the COVID-19 outbreak in Hungary. In [9] the authors show that epidemic outbreak prediction can be efficiently supported by integrating machine learning and SEIR models.

In systems and control theory, it is a common practice to estimate unknown quantities (e.g., states or parameters) from measurement data using dynamical models of the observed phenomena [10, 11] . Population-level deterministic epidemic models are most often written in a nonnegative compartmental form with polynomial nonlinearities representing the infection mechanism and linear subsystems describing the transitions between certain compartments [12] . We will use a control oriented ODE model of the pandemics in Hungary which was proposed and used in [13] , and was derived from the more detailed model in [14] .

Based on the above, the purpose of this paper is to propose a model-based computation approach to study the second and third waves of the COVID-19 pandemic in Hungary. Due to the previously mentioned unreliability of other data sources, we only use the number of hospitalized people available from [15] as input data, assuming that testing is wide-spread and quick enough in hospitals.

2 The applied compartmental model

To describe the transmission dynamics of the disease, we use a compartmental model describing the spreading characteristics of COVID-19. We divide the population of N individuals into eight classes, representing the different stages of the illness, discussed in detail in [13] . The compartments used stand for the following: the susceptibles (S) are those individuals, who can be infected by the disease (i.e., neither have been infected nor yet obtained immunity by, eg., vaccination). Latents (L) are infected people in the very first stage, lacking any symptoms and also incapable to transmit the pathogen. This is followed by the pre-symptomatic (P) stage, including individuals already infectious, but still without symptoms. As a significant part of the infected show no characteristic symptoms, and thus they being infected is often not confirmed, we handle the asymptomatic (A) and symptomatic infected (I) classes differently. Members of A always recover (R) after a certain time, but for some part of I hospitalization is needed. Hospitalized patients (H) may recover or decease (D). The possible transitions between the compartments are illustrated in Fig. 1 and their dynamics is described formally by the ODEs below.Ṡ

From now on, we suppress the time arguments in the state variables as it is commonly done in the literature. 

We set the model parameters according to the estimates published in related literature. We used the following model parameters for (1) which had been determined by updating the parameter set published in [13] with newer Hungary-specific data. The population of Hungary is N = 9.8 · 10 6 people. The latent period (α −1 ) is 2.5 days on average, whereas, the pre-symptomatic period (p −1 ) is generally 3 days. The infectious period of both the symptomatic (ρ I ) and asymptomatic (ρ A ) classes is approximately 4 days. The average length of hospitalization (h −1 ) before recovery or death is 10 days. The hospitalization probability of the symptomatic (η) is 0.076. The probability of developing symptoms (q = 0.6) and the relative infectiousness of the asymptomatic (δ = 0.75) are based on CDC estimates. The death ratio among hospitalized is µ = 0.185.

The basic reproduction number, expressing the average number of new infections generated by a single infected individual in a fully susceptible population, is given as

Using R 0 = 2.2 estimated from early Hungarian data, and the already determined parameters, a nominal value of β = 1/3 can be derived, but note that this parameter largely depends on the actual restrictions and conditions, therefore, it will be estimated as a time-varying parameter. The timedependent reproduction number for our model taking into account the change of β and the decrease of the susceptible population is given by

3 Computation methodology

Dynamic inversion is known to be sensitive to the abrupt changes in the derivatives of measured signals caused by fluctuations and noise. Therefore, we applied the following preprocessing to smoothen the input data. Lety k denote the number of hospitalized patients on day k collected from [15] , k = 1, . . . , T . As the starting date (k = 1), we considered the 20th of August, 2020. The available data was collected up till the 28th of April, 2021.

Generally, at weekends the healthcare system has a decreased capacity to perform tests and document the confirmed cases. Therefore, as it is commonly done in the literature and engineering practice, we apply a 7-day long moving average filter to the officially published hospitalization data record. Formally, we consider the following smoothed time series of the numbers of hospitalized patients:

Obviously, at the two ends of the time series, the sliding window has to be truncated as follows:

Next, we used a cubic spline interpolation technique, ensuring the result being twice continuously differentiable. We divided the data series into n = 15 equally long segments, the endpoints being at t i = i · (T /n), t = 0, 1, . . . , n in time. As these time points (expressed in days) are not necessarily integers, we used a linear interpolation to calculate y t i from y i·(T /n) and y i·(T /n) +1 . The spline was fitted on data points (t i , y t i ), i = 0, 1, . . . , n using not-a-knot conditions for the slope endpoints. Let the spline interpolated hospitalization data be denoted by y t , t ∈ [1, T ]. The original measurement (y) and the results (ȳ and y) of the smoothing process can be seen in Fig. 2 . Additionally, Fig. 3 illustrates that the data preprocessing steps make it possible to compute the first three derivatives of y relatively noise-free. Henceforth, in all computations, we use y as the single available information about the time evolution of the epidemic spread.

Consider the (P, I, H) subsystem of (1) in the following continuous-time linear time-invariant (LTI) state-space form:

The input of Σ PIH is the daily number of people in the latent phase (u = L), whereas, the output is (the filtered time series of ) the daily number of hospitalized patients (y = H). The continuous-time transfer function of Σ PIH is the following:

from which we can see that the analytical inverse of the system is non-causal.

To compute the unknown input of Σ PIH , we consider the discrete-time (DT) model of Σ PIH with T s = 1 (day) sampling period. Using zero order hold, the transfer function of the discretized model is the following:

where a 0 = −0.5049, a 1 = 1.911, a 2 = −2.4, b 0 = 1.52 · 10 −4 , b 1 = 7.225 · 10 −4 , b 2 = 2.139 · 10 −4 . The corresponding DT input-output model can be written as follows:

First, we compute the input from the output using a standard least squares deconvolution. Using the available filtered time series of y k=1,...,T , the input-output difference equation (10) can be written in the following matrix-vector format: 

The system of linear equations in (11) is under-determined in u, since M a ∈ R (T −3)×T and M b ∈ R (T −3)×(T −1) . The least-squares solution for u can be computed by considering the Moore-Penrose

After computation, we filtered the time seriesǔ ls with a 7-day moving average filter, which resulted in u ls,k , k = 1, . . . , T −1. Having an estimate of the input of the system Σ PIH , we can apply a standard linear state observer to estimate the population in the non-measured compartments (P, I, A, R). An estimate for D can be computed from (1h), where the official governmental data can be used as well. However, we do not use the published data on COVID deaths for inversion, and the dynamics of the other compartments do not depend on D.

As another possible solution, in this subsection we present an unknown input observer approach for system inversion and state estimation. It can be seen from the transfer function model (8) that the LTI subsystem Σ PIH of (1) has relative degree 3, i.e., the input signal does appear only in the third (and higher order) derivatives of the output equation: (15) and (16) as two additional outputs of Σ PIH , then the augmented state-space model

is unknown-input observable. Furthermore, (17) allows us to algebraically express the unknown input from the third derivative of signal y.

Following the ideas of [16, Section 3.2.1], we construct the following unknown-input observer for (18):

The coefficient matrices of (19) are:

constitutes the Moore-Penrose left pseudo-inverse of matrix O 3 B, and K 1 is selected such that F is a stability matrix.

To summarize, the non-measured state variables (P, I) of Σ PIH can be reconstructed using the first two derivatives of y, whereas, the input reconstruction requires also the third derivative of y. The remaining state variables (A, R, D) of (1) can be reconstructed by considering the estimated trajectories of (P, I) and the measured trajectory of y. Finally, using the fact that the whole population is conserved in our compartmental model, the number of susceptibles can be estimated simply as

where V denotes the cumulative number of people how have become immune due to vaccination.

In the computation, we considered a simple vaccination model taking into consideration the documented efficacy of vaccines applied in Hungary. Based on this, we assumed that 85% of vaccinated people do not take part in the infection process any more from 21 days after administering the first dose. The time series of the number of vaccinated people were collected from the official website [18] of the Government of Hungary.

It is visible from Eqs. (2) and (3) that the parameter β is needed to compute estimates for the reproduction numbers. The right hand side of Eq. (1a) depends linearly on β, therefore, a recursive least squares (RLS) algorithm with exponential forgetting can be applied [11] . For this, we discretize (1b) using a simple forward-Euler scheme with sampling time T s which can be written in linear regression form as

Then, the RLS algorithm for estimating β can be written aŝ

whereβ k is the estimate for β, P k is the auxiliary variable, and λ = 0.9 is the forgetting factor. The estimates for the reproduction numbers R 0 and R c are given by substitutingβ into Eqs. (2) and (3). We note that the computation of R c uses the simple vaccination model presented in Remark 1.

The studied period is from August 20, 2020 to April 28, 2021. We considered this time independently of the drastically and successfully suppressed first wave of the epidemic in the spring of 2020. During the first half of August, the main infection indicators (daily new infections, test positivity, number of hospitalized people) were so low that the zero initial condition assumption for the transfer function representation is acceptable. (We remark that the correct convergence of the state estimates would be ensured even in the case of non-zero initial conditions after a transient period, since the subsystem (6) is asymptotically stable and observable.)

In Fig. 4 , we illustrate the least squares solutionǔ ls for the unknown input, its 14-day long moving average filtered data u ls , and the dynamically reconstructed input u uio . In Fig. 5 , the solid lines illustrate the simulated state variables of the (P, I, H, A) subsystem of (1) driven by the filtered least squares solutionǔ ls . Whereas, the dashed lines of Fig. 5 , constitute the observed state variables of Σ PIH and (1e), and the reconstructed unknown input u uio in the approximative knowledge of the daily number of hospitalized patients (y) and its first three derivatives.

To evaluate the reliability of the state observations, we simulate Σ PIH with the computed (and linearly interpolated) unknown input functionsǔ ls , u ls , and u uio . Then, the reconstructed hospitalization data, denoted byy ls , y ls , and y uio , respectively, are compared to the actual and the filtered measurementsy,ȳ, and y. Fig. 6 illustrates that the shape of y, y ls , and y uio are qualitatively the same.

To quantify the difference between the measured and the reconstructed signals, (instead of the standard squared error model) we use the following relative squared error formula: Practically, (25) is the relative distance between the two series obtained from their absolute 2 distance y −ŷ 2 and divided by the 2 norm y 2 of the reference sequence y. It is worth mentioning that the relative distance between y and the identically zero function is 1. Furthermore, note that the last three values of the least squares solutionǔ ls cannot be considered, since (9) is a third -order DT-LTI model and thus the summation in (25) is performed only up to T − 3.

In Table 1 , we present the (pairwise) relative 2 distance between the measurementy,ȳ, y and the reconstructed hospitalization datay ls , y ls , y uio . The results show a slightly smaller relative error when an unknown-input observer is applied. However, it is worth remarking that the preliminary filtering steps of Section 3.1 are essential when the derivative-based observer filter (19) is applied. Fig. 7 shows the computed number of daily new infections, where two distinct waves (called the 2nd and 3rd waves, respectively) can be observed. The peaks of this curve during November-December 2020 and March 2021 match the available data, and the computation suggests a roughly 4-fold ratio between the model computed and officially detected new cases. The effect of the restrictions (closure of restaurants and gyms, online education in secondary schools, banning of most gatherings, curfew from 8pm to 5am, etc.) introduced in the first half of November is clearly visible. In Fig. 8 , the sum of all the infected compartments (L, P, I, A, H) is shown. The obtained numbers for October-November are comparable to the results of a nationwide testing in Slovakia at the end of October 2020, where about 1.06 percent of the population proved to be COVID-positive [19] . The estimated reproduction numbers are visible in Fig. 9 . In February 2021, R c increased above 1 again which indicates the spread of a more contagious variant (B.1.1.7) causing the large 3rd wave although the November restrictions were not at all relaxed at that time. The ratio of the peaks of the estimated R 0 in December 2020 and late February 2021 is 1.618, which is completely in agreement with the 1.4 -1.8 interval reported in the UK [20] . On March 8, 2021, further measures were introduced including the closing of all schools which clearly contributed to the decrease of the number of daily infections from the second half of March. Fig. 11 illustrates the estimated number of recovered people. The computations suggest that approximately 30 percent of the Hungarian population might have already gone through the disease. This result is extended with a straightforward uncertainty analysis (illustrated in Fig. 12 ). For each constant model parameter in (1a) -(1h) we assumed and uncertainty interval of ±20% around the nominal value. 5000 simulation runs were performed with uniformly distributed random parameters taken from the multidimensional uncertainty interval. The mean value of the sum of the estimated number of recovered people is 4.0 times higher on April 28, 2021 than the official cumulative number of COVID-positive cases mentioned in the Introduction. This multiplier is comparable to [4] , where the calculated ratio of probable total and detected cases is between 3.93 and 7.94 for 10 European countries not including Hungary, with a mean of 4.85. For this comparison, we have to take into consideration that testing intensity was significantly increased during the 2nd and 3rd waves.

The data of the COVID-19 pandemic in Hungary was studied in this paper using a systems theoretic approach. The approach is based on the dynamic inversion of a linear subsystem of the used compartmental model. We have shown two possible approaches leading to similar results to track the non-measured state variables, which then allows the estimation of the time dependent reproduction numbers. We remark that the modeling of vaccination is not necessary to give an estimation for the compartments containing infected people, although it is definitely required to estimate the number of susceptible people and the time-dependent reproduction number during the 3rd wave. The described approach not only gives an estimate for the cumulative number of infected and recovered people but it is also able to unravel information on the whole time-course of the epidemic process. The obtained results fall into the ranges that have been published in the literature for different European countries. Future work will be focused on the extension of the method to take model uncertainty into consideration in a systematic way.

COVID-19: Overview of the current situation in the Czech Republic

3 million 774 thousand vaccinated, 1,692 new infected, 188 patients deceased

Robust estimates of the true (population) infection rate for COVID-19: A backcasting approach

Estimating the size of undetected cases of the COVID-19 outbreak in Europe: An upper bound estimator

Hungary coronavirus (live): 391 170 cases -The coronavirus app

Novel coronavirus epidemic in the Hungarian population, a crosssectional nationwide survey to support the exit policy in Hungary

Forecasting models for coronavirus disease (covid-19): a survey of the state-of-the-art

Covid-19 pandemic prediction for hungary; a hybrid machine learning approach

Covid-19 outbreak prediction with machine learning

Mathematical control theory: Deterministic finite dimensional systems

System identification: Theory for the user

Mathematical models in population biology and epidemiology

Nonlinear model predictive control with logic constraints for COVID-19 management

Early phase of the COVID-19 outbreak in Hungary and post-lockdown scenarios

Data on hospital and ICU admission rates and current occupancy for COVID-19

Robust model-based fault diagnosis for dynamic systems

Existence of unknown input observers and feedback passivity for linear systems

COVID-19 testing in Slovakia

Transmission of SARS-CoV-2 lineage B.1.1.7 in England: Insights from linking epidemiological and genetic data