key: cord-0487210-mb91j2zs
authors: Agapiou, Sergios; Anastasiou, Andreas; Baxevani, Anastassia; Christofides, Tasos; Constantinou, Elisavet; Hadjigeorgiou, Georgios; Nicolaides, Christos; Nikolopoulos, Georgios; Fokianos, Konstantinos
title: Modeling of Covid-19 Pandemic in Cyprus
date: 2020-10-05
journal: nan
DOI: nan
sha: 631b104ce8c402762bf04479c252b5beffef0791
doc_id: 487210
cord_uid: mb91j2zs

The Republic of Cyprus is a small island in the southeast of Europe and member of the European Union. The first wave of COVID-19 in Cyprus started in early March, 2020 (imported cases) and peaked in late March-early April. The health authorities responded rapidly and rigorously to the COVID-19 pandemic by scaling-up testing, increasing efforts to trace and isolate contacts of cases, and implementing measures such as closures of educational institutions, and travel and movement restrictions. The pandemic was also a unique opportunity that brought together experts from various disciplines including epidemiologists, clinicians, mathematicians, and statisticians. The aim of this paper is to present the efforts of this new, multidisciplinary research team in modelling the COVID-19 pandemic in the Republic of Cyprus.

Coronavirus disease 2019 , an infection caused by the novel coronavirus SARS-CoV-2 (Coronaviridae Study Group of the International Committee on Taxonomy of Viruses (2020)) that first emerged in Wuhan, China, (Zhu et al. (2020) ), counts now more than 25 million cases and has claimed nearly 850,000 lives (World Health Organization (2020)). Despite some advances in therapy (Beigel et al. (2020) ) and considerable progress in vaccine development, with some vaccine candidates reaching phase III trials (Jackson et al. (2020) ), there are still many gaps in our understanding of the new pandemic disease including some epidemiological parameters. Epidemic modelling is a fundamental component of epidemiology, especially with regards to infectious diseases. Following the pioneering work of R. Ross, W. Kermack, and McKendrick in early twentieth century (Kermack and McKendrick (1927) ), the discipline has established itself and comprises a major source of information for decision makers. For instance, in the United Kingdom, the Scientific Advisory Group of Emergencies (SAGE) is a major body that collects evidence from multiple sources including inputs from mathematical modelling to advice the British government on its response to the complex COVID-19 situation; for more information see this link.

In the context of the COVID-19 pandemic, expert opinions can help decision makers comprehend the status of the pandemic by collecting, analyzing, and interpreting relevant data and developing scientifically sound methods and models. An exact model that would describe perfectly the data is usually not feasible and of limited scope; hence scientists usually aim for models that allow a statistical simulation of synthetic data. At the same time, models can also approximate the dynamics of the disease and discover important patterns in the data. In this way, researchers can study various scenarios and understand the likely consequences of government interventions. Finally, the proposed models could motivate the conduct of further studies about the evolution of both infectious and non-infectious diseases of public interest.

Here we report our work including results from statistical and mathematical models used to understand the epidemiology of COVID-19 in Cyprus, during the time period starting from the beginning of March till the end of May 2020. We propose a range of different models that capture different aspects of the COVID-19 pandemic. The analysis consists of several methods applied to understand the evolution of pandemics in the long and short run. We use change-point detection, count time series methods and compartmental models for short and long term projections, respectively. We estimate the effective reproduction number by using three different methods and obtain consistent results irrespective of the method used. Results are cross-validated against observed data with considerable consistency. Besides providing a comprehensive data analysis we illustrate the importance of mathematical models to epidemiology.

In this section, after a brief introduction to the testing protocol, we introduce the different techniques and models that have been used for the modelling and analysis of the COVID-19 infections in Cyprus.

The Unit for Surveillance and Control of Communicable Diseases (USCCD) of the Ministry of Health operates COVID-19 surveillance. The lab-based surveillance system consists of 19 laboratories (7 public and 12 private) that carry out molecular diagnostic testing for SARS-CoV-2. Sociodemographic, epidemiological, and clinical data of individuals with SARS-CoV-2 infection are routinely collected from laboratories and clinics, and reported to an electronic platform of the USCCD. A confirmed COVID-19 case is a person, symptomatic or asymptomatic, with a respiratory swab (nasopharynx and/or pharynx) positive for SARS-CoV-2 by a real-time reverse-transcription polymerase chain (rRT-PCR) assay. Cases are considered imported if they have travel history from an affected area within 14 days of the disease onset. Locallyacquired cases are individuals who test positive for SARS-CoV-2 and have the earliest onset date in Cyprus without travel history from affected areas. People with symptomatic COVID-19 are considered recovered after the resolution of symptoms and two negative tests for SARS-CoV-2 at least 24-hour apart from each other. For asymptomatic cases, the negative tests to document virus clearance are obtained at least 14 days after the initial positive test. A person with a positive test at 14 days is further isolated for one week and finally released at 21 days after the initial diagnosis without further laboratory tests. Testing approaches in the Republic of Cyprus included: a) targeted testing of suspect cases and their contacts; of repatriates at the airport and during their 14-day quarantine; of teachers and students when schools re-opened in mid-May; of employees in essential services that continued their operation throughout the first pandemic wave (e.g., customer services, public domain); and of health-care workers in public hospitals, and b) population screenings following random sampling in the general population of most districts and in two municipalities with increased disease burden. By June 2nd 2020, 120,298 PCR tests had been performed (13,734.2 per 100,000 population). Public health measures were taken in 4 phases: Period 1 (10 -14 March, 2020) included closures of educational institutions and cancellation of public gatherings (>75 persons); Period 2 (15 -23 March, 2020) involved closure of entertainment areas (for instance, malls, theatres, etc), allowance of 1 person per 8 square meters in public service areas, and restrictions to international travel (for example, access to the Republic of Cyprus was permitted only for specific persons and after SARS-CoV-2 testing); Period 3 (24 -30 March, 2020) included closure of most retail services; and Period 4 (31 March -3 May) included the suspension of incoming flights with few exceptions (for instance, repatriated Cypriot citizens), stay at home order, and night curfew.

Change-point detection is an active area of statistical research that has attracted a lot of interest in recent years and plays an essential role in the development of the mathematical sciences. A non-exhaustive list of application areas includes financial econometrics (Schröder and Fryzlewicz, 2013) , credit scoring (Bolton and Hand, 2002) , and bioinformatics (Olshen et al., 2004) . The focus is on the so-called a posteriori changepoint detection, where the aim is to estimate the number and locations of certain changes in the behaviour of a given data sequence. For a review of methods of inference for single and multiple change-points (especially in the context of time series) under the a-posteriori framework, see Jandhyala et al. (2013) . The aim is to estimate the number and locations of certain changes in a stream of data. Detecting these change-points enables us to separate the data sequence into homogeneous segments, leading to a more flexible modeling approach. Advantages of discovering such heterogeneous segments include interpretation and forecasting.

Interpretation naturally associates the detected change-points to real-life events or/and political decisions.

In this way, a better description of the observed process and the impact of any intervention can be communicated. Forecasting, is based on the final detected segment which is important as it allows for more accurate prediction of future values of the data sequence at hand. Methods developed in this context are based on a given model. For the purpose of this paper, we work with the following signal-plus-noise model

where X t denotes the daily incidence COVID-19 cases and f t is a deterministic signal with structural changes at certain time points. Details about f t are given below. The sequence t consists of independent and identically distributed (iid) data with mean zero and variance equal to one and σ > 0. We denote the number of change-points by K and their respective locations by r 1 , r 2 , . . . , r K . The locations are unknown and the aim is to estimate them based on (1). The daily incidence cases of the COVID-19 outbreak in Cyprus is investigated by using the following two models for f t of (1):

1. Continuous, piecewise-linear signals: f t = µ j,1 + µ j,2 t, for t = r j−1 + 1, r j−1 + 2, . . . , r j with the additional constraint of µ k,1 + µ k,2 r k = µ k+1,1 + µ k+1,2 r k for k = 1, 2, . . . , N . The change-points, r k , satisfy f r k −1 + f r k +1 = 2f r k .

2. Piecewise-constant signals: f t = µ j for t = r j−1 + 1, r j−1 + 2, . . . , r j , and f rj = f rj +1 .

In this work, we are using the Isolate-Detect (ID) methodology of Anastasiou and Fryzlewicz (2019) to detect changes based on (1) by using linear and constant signals, as described above; see Appendix A-1 for a description of the method.

The analysis of count time series data (like daily incidence data we consider in this work) has attracted considerable attention, see Kedem and Fokianos (2002, Sec 4 & 5) for several references and Fokianos (2015) for a more recent review of this research area. In what follows, we take the point of view of generalized linear modelling as advanced by McCullagh and Nelder (1989) . This framework naturally generalizes the traditional ARMA methodology and includes several complicated data generating processes besides count data such as binary and categorical data. In addition, fitting of such models can be carried out by likelihood methods; therefore testing, diagnostics and all type of likelihood arguments are available to the data analyst.

The logarithmic function is the most popular link function for modeling count data. In fact, this choice corresponds to the canonical link of generalized linear models. Suppose that {X t } denotes a daily incidence time series and assume, that given the past, X t is conditionally Poisson distributed with mean λ t . Define ν t ≡ log λ t . A log-linear model with feedback for the analysis of count time series (Fokianos and Tjøstheim (2011) ) is defined as

In general, the parameters d, a 1 , b 1 can be positive or negative but they need to satisfy certain conditions to obtain stability of the model. The inclusion of the hidden process makes the mean of the process to depend on the long-term past values of the observed data. Further discussion on model (2) can be found in Appendix A-2 which also includes some discussion about interventions. An intervention is an unusual event that has a temporary or a permanent impact on the observed process. Computational methods for discovering interventions, in the context of (2), under a general mixed Poisson framework have been discussed by Liboschik et al. (2017) . In this work, we will consider additive outliers (AO) defined by

where the notation follows closely that of Sec. 2.2 and I(.) denotes the indicator function. Inclusion of the indicator function shows that at the time point r k , the mean process has a temporary shift whose effect is measured by the parameter γ k but in the log-scale. Other type of interventions can be included (see Appendix A-2) whose effect can be permanent and, in this sense, intervention analysis and change-point detection methodologies address similar problems but from a different point of view. Model fitting is based on maximum likelihood estimation and its implementation has been described in detail by Liboschik et al.

(2017).

Compartmental models in epidemiology, like the Susceptible-Infectious-Recovered (SIR) and Susceptible-Exposed-Infectious-Recovered (SEIR) models and their modifications, have been used to model infectious diseases since the early 1920's (see Keeling and Rohani (2008) , Nicolaides et al. (2020) among others). The basic assumptions for these models are the existence of a closed community, i.e without influx of new susceptibles or mortality due to other causes, with a fixed population, say N , and also that the individuals who recover from the illness are immune and do not become susceptible again. In the basic SEIR model, at any point in time t, each individual is either susceptible (S(t)), exposed (E(t)), infectious (I(t)) or recovered (R(t), including death). The epidemic starts at time t = 0 with one infectious individual, usually thought of as being externally infected, and the rest of the population being susceptible. People progress between the different compartments and this motion is described usually through a system of ordinary differential equations that can be put in a stochastic framework.

A variety of SEIR modifications and extensions exist in the literature, and a multitude of them emerged recently because of the COVID-19 epidemic. In this work, we consider four such modifications, based on the models proposed in Peng et al. (2020) and Li et al. (2020) for the analysis of the COVID-19 epidemic in Wuhan and the rest of the Chinese provinces.

Initially, we employ the SEIR model based on the meta-population model of Li et al. (2020) but simplified to take into account only a single population. The novelty compared to the standard SEIR model, is that this model takes into account the existence of undocumented/asymptomatic infections, which transmit the virus at a potentially reduced rate. The model tracks the evolution of four state variables at each day t,

representing the number of susceptible, exposed, infected-reported and infected-unreported individuals, S(t), E(t), I r (t), I u (t) respectively. The parameters of the model are the transmission rate β (days −1 ), the relative transmission rate µ representing the reduction in transmission for asymptomatic individuals, the average latency/incubation period Z (days), the average infectious period D (days) and the reporting rate α representing the proportion of infected individuals which are reported. For a graphic description of the model see Figure 16 . The time evolution of the system is defined by the following set of differential equations (recall N denotes the population size):

Following Li et al. (2020) , we use a stochastic version of this model with a delay mechanism. Each term, say U , on the right hand side of (4) is replaced by a Poisson random variable with mean U . At each day, we use the 4th order Runge-Kutta numerical scheme to integrate the resulting equations and obtain the values of the four state variables on the next day. For each new reported infection, we draw a Gamma random variable with mean τ d days, to determine when this infection will be recorded. For the main analysis we use τ d =6 days, as the average reporting delay between the onset of symptoms and the recording of an infection; see also Li et al. (2020) . Note that the results are robust with respect to the value of reporting delay. The final output of this model is the number of recorded infections on each day t, y = y(t).

We also use the meta population model of Li et al. (2020) . It models the transmission dynamics in a set of populations, indexed by i, connected through human mobility patterns, say M ij . This is implemented by incorporating information on human movement between the 5 main districts of Cyprus: Nicosia, Limassol, Larnaca, Paphos and Ammochostos. In this case, i = 1, 2, 3, 4, 5 and M ij denotes the daily number of people traveling from district i to district j, i = j. Such information is based on the 2011 census data obtained from the Cyprus Statistical Service. The time evolution of the four compartmental states in each district i is defined by the following set of differential equations:

where the notation follows the notation given in Sec. 2.4.1. In addition to the four state variables, this model also updates at each time step the population of each area i, say N i , by

where the multiplicative factor θ is assumed to be greater than 1 to reflect under-reporting of human movement. Like Model (4), Model (5) 

Further, we consider the meta-population model of Peng et al. (2020) . This is a generalisation of the classical SEIR model, consisting of seven states: (S(t), P (t), E(t), I(t), Q(t), R(t), D(t)). At time t, the susceptible cases S(t) will become with a rate ζ insusceptible P (t) or with rate β exposed E(t), that is infected but not yet infectious i.e. in a latent state. Some of the exposed cases will eventually become infected with a rate γ. Infected means they have the capacity of infecting but are not quarantined Q(t) yet. The introduction of the new quarantined state, Q(t), in the classical SEIR model, formed by the infected cases with a constant rate δ, allows to consider the effect of preventive measures. Finally, the quarantined cases, are now split to cured cases, R(t), with rate λ(t) and to closed, D(t), with mortality rate κ(t). The model's parameters are the transmission rate β, the protection rate ζ, the average latent time γ −1 (days), the average quarantine time δ −1 (days) as well as the time dependent cure rate λ(t) and mortality rate κ(t). The relations are characterized by the following system of difference equations:

The total population size is assumed to be constant and equal to N = S(t) + P (t) + E(t) + I(t) + Q(t) +

According to the official reports, the number of quarantined cases , recovered and deaths , due to COVID-19, are available. However, the recovered and death cases are directly related to the number of quarantine cases, which plays an important role in the analysis, especially since the numbers of exposed (E) and infectious (I) cases are very hard to determine. The latter two are therefore treated as hidden variables.

This implies that we need to estimate the four parameters ζ, β, γ −1 , δ −1 and both the time dependent cure rate λ(t) and mortality rate κ(t). Notice here that while the rest of the parameters are considered fixed during the pandemic, we allow the cure and mortality rate to vary with time. We expect that the former will increase with time, given that social distancing measures have been put in place, while the latter will decrease. Finally, this is an optimization problem, and the methodology we have followed in order to address it can be found in Appendix A-3.

The last model we consider is a modified version of a solution created by Bettencourt and Ribeiro (2008) to estimate real-time effective reproduction number R t 1 using a Bayesian approach on a simple Susceptible -Infected (SI) compartmental model:

We use the Bayes rule to update the beliefs about the true value of R t based on our predictions and on how many new cases have been reported each day. Having seen k new cases on day t, the posterior distribution of R t is proportional to (denoted by ∝) the prior beliefs of the value of P (R t ) times the likelihood of R t given that we have recorded k new cases, i.e., P (R t |k) ∝ P (R t ) × L(R t |k). To make this iterative every day that passes by, we use last day's posterior P (R t−1 |k t−1 ) to be today's prior P (R t ). Therefore in general

However, in the above model the posterior is influenced equally by all previous days. Thus, we propose a modification suggested in Systrom (2020) that shortens the memory and incorporates only the last m days of the likelihood function, P (R t |k) ∝ T t=m L(R t |k t ). The likelihood function is modelled with a Poisson distribution.

Recall the compartmental models discussed in Sec. 2.4.1 and 2.4.2. Then the effective reproduction number is given by

see the supplement of Li et al. (2020) . We estimate R t in (8) during consecutive fortnight periods for which its value is considered to be constant. To achieve this we estimate the parameters of each model, also assumed to be constant for each fortnight, using daily incidence data for Cyprus 2 . To estimate the parameters we employ Bayesian statistics, that is, we postulate prior distributions on the parameters and incorporate the data and the model (through the likelihood) to obtain the posterior distributions on the parameters. The posterior distributions capture our updated beliefs about the parameters after combining the prior with the observed data; see, for example, Bernardo and Smith (1994) .

For the model defined by (4), we consider the whole area of Cyprus as a single uniform population.

For this case, the observations are not sufficiently informative to identify all five parameters of the model.

A solution would be to enforce identifiability by postulating strongly informative prior distributions on the parameters. Instead, we choose to make the assumption that the parameters Z, D and µ have globally constant values, fixed over time. In particular we set D = 3.5 and µ = 0.5 as estimated in Li et al. (2020) and

.1 which appears to be the globally accepted mean incubation period. We thus only need to infer the reporting rate α and the transmission rate β, which vary both between different fortnights and for different countries because of the amount of testing and the degree of adherence to the social distancing policies.

On the other hand, the model defined by (5) is sufficiently informative to infer all six model parameters.

All computational methods, prior modelling and assumptions in relation to both compartmental models discussed in Sec. 2.4.1 and 2.4.2 are given in Appendix A-4. In addition to the above methods we further consider the method of Cori et al. (2013) as a benchmark to compare all methodologies for estimating the effective reproduction number.

By the end of May 2020, 952 cases of COVID-19 were diagnosed in the Republic of Cyprus. Of these, 50.2%

were male (n = 478) and the median age was 45 years (IQR: 31-59 years). The setting of potential exposure was available for 807 cases (84.8%). Of these, 17.4% (n = 140) had history of travel or residence abroad during a 14-day period before the onset of symptoms. Locally acquired infections were 667 (82.7%) with 8.6% (n = 57) related to a health-care facility in one geographical setting (cluster A) and 12.4% (n = 83) clustered in another setting (cluster B). The epidemic curve by date of sampling and date of symptom onset is shown in Figure 1 . The number of cases started to decline in April reaching very low levels in late May.

In this section, we investigate the long-term impact of COVID-19 to Cyprus. Towards this, we give longterm projections for the daily incidence and death rates. We fit system (6) to COVID-19 data that were collected during the period from the 1st of March 2020 till the 31st of May 2020, in Cyprus. We treat all the reported cases without making the distinction between local and imported. The model parameters are estimated using the methodology described in Appendix A-3. Once the model is fitted to data, it can be used to forecast the epidemic. In order to study the evolution of the model as new data are added and the quality of the respective forecasts, we have fitted model (6) using different time periods. Specifically, the four datasets were formed using the daily reported incidences from the beginning of the observation period until and including the 2/4/2020, 17/4/2020, 15/5/2020 and 24/5/202 respectively. The dates were chosen according to the change points detected using the methodology described in Section 2.2, see also Section 3.3.

The fitted model in each case was used in order to predict the pandemic's evolution until the 30/6/2020.

In Figure 2 , we show the number of predicted exposed plus infectious cases (green solid lines) and the number of predicted recovered cases (blue solid lines) for the duration of the prediction period, and compare them to the observed cases which are indicated by circles and triangles. We use circles for data that have been used in the prediction and triangles for the observed data that are used for validation. Visual inspection shows that after a period of about two months during which the model overestimates the number of active cases and underestimates the number of recovered, see Figure 2 (top), model (6) was able to capture accurately the evolution of the pandemic, Figure 2 (bottom).

The performance of the predictions can also be evaluated by means of the relative error (RE) which are

, where x t denotes the datum for day t and y t the model prediction for the same day. The RE for the recovered cases equal 0.4%, 0.2%, 0.3% and 0.3% for the four time periods respectively with the corresponding RE for the active cases being high in the beginning 18%, 5.8%, but then dropping considerably 0.16% and 0.1%, reflecting the fact the model caught up with the evolution of the pandemic. Overall, system (6) gives adequate predictions especially when data from longer time periods are used. for the active cases. Figure 3 shows the number of deaths and their respective predictions using subsets of data as described above. In the duration of the first data set, there were no deaths registered and therefore the prediction was identically zero, giving also an RE equal to 100% see Figure 3 (top left). As more deaths are registered the model's ability to predict the correct number of deaths is improving, see Figure 3 . The recovery rate (λ(t)) is modelled as

, λ i ≥ 0, i = 1, 2, 3.

The idea is that the recovery rate, as time increases, should converge towards a constant. In Figure 4 (left), the fitted recovery rate (solid line) is plotted against the observed number of recovered cases (stars).

Finally, Model 3 can be used to estimate the unobserved number of exposed, E(t), and infectious, I(t),

cases during the development of the pandemic. The maximum number of exposed cases occurs on the 21st of March 2020 and is estimated to be 173 cases, Figure 4 (right, blue line), with the maximum of infectious individuals (136) being attained on the 26 of March 2020. We can observe a delay in the transition of exposed to infectious in the order of 5 days, which suggests a 5 day latent time of COVID-19. 

We first consider the change-point detection method of Sec. 2.2 for the case of piecewise-linear signal plus noise model. Figure 5 illustrates the results obtained by this analysis on daily incidence data. 

We first fit model (2) Note again that the sum 0.779 + 211 ∼ 1 which shows that the non-stationarity persists even after including additive outliers (in the log-scale). Furthermore, the positive sign of both interventions shows the sudden explosion of the daily number of people infected. The corresponding BIC values obtained after fitting this model is equal to 576.643 which improves the BIC of the model without intervention which was equal to 615.766. Figure 6 shows the fit of the model to the data and gives 95% prediction intervals for the week ahead. Comparing both change-point analysis (see Fig. 5 ) and the result obtained by using the above intervention analysis, we observe that both approaches give similar prediction intervals that include future observed incidence data. Indeed, the observed data for the week ahead (01/06/2020-07/06/2020) were 4,6,1,0,5,5 and 1 cases.

Recall the effective reproduction number R t defined by (8). We perform Bayesian analysis using (4) (see For the data concerning all incidents, the first recorded incident was on 07/03/2020, hence, as detailed in Appendix A-4, we initialize our analysis of the outbreak 3 days earlier, on 04/03/2020. Figure posterior probabilities of the event R t < 1. Analysis using the full data.

Next, we consider the estimation model described in Li et al. (2020) where Cyprus is divided in 5 subpopulations (Nicosia, Limassol, Larnaca, Paphos, Ammochostos) and the mobility patterns between them are taken into account (as described in Metapopulation compartmental model 2). The effective reproduction number is given by (8). The compartmental model 2 structure was integrated stochastically using a 4th order Runge-Kutta (RK4) scheme. We use uniform prior distributions on the parameters of the model, with ranges similar to Li et al. (2020) as follows: relative trasmissibility 0.2 ≤ µ ≤ 1, movement factor 1 ≤ θ ≤ 1.75; latency period 3.5 ≤ Z ≤ 5.5; infectious period 3 ≤ D ≤ 4. For the infection rate we choose 0.1 ≤ β ≤ 1.5 before the lockdown and 0 ≤ β ≤ 0.8 after the lockdown and for the reporting rate we choose 0.3 ≤ α ≤ 1. Note that the Ensemble Adjustment Kalman Filter (EAKF, described in Appendix A-4) is not constrained by the initial priors and can migrate outside these ranges to obtain system solutions. For the initialization purposes we assume that all 5 districts are potential origins with an undocumented infected and exposed population drawn from a uniform distribution [0, 5] a week before the first documented case.

Initial condition does not affect the outcome of the inference. Transmission model 2 does not explicitly represent the process of infection confirmation. Thus, we mapped simulated documented infections to confirmed cases using a separate observational delay model. In this delay model, we account for the time interval between a person transitioning from latent to contagious and observational confirmation of that individual infection through a delay of T d . We assume that T d follows a Gamma distribution G(a, τ d /a)

where τ d = 6 days and a = 1.85 as derived by Li et al. (2020) using data from China. Inference is robust with respect to the choice of τ d .

For the inference we use incidents from local transmission in Cyprus as were reported by the Ministry of Health. In Figure 10 we plot the time evolution of the weekly effective reproduction number R t . While at the beginning of the outbreak the effective reproduction number was close to 2.5, after the lockdown measures, it dropped below 1 and stayed consistently there until the end of June 2020. We then use the methodology proposed by Bettencourt and Ribeiro (2008) and recently modified by Systrom (2020) as described in detail in Section 2.4.4. For that method we also use the incidents from local transmission in Cyprus as were reported by the Ministry of Health. Figure 11 shows the daily median value as well as the 95% credible intervals for the effective reproductive number using that method. 

The work presented in this report is the result of intensive collaboration of an interdisciplinary team which was formed shortly after the pandemic started. The main motivation was to give guidance to Cypriot government for controlling this major infectious disease outbreak. Accordingly, we developed models and methods that are of critical importance in appreciating how this disease is developing and what will be its next stage and in what kind of time framework. This is a valuable information for outbreak control, resource utilization and to initiate again the normal daily life.

We followed diverse paths to accomplish this by appealing to different modeling approaches and methods. We have shown that the government interventions were successful on containing COVID-19 in Cyprus, by the end of May, even though the disease initiated with a high value of R t . The government lockdown helped reduce the reproduction number, as the data shows, by applying different methodology. In addition, we have shown by change-point methodology and time series analysis the effect of various measures taken and have developed short-term predictions. The models we applied are based on simple surveillance data, seem to work well, give similar results, and can certainly help epidemiologists and public health officials quantify and understand changes in the transmission intensity of future epidemics and the drivers of these changes. Finally, we feel that our approach to bring together experts from various fields avoids misunderstandings and gaps in communication between scientists, and maximizes the effectiveness of efforts to deal with public health emergencies.

The existing change-point detection techniques for the scenarios mentioned in Section 2.2 are mainly split into two categories based on whether the change-points are detected all at once or one at a time. The former category mainly includes optimization-based methods, in which the estimated signal is chosen based on its least squares or log-likelihood criterion penalized by a complexity rule in order to avoid overfitting.

The most common example of a penalty function is the Bayesian Information Criterion (BIC); see Schwarz (1978) and Yao (1988) for details. In the latter category, in which change-points are detected one at a time, a popular method is binary segmentation, which performs an iterative binary splitting of the data on intervals determined by the previously obtained splits. Even though binary segmentation is conceptually simple, it has the disadvantage that at each step of the algorithm, it looks for a single change-point, which leads to its suboptimality in terms of accuracy, especially for signals with frequent change-points. One method that works towards solving this issue is the Isolate-Detect (ID) methodology of Anastasiou and Fryzlewicz (2019) ; it is the method used for the analysis carried out in this paper.

The concept behind ID is simple and is split into two stages; firstly, the isolation of each of the true change-points within subintervals of the domain [1, 2, . . . , T ], and secondly their detection. The basic idea is that for an observed data sequence of length T and with λ T a positive constant, ID first creates two ordered sets of K = T /λ T right-and left-expanding intervals as follows. The j th right-expanding interval is For clarity of exposition, we give below a simple example. Figure 13 covers a specific case of two change-points, r 1 = 38 and r 2 = 77. We will be referring to Phases 1 and 2 involving six and four intervals, respectively. These are clearly indicated in the plot and they are only related to this specific example, as for cases with more change-points will entertain more such phases. At the beginning, s = 1, e = T = 100, and we take the expansion parameter λ T = 10. Then, r 2 gets detected in {X s * , X s * +1 , . . . , X e }, where s * = 71. 

Recall (2) and that the parameters d, a 1 , b 1 can be positive or negative but they need to satisfy certain conditions so that we obtain stable behavior of the process. Note that the lagged observations of the response X t are fed into the autoregressive equation for ν t via the term log(X t−1 + 1). This is a one-to-one transformation of X t−1 which avoids zero data values. Moreover, both λ t and X t are transformed into the same scale. Covariates can be easily accommodated by model (2). When a 1 = 0, we obtain an AR(1) type model in terms of log(X t−1 + 1). In addition, the log-intensity process of (2) can be rewritten as

after repeated substitution. Hence, we obtain again that the hidden process {ν t } is determined by past functions of lagged responses, i.e.

(2) belongs to the class of observation driven models; see Cox (1981) .

Models like (2) (2), is determined by a latent process. Therefore a formal linear structure, as in the case of Gaussian linear time series model does not hold any more and interpretation of the interventions is a more complicated issue. Hence, a method which allows detection of interventions and estimation of their size is needed so that structural changes can be identified successfully. Important steps to achieve this goal are the following; see Chen and Liu (1993) :

1. A suitable model for accommodating interventions in count time series data.

2. Derivation of test procedures for their successful detection.

3. Implementation of joint maximum likelihood estimation of model parameters and outlier sizes.

4. Correction of the observed series for the detected interventions.

All these issues and possible directions for further developments of the methodology have been addressed by Liboschik et al. (2017) under the Poisson and mixed Poisson distributional framework. (6) According to the official reports, the number of quarantined cases (Q), recovered (R) and deaths (D), due to COVID-19, are available. However, the recovered and death cases are directly related to the number of quarantine cases, which plays an important role in the analysis, especially since the numbers of exposed (E) and infectious (I) cases are very hard to determine. The latter two are therefore treated as hidden variables.

This implies that we need to estimate the four parameters ζ, β, γ −1 , δ −1 and both the time dependent cure rate λ(t) and mortality rate κ(t). This is an optimization problem that we solve as follows: first we allow the latent time γ −1 to vary between 1 and 7 days and for each fixed γ −1 , we explore its influence on the rest of the parameters. The system of differential equations (6) is solved numerically using the Runge-Kutta 45 numerical scheme. The left plot of Figure 14 shows that the protection rate ζ and the transmission rate β both attain their corresponding maximum value when γ −1 is equal to 3 days. Note that ζ takes values between 0.08 and 0.2, while β converges very fast to 1. The reciprocal of the quarantine time δ −1 is increasing with the latent time γ −1 . One would suspect that longer latent time results to higher transmission rate and as the latent time increases almost every unprotected person will be infected after a direct contact with a COVID-19 patient. The right plot of Figure 14 shows the effect of the latent time on the total number of infected cases (exposed and infectious E(t) + I(t)) but not yet quarantined. The peak of the infection was achieved between the 21st and the 24th of March, depending on the latent time with the estimated number of infected people ranging between 338 and 526, depending again on the latent time considered.

Hence, once the latent time γ −1 is fixed, the fitting performance depends on the values of ζ, β and δ −1 .

After a small sensitivity analysis the latent time was finally determined as 3 days. The mortality rate κ(t)

is constantly very small and almost equal to zero, therefore we have not attempted to fit any function to it. For the cure rate λ(t) we have fitted the exponential function given in 9, the idea behind being that with time the recovery should converge to a constant rate. For the parameter estimation we have used a modified version of the MATLAB code given by Cheynet (2020) because Cyprus is a small country and this fact needs to be taken properly into account. Figure 14 : Sensitivity analysis on the parameters for the model defined by (6). The influence of the latent time γ −1 on the protection rate ζ, the transmission rate β and the quarantine time δ −1 (left plot ), the sum of exposed and infectious cases E(t) + I(t) (right plot).

We present a Bayesian analysis, for the model defined by (4) consideration. In particular, in the first period (when the number of tests was relatively low) we employ a symmetric prior around the value α = 0.5, while for later periods (when the number of targeted and random tests increased) we let the prior become progressively skewed towards 1. For the transmission rate β > 0, in the first period we use a Gamma(3/2, 3/2) prior, which puts high probability around 2, while for later periods we use an Exponential(1) prior which puts more mass closer to zero. This choice reflects the existence of super-spreaders in the early stages of the outbreak with higher probability compared to later on.

In each time-period under consideration we also need to initialize the outbreak in Cyprus. For the first period in both datasets, we use a uniform prior supported in {0, 1, . . . , 10} on the number of exposed and the number of undocumented infected 3 days before the first recorded incident. The two priors are independent, while the number of susceptible individuals is taken equal to Cyprus' population and the number of infected-reported equal to zero. For later periods, we use as priors on the four state variables, their posterior distributions at the end of the previous period (corrected appropriately based on the observation at the end of the previous period).

Following Li et al. (2020) , we assume that the daily number of reported cases are independent Gaussian random variables and use an empirical variance given as

by recalling that y(t) denotes the number of infected cases at day t. This allows us to build a Gaussian likelihood for the parameters α and β. Combining this likelihood with the prior distributions, we can deduce a formula for the posterior distribution on α, β. This distribution is not available in a closed form, hence in order to compute posterior estimates and their respective uncertainty quantification, we need to sample it. In the relatively simple setting of Model 1, it is feasible to employ Markov chain Monte Carlo methods, (see Robert and Casella (2013) ), in order to sample the posterior (namely, we use an independence sampler). This is in contrast to the model defined by (5) in Sec. 2.4.2, see Li et al. (2020) , where one has to use the Ensemble Adjustment Kalman Filter (EAKF), which introduces some approximations to the posterior distribution, due to the more complex meta-population structure. Originally developed for use in weather prediction, the EAKF assumes a Gaussian distribution for both the prior and the likelihood and adjusts the prior distribution to a posterior using Bayes rule deterministically. In particular, the EAKF assumes that both the prior distribution and likelihood are Gaussian, and thus can be fully characterized by their first two moments (mean and variance). The update scheme for ensemble members is computed using Bayes rule (posterior ∼ prior × likelihood) via the convolution of the two Gaussian distributions (see Li et al. (2020) for the implementation).

We report the results obtained after fitting a piecewise-constant signal plus noise model, as descibed in Sec. 2.2. The scenario here is that at each change-point, we have a sudden jump in the mean level of the signal. Figure 

Data and code are available at GitHub (https://github.com/chrisnic12/covid_cyprus)

Detecting multiple generalized change-points by isolating single ones

Remdesivir for the treatment of Covid-19 -preliminary report

Bayesian Theory

Real time Bayesian estimation of the epidemic potential of emerging infectious diseases

Statistical Fraud Detection: A Review

Joint estimation of model parameters and outlier effects in time series

Generalized SEIR epidemic model (fitting and computation)

A New Framework and Software to Estimate Time-Varying Reproduction Numbers During Epidemics

The species severe acute respiratory syndrome-related coronavirus: classifying 2019-ncov and naming it sars-cov-2

Statistical analysis of time series: Some recent developments

Impact of non-pharmaceutical interventions (NPI)s to reduce COVID-19 mortality and healthcare demand

Statistical Analysis of Count Time Series Models: A GLM perspective

Log-linear Poisson autoregression

An mrna vaccine against sars-cov-2 -preliminary report

Inference for single and multiple change-points in time series

Regression Models for Time Series Analysis

Modeling infectious diseases in humans and animals

A contribution to the mathematical theory of epidemics

Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov-2)

tscount: An R package for analysis of count time series following generalized linear models

Generalized Linear Models

Hand-hygiene mitigation strategies against global disease spreading through the air transportation network

Circular binary segmentation for the analysis of array-based DNA copy number data

Epidemic analysis of COVID-19 in China by dynamical modeling

Monte Carlo statistical methods

Adaptive trend estimation in financial time series via multiscale change-pointinduced basis recovery

Estimating the dimension of a model

The metric we need to manage COVID-19 rt: The effective reproduction number

Coronavirus disease (COVID-19) -situation report-197

Estimating the number of change-points via Schwarz' criterion

A novel coronavirus from patients with pneumonia in China