key: cord-1018300-q3de7r1p
authors: Griette, P.; Magal, P.
title: Clarifying predictions for COVID-19 from testing data: the example of New-York State
date: 2020-10-12
journal: nan
DOI: 10.1101/2020.10.10.20203034
sha: 559813db2187f1738f35ca070cf9ef27c408dd75
doc_id: 1018300
cord_uid: q3de7r1p

In this article, we use testing data as an input of a new epidemic model. We get a nice concordance between the best fit of the model and the reported cases data for New-York state. We also get a good concordance between the testing dynamic and the epidemic's dynamic in the cumulative cases. Finally, we investigate the effect of multiplying the number of tests by 2, 5, 10, and 100 on the reduction of the number of reported cases.

The epidemic of novel coronavirus infections began in China in December 2019 and rapidly spread worldwide in 2020. Since the very beginning of the epidemic, mathematicians and epidemiologists have developed models to analyze the data, characterize the spread of the virus, and attempt to project the future evolution of the epidemic. Many of those models are based on the SIR or SEIR model, which is classical in the context of epidemics. We refer to [26, 28] for the earliest articles devoted to such a question and to [1, 3-7, 10, 12, 13, 20, 25] for a rather complete overview of SIR and SEIR models in general. In the course of the COVID-19 outbreak, it became clear to the scientific community that covert cases (asymptomatic or unreported infectious cases) play an important role. An early description of an asymptomatic transmission in Germany was reported by Rothe et al. [24]. It was also observed on the Diamond Princess cruise ship in Yokohama, Japan, by Mizumoto et al. [19] that many of the passengers tested positive for the virus but never presented any symptoms. We also refer to Qiu [21] for more information about this problem. At the early stage of the COVID-19 outbreak, a new class of epidemic models was proposed in Liu et al. [14] to take into account the contamination of susceptible individuals by contact with unreported infectious individuals. Actually, this class of models was presented earlier in Arino et al. [2]. In [14] a new method to use the number of reported cases in SIR models was also proposed. This method and model were extended in several directions by the same group in [15-17] to include non-constant transmission rates and a period of exposure. More recently, the method was extended and successfully applied to a Japanese age-structured dataset in [11]. The method was also extended to investigate the predictability of the outbreak in several countries (China, South Korea, Italy, France, Germany and the United Kingdom) in [18]. The application of the Bayesian method was also considered in [9].

In parallel with these modeling ideas, Bayesian methods have been widely used to identify the parameters in the models used for the COVID-19 pandemic (see e.g. Roques et al. [22, 23], where an estimate of the fatality ratio has been developed). A remarkable feature of those methods is that they provide mechanisms to correct some of the known biases in the observation of cases, such as the daily number of tests. Here we will embed the data for the daily number of tests into an epidemic model, and we will compare the number of reported cases produced by the model with the data. Our goal is to understand the relationship between the data for the daily number of tests (which will be an input of our model) and the data for the daily number of reported cases (which will be an output of our model).

The plan of the paper is the following. In Section 2, we will present a model involving the daily number of tests. In Section 3, we apply the method presented in [14] to our new model. In Section 4, we present some numerical simulations, and we compare the model with the data. The last section is devoted to the discussion.

Let n(t) be the number of tests per unit of time. Throughout this paper, we use one day as the unit of time. Therefore n(t) can be regarded as the daily number of tests at time t. The function n(t) comes from a database for the New-York State [29]. Let N(t) be the cumulative number of tests from the beginning of the epidemic. Then

$$ N'(t) = n(t) \ \text{ for } t \geq t_1, \quad \text{and} \quad N(t_1) = N_1. \tag{2.1} $$

Remark 2.1 Section 4 is devoted to numerical simulations. We will use n(t) as a piecewise constant function that varies day by day. Each day, n(t) will be equal to the number of tests that were performed that day. So n(t) should be understood as the black curve in Figure 4.
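The piecewise constant convention of Remark 2.1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the choice of returning zero outside the data range, and the sample counts are all assumptions.

```python
import math


def make_daily_tests(daily_counts, t1=0.0):
    """Return n(t), equal to daily_counts[k] on the interval [t1 + k, t1 + k + 1)."""
    def n(t):
        k = int(math.floor(t - t1))
        if k < 0 or k >= len(daily_counts):
            return 0.0  # assumption: no tests outside the recorded days
        return float(daily_counts[k])
    return n


def cumulative_tests(daily_counts, t, t1=0.0, N1=0.0):
    """Return N(t) = N1 + integral of n over [t1, t] for the piecewise constant n."""
    elapsed = t - t1
    if elapsed <= 0:
        return N1
    full_days = min(int(math.floor(elapsed)), len(daily_counts))
    total = N1 + sum(daily_counts[:full_days])
    if full_days < len(daily_counts):
        # partial contribution of the current day
        total += daily_counts[full_days] * (elapsed - full_days)
    return total


# Hypothetical daily test counts (illustrative, not the New-York State data).
example_counts = [100, 200, 400]
n = make_daily_tests(example_counts)
```

With this convention N(t) is continuous and piecewise linear, and its slope on each day equals the number of tests performed that day.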

The model consists of the following system of ordinary differential equations

This system is supplemented by initial data (which are all non negative)

thereby assuming that the disease was introduced by an individual incubating the disease at some time before t 1 . The time t 1 corresponds to the time where the tests started to be used constantly. Therefore the epidemic started before t 1 .

Here t ≥ t_1 is the time in days. S(t) is the number of individuals susceptible to infection. E(t) is the number of exposed individuals (i.e. who are incubating the disease but not infectious). I(t) is the number of individuals incubating the disease, but already infectious. U(t) is the number of undetected infectious individuals: those who express mild or no symptoms, together with the infectious who have been tested with a false negative result and are therefore not candidates for testing. D(t) is the number of individuals who express severe symptoms and are candidates for testing. R(t) is the number of individuals who have been tested positive to the disease. The flux diagram of our model is presented in Figure 1. Susceptible individuals S(t) become infected by contact with an infectious individual I(t), U(t) or D(t). When they get infected, susceptible individuals are first classified as exposed individuals E(t), that is to say that they are incubating the disease but not yet infectious. The average length of this exposed period (or noninfectious incubation period) is 1/α days.

After the exposure period, individuals become asymptomatic infectious I(t). The average length of the asymptomatic infectious period is 1/ν days. After this period, individuals become either mildly symptomatic individuals U(t) or individuals with severe symptoms D(t). The average length of this infectious period is 1/η days. Some of the U-individuals may show no symptoms at all.

In our model, transmission can occur between an S-individual and an I-, U- or D-individual. Transmissions of SARS-CoV-2 are described in the model by the term τ S(t)[I(t) + U(t) + D(t)], where τ is the transmission rate. Even though a transmission from R-individuals to S-individuals is possible in theory (e.g. if a tested patient infects their medical doctor), we consider that such a case is rare and we neglect it.

Figure: Key time periods of COVID-19 infection: the latent or exposed period before the onset of symptoms and transmissibility, the incubation period before symptoms appear, the symptomatic period, and the transmissibility period, which may overlap the asymptomatic period.

The last part of the model is devoted to the testing. The parameter σ is the fraction of true positive tests and (1 − σ) is the fraction of false negative tests. The quantity σ has been estimated at σ = 0.7 in the case of nasal or pharyngeal swabs for SARS-CoV-2 [27] .

Among the detectable infectious, we assume that only a fraction g is tested per unit of time. This fraction corresponds to individuals with symptoms suggesting a potential infection with SARS-CoV-2. The fraction g is the frequency of testable individuals in the population of New-York state. We can rewrite g as

$$ g = \frac{1}{\kappa P} $$

where P is the total number of individuals in the population of the state of New-York and 0 ≤ κ ≤ 1 is the fraction of the total population with mild or severe symptoms that may induce a test. Individuals who tested positive R(t) are infectious on average during a period of 1/η days. But we assume that they become immediately isolated and do not contribute to the epidemic anymore. In this model we focus on the testing of the D-individuals. The quantity n(t) σ g D is the flux of successfully tested D-individuals, which become R-individuals. The flux of tested D-individuals which are false negatives is n(t) (1 − σ) g D; these individuals go from the class of D-individuals to the class of U-individuals. The parameters of the model and the initial conditions of the model are listed in Table 1.
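The fluxes described above can be collected into a right-hand side function. The following is a sketch under stated assumptions rather than the authors' implementation: the name f for the fraction of I-individuals that move to the class D (the remaining fraction 1 − f moving to U) is introduced here for illustration, the Runge-Kutta integrator is one possible choice, and every numerical value is hypothetical.

```python
def model_rhs(t, state, n, tau, alpha, nu, eta, sigma, g, f):
    """Right-hand side of the S, E, I, U, D, R system described in the text.

    f (fraction of I-individuals moving to D) is an assumed parameter name.
    """
    S, E, I, U, D, R = state
    infection = tau * S * (I + U + D)   # new infections per day
    tested = n(t) * g * D               # D-individuals tested per day
    return [
        -infection,                                              # S'
        infection - alpha * E,                                   # E'
        alpha * E - nu * I,                                      # I'
        nu * (1.0 - f) * I + (1.0 - sigma) * tested - eta * U,   # U'
        nu * f * I - tested - eta * D,                           # D'
        sigma * tested - eta * R,                                # R'
    ]


def rk4_step(t, state, dt, *params):
    """Advance the state by one classical Runge-Kutta step of size dt."""
    def shifted(k, h):
        return [y + h * dy for y, dy in zip(state, k)]
    k1 = model_rhs(t, state, *params)
    k2 = model_rhs(t + dt / 2, shifted(k1, dt / 2), *params)
    k3 = model_rhs(t + dt / 2, shifted(k2, dt / 2), *params)
    k4 = model_rhs(t + dt, shifted(k3, dt), *params)
    return [y + dt / 6 * (a + 2 * b + 2 * c + d)
            for y, a, b, c, d in zip(state, k1, k2, k3, k4)]


# Hypothetical values for illustration only (not the values of Table 1).
P = 1.0e6
# order: n, tau, alpha, nu, eta, sigma, g, f
params = (lambda t: 2000.0, 0.3 / P, 1.0, 1.0, 1.0 / 7.0, 0.7, 1.0e-5, 0.5)
state = [P, 10.0, 0.0, 0.0, 0.0, 0.0]
for step in range(100):                 # 10 days with dt = 0.1
    state = rk4_step(step * 0.1, state, 0.1, *params)
```

A quick consistency check on such a sketch is that the six derivatives sum to −η (U + D + R): individuals only leave the system through the removal terms.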

Before describing our method we need to introduce a few useful identities. The cumulative number of reported cases is obtained by using the following equation

$$ CR'(t) = n(t)\, \sigma\, g\, D(t). \tag{2.5} $$

medRxiv preprint, doi: https://doi.org/10.1101/2020.10.10.20203034; this version posted October 12, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.

The daily number of reported cases DR(t) is given by

(2.6)

The cumulative number of detectable cases is given by

and the cumulative number of undetectable cases is given by

(2.8)

Symbol: Interpretation
t: Time (in days)
R(t): Number of reported (tested infectious) cases at time t
CR(t): Cumulative number of reported (tested infectious) cases at time t
DR(t): Daily number of reported (tested infectious) cases at time t
CU(t): Cumulative number of undetectable infectious at time t

In order to deal with data, we need to understand how to set the parameters as well as some components of the initial conditions. To do so, we extend the method first presented in [14]. The main novelty here concerns the cumulative number of tests, which is assumed to grow linearly at the beginning. This property is satisfied for the New-York State data, as we can see in Figure 3. This means that we can find a pair of numbers a and b such that

where a is the daily number of tests and N_1 is the cumulative number of tests on day t_1. By using the fact that N'(t) = n(t) we deduce that

Figure 3 shows that the linear growth assumption is reasonable for the New-York State cumulative testing data.
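The linear growth assumption can be checked numerically. The sketch below fits N(t) ≈ a t + b by ordinary least squares. The fitting procedure is an assumption (the text does not specify how a and b are obtained), and the sample values are hypothetical.

```python
def fit_linear_growth(times, cumulative):
    """Ordinary least squares fit of cumulative ~ a * t + b; returns (a, b)."""
    m = len(times)
    mean_t = sum(times) / m
    mean_N = sum(cumulative) / m
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (N - mean_N) for t, N in zip(times, cumulative))
    a = sxy / sxx            # slope: estimated daily number of tests
    b = mean_N - a * mean_t  # intercept
    return a, b


# Hypothetical cumulative test counts over five consecutive days.
days = [0, 1, 2, 3, 4]
cumulative_counts = [1000, 1250, 1500, 1750, 2000]
a, b = fit_linear_growth(days, cumulative_counts)
```

The slope a plays the role of the constant daily number of tests on the early time window, so the quality of this fit indicates how far the window [t_1, t_2] can reasonably extend.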

Phenomenological models for the reported cases: At the early stage of the epidemic, we assume that all the infected components of the system grow exponentially while the number of susceptibles remains unchanged during a relatively short period of time t ∈ [t_1, t_2]. Therefore, we will assume that

We deduce that the cumulative number of reported cases satisfies

hence by replacing D(t) by the exponential formula (3.3)

and it makes sense to assume that CR(t) − CR(t_1) has the following form

By identifying (3.5) and (3.6) we deduce that

By using (3.3) we obtain

(3.8)

Finally, by using (3.7) we obtain

$$ D_1 = \frac{\chi_2\, \chi_3}{\sigma\, a\, g}, \tag{3.9} $$

and by using (3.8) we obtain

(3.10)
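As a sketch of how the identity (3.9) for D_1 can be obtained, one may proceed as follows. The parametrization used here (a constant daily number of tests a on [t_1, t_2], exponential growth of D at rate χ_3, and an amplitude χ_2 for the cumulative reported cases) is an assumption chosen to agree with (3.9); it is not necessarily the exact form used by the authors.

```latex
\begin{align*}
  % assumed early-phase forms on [t_1, t_2]
  n(t) &= a, \qquad D(t) = D_1\, e^{\chi_3 (t - t_1)}, \\
  % flux of successfully tested D-individuals
  CR'(t) &= n(t)\,\sigma\, g\, D(t) = a\,\sigma\, g\, D_1\, e^{\chi_3 (t - t_1)}, \\
  % integrate from t_1 to t
  CR(t) - CR(t_1) &= \frac{a\,\sigma\, g\, D_1}{\chi_3}
      \left( e^{\chi_3 (t - t_1)} - 1 \right), \\
  % compare with the assumed form CR(t) - CR(t_1) = \chi_2 ( e^{\chi_3 (t - t_1)} - 1 )
  \chi_2 &= \frac{a\,\sigma\, g\, D_1}{\chi_3}
  \quad\Longleftrightarrow\quad
  D_1 = \frac{\chi_2\,\chi_3}{\sigma\, a\, g}.
\end{align*}
```

In this reading, χ_2 and χ_3 are obtained from a fit to the reported cases data, a from the testing data, and D_1 then follows without any further fitting.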

We assume that the transmission coefficient takes the form

where τ_0 > 0 is the initial transmission coefficient, T_m > 0 is the time at which social distancing starts in the population, and µ > 0 modulates the speed at which this social distancing takes place. To take into account the effect of social distancing and public measures, we assume that the transmission coefficient τ(t) can be modulated by γ. Indeed, by closing schools and non-essential shops and by imposing social distancing on the population of the New-York State, the number of contacts per day is reduced. This effect was visible on the news during the first wave of the COVID-19 epidemic in New-York city, since the streets were almost empty at some point. The parameter γ > 0 is the fraction of the transmissions that remain after a transition period (depending on µ), compared to a normal situation. A similar non-constant transmission rate was considered by Chowell et al. [8].
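To make the roles of τ_0, T_m, µ and γ concrete, here is one possible functional form written as code. The exact formula is an assumption: it is only chosen so that τ(t) = τ_0 before T_m, τ(t) tends to γ τ_0 after a transition whose speed is governed by µ, and γ = 1 gives a constant rate.

```python
import math


def transmission_rate(t, tau0, gamma, mu, T_m):
    """Assumed form of tau(t): constant tau0 until T_m, then exponential
    relaxation towards gamma * tau0 at speed mu."""
    if t <= T_m:
        return tau0
    return tau0 * (gamma + (1.0 - gamma) * math.exp(-mu * (t - T_m)))


# Hypothetical values: 80% of the transmissions removed after day 10.
early = transmission_rate(5.0, 0.5, 0.2, 0.1, 10.0)
late = transmission_rate(1000.0, 0.5, 0.2, 0.1, 10.0)
```

With γ = 0.2, the late-time value is 20% of the initial one, which is the situation described as social distancing in the simulations.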

In Figure 5 we consider a constant transmission rate τ(t) ≡ τ_0, which corresponds to γ = 1 in (4.1). In order to evaluate the distance between the model and the data, we compare the cumulative number of cases CR produced by the model with the data (see the orange dots and orange curve in Figure 5-(a)). In Figure 5-(c) we can observe that the cumulative number of cases increases up to more than 14 million people, which is not realistic. Nevertheless, by choosing the parameter g = 3.08 × 10^{-7} = 1/(S_0/6) in Figure 5-(d), we can see that the orange dots and the blue curve match very well.


we plot the number of cases obtained from the model. We can observe that most of the cases are unreported. In figure (d) we plot the daily number of tests (black dots), the daily number of positive cases (red dots) for the state of New-York and the daily number of cases DD(t) obtained from the data.

In the rest of this section, we focus on the model with confinement (or social distancing) measures. We assume that such social distancing measures have a strong impact on the transmission rate by assuming that γ = 0.2 < 1. It means that only 20% of the transmissions will remain after a transition period.

In Figure 6-(c) we can observe that the cumulative number of cases increases up to 800 000 (blue curve) while the cumulative number of reported cases goes up to 350 000. In Figure 6-(d) we can see that the orange dots and the blue curve match very well again. In order to get this fit we fix the parameter g = 10^{-5}.

In Figure 7 (a) and (b), we aim at understanding the connection between the daily fluctuations of the number of reported cases (epidemic dynamics) and the daily number of tests (testing dynamics). The combination of both the testing dynamics and the infection dynamics gives a very complex curve parametrized by time. It seems that the only reasonable comparison that we can make is between the cumulative number of reported cases and the cumulative number of tests. In Figure 7, all the curves are time-dependent parametrized curves. The abscissa (horizontal axis) is the number of tests and the ordinate (vertical axis) is the number of reported cases. With our notations, the curves correspond to the parametric functions t → (n_data(t), DR(t)) in figures (a) and (b) and to their cumulative equivalents t → (N_data(t), CR(t)) in figures (c) and (d). In figures (a) and (c) we use only the data, that is to say that we plot t → (n_data(t), DR_data(t)) and t → (N_data(t), CR_data(t)). In figures (b) and (d) we use the model for the number of reported cases, that is to say that we plot t → (n_data(t), DR_model(t)) and t → (N_data(t), CR_model(t)).

Figure 7: Parameter values are the same as in Figure 6. In figure (a) we plot the daily number of cases coming from the data as a function of the daily number of tests. In figure (b) we plot the daily number of cases given by the model as a function of the daily number of tests coming from the data. In figure (c) we plot the cumulative number of cases coming from the data as a function of the cumulative number of tests. In figure (d) we plot the cumulative number of cases coming from the model as a function of the cumulative number of tests from the data.
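The parametric curves of Figure 7 can be assembled from two daily series as in the following sketch. The function name and the sample values are hypothetical, and the cumulative series are approximated here by running sums of the daily values starting from zero.

```python
from itertools import accumulate


def parametric_curves(daily_tests, daily_reported):
    """Return the points (n(t), DR(t)) and (N(t), CR(t)), one per day."""
    daily_curve = list(zip(daily_tests, daily_reported))
    cumulative_curve = list(zip(accumulate(daily_tests),
                                accumulate(daily_reported)))
    return daily_curve, cumulative_curve


# Hypothetical daily series (illustrative, not the New-York State data).
daily_curve, cumulative_curve = parametric_curves([10, 20, 30], [1, 2, 3])
```

The cumulative curve is nondecreasing in both coordinates by construction, which is one reason why it is much more regular than the daily curve.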

In Figure 8 , our goal is to investigate the effect of a change in the testing policy in the New-York State. We are particularly interested in estimating the effect of an increase of the number of tests on the epidemic. Indeed people commonly say that increasing the number of tests will be beneficial to reduce the number of cases. So here, we try to quantify this idea by using our model.

In Figure 8 , we replace the daily number of tests n data (t) (coming from the data for New-York's state) in the model by either 2 × n data (t), 5 × n data (t), 10 × n data (t) or 100 × n data (t).

As expected, increasing the number of tests helps to reduce the number of cases. However, beyond a 10-fold increase there is no significant difference in the number of reported cases: the results for 10 times and 100 times more tests are close. Therefore there must be an optimum between increasing the number of tests (which costs money and other limited resources) and being efficient in slowing down the epidemic.

Figure 8: Parameter values are the same as in Figure 6. In figure (a) we plot the cumulative number of reported cases CR(t) as a function of time. In figure (b) we plot the cumulative number of undetectable cases CU(t) as a function of time. In figure (c) we plot the cumulative number of cases (including covert cases) CD(t) as a function of time. Note that the total number of cases (including covert cases) is reduced by 35% when the number of tests is multiplied by 100.
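The kind of experiment reported in Figure 8 can be imitated on a toy version of the model. The sketch below is not the authors' simulation: it uses a constant baseline number of daily tests, a constant transmission rate, an explicit Euler scheme, an assumed fraction f of I-individuals moving to D, and hypothetical parameter values throughout.

```python
def total_cases(multiplier, days=400, dt=0.05):
    """Cumulative infections S(0) - S(T) when daily tests are scaled by `multiplier`.

    All numerical values below are hypothetical and chosen for illustration.
    """
    P = 1.0e6                        # population size (assumed)
    tau = 0.3 / P                    # constant transmission rate (assumed)
    alpha, nu, eta = 1.0, 1.0, 1.0 / 7.0
    sigma, f = 0.7, 0.5              # f: assumed fraction of I moving to D
    g = 1.0e-5
    n_base = 2000.0                  # constant baseline daily tests (assumed)
    S, E, I, U, D = P, 10.0, 0.0, 0.0, 0.0
    # R is not tracked: reported individuals are isolated and do not transmit.
    for _ in range(round(days / dt)):            # explicit Euler steps
        infection = tau * S * (I + U + D)
        tested = multiplier * n_base * g * D
        dS = -infection
        dE = infection - alpha * E
        dI = alpha * E - nu * I
        dU = nu * (1.0 - f) * I + (1.0 - sigma) * tested - eta * U
        dD = nu * f * I - tested - eta * D
        S, E, I, U, D = (S + dt * dS, E + dt * dE, I + dt * dI,
                         U + dt * dU, D + dt * dD)
    return P - S


results = {k: total_cases(k) for k in (1, 2, 5, 10, 100)}
```

In this toy setting, more testing shortens the time spent in the class D and therefore lowers the total number of infections; the size of the effect depends entirely on the assumed values and should not be read as a quantitative statement about New-York State.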

In this article, we propose a new epidemic model involving the daily number of tests as an input of the model. The model itself extends our previous models presented in [11, 14-18]. We propose a new method to use the data in such a context, based on the fact that the cumulative number of tests grows linearly at the early stage of the epidemic. Figure 3 shows that this is a reasonable assumption for the New-York State data from mid-March to mid-April.

Our numerical simulations show a very good concordance between the number of reported cases produced by the model and the data in two very different situations. Indeed, Figures 5 and 6 correspond respectively to an epidemic without and with public intervention to limit the number of transmissions. Both scenarios fit the reported cases very well, yet they predict very different cumulative numbers of cases (more than 14 million in Figure 5 against about 800 000 in Figure 6). This is an important observation since it shows that testing data and reported cases are not sufficient to evaluate the real amplitude of the epidemic. To solve this problem, the only solution seems to be to include a different kind of data in the models. This could be done by studying statistically representative samples in the population. Otherwise, biases can always be suspected. Such a question is of particular interest in order to evaluate the fraction of the population that has been infected by the virus and their possible immunity.

In Figure 7, we compare the testing dynamic (day by day variation in the number of tests) and the reported cases dynamic (day by day variation in the number of reported cases). The daily curve is extremely complex, but we obtain a relatively robust curve for the cumulative numbers. Our model gives a good fit for these cumulative cases.

In Figure 8, we compare multiple testing strategies. By increasing the number of tests 2, 5, 10 and 100 times, we can observe that this is efficient up to a factor of 10, but that increasing the number of tests 100 times does not make a big difference. Therefore, it is useless to test too many people, and there must be an optimum between the cost of the tests and the efficiency in the evaluation of the number of cases.

Infectious Diseases of Humans: Dynamics and Control

Simple models for containment of a pandemic

The Mathematical Theory of Epidemics

Mathematical epidemiology

Mathematical models in epidemiology

Vertically Transmitted Diseases: Models and Dynamics

The basic reproductive number of Ebola and the effects of public health measures: the cases of Congo and Uganda

Modelling the COVID-19 epidemics in Brasil: Parametric identification and public health measures influence

Mathematical Tools for Understanding Infectious Disease Dynamics

Unreported cases for Age Dependent COVID-19 Outbreak in Japan

The mathematics of infectious diseases

Modeling infectious diseases in humans and animals

Understanding unreported cases in the 2019-nCov epidemic outbreak in Wuhan, China, and the importance of major public health interventions

Predicting the cumulative number of cases for the COVID-19 epidemic in China from early data

A COVID-19 epidemic model with latency period

A model to predict COVID-19 epidemics with applications to South Korea

Predicting the number of reported and unreported cases for the COVID-19 epidemics in China

Estimating the asymptomatic proportion of coronavirus disease 2019 (COVID-19) cases on board the Diamond Princess cruise ship

Covert coronavirus infections could be seeding new outbreaks

Using Early Data to Estimate the Actual Infection Fatality Ratio from COVID-19 in France

Effect of a one-month lockdown on the epidemic dynamics of COVID-19 in France

Transmission of 2019-nCoV infection from an asymptomatic contact in Germany

Mathematics in Population Biology

Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions

Evolving Epidemiology and Impact of Non-pharmaceutical Interventions on the Outbreak of Coronavirus Disease

Nowcasting and forecasting the potential domestic and international spread of the 2019-nCoV outbreak originating in Wuhan, China: a modelling study