title: Nonsmooth convex optimization to estimate the Covid-19 reproduction number space-time evolution with robustness against low quality data
authors: Pascal, Barbara; Abry, Patrice; Pustelnik, Nelly; Roux, Stéphane G.; Gribonval, Rémi; Flandrin, Patrick
date: 2021-09-20

Abstract. Daily pandemic surveillance, often achieved through the estimation of the reproduction number, constitutes a critical challenge for national health authorities when designing countermeasures. In an earlier work, we proposed to formulate the estimation of the reproduction number as an optimization problem, combining data-model fidelity and space-time regularity constraints, solved by nonsmooth convex proximal minimizations. Though promising, that first formulation significantly lacks robustness against the low quality of Covid-19 data (irrelevant or missing counts, pseudo-seasonalities, ...) stemming from the emergency and crisis context, which significantly impairs accurate pandemic evolution assessments. The present work aims to overcome these limitations by carefully crafting a functional permitting to estimate jointly, in a single step, the reproduction number and outliers defined to model low quality data. This functional also enforces epidemiology-driven regularity properties for the reproduction number estimates, while preserving convexity, thus permitting the design of efficient minimization algorithms, based on proximity operators that are derived analytically. The explicit convergence of the proposed algorithm is proven theoretically. Its relevance is quantified on real Covid-19 data, consisting of daily new infection counts for 200+ countries and for the 96 metropolitan France counties, publicly available at Johns Hopkins University and Santé-Publique-France.
The procedure permits automated daily updates of these estimates, reported via animated and interactive maps. Open-source estimation procedures will be made publicly available.

Context. The ongoing Covid-19 pandemic has produced an unprecedented health and economic crisis, urging the construction of efficient procedures to monitor its spread across territories, a crucial step for national authorities in designing sanitary, social and economic policies [15]. It is however often not the value of the infection level per se that matters in designing pandemic countermeasures (lockdown, ...), but its evolution along time and its variations across territories. However, in the context of the Covid-19 pandemic outbreak and of the sanitary crisis, collecting daily new infection counts, the basic material of any pandemic surveillance strategy, results in low-quality data (missing samples, outliers, seasonalities, ...). Surprisingly, after 18 months of pandemic, the quality of the data collected and made available by most national health authorities in the world remains limited, which severely impairs an accurate and timely pandemic evolution assessment, the issue at the heart of the present work.

Related works. Pandemic surveillance has been envisaged with several categories of tools from different fields of science (cf. [2] for a review). Yet, it is classically performed using compartmental models, elaborating on the classical Susceptible-Infectious-Recovered reference model. For realistic use and to match social realities (social groups, contacts, ...), these models need to be refined (cf. e.g., [5, 17]), essentially by increasing the number of compartments, which implies a quadratic increase in the number of parameters to be estimated. Parameter estimation is usually achieved within Bayesian frameworks, maximizing the likelihood attached to the models, at the expense of heavy computational burdens.
Such models are thus often used a posteriori (i.e., after the epidemic) with consolidated and accurate datasets. The low quality of the Covid-19 data collected by most national public health authorities across the world significantly impairs the use of such Bayesian schemes and thus the reliable estimation of these parameters. This precludes the use of such models for daily updates of the pandemic evolution assessment.

Instead of compartmental models, epidemiologists massively use the so-called reproduction number, R, which measures how many new individuals are on average infected by an already contaminated person, a key marker of the pandemic strength (cf. e.g., [11, 14, 19, 27, 28]). Estimated to be around 3 during the outbreak of the Covid-19 pandemic [13, 25], it is also used as a function of time, R_T, to monitor the temporal evolution of the pandemic. The strength of such pandemic monitoring is that it is based on a single parameter to estimate, R, while accounting for the basic and universal pandemic propagation mechanisms: the number of new infections, Z_T, at day T depends on R_T and on the numbers of new infections measured on previous days, {Z_{T-1}, Z_{T-2}, Z_{T-3}, ...}, weighted by the so-called serial interval function Φ(t). The latter quantifies the probability that someone found Covid-positive today was actually contaminated several days ago [11, 17, 19, 26]. Following [11], R_T can be estimated using a Bayesian scheme 1. Instead, in an earlier work [1], we proposed an estimation procedure for the spatio-temporal evolution of the reproduction number, written as a nonsmooth convex optimization problem, combining data-model fidelity with time and space regularity constraints for the estimates of R to be pandemic-realistic, and solved by proximal algorithms [4, 6, 10, 20, 22].
Though promising, that proximal-optimization based estimation significantly lacks robustness against the low quality of the Covid-19 data, a generic and crucial issue addressed in the present contribution.

Goals, contributions and outline. The overall goal of this work is to devise an inverse problem-type convex-optimization procedure to assess the spatio-temporal evolution of the reproduction number R from daily new infection counts, measured simultaneously on several connected territories, such as the different counties of a same country. The proposed estimation procedure is designed to be: i) robust to the poor quality of Covid-19 data; ii) efficiently applicable online to update estimates, on a daily basis or whenever new data are available, at moderate computational costs even for large datasets. To that end, Section 2 recalls the pandemic model used here [11] and frames its classical estimation and related issues. Section 3 describes the core methodological contributions of the paper: i) it discusses the poor quality of the data and proposes an extension of the original pandemic modeling that handles poor quality data as outliers (cf. Section 3.1); ii) it proposes an original and theoretically grounded design of a functional whose minimization yields, in a single step, a joint estimation of both the reproduction number and the outliers, by combining the log-likelihood of the extended model as a data-fidelity term with temporal and spatial regularity constraints for epidemiologically realistic estimates of R and sparsity constraints for the outliers (cf. Section 3.2).
This functional is carefully crafted to preserve convexity, so that fast and efficient proximal algorithms can be devised for its minimization; iii) the theoretical properties of the functional and of its solution (existence, uniqueness) are studied in Section 3.3; iv) this permits devising a proximal-type iterative algorithm, with explicit analytical derivation of the proximal operators, whose convergence and stopping criterion are discussed theoretically and practically (cf. Section 3.4). The relevance of the proposed estimation procedure to assess the spatio-temporal evolution of the pandemic is illustrated on real Covid-19 data (daily counts of new infections) made available by national public health authorities (such as, e.g., Santé-Publique-France, SPF) or collected by Johns Hopkins University, described in Section 4. Estimation performance is reported in Section 5: i) first, the roles and choices of the hyperparameters balancing the regularization terms and the data-model fidelity term are discussed in detail (cf. Section 5.1); ii) second, the benefits of the proposed strategy to estimate the temporal evolution of R_T at the level of a given country are illustrated by comparisons against different estimation strategies, such as a two-step procedure (outliers estimated and removed first, followed by estimation of R) (cf. Section 5.2); iii) third, the relevance of the space-time evolution of R estimated across connected territories (shared borders, massive population commuting, ...), here the 96 départements (counties) that administratively organize metropolitan France, is illustrated and discussed precisely (cf. Section 5.3). The proposed procedure is efficient enough to permit automated daily updates of the estimates of R_T for 200+ countries, for the 96 départements of continental France and for the 50 US states.
Links to the resulting animated and interactive maps of estimates are provided, and open-source estimation procedures will be made publicly available.

The pandemic model used here, focused on the reproduction number R, was proposed in [11] and relies on two key ideas: i) conditionally to past counts Z_{1:T-1} = {Z_1, ..., Z_{T-1}}, the count of daily new infections, Z_T, is modeled as a random variable drawn from a Poisson distribution Poiss(p_T); ii) the parameter p_T depends on the past counts Z_{1:T-1}, on the causal serial interval function Φ and on the current R_T:

Z_T | Z_{1:T-1} ~ Poiss(p_T), with p_T = R_T Σ_{s≥1} Φ_s Z_{T-s} =: R_T (ΦZ)_T.    (1)

The serial interval function Φ is a key element of the model that accounts for the epidemiological evolution mechanisms, as it models the random delays between the onset of symptoms in a primary case and the onset of symptoms in secondary cases [11, 17, 19, 26]. For the Covid-19 pandemic and for earlier pandemics of the same type, it was shown that Φ can be approximated by a Gamma density, with shape and rate parameters 1.87 and 0.28, respectively, corresponding to a mean and standard deviation of 6.6 and 3.5 days, indicating a high risk of infecting other persons from 3 to 10 days after the symptoms have appeared [16, 18, 24]. It is assumed here that Φ is known and follows this Gamma approximation. Given the analytic expression of the Poisson distribution in Model (1), the log-likelihood ln P(Z_T | Z_{1:T-1}; R_{1:T}) = Z_T ln p_T - p_T - ln Z_T! permits computing the Maximum-Likelihood Estimate in closed form:

R̂^MLE_T = Z_T / (ΦZ)_T.    (2)

It can be interpreted as a ratio of moving averages, whose size and shape are consistent with pandemic modeling (thus with Φ). The R̂^MLE_T computed from real Covid-19 data (cf. Section 4) are displayed in Figs. 2 and 3 (middle plots) and display far too large a variability along time to be of any practical use in actual pandemic monitoring. Such variability mostly stems from the low quality of the data.
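As a concrete illustration, the closed-form MLE above can be sketched in a few lines of Python (a minimal sketch, not the authors' released code; function names are ours, and the serial interval is the discretized Gamma approximation with shape 1.87 and rate 0.28 mentioned above):

```python
import numpy as np
from math import gamma as gamma_func

def serial_interval(tau=25, shape=1.87, rate=0.28):
    """Discretized Gamma pdf on days 1..tau, renormalized to sum to 1."""
    t = np.arange(1, tau + 1, dtype=float)
    pdf = rate**shape * t**(shape - 1) * np.exp(-rate * t) / gamma_func(shape)
    return pdf / pdf.sum()

def weighted_past(Z, phi):
    """(Phi Z)_T = sum_{s>=1} phi_s * Z_{T-s}: causal weighted moving average."""
    out = np.zeros(len(Z))
    for T in range(1, len(Z)):
        s = np.arange(1, min(T, len(phi)) + 1)
        out[T] = phi[s - 1] @ Z[T - s]
    return out

def r_mle(Z, phi):
    """Closed-form MLE of Eq. (2): counts divided by the weighted moving average."""
    pz = weighted_past(Z, phi)
    return np.divide(Z, pz, out=np.zeros(len(Z)), where=pz > 0)
```

On noise-free counts generated with a constant R, the estimator recovers R exactly; on real counts, its day-to-day variability reflects the data quality issues discussed above.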
The goal of the present work is thus to improve the estimation of R despite the low quality of the available Covid-19 data. Covid-19 daily new infection counts made available by public health authorities are, for most countries, severely corrupted, with missing samples, non-meaningful negative counts followed by retrospective cumulated counts spread across the following days, etc. They also show pseudo-seasonality effects, with significantly lower counts on non-working days. Seasonalities are, however, not easy to model, as non-working days are not only week-ends but also additional days off that depend on countries. Moreover, the way the pandemic has been monitored along week-ends and non-working days has changed significantly over time, even within a given country. Therefore, rather than trying to devise necessarily ad hoc and parametric models for noise in Covid-19 data, we have instead chosen to refer to these different types of data corruption under the generic term of outlier, denoted O_T, and we propose to model observed daily new infection counts Z_T still from a Poisson distribution conditionally to past counts Z_{1:T-1}, yet with a Poisson parameter p_T that depends both on the current reproduction number R_T and on the current outlier O_T.
Additionally, when new infection counts are monitored simultaneously for several connected territories, e.g., the counties or states of a given country, the observed daily counts consist of D time series {Z^(d)_{1:T}, d = 1, ..., D}. These considerations lead to a first key contribution of the present work: the modeling of corrupted daily infection counts across territories by extending Model (1) to multivariate daily new infection Poisson counts, each with an instantaneous Poisson parameter

p^(d)_T = R^(d)_T (ΦZ^(d))_T + O^(d)_T.    (3)

The goal is now to estimate jointly the reproduction numbers R = (R^(d)_t) and the outliers O = (O^(d)_t). The next original contribution is thus to frame the estimation into an inverse problem strategy, with a careful crafting of the functional to minimize, combining the negative log-likelihood of extended Model (3) with positivity and regularity constraints in time and space on R and structure constraints on O. The negative log-likelihood associated to the extended Poisson Model (3) is defined in a separable manner, as the sum over days t and territories d of the Kullback-Leibler divergences d_KL(Z^(d)_t | R^(d)_t (ΦZ^(d))_t + O^(d)_t), complemented, at indices where (ΦZ^(d))_t = 0, by the indicator function of the singleton {(0, 0)} applied to (R^(d)_t, O^(d)_t); this defines the data-fidelity term F(R, O | Z) (Eq. (4)). The indicator function of the singleton {(0, 0)} forces the (MLE) estimates of R and O to vanish when the pandemic stops, i.e., when (ΦZ^(d))_t = 0. Unlike Model (1), from which a unique MLE estimate can be derived (Eq. (2)), for Model (3) the extended likelihood (Eq. (4)) is no longer strictly convex, thus does not have a unique maximizer, and hence requires additional constraints on the estimates of R and O.

i) Time regularity: by nature, and to be of any potential use for actual pandemic monitoring, the estimated reproduction numbers for a given territory must vary only slowly and smoothly along time. Following [1], it has been chosen to enforce a piecewise linear time evolution of the estimates of the reproduction numbers. Such a property ensures that, besides an estimate of R at day t, one also gets an estimate of a local trend, assessing whether the pandemic is growing or decreasing.
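For concreteness, the separable data-fidelity term can be sketched as follows (a minimal sketch using the standard Poisson Kullback-Leibler divergence; function names are illustrative, not the authors'):

```python
import numpy as np

def d_kl(z, p):
    """Kullback-Leibler divergence between a count z and a Poisson parameter p:
    z*ln(z/p) + p - z for z, p > 0; p if z = 0; +inf otherwise."""
    if z < 0 or p < 0 or (p == 0 and z > 0):
        return np.inf
    if z == 0:
        return p
    return z * np.log(z / p) + p - z

def data_fidelity(Z, phiZ, R, O):
    """Separable extended data-fidelity: the Poisson parameter at each (d, t)
    is p = R * (Phi Z) + O, summed over all days and territories."""
    return sum(d_kl(z, r * pz + o)
               for z, pz, r, o in zip(Z.ravel(), phiZ.ravel(),
                                      R.ravel(), O.ravel()))
```

The term vanishes exactly when the Poisson parameter matches the observed counts, and becomes infinite whenever a positive count is assigned a zero parameter.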
Following [8, 12], to favor a piecewise linear temporal evolution of the estimates, we penalize the ℓ1-norm of the multivariate time-domain Laplacian D_2 : R^{D×T} → R^{D×(T-2)} of the estimates of R:

λ_T ‖D_2 R‖_1,    (7)

with (D_2 R)^(d)_t = R^(d)_{t+1} - 2 R^(d)_t + R^(d)_{t-1}.

ii) Positivity: R is by nature positive. Yet, Eq. (4) indicates that, per se, the proposed extended negative log-likelihood only ensures the positivity of p itself. Thus, to enforce the positivity of R^(d)_t, an indicator function of the nonnegative orthant is added to the constraints:

ι_{≥0}(R).    (8)

iii) Spatial regularity: when different territories d are connected, it is natural to expect that their pandemic dynamics are correlated and thus that the corresponding estimates of R show some spatial regularity. Following [1], we use a graph, with D vertices corresponding to the different territories and E edges to connections between pairs of territories. We favor piecewise constant spatial estimates of R^{1:D}_t across territories by penalizing the Graph Total Variation, defined as

GTV(R) = Σ_t Σ_{d1∼d2} |R^(d1)_t - R^(d2)_t|,    (9)

where d1 ∼ d2 runs over vertices connected by an edge. The Graph Total Variation can be rewritten as the composition of a discrete difference operator and a sparsity-promoting ℓ1-norm, GTV(R) = ‖GR‖_1, defining the linear operator G.

The discussion in Section 3.1 suggests that outliers have a sparse structure: they occur with significant values only on specific days (Sundays, days off, ...) and are essentially negligible elsewhere. Sparsity is classically enforced in inverse problem settings by constraining the ℓ1-norm. We thus further propose to impose the following constraint on the estimates of the outliers:

λ_O ‖O‖_1.    (11)

Combining the data-fidelity term of Eq. (4) and the penalizations of Eqs. (7), (8), (9), and (11), we propose to obtain the estimates of R and O by minimizing

(R̂, Ô) ∈ argmin_{R,O} F(R, O | Z) + λ_T ‖D_2 R‖_1 + ι_{≥0}(R) + λ_S ‖GR‖_1 + λ_O ‖O‖_1,    (12)

with λ_T > 0, λ_S > 0 and λ_O > 0 regularization hyperparameters, balancing the strengths of the different constraints against one another and against the likelihood.
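The penalty terms are simple to express; the sketch below (illustrative names, not the released code) implements the time-domain Laplacian, the graph total variation, and the full penalized objective, taking the data-fidelity value as an input:

```python
import numpy as np

def time_laplacian(R):
    """(D2 R)[d, t] = R[d, t+1] - 2 R[d, t] + R[d, t-1], shape (D, T-2)."""
    return R[:, 2:] - 2.0 * R[:, 1:-1] + R[:, :-2]

def graph_tv(R, edges):
    """Graph total variation: sum over time and edges of |R[d1] - R[d2]|."""
    return sum(np.abs(R[d1] - R[d2]).sum() for d1, d2 in edges)

def penalized_objective(fidelity, R, O, edges, lam_T, lam_S, lam_O):
    """Full objective: fidelity plus time, space and sparsity penalties,
    with the indicator of the nonnegative orthant on R."""
    if (R < 0).any():
        return np.inf
    return (fidelity
            + lam_T * np.abs(time_laplacian(R)).sum()
            + lam_S * graph_tv(R, edges)
            + lam_O * np.abs(O).sum())
```

As intended, a reproduction number that is affine in time and identical across connected territories incurs no time or space penalty.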
To study the theoretical properties of Problem (12), and to derive a minimization algorithm, it is useful to recast Eq. (12) into a generic formulation:

minimize over (R, O) the sum F(R, O | Z) + H(L(R, O)),    (13)

with L a linear operator and H a function enforcing constraints promoting sparsity and nonnegativity, defined as:

L(R, O) = (D_2 R, R, GR, O),    (14)
H(P, Q, S, V) = λ_T ‖P‖_1 + ι_{≥0}(Q) + λ_S ‖S‖_1 + λ_O ‖V‖_1.    (15)

Showing that Eq. (12) provides well-defined estimates of R and O requires proving the existence of a minimizer of F(·) + H(L·). To that aim, preliminary results are necessary.

Lemma 1. The data-fidelity term F(· | Z) in Eq. (4) is jointly convex in (R, O).

Corollary 1. The penalized Kullback-Leibler functional in Eq. (12) is convex.

Proof. First, Lemma 1 ensures the convexity of the data-fidelity term F. Then, the function H, appearing in Eq. (13), involves ℓ1-norms and an indicator function, which are convex; thus it is convex, and the composition of H with the linear operator L is convex as well. As the sum of two convex terms, the penalized Kullback-Leibler functional is convex.

Further, because the Kullback-Leibler data-fidelity term F (Eq. (4)), as well as the indicator function in Eq. (8), may take infinite values, the domain of the convex functional in Eq. (12) is not the whole space (R^{D×T})^2, but restricts to a domain Ω of feasible variables, described in Lemma 2.

Lemma 2. The domain Ω of feasible points for Problem (12) consists of pairs (R, O) such that: (i) the Poisson parameter p^(d)_t = R^(d)_t (ΦZ^(d))_t + O^(d)_t is nonnegative; (ii) it vanishes only at indices where Z^(d)_t vanishes; (iii) R^(d)_t ≥ 0. It is convex and closed.

Proof. Conditions (i) and (ii) reflect the fact that the extended Kullback-Leibler data-fidelity term F (Eq. (4)) must be finite for feasible time series, while Condition (iii) stems from the positivity constraint enforced by the indicator function in Eq. (8). Further, each of Conditions (i) to (iii) defines a closed convex set of feasible time series; the finite intersection of closed convex sets being closed and convex as well, Ω is a closed convex set.

Then, the existence of a minimizer can be proven.

Theorem 1. Problem (12) admits at least one minimizer.

Proof. Since, for feasible points, the standard Kullback-Leibler divergence, the indicator functions and the ℓ1-norms are nonnegative, the functional is lower bounded by zero. Let ‖(R, O)‖_2^2 = ‖R‖_2^2 + ‖O‖_2^2 be the canonical norm on the product space (R^{D×T})^2. Let 1 (resp.
0) denote the multivariate time series with all entries equal to 1 (resp. 0); (1, 0) ∈ Ω is a feasible point. Let μ = F(1, 0 | Z) + H(L(1, 0)). As shown below, the functional F(· | Z) + H(L·) is coercive, thus there exists α > 0 such that

‖(R, O)‖_2 > α  implies  F(R, O | Z) + H(L(R, O)) > μ.    (17)

Let ∆_α be the closed ball of radius α in (R^{D×T})^2 for the canonical norm ‖·‖_2. Since the feasible set Ω is closed, the intersection Ω ∩ ∆_α is bounded and closed, hence compact as a subset of a finite-dimensional Hilbert space. Further, the functional F(· | Z) + H(L·) being continuous over Ω ∩ ∆_α, it reaches its minimum over this set at some point (R̂, Ô), and Eq. (17) shows that the relative minimum (R̂, Ô) on Ω ∩ ∆_α is a global minimum on the whole domain of feasible points Ω. To conclude the proof, we need to compute α satisfying (17), which follows from the coercivity of the functional: since, for all z ≥ 0, lim_{p→∞} d_KL(z | p) = ∞, there is a constant c = c(Z) such that D_KL(Z | Q) > μ as soon as ‖Q‖_∞ > c; therefore, with c_min = min_{u,d} (ΦZ^(d))_u over non-vanishing entries, the feasible (R, O) with functional value below μ are bounded. In finite dimension, the equivalence of norms then yields constants C_O and C_R, and hence an α, satisfying (17).

The regularizing functional in Eq. (12) not being strictly convex, investigating the uniqueness of its minimizer requires further analysis, beyond the scope of this work. Nevertheless, leveraging the strict convexity of the standard Kullback-Leibler divergence in Eq. (4), a uniqueness property can be obtained for the estimate of the Poisson parameter p.

Proposition 1. Consider the minimization of a functional of the form g(Bx) + h(x), with B a linear operator and g, h convex.    (18) If g is strictly convex, then there exists ŷ ∈ R^n such that, for any minimizer x̂ of (18), ŷ = Bx̂.

Consider first the indices at which Z^(d)_t = (ΦZ^(d))_t = 0: for these times and territories, the estimated instantaneous Poisson parameter is forced to vanish by the indicator function involved in Eq. (4). Then, restricting to the indices for which Z and ΦZ are not vanishing simultaneously, the data-fidelity term F is strictly convex and the uniqueness of P follows by the same reasoning. From Corollary 1, the penalized Kullback-Leibler functional is convex, and, from Theorem 1, it has at least one minimizer.
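Since the step-size conditions and updates below rely on the linear operator and its adjoint, a standard sanity check is the dot-product test <Lx, y> = <x, L*y>; the sketch below (illustrative implementations of the D_2 block and the graph-difference block G, names are ours) verifies it numerically:

```python
import numpy as np

def D2(R):
    """Time Laplacian, mapping shape (D, T) to (D, T-2)."""
    return R[:, 2:] - 2.0 * R[:, 1:-1] + R[:, :-2]

def D2_adj(P):
    """Adjoint of D2, mapping shape (D, T-2) back to (D, T)."""
    D, Tm2 = P.shape
    out = np.zeros((D, Tm2 + 2))
    out[:, :-2] += P
    out[:, 1:-1] -= 2.0 * P
    out[:, 2:] += P
    return out

def G(R, edges):
    """Graph difference operator: (G R)[e, t] = R[d1, t] - R[d2, t]."""
    return np.stack([R[d1] - R[d2] for d1, d2 in edges])

def G_adj(S, edges, D):
    """Adjoint of G: signed accumulation of edge values onto vertices."""
    out = np.zeros((D, S.shape[1]))
    for e, (d1, d2) in enumerate(edges):
        out[d1] += S[e]
        out[d2] -= S[e]
    return out
```

The dot-product test on random inputs catches indexing mistakes in either the operator or its adjoint before they silently break convergence.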
The ℓ1-norms and the indicator function in the objective make the overall functional nonsmooth, preventing the use of standard gradient descent. Minimizing (12) thus requires more advanced tools handling nondifferentiable functions, such as proximal algorithms. Let L be a bounded linear operator; its adjoint operator is denoted L* and its operator norm is ‖L‖_op = sup{‖LX‖_2 : ‖X‖_2 ≤ 1}. The convex conjugate of a proper, lower-semicontinuous function H : X → R ∪ {+∞}, with X a Hilbert space, is defined as H*(Y) = sup_{X ∈ X} <Y, X> - H(X) [3]. For τ ∈ R_+, the proximal operator of τH at Y ∈ X reads, by definition, prox_{τH}(Y) = argmin_{X ∈ X} (1/2)‖X - Y‖^2 + τH(X). In this work, we chose to implement the Chambolle-Pock primal-dual minimization scheme [7]. Making use of the compact reformulation in Eq. (13), we perform an explicit particularization of the Chambolle-Pock algorithm, detailed in Algorithm 1. From [7], Algorithm 1 converges toward a minimizer of Problem (13) if the descent parameters τ, σ satisfy

τ σ ‖L‖_op^2 < 1.    (19)

Therefore, the actual and practical use of Algorithm 1, and the enforcement of Condition (19), require evaluating at least an upper bound of ‖L‖_op.

Algorithm 1: Primal-dual minimization of the penalized Kullback-Leibler functional (12) for the estimation of reproduction numbers and outliers.
Require: infection counts Z ∈ R^{D×T} and (ΦZ) ∈ R^{D×T}; descent parameters τ, σ > 0; precision ε; maximum number of iterations k_max; length of smoothing window k_smooth.
Initialization; while not converged and k < k_max: update the dual, primal and auxiliary variables; compute the convergence criterion; k ← k + 1; end while.

Proposition 2. Let L be defined in Eq. (14); then ‖L‖_op admits an explicit upper bound, combining those of D_2 and G. Proof.
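To illustrate the scheme, the sketch below runs the Chambolle-Pock iterations on a deliberately simple instance, 1D total-variation denoising min_x (1/2)||x - y||^2 + lam*||Dx||_1, rather than on the full functional (12) (the toy problem and all names are ours, not the authors'): the dual update uses the prox of the conjugate (a clip), the primal update uses the prox of the quadratic, and the steps satisfy Condition (19) since ||D||_op^2 <= 4.

```python
import numpy as np

def diff(x):                      # D: forward differences
    return x[1:] - x[:-1]

def diff_adj(p):                  # adjoint D^T
    out = np.zeros(len(p) + 1)
    out[:-1] -= p
    out[1:] += p
    return out

def chambolle_pock_tv(y, lam, n_iter=10000, tau=0.49, sigma=0.49):
    """Chambolle-Pock for min_x 0.5*||x - y||^2 + lam*||Dx||_1.
    tau*sigma*||D||_op^2 <= 0.49*0.49*4 < 1 guarantees convergence."""
    x = y.copy()
    x_bar = x.copy()
    p = np.zeros(len(y) - 1)
    for _ in range(n_iter):
        # dual step: prox of the conjugate of lam*||.||_1 is a projection (clip)
        p = np.clip(p + sigma * diff(x_bar), -lam, lam)
        # primal step: prox of tau*0.5*||. - y||^2 is a weighted average
        x_new = (x - tau * diff_adj(p) + tau * y) / (1.0 + tau)
        x_bar = 2.0 * x_new - x
        x = x_new
    return x
```

With the full functional (12), the same template applies, with L stacking the four blocks of Eq. (14) and the analytic proximal operators described below.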
It follows from the definition of the operator norm ‖·‖_op and the block structure of L in Eq. (14). In practice, equal descent step sizes are chosen, so that they saturate Condition (19), replacing the unknown ‖L‖_op by the upper bound derived in Proposition 2.

For proximal algorithms such as Algorithm 1, the key ingredient to achieve an actual and fast implementation is to derive a closed-form expression for each proximal operator involved.

Proximal operator of the regularizing function: the dual variable update in Algorithm 1 involves the proximal operator of the convex conjugate of the penalty H defined in (15). In practice, the proximal operator of H is computed first, and the proximal operator of its convex conjugate H* is then derived by applying Moreau's identity, which states that, for σ > 0, prox_{σH*}(X) = X - σ prox_{H/σ}(X/σ). The function H/σ being fully separable, both over its four entries and over the components of each entry, its proximal operator can be computed component-wise. These computations involve two standard closed-form proximal operators: i) the proximal operator of the absolute value, known as soft-thresholding, prox_{σ|·|}(q) = max{|q| - σ, 0} q/|q|; and ii) the proximal operator of the indicator function ι_{≥0}, which amounts to a projection: prox_{ι_{≥0}}(q) = max{0, q}.

Adjoint of the linear operator: updating the primal variable requires computing L*, which can be expressed in terms of the adjoint operators D_2* and G*, via L*(P, Q, S, V) = (D_2* P + Q + G* S, V).

Proximal operator of the data-fidelity term: the proximal operator of the extended negative log-likelihood F is likewise derived analytically in closed form.

The final step in the actual implementation of an iterative algorithm is to choose a stopping criterion. Classically, the convergence of Algorithm 1 is quantified by the decrease of the normalized increments of the objective functional. Fig. 1 shows that such increments (dashed blue curves) display large and erratic fluctuations, thus precluding their use to construct a reliable stopping criterion.
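Moreau's identity is easy to check numerically: for H = lam*|.|, prox_{H/sigma} is soft-thresholding with threshold lam/sigma, and the identity yields prox_{sigma H*}(x) = clip(x, -lam, lam), the projection onto the ℓ∞-ball of radius lam (a small sketch with illustrative names):

```python
import numpy as np

def soft_threshold(q, thresh):
    """prox of thresh*|.|: shrink each entry toward zero by thresh."""
    return np.sign(q) * np.maximum(np.abs(q) - thresh, 0.0)

def prox_conjugate(x, sigma, lam):
    """Moreau's identity: prox_{sigma H*}(x) = x - sigma*prox_{H/sigma}(x/sigma),
    here with H = lam*|.|, so that H/sigma has threshold lam/sigma."""
    return x - sigma * soft_threshold(x / sigma, lam / sigma)
```

The agreement of the two routes (Moreau's identity versus the direct projection) is exactly the mechanism used to update the dual variables in Algorithm 1.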
Instead, and to provide a robust convergence criterion, we propose to make use of nonlinearly smoothed normalized increments (solid black curve in Fig. 1), obtained as a maximum over a sliding window over the past k_smooth iterations, as detailed in Eq. (23). Hence, in our experiments, we set k_smooth = 500, and Algorithm 1 is stopped whenever the smoothed increment reaches a precision ε, set to 10^-7, or when the number of iterations exceeds a maximum k_max, set to 10^7.

To show the relevance and efficiency of the proposed procedure for the estimation of the evolution of the reproduction number of the pandemic, two real Covid-19 datasets made available by national health authorities are used, different in nature and from different international repositories. Johns Hopkins University 2 provides access to the cumulated counts of daily new infections (together with deceased and recovered persons), at the entire population level, on a per-country basis, for 200+ countries worldwide, since January 1st, 2020, hence essentially since the earliest stages of the Covid-19 pandemic 3. In the present work, use will be made of the daily new infection counts only, {Z_{1:T}}, from February 15th, 2020 until August 31st, 2021, hence T = 586 days of pandemic. Santé-Publique-France 4, the French national health authority, provides several different datasets related to the Covid-19 pandemic in France. In the present work, use will be made of the daily hospital-recorded new infection counts, across France, on a per-département basis 5, départements (or counties) being the usual French administrative granularity. These new infection counts are stacked into a data matrix Z ∈ R^{T×D}, for the 101 French départements, and the estimation of the reproduction number will be conducted jointly for all départements. Such data are however available only after March 19th, 2020, and are analyzed until August 31st, 2021, hence T = 531 days of pandemic.
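The smoothed stopping rule can be sketched as follows (an illustrative implementation of the sliding-window maximum of normalized increments; names and the window handling at early iterations are our choices):

```python
import numpy as np

def smoothed_increments(objectives, k_smooth=500):
    """Normalized increments |F_k - F_{k-1}| / |F_k|, robustified by taking
    the maximum over a sliding window of the past k_smooth iterations."""
    F = np.asarray(objectives, dtype=float)
    inc = np.abs(np.diff(F)) / np.maximum(np.abs(F[1:]), np.finfo(float).tiny)
    return np.array([inc[max(0, k - k_smooth + 1): k + 1].max()
                     for k in range(len(inc))])

def stopping_iteration(objectives, eps=1e-7, k_smooth=500):
    """First iteration at which the smoothed increment falls below eps
    (or -1 if it never does)."""
    sm = smoothed_increments(objectives, k_smooth)
    hit = np.flatnonzero(sm < eps)
    return int(hit[0]) if len(hit) else -1
```

Taking the window maximum makes the criterion insensitive to the isolated near-zero increments that a raw increment test would mistake for convergence.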
5 Robust estimation from Covid-19 data

One of the issues at hand for a practical use of Eq. (12) to estimate jointly R and O lies in the selection of the hyperparameters λ_T, λ_S, and λ_O, which balance the contributions of the regularization terms against one another as well as against the data-model fidelity term. Such choices have significant impacts on the achieved estimates. Accurate parameter tuning is thus critical, yet may prove time consuming, notably when this selection needs to be made for each of the 200+ countries studied here, with possibly very different population sizes or pandemic intensities. Automated, data-driven selection of hyperparameters in functional minimization has been addressed theoretically in several different settings (cf. e.g. [21] and references therein), often relying on a Stein Unbiased Risk Estimator (SURE). This works well only for large-size data, which is however not the case for the Covid-19 pandemic. Instead, in the present work, hyperparameter selection was driven by the following dimensional analysis. Inspecting Eq. (12) shows that the functional to minimize is covariant under the change (Z, O; λ_T, λ_S, λ_O) → (αZ, αO; αλ_T, αλ_S, λ_O) for any α > 0, thus leading to identical minima. This led us to propose one same set of hyperparameters (λ_T, λ_S, λ_O), valid for all countries irrespective of their population sizes or pandemic intensities, obtained by applying the minimization in (12) to αZ instead of Z, where α is an arbitrary multiplicative constant that may depend on each country. We found empirically that systematically setting α to the empirical standard deviation of the observed Z constitutes an efficient way to account both for the population size and for the severity of the pandemic.
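The covariance claim is easy to verify numerically on the data-fidelity side: the Poisson Kullback-Leibler divergence is positively homogeneous, d_KL(α z | α p) = α d_KL(z | p), so rescaling (Z, O) by α, while rescaling (λ_T, λ_S) accordingly, rescales the whole functional by α without moving its minimizers (a quick numerical check of that identity; names are illustrative):

```python
import numpy as np

def d_kl(z, p):
    """Poisson Kullback-Leibler divergence for z, p > 0."""
    return z * np.log(z / p) + p - z

# homogeneity: d_KL(a*z | a*p) = a * d_KL(z | p)
z, p, a = 7.0, 4.0, 13.0
lhs = d_kl(a * z, a * p)
rhs = a * d_kl(z, p)
```

Note that the ℓ1-norm of O is also 1-homogeneous, while the time and space penalties act on R, which is left unchanged; this is why λ_T and λ_S scale with α but λ_O does not.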
Precisely, λ_T = 3.5 is chosen as a universal (for all countries) time regularization constant, after inspection of numerous different countries over the whole pandemic period, as a valid trade-off: too small a value would lead to too much variability in the estimates of R and would prevent the use of R to actually assess the strength of the pandemic, while too large a value would lead to too little variability, hence preventing the detection of changes in the pandemic dynamics that may follow sanitary policy decisions (lockdown, vaccination). Further, λ_O is set to 0.025, so that the outlier term in the functional is versatile enough to account for pseudo-seasonal effects, modeled as sparse irrelevant measurements. Finally, for the D = 96 territories, the connectivity graph for metropolitan France contains 475 edges, indicating that the time and space regularizations will have comparable impacts when 96 λ_T ≃ 2 × 475 λ_S, which frames the selection of λ_S. Inspection of the results led to choosing λ_S = 0.002, thus corresponding to a mild spatial regularization.

Because public health policies were not, or only weakly, coordinated across countries, even for closely intertwined spaces such as Western European countries, population-level daily new infection counts (Dataset 1, JHU) are first analyzed independently for each country, that is, without spatial regularization, setting λ_S ≡ 0 in (12). Estimates of the reproduction numbers and the outliers, (R̂_O, Ô_O), are thus obtained for each country by applying (12) to daily infection counts Z_{1:T}, after normalization by the standard deviation computed across the whole pandemic period, using the same hyperparameters for all countries: (λ_T, λ_S, λ_O) = (3.5, 0, 0.025).
To assess the relevance of the proposed one-step procedure, (R̂_O, Ô_O) are compared against estimates obtained from the two-step procedure developed in [1]: first, a sliding median over a 7-day window is applied to Z_{1:T}, and values that depart from the window median by more than ±2.5 local (in-window) standard deviations are replaced by the window median, yielding estimates of outliers referred to as Ô_noO; second, an estimate R̂_noO of R is obtained by applying Eq. (12), with (λ_T, λ_S, λ_O) = (3.5, 0, +∞), to the a priori denoised infection counts, after normalization by the standard deviation. Fig. 2 compares the estimates R̂^MLE, (R̂_O, Ô_O) and (R̂_noO, Ô_noO) for France, across the whole pandemic period, and also focuses on a recent narrow period of time to ease illustration and analysis. Fig. 2 shows that, for the pseudo-seasonal effect related to new-infection-count under-reporting on non-working days, both procedures essentially correctly detect and estimate the corresponding outliers. For the subsequent working-day over-reporting performed by the French public health authority, Fig. 2 clearly shows that the proposed one-step procedure correctly accounts for such small outliers, thus yielding a smooth and outlier-robust estimate of R, while these mis-reports are missed by the two-step procedure, which thus produces estimates with numerous discontinuities, irrelevant in terms of pandemic monitoring and clearly driven by the mis-reports. The period of early April 2021 is of particular interest, corresponding to a long-week-end effect: a normal week-end, followed by an alternation of working and non-working days, resulting in anomalously low infection counts followed by high counts (mis-report corrections) during the intertwined and following working days. For that period, the two-step procedure yields totally irrelevant estimates, while the proposed one-step procedure remains satisfactorily insensitive to such count mis-reports.
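The first step of the two-step baseline can be sketched as follows (a minimal sketch of the sliding-median outlier cleaning; window handling at the borders is our choice, not specified in [1]):

```python
import numpy as np

def sliding_median_clean(Z, w=7, k=2.5):
    """Replace samples deviating from the 7-day window median by more than
    k in-window standard deviations with that window median."""
    Z = np.asarray(Z, dtype=float)
    out = Z.copy()
    h = w // 2
    for t in range(len(Z)):
        win = Z[max(0, t - h): t + h + 1]
        med, sd = np.median(win), win.std()
        if sd > 0 and abs(Z[t] - med) > k * sd:
            out[t] = med
    return out
```

Because the two steps are decoupled, a mis-report that survives this thresholding propagates unchecked into the subsequent estimation of R, which is precisely the failure mode the joint one-step procedure avoids.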
These pseudo-seasonal or long-week-end effects are not specific to France and can be observed across numerous other countries.

Let us now consider multivariate daily infection counts, Z ∈ R^{T×D}, for the D = 96 départements of metropolitan France, considered as the vertices of the graph G in Eq. (12), with edges between départements sharing a terrestrial boundary. Estimates (R̂_{O,S}, Ô_{O,S}) ∈ R^{T×D} are obtained by applying Eq. (12) to Z, after standardization independently per département, with (λ_T, λ_S, λ_O) = (3.5, 0.002, 0.025) (cf. Section 5.1). To illustrate the relevance of (R̂_{O,S}, Ô_{O,S}), they are compared against estimates obtained: without spatial regularization and without accounting for outliers, (R̂_{noO,noS}, Ô_{noO,noS}) ((λ_T, λ_S, λ_O) = (3.5, 0, +∞)); without spatial regularization but accounting for outliers, (R̂_{O,noS}, Ô_{O,noS}) ((λ_T, λ_S, λ_O) = (3.5, 0, 0.025)); with spatial regularization but without accounting for outliers, (R̂_{noO,S}, Ô_{noO,S}) ((λ_T, λ_S, λ_O) = (3.5, 0.002, +∞)). These estimates are first compared as functions of time for two départements in Fig. 4, which shows that estimates not accounting for outliers are far too irregular to be useful in pandemic monitoring, even with spatial regularization. Conversely, R̂_{O,S}, which results from both accounting for outliers and spatial regularization, yields very regular estimates likely to reflect a relevant assessment of the pandemic. The proposed estimation procedure is of particular interest when applied to territories at a fine granularity level, as is the case for the French départements (with, on average, slightly less than one million inhabitants), or during low-activity phases of the pandemic, such as early summer 2020 (days 95 to 155). Spatial regularization is of particular relevance for the two chosen départements.
Indeed, département Paris (Id=75) corresponds to the city of Paris, thus a small territory yet with massive connections to the surrounding départements: estimates of R within Paris can thus not notably differ from those of these so-called Paris-Crown départements. The same holds for département Rhône (Id=69), of very small surface (for historical reasons) yet acting, with the city of Lyon, as a massive transit hub for a number of surrounding départements. Figure 4: Estimates of R_t for département Paris (Id=75) (right) and département Rhône (Id=69) (left): R̂_O,S (accounting for outliers and with spatial regularization, red), R̂_noO,S (not accounting for outliers but with spatial regularization, magenta), R̂_O,noS (accounting for outliers but without spatial regularization, blue), R̂_noO,noS (not accounting for outliers and without spatial regularization, cyan). Further, estimates of R for continental France are reported in Fig. 5 for three different days. Estimates R̂_O,noS and R̂_noO,noS show far too much spatial variability to be realistic. Estimates R̂_noO,S improve spatial regularity, but still show significant variability induced by the low data quality. Estimates R̂_O,S, with spatial regularization and accounting for outliers (right-most plots), yield the most relevant estimates each day and permit a far clearer assessment of the status of the pandemic across the connected metropolitan France départements.
March 31st, 2020, corresponds to the pandemic first-wave maximum, and R̂_O,S shows a clear North-East/South-West gradient in the strength of the pandemic; March 31st, 2021, corresponds to the start of the third lockdown period, referred to as couvre-feu (curfew) in France, and R̂_O,S shows that the pandemic was significantly progressing, rather uniformly across all France; August 31st, 2021, corresponds to the time of submission of this work and shows that the pandemic is globally decreasing in France, yet non-uniformly, and still active in large areas. Estimates R̂_O,S are updated automatically on a daily basis for the 96 continental France départements and made publicly available via interactive and animated maps (cf. Section 6). The present work developed and assessed an inverse-problem-type procedure to estimate the spatio-temporal evolution of the Covid-19 reproduction number, R, robust to the low quality of the pandemic data. The devised procedure relies, first, on a detailed analysis of the data quality, which suggested that most count misreports, be they mild and seasonal (Sundays, week-ends, days off) or large and random (report failures), can be efficiently accounted for as sparse outliers. Second, it is based on carefully devising a functional that balances a data-model fidelity term (Poisson distribution and outliers) against the regularity-in-time-and-space properties that R̂ needs to fulfill to be of actual relevance in practical pandemic monitoring. Fast and efficient algorithms were devised to minimize this functional and their convergence was studied. Applied to real Covid-19 data, the procedure was shown to yield meaningful estimates of R, thus providing a relevant assessment of the spatio-temporal evolution of the pandemic that can inform decision strategies for designing Covid-19 countermeasures.
Indeed, at any current time, the procedure outputs estimates of R for the entire pandemic period, thus permitting to assess a posteriori if and how the implementation of a sanitary policy decision (lockdown, curfew, ...) impacted the evolution of the pandemic. Also, and importantly, while the proposed procedure does not forecast R, it actually implements a nowcasting approach: not only are estimates of R provided for today, but the imposed piecewise-smoothness constraints provide epidemiologists with a local trend around the current time, indicating whether the pandemic is progressing or regressing. Spatial regularization also permits a robust assessment of the homogeneity or heterogeneity of the pandemic across related territories, thus permitting to decide on local or global (nation-wide) measures. Estimates are updated automatically on a daily basis for 200+ countries worldwide and for several related territories (counties in France, states in the USA). They are made publicly available via downloadable text files or interactive and animated maps 6. Procedures will be made publicly available. Future investigations to improve estimation and monitoring include the construction of confidence intervals for the estimates and automated data-driven selection of the hyperparameters, using adaptive strategies that adjust to the variations and phases of the pandemic. We believe that the present work constitutes a significant step toward practical use: the proposed procedure permits an actual real-time, on-the-fly monitoring of the Covid-19 pandemic. These tools are ready to be put to work immediately at the outbreak of potential future pandemics. Finally, the automated production of these daily updates is also intended as a scientific tool to favor transdisciplinary scientific work against the impact of Covid-19 on society at large.
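The type of functional described in the conclusion, balancing Poisson-driven data-model fidelity against temporal and spatial regularity of R and sparsity of the outliers O, can be sketched schematically as follows. The fidelity form, the symbol Phi (standing for serial-interval-weighted past counts), and all names are assumptions for illustration, not the paper's actual Eq. 12.

```python
import numpy as np

def objective(R, O, Z, Phi, lam_T, lam_S, lam_O, edges):
    """Sketch of a functional of the described type, for T x D arrays:
    - Poisson negative log-likelihood (up to constants) of Z given
      the model intensity R * Phi + O (assumed fidelity form);
    - lam_T * l1-norm of second time differences (piecewise-linear R);
    - lam_S * graph total variation across territories;
    - lam_O * l1-norm of O (sparse outliers)."""
    P = R * Phi + O
    eps = 1e-12  # guard against log(0)
    fidelity = np.sum(P - Z * np.log(np.maximum(P, eps)))
    temporal = lam_T * np.abs(np.diff(R, 2, axis=0)).sum()
    spatial = lam_S * sum(np.abs(R[:, d] - R[:, dp]).sum() for d, dp in edges)
    sparsity = lam_O * np.abs(O).sum()
    return fidelity + temporal + spatial + sparsity
```

All four terms are convex in (R, O), which is what permits minimization by the proximal algorithms discussed in the paper; a linear-in-time, spatially uniform R with no outliers incurs no penalty at all, regardless of the hyperparameters.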
References
[1] Spatial and temporal regularization to estimate COVID-19 reproduction number R(t): Promoting piecewise smoothness via convex optimization.
[2] Describing, modelling and forecasting the spatial and temporal spread of COVID-19: A short review.
[3] Convex Analysis and Monotone Operator Theory in Hilbert Spaces.
[4] Convex Analysis and Monotone Operator Theory in Hilbert Spaces.
[5] Mathematical models in epidemiology.
[6] Image restoration: Total variation, wavelet frames, and beyond.
[7] A first-order primal-dual algorithm for convex problems with applications to imaging.
[8] Nonlinear denoising for solid friction dynamics characterization.
[9] A Douglas-Rachford splitting approach to nonsmooth convex variational signal recovery.
[10] Proximal splitting methods in signal processing.
[11] A new framework and software to estimate time-varying reproduction numbers during epidemics.
[12] Sparsest Continuous Piecewise-Linear Representation of Data.
[13] Expected impact of lockdown in Ile-de-France and possible exit strategies.
[14] On the definition and the computation of the basic reproduction ratio R0 in models for infectious diseases in heterogeneous populations.
[15] Covid-19 cacophony: is there any orchestra conductor? The Lancet.
[16] The impact of a nation-wide lockdown on COVID-19 transmissibility in Italy.
[17] Measurability of the epidemic reproduction number in data-driven contact networks.
[18] Epidemiological parameters of coronavirus disease 2019: A pooled analysis of publicly reported individual data of 1155 cases from seven countries.
[19] The R0 package: A toolbox to estimate reproduction numbers for epidemic outbreaks.
[20] Proximal algorithms.
[21] Automated data-driven selection of the hyperparameters for total-variation-based texture segmentation.
[22] Wavelet-based image deconvolution and reconstruction.
[23] Parallel proximal algorithm for image restoration using hybrid regularization.
[24] Epidemiological characteristics of COVID-19 cases in Italy and estimates of the reproductive numbers one month into the epidemic.
[25] Estimating the burden of SARS-CoV-2 in France.
[26] Improved inference of time-varying reproduction numbers during infectious disease outbreaks.
[27] Reproduction numbers and subthreshold endemic equilibria for compartmental models of disease transmission.
[28] Different Epidemic Curves for Severe Acute Respiratory Syndrome Reveal Similar Impacts of Control Measures.