key: cord-0193492-2rsf9dpi authors: Kreiss, Alexander; Keilegom, Ingrid Van title: Semi-Parametric Estimation of Incubation and Generation Times by Means of Laguerre Polynomials date: 2020-12-31 journal: nan DOI: nan sha: 08e3e44e06dd979904a2ec2282f3dfa3dddae3fa doc_id: 193492 cord_uid: 2rsf9dpi In epidemics many interesting quantities, like the reproduction number, depend on the incubation period (time from infection to symptom onset) and/or the generation time (time until a new person is infected from another infected person). Therefore, estimation of the distribution of these two quantities is of distinct interest. However, this is a challenging problem since it is normally not possible to obtain precise observations of these two variables. Instead, in the beginning of a pandemic, it is possible to observe for infection pairs the time of symptom onset for both people as well as a window for infection of the first person (e.g. because of travel to a risk area). In this paper we suggest a simple semi-parametric sieve-estimation method based on Laguerre-Polynomials for estimation of these distributions. We provide detailed theory for consistency and illustrate the finite sample performance for small datasets via a simulation study. A dominating question in the public evaluation of the COVID-19 situation during the 2020 pandemic is the estimation of the basic reproduction number R 0 , the number of new infections which are caused (on average) from a single infected individual. While this number is prominently discussed in the news about COVID-19, it is an important variable for disease transmission in general (Wallinga and Lipsitch [2007] , Leavitt et al. [2020] ). In order to estimate the reproduction number of a disease the so-called generation time G plays an important role (see Euler-Lotka-Equation e.g. in Britton and Scalia Tomba [2019] ). The generation time G is defined as the time difference between the infection time of a randomly chosen infected individual and his or her infector. Let ϕ G denote the density of the generation time G and let i(t) be the expected incidence at time t assuming a certain model of transmission ("the average community rate of new infections", Britton and Scalia Tomba [2019, p. 2] ). By formulating a renewal equation for i(t) and assuming an exponential growth i(t) = Ce rt , one obtains R −1 0 = F R 0 (G) (the Euler-Lotka-Equation, see e.g. Britton and Scalia Tomba [2019] , Wallinga and Lipsitch [2007] ), where (1.1) Hence, the basic reproduction number R 0 is a function of ϕ G . We suppose to observe transmission pairs, i.e, two infected people A and B where it is known that A, the index case, infected B, the secondary case. We observe for A and B their times of symptom onset and we observe, in addition, that the infection of the index case happened in a certain interval. We do not assume information about the infection of the secondary case through the index case. Such data can arise e.g. if the index case got infected during a travel to a region where the virus is circulating and infected the secondary case back home where the virus is not spreading yet (see Lauer et al. [2020] , , Xia et al. [2020] for examples related to travel to and from Wuhan in the early days of the pandemic). This leads to observations of the infection times with measurement error. Hence, the generation time and the incubation times I 1 , I 2 (the time from infection to symptom onset) are not directly observed. The serial interval S (the time between the two symptom onsets), in contrast, can be observed. Coming back to the example of estimation of F R 0 (G) (and hence R 0 ), the most natural estimator, i.e., replacing the expectation by the sample average, is not feasible from our observations because the exact infection times are not observed. In reality it is rarely possible to make observations of G directly for the same reason (cf. Nishiura et al. [2009] ). However, the urgency of the situation requires estimation with imperfect data (Ferretti et al. [2020b] ). We can write down a likelihood for observing the symptom onset times and the exposure interval. This likelihood can be written in terms of ϕ G and other quantities. If it was possible to formulate a family of generation time densities which is indexed by R 0 we could simply maximize the likelihood over R 0 . In view of (1.1) this seems difficult. We follow therefore a different idea: Find a non-parametric estimateφ G,n of ϕ G and study F R 0 (φ G,n ) −1 as an estimator of R 0 . Such plug-in estimators have been studied e.g. in Shen [1997] and it was argued there that smoothness of F R 0 can compensate for typical drawbacks of non-parametric estimators like a slow convergence rate. This approach is not limited to R 0 but can be applied to other interesting functions F of ϕ G , we call them features of G. One example are tests: Some methods for estimation of R 0 make implicit assumptions about the generation time (cf. Wallinga and Lipsitch [2007] ). Testing for such assumptions can be realized within this framework if the test statistic T (φ G,n ) can be written as a function of an estimateφ G,n of ϕ G , e.g. similarly to Härdle and Mammen [1993] who use an L 2 type test-statistic. Further interesting features of ϕ G are for instance variances, mean values, quantiles or the probability of pre-symptomatic infection P(G ≤ I) where I denotes the incubation time (the time from infection to symptom onset). This is a feature of the joint variable (G, I). It is therefore interesting to study efficient estimation of general features of G or of (G, I). We will focus in the following on features of G. More precisely, we will estimate F (G) by firstly finding a semi-parametric sieve-estimatorφ G,n of ϕ G based on Laguerre polynomials and consider then the plug-in estimator which estimates F (G) through replacing ϕ G byφ G,n in the definition of F . In the beginning of a pandemic it is important to estimate its transmission characteristics ) and it is common to replace the unobserved generation time G by the serial interval S (cf. Britton and Scalia Tomba [2019] , Ferretti et al. [2020b] , Wallinga and Lipsitch [2007] ), because this can be observed in clinical studies. In certain situations this can be a promising approach, cf. Remark 2.2 below. However, this practice can also yield biased estimates (cf. Britton and Scalia Tomba [2019] ). In this paper, we avoid this issue by estimating ϕ G directly. Moreover, since the interest lies in a function of ϕ G rather than ϕ G itself, it might be desirable for researchers to not impose parametric assumptions on ϕ G but to consider a non-parametric approach instead. In addition, a non-parametric approach avoids issues like selecting a parametric family or important aspects being hidden through this choice (cf. Groeneboom [2021] ). The main contribution of this paper is to provide the first step towards efficient, semiparametric estimation of features of the generation time: Consistent density estimation. Based on such an estimator the theory for efficient, semi-parametric inference methods about features of ϕ G can be developed. To this end we extend the methodology of Ferretti et al. [2020b] , Ganyani et al. [2020] in two ways: Firstly, we estimate the densities of the incubation period and the generation time jointly in one step with the same data and, secondly, we provide a semi-parametric framework which makes no a-priori assumptions about the incubation and generation times. We pursue this by constructing a sieve estimator based on Laguerre polynomials highlighting the flexibility of Laguerre polynomials as approximating functions. In order to identify the model we need to specify a model for the relation between the exposure window and the infection time but we will not make parametric assumptions about the distributions of the incubation period and the generation time. Thus, we will do semi-parametric estimation. General background information about data analysis in disease transmission can e.g. be found in Held et al. [2019] , Chowell et al. [2009] . The estimation of R 0 from observations of the serial interval in particular is discussed in Lipsitch et al. [2003] . As an alternative Ferretti et al. [2020b] suggest the following two step procedure: Firstly, fit a parametric model for the incubation time (cf. Lauer et al. [2020] , ) and then, secondly, use this to fit a parametric model to the generation time. Groeneboom [2021] describes how to do non-parametric estimation of the incubation time. Problems related to under-reporting or delays (cf. Azmon et al. [2014] ) are not considered here because we have data in mind which was collected by researchers rather than observational data from self-reporting. In addition to this type of experimental data, it is also possible to collect larger sets with covariates and use prediction techniques to identify transmission pairs (cf. Leavitt et al. [2020] ). Specific parametric results about the above mentioned quantities for Covid-19 can e.g. be found in , Ferretti et al. [2020b] , Ganyani et al. [2020] , Lauer et al. [2020] , Tindale et al. [2020] and in many other places. The exact mathematical setting described below is close to de-convolution and measurement error settings, classical results about which can be found for example in Fan [1991] , Devroye [1989] , Carroll et al. [2006] . General background about sieve estimators and series estimation can be found in Newey [1997] , Chen [2007] and the references therein. Ai and Chen [2003] , Newey and Powell [2003] use sieve estimation based on moment conditions while we start from a likelihood. Further reading about plug-in sieve estimates can be found in Shen [1997] , Chen and Shen [1998] . The structure of this paper is as follows: In Section 2 we introduce the exact modelling framework. Afterwards, in Section 3, we provide a sieve estimator for the incubation pe- riod and generation time based on Laguerre polynomials. Its properties will be discussed in Section 4 and we show that our estimator is asymptotically consistent. The asymptotic distribution of the methodology will be studied in Section 5 by means of simulations and it will be applied to a real-world dataset consisting of 191 SARS-COV-2 transmission pairs that has also been used by Hart et al. [2021] and Ferretti et al. [2020a] . The R-code which is used for these computations is available on github (https://github.com/akreiss/ SemiParametric-Laguerre.git). Finally, we finish the paper with some concluding remarks in Section 6. Additional simulation results and some proofs are collected in the Appendix. We study observations of transmission pairs as shown in Figure 1 . The first person, the index case, gets infected at time T 1 which is unobserved. However, it is known to lie in the interval [0, W ], the exposure period. The exact conditional distribution of T 1 in [0, W ] is allowed to depend on a random variable C which determines for example the location of the infection. In certain situations, it will turn out that the likelihood will depend only on min(W, S 1 ), where S 1 ≥ T 1 denotes the time at which Person 1 shows symptoms. In that sense, observation of W is only then required if W ≤ S 1 . Otherwise we do not have to observe W but we have to observe that W ≥ S 1 . The incubation period is defined as I 1 := S 1 − T 1 . Moreover, Person 1 is known to have infected Person 2, the secondary case. Similarly as above, we define S 2 , T 2 , I 2 for the second person: Person 2 shows symptoms at the observed time S 2 . But, of course, the infection time T 2 and hence the incubation time I 2 := S 2 − T 2 are unobserved. The generation time is thus G := T 2 − T 1 and it is also unobserved, while the serial interval S := S 2 − S 1 can be observed. Such data was for example collected by Ferretti et al. [2020b] , , Lauer et al. [2020] . The following relations can be directly read off from the definitions and Figure 1 . ..,n be iid copies of (W, T 1 , C, S 1 , S 2 , G). However, we suppose that our observations are only given by (S 1,i , S 2,i , W i , C i ) i=1,...,n , where W i := min(S 1,i , W i ). Based on our observations, we can only make direct inference about the joint distribution of (S 1 , S 2 ) and the serial interval S := S 2 − S 1 . But we have only indirect information about I, G and T 1 by means of (2.1)-(2.2) (note that (2.3) is the difference of (2.1) and (2.2)). Informally speaking, we have three unknown densities and two equations. In other words, identification of the joint-distribution of the three random variables (I, G, T 1 ) from a joint distribution of the two variables (S 1 , S 2 ) is impossible without additional assumptions. Throughout we will assume the following to be true: Assumption (M): Model I 1 , I 2 and G are non-negative and C takes values in {1, ..., K C }. Their distributions have the following densities (with respect to the Lebesgue measure) or count measures and relate as follows ∼ ϕ G , independent of (I 1 , I 2 , T 1 , W, C). (2.4) When inspecting the likelihood of Ferretti et al. [2020b] and the assumptions of Ganyani et al. [2020] , we see the same independence assumptions for I 1 , I 2 , G and T 1 . Independence between I 1 and T 1 can also be found in Groeneboom [2021] . The variable C can be understood as covariate and will be interpreted as the location of the infection. It can be used to capture heterogeneity in the infection dynamics: In a location with an increasing number of infections, it is more likely that the infection time T 1 lies towards the end of the infection window [0, W ] conditionally on (C, W ). In contrast, in locations with a low number of infections, T 1 is possibly uniformly distributed in [0, W ]. See Remark 2.1 below for further details and a motivation for certain choices of h. It has been reported (see e.g. Tindale et al. [2020] ) that the transmission dynamics, i.e., ϕ I and ϕ G , can be different in different scenarios. Thus, we need to restrict to data coming from the same environment. Moreover, the above assumption means that we observe true transmission pairs (see e.g. Tindale et al. [2020] for a parametric model which allows for an unknown number of intermediate infectors). Remark 2.1. All discussion below is conditional on W . The dataset we imagine is recorded in the beginning of a pandemic, e.g. a person comes home after travelling to a region where the virus is circulating. Suppose that the time points, at which the person of interest has an infectious contact (i.e. a contact which definitely leads to an infection), are the jump points of a counting process N with intensity function λ : It is plausible that the intensity of infection is proportional to the number of infections in the population. Since (in most cases) the infection numbers behave exponentially, we also consider λ(r) = λ 0 exp(−α(W − r)) ∼ exp(αr), where α ∈ R describes the exponential growth (or decay) of the number of infections. In this notation, T 1 , the time of infection, is the first jump of N . In an early stage of the pandemic it is plausible that there is only one infectious contact. Then, using that counting processes have independent increments and that the number of jumps in a certain interval is Poisson distributed with parameter equal to the integral of the intensity over the respective interval, we obtain Deriving the distribution function above yields that the density of T 1 is proportional to exp(−α(W − s)). Thus, we will later consider primarily the situation in which h(u|c) = exp(−r(C)u), where r(C) is the (known) exponential growth of the infection numbers in location C. Note that taking α → 0, i.e., the case of a constant intensity function, yields P(T 1 ≤ s|N (t) = 1) → s/t (the distribution function of a uniform distribution on [0, t]). The discussion in Remark 2.1 is then plausible, if it is plausible to assume that the person of interest has infectious contacts with a rate which is so low/the time W is so short that there is only one infectious contact in [0, W ]. In particular in the beginning of a pandemic this seems reasonable. Note that for the subsequent infection (within the observed transmission pair) this assumption is no longer valid: It is, intuitively speaking, more likely that infections, e.g., in a household happen directly after the infected person has returned, rather than days after. Note finally that in the special case r(c 0 ) = 0 for some location c 0 , i.e., if the infection level remains constant, we get h(·|c 0 ) = 1 (then n(w, c 0 ) = 1/w), i.e. T 1 ∼ U([0, W ]) (uniform distribution). From a mathematical standpoint, this case leads to another very intuitive result: The conditional distribution of S 1 given W and C = c 0 is given by the convolution through (2.1) The conditional density of S 1 can then be obtained through differentiation of the above with respect to x and is given by Hence, the likelihood of observing S 1 = x is given by the probability of observing an incubation period between x − W and x, the natural lower and upper bounds. This way we can identify ϕ I from observations of S 1 . Then, in a second step, we can use (2.3) to identify ϕ G from observing S. Hence, under the assumption that T 1 ∼ U([0, W ]) conditionally on W , we can identify the distributions of I and G from the distribution of (S 1 , S 2 ). We make this mathematically precise also in the general case in Corollary 4.4 below. We mentioned in the introduction that using the serial interval in place of the generation time can lead to biases. We discuss in the next remark in which situations this practice is safe to use and we argue that it does not work for R 0 . Remark 2.2 (Serial Interval vs. Generation Time). Consider the set-up from Assumption (M) and let ϕ S denote the density of S. Our interest lies in estimation of F (G), where F is an arbitrary feature of G. As a first example let F be linear, e.g. F (G) := E(G) = R tϕ G (t)dt be the expectation. In that case by (2.3) and F (G) can probably be well (maybe even optimally) estimated by estimating the integral on the right based on observations of the serial interval alone. For the case of the basic reproduction number, however, F R 0 (G) = E(exp(−rG)) is a moment generating function, i.e., it is non-linear. Other examples of interesting non-linear features of interest are quantiles, a quadratic test statistic or the probability of pre-symptomatic infection P(G ≤ I 1 ). For these features, we cannot simply estimate the serial interval in place of the generation time, but it is necessary to have an estimate of the density ϕ G available. Remark 2.2 together with the unavailability of observations from G shows that estimation of features like R 0 is non-trivial and requires more complicated methods. Before turning to the estimators we make a remark about asymptomatic patients. Remark 2.3 (Asymptomatic Patients). It is unclear how asymptomatic patients can be handled in this framework. However, it should be emphasized that asymptomatic means here infected people who infect others but who never show symptoms. Pre-symptomatic infections are well allowed, i.e., Person 1 is allowed to infect Person 2 before showing symptoms. However, we assume here that Person 1 and Person 2 both will eventually show symptoms. We introduce now our estimator by specifying approximations to ϕ I and ϕ G . Consider the following class of densities on the non-negative real line for m ∈ N 0 := {0, 1, 2, ...} where · 2 is the Euclidean norm and being the k-th Laguerre polynomial. Since the Laguerre polynomials form an orthonormal system of functions on [0, ∞) with respect to the weight function e −x it is simple to show that the above construction yields a density under the simple condition θ 2 = 1, cf. Lemma 3.1. Moreover, it is well known that a large class of densities can be approximated by Laguerre polynomials. Define to this end the Hellinger distance of two distributions on X which have densities ϕ 1 , ϕ 2 with respect to a measure µ via The following lemma is essentially a rephrasing of Theorem 1 in Chapter II.8 of Nikiforov and Uvarov [1988] which we state here for the convenience of the reader. Lemma 3.1. For any m ∈ N 0 and any θ ∈ R m+1 with θ 2 = 1 we have that ϕ θ as defined in (3.1) is a density function on [0, ∞). Suppose moreover that ϕ is an arbitrary Then, for any sequence m n → ∞ there are θ n ∈ R mn+1 with θ n = 1 such that for n denotes the k-th entry of θ n . Then The first part of the lemma is a simple calculation using the orthonormality of the Laguerre polynomials, i.e., use that For the second statement we note This and (3.2) are exactly the conditions of Theorem 1 in Chapter II.8 of Nikiforov and Uvarov [1988] which we stated in the Appendix as Theorem 7.2 for the convenience of the reader. The theorem states that the sequence p n ( converges locally uniformly, i.e., uniformly on compact sets, to p. Let Hence, we can conclude that also converges locally uniformly to p because p is bounded on compact sets K ⊆ (0, ∞). From this we conclude also the locally uniform convergence of ϕ θn to ϕ. Moreover, Theorem 7.2 states that The proof of the Lemma is complete since the above implies The conditions of Lemma 3.1 cover a wide class of piecewise continuously differentiable densities like any sub-Gaussian density, densities with compact support and subexponential densities. We consider condition (3.2) therefore as not-restrictive. Before we can write down the likelihood, we have to find the conditional density of (S 1 , S 2 , W ) given C. Lemma 3.2. Suppose that the random variables W, C, T 1 , I 1 , I 2 , G are related as in Assumption (M). The conditional joint density of (S 1 , S 2 , W ) given C, where S 1 := I 1 + T 1 and S 2 := T 1 + I 2 + G, is given by for x 1 , x 2 , ω ≥ 0 and 0 otherwise. Proof. Let x 1 , x 2 , ω ≥ 0 be arbitrary. Note firstly that, by (M), the conditional density of (W, T 1 ) given C is given by Below integrals of the type a 0 are to be understood over the set (min(0, a), max(0, a)). Since all integrands are supported in [0, ∞), integrals are zero when a < 0. We have by the independence assumptions in (M) We obtain furthermore by differentiating under the integral and using that ϕ I and ϕ G are supported on the non-negative real line In the last line above we can replace the upper integration bound of the outer integral by x 2 because for y > x 2 , we have x 2 − t − y < 0 and hence ϕ I (x 2 − t − y) = 0. By using the density from above, we obtain an expression for the likelihood for estimation of ϕ I and ϕ G . ..,n be iid copies of (S 1 , S 2 , W, C) which are related as in Assumption (M). Let m 1 , m 2 ∈ N and θ 1 ∈ R m 1 +1 , θ 2 ∈ R m 2 +1 . Consider the candidate densities ϕ I,θ 1 ∈ G m 1 and ϕ G,θ 2 ∈ G m 2 . The likelihood for this configuration is given by, denote where C n is a random constant which does not depend on ϕ I,θ 1 and ϕ G,θ 2 . In the specific case of h(u|c) = exp(−r(c)u) for some exponential parameter r(c) ∈ R we even get that depends on W i only through W i . Above C n is a random constant which does not depend on ϕ I,θ 1 and ϕ G,θ 2 . Proof. Let f ϕ I,θ 1 ,ϕ G,θ 2 (x 1 , x 2 , ω|c) denote the conditional density of (S 1 , S 2 , W ) given C as defined in Lemma 3.2 when I has density ϕ I,θ 1 and G has density ϕ G,θ 2 . By independence, we have where the latter term is independent of ϕ I,θ 1 and ϕ G,θ 2 . This finishes the proof of (3.3). For the specific choice h(u|c) = exp(−r(c)u) we get which finishes the proof of (3.4) since the second line does not depend on ϕ I,θ 1 or ϕ G,θ 2 . Note that Lemma 3.3 implies that for given ϕ W and p C , the likelihood to be optimized is, conveniently, independent of ϕ W and p C . Thus, we can treat ϕ W and p C as known without loosing any practicality and, therefore, we may define the following estimators for ϕ I and ϕ G : For given sequences (m 1,n ) n∈N , (m 2,n ) n∈N ⊆ N 0 , we study At this point it is not clear that we can identify the parameters θ 1 ,θ 2 . Later, in Corollary 4.4, we will see that consistent estimation of the distribution functions of ϕ I and ϕ G is possible if a mild assumption on the characteristic functions holds. Under the same assumption, a similar proof-technique can be applied to show that two different sets of parameters lead to different likelihoods. In this section we will prove and discuss a consistency result. For ϕ I and ϕ G being arbitrary densities of I 1 , I 2 and G, respectively, we denote by f ϕ I ,ϕ G the conditional density of (S 1 , S 2 , W ) given C as defined in Lemma 3.2. Then, f ϕ I ,ϕ G p C denotes the joint density of (S 1 , S 2 , W, C). When ϕ I and ϕ G are chosen from the approximating spaces G m 1 and G m 2 , the following set denotes the set of all possible approximations to the joint density of (S 1 , S 2 , W, C) In the following we will always assume that we are in the setting presented in Section 2. Theorem 4.1. Suppose (M) holds true and let m 1,n , m 2,n = n β and ε n = n −γ for some β, γ > 0 such that γ < 1/2 and β ≤ 1 − 2γ. Suppose that the true densities ϕ I and ϕ G fulfil the assumptions of Lemma 3.1 and let ϕ I,n and ϕ G,n be the sequences from Lemma 3.1 such that ρ H (ϕ I , ϕ I,n ) → 0 and ρ H (ϕ G , ϕ G,n ) → 0. Suppose that there is α ∈ (0, 1/2) such that for some N 0 ∈ N 0 (4.2) Then, we have that The proof of this result will be given later in Section 4.2. We will begin with a discussion of the result and its assumptions in the next subsection. Before turning to the assumptions of Theorem 4.1 we make a remark about the convergence rate. Remark 4.2. Note that the requirements on β and γ in Theorem 4.1 allow for the choice β = 1/5 and γ = 2/5 < 1/2. This yields the classical convergence rate n −2/5 for series estimation with polynomials if the true density is sufficiently smooth (cf. Newey [1997] and Proposition 3.6 in Chen [2007] ). We continue with a discussion of the condition (4.1) above. This can be understood as a tail condition: For any compact set K ⊆ (0, ∞) 3 we have by Lemma 3.1 that f ϕ I,n ,ϕ G,n converges uniformly to f ϕ I ,ϕ G . Thus, if we restrict the integral in the definition of C α (f ϕ I ,ϕ G , f ϕ I,n ,ϕ G,n ) to K, the restricted integral remains bounded. Note also that by choosing α > 0 small, possible singularities of f 2 are integrable. Hence, (4.1) is only restrictive for the integral over K c . Thus, if f ϕ I ,ϕ G is compactly supported (4.1) follows. Note furthermore that f ϕ I ,ϕ G is compactly supported if ϕ I , ϕ G and ϕ W are compactly supported. In our specific epidemics setting for COVID-19, this is a highly plausible assumption because people stop being infectious at some point. However, for other diseases people can stay infectious for an extended period of time, e.g. for Hepatitis C, the mean generation time is about 20 years (cf. Wallinga and Lipsitch [2007] ). In those cases of non-compactly supported distributions, (4.1) is a restriction: It is required that the approximations do not decrease much faster to zero than the actual density. The meaning of much faster is to be understood relative to f ϕ I ,ϕ G and it can be adjusted by choosing α > 0 small. Any polynomial difference can therefore be handled. Theorem 4.1 above shows, strictly speaking, that we can consistently estimate the joint distribution of (S 1 , S 2 , W, C). However, from a practical point of view, the estimation of ϕ I and ϕ G is of interest. In order to ensure identifiability of ϕ I and ϕ G , we need the following additional assumption: Assumption (C): Characteristic Functions: Let Φ I denote the characteristic function of I. There are functions z 1 : R → R and z 2 : R → R such that for almost all x ∈ R Φ I (x) = 0 and E e iz 1 (x)W +iz 2 (x)C E e ixT 1 W, C = 0. In particular the second requirement above looks cryptic in full generality. In the following remark we show that in specific scenarios it simplifies to simpler conditions which are more interpretable. above can be rewritten in more specific scenarios. 1. If Φ T 1 (x) := E(exp(ixT 1 )) = 0 almost everywhere, (4.3) holds because we may simply choose z 1 ≡ z 2 ≡ 0. 2. If C ≡ 1 and T 1 is uniformly distributed in [0, W ], i.e., h(w −t|c) ≡ 1 and n(w, c) = 1/w, we obtain with z 2 ≡ 0 and z 1 (x) = −x/2 that (note that sin(x)/x → 1 for x → 0 such that the below inequality chain can be used also for x = 0) where Φ W is the characteristic function of W . The above is non-zero for almost all x ∈ R (implying (4.3)) for many standard distributions like the exponential distribution which characteristic function has a strictly positive real part. Our setting is very similar to the de-convolution set-up: In (2.1) the signal I 1 is perturbed by the noise T 1 (the distribution of which is determined by (W, C)) and in (2.2) the signal G is perturbed by the noise I 2 + T 1 . Thus it is clear that assumptions on W, C and I 1 similar to those for the de-convolution set-up are necessary for identification. The assumption of almost everywhere non-vanishing characteristic functions has already been mentioned in Devroye [1989] (see the Remark below Theorem 1 therein) to be necessary to guarantee consistent estimation. Let F I and F G denote the distribution functions corresponding to the densities ϕ I and ϕ G , respectively. Similarly, defineF I,n andF G,n to be the distribution functions ofφ I,n andφ G,n , respectively. The following corollary ensures under Assumption (C) thatF I,n andF G,n are consistent estimators for F I and F G , respectively. The detailed proof is given in Section 7.3 in the Appendix. We finally come back to estimating the basic reproduction number R 0 . The above framework provides us with a methodology to consistently estimate the reproduction number without making parametric statements about the incubation period or the generation time. From (1.1) it is evident that we require next to an estimator for ϕ G also an estimater n of the growth rate of the expected incidence. In Ferretti et al. [2020b] the exponential growth rate of the reported numbers is estimated to be r = 0.14. But, strictly speaking, what we need here is the growth rate of the infection numbers which is (intuitively speaking) very similar but can be different due to under-reporting and delays. The estimator is specified in the following corollary, the proof of which can also be found in Section 7.3 in the Appendix. In order to prove Theorem 4.1 we need to introduce the same distance relation which was used also by Wong and Shen [1995] . Define for any two densities f 1 , f 2 with respect to a measure µ on a space X and α = 0 the distance relation Note that A second property of ρ α we mention here, is non-negativity for α ≥ −1: Let g α : (0, ∞) → R be given by g α (x) := 1 α (x −α − 1). Then g α (x) = (α + 1)x −α−2 ≥ 0 because α ≥ −1. Hence, g α is a convex function. We hence obtain by Jensen's Inequality (for the measure with density f 1 with respect to µ) Lemma 3.1 shows that the sieve spaces G m lie dense in a large class of densities with respect to the Hellinger distance. However, for our later consistency result, we require that the sieve spaces provide also good approximations with respect to ρ α for α > 0. The following lemma provides the main tool. Lemma 4.6. Let ϕ I,1 , ϕ I,2 be two densities for incubation periods and let ϕ G,1 , ϕ G,2 be two densities of generation times and consider f 1 (x 1 , x 2 , ω|c) and f 2 (x 1 , x 2 , ω|c) defined as in Lemma 3.2 using (ϕ I,1 , ϕ G,1 ) and (ϕ I,2 , ϕ G,2 ), respectively. Define C α (f 1 , f 2 ) as in (4.2). Then, for any α ∈ (0, 1/2), The proof of the above Lemma is presented in Section 7.4 in the Appendix. Before we can prove the consistency result, we need as a last preparation a bound on the bracketing entropy of the spaces F m 1 ,m 2 . Definition 4.7. Let ε > 0, F 0 ⊆ F classes of functions and ρ : F × F → [0, ∞) a metric on F be given. The bracketing number N [] (ε, F 0 , ρ) is defined as the smallest number of pairs (l i , u i ) ∈ F 2 of functions such that ρ(l i , u i ) ≤ ε for all pairs and such that for Lemma 4.8. For m 1 , m 2 ≥ 1 and any ε > 0 with ε ≤ (15m 1 m 2 2 /4) 1/2 we have This Lemma is also proven in Section 7.4 in the Appendix. We have now all ingredients together to prove the main result of this paper. Proof of Theorem 4.1. This theorem is a consequence of Theorem 4 from Wong and Shen [1995] . For the convenience of the reader, we have stated the result in Section 7.1 in the Appendix as Theorem 7.1. Since we are interested in an asymptotic result we may assume below that n ≥ N 0 . We apply Theorem 7.1 with Y i = (S 1,i , S 2,i , W i , C i ) and F n = F m 1,n ,m 2,n . We show next that the entropy condition is fulfilled: We note firstly that (15m 1,n m 2 2,n ) 1 2 → ∞ and ε n → 0. Hence, the condition ε n ≤ (15m 1,n m 2 2,n ) 1/2 is eventually fulfilled and we may apply Lemma 4.8. Thus, we obtain for a suitable c 2 < ∞ because − 1 2 + γ + 1 2 β ≤ 0. Hence, the entropy condition of Theorem 7.1 is fulfilled. Moreover, by the assumptions and Lemma 4.6 we find that δ n (α) ≤ ρ α (f ϕ I ,ϕ G p C , f ϕ I,n ,ϕ G,n p C ) → 0. Hence, we eventually have δ n (α) < 1/α. Under these conditions, we have that ε * n (α) → 0 and nε * n (α) 2 ≥ nε 2 n = n 1−2γ → ∞ since γ < 1/2. Hence, we conclude from Theorem 7.1 that ρ H (f ϕ I ,ϕ G p C , fφ I,n ,φ G,n p C ) = O P (ε * n (α)) and the proof of the theorem is complete since ε * n (α) = O(max(ε n , max(ρ H (ϕ I , ϕ I,n ), ρ H (ϕ G , ϕ G,n )) α ) by Lemma 4.6. The analytic evaluation of the likelihood presented in Lemma 3.3 is very tedious. Therefore, the following results were obtained by numerically approximating the integrals. As illustrated by the simulations below this approximation does not cause problems in the estimation. However, we believe that a speed-up of the method is possible if the integrals are analytically computed. In order to efficiently enforce the constraint that θ 1 2 = θ 2 2 = 1, we optimize the angles of the polar-coordinates of θ 1 and θ 2 and fix their radii to 1. The angles can then be optimized under the box constraint [0, π] (note that θ and −θ yield the same model). Finally, we note that the likelihood can have singularities because the Laguerre densities (3.1) can be zero. We overcome the resulting problems in the optimisation, similar to Zhang and Davidian [2008] , by repeating the optimisation of the likelihood with several randomly chosen starting values. Below we always considered 10 different, random starting values. We begin with an illustration of our methodology with synthetic data in Section 5.1 followed by a brief analysis of a real-world dataset in Section 5.2. The R-code which is used for both parts is available on github (https://github.com/akreiss/SemiParametric-Laguerre.git). It appears to be very difficult to obtain data on transmission pairs, e.g. an early dataset of Ferretti et al. [2020b] contains only 40 transmission pairs (the data is available on the website of the journal on https://doi.org/10.1126/science.abb6936 in the supplementary material section). To this end, we regard it useful to illustrate in the simulation experiments that our methodology works also for small datasets and we choose n = 40 observations as well. In this simulation study we use as a data generating process the model as in (2.4) with the following choices: ϕ W exponential with rate λ = 0.3820225 p C P(C = 0) = 0.65 and P(C = 1) = 0.35 r r = log(2)/5 ϕ I ϕ I,0 =log-normal distribution: meanlog= 1.644, standard dev.= 0.363 ϕ G ϕ G,0 =Weibull distribution: shape= 2.826, scale= 5.665 All quantities above are chosen to imitate the data used in Ferretti et al. [2020b] : The average window length E(W i ) equals roughly the average length of the observed exposure windows in the dataset of Ferretti et al. [2020b] . Likewise, in their dataset the authors distinguish two locations, those with an exponential growth of cases with growth rate log(2)/5 (in roughly 35% of the cases) and those with no exponential growth (in roughly 65% of the cases). Moreover, the density of the incubation period ϕ I = ϕ I,0 is the distribution of incubation times fitted by Ferretti et al. [2020b] , Lauer et al. [2020] . Lastly, the density of the generation time is taken from Ferretti et al. [2020b] . We show firstly in Figure 2 that these densities can be well approximated through Laguerre type densities as defined in (3.1). The graphs show those Laguerre densities which minimize the Hellinger distance to the true densities. Not-surprisingly, larger choices of the degrees m 1 and m 2 lead to better approximations. However, we also see that, in both cases, already degree two yields reasonable approximations. Note in particular that the flat beginning of the density of the incubation period can be well captured by the approximating densities. In order to determine the degree of the approximating Laguerre polynomials, we use the information criterion BIC. Since the computations are quite intensive we do the model selection only for one dataset and choose the selected model for all subsequent repetitions. The resulting values of the BIC are shown in Table 1 and it can be seen that the minimal value is obtained for m 1 = 2 and m 2 = 2. For the remainder of this Section, we use these degrees. We simulate now N = 1, 000 datasets according to the above specified data generating process and fit our model to it. Figure 3 shows the true densities as solid blue lines and corresponding closest Laguerre densities (3.1) when choosing degrees m 1 = m 2 = 2, respectively. The shaded areas show point-wise confidence bands (constructed based on simulations), e.g. 90% of the estimators lie point-wise in the confidence band of level 90%. Different confidence levels (99, 95, 90, 85, 80, 75, 70, 65, 60) are indicated by different intensities of gray (from light gray to black). It can be seen that the estimation works visibly quite well for the incubation period. The height of the mode is underestimated but this comes from the fact that fitting an order two Laguerre density cannot do better. The estimation for the generation time works a bit less good but still the general trend is captured well by most estimators if we keep in mind that we have here only 40 observations of a heavily convolved variable. In order to assess the fit of the non-parametric estimator a bit more formally, we compare the estimators to the true densities in terms of the squared Hellinger-distance ρ 2 H . The resulting histogram is shown in Figure 4 . Note that squared Hellinger distances are bounded from above by 2. It can be seen that the estimation of the Incubation Period works better than the estimation of the Generation Time. This is not surprising because we observe transmission pairs each of which contains two independent realisations of incubation periods but only one generation time. Overall the fit of the generation time appears to be fairly good (90% of the distances are smaller than 0.103 and 50% of the distances are smaller than 0.034). In order to illustrate that the corresponding plug-in feature estimates enjoy asymptotic normality properties, we consider estimation of the basic reproduction number of a fictional pandemic and estimation of quantiles. In order to estimate the basic reproduction number, we plug the true and the estimated generation time densities into equation (1.1) and take the inverse of it (in this fictional pandemic we choose r = log(2)/5 which means that the case numbers double every five days). The histogram of the estimates is shown in Figure 5 . We emphasize that these are simulated results and we cannot draw any conclusions about COVID-19. We can see that the estimation appears to be almost unbiased even in finite samples. This is remarkable because the estimator is based on a non- Squared Hellinger Distance parametric estimator which is biased (we do not control for over-or under-smoothing). Moreover, the approximation through a normal distribution seems to be reasonably accurate. This motivates further research in establishing a formal asymptotic normality result for this type of semi-parametric inference. Figure 6 shows the histograms of the estimated 30%-quantiles. We can see that the estimates appear to have a bias in finite samples (the dashed lines indicate the true quantiles). This bias seems to stem from the fact that the approximation through low dimensional Laguerre polynomials is not perfect because the estimates centralize around the quantiles of the best Laguerre approximations (dotted lines). This effect remains true for other quantiles which are reported in Appendix 7.2.1. When inspecting the BIC values in Table 1 we see that the next smallest scores are obtained for the models (m 1 = 2, m 2 = 3) and (m 1 = 3, m 2 = 2). In order to check the robustness of the model selection, we also inspect the next smallest BIC model, i.e. (m 1 = 3, m 2 = 2). The results are similar and are shown in Appendix 7.2.1. Finally we apply our methodology to a dataset containing 191 transmission pairs which has been studied by Hart et al. [2021] , Ferretti et al. [2020a] and can be downloaded from https://elifesciences.org/articles/65534/ figures#content (see the source data corresponding to Figure 2 ). The data-set is a compilation of five data-sets: Ferretti et al. [2020b] , He et al. [202] , Xia et al. [2020] , Cheng et al. [2020] , . In all data-sets the authors had access to or collected data on transmission pairs. In all cases the authors ensure that the transmission pairs are indeed true transmission pairs e.g. by examining contact and travel histories or quarantines of the involved people. The data was collected with different targets concerning the transmissibility of SARS-COV-2. Thus we can reasonably illustrate our methodology on this dataset: Semi-parametric estimation of features of the generation time and the incubation time. The dataset contains symptom onset dates for all transmission pairs, but the exposure window is not always reported. In that case we impute the dataset in the same way as Ferretti et al. [2020b] : The beginning of the exposure window is at the earliest 60 days before symptom onset of the infector. The end of the infection window is at the latest the symptom onset time of any of the two people in the pair or the end of the exposure window of the second person which is reported in some cases. Since the data is discrete, i.e. we know only the days of symptom onset rather than the exact time, we suppose that the exact time is uniformly distributed throughout the day. Therefore we add a uniform random time between 0 and 24 hours to the symptom onset times and exposure window end points. We stress that our interest lies in the theoretical analysis of the methodology and we provide here an illustration for how our methodology can be used. A complete data analysis would for example also require a robustness analysis against potential issues like the question whether the transmission pairs are random samples from the pandemic. In the following our aim is to use our semi-parametric approach to construct a test whether the parametric fit suggested by Ferretti et al. [2020b] is appropriate for the data. Let ϕ I,0 and ϕ G,0 denote the densities as defined in Section 5.1. More formally, we would like to test the hypotheses : ϕ G = ϕ G,0 . As test statistics we consider ρ H (φ I,n , ϕ I,0 ) 2 and ρ H (φ G,n , ϕ G,0 ) 2 , where ϕ I,0 and ϕ G,0 denote the best approximations with respect to the Hellinger distance of ϕ I,0 and ϕ G,0 through Laguerre polynomials, respectively. In order to choose the degrees of the approximation we use in the same way as before the BIC. The resulting values are shown in Table 2 and it can be seen that m 1 = 4 and m 2 = 3 yields the smallest BIC. Figure 2 shows that m 1 = 4 yields already a good approximation to ϕ I,0 , similarly we see that m 2 = 3 allows a good approximation of ϕ G,0 . However, in both cases, the representation is not perfect. Therefore, it is very important that we compare the estimate with the closest Laguerre-type density ϕ G,0 rather than with ϕ G,0 directly. In order to assess the distribution of the test statistic under H 0 , we suppose that the densities of the incubation period and generation time are given by ϕ I,0 and ϕ G,0 , respectively, and simulate data accordingly including the rounding to complete days and adding uniform noise (note that when testing for H we suppose that the respective other density is correctly specified). We do this N = 1, 000 times and show in Figure 7 the histograms of the squared Hellinger distances between the estimated densities and those Laguerre densities of type (3.1) (with m 1 = 4 and m 2 = 3) which lie closest to the true densities. Note, that in Figure 4 we compared the estimates with the true densities, so both figures cannot be compared. The observed squared Hellinger distances from the dataset are 0.05408553 for the incubation period and Figure 7 . In our simulations none of the simulated Hellinger distances for the incubation period are larger than the observed distance. Consequently, in none of the simulated cases both distances are simultaneously larger than the observed distances. For the generation time 44.8% of the simulated distances are larger than the observed distance. We conclude that the data-set we considered here suggests that the suggested parametric fit for the incubation time might not be correct. For the generation time, the data-set contains no evidence which suggests that the parametric fit might be wrong. In Table 2 we see that the BIC values for all choices of m 1 = 3, 4, 5 and m 2 = 2, 3, 4 are all very similar. In order to check the robustness of the method to the model selection, we provide in Section 7.2.2 in the Appendix a similar analysis for the second smallest BIC: m 1 = 5 and m 2 = 3. We stress that the above analysis should be understood as a recipe for a data analysis rather than an in-depth analysis of the provided dataset. Other types of semi-parametric analyses like the ones outlined in Section 5.1 can be implemented in a similar way. We have introduced a semi-parametric estimator for the generation time and incubation period from observational data. We have shown that both distributions can be identified from the observations and we presented a simple, consistent semi-parametric estimator which is based on Laguerre polynomials. These results are the first steps for more results in the general realm of semi-parametric inference for epidemics: As specific examples we mention the reproduction number R 0 and tests for parametric assumptions. But also the probability of pre-symptomatic infection P(G ≤ I 1 ) can be of interest. All these quantities are continuous functions of the densities ϕ I and ϕ G and therefore it can be expected that asymptotic normality results for estimators based on our estimatorsφ I,n andφ G,n can be proven. However, it should be mentioned that such results for R 0 are possibly more challenging because they require another estimatorr n . This work can be extended in several directions. It might for example not be clear when symptoms exactly start. Therefore, it might be possible that just a window for symptom onset can be supplied (Lauer et al. [2020] ). Moreover, it can be of interest to include asymptomatic patients by including a certain cure-probability, i.e. the probability with which patients will never show symptoms. As an alternative one could also consider follow up studies in which the symptom onset time of patients is considered as a censored variable (in which case asymptomatic patients can be interpreted as patients who show symptoms at ∞). Finally, the dataset by Ferretti et al. [2020b] includes also a window for infection of the second person. It would of course be interesting to include this information in the model. But we would like to point out that this is not entirely trivial because it is unclear how to model the time of infection within this window. As we motivated in the discussion after Remark 2.1 a simple uniformity assumption is possibly difficult to justify. Therefore, we would suggest to include this distribution in the estimation in a suitable way. In theoretic terms, the most interesting question would certainly be to establish asymptotic normality results which allow the researcher to make quantitative statements. Such statements can for example be achieved by following the semi-parametric framework as e.g. in Shen [1997] . Another interesting question would be how to adequately incorporate covariates in the model. It could be the case that e.g. younger and older people have different incubation periods or generation times. Finally, in a different branch of literature, one tries to avoid the assumption of independence between symptom onset and infection and rather models the infection time relative to the symptom onset (cf. Hart et al. [2021] , Ferretti et al. [2020a] , Chun et al. [2020] ). Our methodology can be applied to this setting as well with some adjustments of the theory. In this section we state two results from the literature which are relevant for this paper. The following Theorem is just a re-formulation for a special case of Theorem 4 in Wong and Shen [1995] which is stated here for the convenience of the reader: In their paper the authors study approximate sieve estimation, i.e., they allow that the estimator only approximately maximizes the likelihood. In their notation this means that we assume in the present paper that η n = 0, this is already included in the following version of Theorem 4 of Wong and Shen [1995] . Theorem 7.1. Let Y 1 , ..., Y n be iid observations which have a density p 0 . Let moreover F n be an arbitrary sequence of sieve spaces for density estimation and letp n denote the sieve-MLE. Suppose that there are constants c 1 , c 2 > 0 and a sequence ε n > 0 such that Let δ n (α) := inf q∈Fn ρ α (p 0 , q) for α ∈ (0, 1]. Suppose that there is α ∈ (0, 1] such that δ n (α) < 1/α. Then, there is a constant c > 0 (which depends on the model) such that for ε * n (α) := max ε n , 4δ n (α) c we have for another constant C > 0 The next result is a statement about approximating functions by using Laguerre polynomials. The following is a combination of Theorem 1 and Remark 2 in Chapter II.8 of Nikiforov and Uvarov [1988] . We formulate this theorem here in our setting. The original statement is more general. In this Section we present additional simulation results complementing the results from Section 5.1. Figures 8-10 show histograms for estimation of the 50%-, 70%-and 90%quantiles. The results are very similar to the results for the 30%-quantile which are discussed in Section 5.1. Next, we use the same set-up as in Section 5.1 in the main text, however, here we choose as a model m 1 = 3 and m 2 = 2, i.e., the model has now more flexibility for the incubation time. The estimation results are visualized in Figure 11 and the difference in the squared Hellinger distance is shown in Figure 12 . In Figure 11 it appears that for the incubation time the mode moves closer to its true location, it is sometimes even overestimated (compared to the case m 2 = 2 which was shown in Section 5.1). Moreover, the estimated incubation time densities seem to fluctuate more, i.e., they show a higher variance due to the higher flexibility. The estimates for the generation time appear to be almost identical. In terms of the Hellinger distances, cf. histograms in Figure 12 , the results appear to be very similar to the results obtained in Section 5.1. In general we see that estimation with n = 40 observations yields reasonable results for both model complexities. However, we expect that the estimation can be improved if more observations are available enabling the method to choose better approximations. In this section we show an analysis similar to that from Section 5.2 in the main text, but here we choose m 1 = 5 and m 2 = 3. We begin with a simulation: We generate N = 1, 000 datasets assuming that the models specified in H are correct. Figure 13 shows the squared Hellinger distances of the estimates to the closest Laguerre type densities with m 1 = 5 and m 2 = 3. The vertical lines show the distances which are obtained from the real data set. In case of the incubation period only 1% of the simulated distances are larger, for the generation time this percentage is higher, 44.9%. Finally, both distances are simultaneously larger for both densities in 0.3% of the simulated cases. Thus, we conclude that also in this larger model class there is no evidence in the dataset which would contradict the parametric model for the generation time fitted by Ferretti et al. [2020b] . However the parametric model for the incubation time used by Ferretti et al. [2020b] and Lauer et al. [2020] might be questionable depending on the desired level of the test. Proof of Corollary 4.4. For the proof we make the following definitions: Let Φ I , Φ G and Φ S 1 ,S 2 ,W,C be the characteristic functions of the corresponding random variables, i.e., for real numbers x, x 1 , ..., x 4 ∈ R (note that in the definition of ϕ I it doesn't matter whether we take I 1 or I 2 ) As a first step, we compute Φ S 1 ,S 2 ,W,C . We have by definition of (S 1 , S 2 , W, C) and independence Φ S 1 ,S 2 ,W,C (x 1 , x 2 , x 3 , x 4 ) = E e i(x 1 +x 2 )T 1 e ix 1 I 1 e ix 2 I 2 e ix 2 G e ix 3 W e ix 4 C =E e ix 3 W +ix 4 C E e i(x 1 +x 2 )T 1 W, where Ψ(x, z 1 , z 2 ) := E e iz 1 W +iz 2 C E e ixT 1 W, C . Let now z 1 (x), z 2 (x) be as in Assumption (C). We obtain the following two equations Φ S 1 ,S 2 ,W,C (x, 0, z 1 (x), z 2 (x)) = Ψ(x, z 1 (x), z 2 (x))Φ I (x), Φ S 1 ,S 2 ,W,C (0, x, z 1 (x), z 2 (x)) = Ψ(x, z 1 (x), z 2 (x))Φ I (x)Φ G (x). Note that Assumption (C) implies that Φ S 1 ,S 2 ,W,C (x, 0, z 1 (x), z 2 (x)) = 0 almost everywhere. All of this, in turn, implies the relations (almost everywhere by Assumption (C)) LetĈ n ,Ŵ n ,T 1,n ,Î 1,n ,Î 2,n andĜ n be random variables which have the relation as specified in (2.4) but where ϕ I is replaced byφ I,n and ϕ G is replaced byφ G,n . Let moreover, S 1,n andŜ 2,n be defined analogously to their corresponding quantities S 1 and S 2 with corresponding characteristic functions ΦŜ 1,n ,Ŝ 2,n ,Ŵn,Ĉn , ΦÎ n and ΦĜ n . Since convergence in probability implies the existence of an almost surely convergent subsequence (cf. Lemma 1.9.2 in van der Vaart and Wellner [1996]), we find by Theorem 4.1 that there is an event Ω of probability 1 and a subsequence which we indicate again by fφ I,n ,φ G,n such that ρ H f ϕ I ,ϕ G p C , fφ I,n ,φ G,n p C → 0 for all results in Ω. Theorem 1 in Chapter 3.9 in Shiryaev [2016] implies that then K C c=1 r(s 1 , s 2 , w, c) f ϕ I ,ϕ G (s 1 , s 2 , w|c) − fφ I,n ,φ G,n (s 1 , s 2 , w|c) p C (c)d(s 1 , s 2 , w) → 0, uniformly over all functions r : R 3 × {1, ..., K C } → C with r ∞ ≤ 1. Since the functions r(s 1 , s 2 , w, c) = exp(i(x 1 s 1 + x 2 s 2 + x 3 w + x 4 c)) are bounded for all x 1 , x 2 , x 3 , x 4 ∈ R, we conclude that ΦŜ 1,n ,Ŝ 2,n ,Ŵn,Ĉn converges uniformly to Φ S 1 ,S 2 ,W,C . Hence, for almost any fixed x we eventually have ΦŜ 1,n ,Ŝ 2,n ,Ŵn,Ĉn (x, 0, z 1 (x), z 2 (x)) = 0 and thus eventually analogue versions of the relations (7.1) and (7.2) hold for ΦÎ n and ΦĜ n . As a consequence, we obtain the following point-wise almost everywhere convergences Now by Levy's Theorem (cf. Theorem 1.7. 6 Billingsley [1968] , an inspection of the proof reveals that in the univariate case convergence almost everywhere is a sufficient condition), we conclude thatF I,n → F I andF G,n → F G point-wise for all realisations in Ω. Since the distribution functions are continuous, these convergences are also uniform. Recall that the above argument holds for a subsequence and all realisations in Ω. But we can repeat the same argument starting from a subsequence and showing in that way that every subsequence contains a sub-subsequence for which the corresponding distribution functions converge almost surely. This implies convergence in probability by Lemma 1.9.2 in van der Vaart and Wellner [1996] and the proof is complete. Proof of Corollary 4.5. We make a similar subsequence of a sub-sequence argument as in Corollary 4.4: Consider sub-sequences ofr n andF G,n (which we again do not indicate in the notation) such thatr n → r almost surely and F G,n − F G ∞ → 0 almost surely (the former is possible by assumption and the latter by Corollary 4.4). The following considerations are made for each realisation in a set of probability one on which these two convergences hold. Choose n so large such thatr n ≥ r/2. Choose moreover, c > 0 such that t exp(−tr/2) ≤ c for all t ≥ 0. We have By a Taylor expansion we obtain for the first part of the right hand side above and for some r * ∈ [r,r n ] (note that by assumption r * ≥ r/2 and hence t · exp(−r * t) ≤ c) ∞ 0 e −rnt − e −rt φ G,n (t)dt ≤ |r n − r| ∞ 0 te −r * tφ G,n (t)dt ≤ c |r n − r| → 0. For the second term on the right hand side of (7.3) we obtain via integration by parts which converges to zero sinceF G,n → F uniformly. Now, a subsequence of a subsequence argument completes the proof. Proof of Lemma 4.6. We firstly apply Hoelder's Inequality with p = 1/(1 − α) and q = 1/α (use that α ∈ (0, 1)) to get |f 1 (x 1 , x 2 , ω|c) α − f 2 (x 1 , x 2 , ω|c) α | 1 α p C (c)d(x 1 , x 2 , ω) α . By using the definition of C α (f 1 , f 2 ), we see that (4.4) follows from the above if we can prove that |f 1 (x 1 , x 2 , ω|c) α − f 2 (x 1 , x 2 , ω|c) α | 1 α p C (c)d(x 1 , x 2 , ω) ≤2ρ H (ϕ G,1 , ϕ G,2 ) 2 + 8ρ H (ϕ I,1 , ϕ I,2 ) 2 . (7.4) We begin by applying the reverse triangle inequality for the L 1/α norm repeatedly. Define to this end B 1 (t; x 1 , x 2 , c, ω, y) :=ϕ G,1 (y)h(ω − t|c)ϕ I,1 (x 1 − t)ϕ I,1 (x 2 − t − y), B 2 (t; x 1 , x 2 , c, ω, y) :=ϕ G,2 (y)h(ω − t|c)ϕ I,2 (x 1 − t)ϕ I,2 (x 2 − t − y), A 1 (y; x 1 , x 2 , c, ω) := min(x 1 ,ω) 0 B 1 (t; x 1 , x 2 , c, y)dt, A 2 (y; x 1 , x 2 , c, ω) := min(x 1 ,ω) 0 B 2 (t; x 1 , x 2 , c, y)dt. We have now by the reverse triangle inequality for L 1/α that for any x 1 , x 2 , ω ≥ 0 (ϕ G,1 (y)h(ω − t|c)ϕ I,1 (x 1 − t)ϕ I,1 (x 2 − t − y)) α − (ϕ G,2 (y)h(ω − t|c)ϕ I,2 (x 1 − t)ϕ I,2 (x 2 − t − y)) α 1 α dtdy. By using the above inequality chain, we obtain K C c=1 [0,∞) 3 |f 1 (x 1 , x 2 , ω|c) α − f 2 (x 1 , x 2 , ω|c) α | (ϕ G,1 (y)h(ω − t|c)ϕ I,1 (x 1 − t)ϕ I,1 (x 2 − t − y)) α − (ϕ G,2 (y)h(ω − t|c)ϕ I,2 (x 1 − t)ϕ I,2 (x 2 − t − y)) α 1 α dtdyd(x 1 , x 2 , ω). (7.5) Above we have an iterated integral over a non-negative function, we may thus re-arrange the order of integration and use substitution. We substitute below a = x 1 − t for x 1 and b = x 2 − t − y for x 2 . Note that we implicitly take care of the integration bounds by using the indicator function and the fact that all densities are zero on the negative real line. Hence, we can continue the above inequality chain (7.5) = K C c=1 [0,∞) 5 1(y ≤ x 2 , t ≤ min(x 1 , ω))ϕ W (ω)n(ω, c)p C (c)h(ω − t|c) × (ϕ G,1 (y)ϕ I,1 (x 1 − t)ϕ I,1 (x 2 − t − y)) α − (ϕ G,2 (y)ϕ I,2 (x 1 − t)ϕ I,2 (x 2 − t − y)) α 1 α dx 1 dx 2 dt dy dω 1(y ≤ x 2 , t ≤ min(a + t, ω))ϕ W (ω)n(ω, c)p C (c)h(ω − t|c) × (ϕ G,1 (y)ϕ I,1 (a)ϕ I,1 (x 2 − t − y)) α − (ϕ G,2 (y)ϕ I,2 (a)ϕ I,2 (x 2 − t − y)) α 1 α da dx 2 dt dy dω = K C c=1 [0,∞) 5 1(0 ≤ b + t, t ≤ min(a + t, ω))ϕ W (ω)n(ω, c)p C (c)h(ω − t|c) × (ϕ G,1 (y)ϕ I,1 (a)ϕ I,1 (b)) α − (ϕ G,2 (y)ϕ I,2 (a)ϕ I,2 (b)) α 1 α da db dt dy dω. (7.6) Note next that the indicator equals actually 1(t ∈ [0, ω]). We again interchange the order of integration, to integrate with respect to t first and then with respect to ω. By doing this, =10δ 2 m 1 m 2 2 , where we used the Cauchy-Schwarz-Inequality in between and the integral properties of the Laguerre polynomials at the end. Thus, when putting δ = ε (10m 1 m 2 2 ) −1/2 (the condition on δ is fulfilled by the assumption on ε) we find together with (7.8) N [] (ε, F m 1 ,m 2 , ρ H ) ≤ π 15m 1 m 2 2 2ε + 1 m 1 +m 2 ≤ π 15m 1 m 2 2 ε m 1 +m 2 and the proof is complete. Lemma 7.3. Let n ≥ 2 and δ ∈ [0, 1/2]. Denote by e 1 := (1, 0, ..., 0) the first unit vector of R n . Let x ∈ R n have x 2 = 1 and angles of polar coordinates ψ 1 , ..., ψ n−1 ∈ [0, δ]. Then, e 1 − x 2 ≤ δ 3/2. Proof. By a Taylor expansion, we have for any ψ ∈ [0, 2π] (below ψ * denotes different intermediate values between 0 and ψ) | sin(ψ)| =| cos(ψ * )| · |ψ| ≤ |ψ| | cos(ψ) − 1| =| sin(ψ * )| · |ψ| ≤ |ψ| 2 . By the definition of polar coordinates, we compute (for n = 2 the sum disappears) e 1 − x 2 2 = (1 − cos(ψ 1 )) 2 + n−1 k=1 sin(ψ k ) 2 + n−1 i=2 i−1 k=1 sin(ψ k ) 2 cos(ψ i ) 2 ≤ψ 4 1 + n−1 k=1 ψ 2 k + n−1 i=2 i−1 k=1 ψ 2 k ≤ δ 4 + δ 2(n−1) + n−1 i=2 δ 2(i−1) ≤ δ 2 δ 2 − δ 4 + 1 1 − δ 2 . The statement follows since (δ 2 − δ 4 + 1)/(1 − δ 2 ) ≤ 3/2 for δ ∈ [0, 1/2]. Efficient estimation of models with conditional moment restrictions containing unknown functions On the estimation of the reproduction number based on misreported epidemic data Epidemiology and transmission of Covid-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: A retrospective cohort study. The Lancet Infectious Diseases Convergence of probability measures Estimation in emerging epidemics: biases and remedies Measurement error in nonlinear models: A modern perspective Chapter 76 large sample sieve estimation of semi-nonparametric models. volume 6 of Handbook of Econometrics Sieve extremum estimates for weakly dependent data and for the Taiwan COVID-19 Outbreak Investigation Team. Contact Tracing Assessment of COVID-19 Transmission Dynamics in Taiwan and Risk at Different Exposure Periods Before and After Symptom Onset Mathematical and statistical estimation approaches in epidemiology Transmission onset distribution of covid-19 Consistent deconvolution in density estimation. The Canadian Journal of Statistics / La Revue Canadienne de On the optimal rates of convergence for nonparametric deconvolution problems The timing of covid-19 transmission. medRxiv Quantifying sars-cov-2 transmission suggests epidemic control with digital contact tracing Estimating the generation interval for coronavirus disease (Covid-19) based on symptom onset data Estimation of the incubation time distribution for COVID-19 Comparing nonparametric versus parametric regression fits High infectiousness immediately before covid-19 symptom onset highlights the importance of continued contact tracing. eLife Temporal dynamics in viral shedding and transmissibility of covid-19 Handbook of infectious disease data analysis The incubation period of coronavirus disease 2019 (Covid-19) from publicly reported confirmed cases: Estimation and application Estimating the relative probability of direct transmission between infectious disease patients Transmission dynamics and control of severe acute respiratory syndrome Convergence rates and asymptotic normality for series estimators Instrumental variable estimation of nonparametric models Special functions of mathematical physics Two critical issues in quantitative modeling of communicable diseases: Inference of unobservables and dependent happening A central limit theorem under metric entropy with l 2 bracketing On methods of sieves and penalization Evidence for transmission of Covid-19 prior to symptom onset. eLife Weak convergence and empirical process theory How generation intervals shape the relationship between growth rates and reproductive numbers Probability inequalities for likelihood ratios and convergence rates of sieve mles Transmission of corona virus disease 2019 during the incubation period may lead to a quarantine loophole Evolving epidemiology and transmission dynamics of coronavirus disease 2019 outside hubei province, china: a descriptive and modelling study smooth" semiparametric regression analysis for arbitrarily censored time-to-event data keeping in mind that |x 2α − y 2α | ≤ |x − y| 2α for all x, y ≥ 0 (since 2α ∈ (0, 1)) and that (x − y) 2 ≤ 2x 2 + We are grateful to Niel Hens for useful discussions. Moreover we thank an unknown associate editor and two anonymous referees for reading and commenting our paper. Their remarks have lead to a great improvement of the paper.