key: cord-1027761-ax4q0z3r authors: Lin, HuaZhen; Yip, Paul S. F.; Huggins, Richard M. title: A nonparametric estimation of the infection curve date: 2011-08-04 journal: Sci China Math DOI: 10.1007/s11425-011-4224-7 sha: 609eb041407dc9f78347e86e9b226481f2553b03 doc_id: 1027761 cord_uid: ax4q0z3r

Predicting the future course of an epidemic depends on being able to estimate the current number of infected individuals. However, while back-projection techniques allow reliable estimation of the numbers of infected individuals in the more distant past, they are less reliable in the recent past. We propose two new nonparametric methods to estimate the unobserved numbers of infected individuals in the recent past of an epidemic. The proposed methods are noniterative, easily computed and asymptotically normal with simple variance formulas. Simulations show that the proposed methods are much more robust and accurate than the existing back-projection method, especially for the recent past, which is our primary interest. We apply the proposed methods to the 2003 Severe Acute Respiratory Syndrome (SARS) epidemic in Hong Kong.

An important public health issue that arises over the course of an epidemic is determining how many individuals are infected at a given time. This quantity is of concern to policy-makers and managers of health care systems as well as to epidemiologists. For example, the 2003 SARS epidemic in Hong Kong, which killed 298 people and infected about 1800, was one of the most serious global health threats since the HIV/AIDS epidemic. One of the factors that contributed to the epidemic was the considerable uncertainty about its current state while it was under way. A key feature of infectious disease data is that infected individuals are observed only when they are diagnosed, so the exact infection times are unknown and full information on the current state of the epidemic is not available. Few methods are available for analysing such data.
One approach fits mathematically convenient curves, for example exponential or polynomial curves. Predictions based on these models may not be reliable because such parametric curves are not data-based (see [13, 25]). Transmission models (see [2, 19, 20, 6]), on the other hand, try to capture the mechanism underlying the spread of the disease. Unfortunately, the data are usually inadequate to estimate key model parameters, and these transmission models have not been extensively studied or used. Back projection (see [8, 9]) has become one of the most popular methods for reconstructing the past pattern of infections, and it is also widely used to predict future numbers of cases (see [5, 7, 27]). It makes elementary assumptions about the way the data are generated, and the only additional information it requires is the distribution of the time from infection to clinical diagnosis (see [7]). However, the back-projection method is not without problems. One is that back projection is an ill-posed inverse problem (see [26]). To avoid this, some structure must be imposed on the infection curve. Some implementations of back projection used a smooth parametric model (see [14]) or a parametric step function (see [8, 9, 27]) for the incidence curve. However, as the epidemic is only partially observed, it is not easy to specify the incidence curve correctly. Other investigators have allowed a nonparametric form for the incidence curve. Brookmeyer [10], Bacchetti et al. [5] and Liao and Brookmeyer [22] used a smoothing spline method based on a penalized likelihood. Becker, Watson and Carlin [7] applied the smoothed EM algorithm developed by Silverman et al. [29]. The smoother used in [7] is easy to compute but does not perform well at the boundaries.
For example, Figures 1(b) and (c) show the average of the estimated incidence curves over 500 simulated replications of an epidemic: the smoothed EM estimator of Becker et al. [7], denoted EMS, is biased and is noticeably unsatisfactory for times close to the end of the observation period. Another problem common to the parametric and nonparametric back-projection methods is that their theoretical properties are largely unknown. To overcome the disadvantages of the existing approaches, we propose two alternative approaches in this paper. First, exploiting the independence of the incubation processes of different infected individuals, we propose a one-step estimate of the number of infections on each day $j$ based on the numbers of observed cases. The one-step estimator has a closed-form expression and requires essentially no programming or computational effort. Simulations in Section 4 indicate that it performs much better than both the unsmoothed and the smoothed back-projection methods in terms of mean squared error. Because very little information is available about the most recent infections, the one-step estimator can have a large variance for the recent past, which is precisely the period of interest. We therefore improve the one-step estimates for the recent past by borrowing information from neighbouring time points, that is, by smoothing the one-step estimates over time with an existing smoothing technique. Since the one-step estimator has a closed form, any existing smoothing technique can be applied without extra programming effort. Compared with the existing methods, the one-step and smoothed one-step methods are noniterative and easy to compute, and both are shown to be asymptotically normal with simple variance formulas. The paper is organized as follows.
In Section 2, we present the one-step estimator and the smoothed one-step estimator of the incidence. The asymptotic normality of the proposed estimators is established in Section 3. Section 4 reports simulation studies comparing the one-step estimator, the smoothed one-step estimator and the back-projection methods. In Section 5, the approach is applied to the Hong Kong SARS data. The simulations and the application indicate that the proposed methods are robust and efficient. Section 6 contains some discussion.

Typical epidemic data are interval censored, in that cases are reported in batches on a daily or weekly basis. To reflect this, we divide the time axis $[0, \tau]$ over which data have been collected into intervals of equal length, which may be thought of as "months", "weeks" or "days". These are indexed by $j = 1, \dots, n$, where $n$ is the most recent interval, beyond which no detected cases are available. We call $\tau$ the current time. Let the observed datum $d_j$ be the number of cases detected on day $j$, $j = 1, \dots, n$, let $z_j$ be the unobserved number of individuals infected on day $j$, and let $z_{ju}$ be the number of individuals infected on day $j$ with incubation period $u$, $u = 0, \dots, k$, where $k$ denotes the longest incubation time. Then $z_j = \sum_{u=0}^{k} z_{ju}$. Let $p_u$ be the probability that an infected individual is detected $u$ days after infection, $u = 0, \dots, k$. We suppose that $p_0, \dots, p_k$ are known and that the incubation processes of different infected individuals are independent. This assumption is also made by existing methods, including back projection. For infectious diseases with a short incubation time, such as SARS, reliable information on incubation times is available and accurate estimates of the $p_u$ are readily obtained [15, 1]. For more detail on estimating the incubation distribution, see [3-5].
Our objective is to estimate the expected number of new infections $Ez_j$ on each day $j$ using the observed numbers of cases $d_j$, $j = 1, \dots, n$. A natural estimator of $Ez_j$ is the conditional mean of $z_j = \sum_{u=0}^{k} z_{ju}$ given the observed data, so we consider the conditional distribution of $z_j$ given $\{d_j, j = 1, \dots, n\}$. The usual route to this conditional distribution is through the probability distribution of the data (that is, the likelihood) and that of $z_j$; in our setting, however, both are difficult to obtain because epidemic data are only partially observed. The proposed estimator rests on the following observations. (1) Among all the observed data, only $d_{j+u}$ is related to $z_{ju}$, so the conditional distribution of $z_{ju}$ given $\{d_j, j = 1, \dots, n\}$ equals the conditional distribution of $z_{ju}$ given $d_{j+u}$. (2) The incubation processes of different infected individuals are independent. These observations imply that, given $d_r$, the conditional distributions of $\{z_{r-u,u}, u = 0, \dots, k\}$ are independent multinomial distributions with $d_r$ trials and probabilities $\{p_u, u = 0, \dots, k\}$. Hence

$$E\{z_{r-u,u} \mid d_r\} = p_u d_r, \qquad u = 0, \dots, k. \qquad (2.1)$$

Let $D = \{d_j, j = 1, \dots, n\}$. Observation (1) implies that $E\{z_{ju} \mid D\} = E\{z_{ju} \mid d_{j+u}\}$. Combining this with (2.1), for $j \le n - k$ we obtain

$$E\{z_j \mid D\} = \sum_{u=0}^{k} E\{z_{ju} \mid d_{j+u}\} = \sum_{u=0}^{k} p_u d_{j+u}. \qquad (2.2)$$

However, (2.2) cannot be used when $j > n - k$, because $d_{j+u}$ is not observed for $j + u > n$. Since only $z_{j0}, \dots, z_{j,n-j}$ can have been detected by day $n$, and $E\{\sum_{u=0}^{n-j} z_{ju}\} = Ez_j \sum_{u=0}^{n-j} p_u$, a simple estimator of $Ez_j$, which we call the one-step (OS) estimator, is

$$\hat z_j = \frac{\sum_{u=0}^{n-j} p_u d_{j+u}}{\sum_{u=0}^{n-j} p_u}, \qquad (2.3)$$

where $p_u = 0$ when $u > k$. For $j \le n - k$ the denominator equals 1 and (2.3) reduces to (2.2). It is easy to show that $E\{\hat z_j\} = Ez_j$, so $\hat z_j$ is an unbiased estimator of $Ez_j$.

It is instructive to compare the one-step estimator with the back-projection estimator. To motivate the back-projection estimator, note that

$$E\{d_j \mid z_1, \dots, z_n\} = \sum_{i=1}^{j} z_i p_{j-i}, \qquad (2.4)$$

so that the back-projection estimator is based on the conditional distribution of $d_j$ given $z_1, \dots, z_n$.
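The one-step estimator (2.3) is a weighted average of the counts observed on and after day $j$, so it takes only a few lines to compute. The Python/NumPy sketch below is an illustration under our own conventions, not the authors' code: the function name `one_step_estimator` is hypothetical, array index 0 holds day 1, and days for which every observable delay has $p_u = 0$ (for example the last day when $p_0 = 0$) are returned as NaN because (2.3) is undefined there.

```python
import numpy as np


def one_step_estimator(d, p):
    """One-step (OS) estimates (2.3) of E z_j for days j = 1, ..., n.

    d : observed case counts d_1, ..., d_n (d[0] holds day 1).
    p : incubation probabilities p_0, ..., p_k (p_u = 0 for u > k).
    Days whose observable delays all have p_u = 0 are returned as NaN.
    """
    d = np.asarray(d, dtype=float)
    p = np.asarray(p, dtype=float)
    n, k = len(d), len(p) - 1
    z_hat = np.full(n, np.nan)
    for j in range(n):                      # 0-based index j is day j + 1
        m = min(n - 1 - j, k)               # largest delay u with d_{j+u} observed
        w = p[: m + 1]                      # p_0, ..., p_m
        denom = w.sum()                     # sum_{u=0}^{n-j} p_u
        if denom > 0:
            z_hat[j] = w @ d[j : j + m + 1] / denom
    return z_hat


# Toy example with k = 2: the last two days use only the delays observed so far.
print(one_step_estimator([10, 20, 30, 40], [0.5, 0.3, 0.2]))
```

In the toy example the estimates are 17, 27, 33.75 and 40: for the first two days every delay is observed and (2.3) reduces to (2.2), while for the last two days the partial sums of the $p_u$ (0.8 and 0.5) rescale the observed part.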
In contrast, the one-step estimator is based on the conditional distribution of $z_j$ given the observed data. Writing $\lambda_j = Ez_j$, the back-projection estimator of $\lambda_j$ is obtained by iterating

$$\lambda_j^{(\nu+1)} = \frac{\lambda_j^{(\nu)}}{\sum_{u=0}^{n-j} p_u} \sum_{t=j}^{n} \frac{d_t\, p_{t-j}}{\sum_{i=1}^{t} \lambda_i^{(\nu)} p_{t-i}}, \qquad (2.5)$$

which combines the E step and the M step of the EM algorithm (see [7]). The EM algorithm generally converges very slowly and can be time-consuming; for example, it took around 10 minutes on average for each replication of simulation 1 in Section 4. Also, because the back-projection estimator is complicated to compute, the nonparametric techniques that can be used to smooth it are restricted; for example, the smoother used in [7] is easy to compute but does not perform well at the boundaries. By contrast, computing the one-step estimator needs very little programming effort or time. Furthermore, the asymptotic properties of the back-projection estimators are difficult to establish and remain largely unknown, whereas, because the one-step estimator has a closed form, it can be proved to be asymptotically normal under the regularity conditions given in Appendix A.1 (see Theorem 1 in Section 3).

Theorem 1 in Section 3 and the results in Tables 1 and 2 and Figure 1 of Section 4 show that the variance of the one-step estimator $\hat z_j$ increases as $j$ approaches $n$. Since the number of infections near the current time is our primary interest, it is worth trying to reduce the variance of the one-step estimator, and hence its mean squared error, for the recent past. A natural way to do this is to smooth the one-step estimates over time with a nonparametric technique. As the smoothing method, we choose local linear regression (see [16]). This method has many good statistical properties; for example, it adapts automatically to the boundary of the design points, which is especially important here because our interest lies at the boundary.
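For comparison, the unsmoothed back-projection update (2.5) can be sketched as follows, together with a small numerical look at the ill-posedness of inverting (2.4) mentioned in the Introduction. This is a minimal illustration rather than the implementation of [7]: the helper names, the flat starting value, the fixed number of iterations and the discretization $p_u = F(u+1) - F(u)$ of the Weibull(1.5, 8) incubation distribution, truncated at $k = 40$ days, are all our assumptions, and no EMS-type smoothing step is included.

```python
import numpy as np


def weibull_delay_probs(shape, scale, k):
    """p_u = F(u+1) - F(u) for u < k, tail 1 - F(k) lumped into p_k (our discretization)."""
    cdf = 1.0 - np.exp(-(np.arange(k + 1) / scale) ** shape)   # F(0), ..., F(k)
    return np.append(np.diff(cdf), 1.0 - cdf[-1])


def delay_matrix(p, n):
    """P[t, i] = p_{t-i} (zero outside 0 <= t-i <= k), so that E(d) = P @ E(z) as in (2.4)."""
    P = np.zeros((n, n))
    for u, pu in enumerate(p):
        P += pu * np.eye(n, k=-u)            # p_u on the u-th subdiagonal
    return P


def backproject_em(d, p, n_iter=500):
    """Unsmoothed EM back-projection: repeated application of update (2.5)."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    P = delay_matrix(p, n)
    F = P.sum(axis=0)                        # column j: sum_{u=0}^{n-j} p_u
    lam = np.full(n, d.sum() / n)            # flat starting value (assumption)
    for _ in range(n_iter):
        mu = P @ lam                         # expected counts sum_i lam_i p_{t-i}
        ratio = np.divide(d, mu, out=np.zeros(n), where=mu > 0)
        lam = np.divide(lam * (P.T @ ratio), F, out=lam.copy(), where=F > 0)
    return lam


# Ill-posedness: effect of equal-sized perturbations of E(z) on E(d).
P95 = delay_matrix(weibull_delay_probs(1.5, 8.0, k=40), 95)
rough = (-1.0) ** np.arange(95)              # rapidly oscillating perturbation
flat = np.ones(95)                           # smooth perturbation of the same size
gain_rough = np.linalg.norm(P95 @ rough) / np.linalg.norm(rough)
gain_flat = np.linalg.norm(P95 @ flat) / np.linalg.norm(flat)
print(f"||P dz|| / ||dz||: oscillating {gain_rough:.3f}, flat {gain_flat:.3f}")
```

Each EM step leaves $\sum_t E(d_t)$ equal to $\sum_t d_t$, which gives a cheap correctness check. In the ill-posedness check, an oscillating change in the incidence moves the expected counts by only a small fraction of its size, while a flat change of the same size moves them almost one for one; unregularized inversion therefore amplifies noise along such oscillating directions.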
We also note that it may be possible to improve the EMS estimator by using a better smoother than the one used in the EMS algorithm. However, a better smoother generally means greater computational complexity: for the EMS estimator, the smoother has to be applied at each EM iteration of the back-projection method, which is a heavy computational burden. Since the one-step estimator has a closed form, any existing nonparametric smoothing technique can be applied to it without extra programming or computational effort.

Write $t_j = j\delta_n$, where $\delta_n = \tau/n$, so that $t_j$ is the time at the end of the $j$th interval. Now $z_j$ is the number of new infections arising in the $j$th interval, so $z_1, \dots, z_n$ arise from a discretization of an underlying continuous-time infection process. Let $\lambda(t)$ be the intensity of this continuous-time process over $[0, \tau]$ and $\eta$ the size of the underlying population, so that we can take $Ez_j = \Lambda_j = \Lambda(t_j)$, where $\Lambda(\cdot)$ is a smooth function on $[0, \tau]$ determined by $\lambda(\cdot)$ and $\eta$. Since $\Lambda(t)$ is differentiable, for any fixed $t_0 \in [0, \tau]$ and $t$ close to $t_0$, a Taylor expansion gives

$$\Lambda(t) \approx \Lambda(t_0) + \Lambda'(t_0)(t - t_0) \equiv \beta_1 + \beta_2 (t - t_0), \qquad (2.6)$$

where $\beta_1$ and $\beta_2$ depend on $t_0$. This, together with $E\hat z_j = Ez_j = \Lambda_j$, motivates a local linear model fitted by locally weighted linear regression. We estimate $\beta = (\beta_1, \beta_2)^\top$ by minimizing

$$\sum_{j=1}^{n} \{\hat z_j - \beta_1 - \beta_2 (t_j - t_0)\}^2 K_h(t_j - t_0), \qquad (2.7)$$

where $K_h(\cdot) = K(\cdot/h)/h$, $K(\cdot)$ is a kernel function and $h$ is a bandwidth. The kernel ensures that the local model (2.6) is applied only to data close to $t_0$. Denote the minimizer of (2.7) by $\hat\beta = (\hat\beta_1, \hat\beta_2)^\top$. For fixed $t_0$, (2.7) is a weighted least-squares problem with the closed-form solution

$$\hat\beta = (X^\top W X)^{-1} X^\top W \hat{\mathbf z},$$

where $X$ is the $n \times 2$ matrix with rows $(1, t_j - t_0)$, $W = \mathrm{diag}\{K_h(t_j - t_0)\}$ and $\hat{\mathbf z} = (\hat z_1, \dots, \hat z_n)^\top$. Then $\Lambda(t_0)$ is estimated by $\hat\Lambda(t_0) = \hat\beta_1$, and the $\Lambda_j$ are estimated by $\hat\Lambda_j = \hat\Lambda(t_j)$, $j = 1, \dots, n$. We refer to these as the smoothed one-step (SOS) estimates.

In this section, we investigate the asymptotic properties of the one-step and smoothed one-step estimators. First, we consider the one-step estimator. By [21, p.
98] and Conditions (iv) and (v), it is straightforward to obtain Theorem 1. […] Theorem 1 implies that the convergence rate of $\hat z_j$ depends on $j$ and decreases as $j$ increases. As a result, the variance of the estimator of the number of infections increases as $j$ approaches $n$. This conclusion is confirmed by the simulation studies in Section 4.

Denote […]; we then have the following theorem for the smoothed one-step estimator.

Theorem 2. 1. […], where $b(\cdot, \cdot)$ is given by (A.2) in the Appendix. Hence, if $\tau - t_j = O(h)$, then

$$(nh)^{1/2}\{\hat\Lambda_j - \Lambda_j - h^2 \Lambda''(t_j)\mu_2/2\} \to N(0, b(t_j, t_j)). \qquad (3.2)$$

2. If $\tau - t_0 = O(1)$, then

$$n^{1/2}\{\hat\Lambda(t_0) - \Lambda(t_0) - h^2 \Lambda''(t_0)\mu_2/2\} \to N(0, \tilde b(t_0, t_0)), \qquad (3.3)$$

[…]. Here $\mu_2 = \int s^2 K(s)\,ds$.

Therefore, the asymptotic bias of $\hat\Lambda_j$ is $\mathrm{bias}(\hat\Lambda_j) = h^2 \Lambda''(t_j)\mu_2/2$, and the asymptotic variance of $\hat\Lambda_j$ is […]. If $\tau - t_j = O(1)$, so that $t_j$ is not close to the current time $\tau$, increasing $h$ cannot decrease the variance but does increase the bias, so the optimal bandwidth for estimating $\Lambda_j$ is $h = 0$. By (2.7), $\hat\Lambda_j = \hat z_j$ when $h = 0$. These results suggest that when $t_j$ is far from the current time $\tau$, the smoothing step cannot improve the one-step estimator, which is confirmed by the simulations in Section 4. When $\tau - t_j = O(h)$, so that $t_j$ is close to the current time, the bandwidth $h$ has to be selected. In principle, an optimal local bandwidth is obtained by minimizing the integrated mean squared error

$$\sum_{j=r}^{n} [\mathrm{Bias}^2\{\hat\Lambda_j\} + \mathrm{Var}\{\hat\Lambda_j\}],$$

where $r$ is the time point from which the one-step estimates are smoothed. The bias can be estimated by the empirical-bias approach of Ruppert [28], which has been shown to work well in related studies (see [23, 24]). The proof of the theorem shows that the variance-covariance matrix of $(\hat\beta_1(t), h\hat\beta_2(t))$ can be estimated by […] and […]. The variance of $\hat\Lambda_j$ is estimated by the (1, 1) entry of the matrix $V$ with $t$ replaced by $t_j = \tau j/n$.
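The smoothing step and a plug-in variance can be sketched in Python/NumPy. Because $\hat z = Ad$ with $A_{js} = a_{j,s-j}$ (the weights of the Appendix) and $\hat\Lambda = L\hat z$ for a local linear weight matrix $L$ built from the closed-form solution of (2.7), independence of the counts gives $\mathrm{Var}(\hat\Lambda_j) = \sum_s (LA)_{js}^2 \mathrm{Var}(d_s)$. This direct matrix computation is offered as an illustration, not as the estimator $V$ of the text; the Epanechnikov kernel, unit spacing $\delta_n = 1$, $p_0 > 0$ and the Poisson-type plug-in that estimates $\mathrm{Var}(d_s)$ by $d_s$ are our assumptions, and all function names are hypothetical.

```python
import numpy as np


def epanechnikov(x):
    """Symmetric kernel with bounded support (satisfies Condition iii)."""
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)


def os_matrix(n, p):
    """A[j, s] = a_{j,s-j}, so that z_hat = A @ d (assumes p_0 > 0)."""
    p = np.asarray(p, dtype=float)
    A = np.zeros((n, n))
    for j in range(n):
        m = min(n - 1 - j, len(p) - 1)
        A[j, j : j + m + 1] = p[: m + 1] / p[: m + 1].sum()
    return A


def local_linear_matrix(t, h):
    """Row j holds the weights of beta_1-hat from (2.7) with t_0 = t_j."""
    n = len(t)
    if h == 0:
        return np.eye(n)                       # h = 0 gives Lambda_hat_j = z_hat_j
    L = np.zeros((n, n))
    for j in range(n):
        x = t - t[j]
        w = epanechnikov(x / h) / h            # K_h(t_m - t_j)
        s0, s1, s2 = w.sum(), w @ x, w @ x**2
        det = s0 * s2 - s1**2
        if det > 0:
            L[j] = w * (s2 - s1 * x) / det     # closed-form weighted least squares
        else:
            L[j, j] = 1.0                      # fewer than two points in the window
    return L


def sos_with_se(d, p, h, delta=1.0):
    """SOS estimates and plug-in SEs, estimating Var(d_s) by d_s (Poisson assumption)."""
    d = np.asarray(d, dtype=float)
    n = len(d)
    t = delta * np.arange(1, n + 1)            # t_j = j * delta_n
    M = local_linear_matrix(t, h) @ os_matrix(n, p)
    return M @ d, np.sqrt((M**2) @ d)          # diag(M diag(d) M^T) = (M**2) @ d


# Monte Carlo check of the plug-in SE in the spirit of Table 5 (hypothetical inputs).
rng = np.random.default_rng(2011)
n, R = 30, 1000
p = np.array([0.1, 0.3, 0.3, 0.2, 0.1])
mean_d = 50.0 + 2.0 * np.arange(1, n + 1)
draws = [sos_with_se(rng.poisson(mean_d), p, h=3.0) for _ in range(R)]
est = np.array([e for e, _ in draws])
se = np.array([s for _, s in draws])
ratio = est.std(axis=0, ddof=1) / se.mean(axis=0)   # Monte Carlo SD / average SE
print(np.round(ratio[[9, 19, 27, 29]], 3))
```

With $h = 0$ the weight matrix is the identity, so the code returns the OS estimates and the plug-in variance $\sum_u a_{ju}^2 d_{j+u}$. The final lines repeat a small version of the standard-error check with made-up inputs: when the counts really are Poisson, the ratio of the Monte Carlo SD to the average estimated SE should be close to 1.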
When $h = 0$, the variance of $\hat\Lambda_j$ can be estimated by […], which is exactly the empirical version of the variance of the one-step estimator $\hat z_j$ (see Theorem 1). In the SARS example, we give a method for determining the point from which to smooth the one-step estimates. In practice, we are interested in the number of infected individuals in the recent past, that is, the target time is always close to the current time. Hence the one-step estimates generally need to be smoothed.

Since the properties of back projection are unknown, we cannot compare the one-step estimator with the back-projection methods theoretically; instead, we conduct a numerical study. Two models are considered. The first is an infection process without intervention, in which $z_j$ depends on the size of the infective population just before $j$. Following traditional infection models, we simulate an epidemic process with hazard function $h(t) = 0.05\,y(t-)$, where $y(t-)$ is the total number of infectives in the population just before time $t$. We use a Weibull distribution with shape 1.5 and scale 8 to model the incubation time (see Figure 1(a)). The epidemic starts with 15 infective individuals. We conducted 500 simulations, and the average total number of infected individuals was 959.08 (sd = 220.19). We estimated the incidence curve using the one-step estimator (OS), the back-projection estimator (BP) and the back-projection method with a smoothed EM (EMS, see [7]). Figure 1(b) shows the average of the estimated incidence curves over the 500 replications for the OS, BP and EMS estimators, and Table 1 reports the bias, standard deviation (SD) and root mean squared error (RMSE) of each method. The bias is defined as the difference between the estimator and the mean number of infections $E[z_j]$ generated by the simulation model. From Figure 1(b) and Table 1, we see that the BP estimator has the largest variance and is considerably inefficient.
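The first simulation scenario can be mimicked with a short discrete-day sketch, and the rate function `beta` also covers the step change of the second scenario described below. The sketch rests on our own simplifying assumptions, not on details given in the text: the continuous hazard is replaced by a daily Poisson draw with mean $\beta(j)$ times the number infected before day $j$ (infectives are never removed and the population is unlimited), and each infected individual is detected $\lfloor W \rfloor$ days after infection, with $W$ drawn from the Weibull incubation distribution. The function name `simulate_epidemic` is hypothetical, and we make no claim that it reproduces the summary figures above.

```python
import numpy as np


def simulate_epidemic(n=95, y0=15, beta=lambda t: 0.05, shape=1.5, scale=8.0, seed=None):
    """Discrete-day sketch of the simulated epidemics (our assumptions, see lead-in).

    Returns z_1..z_n (new infections per day) and d_1..d_n (cases detected per day).
    """
    rng = np.random.default_rng(seed)
    z = np.zeros(n + 1, dtype=int)           # z[0] holds the initial infectives
    d = np.zeros(n + 1, dtype=int)
    z[0] = y0
    for j in range(n + 1):
        if j > 0:
            # New infections on day j: Poisson with mean beta(j) * (number infected so far).
            z[j] = rng.poisson(beta(j) * z[:j].sum())
        # Detection delay floor(W) with W ~ Weibull(shape, scale).
        delay = np.floor(scale * rng.weibull(shape, size=z[j])).astype(int)
        day = j + delay
        np.add.at(d, day[day <= n], 1)       # only detections up to day n are observed
    return z[1:], d[1:]


# Scenario 1: constant rate 0.05 and 15 initial infectives.
z1, d1 = simulate_epidemic(seed=1)
# Scenario 2: rate 0.06 up to t = 40 and 0.03 afterwards, 20 initial infectives.
z2, d2 = simulate_epidemic(y0=20, beta=lambda t: 0.06 if t <= 40 else 0.03, seed=2)
print(z1.sum(), d1.sum(), z2.sum(), d2.sum())
```

The simulated counts `d1` can be passed, together with a discretized incubation distribution, to the one-step and back-projection sketches to compare the estimators on a single replicate.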
The EMS estimator is biased, particularly for times close to $n = 95$. In contrast, the proposed one-step estimator has much less bias than the EMS estimator and much less variance than the BP estimator; as a result, it has consistently smaller RMSE than both, and its advantage grows as $j$ approaches $n$. Hence the one-step estimator is much better than the EMS and BP estimators. The considerable inefficiency of the BP estimator is caused by the well-known ill-posedness of the inverse problem, which can be appreciated from the following equation, obtained from (2.4):

$$E(d_j) = \sum_{i=1}^{j} E(z_i)\, p_{j-i}, \qquad j = 1, \dots, n.$$

Since $p_{j-i}$ changes smoothly with $i$, relatively large perturbations of $E(z_i)$, $i = 1, \dots, n$, can give rise to very slight perturbations of the data $d_j$, $j = 1, \dots, n$, and conversely. Consequently, least squares, minimum $\chi^2$ and maximum likelihood solutions are very sensitive to slight changes in the data.

Our second simulation considers an infection process with a control factor. The infection process is time dependent with hazard function $h(t) = \beta(t) y(t)$, where $\beta(t) = 0.06$ for $t \le 40$ and $\beta(t) = 0.03$ for $t > 40$, so the hazard drops at $t = 40$. The epidemic starts with 20 infective individuals. The results displayed in Figure 1(c) and Table 2 lead to conclusions similar to those of the first set of simulations.

We also conducted simulations to compare the performance of the smoothed one-step estimator (SOS) with the one-step estimator (OS) and the smoothed back-projection estimator (EMS). Table 3 gives the bias, SD and RMSE of the estimators of the number of new infections at $j = 66, 70, 74, 78, 82, 86, 90, 94$ for the SOS estimator with $h = 10$ and for the OS and EMS estimators under the first simulation scenario.
From Table 3 we see that the SOS estimator has slightly smaller variance and smaller MSE than the OS estimator for $j \ge 74$, whereas it has larger bias and larger MSE than the OS estimator for $j < 74$. Hence the SOS estimator is better than the OS estimator when the time is close to the present, and the OS estimator is better when the time is far from the present. These results are consistent with Theorem 2 in Section 3. Simulations under the second scenario lead to the same, but even clearer, conclusions and are reported in Table 4.

We now test the accuracy of the standard error formula given in Section 3. We report simulations with $n = 100$ and data $z_j = […]$, where $\Lambda_j = 80 + 10(j - 50)^2 + 2j$. We generated 500 simulated data sets. For each, we estimated the incidence curve using the proposed approach with bandwidths $h = 0, 0.5, 1$ and $2$, where $h = 0$ corresponds to the OS estimator and $h = 0.5, 1$ and $2$ to the SOS estimator. The standard deviations of the 500 estimates $\hat\Lambda_j$, denoted SD in Table 5, can be regarded as the true standard errors. The average and standard deviation of the 500 estimated standard errors, denoted SE_ave and SE_sd, summarize the overall performance of the standard error formula. Table 5 presents the results at $j = 10, 20, 40, 60, 80, 90$, which correspond to the 10th, 20th, 40th, 60th, 80th and 90th percentiles of the time range. The performance of the standard error formula is quite satisfactory.

The SARS epidemic posed one of the most serious global health threats since the AIDS epidemic. Here we use the proposed method to estimate the number of infections from the cases reported over the course of the epidemic.
The daily numbers of reported cases of severe acute respiratory syndrome were obtained from the Department of Health of the Hong Kong Special Administrative Region. The first observed case occurred on 11th March 2003, which is set to be $j = 0$. There were 1150 cases up to 13th April 2003. On 10th and 11th April 2003, the reported counts showed an abnormal pattern, with 28 and 61 cases respectively. It has been suggested that a reporting delay occurred on the earlier day, so that some of the cases released on 11th April 2003 should be counted as cases of 10th April (see [12]). The averages for the two days, that is, 44 and 45 cases, are used in the analysis.

No infection times were reported, but some information on the incubation period is available. Tsang et al. [30] suggest that the incubation period varies from 2 to 11 days, whereas the Department of Health in Hong Kong reports that it varies from 2 to 7 days. In view of these statements, Chau and Yip [12] suggested choosing the parameters of the incubation distribution to satisfy the following: i. the minimum incubation time is 2 days; ii. more than 90% of infections are reported within 7 days of infection; iii. more than 99% of infections are reported within 11 days of infection. Furthermore, Chau and Yip [12] suggested using the Weibull family to model the incubation time. Let $U$ be a continuous random variable representing the incubation time. The Weibull densities used have a location parameter $\theta$ and two further parameters $\zeta > 0$ and $\eta > 0$: $\theta$ represents the minimum incubation time, and $\zeta$ and $\eta$ together determine the shape of the curve. Following Tsang et al.
[30], the Department of Health and the last two conditions of Chau and Yip [12], we choose $\theta = 2$, $\zeta = 0.4057$ and $\eta = 1.1793$, so that $p_0 = p_1 = 0$, $p_2 = 0.2936$, $p_3 = 0.2516$, $p_4 = 0.1763$, $p_5 = 0.1126$, $p_6 = 0.0721$, $p_7 = 0.0382$, $p_8 = 0.0248$, $p_9 = 0.0132$, $p_{10} = 0.0075$ and $p_{11} = 0.01$, where $p_u = \Pr\{u < U \le u + 1\}$ for $u = 2, \dots, 10$ and $p_{11} = \Pr\{U \ge 11\}$. Figure 2 shows the OS estimate of the incidence and the associated 95% pointwise confidence intervals. Since 19th May marks the end of the SARS epidemic, the data already contain the information on infections up to that day; consequently the OS and EMS estimates are similar, and it is not necessary to smooth the OS estimates based on the whole data set.

The pattern of the infection curve is consistent with the course of the outbreak (see [12]). The first infection wave started around 16th March 2003 in Amoy Gardens, a large residential estate made up of many individual blocks. It was initiated by a patient who was being treated for chronic renal failure and had been infected with SARS at the Prince of Wales Hospital. He visited Amoy Gardens on 14 and 19 March 2003 and used the toilet in his brother's flat. After the first wave, the epidemic spread throughout Hong Kong. In the second wave, there were clusters of infections in various hospitals. Two regional hospitals, the United Christian Hospital and the Princess Margaret Hospital, which began admitting SARS patients from the first outbreak around 26th March 2003, both reported local outbreaks. Of the 1755 infections, 386 were medical and healthcare workers. On 10th April 2003, home quarantine was implemented for all households with contacts of confirmed SARS patients. This preventive measure was introduced during the third wave, and it appears to have been very effective in preventing spread in the community. Note that the EMS estimator performs quite well retrospectively.
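The text does not spell out how $\theta$, $\zeta$ and $\eta$ enter the Weibull density. One parametrization that appears consistent with the listed probabilities is the shifted Weibull survival function $\Pr\{U > u\} = \exp\{-[\zeta(u - \theta)]^{\eta}\}$ for $u > \theta$; this form is our assumption, not one taken from [12]. The Python sketch below discretizes it in the same way as the listed values and checks conditions ii and iii.

```python
import math


def survival(u, theta=2.0, zeta=0.4057, eta=1.1793):
    """Assumed shifted Weibull: Pr{U > u} = exp(-[zeta (u - theta)]^eta) for u > theta."""
    if u <= theta:
        return 1.0
    return math.exp(-((zeta * (u - theta)) ** eta))


# p_0 = p_1 = 0; p_u = Pr{u < U <= u + 1} for u = 2, ..., 10; p_11 = Pr{U > 11}.
p = [0.0, 0.0] + [survival(u) - survival(u + 1) for u in range(2, 11)] + [survival(11)]
listed = [0, 0, 0.2936, 0.2516, 0.1763, 0.1126, 0.0721, 0.0382, 0.0248, 0.0132, 0.0075, 0.01]

for u, (computed, quoted) in enumerate(zip(p, listed)):
    print(f"p_{u:<2d} computed {computed:.4f}   listed {quoted:.4f}")
print(f"Pr(U <= 7)  = {1 - survival(7):.4f}  (condition ii)")
print(f"Pr(U <= 11) = {1 - survival(11):.4f}  (condition iii)")
```

Under this assumed form the computed probabilities sum to one, $p_{11}$ is essentially 0.01, and the remaining values agree with the listed ones to within about 0.005 but not exactly, so the authors may have used a slightly different discretization; conditions ii and iii hold at roughly the 90% and 99% levels.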
However, in general the current time $\tau$ will not be the end of the epidemic but some intermediate time at which the epidemic is still running its course. To assess the OS, SOS and EMS estimators when the information is incomplete, we use only the data observed on and before 17th April. The resulting estimates are displayed in Figure 3.

[Figure 3: The estimated incidence of SARS in Hong Kong and the corresponding pointwise 95% confidence intervals, using the data observed on and before 17th April.]

The estimates show that the EMS estimate changes abruptly on 5th April, from being close to the OS estimate to being far from it. This suggests that it may be necessary to smooth the OS estimates after 5th April. Since the data up to 19th May provide all the information on the epidemic, we can regard the OS estimates based on the whole data set as the true incidence. We therefore approximate the mean squared error (MSE) of $\hat\Lambda_j$ by $\sum_{j=1}^{n} (\hat\Lambda_j - \tilde\Lambda_j)^2$, where $\tilde\Lambda_j$ is the OS estimate based on the whole data set. With this definition, the MSEs of the OS, SOS and EMS estimators based on the data up to 17th April are 68.79, 66.59 and 5643.51, respectively, where the SOS estimate is obtained by smoothing the OS estimates after 5th April with bandwidth $h = 1.5$. Thus the SOS estimator is slightly better than the OS estimator, and the EMS estimator performs poorly when information on the epidemic is incomplete. We choose $h$ using the method described in Section 4.

We propose a new nonparametric method for estimating the unobserved numbers of infections. The key idea is to estimate the number of infections from the incubation process, which is independent across infected individuals, rather than by directly modelling the infection process, which is difficult and may be impossible.
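The comparison in this paragraph can be sketched as follows: keep the OS estimates up to a chosen starting index $r$, smooth them from $r$ onward, and score each curve by its sum of squared differences from the full-data OS curve. The helper names, the Epanechnikov kernel, unit spacing, and the choice to let the smoother use every usable OS estimate inside its window (including those before $r$) are our assumptions; the text does not specify these details.

```python
import numpy as np


def epanechnikov(x):
    """Symmetric kernel with bounded support."""
    return np.where(np.abs(x) <= 1.0, 0.75 * (1.0 - x**2), 0.0)


def smooth_from(z_hat, r, h):
    """Keep the OS estimates before index r and replace those from r onward by
    local linear fits with bandwidth h (unit spacing t_j = j assumed).

    Non-finite OS values (e.g. the last days when p_0 = p_1 = 0) get zero weight;
    if fewer than two usable points fall in a window, the OS value is kept.
    """
    z_hat = np.asarray(z_hat, dtype=float)
    ok = np.isfinite(z_hat)
    zf = np.where(ok, z_hat, 0.0)
    t = np.arange(len(z_hat), dtype=float)
    out = z_hat.copy()
    for j in range(r, len(z_hat)):
        x = t - t[j]
        w = epanechnikov(x / h) * ok          # kernel weights, zero for unusable days
        s0, s1, s2 = w.sum(), w @ x, w @ x**2
        det = s0 * s2 - s1**2
        if det > 0:
            out[j] = (w * (s2 - s1 * x)) @ zf / det   # local linear fit at t_j
    return out


def mse_proxy(estimate, reference):
    """Sum of squared differences from a reference curve, ignoring NaNs."""
    diff = np.asarray(estimate, dtype=float) - np.asarray(reference, dtype=float)
    return float(np.nansum(diff**2))


# Toy check: smoothing the last 10 of 30 noisy values of a straight line.
line = 5.0 + 2.0 * np.arange(30)
noisy = line + np.tile([1.0, -1.0], 15)
print(mse_proxy(noisy, line), mse_proxy(smooth_from(noisy, r=20, h=3.0), line))
```

For the SARS analysis, `z_hat` would hold the OS estimates from the data up to 17th April, `r` the index of 5th April, `h = 1.5`, and `reference` the full-data OS estimates for the same days.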
We develop a simple closed-form estimator of the number of infections under the assumption of independent incubation processes, an assumption that is likely to hold in a real epidemic. Our method is noniterative. The simulations in Section 4 indicate that it is more powerful, robust and accurate, as well as much easier to compute, than the back-projection method. Because the case counts provide very little information about recent infections, the variance of the nonparametric one-step estimator is large for the recent past. We reduce it by borrowing strength from the estimates at earlier times; although this may introduce some bias, with an adaptively chosen bandwidth the resulting estimator has a smaller mean squared error for the recent past. The simulations, the SARS data and the theoretical results show that the smoothing step can improve the estimator for the recent past considerably. The new method performs best if the estimates are smoothed only for times near the current time. If $t_j$ is far from the current time, most of those infected at $t_j$ have already been diagnosed, so borrowing information from around $t_j$ adds only marginally to the available information but can introduce bias and hence increase the mean squared error. If, on the other hand, $t_j$ is close to the present, the information on the number infected at $t_j$ is limited, and borrowing information from around $t_j$ can increase it substantially; although this may also introduce bias, with a suitable bandwidth the reduction in variance can exceed the increase in squared bias, so that the mean squared error decreases. The SOS estimator requires the numbers of infections to change smoothly over time, which may not be true when an intervention is implemented.
If the numbers of infections do not change smoothly over time, the estimates around the time at which an intervention is implemented may be somewhat biased (see Figure 1(c) for the second simulation in Section 4.1). A varying bandwidth that is small near the intervention time may help to handle this problem. In addition, the incubation distribution is estimated from external data, which may add uncertainty to the proposed estimators; this uncertainty depends on the model and the data from which the incubation distribution is estimated. These problems, including the modelling and estimation of the incubation process and its effect on the estimated infection curve, will be considered in future work.

[1] Epidemiology, transmission dynamics and control of SARS: the 2002-2003 epidemic
[2] A preliminary study of the transmission dynamics of the Human Immunodeficiency Virus (HIV), the causative agent of AIDS
[3] Nonparametric estimation of the incubation period of AIDS based on a prevalent cohort with unknown infection times
[4] Estimating the incubation period of AIDS by comparing population infection and diagnosis patterns
[5] Different AIDS incubation periods and their impacts on reconstructing human immunodeficiency virus epidemics and projecting AIDS incidence
[6] Statistical studies of infectious disease incidence
[7] A method of nonparametric back-projection and its application to AIDS data
[8] A method for obtaining short-term projections and lower bounds on the size of the AIDS epidemic
[9] Minimum size of the acquired immunodeficiency syndrome (AIDS) epidemic in the United States
[10] Reconstruction and future trends of the AIDS epidemic in the United States
[11] Discussion of "Backcalculation of HIV infection rates"
[12] Monitoring the severe acute respiratory syndrome epidemic and assessing effectiveness of interventions in Hong Kong Special Administrative Region
[13] The epidemiology of AIDS: current status and future prospects
[14] Predictions of the AIDS epidemic in the U.K.: the use of the back projection method
[15] Epidemiological and genetic analysis of severe acute respiratory syndrome
[16] Local Polynomial Modelling and Its Applications
[17] Methods for projecting course of acquired immunodeficiency syndrome epidemic
[18] Kernels for nonparametric curve estimation
[19] Mathematical modeling of the transmission dynamics of HIV infection and AIDS: a review
[20] Models for infectious human diseases: their structure and relation to data
[21] Elements of Large-Sample Theory
[22] An empirical Bayes approach to smoothing in backcalculation of HIV infection rates
[23] Nonparametric function estimation for clustered data when the predictor is measured without/with error
[24] A double-nonparametric procedure for estimating the number of delay-reported cases
[25] Some problems in the prediction of future numbers of cases of the acquired immunodeficiency syndrome in the U.K.
[26] A statistical perspective on ill-posed inverse problems
[27] Backcalculation of flexible linear models of the human immunodeficiency virus infection curve
[28] Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation
[29] A smoothed EM approach to indirect estimation problems, with particular reference to stereology and emission tomography
[30] A cluster of cases of severe acute respiratory syndrome in Hong Kong

Acknowledgements
Lin's research was supported in part by the National Natural Science Foundation of China (Grant Nos. 10771148, 11071197). Yip's research was supported by an RGC grant, the Chief Executive Community Project and the Hong Kong Jockey Club Charities Trust.

Let $U$ denote the incubation time and $f(\cdot)$ the density function of $U$. To determine the properties of the estimators, we impose the following regularity conditions on $\Lambda(\cdot)$, $f(\cdot)$ and the kernel function:
i. $f(\cdot)$ is a continuous function with bounded support $[0, \tau_0]$;
ii. $\Lambda(\cdot)$ is a bounded continuous function on $[0, \tau]$ and $\Lambda''(\cdot)$ is continuous at the point $t_0$;
iii. the kernel $K$ is a symmetric density function with bounded support;
iv.
$\sup_s \operatorname{Var}(d_s) < \infty$; for any fixed $j$, […];
v. $d_1, \dots, d_n$ are independent random variables.

Condition (iv) requires that the variance of the number of cases observed in a unit interval be bounded. Let $a_{ju} = p_u / \sum_{u=0}^{n-j} p_u$ for $u = 0, 1, \dots, k$ and $j = 1, \dots, n$, and rewrite $\hat z_j$ as $\hat z_j = \sum_{u=0}^{n-j} a_{ju} d_{j+u}$, where $p_u = 0$ for $u > k$. Since […] (A.1), where $t_j = j\delta_n$, $a_{j,s-j}$ is a continuous function of $t_j$, and it follows that […] is a continuous function of $t_j$ and $t_m$. From (A.1), we see that $a_{j,s-j} = O(\delta_n / \min(\tau - t_j + \delta_n, \tau_0))$. Then, using condition (iv) and noting that $\sum_{s=j}^{n} a_{j,s-j}$ […]. Following Fan and Gijbels [16], the conditions on $K(\cdot)$ and $t_0 \in [0, \tau]$ give […]; exchanging the order of summation, we have […]. Hence, by condition (v), we get […]. Following Fan and Gijbels [16], the conditions on $K(\cdot)$ and […] give […], where $A_1 = b(t_0, t_0)/\tau^2$, $A_2 = b^{(1,0)}(t_0, t_0)\mu_2/\tau^2$, $A_3 = b^{(1,1)}(t_0, t_0)\mu_2^2/\tau^2$, $\mu_2 = \int s^2 K(s)\,ds$, and

$$b^{(k_1,k_2)}(x_1, x_2) = \frac{\partial^{k_1+k_2} b(x_1, x_2)}{\partial x_1^{k_1}\, \partial x_2^{k_2}}, \qquad k_1, k_2 = 0, 1.$$

The first part of Theorem 2 follows from (A.3)-(A.6). The second part of Theorem 2 can be proved in the same way.