key: cord-0035163-hjmprklx
authors: Nishiura, Hiroshi; Kakehashi, Masayuki; Inaba, Hisashi
title: Two Critical Issues in Quantitative Modeling of Communicable Diseases: Inference of Unobservables and Dependent Happening
date: 2009
journal: Mathematical and Statistical Estimation Approaches in Epidemiology
DOI: 10.1007/978-90-481-2313-1_3
sha: 5ea1cd367141130fc2307241db5c8ca4e188c75d
doc_id: 35163
cord_uid: hjmprklx

In this chapter, we discuss two critical issues which must be remembered whenever we examine epidemiologic data of directly transmitted infectious diseases. Firstly, we would like the readers to recognize the difference between observable and unobservable events in infectious disease epidemiology. Since both infection event and acquisition of infectiousness are generally not directly observable, the total number of infected individuals could not be counted at a point of time, unless very rigorous contact tracing and microbiological examinations were performed. Directly observable intrinsic parameters, such as the incubation period and serial interval, play key roles in translating observable to unobservable information. Secondly, the concept of dependent happening must be remembered to identify a risk of an infectious disease or to assess vaccine efficacy. Observation of a single infected individual is not independent of observing other individuals. A simple solution for dependent happening is to employ the transmission probability which is conditioned on an exposure to infection.

What is special about infectious disease epidemiology? Whenever researchers statistically analyze infectious disease data, two important epidemiologic aspects, which differ from the epidemiology of non-communicable diseases, must be remembered.

The first is concerned with observable events. Whereas onset events (e.g. onset of fever and appearance of rash) are directly observable in the field (with or without reporting delay), both the infection event and the acquisition of infectiousness are unobservable without very rigorous contact tracing and experimental (e.g. microbiological) efforts. Besides, almost all models for the population dynamics of infectious diseases have employed a number of assumptions for unobservable events. Observable intrinsic parameters, which characterize the natural history of infection and epidemiologic characteristics of the spread of disease in the absence of public health interventions, must be systematically quantified and employed for inferring unobservable events in order to appropriately describe the transmission dynamics.

The second issue is the so-called dependent happening, i.e., observation of a single infected individual is not independent of observing other individuals. Because of the dependence, our population can enjoy herd immunity. Moreover, a necessity arises for theoretical epidemiologists to study infectious disease dynamics using non-linear models. To conduct sound statistical analyses, we should always bear in mind that it is inappropriate to directly apply the concept of relative risk and odds ratio in epidemiology of non-communicable diseases to any assessments of communicable diseases (especially, when the diseases are not endemic). For example, when we evaluate vaccine efficacy, it is far more feasible to employ the ratio of (conditional) probabilities of infection per contact among vaccinated to unvaccinated than directly using the relative risk of infection (which would inform population effectiveness of vaccination).
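As a minimal sketch of the contrast drawn above, vaccine efficacy can be computed from transmission probabilities conditioned on exposure, for instance household secondary attack rates (SAR), as 1 - SAR_vaccinated / SAR_unvaccinated. The function names and all counts below are hypothetical and serve only to illustrate the arithmetic.

```python
def secondary_attack_rate(infected: int, exposed: int) -> float:
    """Proportion of exposed contacts who became infected (conditional on exposure)."""
    if exposed <= 0:
        raise ValueError("exposed must be positive")
    return infected / exposed


def vaccine_efficacy(sar_vaccinated: float, sar_unvaccinated: float) -> float:
    """Efficacy as one minus the ratio of conditional infection probabilities."""
    if sar_unvaccinated <= 0:
        raise ValueError("sar_unvaccinated must be positive")
    return 1.0 - sar_vaccinated / sar_unvaccinated


# Hypothetical household contact data: (infected, exposed) in each group.
sar_v = secondary_attack_rate(5, 100)
sar_u = secondary_attack_rate(20, 100)
ve = vaccine_efficacy(sar_v, sar_u)
print(f"SAR vaccinated={sar_v:.2f}, unvaccinated={sar_u:.2f}, efficacy={ve:.2f}")
```

Because both rates are conditioned on a known exposure, the ratio does not depend on how many infectious individuals happen to be present in the population.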

This chapter is organized as follows. In Section 2, epidemiologic definitions of two observable intervals, i.e., incubation period and serial interval, are discussed. For illustration, we show how the incubation period and serial interval inform infection events in the simplest settings. In Section 3, these two epidemiologic measurements are effectively used to capture the dynamics of infectious diseases. The backcalculation method and the estimation of the generation time are briefly reviewed. In Section 4, the concept and definition of vaccine efficacy and effectiveness of vaccination are considered. Dependent happening is comprehensively reviewed in light of causal inference (i.e. identification and quantification of the average causal parameter of effect in a population). A simple methodological solution for the dependent happening follows in Section 5. In particular, the usefulness of household secondary attack rates for estimating vaccine efficacy is reviewed, and the impact of different types of vaccine efficacy on the reproduction number is discussed using a simple dynamic model.

The first issue is motivated by a need to improve limited practical utility of the well-known SEIR (susceptible-exposed-infectious-recovered) model with respect to the assumption of intrinsic parameters (e.g. latent and infectious periods) and its use in quantifying the transmission potential. As we mentioned above, the event of acquiring infectiousness is not directly observable (i.e. in reality, individuals in latent and infectious periods are not distinguishable without microbiological and contact-frequency information), whereas symptom onset of an apparent disease is readily observed and reported. In addition, infection events are not directly observable for the majority of directly transmitted diseases (an exception is seen in sexually transmitted infections where the contact is countable by recall effort). Although several theoretical studies have implicitly assumed that the latent period is exactly the same as the incubation period, acquisition of infectiousness and symptom onset differ clearly by definition and are not directly related [5, 73] . These facts considerably affect the applicability of previous SEIR models that did not take into account these differences. Besides, compartments I and/or R of classical SIR and SEIR models have been fitted to the observed (and mostly onset) data to derive some parameter estimates, although the observed data do not necessarily measure either the theoretically defined I or R. Therefore, it should be noted that both SIR and SEIR models do not clearly highlight the observable events in field epidemiology. This complicates the application of theoretical models to observed data.

To resolve this issue, it is essential to understand how the observable intrinsic measures are defined and how we should effectively use these epidemiologic measurements to translate observable to unobservable information. Since onset event is directly observable, two epidemiologic intervals, both of which are concerned with symptom onset of a disease, would be useful. The first is the incubation period, defined as the time from infection with a microorganism to symptom development [16, 73] . The second is the serial interval, defined as the time since onset of a primary case to onset of the secondary case caused by the primary case [41] . In the following subsections, these two intervals are separately discussed in relation to the identification (i.e. statistical inference) of infection events.

The incubation period of infectious diseases ranges from the order of a few hours, which is common for toxic food poisoning, to a decade (or a few decades) as seen in the case of tuberculosis, AIDS and variant Creutzfeldt-Jakob disease (vCJD). Since symptom onset reflects pathogen growth and invasion, and excretion of toxins and initiation of host-defense mechanisms, the length of the incubation period varies largely according to the replication rate of the pathogen, the mechanism of disease development, the route of infection and other underlying factors.

The incubation period of infectious diseases offers various insights into clinical and public health practices, as well as being important for epidemiologic and ecological studies. In clinical practice, the incubation period is useful not only for making rough guesses as to the causes and sources of infection of individual cases, but also for developing treatment strategies to extend the incubation period (e.g. antiretroviral therapy for HIV infection [16] ) and for performing early projection of disease prognosis when the incubation period is clearly associated with clinical severity due to dose-response mechanisms (e.g. diseases caused by exotoxin) [74] . Moreover, during an outbreak of a newly emerged directly transmitted disease, the incubation period distribution permits determination of the length of quarantine required for a potentially exposed individual (i.e. by restricting movement of an exposed individual for a duration sufficiently longer than the incubation period) [36] . Further, if the time lag between acquiring infectiousness and symptom onset appears long (i.e., if the incubation period is relatively long compared to the latent period), it implies that isolation measures (e.g. restriction of movement until the infectious individual loses infectiousness) are likely to be ineffective, complicating disease control [42] .

Understanding the incubation period distribution also enables statistical estimation of the time of exposure during a point source outbreak [90] as well as a hypothesis-testing to determine whether the outbreak has ended [20] ; the former is discussed below. The distribution is also useful in statistical approaches of epidemic curve reconstruction and short-term predictions of slowly progressing diseases; the backcalculation method uses the incubation period to estimate HIV prevalence and project the future incidence of AIDS [19] . During the last decade, this method has also been extended to prion diseases such as Bovine Spongiform Encephalopathy (BSE) [31] and vCJD [24] . The backcalculation method is briefly discussed in the next section. This approach has also recently diverged to quantification of the transmission potential of diseases with an acute course of illness [35] and infectiousness relative to disease-age [78] . Moreover, in cases such as the short and long incubation periods of Plasmodium vivax malaria in temperate zones, the incubation period also enhances ecological understanding of adaptation strategies; in temperate zones, clearly separate bimodal peaks with approximate lengths of 2 and 50 weeks are observed [79] , helping malaria transmissions continue over the winter season when transmission is usually greatly reduced due to seasonal entomologic characteristics.

The epidemiologist Philip E. Sartwell (1908-1999) contributed most to the foundation of incubation period distribution modeling [73, 90]. Dr. Sartwell initially found that the incubation period of acute infectious diseases tends to follow a lognormal distribution, and applied this distribution to various diseases. Observing that the distributions were often skewed to the right, Dr. Sartwell suggested the use of two parameters (i.e. an estimated median, which is also the geometric mean due to the characteristics of the lognormal distribution, and a dispersion factor as a measure of variability) rather than the sample mean and standard deviation. The lognormal distribution has a probability density function (pdf) of the form:

$$f(x; \mu, \sigma^2) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right) \tag{1}$$

for x > 0, where μ and σ are the mean and standard deviation of the variable's logarithm. The lognormal assumption for the incubation period was further extended to the estimation of the time of exposure during a point source outbreak. The theoretical basis is illustrated in Fig. 1, the logic of which is explained in the following. Since all cases in a point source outbreak share the same time of exposure, the epidemic curve, which is drawn according to the time of onset (i.e. incidence), is equivalent to the incubation period distribution (Fig. 1). Suppose that the median point of the case frequency was observed x days after exposure and, further, that there are 100α percentile points on both sides of the observed distribution (lower and upper percentiles 100α where 0 ≤ α ≤ 1), with the distances from the median to the lower and upper percentile points being a and b days, respectively. The following relationship is then given (because the logarithm follows a normal distribution)

$$\ln x - \ln(x - a) = \ln(x + b) - \ln x \tag{2}$$

which is rearranged as

$$\frac{x}{x - a} = \frac{x + b}{x} \tag{3}$$

Consequently, the time of exposure can be inferred using the distance from the time of exposure to the median, x, by taking the distances to any equal percentiles on both sides:

$$x = \frac{ab}{b - a} \tag{4}$$

Since recall bias (i.e. the extent of imperfection in recalling events in the past) is unavoidable in retrospective epidemiologic studies of food poisoning, which require huge efforts of food traceback, this method appears to be very useful in determining the most plausible time of exposure and narrowing down the amount of information to be traced. However, the classic method likely includes sampling errors and does not achieve acceptable precision. More precisely, estimation of the time of exposure is addressed, statistically, by solving the three-parameter lognormal distribution [58, 95]. Letting γ be the time of exposure, the pdf of the three-parameter lognormal distribution is given by

$$f(x; \gamma, \mu, \sigma^2) = \frac{1}{(x - \gamma) \sigma \sqrt{2\pi}} \exp\left(-\frac{(\ln(x - \gamma) - \mu)^2}{2\sigma^2}\right) \tag{5}$$

for x > γ . In other words, the statistical issue of the estimation of time of exposure can be replaced by the estimation of the threshold parameter of a standard 3-parameter distribution of the incubation period.
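The classic percentile-based estimate of the time of exposure can be sketched as follows. The sketch assumes the estimator takes the form x = ab / (b - a), with a the distance from the median to the lower percentile and b the distance from the median to the equally placed upper percentile. The function name and the numerical values are hypothetical.

```python
import math


def days_from_exposure_to_median(a: float, b: float) -> float:
    """Classic estimate x = a*b / (b - a).

    a: distance (days) from the median onset to the lower percentile point
    b: distance (days) from the median onset to the matching upper percentile point
    A right-skewed (lognormal) epidemic curve implies b > a > 0.
    """
    if not (b > a > 0):
        raise ValueError("expected b > a > 0 for a right-skewed distribution")
    return a * b / (b - a)


# Hypothetical point source outbreak: lognormal incubation period with a
# median of 3 days, onset times measured on a calendar-day axis where the
# (unknown) exposure took place on day 7.
mu, sigma, z = math.log(3.0), 0.4, 1.0
true_exposure_day = 7.0
median_onset_day = true_exposure_day + math.exp(mu)
lower_onset_day = true_exposure_day + math.exp(mu - z * sigma)
upper_onset_day = true_exposure_day + math.exp(mu + z * sigma)

a = median_onset_day - lower_onset_day
b = upper_onset_day - median_onset_day
x_hat = days_from_exposure_to_median(a, b)
estimated_exposure_day = median_onset_day - x_hat
print(f"x = {x_hat:.3f} days, estimated exposure day = {estimated_exposure_day:.3f}")
```

With exact lognormal percentiles the estimator recovers the median incubation period, and hence the exposure day, without error. With empirical percentiles from a small outbreak it inherits their sampling error, which is the limitation noted above.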

It should be noted that we have limited explicit explanations for the biological validity of assuming a lognormal distribution for the incubation period. The fundamental biological reason to assume a lognormal distribution is related to an inoculation study of ectromelia virus (mouse pox) [38], which suggested exponential growth of pathogens within the host during the initial phase. Another similar study suggested that a fixed threshold of pathogen load likely exists at which the host response is observed [71]. In other words, what we have learnt to date can be described as follows: if the growth rate of a microorganism is implicitly assumed to follow a normal distribution, and if there is a fixed threshold of pathogen load at which symptoms are revealed due to the host response, exponential growth of microorganisms should result in an incubation period sufficiently approximated by a lognormal distribution [73]. However, the host-defense mechanism, which is almost entirely responsible for symptom onset, was later shown to be far more complex than previously expected. For example, fever is induced by very complex reactions and by several factors including circulating cytokines such as interleukin-2 [72]. Thus, whereas the lognormal distribution may be applied to the incubation periods of many acute infectious diseases, it is necessary to bear in mind that the assumption is supported only by previous experience. When other distributions (e.g. gamma and Weibull distributions) are alternatively chosen to model the incubation period, the statistical issue of inferring the time of exposure (during a point source outbreak) can at least be addressed by estimating the threshold parameter for these distributions (i.e. as can be done with Equation (5)).
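Estimating the threshold parameter can be sketched with a profile log-likelihood. For each candidate exposure time γ, the lognormal parameters μ and σ² are replaced by their closed-form maximum likelihood estimates computed from ln(x - γ). The grid search below is an illustrative shortcut and not the estimation procedure of [58, 95]. The data are synthetic, and the likelihood of the three-parameter lognormal is known to be unbounded as γ approaches the earliest onset time, so candidates should stay clearly below it.

```python
import math
from statistics import NormalDist


def profile_loglik(onsets, gamma):
    """Profile log-likelihood of the three-parameter lognormal at threshold gamma."""
    if gamma >= min(onsets):
        return float("-inf")  # exposure must precede every onset
    logs = [math.log(x - gamma) for x in onsets]
    n = len(logs)
    mu_hat = sum(logs) / n
    var_hat = sum((v - mu_hat) ** 2 for v in logs) / n
    # log L = -sum ln(x - gamma) - (n/2) ln(2 pi var_hat) - n/2 at the MLEs
    return -sum(logs) - 0.5 * n * math.log(2 * math.pi * var_hat) - 0.5 * n


def estimate_exposure_time(onsets, candidates):
    """Return the candidate threshold with the highest profile log-likelihood."""
    return max(candidates, key=lambda g: profile_loglik(onsets, g))


# Synthetic onset times (hypothetical): exposure at day 10, lognormal
# incubation period with mu = 1.0 and sigma = 0.5, built from evenly spaced
# quantiles so that the example is deterministic.
n = 200
true_gamma, mu, sigma = 10.0, 1.0, 0.5
std_normal = NormalDist()
onsets = [
    true_gamma + math.exp(mu + sigma * std_normal.inv_cdf((i + 0.5) / n))
    for i in range(n)
]

candidates = [0.0, 5.0, 10.0]
gamma_hat = estimate_exposure_time(onsets, candidates)
for g in candidates:
    print(f"gamma={g:5.1f}  profile log-likelihood={profile_loglik(onsets, g):.2f}")
print("selected exposure time:", gamma_hat)
```

The same pattern applies to gamma or Weibull incubation periods by swapping in the corresponding shifted density, although the inner maximization then needs a numerical optimizer.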

The serial intervals are observed when contact tracing is performed as a control measure. The transmission network is then observed, which represents the chain of transmission as a function of calendar time that yields the information of who acquired infection from whom. This type of information has been explored to assess the number of secondary transmissions over the course of an epidemic [56] and to evaluate individual variations in transmission [66] , but it also enables us to obtain the serial interval [32, 65, 80, 97] . Using this information, here we consider a method to infer the relative infectiousness of infected individuals to certain disease-age (i.e. the time elapsed since onset of disease).

Specifically, we consider a situation when researchers would like to gain some information of the relative frequency of infectiousness or of secondary transmissions with respect to the time elapsed since infection or since onset of disease. Here we give an example of the relative infectiousness of smallpox to disease-age.

The infectious period has traditionally been defined as the period in which pathogens are discharged [7] . It presently refers to the period in which infected individuals are capable of generating secondary cases. Knowledge of the infectious period allows us to determine for how long known cases need to be isolated and what should be the latest time point after exposure at which newly infected individuals should be in isolation. However, as we mentioned above, infectiousness itself is unobservable, and thus, some inferential techniques to quantify this complicated index are called for.

One approach to addressing this issue is to quantify how the pathogen load changes over time using the most sensitive microbiological techniques (e.g. polymerase chain reaction), but such observations are usually limited to the period after onset of symptoms. Several attempts have been made to measure the distribution of the virus-positive period of smallpox cases [32, 89] , but sample sizes were small and only very few samples could be obtained during the early stage of illness. Moreover, linking virus-positive results to the probability of causing secondary transmission is difficult without further information, especially about infectious contact (e.g. frequency, mode and degree of contact).

Another way of addressing this complicated issue is to determine the frequency of secondary transmission relative to disease-age [78]. An estimate of the relative infectiousness is obtained by analyzing historical data in which it is known who acquired infection from whom. The known transmission network permits serial intervals to be extracted, i.e. the times from symptom onset in a primary case to symptom onset in the secondary case [41, 80]. Given the length of the serial interval s and the corresponding length of the incubation period f, the disease-age l from onset of a symptom in the primary case to secondary transmission satisfies

$$s = l + f \tag{6}$$

Considering the statistical distributions for each length results in a convolution equation:

$$s(t) = \int_0^t l(t - \tau) f(\tau)\, d\tau \tag{7}$$

The frequency l(t − τ) of secondary transmission relative to disease-age can be backcalculated by extracting the serial interval distribution s(t) from a known transmission network, and by using the incubation period distribution f(τ), which is assumed known. This concept is illustrated in Fig. 2A. If we have information on the length $t_i$ of the serial interval for n cases, the likelihood function is given by

$$L = \prod_{i=1}^{n} s(t_i) = \prod_{i=1}^{n} \int_0^{t_i} l(t_i - \tau) f(\tau)\, d\tau \tag{8}$$

The parameters that describe the frequency of secondary transmission relative to disease-age can be estimated by maximizing this function.

Fig. 2 A. The times of onset of the three cases are $t_m$, $t_l$ and $t_k$, respectively. Using the difference of the disease onset (serial interval) $t_k - t_l$ together with the distribution of the incubation period, the disease-age specific probability of transmission from case l to case k is obtained. B. Expected daily frequency of secondary transmissions with corresponding 95% confidence intervals. The disease-age t = 0 denotes the onset of fever. The illustration was drawn by the author with reference to [76, 78].

Figure 2B shows the back-calculated infectiousness of smallpox relative to disease-age [76, 78]. When the frequency is discussed as a function of disease-age of smallpox, day 0 represents the onset of fever. Before onset of fever (i.e. between day -5 and day -1) altogether only 2.7% of all transmissions occurred. Between day 0 and day 2 (i.e. in the prodromal period before the onset of rash) a total of 21.1% of all transmissions occurred. The daily frequency of passing on the infection was highest between day 3 and day 5, yielding a total of 61.8% of all transmissions. These estimates help determine the latest time by which cases should be in isolation. If each primary case infects on average 6 individuals (i.e. $R_0$ = 6), and if the efficacy of isolation is 100%, the isolation of a primary case before the onset of rash reduces the expected number of victims to 6 × (0.027 + 0.211) = 1.428. In other words, Fig. 2B implies that isolation could be extremely effective if performed before onset of rash and that delayed isolation of symptomatic smallpox cases could still be effective if performed within a few days after onset of rash. Consequently, we can expect that optimal isolation could substantially reduce the number of secondary cases, and the outbreak could quickly be brought under control by additional countermeasures (e.g. contact tracing [34]).
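The isolation arithmetic above can be written out as a short sketch. The transmission fractions are those quoted in the text for Fig. 2B. The function name and the `isolation_efficacy` parameter are hypothetical conveniences.

```python
# Share of all secondary transmissions by disease-age window (from the text).
FRACTION_BEFORE_FEVER = 0.027  # day -5 to day -1
FRACTION_PRODROMAL = 0.211     # day 0 to day 2, before onset of rash


def expected_secondary_cases(r0, fraction_before_isolation, isolation_efficacy=1.0):
    """Expected secondary cases per primary case when isolation starts after a
    given share of the transmissions has already occurred."""
    if not 0.0 <= fraction_before_isolation <= 1.0:
        raise ValueError("fraction_before_isolation must lie in [0, 1]")
    prevented = isolation_efficacy * (1.0 - fraction_before_isolation)
    return r0 * (1.0 - prevented)


# Isolation at onset of rash with 100% efficacy and R0 = 6.
fraction_before_rash = FRACTION_BEFORE_FEVER + FRACTION_PRODROMAL
cases_if_isolated_at_rash = expected_secondary_cases(6.0, fraction_before_rash)
print(round(cases_if_isolated_at_rash, 3))
```

Setting `isolation_efficacy` below 1 shows how quickly the benefit erodes when isolation is imperfect, an extension that the text does not quantify.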

Nevertheless, it should be noted that the relative frequency of secondary transmissions tends to be biased by various factors in observation: small sample size of serial intervals may have been influenced by local factors such as differences in contact behavior and mobility of cases. Unless extrinsic factors (e.g. isolation measure and behavioral changes) were explicitly adjusted in the statistical model with more detailed data, the estimated infectiousness several days after appearance of rash would be underestimated. This could partly explain a disagreement of Fig. 2B with a previous epidemiologic study [35] in which the number of secondary cases generated during the prodromal period was estimated as 8.2% of the overall transmission potential.

We then consider how the incubation period and serial interval play their roles in translating observable to unobservable information. Two practical issues are discussed as examples. The first is the so-called backcalculation method which has been effectively employed to estimate the total number of HIV-infected individuals in a population using the incubation period of AIDS and AIDS incidence [75] . The second is concerned with the statistical estimation and mathematical definition of the generation time which is interpreted as the time interval between infection of a primary case and infection of a secondary case caused by the primary case [94] . As will be shown using Euler-Lotka equation in the second subsection, probability density function of the generation time would be a critically important distribution for the estimation of the basic reproduction number, R 0 , using the intrinsic growth rate of an epidemic. In line with this, analytical insights into the relationship between the serial interval and generation time are discussed.

Whereas the number of AIDS cases is thought to be relatively accurately reported and documented in industrialized countries, asymptomatic HIV infections are seldom noticed unless the infected individual undertakes a voluntary blood test or develops the disease. Backcalculation uses the statistical distribution of the incubation period as key information, and is frequently applied to HIV/AIDS in industrialized countries where the previous AIDS incidence can be assumed to be confidently diagnosed and reported [17, 18, 43] . The epidemic curve for HIV is reconstructed using AIDS incidence and the incubation period, enabling estimation of HIV prevalence and short-term projections of AIDS incidence.

The long incubation period of HIV infection enables assessment of the extent of the epidemic during its course. Backcalculation uses AIDS incidence data at calendar time t, a(t), and the incubation period distribution at time τ after infection, ω(τ), to reconstruct the number of HIV infections over calendar time. Assuming that documentation of diagnosed AIDS cases is not significantly delayed, and assuming the impact of antiretroviral therapy on the length of the incubation period is negligible in the simplest setting, the fundamental relationship is given by the following convolution equation

$$a(t) = \int_0^t h(t - u)\, \omega(u)\, du \tag{9}$$

where h(t − u) is the number of HIV infections at calendar time t − u. The basic idea of backcalculation is to estimate h(t) using known a(t) and ω(u). It should be noted that the structure of this simple convolution equation is principally the same as what we discussed with Equation (7). Here, to ease understanding of the deconvolution procedure, Equation (9) is considered in discrete time [10, 26]. Since surveillance-based data of AIDS incidence are obtained for a certain interval, t (e.g. every 2 or 3 months), the following equation is obtained

$$a_t = \sum_{u=0}^{t} h_{t-u}\, \omega_u \tag{10}$$

Assuming that $h_t$ is generated by a nonhomogeneous Poisson process, $a_t$ is an independent Poisson variate. Thus, the likelihood, which is needed to estimate HIV infections (and, sometimes, the parameters of the incubation period distribution), is proportional to

$$\prod_{t=0}^{T} \left(\sum_{u=0}^{t} h_{t-u}\, \omega_u\right)^{r_t} \exp\left(-\sum_{u=0}^{t} h_{t-u}\, \omega_u\right) \tag{11}$$

where r t is the observed number of AIDS cases at calendar time t and T is the most recent time of observation. The shape of the curve of HIV infections, h t , is usually modeled parametrically or non-parametrically [11, 14] . The main sources of uncertainty arise from uncertainties in the incubation period distribution, the shape of the HIV infection curve, and AIDS incidence data [87] . Short-term predictions are obtained based on estimated numbers of HIV infected individuals who have not yet developed AIDS. However, it should be noted that backcalculation such as this provides no information about future infection rates and little information about recent infection rates [39] . Further details of the backcalculation method are described elsewhere [19, 23, 61] .
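A minimal sketch of discrete-time backcalculation follows. It assumes the discretized convolution takes the form a_t = Σ_u h_{t-u} ω_u with Poisson-distributed counts, and it recovers the infection curve by an EM (Richardson-Lucy type) iteration without smoothing. This is one common way to maximize such a likelihood and not necessarily the procedure used in the cited studies. The incubation period probabilities `w` and the case counts `r` are hypothetical.

```python
import math


def expected_incidence(h, w):
    """Expected cases mu[t] = sum_{u=0}^{t} h[t-u] * w[u] (discrete convolution)."""
    return [
        sum(h[t - u] * w[u] for u in range(min(t, len(w) - 1) + 1))
        for t in range(len(h))
    ]


def poisson_loglik(r, mu):
    """Poisson log-likelihood of observed counts r given expected counts mu."""
    return sum(rt * math.log(m) - m - math.lgamma(rt + 1) for rt, m in zip(r, mu))


def em_step(h, w, r):
    """One EM update of the infection curve h given observed counts r."""
    mu = expected_incidence(h, w)
    n = len(h)
    updated = []
    for s in range(n):
        lags = [t - s for t in range(s, n) if t - s < len(w)]
        # Probability that an infection at time s becomes a case within the data window.
        p_seen = sum(w[u] for u in lags)
        ratio = sum(r[s + u] * w[u] / mu[s + u] for u in lags)
        updated.append(h[s] * ratio / p_seen)
    return updated


# Hypothetical inputs: w[u] is the probability that the incubation period
# lasts u intervals (w[0] > 0 keeps every expected count positive).
w = [0.1, 0.3, 0.4, 0.2]
r = [1, 5, 14, 25, 30, 24]       # observed cases per interval
h = [20.0] * len(r)              # flat starting guess for the infection curve

ll_start = poisson_loglik(r, expected_incidence(h, w))
for _ in range(200):
    h = em_step(h, w, r)
ll_end = poisson_loglik(r, expected_incidence(h, w))

print("log-likelihood:", round(ll_start, 2), "->", round(ll_end, 2))
print("reconstructed infections:", [round(x, 1) for x in h])
```

The estimates for the most recent intervals rest on very few observed cases, which illustrates why backcalculation gives little information about recent infection rates.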

We consider the generation time using a renewal equation:

$$j(t) = \int_0^\infty A(\tau)\, j(t - \tau)\, d\tau \tag{12}$$

where j(t) is the number of new infections (i.e. incidence) at calendar time t and A(τ) is the integral kernel informing the rate of secondary transmissions per single primary case at infection-age τ (i.e. the time elapsed since infection). When the incidence increases with constant (intrinsic) growth rate $r_0$ (i.e. when $j(t) = k \exp(r_0 t)$ where k is constant), Equation (12) is simplified as

$$1 = \int_0^\infty A(\tau) \exp(-r_0 \tau)\, d\tau \tag{13}$$

which is referred to as the Euler-Lotka equation. Since the integral kernel A(τ) directly informs $R_0$, defined as the average number of secondary cases generated by a single primary case in a fully susceptible population [28] [29] [30], by

$$R_0 = \int_0^\infty A(\tau)\, d\tau \tag{14}$$

and because the density function of the generation time, g(τ), can be interpreted as the frequency of secondary transmission relative to infection-age τ, i.e.,

$$g(\tau) = \frac{A(\tau)}{\int_0^\infty A(x)\, dx} = \frac{A(\tau)}{R_0} \tag{15}$$

the Euler-Lotka equation (13) offers an interpretation,

$$\frac{1}{R_0} = \int_0^\infty \exp(-r_0 \tau)\, g(\tau)\, d\tau \tag{16}$$

representing the relationship between $R_0$ and the probability density function of the generation time, g(τ). From the initial growth phase of an epidemic, the intrinsic growth rate, $r_0$, i.e. the intrinsic rate of (natural) increase for infected individuals [33], is estimated, and $R_0$ can be subsequently estimated using Equation (16). Thus, the generation-time distribution has been recognized as playing a key role in estimating the transmission potential of a disease [86, 96]. In many instances, $R_0$ has been inferred from real-time growth data by using the estimate of $r_0$ and by assuming that the generation-time distribution is known. However, it is very difficult to estimate the generation-time distribution in practice, because infection events are seldom directly observable. Indeed, the estimation methods of the generation time and its sampling scheme have yet to be developed. Previously, the distribution of the generation time (or, at least, the mean generation time) was implicitly (and wrongly) assumed to correspond exactly to that of the serial interval. However, this is not the case when the incubation period of the primary case depends on the time from onset to secondary transmission [94], and even the means differ when we deal with diseases with asymptomatic secondary transmissions (which will be discussed below).

Figure 3 illustrates an interpretation of the relationship between serial interval S, incubation periods $F_1$ and $F_2$, and generation time G in the absence of asymptomatic cases (i.e. where there is no infected individual who does not exhibit any symptoms throughout the course of infection). We denote the time from onset of the primary case to secondary transmission by L (note that L can be negative if pre-symptomatic transmission occurs). The serial interval S is given by

$$S = G + F_2 - F_1 \tag{17}$$

which is interpreted as the sum of the generation time and incubation period of the secondary case minus the incubation period of the primary case. Thus, if G, $F_1$ and $F_2$ were independent random variables, the serial interval distribution would be the convolution of the generation time and incubation period distributions followed by the cross-correlation of this convolution and the incubation period distribution (however, it should be noted that it is frequently biologically more natural to assume that $F_1$ and G are dependent). As is intuitively clear from Equation (17), the mean serial interval would be expected to be identical to the mean generation time, provided that all infected individuals developed symptoms.

Fig. 3 The relationship between generation time and serial interval. Given the serial interval, S, and incubation periods of primary and secondary cases, $F_1$ and $F_2$, generation time G is expressed as $G = S + F_1 - F_2$.
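The estimation of $R_0$ from the intrinsic growth rate via Equation (16), i.e. 1/R0 = ∫ exp(-r0 τ) g(τ) dτ, can be sketched numerically as follows. The exponential generation-time density and the parameter values are hypothetical and chosen only because the closed form R0 = 1 + r0 × (mean generation time) is then available as a check.

```python
import math


def r0_from_growth_rate(r0_growth, g, tau_max, n_steps=100_000):
    """R0 = 1 / integral_0^inf exp(-r tau) g(tau) d tau.

    The integral is truncated at tau_max and evaluated with the midpoint rule,
    so tau_max must be large enough for the integrand to have decayed.
    """
    d = tau_max / n_steps
    integral = 0.0
    for i in range(n_steps):
        tau = (i + 0.5) * d
        integral += math.exp(-r0_growth * tau) * g(tau)
    return 1.0 / (integral * d)


def exponential_pdf(mean):
    """Density of an exponentially distributed generation time (an assumption)."""
    return lambda tau: math.exp(-tau / mean) / mean


growth_rate = 0.1          # per day, hypothetical
mean_generation_time = 5.0  # days, hypothetical

r0_numeric = r0_from_growth_rate(growth_rate, exponential_pdf(mean_generation_time), 200.0)
r0_closed_form = 1.0 + growth_rate * mean_generation_time
print(round(r0_numeric, 4), r0_closed_form)
```

Replacing `exponential_pdf` with a less dispersed density of the same mean raises the estimate for the same growth rate, which is why the assumed shape of the generation-time distribution matters as much as its mean.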

In the presence of asymptomatic secondary transmissions, caused by those who were infected and have not developed symptoms yet, and also by those who were infected and will not become symptomatic throughout the course of infection, the interpretation of the relationship between S and G becomes complicated [60]. Figure 4 illustrates the most precise, yet simplistic, model of a directly transmitted disease, accounting for the presence of asymptomatic secondary transmission. Following infection, asymptomatic individuals, $i_1(t, \tau)$, develop disease at the rate η(τ) or recover from infection without developing any symptoms at the rate $\gamma_1(\tau)$, where τ is the infection-age. Symptomatic individuals, $i_2(t, \sigma)$, recover from (or die of) infection at the rate $\gamma_2(\sigma)$ where σ is the disease-age. Assuming further that the rates of asymptomatic and symptomatic secondary transmissions are, respectively, $\beta_1(\tau)$ and $\beta_2(\sigma)$ and that the initial number of susceptibles is $S_0$, the linearized system (for the initial growth phase of an epidemic) is governed by the following McKendrick equations:

$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial \tau}\right) i_1(t, \tau) = -\left[\eta(\tau) + \gamma_1(\tau)\right] i_1(t, \tau) \tag{18}$$

$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial \sigma}\right) i_2(t, \sigma) = -\gamma_2(\sigma)\, i_2(t, \sigma) \tag{19}$$

$$i_1(t, 0) = S_0 \left[\int_0^\infty \beta_1(\tau)\, i_1(t, \tau)\, d\tau + \int_0^\infty \beta_2(\sigma)\, i_2(t, \sigma)\, d\sigma\right] \tag{20}$$

$$i_2(t, 0) = \int_0^\infty \eta(\tau)\, i_1(t, \tau)\, d\tau \tag{21}$$

Fig. 4 Following infection, all infected individuals experience asymptomatic state $i_1(t, \tau)$ where τ is infection-age representing the time elapsed since infection. Asymptomatic infected individuals will either develop symptoms at the rate η(τ) or recover from infection without developing disease at the rate $\gamma_1(\tau)$. Symptomatic individuals are denoted by $i_2(t, \sigma)$ where σ is the disease-age representing the time elapsed since onset of a disease.

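Integrating the McKendrick Equations (18)-(21) along the characteristic lines, and ignoring the contribution from the initial data, we get the following renewal equations:

$$j_1(t) = \int_0^\infty A_1(\tau)\, j_1(t - \tau)\, d\tau + \int_0^\infty A_2(\sigma)\, j_2(t - \sigma)\, d\sigma \tag{22}$$

$$j_2(t) = \alpha \int_0^\infty f(\tau)\, j_1(t - \tau)\, d\tau \tag{23}$$

where $j_1(t)$ and $j_2(t)$ are, respectively, the numbers of new infections and new onsets at calendar time t (i.e., $j_1(t) := i_1(t, 0)$ and $j_2(t) := i_2(t, 0)$) and the remaining functions are defined as

$$A_1(\tau) := S_0\, \beta_1(\tau) \exp\left(-\int_0^\tau \left[\eta(x) + \gamma_1(x)\right] dx\right) \tag{24}$$

$$A_2(\sigma) := S_0\, \beta_2(\sigma) \exp\left(-\int_0^\sigma \gamma_2(x)\, dx\right) \tag{25}$$

$$\alpha := \int_0^\infty \eta(\tau) \exp\left(-\int_0^\tau \left[\eta(x) + \gamma_1(x)\right] dx\right) d\tau \tag{26}$$

$$f(\tau) := \frac{\eta(\tau)}{\alpha} \exp\left(-\int_0^\tau \left[\eta(x) + \gamma_1(x)\right] dx\right) \tag{27}$$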

Thus, A 1 (τ ) and A 2 (σ ) are interpreted as the rate of asymptomatic and symptomatic secondary transmissions, respectively, per single primary case at infection-age τ and disease-age σ . α is the probability that an infected individual ever develops symptoms. f (τ ) gives the probability density of the incubation period of length τ .

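Replacing $j_2(t)$ in the right-hand side of (22) by that of (23), we get

$$j_1(t) = \int_0^\infty A(\tau)\, j_1(t - \tau)\, d\tau \tag{28}$$

where

$$A(\tau) := A_1(\tau) + \alpha \int_0^\tau A_2(\tau - \sigma)\, f(\sigma)\, d\sigma \tag{29}$$

Equation (28) describes the renewal process of newly infected individuals, and thus, the basic reproduction number, $R_0$, is given by

$$R_0 = \int_0^\infty A(\tau)\, d\tau = \int_0^\infty A_1(\tau)\, d\tau + \alpha \int_0^\infty A_2(\sigma)\, d\sigma \tag{30}$$

Consequently, the mean generation time, $T_g$, is calculated as

$$T_g = \frac{\int_0^\infty \tau A(\tau)\, d\tau}{R_0} = \theta L_1 + (1 - \theta)(L_2 + F) \tag{31}$$

where θ is the proportion of asymptomatic transmissions (0 ≤ θ ≤ 1) among the total number of secondary transmissions, i.e.,

$$\theta := \frac{\int_0^\infty A_1(\tau)\, d\tau}{R_0} \tag{32}$$

and $L_1$, $L_2$ and F are

$$L_1 := \frac{\int_0^\infty \tau A_1(\tau)\, d\tau}{\int_0^\infty A_1(\tau)\, d\tau} \tag{33}$$

$$L_2 := \frac{\int_0^\infty \sigma A_2(\sigma)\, d\sigma}{\int_0^\infty A_2(\sigma)\, d\sigma} \tag{34}$$

$$F := \int_0^\infty \tau f(\tau)\, d\tau \tag{35}$$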

which are interpreted as the mean infection-age of asymptomatic transmission, the mean disease-age of symptomatic transmission and the mean incubation period, respectively.

Although we omit further technical details for simplicity (see [60] for original descriptions, in particular, of the analytical expression of the integral kernel A+(σ )), the mean serial interval, T s,multi , can be analytically derived from another renewal equation of symptomatic infected individuals:

which leads to

where R 1 , which is assumed to be less than unity, is the average number of asymptomatic transmissions per single asymptomatic infected individual (i.e. the reproduction number for asymptomatic transmission), expressed as

which can also be written as θ R 0 , and $Q := \int_0^\infty A_+(\sigma)\,d\sigma$ is what we call the state reproduction number for the symptomatic class (i.e. the average number of symptomatic secondary transmissions per single primary symptomatic case during its entire course of infectiousness [60]). Here, it must be noted that T s,multi is what we call the multi-step serial interval, defined as the average length of time from primary symptomatic cases to secondary symptomatic cases who are infected either directly by the primary case or indirectly by way of asymptomatic cases. By contrast, the classic definition of the mean one-step serial interval, T s,one , is the period from observation of symptom onset in one case to observation of symptom onset in a second case directly infected by the first (i.e. indirect transmission is unobservable, and thus was not explicitly taken into account in the traditional definitions given by Pickles [84], Hope Simpson [59] and Bailey [7]). The expression for T s,one is much simpler than (37), and is given by

which is exactly what we discussed in Section 2.2 using Equation (6) . Consequently, we get the following relationship

where equality holds if there is no asymptomatic transmission (which leads to R 1 = 0 or θ = 0; see [60] for further details). In other words, it is analytically proven that the mean lengths of the one-step and multi-step serial intervals are longer than the mean generation time, as long as asymptomatic transmission exists. In this way, key unobservable information has to be estimated mainly from observable and quantifiable parameters. If one focuses on symptom onset as the observable event, then the incubation period and serial interval, both of which are defined in terms of the onset event, play the most important roles among all epidemiologic measurements in translating observables to unobservables. To widen the applicability of mathematical models of infectious diseases, it is essential to construct a theory from observable ingredients and to derive an estimator that addresses the unobservability of the infection event and of the acquisition of infectiousness.

As seen in the origin of field epidemiology (i.e. an identification of the source of environmental contamination with cholera, which is believed to have been initially suggested by John Snow), causal inference has played a central role among all epidemiologic disciplines. In particular, epidemiologic studies of chronic illness have been (and will be) focused on the cause of disease to find potentially effective preventive measures and therapeutic methods. The challenges posed by chronic illness have pointed out to epidemiologists the multifactorial complex nature of disease causality, which has been referred to as a web of causality. Appropriate epidemiologic designs and sound statistical approaches to address the relevant issue have been the main interests among general epidemiologists [44, 57] .

Of course, similar efforts have to be made to clarify useful prevention strategies against infectious diseases, but it must be remembered that the epidemiology of directly transmitted infectious diseases differs from that of other (e.g. chronic) non-communicable diseases in that the disease spreads from person to person. That is, observation of a single infected individual is not independent of observing other individuals in a population of interest [63]. If this is the case, the usual risk assessment parameters, such as the odds ratio, relative risk and risk difference, which are so useful in chronic disease epidemiology, do not offer stable assessments of risk for factors that affect contagion [64]. We first illustrate this concept in the next subsection and thereafter discuss the definitions and properties of vaccine efficacy, and the direct and indirect effects of vaccination.

In the epidemiology of non-communicable diseases, the causal relationship between a disease and a single risk factor is usually measured by examining the relative risk (synonym: risk ratio) or the attributable risk (denoted by RR and AR, respectively). For example, supposing that the frequencies of lung cancer among smokers and non-smokers are p 1 and p 0 , the RR and AR of smoking with respect to the development of lung cancer are calculated as

$$RR = \frac{p_1}{p_0}, \tag{41}$$
$$AR = p_1 - p_0. \tag{42}$$

Therefore, if the risk ratio is greater than 1, we suspect that smoking elevated the risk of lung cancer, which is useful for discussing causality. Moreover, the attributable risk is useful for quantifying the impact (or contribution) of smoking on (to) the development of lung cancer. A similarly simple discussion can be applied to the frequencies of Japanese encephalitis cases among vaccinated and unvaccinated individuals, denoted by p v and p u , respectively. Since the natural reservoir of Japanese encephalitis is believed to be swine (and other animals including birds), and because humans are believed to be dead-end hosts (i.e. hosts who do not generate secondary infections, including infection among mosquitoes), we can ignore the issue of dependence, at least for now. Then, the relative risk of vaccination with respect to infection with Japanese encephalitis virus is given by RR in Equation (41) and, subsequently, the vaccine efficacy, VE, is evaluated as

$$VE = 1 - RR = 1 - \frac{p_v}{p_u}, \tag{43}$$

which has been a fundamental idea in field epidemiology [45, 83] . Here's an example:

A vaccination program for the prevention of Japanese encephalitis was conducted in a population where the disease is endemic. Cases are constantly observed over time and, thus, we assume that the disease is in an endemic equilibrium. Among vaccinated individuals, 20% experienced infection, whereas 80% of unvaccinated individuals experienced infection. The relative risk is

$$RR = \frac{0.20}{0.80} = 0.25, \tag{44}$$

and thus, we expect that the vaccination was effective because RR < 1. Further, the vaccine efficacy is

$$VE = 1 - 0.25 = 0.75. \tag{45}$$

From these, we conclude that the risk of Japanese encephalitis among vaccinated individuals was 0.25 times as large as that among unvaccinated individuals and moreover, the vaccine efficacy was estimated at 75%.
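The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration under the two assumptions stated in the text (endemic equilibrium and independence between individuals); the function names are hypothetical and chosen for readability.

```python
def relative_risk(p_vaccinated: float, p_unvaccinated: float) -> float:
    """Ratio of the infection frequency among vaccinated to that among unvaccinated."""
    if p_unvaccinated <= 0:
        raise ValueError("the frequency among unvaccinated must be positive")
    return p_vaccinated / p_unvaccinated


def vaccine_efficacy(p_vaccinated: float, p_unvaccinated: float) -> float:
    """VE = 1 - RR; meaningful only when individuals are independent."""
    return 1.0 - relative_risk(p_vaccinated, p_unvaccinated)


# Frequencies from the Japanese encephalitis example in the text.
p_v, p_u = 0.20, 0.80
rr = relative_risk(p_v, p_u)
ve = vaccine_efficacy(p_v, p_u)
print(f"RR = {rr:.2f}, VE = {ve:.0%}")  # RR = 0.25, VE = 75%
```

The calculation is valid here only because humans are treated as dead-end hosts; for a directly transmitted disease the same ratio would mix direct and indirect effects.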

This simple discussion required two key assumptions. The first is the endemic equilibrium, in which the frequency of infection is not influenced by time. The second is the independence between individuals. In statistical terms, the latter is referred to as the no-interference assumption [25] or the stability assumption [88].

Epidemiology would be much easier if we could directly attribute the population effectiveness to the average causal effect at an individual level. In addition to the basics, infectious disease epidemiologists have to account for dependent happening, the simplest illustration of which is given in Fig. 5. We consider the generation of cases, where each primary case causes 2 secondary cases in the absence of vaccination (Fig. 5A).

What happens if a portion of this population was vaccinated? In Fig. 5B, two individuals were vaccinated prior to the outbreak and remained uninfected. Not only these two vaccinated individuals, but also two unvaccinated individuals (who would have been cases in the absence of vaccination) remained uninfected, because the vaccinated individual who would otherwise have infected them was protected. Protection among the two unvaccinated individuals can be deemed an indirect effect of vaccination, which was caused by dependence between individuals [54].

We consider this issue using response variables X 0 and X 1 for unvaccinated and vaccinated populations, following a series of studies by Halloran [53] [54] [55]. Since the response of interest is infection, which is dichotomous, we write X i = 1 if infected under treatment i and X i = 0 if uninfected under treatment i, where i = 1 or 0. The causal effect of vaccination, T, is usually measured by the attributable risk (see Equation (42)) as the average of the individual effects and, more strictly speaking, is expressed as the difference between the expected values of the potential outcomes:

$$T = E(X_1) - E(X_0). \tag{46}$$

Since we cannot observe the potential outcomes of each individual under each intervention, we have to rewrite Equation (46) to reflect each individual's potential outcome under the intervention that she/he actually received. Letting Y be the particular intervention that an individual received (i.e. Y = 1 and 0 for vaccinated and unvaccinated, respectively), the actual observable difference, A, is

$$A = E(X_1 \mid Y = 1) - E(X_0 \mid Y = 0), \tag{47}$$

where E(X i | Y = i) is the average of the potential outcomes among individuals who received intervention i. Under two assumptions, i.e., non-interference and independence, T and A are assumed equal [50]. Nevertheless, if the population expected value depends on the fraction vaccinated (due to the indirect effect), the relation does not hold, i.e.,

$$T \neq A. \tag{48}$$

Therefore, directly applying risk assessment parameters, such as relative risk and attributable risk, to the assessment of a specific risk (or evaluation of vaccine efficacy) of communicable diseases would be unfortunately flawed [53, 64] .

Let us compare two different small populations, each with 25 individuals (Fig. 6). The vaccination coverage of population A is 20%, whereas that of population B is 80%.

Fig. 6. Assuming that the contact patterns are homogeneous and not different between A and B, the risk of infection for a striped individual in population B is smaller than that in population A.

Suppose that the contact pattern is homogeneous in both populations and that the frequencies of contact do not differ between A and B. How different are the risks of infection of a vaccinated individual in the center of these populations? Obviously, the risk in population A is higher than that in B, because the individuals surrounding the striped (vaccinated) individual in population A are mainly unvaccinated. In other words, even if the vaccination does not offer perfect protection, individuals in population B enjoy a better community benefit than those in population A. The community benefit extends to those who would have been infected by the vaccinee, had he developed the disease. Consequently, vaccinees are protected not only to their own benefit but also to the benefit of the community, and moreover, unvaccinated individuals are susceptible not only to their own adversity but also to the adversity of the community. The degree of community protection is referred to as herd immunity [40, 62, 77]. When it comes to the assessment of vaccination, this is also referred to as the indirect effect of vaccination. Because of the presence of herd immunity, disease eradication (e.g. of smallpox) was (and can be for other diseases) achieved without vaccinating all susceptible individuals. This concept leads to a well-known threshold relation for the vaccination coverage, c, that is sufficient to eradicate a disease in a randomly mixing population, i.e.,

where κ is referred to as the critical coverage of vaccination for eradication, which is determined by the vaccine efficacy and the basic reproduction number, R 0 , of the disease [2, 93]. It should be noted that the threshold principle itself may better account for individual heterogeneity (i.e. variance of contact frequency) to precisely reflect realistic contact patterns. Let the mean and variance of the contact rate be m and σ 2 , respectively, and let us assume that the transmission mechanism is described by the so-called frequency dependence [15, 27, 70]. If the distribution of the contact rate is explicitly taken into account, R 0 in the heterogeneously mixing population is expressed as

$$R_0 = R_{0,\mathrm{random}}\left(1 + \frac{\sigma^2}{m^2}\right), \tag{50}$$

where R 0,random denotes the basic reproduction number without individual heterogeneity (where σ 2 = 0) [1, 6]. Assuming that vaccination takes place independently (i.e. independently of contact) and that the vaccine effect is irrelevant to secondary transmission (i.e. it does not reduce the infectiousness of vaccinated individuals), R 0 in Equation (50) directly applies to the right-hand side of Equation (49) [37]. If the distribution is extremely right-skewed (e.g. σ → ∞), this leads to R 0 → ∞, making it impossible to control the disease by means of mass vaccination only [22, 69]. Rather than discussing the herd immunity threshold of a disease using mathematical models (which can be found elsewhere [3, 4, 40, 81]), here we emphasize the issue of dependent happening, which complicates statistical estimation of vaccine efficacy using epidemiologic observations. Although the effectiveness of vaccination reflects the result of protection of a vaccinated population and can be measured using observed data, it cannot be directly connected to the average causal effect at an individual level (i.e. vaccine efficacy), which would have been identical to the effectiveness under the stability assumption.
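To make the effect of contact heterogeneity on the eradication threshold concrete, the following minimal sketch may help. It assumes two standard forms that are not spelled out in the surviving text: R 0 = R 0,random (1 + σ 2 /m 2 ) for the heterogeneous population, and a critical coverage of (1 − 1/R 0 ) divided by the vaccine efficacy. The function names and all numerical inputs are hypothetical.

```python
def r0_heterogeneous(r0_random: float, mean_contacts: float, var_contacts: float) -> float:
    """Assumed form: R0 = R0_random * (1 + variance / mean^2) of the contact rate."""
    return r0_random * (1.0 + var_contacts / mean_contacts ** 2)


def critical_coverage(r0: float, efficacy: float) -> float:
    """Assumed threshold: coverage must exceed (1 - 1/R0) / efficacy.

    A returned value above 1 means that vaccination alone cannot eradicate the disease.
    """
    if r0 <= 1.0:
        return 0.0  # the disease cannot invade even without vaccination
    return (1.0 - 1.0 / r0) / efficacy


# Hypothetical inputs: R0_random = 3, mean contact rate 10, vaccine efficacy 0.9.
for variance in (0.0, 100.0, 400.0):
    r0 = r0_heterogeneous(3.0, 10.0, variance)
    kappa = critical_coverage(r0, 0.9)
    print(f"variance = {variance:5.0f}: R0 = {r0:4.1f}, critical coverage = {kappa:.3f}")
```

As the variance of the contact rate grows, the required coverage approaches and then exceeds 1, which is the sense in which a strongly right-skewed contact distribution makes control by mass vaccination alone impossible.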

The definitions of direct and indirect effectiveness of vaccination (i.e. protection at a population level) were formulated using the final sizes of an epidemic (i.e. the fraction of those who experienced infection during an epidemic among the total of susceptible individuals) among vaccinated and unvaccinated groups, z v and z u [47], and the relevant discussions on epidemiologic study design have been made using these definitions (which can be found elsewhere [51, 55]). The definition also uses another final size which would have been observed in the absence of vaccination, z c . The direct effectiveness, DE, indirect effectiveness, IE, and total effectiveness, TE, of vaccination are respectively defined as

$$DE = 1 - \frac{z_v}{z_u}, \tag{51}$$
$$IE = 1 - \frac{z_u}{z_c}, \tag{52}$$
$$TE = 1 - \frac{z_v}{z_c}, \tag{53}$$

which measure the benefit to a vaccinated individual, the overall benefit of the vaccination program to unvaccinated people and that to vaccinated people, respectively. Moreover, if we define the average risk of infection in the study population, z 0 , as

$$z_0 = p z_v + (1 - p) z_u, \tag{54}$$

where p is the vaccination coverage, the average effectiveness, AE, is defined as

$$AE = 1 - \frac{z_0}{z_c}, \tag{55}$$

which measures the overall benefit of the vaccination program to the entire population. Since DE does not directly inform vaccine efficacy, VE, because of dependent happening, another measure, the field efficacy, FE, has to be defined as

$$FE = 1 - \frac{\beta_v}{\beta_u} \tag{56}$$

as a solution, where β v and β u are the transmission rates among vaccinated and unvaccinated, respectively (please continue reading for the details of their roles in a population).
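The four effectiveness measures can be computed side by side from final sizes. The sketch below assumes the usual ratio definitions DE = 1 − z v /z u , IE = 1 − z u /z c , TE = 1 − z v /z c and AE = 1 − z 0 /z c with z 0 = p z v + (1 − p) z u ; the class name, function name and final sizes used in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Effectiveness:
    direct: float    # benefit to a vaccinated individual relative to an unvaccinated one
    indirect: float  # benefit of the program to unvaccinated people
    total: float     # benefit of the program to vaccinated people
    average: float   # benefit of the program to the entire population


def effectiveness(z_v: float, z_u: float, z_c: float, coverage: float) -> Effectiveness:
    """Effectiveness measures from final sizes (assumed ratio definitions).

    z_v, z_u: final sizes among vaccinated and unvaccinated under the program.
    z_c: final size that would have been observed without vaccination.
    """
    z_0 = coverage * z_v + (1.0 - coverage) * z_u  # average risk in the study population
    return Effectiveness(
        direct=1.0 - z_v / z_u,
        indirect=1.0 - z_u / z_c,
        total=1.0 - z_v / z_c,
        average=1.0 - z_0 / z_c,
    )


# Hypothetical final sizes, for illustration only.
result = effectiveness(z_v=0.1, z_u=0.4, z_c=0.8, coverage=0.5)
print(result)
```

Note that z c is a counterfactual quantity: it is not observed in the vaccinated population and has to come from a comparable unvaccinated population or from a model.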

In a randomly mixing population, the relationship between FE and DE is analytically interpretable [47]. Here we consider this relationship as well as an analytical interpretation of IE. Specifically, we theoretically consider two different types of vaccines. The first is the so-called leaky vaccine, which does not offer perfect protection from disease but reduces the susceptibility of vaccinated individuals. The second is the all-or-nothing type, which offers perfect protection to a portion of vaccinated individuals.

Let the numbers of vaccinated susceptible, infectious and recovered individuals be S v , I v and R v , respectively. Similarly, the numbers of unvaccinated susceptible, infectious and recovered individuals are, respectively, denoted by S u , I u and R u . When a leaky vaccine is considered, we assume S v (0) = N v and S u (0) = N u , where N v and N u are the total numbers of vaccinated and unvaccinated individuals, respectively. Assuming that the recovery rate γ is independent of vaccination, the dynamics of vaccinated individuals are described by

$$\frac{dS_v}{dt} = -\beta_v S_v (I_v + I_u), \tag{57}$$
$$\frac{dI_v}{dt} = \beta_v S_v (I_v + I_u) - \gamma I_v, \tag{58}$$
$$\frac{dR_v}{dt} = \gamma I_v. \tag{59}$$

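Similarly, the dynamics of unvaccinated individuals are described by

$$\frac{dS_u}{dt} = -\beta_u S_u (I_v + I_u), \tag{60}$$
$$\frac{dI_u}{dt} = \beta_u S_u (I_v + I_u) - \gamma I_u, \tag{61}$$
$$\frac{dR_u}{dt} = \gamma I_u. \tag{62}$$
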
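As written above, because we assume that there were no immune individuals (due to infection) prior to the epidemic, R v (0) = R u (0) = 0. The final size equations are subsequently derived as

$$1 - z_v = \exp\left[-\frac{\beta_v N}{\gamma}\left\{p z_v + (1 - p) z_u\right\}\right], \tag{63}$$
$$1 - z_u = \exp\left[-\frac{\beta_u N}{\gamma}\left\{p z_v + (1 - p) z_u\right\}\right], \tag{64}$$
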
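where p (:= N v /(N u + N v )) is the vaccination coverage (as utilized in (54)). Taking logarithms and then the ratio of (63) to (64), we get

$$\frac{\ln(1 - z_v)}{\ln(1 - z_u)} = \frac{\beta_v}{\beta_u} = 1 - FE. \tag{65}$$
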
From (65), we observe that DE approximates the FE of a leaky vaccine in a randomly mixing population, but it is also clear that DE is always smaller than FE and that FE is the more appropriate estimator for attributing an observation at a population level to the average individual effect (i.e. efficacy) of vaccination. Therefore, to appropriately interpret the causal effect of vaccination, we should use the ratio of transmission rates (and, more precisely, the ratio of transmission probabilities per contact; see next section) among vaccinated to unvaccinated, rather than the ratio of the numbers of infected individuals. When an all-or-nothing vaccine is considered, we assume

where α is regarded as field efficacy under the all-or-nothing assumption (0 ≤ α ≤ 1). Since vaccines of this type (theoretically) do not reduce susceptibility, we assume that the transmission rates are identical, i.e., β := β v = β u . Assuming again that the recovery rate γ is independent of vaccination, the final sizes satisfy

It should be noted that (66) and (67) result in

$$1 - \frac{z_v}{z_u} = \alpha, \tag{68}$$

which coincides with DE. In reality, the leaky assumption may reflect the so-called imperfect vaccines (e.g. vaccines against influenza, malaria and various bacterial diseases), whereas the all-or-nothing assumption may be the case for vaccines against viral diseases with narrow antigenic diversity (e.g. measles and smallpox).

Estimation of the indirect effect, IE, has to consider another theoretical epidemic in the absence of vaccination. The final size, z c , in the absence of vaccination satisfies

$$1 - z_c = \exp\left(-\frac{\beta_u N}{\gamma} z_c\right), \tag{69}$$

where β u is assumed to be larger than β v for the leaky assumption (because a leaky vaccine reduces the susceptibility of vaccinated individuals), and is assumed identical to that among vaccinated (= β in (66) and (67)) for the all-or-nothing assumption. No explicit analytical solution can be obtained from (69), but it can be solved iteratively (and it should be noted that β u N /γ in the right-hand side is defined as the basic reproduction number, R 0 ). Subsequently, IE is estimated from (52). It should be noted that even when the transmission rate among vaccinated is identical to that among unvaccinated (i.e. all-or-nothing vaccine), IE is always positive due to dependent happening (as long as we ignore demographic stochasticity, which could yield negative IE by chance). Conditional assessment for causal inference (which is aimed at an appropriate estimation of vaccine efficacy) will be further elaborated in the next section. Further technical details on the relevant modeling exercises can be found elsewhere [47].
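The iterative solution mentioned above can be sketched as follows for the leaky vaccine. The code assumes final size relations of the form 1 − z v = exp(−R 0 (β v /β u ) x) and 1 − z u = exp(−R 0 x), where x = p z v + (1 − p) z u and R 0 = β u N /γ; these follow from an SIR model with mass-action mixing and a negligible initial number of infectious individuals, and they are stated here as assumptions rather than as the chapter's exact expressions. The function names and parameter values are hypothetical.

```python
import math


def mean_attack_rate(r0, rel_susceptibility, coverage, tol=1e-12, max_iter=100_000):
    """Fixed-point iteration for x = p*(1 - exp(-r0*r*x)) + (1 - p)*(1 - exp(-r0*x)).

    x is the population-average final size. Starting from x = 1, the iterates
    decrease monotonically towards the largest root of the equation.
    """
    x = 1.0
    for _ in range(max_iter):
        x_new = (coverage * (1.0 - math.exp(-r0 * rel_susceptibility * x))
                 + (1.0 - coverage) * (1.0 - math.exp(-r0 * x)))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def leaky_final_sizes(r0, field_efficacy, coverage):
    """Final sizes (z_v, z_u) among vaccinated and unvaccinated for a leaky vaccine."""
    r = 1.0 - field_efficacy  # beta_v / beta_u
    x = mean_attack_rate(r0, r, coverage)
    return 1.0 - math.exp(-r0 * r * x), 1.0 - math.exp(-r0 * x)


# Hypothetical scenario: R0 = 2, field efficacy 0.6, coverage 50%.
R0, FE, P = 2.0, 0.6, 0.5
z_v, z_u = leaky_final_sizes(R0, FE, P)
_, z_c = leaky_final_sizes(R0, 0.0, 0.0)  # no vaccination: z_c = 1 - exp(-R0 * z_c)

DE = 1.0 - z_v / z_u
IE = 1.0 - z_u / z_c
print(f"z_v = {z_v:.3f}, z_u = {z_u:.3f}, z_c = {z_c:.3f}")
print(f"DE = {DE:.3f} (field efficacy FE = {FE}), IE = {IE:.3f}")
```

In this scenario DE falls below the field efficacy used to generate the epidemic, and IE is positive even though the unvaccinated received no intervention, which illustrates why DE alone does not recover the individual-level effect.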

To be strict, it should be noted that the above mentioned definitions of (mainly direct) effectiveness are flawed. Especially, DE is not precise as it contains indirect effect in its definition (because the above mentioned arguments consider only indirect effects on unvaccinated individuals). More appropriate definitions should take into account the indirect effects on both vaccinated and unvaccinated individuals, which yields three different definitions of IE, i.e., among vaccinated, unvaccinated and the entire population, and two different definitions of DE, i.e., among vaccinated and the entire population. Theoretical foundations on this matter have been developed by Haber [46] and Becker [13] .

Because of the dependent happenings, quantitative modeling of infectious diseases has an important role in appropriately predicting the likely population effectiveness of a single intervention, yet mathematically separating the population effectiveness from the individual effect (i.e. efficacy). In other words, we should always remember that the need to assess causal effect or to simulate population effectiveness arises from this complicated principle of infectious diseases. When it comes to the causal inference, it is frequently the case that researchers have to clarify the average causal effect using observed data and clearly (and possibly analytically) bridge between an estimate at a population level and that at an individual level. This point is relevant to the estimation of vaccine efficacy from observed data in field epidemiology [50] .

In this section, we discuss a method to address dependent happening using a conditional epidemiologic measurement. The method utilizes the household secondary attack rate (SAR) [49], which has been traditionally regarded as a measure of infectiousness [21]. We first show that the use of SAR allows the reductions in susceptibility and infectiousness among vaccinated individuals, compared to unvaccinated individuals, to be estimated separately, and then prove that the combined effects contribute directly and equally to the reduction in the reproduction number.

To address the dependence, recall the causal effects T and A under the stability assumption in subsection 4.1. Since T ≠ A in (48), we have to consider alternative strategies for inference. One of the simplest methods to resolve dependent happening is to employ the conditional direct causal effect for examining the effect of a preventive measure (e.g. vaccination) on susceptibility, which is conditioned on a specified exposure to infection [54]. That is, let K denote the exposure to infection, where K = + represents positive exposure to infection and K = − represents no exposure to infection. We condition the expected values of potential outcomes among vaccinated and unvaccinated on K; i.e., letting E(X 1 | K = +) and E(X 0 | K = +) be the expected outcomes in the population, respectively, if everyone were vaccinated and exposed to infection, and if everyone were unvaccinated and exposed to infection, the average conditional causal effect of the vaccine in the population compared to that without vaccination, T conditioned , is

$$T_{\mathrm{conditioned}} = E(X_1 \mid K = +) - E(X_0 \mid K = +). \tag{70}$$

As we discussed with Equation (47), we have to rewrite (70) to reflect observation (in real world scenarios) where only a portion of the population is vaccinated. In the presence of an intervention, exposure to infection is influenced by the treatment assignment (e.g. due to epidemiologic study design or irregular distribution of vaccination in the population), and thus, to be more precise, we write the exposure K as a function of assignments Y , i.e., K (Y). Using this Y denoting the particular intervention that an individual used, Equation (70) can be rewritten as

That is, causal effect of the vaccination (i.e. which leads to an estimator of vaccine efficacy) can be defined by conditioning the outcome on exposure to infection [54] , which would be extremely useful to fill in the gap between individual and population effects. Furthermore, the average conditional indirect effect, IE conditioned , can also be defined in a similar way:

where K (Y = 0 | +) and K (Y = 1 | +) are, respectively, the exposure to an unvaccinated infectious individual and to a vaccinated infectious individual. In other words, Equation (72) measures the reduction in infectiousness among vaccinated cases compared to unvaccinated cases. In observation, this conditional measurement can be achieved, in the simplest manner, using the household secondary attack rate. The secondary attack rate, SAR, is the probability that infection occurs among susceptible individuals following a known contact with an infected person (or another infectious source) [49]. In other words, the SAR is conditional on the contact between an infectious source and a susceptible host (it should be noted that the term "rate" is a misnomer, because this is actually a proportion). Thus, we write

$$SAR = \frac{\text{number of individuals exposed who developed disease}}{\text{total number of susceptible exposed individuals}}. \tag{73}$$

When estimating SAR from epidemiologically observed data, we have to account for the correlation of susceptibles exposed to the same infectious source in order to appropriately quantify SAR. The ratio of two SARs is extremely useful for estimating the relative infectiousness and susceptibility of two types of populations [48]. Suppose that SAR ij denotes the household secondary attack rate where i and j, respectively, give the previous vaccination histories of the secondary and primary case (i.e. i or j = 1 represents previously vaccinated, whereas i or j = 0 represents unvaccinated individuals). Vaccine efficacy for susceptibility, VE S , and infectiousness, VE I , can be estimated using the following ratios:

$$VE_S = 1 - \frac{SAR_{10}}{SAR_{00}}, \tag{74}$$
$$VE_I = 1 - \frac{SAR_{01}}{SAR_{00}}. \tag{75}$$

Moreover, we also get

$$VE_T = 1 - \frac{SAR_{11}}{SAR_{00}}, \tag{76}$$

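which is interpreted as a combined effect of susceptibility and infectiousness and can be thought of as the naive susceptible equivalent of a vaccinated compared to an unvaccinated individual [49]. We consider the following household transmission data of smallpox, which were observed in India [76, 85]:

Primary case     Household contacts     Infected / exposed
Unvaccinated     Unvaccinated           40 / 650
Unvaccinated     Vaccinated             11 / 583
Vaccinated       Unvaccinated           10 / 499
Vaccinated       Vaccinated              2 / 421
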
The household SARs caused by unvaccinated primary cases among unvaccinated and vaccinated contacts were estimated to be SAR 00 = 40/650 = 0.0615 and SAR 10 = 11/583 = 0.0189, respectively. Those caused by vaccinated primary cases among unvaccinated and vaccinated household contacts were SAR 01 = 10/499 = 0.0200 and SAR 11 = 2/421 = 0.0048, respectively. The crude efficacy of the vaccine in reducing susceptibility, VE S , infectiousness, VE I , and the combined effect of both, VE T , is then estimated by

$$\widehat{VE}_S = 1 - \frac{11/583}{40/650} = 0.693, \tag{77}$$
$$\widehat{VE}_I = 1 - \frac{10/499}{40/650} = 0.674, \tag{78}$$
$$\widehat{VE}_T = 1 - \frac{2/421}{40/650} = 0.923. \tag{79}$$

If we make the simplifying assumption that the biological effect of vaccination was identical for all vaccinated individuals, vaccination reduced susceptibility by 69.3%, infectiousness by 67.4%, and the combined effect was 92.3%.
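These estimates can be reproduced from the household counts quoted above. The sketch below is a crude calculation only: it ignores the correlation among susceptibles exposed to the same infectious source and gives no confidence intervals. The dictionary layout and variable names are illustrative choices.

```python
# Keys are (i, j): vaccination status of the household contact (i) and of the
# primary case (j), with 1 = vaccinated and 0 = unvaccinated.
# Values are (infected contacts, exposed contacts) as quoted in the text.
household_counts = {
    (0, 0): (40, 650),
    (1, 0): (11, 583),
    (0, 1): (10, 499),
    (1, 1): (2, 421),
}

# Secondary attack rate: infected contacts divided by exposed contacts.
sar = {key: cases / exposed for key, (cases, exposed) in household_counts.items()}

ve_s = 1.0 - sar[(1, 0)] / sar[(0, 0)]  # reduction in susceptibility
ve_i = 1.0 - sar[(0, 1)] / sar[(0, 0)]  # reduction in infectiousness
ve_t = 1.0 - sar[(1, 1)] / sar[(0, 0)]  # combined effect

print(f"VE_S = {ve_s:.1%}, VE_I = {ve_i:.1%}, VE_T = {ve_t:.1%}")
# VE_S = 69.3%, VE_I = 67.4%, VE_T = 92.3%
```

All three ratios share the same denominator, SAR 00 , so each estimate compares one stratum against transmission from an unvaccinated primary case to an unvaccinated contact.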

By limiting our interest to household transmission data (or conditioning observation on those with household contact, which would not differ greatly between individuals), and by stratifying the vaccination histories of both primary and secondary cases, we can appropriately estimate not only the reduction in susceptibility but also that in infectiousness among vaccinated individuals [55]. This method is useful not only for assessing vaccine efficacy but also for estimating other treatment effects at an individual level, such as the epidemiologic effects of antiviral agents against influenza transmission [52].

In this way, although the comparison of two groups (i.e. with and without intervention) has been simply assessed from population data for non-communicable diseases (as long as their frequencies of exposure are identical), dependent happening in communicable diseases confuses the interpretation of the population effectiveness. The confusion is caused by the indirect effect. To address this issue in infectious disease epidemiology and attribute an observation at a population level to an average causal effect at an individual level, conditional measurement can be deemed extremely useful for appropriately analyzing epidemiologic datasets.

In relation to the conditional measurement in households, we lastly consider the impact of different effects of vaccination (e.g. reductions in susceptibility and infectiousness) on the transmission dynamics using an SIR model. Specifically, we consider a vaccine which elicits both all-or-nothing and leaky effects. The following model simplifies the previously published exercise by Simon and Koopman [92]. As we have done with Equations (57), (58), (59), (60), (61), (62), let S v , I v , S u and I u be the numbers of vaccinated susceptible and infectious individuals and of unvaccinated susceptible and infectious individuals, respectively. Rather than investigating an epidemic which ignores background demographic dynamics, here we consider the system with a constant per capita birth rate, μ, which is assumed equivalent to the natural mortality rate. Vaccination is assumed to take place at birth with the coverage p. Because of the all-or-nothing effect, the fraction pα of newborns becomes permanently immune, and the remaining fraction p(1 − α) is susceptible. Since we assume that the population sizes of both vaccinated and unvaccinated individuals are constant over time, we ignore the recovered individuals, R v and R u , for simplicity. We also assume that the recovery rate γ is independent of vaccination, because an effect on the duration of infection is seldom reported [76, 91] and moreover, such observations tend to be limited to the symptomatic period (not the infectious period). Then, the four equations of the system, representing the transmission dynamics in a randomly mixing population, are

$$\frac{dS_v}{dt} = p(1 - \alpha)\mu N - \lambda_S \beta S_v (\lambda_I I_v + I_u) - \mu S_v, \tag{80}$$
$$\frac{dI_v}{dt} = \lambda_S \beta S_v (\lambda_I I_v + I_u) - (\gamma + \mu) I_v, \tag{81}$$
$$\frac{dS_u}{dt} = (1 - p)\mu N - \beta S_u (\lambda_I I_v + I_u) - \mu S_u, \tag{82}$$
$$\frac{dI_u}{dt} = \beta S_u (\lambda_I I_v + I_u) - (\gamma + \mu) I_u, \tag{83}$$

where β is the transmission rate, which is assumed identical among vaccinated and unvaccinated individuals. However, due to the leaky effect, the susceptibility of vaccinated individuals is reduced by a factor λ S and the infectiousness of vaccinated cases is reduced by a factor λ I , both of which are assumed to lie in the range 0 ≤ λ S , λ I ≤ 1. If α = 0, it should be noted that λ S and λ I correspond, respectively, to 1 − VE S and 1 − VE I in the last subsection, both of which are also referred to as transmission probability ratios [48]. We combine Equations (81) and (83) to explore λ I I v + I u , i.e.,

$$\frac{d}{dt}(\lambda_I I_v + I_u) = \left[\beta(\lambda_S \lambda_I S_v + S_u) - (\gamma + \mu)\right](\lambda_I I_v + I_u). \tag{84}$$

Replacing λ I I v + I u by I c (where the subscript c is intended to represent combined), Equation (84) is simplified as

$$\frac{dI_c}{dt} = (\gamma + \mu)\left(R_0\,\frac{\lambda_S \lambda_I S_v + S_u}{N} - 1\right) I_c, \tag{85}$$

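where R 0 = β N /(γ + μ) and N is the total population size (here, N = N v + N u under vaccination). Since we know that

$$S_v \le p(1 - \alpha)N, \qquad S_u \le (1 - p)N, \tag{86}$$

if

$$R_0\left[\lambda_S \lambda_I\, p(1 - \alpha) + (1 - p)\right] < 1 \tag{87}$$

or

$$R_0 < \frac{1}{1 - p\left[1 - \lambda_S \lambda_I (1 - \alpha)\right]}, \tag{88}$$

the parenthesized term in the right-hand side of Equation (85) will always be negative. Then, I c (t) → 0 as t → ∞, and every solution of system (80), (81), (82), (83) converges to the disease-free equilibrium

$$(S_v, I_v, S_u, I_u) = \left(p(1 - \alpha)N,\ 0,\ (1 - p)N,\ 0\right) \tag{89}$$

when the condition (88) holds. In other words, to achieve eradication of the disease in question, the vaccination coverage p should satisfy

$$p > \frac{1 - 1/R_0}{1 - \lambda_S \lambda_I (1 - \alpha)}. \tag{90}$$

Nevertheless, if

$$R_0\left[\lambda_S \lambda_I\, p(1 - \alpha) + (1 - p)\right] > 1, \tag{91}$$

the solutions of the system (80), (81), (82), (83) move away from (89), indicating that the disease-free equilibrium is unstable. If everyone is vaccinated so that p = 1, the threshold condition (90) to eradicate the disease is

$$R_0\, \lambda_S \lambda_I (1 - \alpha) < 1. \tag{92}$$
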
From Equations (90) and (92), we clearly see that the different vaccine effects (i.e. the all-or-nothing effect and the leaky effects in reducing susceptibility and infectiousness) act mathematically in the same way to reduce the critical vaccination fraction. Moreover, if α = 0, it should be noted that λ S λ I in (90) is equivalent to 1 − VE T in (76), which makes (90) identical to (49). From (90), we observe that the more potent the vaccine (i.e., the smaller λ S λ I (1 − α) is), the smaller the vaccination coverage p needs to be to achieve the herd immunity threshold. The use of I c is what we call the Lyapunov function approach; a more detailed exercise on this matter (with other types of biological effects of vaccination) can be found elsewhere [92]. This kind of expectation (of different types of vaccine efficacy) arose from field trials of vaccination against HIV/AIDS and malaria [67, 68], vaccines for both of which have yet to be developed. As we have seen, it is very striking that the different biological effects work as a product to lower R 0 . Nevertheless, it has to be remembered that this equality of vaccine effects does not hold for a non-randomly mixing population. In heterogeneously mixing populations, no single vaccination fraction can define the threshold level of vaccination. Therefore, the challenging issue is that the eradication threshold in such a population will be achieved by vaccinating different subgroups at different levels. In any case, to quantitatively address the issue of estimating different types of vaccine efficacy, the use of household data is recommended, because household outbreaks contain some information about the possible source of infection and the data reasonably permit assuming homogeneous mixing within the household [12].

In this chapter, we discussed two critical issues which have to be remembered whenever researchers analyze observed data of infectious diseases. First, since many unobservable events will always be a source of uncertainty in all mathematical models of the transmission dynamics, observable statistical distributions must be effectively employed to translate observables to unobservables. For this reason, the incubation period and serial interval are deemed critically important epidemiologic measurements, if symptom onset is observable for the disease in question. Therefore, it is essential to make sure that systematically collected data are aggregated and stored for posterity in order to appropriately discuss the dynamics of infectious diseases using observed data. Second, transmission probabilities (or other conditional epidemiologic measurements) per exposure to infection should be effectively employed to address the dependence between individuals, as long as we deal with directly transmitted infectious diseases (i.e. communicable diseases). Although our example of dependent happening was focused on an assessment of vaccine efficacy, the readers are advised to remember (to gain some sense of quantitative modeling) that the need for mathematical modeling in all practical settings arises from this complicated issue. Rather than numerical computations of complicated models with many unsupported assumptions, it is often the case that an analytical approach that conveniently addresses this issue, or a useful dataset which is conditioned on the infection event, may work better to answer key questions in the field of medical epidemiology and public health.

One important piece of future work remains with respect to the issue of observability. Although our framework permits estimating various unobservable epidemiologic variables (e.g. the generation time), the statistical and biological validity of assuming a specific distribution has yet to be clarified. For this reason, it is necessary to understand the detailed natural history of an infection, especially what happens within the infected host. Symptom onset is not only determined by pathogen dynamics within the host but is also regulated by complex immune responses [82]. For example, an explicit reason why the lognormal distribution fits the incubation period of diseases with an acute course of illness well, and a similar reason for assuming the Weibull distribution for the incubation period of AIDS, have yet to be offered.
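For readers who wish to see what assuming a lognormal incubation period amounts to in practice, the following minimal sketch fits a two-parameter lognormal distribution by maximum likelihood, which reduces to the mean and the (population) standard deviation of the log-transformed observations. The function names and the data are hypothetical, and the sketch ignores censoring and uncertainty in the time of exposure.

```python
import math
import statistics


def fit_lognormal(incubation_periods):
    """Maximum likelihood estimates (mu, sigma) of a two-parameter lognormal."""
    if any(t <= 0 for t in incubation_periods):
        raise ValueError("incubation periods must be strictly positive")
    logs = [math.log(t) for t in incubation_periods]
    mu = statistics.fmean(logs)
    # pstdev divides by n (not n - 1), which is the maximum likelihood estimate.
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def lognormal_median(mu):
    """Median of a lognormal distribution with log-scale mean mu."""
    return math.exp(mu)


# Hypothetical incubation periods in days (not taken from any dataset).
days = [9, 10, 11, 12, 12, 13, 14, 16]
mu_hat, sigma_hat = fit_lognormal(days)
print(lognormal_median(mu_hat), sigma_hat)
```

A good fit obtained in this way is purely descriptive and does not by itself supply the biological justification that is still lacking.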

Since this chapter was intended to summarize the issue of dependent happening in a rudimentary fashion, and because of space limitations, we did not discuss heterogeneously mixing populations in detail. Whereas various types of effectiveness of vaccination during an epidemic were defined using final size equations, the final size is greatly complicated by heterogeneous contact patterns. For example, even when we consider household transmission, dependence between households must be addressed using an appropriate mathematical approach [9]. Although a mathematical foundation of household transmission has been developed and well formulated [8], a quantitative method to effectively utilize the model (e.g. to derive an estimator) has yet to be offered. Given that the final size is always complicated by contact heterogeneity, observational approaches that condition key epidemiologic measurements on exposure to infection will play a crucial role in many epidemiological and statistical studies.

Much work remains to be carried out for powerful general analyses to give insights into the transmission dynamics of communicable diseases using observed data.

Populations and infectious diseases: Ecology or epidemiology?

Directly transmitted infectious diseases: Control by vaccination

Vaccination and herd immunity to infectious diseases

Immunisation and herd immunity

Infectious Diseases of Humans: Dynamics and Control

A preliminary study of the transmission dynamics of the human immunodeficiency virus (HIV), the causative agent of AIDS

The Mathematical Theory of Infectious Diseases and Its Applications

Optimal vaccination schemes for epidemics among a population of households, with application to variola minor in Brazil

Epidemics with two levels of mixing

Part 5: Data analysis: Estimation and prediction. Statistical challenges of epidemic data

Uses of the EM algorithm in the analysis of data on HIV/AIDS and other infectious diseases

Estimating vaccine effects on transmission of infection from household outbreak data

Estimating vaccine effects from studies of outbreaks in household pairs

A method of non-parametric back-projection and its application to AIDS data

A clarification of transmission terms in host-microparasite models: Numbers, densities and areas

Incubation period of infectious diseases

Minimum size of the acquired immunodeficiency syndrome (AIDS) epidemic in the United States

A method for obtaining short-term projections and lower bounds on the size of the AIDS epidemic

AIDS Epidemiology: A Quantitative Approach (Monographs in Epidemiology and Biostatistics)

A hypothesis test for the end of a common source outbreak

The Sources and Modes of Infection

Risk behavior-based model of the cubic growth of acquired immunodeficiency syndrome in the United States

Proceedings of the Conference on Quantitative Methods for Studying AIDS

Predicting the CJD epidemic in humans

Planning of Experiments

Predictions of the AIDS epidemic in the U.K.: The use of the back projection method

How does transmission of infection depend on population size

Mathematical Epidemiology of Infectious Diseases: Model Building, Analysis and Interpretation

On the definition and the computation of the basic reproduction ratio R 0 in models for infectious diseases in heterogeneous populations

The estimation of the basic reproduction number for infectious diseases

Extending backcalculation to analyse BSE data

Studies on the virus content of mouth washings in the acute phase of smallpox

On the true rate of natural increase

Case isolation and contact tracing can prevent the spread of smallpox

Transmission potential of smallpox: Estimates based on detailed data from an outbreak

SARS incubation and quarantine times: When is an exposed individual known to be disease free?

On vaccine efficacy and reproduction numbers

The pathogenesis of the acute exanthems. An interpretation based upon experimental investigation with mouse-pox (infectious ectromelia of mice)

The epidemiology of BSE in cattle herds in Great Britain. II. Model construction and analysis of transmission dynamics

Herd immunity: history, theory, practice

The interval between successive cases of an infectious disease

Factors that make an infectious disease outbreak controllable

Methods for projecting course of acquired immunodeficiency syndrome epidemic

An overview of relations among causal modelling methods

The statistics of anti-typhoid and anti-cholera inoculations, and the interpretation of such statistics in general

Estimation of the direct and indirect effects of vaccination

Measures of the effects of vaccination in a randomly mixing population

Concepts of infectious disease epidemiology

Secondary attack rate

Epidemiologic Methods for the Study of Infectious Diseases

Direct and indirect effects in vaccine field efficacy and effectiveness

Antiviral effects on influenza viral transmission and pathogenicity: Observations from household-based trials

Study designs for dependent happenings

Causal inference in infectious diseases

Study designs for evaluating different efficacy and effectiveness aspects of vaccines

The construction and analysis of epidemic trees with reference to the 2001 UK foot-and-mouth outbreak

Instruments for causal inference: An epidemiologist's dream?

The three-parameter lognormal distribution and Bayesian analysis of a point-source epidemic

The period of transmission in certain epidemic diseases: An observational method for its discovery

The state-reproduction number for a multistate class age structured epidemic system and its application to the asymptomatic transmission model

Herd immunity and herd effect: New insights and definitions

The ecological effects of individual exposures and nonlinear disease dynamics in populations

Assessing risk factors for transmission of infection

Transmission dynamics and control of severe acute respiratory syndrome

Superspreading and the effect of individual variation on disease emergence

Measuring vaccine efficacy for both susceptibility to infection and reduction in infectiousness for prophylactic HIV-1 vaccines

Optimal vaccine trial design when estimating vaccine efficacy for susceptibility and infectiousness from multiple populations

Infection dynamics on scale-free networks

How should pathogen transmission be modelled?

The growth of micro-organisms in vivo with particular reference to the relation between dose and latent period

Circulating cytokines as mediators of fever

Early efforts in modeling the incubation period of infectious diseases with an acute course of illness

Incubation period as a clinical predictor of botulism: analysis of previous izushi-borne outbreaks in Hokkaido

Lessons from previous predictions of HIV/AIDS in the United States and Japan: epidemiologic models and policy formulation

Extracting key information from historical data to quantify the transmission dynamics of smallpox

The earliest notes on the reproduction number in relation to herd immunity: Theophil Lotz and smallpox vaccination

Infectiousness of smallpox relative to disease age: estimates based on transmission network and incubation period

Estimates of short and long incubation periods of Plasmodium vivax malaria in the Republic of Korea

Transmission potential of primary pneumonic plague: time inhomogeneous evaluation based on historical documents of the transmission network

The use of mathematical models in the epidemiological study of infectious diseases and in the design of mass immunization programmes

Virus Dynamics: Mathematical Principles of Immunology and Virology

Assessing vaccine efficacy in the field. Further observations

Epidemiology in Country Practice

Epidemiological studies in smallpox. A study of intrafamilial transmission in a series of 254 infected families. Indian

Model-consistent estimation of the basic reproduction number from the incidence of an emerging infection

Uncertainty in estimates of HIV prevalence derived by backcalculation

Comment: Neyman (1923) and causal inference in experiments and observational studies

Virus excretion in smallpox. 1. Excretion in the throat, urine, and conjunctiva of patients

The distribution of incubation periods of infectious diseases

Evidence of the partial effects of inactivated Japanese encephalitis vaccination: analysis of previous outbreaks in Japan from 1953 to 1960

Infection transmission dynamics and vaccination program effectiveness as a function of vaccine effects in individuals

Factors in the transmission of virus infections from animal to man

A note on generation time in epidemic models

Maximum likelihood estimation of date of infection in an outbreak of diarrhea due to contaminated foods assuming lognormal distribution for the incubation period

How generation intervals shape the relationship between growth rates and reproductive numbers

Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures

Acknowledgments: The work of HN was supported by The Netherlands Organisation for Scientific Research (NWO).