key: cord-0425883-kn8ysra0
authors: Zhang, Y.; Britton, T.; Zhou, X.
title: Monitoring real-time transmission heterogeneity from Incidence data
date: 2022-04-16
journal: nan
DOI: 10.1101/2022.04.07.22273591
sha: 71f73abb41d7b43768e282081e74ee0ec5fd0dc2
doc_id: 425883
cord_uid: kn8ysra0

The transmission heterogeneity of an epidemic is associated with a complex mixture of host, pathogen and environmental factors. It may indicate superspreading events, which reduce the efficiency of population-level control measures and sustain the epidemic over a larger scale and a longer duration. Methods have been proposed to identify significant transmission heterogeneity in historic epidemics based on several data sources, such as contact history, viral genomes and spatial information. Such data are sophisticated and may not be available, and, more importantly, these methods ignore the temporal trend of transmission heterogeneity. Here we attempted to establish a convenient method to estimate real-time heterogeneity over an epidemic. Within the branching process framework, we introduced an instant-individual heterogeneous infectiousness model to jointly characterize the variation in infectiousness both between individuals and among different times. With this model, we could simultaneously estimate the transmission heterogeneity and the reproduction number from incidence time series. We validated the model with both simulated data and five historic epidemics. Our estimates of the overall and real-time heterogeneities of the five epidemics were consistent with those presented in the literature. Additionally, our model is robust to the ubiquitous bias of under-reporting and to misspecification of the serial interval. By analyzing recent data from South Africa, we found evidence that Omicron might have more significant transmission heterogeneity than Delta. Our model, based on incidence data alone, proved reliable in estimating the real-time transmission heterogeneity.
The transmission of infectious disease is typically uneven, or heterogeneous, in time and space owing to a complex mixture of host, pathogen and environmental factors [1] [2] [3] [4] [5] [6]. A high level of transmission heterogeneity may indicate superspreading events (SSEs), in which certain individuals infect a far larger number of secondary cases than average [1], invoking the so-called 20-80 rule. It has been documented that SSEs considerably reduced the efficiency of population-level control measures [1] and played a key role in dramatically driving the spread of many pathogens in scale and […] [6, 9, 10]. Therefore, monitoring the degree of transmission heterogeneity and its change could be vital for epidemic forecasting and efficient intervention in infectious disease epidemiology.

Mathematically, transmission heterogeneity is represented by the variation in the offspring distribution, namely the distribution of the number of secondary cases generated by a given infectious case. Classical methods of estimating heterogeneity rely heavily on reconstructing the offspring distribution. As the epidemiological links among reported cases are complex, this reconstruction poses considerable challenges in both data collection and model building. According to the type of data used in the reconstruction, the existing methods of inferring heterogeneity can be grouped into three categories.

The first category comprises methods based on contact-tracing data. By interviewing patients to document their contacts with other infected patients, all or most of the cases can be positioned in the transmission network, and the resulting empirical offspring distribution can be used directly to estimate the transmission heterogeneity [1, 4, 10, 11].

The second category is based on virus sequence data.
For many pathogens, in particular RNA viruses, evolutionary processes occur on the same timescale as epidemiological processes, which makes it possible to extract epidemiological information from genetic analysis [12, 13]. Many studies have shown that the phylogeny reconstructed from virus sequences sampled from infected individuals reflects the underlying transmission history of the epidemic, with the branching events in the phylogeny corresponding to past transmission events. By incorporating the level of heterogeneity into the likelihood function of the virus phylogeny, it is possible to estimate the heterogeneity, as well as other epidemiological parameters, from the sampled sequence data [2, 14, 15].

For the third category, individual-level spatial information has been integrated in recent years to reconstruct the transmission history. By developing a continuous-time spatiotemporal transmission model with a distance-based kernel that characterizes the infectiousness between individuals as a function of their mutual distance, it is possible to infer explicitly the mean of the offspring distribution of each case, and hence the transmission heterogeneity and other epidemiological parameters [3, 9, 16].

Although considerable progress has been made in analyzing heterogeneity, these methods have theoretical and practical limitations. Firstly, all of them require context-specific information that can be hard to obtain and/or erroneous. For example, contact tracing in an epidemiological investigation may be time-consuming and subjective [17] and has to be limited to a certain number of infected cases. In viral genetic analysis, the commonly assumed correspondence between the reconstructed viral phylogeny and the transmission history may be biased if there is within-host evolution or recombination in the viral genomes [18].
When spatial information is incorporated, the model simply assumes that transmission occurs mostly between nearby residences because detailed individual movement data are lacking, an assumption that is appropriate only under certain control measures [3, 9].

In addition, most existing studies assumed a constant level of heterogeneity for the epidemic under study, although the heterogeneity may in fact grow and/or decline through the epidemic. This simplification brings some computational benefit but fails to characterize the temporal change of heterogeneity over the epidemic. Although Lau et al [3, 9] compared the degree of heterogeneity in different periods of an outbreak (i.e., before and after deploying the control measures), such a comparison can hardly reflect the real-time development of the epidemic and may consequently lead to inadequate epidemic control.

Monitoring real-time transmission dynamics from incidence data has drawn considerable research effort. Several tools for estimating the real-time reproduction number from incidence data have been developed and successfully applied [19] [20] [21], but the study of real-time transmission heterogeneity is so far rather limited. Some recent studies suggested a relationship between the transmission heterogeneity and the incidence over an epidemic [22] [23] [24] [25], but none attempted to accurately delineate the heterogeneity with incidence data or to compare the results with records in the literature. In this study, we attempted to develop a simple method to estimate the transmission heterogeneity on the basis of incidence data. Specifically, we extended the homogeneous transmission model in [19, 20] to allow for variation of infectiousness at different times and among different people, and consequently generated real-time estimates of transmission heterogeneity and reproduction number simultaneously.
Moreover, we evaluated this model with both simulated data and historic epidemic data, and its estimates turned out to be consistent with those from studies involving contact-tracing or spatial data. Our model remained robust even in the presence of measurement errors such as under-reporting or misspecification of the serial interval. We further explored the transmission heterogeneity of the new SARS-CoV-2 variant Omicron based on the incidence time series from South Africa.

Renewal process model of transmission

We considered an outbreak observed regularly (in days, weeks or months) over the time period $1 \le t \le T$. Let $I_t$ be the incidence, or number of newly infected cases, at time $t$, and denote the epidemic curve up to time $t$ by $\bar{I}_1^t = \{I_1, I_2, \cdots, I_t\}$. For simplicity, we excluded the possibility of imported cases during the study period. However, this restriction could be relaxed by distinguishing the effects of local and imported cases on new infections, as in [20].

We adopted the renewal process to model the transmission of the infectious disease. Under the standard renewal process model [19], the newly infected at time $t$ (i.e., $I_t$) are generated by all the infectious individuals who had been infected before time $t$ according to a Poisson relation:

$$I_t \mid \bar{I}_1^{t-1} \sim \mathrm{Pois}(R_t \Lambda_t), \qquad \Lambda_t = \sum_{s<t} w_{t-s} I_s, \tag{1}$$

where $R_t$ is the instantaneous reproduction number and $w_s$ is the infectiousness profile.

[…] [1], so the number of secondary cases caused by a particular case (i.e., the offspring distribution) in the given context is $\mathrm{Pois}(v^i_{s,t})$. In addition, we adopted the assumption that the offspring distributions of different cases were independent, so the incidence $I_t$ is the sum of these Poisson-distributed variables. In other words, $I_t$ is Poisson distributed with the composite rate $v_t = \sum_{s<t,i} v^i_{s,t}$.

The concept of IIRN provides a new tool to explore the variation of infectiousness […]

Another common method of allowing for transmission heterogeneity is an instant-level heterogeneity model [22, 25].
This model extended the standard model (1) by replacing the instantaneous reproduction number $R_t$ with an instant-related random variable shared by all the infected cases, that is,

$$v^i_{s,t} = w_{t-s}\,\eta_t, \qquad \eta_t \sim \Gamma\left(k_t, \frac{k_t}{R_t}\right),$$

where $\Gamma(\cdot, \cdot)$ stands for the Gamma distribution in the shape-rate parameterization. Therefore, the composite rate under this model is $v_t = \sum_{s<t,i} v^i_{s,t} = \Lambda_t \eta_t \sim \Gamma\left(k_t, \frac{k_t}{\Lambda_t R_t}\right)$, and the incidence $I_t$ follows a Negative Binomial (NegB) distribution:

$$I_t \sim \mathrm{NegB}\left(k_t, \frac{k_t}{\Lambda_t R_t + k_t}\right),$$

where, here and below, $\mathrm{NegB}(r, p)$ is written with size $r$ and success probability $p$, so that the mean is $r(1-p)/p$.

This model accounted for the variation in infectiousness at different times, which could be useful for long-term epidemic forecasting [22, 25]. However, it overlooked the variation in infectiousness among different infectious individuals, and hence failed to identify the exact degree of heterogeneity from incidence data (shown in […]).

Recently, Johnson et al [27] proposed an individual-level heterogeneity model to characterize transmission heterogeneity within the renewal process framework. The authors assumed a random infectiousness for each infected individual at the time of infection (e.g., at time $s$), so that its infectiousness in later time steps could be calculated as

$$v^i_{s,t} = w_{t-s}\,\eta^i_s, \qquad \eta^i_s \sim \Gamma\left(k_s, \frac{k_s}{R_s}\right).$$

With this model, the composite rate of new infections at time $t$ is $v_t = \sum_{s<t,i} v^i_{s,t} = \sum_{s<t} w_{t-s} \Theta_s$, where $\Theta_s = \sum_i \eta^i_s \sim \Gamma\left(k_s I_s, \frac{k_s}{R_s}\right)$. $\Theta_s$ was referred to as the disease momentum [27], representing the total infectiousness of all the cases infected at time $s$. As a weighted sum of Gamma variables is not Gamma distributed, the incidence $I_t$ can only be approximated by […]

Although the individual-level transmission heterogeneity has been characterized in […] showed unstable estimation results of transmission dynamics [27].

[…] [1, 28]. Therefore the reproduction number is specific to time and individual.
Here we assumed $v^i_{s,t}$ to be a random variable whose values are drawn independently, for each individual $i$ and each instant $t$, from a Gamma distribution with mean $w_{t-s} R_t$ and rate $k_t / R_t$, that is,

$$v^i_{s,t} \sim \Gamma\left(w_{t-s} k_t, \frac{k_t}{R_t}\right). \tag{2}$$

Under this random IIRN assumption, heterogeneous transmission stems from the variation in reproduction numbers among different individuals and at different times. Superspreading events are likely triggered by extreme realizations from the right-hand tail of the IIRN distribution, which reflect a random mixture of host, pathogen and environmental factors that assist the rapid transmission of disease [28].

The parameter $k_t$ in (2), referred to as the (instantaneous) dispersion number, was introduced to control the transmission heterogeneity. Similar to the interpretation of the instantaneous reproduction number $R_t$ in [26], the instantaneous dispersion number $k_t$ also controls the variation in the offspring distribution of a random infected case. Suppose the transmission dynamics remain the same (i.e., $R_t$ and $k_t$ stay constant) during the infectious time of the $i$-th case. Its individual reproduction number over the whole infectious period is then the sum of independent IIRNs over all infectious instants, that is, $v^i_s = \sum_{t>s} v^i_{s,t} \sim \Gamma\left(k_t, \frac{k_t}{R_t}\right)$, given that the infectiousness profile sums to one. As a consequence of this Gamma-Poisson mixture, the offspring distribution of the particular case is Negative Binomial,

$$I^i_s \sim \mathrm{NegB}\left(k_t, \frac{k_t}{R_t + k_t}\right),$$

with mean $\mu = E(I^i_s) = R_t$ and variance $\sigma^2 = R_t (1 + R_t / k_t)$. This offspring distribution is identical to the standard model of transmission heterogeneity in [1].

medRxiv preprint, April 7, 2022; this version posted April 16, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.
The dispersion number $k_t$ is therefore an empirical measure of the degree of transmission heterogeneity, with smaller $k_t$ indicating higher variance in the offspring distribution (i.e., a higher level of heterogeneity). When $k_t$ decreases, the likelihood of both super-spreading and sub-spreading events increases [22]. Traditionally, transmission heterogeneity is regarded as significant when $k_t$ is smaller than 1 [1].

Based on the random IIRN assumption, the total effect of all the infected cases on new infections at time $t$ is the sum of their independent IIRNs, that is,

$$v_t = \sum_{s<t} \sum_{i=1}^{I_s} v^i_{s,t} \sim \Gamma\left(k_t \Lambda_t, \frac{k_t}{R_t}\right), \qquad I_t \sim \mathrm{NegB}\left(k_t \Lambda_t, \frac{k_t}{R_t + k_t}\right). \tag{3}$$

This incidence model is referred to as the instant-individual heterogeneity model.

[…] heterogeneity over a time period $[t - \tau + 1, t]$, measured by $R_{t,\tau}$ and $k_{t,\tau}$ [19]. With this assumption, the likelihood of the incidence $I_{t-\tau+1}, \cdots, I_t$ given the transmission dynamics $\{R_{t,\tau}, k_{t,\tau}\}$ and conditioned on the previous incidences is

$$L\left(I_{t-\tau+1}, \cdots, I_t \mid R_{t,\tau}, k_{t,\tau}, \bar{I}_1^{t-\tau}\right) = \prod_{s=t-\tau+1}^{t} \frac{\Gamma(I_s + \Lambda_s k_{t,\tau})}{\Gamma(\Lambda_s k_{t,\tau})\, I_s!} \left(\frac{R_{t,\tau}}{R_{t,\tau} + k_{t,\tau}}\right)^{I_s} \left(\frac{k_{t,\tau}}{R_{t,\tau} + k_{t,\tau}}\right)^{\Lambda_s k_{t,\tau}}, \tag{4}$$

where $\Gamma(\cdot)$ with a single argument denotes the Gamma function.

On the basis of this joint likelihood function of both the reproduction number and the dispersion number, it is possible to infer the real-time transmission heterogeneity from the incidence data, which gives a more complete view of the characteristics of disease spreading. In particular, the maximum likelihood estimate of the reproduction number under this likelihood is $\hat{R}_{t,\tau} = \sum_{s=t-\tau+1}^{t} I_s \big/ \sum_{s=t-\tau+1}^{t} \Lambda_s$, which coincides with that of the homogeneous model [19, 36]. This property guarantees that the estimation of the reproduction number with our model is robust to the bias of a constant under-reporting rate (shown in Results). It is also possible to derive the posterior distribution of $R_t$ and $k_t$ within a Bayesian framework.
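The window likelihood and the closed-form estimate of the reproduction number described above can be illustrated with a short sketch. This is not the authors' implementation: the function names, the toy incidence series, the infectiousness profile `w` and the grid for the dispersion number are all hypothetical, and the likelihood is coded on the assumption that each window term is Negative Binomial with size $k \Lambda_s$ and success probability $k/(R+k)$. Because the maximizing value of $R$ does not depend on $k$, the sketch fixes $R$ at the ratio estimate and scans $k$ on a grid.

```python
import math

def total_infectiousness(incidence, w):
    """Lambda_t = sum over lags of w[lag-1] * I_{t-lag}; w[0] is the weight at lag 1."""
    lam = []
    for t in range(len(incidence)):
        lam.append(sum(w[lag - 1] * incidence[t - lag]
                       for lag in range(1, min(t, len(w)) + 1)))
    return lam

def iih_loglik(R, k, inc_win, lam_win):
    """Log of the window likelihood; assumes every Lambda_s in the window is > 0."""
    p = k / (R + k)
    ll = 0.0
    for I, L in zip(inc_win, lam_win):
        size = k * L  # Negative Binomial size parameter k * Lambda_s
        ll += (math.lgamma(I + size) - math.lgamma(size) - math.lgamma(I + 1)
               + I * math.log(1.0 - p) + size * math.log(p))
    return ll

def mle_R(inc_win, lam_win):
    """Ratio estimate sum(I_s) / sum(Lambda_s) over the window."""
    return sum(inc_win) / sum(lam_win)

# Hypothetical toy data, for illustration only.
incidence = [10, 12, 15, 14, 20, 25, 24, 30, 38, 41, 50, 61]
w = [0.1, 0.3, 0.3, 0.2, 0.1]   # assumed infectiousness profile, lags 1..5
tau = 7                          # window length

lam = total_infectiousness(incidence, w)
inc_win, lam_win = incidence[-tau:], lam[-tau:]

R_hat = mle_R(inc_win, lam_win)
# R_hat maximizes the likelihood for every k, so the joint maximum is found by scanning k.
k_grid = [0.01 * 1.5 ** j for j in range(25)]
k_hat = max(k_grid, key=lambda k: iih_loglik(R_hat, k, inc_win, lam_win))
print(R_hat, k_hat)
```

If `k_hat` lands on an edge of the grid, the grid should be widened before the value is interpreted; a Bayesian treatment with priors, as mentioned in the text, would replace the grid scan.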
Simulation of incidence time series

We applied the instant-individual heterogeneity (IIH) model to simulated datasets to test its accuracy under various levels of transmission heterogeneity and reproduction number. Each simulation began with 10 infected index cases and stopped after 24 days. We assumed a constant reproduction number $R$ and dispersion number $k$, and simulated the new infections according to the incidence distribution in (3). […] gamma distribution with a mean of 5.2 days and a standard deviation of 1.72 days, as for COVID-19 [29].

We chose the incidence data from the last time window to perform the estimation. We assumed non-informative uniform priors over $[10^{-6}, 100]$ and $[0.1, 10]$ for the reproduction number and the dispersion number respectively. Both the maximum a posteriori (MAP) estimate and the 95% highest posterior density (HPD) interval of the reproduction number and the dispersion number were generated.

The simulation was repeated 100 times under each condition. Three criteria were used to evaluate the accuracy of the estimation. Firstly, the relative root mean squared errors (RMSEs) were calculated for the estimates of $R$ and $k$ respectively. The relative RMSE was defined as

$$\text{relative RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\frac{\hat{\theta}_i - \theta}{\theta}\right)^2},$$

where $\theta$ is the true value of the parameter, $\hat{\theta}_i$ is the estimate of the parameter based on the $i$-th simulation, and $n$ is the number of simulations.

Secondly, the coverage of the 95% HPD interval of the reproduction number $R$ was calculated. Thirdly, the probability of correctly identifying heterogeneity, namely the proportion of simulations in which the true dispersion number $k$ and its estimate were both larger or both smaller than 1, was calculated for the estimation of $k$.
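The simulation design above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the infectiousness profile is obtained by evaluating a gamma density at integer lags and normalizing (a simplification; the discretization used in the study is not recoverable from the text), the Negative Binomial incidence is drawn as a Gamma-Poisson mixture with shape $k \Lambda_t$ and scale $R/k$, "24 days" is taken to include the index day, and the reproduction number is estimated with the ratio estimate over the last 7 days instead of the Bayesian MAP. All function names are hypothetical.

```python
import math
import random

def gamma_profile(mean, sd, max_lag):
    """Infectiousness profile for lags 1..max_lag from a gamma density, normalized to sum to 1."""
    shape, scale = (mean / sd) ** 2, sd ** 2 / mean
    dens = [lag ** (shape - 1) * math.exp(-lag / scale) for lag in range(1, max_lag + 1)]
    total = sum(dens)
    return [d / total for d in dens]

def poisson_draw(rate, rng):
    """Poisson draw: count unit-rate exponential arrivals falling in [0, rate]."""
    count, elapsed = 0, rng.expovariate(1.0)
    while elapsed < rate:
        count += 1
        elapsed += rng.expovariate(1.0)
    return count

def simulate_iih(R, k, w, n_index=10, n_days=24, seed=0):
    """Simulate incidence with I_t | rate ~ Pois(rate), rate ~ Gamma(shape=k*Lambda_t, scale=R/k)."""
    rng = random.Random(seed)
    incidence, lams = [n_index], [0.0]
    for t in range(1, n_days):
        lam = sum(w[lag - 1] * incidence[t - lag] for lag in range(1, min(t, len(w)) + 1))
        rate = rng.gammavariate(k * lam, R / k) if lam > 0 else 0.0
        incidence.append(poisson_draw(rate, rng))
        lams.append(lam)
    return incidence, lams

def relative_rmse(estimates, truth):
    """Root mean squared relative error of the estimates around the true value."""
    return math.sqrt(sum(((e - truth) / truth) ** 2 for e in estimates) / len(estimates))

w = gamma_profile(5.2, 1.72, max_lag=15)   # mean and sd taken from the text
R_true, k_true, tau = 2.0, 0.5, 7          # hypothetical simulation setting
estimates = []
for rep in range(100):
    inc, lams = simulate_iih(R_true, k_true, w, seed=rep)
    if sum(lams[-tau:]) > 0:               # skip replicates that went extinct
        estimates.append(sum(inc[-tau:]) / sum(lams[-tau:]))
print(len(estimates), relative_rmse(estimates, R_true))
```

The same `relative_rmse` function applies to estimates of $k$ once a likelihood-based estimator of the dispersion number is plugged in.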
Analyzing real epidemic data

We also applied the instant-individual heterogeneity model to disease incidence time series from five past outbreaks for which the levels of heterogeneity had been estimated on the basis of contact-tracing data or individual-level spatial information. The commonly used transmission heterogeneity model in [22] (referred to as the instant-level heterogeneity model) was also used to analyze these incidence time series under the same settings for comparison.

We retrieved the epidemic curves, as well as the mean and standard deviation of the serial intervals of these epidemics, from the literature (Table 1). […] real-time estimation of $k_t$ and $R_t$ on the basis of the incidence data. We set the window length to 7 time steps (i.e., days or weeks, depending on the frequency of incidence data collection) for all these analyses, as recommended in [20] for monitoring the temporal trend of the reproduction number.

In addition, we also explored the transmission heterogeneity of the variant of […] sensitivity analysis on the basis of the epidemic data of Ebola, Sierra Leone [33]. Firstly, we explored the effect of under-reporting on our analysis by testing four reporting rates (i.e., $\rho$ = 0.8, 0.6, 0.4, 0.2). For each rate, we generated a synthetic incidence time series for the Ebola epidemic by increasing the recorded incidence data proportionally.

Secondly, we tested the effect of errors in the serial interval by analyzing the Ebola epidemic data with biased serial interval distributions. We performed the estimation with three values of bias for the mean (i.e., -7 days, 7 days and 14 days) and three values of bias for the standard deviation (i.e., -3.5 days, 3.5 days and 7 days) respectively.
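The proportional rescaling used in the under-reporting analysis leaves a ratio-type estimate of the reproduction number unchanged, because a constant factor multiplies both the windowed incidence and the windowed infectiousness sum. The sketch below checks this numerically. It assumes the window estimate $\hat{R} = \sum I_s / \sum \Lambda_s$ given in the Methods; the weekly counts, the infectiousness profile and the function name are hypothetical, and the check says nothing about how the dispersion number responds to the reporting rate.

```python
def window_R(incidence, w, tau):
    """Ratio estimate sum(I_s) / sum(Lambda_s) over the last tau time steps."""
    lam = [sum(w[lag - 1] * incidence[t - lag]
               for lag in range(1, min(t, len(w)) + 1))
           for t in range(len(incidence))]
    return sum(incidence[-tau:]) / sum(lam[-tau:])

recorded = [3, 5, 8, 12, 20, 26, 31, 40, 38, 45, 52, 49]  # hypothetical weekly counts
w = [0.2, 0.4, 0.3, 0.1]                                  # hypothetical profile, lags 1..4

baseline = window_R(recorded, w, tau=7)
for rho in (0.8, 0.6, 0.4, 0.2):
    # Scale the recorded series by 1/rho, as if only a fraction rho had been reported.
    rescaled = [x / rho for x in recorded]
    print(rho, window_R(rescaled, w, tau=7), baseline)
```

The cancellation holds only when the reporting rate is constant over the window and over the preceding lags that enter the infectiousness sum.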
With the simulated data, our model could accurately estimate the overall dispersion number and the reproduction number provided sufficient data were available (Figure 1). As the window length increased, the relative RMSEs of the two estimates $k$ and $R$ showed a decreasing trend under all simulation settings. Also, the probability of correctly identifying $k$ and the coverage of the 95% HPD interval of $R$ increased with the window length. […] condition. In addition, for the estimation of $R$, the relative RMSE decreased and the coverage of the 95% HPD interval increased as the true $k$ increased, suggesting that the estimate of $R$ is more accurate in the homogeneous situation.

When analyzing the incidence data with the instant-level heterogeneity model [22], the estimates of $k$ were 2.50 (95% HPD: 1.48-3.30), 2.23 (95% HPD: 0.69-5.39) and 1.60 (95% HPD: 0.68-3.14) for these three epidemics respectively. These values exceeded the threshold of 1, so this model failed to recognize significant transmission heterogeneity in these outbreaks.

As to the estimation of the reproduction number $R$, both the IIH model and the instant-level model gave estimates consistent with previous studies (Figure 2B), while the estimates of the IIH model were closer to those obtained from the contact-tracing data.

[…] reproduction number ($R_t$) over an epidemic.
Firstly, we analyzed the weekly incidence of probable and confirmed cases of Ebola between August 4th, 2014, and March 29th, 2015, in Freetown, the capital of Sierra Leone. By setting the reference time to 2014-11-01 as in [16], the whole duration was divided into 5 periods (P1-P5, Figure 3). […] (Figure 3C). This temporal trend of $k_t$ was consistent with a previous study based on individual-level spatial information, suggesting that the transmission heterogeneity was becoming more significant as the epidemic went on and might have been crucial in driving the spread of Ebola in the study area [16]. In contrast, the instant-level model generated a much higher estimate of the dispersion number $k_t$, which remained above 1, suggesting that it failed to reveal the significant transmission heterogeneity during this outbreak (Figure 3C).

We also noted that the IIH model and the instant-level model gave similar estimates of the real-time reproduction number, which showed a declining trend over most of the period and was below 1 from the middle of the fourth period onward (Figure 3B).

Secondly, we validated the IIH model with the COVID-19 incidence data between March 1, 2020 and May 3, 2020 in five counties of the state of Georgia, USA (Figure 4). The estimated real-time dispersion number ($k_t$) in all five counties declined from around or above 1 during period 1 (before April 3) to around 0.1 in period 3 (after April 17) (Figure 4C), suggesting significant transmission heterogeneity of COVID-19 in all these counties [9].
Notably, the transmission heterogeneity became most significant in the rural area (Dougherty), where the estimated $k_t$ reached its lowest level of around 0.01 in the second period, which was consistent with the documented superspreading event in this county [40]. In contrast, the instant-level model generated real-time estimates of $k_t$ above 1, and thus failed to identify the significant transmission heterogeneity in all these counties.

Transmission dynamics (i.e., reproduction number $R$ and dispersion number $k$) were assumed constant over a window of 7 days, and the estimates were obtained by analyzing the incidence data of the time window. Solid lines show the mean estimates from the two methods: red curves represent the instant-individual heterogeneity (IIH) model and blue curves the instant-level heterogeneity (ILH) model. The shaded areas show the 95% highest posterior density (HPD) intervals. As in [9], the reference time was set to April 3rd, 2020, when the shelter-in-place order was announced. The whole study period was divided into three periods, i.e., before April 3rd, between April 3rd and April 17th, and after April 17th. A: Incidence data of the confirmed and probable cases; B: Estimation of reproduction number ($R_t$); C: Estimation of dispersion number ($k_t$).

The IIH model and the instant-level model gave similar estimates of the reproduction number $R_t$ (Figure 4B). We found that the reproduction numbers in four counties (i.e., all except Gwinnett) declined below 1 shortly after April 17 (i.e., 2 weeks after the shelter-in-place order), suggesting that the order was effective in reducing the transmission of COVID-19. Similar to the findings in [9], our IIH model also indicated that Dougherty was the first county where $R_t$ declined below 1.
Sensitivity analysis

By analyzing the synthetic data with the IIH model, we found that the real-time dispersion number ($k_t$) decreased as the reporting rate decreased, suggesting that the estimation of heterogeneity was conservative if many cases were missing. This finding is consistent with [16]. Fortunately, the effect of the reporting rate was not considerable even when the reporting rate decreased to 0.4 (i.e., 60% of cases were missing), where the estimate of $k_t$ was still covered by the 95% HPD interval obtained under the 100% reporting rate (Figure 5B). Also, the temporal trends of $k_t$ estimated under different reporting rates were similar, suggesting that surveillance of the temporal trend of heterogeneity with the IIH model was robust to the bias of under-reporting.

In addition, we found that the estimation of $R_t$ with the IIH model was unaffected by the reporting rate (Figure 5A). Because the maximum likelihood estimate of $R_t$ under our model is identical to that of the homogeneous transmission model [19], the estimation of $R_t$ is robust to missing cases provided that the fraction of cases observed is constant through the epidemic.

It has been reported that misspecification of the serial interval (or generation interval) is a large potential source of bias when estimating the reproduction number from observed incidence data [36]. However, we found that the estimation of the dispersion number $k_t$ was robust to biases in either the mean or the standard deviation of the serial interval
(Figure 5D and 5F). The effects were small and were covered by the 95% HPD intervals under the true values.

As in [36], the estimation of $R_t$ showed more visible changes than that of $k_t$ under biases in the serial interval (Figure 5C and 5E). Generally, a shorter serial interval (owing to a change in either the mean or the standard deviation) may lead to a lower estimate of $R_t$ when the true value is high and a higher estimate of $R_t$ when the true value is low.

[…] and $9.27 \times 10^{-4}$ (95% HPD: $6.03 \times 10^{-4}$, $1.27 \times 10^{-3}$) respectively. Notably, the overall dispersion number in the Omicron wave was lower than that in the Delta wave.

By setting the window size to 7 days, we obtained real-time estimates of the transmission dynamics during these two periods. During the Omicron wave, the estimated reproduction number $R_t$ reached a peak value of 2.15 on 2021-12-03 and then declined to around 0.9 after 2021-12-15. The underlying reason for this decrease in $R_t$ was the deployment of control measures by the South African government, as indicated by the government stringency index [41]. We also noted that the estimated dispersion […]

During the Delta wave, however, the estimated reproduction number $R_t$ remained around 1, which was smaller than the value in early December 2021. In addition, the estimated dispersion number $k_t$ remained close to $10^{-3}$, which was higher than the stable level at the end of December 2021. Therefore, the overall and real-time estimates of the transmission dynamics of these two periods suggest that Omicron might not only have higher transmissibility but also a greater potential for superspreading.
In this study, we proposed a reliable, flexible and generic model to estimate real-time heterogeneity using incidence time series. When the model was applied to the epidemic of Ebola in Sierra Leone and the epidemic of COVID-19 in the state of Georgia, USA, the estimated series of daily/weekly heterogeneities paralleled the trends reported by previous studies based on individual spatial data [3, 9].

[…] heterogeneity rely heavily on sophisticated data to reconstruct the offspring distribution and largely ignore the temporal change in heterogeneity. One existing model, which involves instant-level heterogeneity [22, 25], could allow for only part of the variation and hence failed to reveal accurate real-time heterogeneity. As evidenced in our analysis with the instant-level heterogeneity model, its estimates of transmission heterogeneity (in terms of dispersion number $k$) for all the real epidemics remained above the threshold of 1, indicating no significant heterogeneity in these epidemics, which deviated completely from the records in the literature. Our model, however, addressed the heterogeneity in a flexible and generic way and estimated the real-time heterogeneity on the basis of incidence data; it is easy to implement and proved reliable.

The benefits of our model stem from two theoretical advantages. Firstly, we […]

Secondly, our model is easy to implement as it employs only incidence data. We derived the joint likelihood function of the incidence data on both the reproduction number ($R_t$) and the transmission heterogeneity ($k_t$), which enabled us to monitor these epidemiological parameters simultaneously with ease.
When comparing the precision of the different methods, we found that our estimates were less precise, with broader credible intervals, than the results based on contact-tracing data for the two smaller outbreaks (i.e., MERS in South Korea, 2015, and COVID-19 in Tianjin, China, 2020, with 100-200 cases). For the outbreak of COVID-19 in Hong Kong, with more than 1,000 cases, our estimate was more precise than the result from contact-tracing data, with a narrower credible interval. This might be related to the sample size of the outbreak, and our model might be more applicable to larger epidemics.

This merit of our model could allow for fast and timely epidemiological surveillance, possibly even for the new SARS-CoV-2 variant Omicron, which has been spreading widely across the world since its first detection in November 2021 in Gauteng Province, South Africa. We estimated that the heterogeneity (in terms of dispersion number $k$) was $k \approx 3.43 \times 10^{-4}$ in December 2021 in South Africa, which was more significant than that of the Delta wave (i.e., $k \approx 10^{-3}$) [42]. The more significant heterogeneity of Omicron, together with its higher reproduction number, might explain its unprecedentedly fast spread. So far, little is known about the transmission heterogeneity of Omicron, and the data traditionally used for heterogeneity analysis, including contact-tracing data, viral sequence data and individual spatial information, have not been fully available for the analysis of its transmission heterogeneity. Our results also highlighted the need for more efficient measures to reduce gatherings and possible superspreading events [28, 43].

During the implementation of our model, the serial interval distribution is required to approximate the infectiousness profile $w_s$.
This distribution may not be correctly obtained at the early stage of a newly emerging infectious disease, or may be biased for pathogens whose infectiousness begins before symptom onset. Fortunately, our model was robust to misspecification of the serial interval (as shown in the Results). Additionally, we could relax this dependence by integrating detailed epidemiological linkage data to estimate the serial interval separately [20], or by extending the inference framework to estimate the serial interval distribution and the transmission dynamics simultaneously, as in [44].

When interpreting the results, we regarded the transmission heterogeneity estimated from the incidence of confirmed cases accumulated over a time window ending at time t as the result at that time. Since a case is confirmed after the time of its infection, and the accumulation of data adds a further delay, our estimates of transmission heterogeneity necessarily lag behind reality. This delay might make our estimates misleading if the underlying transmission dynamics change rapidly during the period. We could reduce the delay by applying our model to transformed infection data generated by accounting for the possible delay between infection and diagnosis [21, 45]. In addition, we could optimize the length of the data accumulation window based on a performance criterion such as short-term predictive accuracy [46], to obtain timely and accurate estimates of the transmission dynamics.

The analysis with our model could be biased by our assumption that all cases were detected when analyzing the incidence data. We showed with synthetic data that our model was robust as long as the reporting rate (e.g., 40%) was constant through the epidemic.
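The intuition for why a constant reporting rate is comparatively harmless can be sketched in Python: scaling every count by the same factor scales both the new cases I_t and the infection pressure Λ_t, so their ratio is unchanged, whereas a reporting rate that changes over time distorts it. The sketch uses expected reported counts rather than random thinning, and the crude ratio I_t / Λ_t is only a stand-in for a reproduction number estimate. It is not the estimator used in this study and says nothing directly about k_t. All names and numbers are hypothetical.

```python
def infection_pressure(incidence, w):
    """Lambda_t = sum over lags s >= 1 of I_{t-s} * w_s, with w[0] holding w_1."""
    return [sum(incidence[t - s] * ws
                for s, ws in enumerate(w, start=1) if t - s >= 0)
            for t in range(len(incidence))]


def crude_ratio(incidence, w):
    """I_t / Lambda_t on days with positive infection pressure (None otherwise)."""
    lam = infection_pressure(incidence, w)
    return [i / l if l > 0 else None for i, l in zip(incidence, lam)]


# Hypothetical numbers chosen only for illustration.
true_cases = [10, 20, 40, 80, 160, 320]
w = [0.5, 0.5]

# Constant 40% reporting versus a reporting rate that jumps from 20% to 60%.
constant_reporting = [0.4 * x for x in true_cases]
rates = [0.2, 0.2, 0.2, 0.6, 0.6, 0.6]
varying_reporting = [r * x for r, x in zip(rates, true_cases)]

print(crude_ratio(true_cases, w))
print(crude_ratio(constant_reporting, w))
print(crude_ratio(varying_reporting, w))
```

The first two printed series agree, while the third departs from them around the day on which the reporting rate changes.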
However, the reporting rate could vary with time in reality because of improvements in case ascertainment, case definition or testing capacity.

In this study, we used the Gamma distribution to characterize the transmission heterogeneity, as has been widely done in other studies. It should also be noted that the Gamma distribution is not suitable for all types of heterogeneity in transmission. For example, ongoing vaccination could induce heterogeneity, as some people are vaccinated and others are not. This type of heterogeneity should play an important role, especially when modelling the transmission heterogeneity in the COVID-19 pandemic, and would probably be bimodally distributed rather than Gamma-distributed.

In summary, we proposed a simple and generic model to estimate the real-time transmission heterogeneity based on incidence data. This model could help epidemiologists better understand the complex mechanisms of disease spread, especially for epidemics that lack more detailed data.

Superspreading and the
Quantifying transmission heterogeneity using both pathogen phylogenies and incidence time series
Spatial and temporal dynamics of superspreading events in the Ebola epidemic
Chains of transmission and control of Ebola virus disease in Conakry, Guinea, in 2014: an observational study. The Lancet Infectious Diseases
Catch me if you can: superspreading of
Superspreading drives the COVID pandemic - and could help to tame it.
Nature
A major outbreak of severe acute respiratory syndrome in Hong Kong
Super-spreaders in infectious diseases
Characterizing superspreading events and age-specific infectiousness of SARS-CoV-2 transmission in Georgia, USA
Evaluating transmission heterogeneity and super-spreading event of COVID-19 in a metropolis of China. International Journal of Environmental Research and Public Health
Ebola superspreading. The Lancet Infectious Diseases
Relating phylogenetic trees to transmission trees of infectious disease outbreaks
Viral phylodynamics. PLoS Computational Biology
A multitype birth-death model for Bayesian inference of lineage-specific birth and death rates
Inferring transmission heterogeneity using virus genealogies: Estimation and targeted prevention. PLoS Computational Biology
A mechanistic spatio-temporal framework for modelling individual-to-individual transmission - With an application to the 2014-2015 West Africa Ebola outbreak. PLoS Computational Biology
Inflow restrictions can prevent epidemics when contact tracing efforts are effective but have limited capacity
Phylodynamic inference across epidemic scales
A new framework and software to estimate time-varying reproduction numbers during epidemics
Improved inference of time-varying reproduction numbers during infectious disease outbreaks
Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. medRxiv
Sub-spreading events limit the reliable elimination of heterogeneous
Sexual transmission and the probability of an end of the Ebola virus disease epidemic
Measuring the path toward malaria elimination
Assessing the heterogeneity in the transmission of infectious diseases from time series of
Disease momentum: estimating the reproduction number in the presence of superspreading
Superspreading events without superspreaders: using high attack rate events to estimate Nº for airborne transmission of COVID-19
Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data
Coronavirus Pandemic (COVID-19)
Preliminary epidemiological assessment of MERS-CoV outbreak in South Korea
Middle East respiratory syndrome spread with Google search and Twitter trends in Korea
Spatiotemporal analysis of the 2014 Ebola epidemic in
Georgia coronavirus cases and deaths. Data provided by USAFacts
Probabilistic programming in Python using PyMC3
Practical considerations for measuring the effective reproductive number
Quantification of parasite aggregation: a simulation study
Clustering and superspreading potential of SARS-CoV-2 infections in Hong Kong
Transmission characteristics of MERS and SARS in the healthcare setting
Days After a Funeral in a Georgia Town, Coronavirus 'Hit Like a Bomb'
A global panel database of pandemic policies
Superspreading and heterogeneity in transmission of SARS, MERS, and COVID-19: A systematic
Evidence that coronavirus superspreading is fat-tailed
Estimation of the reproductive number and the serial interval in early phase of the A/H1N1 pandemic in the USA. Influenza and Other Respiratory Viruses
Reconstructing influenza incidence by deconvolution of daily mortality time series
Using information theory to optimise epidemic models for real-time prediction and estimation