key: cord-0819888-69ixor9p authors: Ghafari, Mahan; Hejazi, Bardia; Karshenas, Arman; Dascalu, Stefan; Kadvidar, Alireza; Khosravi, Mohammad A.; Abbasalipour, Maryam; Heydari, Majid; Zeinali, Sirous; Ferretti, Luca; Ledda, Alice; Katzourakis, Aris title: Lessons for preparedness and reasons for concern from the early COVID-19 epidemic in Iran date: 2021-05-29 journal: Epidemics DOI: 10.1016/j.epidem.2021.100472 sha: a34e51a23b0e7ba18bc9728e15b9c53250ce0f26 doc_id: 819888 cord_uid: 69ixor9p Introduction Many countries with an early outbreak of SARS-CoV-2 struggled to gauge the size and start date of the epidemic mainly due to limited testing capacities and a large proportion of undetected asymptomatic and mild infections. Iran was among the first countries with a major outbreak outside China. Methods We constructed a globally representative sample of 802 genomes, including 46 samples from patients inside or with a travel history to Iran. We then performed a phylogenetic analysis to identify clades related to samples from Iran and estimated the start of the epidemic and early doubling times in cases. We leveraged air travel data from 36 exported cases of COVID-19 to estimate the point-prevalence and the basic reproductive number across the country. We also analysed the province-level all-cause mortality data during winter and spring 2020 to estimate under-reporting of COVID-19-related deaths. Finally, we use this information in an SEIR model to reconstruct the early outbreak dynamics and assess the effectiveness of intervention measures in Iran. Results By identifying the most basal clade that contained genomes from Iran, our phylogenetic analysis showed that the age of the root is placed on 2019-12-21 (95 % HPD: 2019-09-07 – 2020-02-14). This date coincides with our estimated epidemic start date on 2019-12-25 (95 %CI: 2019-12-11 – 2020-02-24) based air travel data from exported cases with an early doubling time of 4.0 (95 %CI: 1.4–6.7) days in cases. Our analysis of all-cause mortality showed 21.9 (95 % CI: 16.7–27.2) thousand excess deaths by the end of summer. Our model forecasted the second epidemic peak and suggested that by 2020-08-31 a total of 15.0 (95 %CI: 4.9–25.0) million individuals recovered from the disease across the country. Conclusion These findings have profound implications for assessing the stage of the epidemic in Iran despite significant levels of under-reporting. Moreover, the results shed light on the dynamics of SARS-CoV-2 transmissions in Iran and central Asia. They also suggest that in the absence of border screening, there is a high risk of introduction from travellers from areas with active outbreaks. Finally, they show both that well-informed epidemic models are able to forecast episodes of resurgence following a relaxation of interventions, and that NPIs are key to controlling ongoing epidemics. With significant levels of mortality and morbidity, the ongoing pandemic of Coronavirus Disease 2019 (COVID-19) continues to have a major impact on many regions around the world [1] . With many nations struggling to contain the spread of its causative pathogen SARS-CoV-2, it is paramount to re-evaluate our knowledge of the epidemic in search for useful lessons. Different countries represent different realisations of SARS-CoV-2 epidemics in different contexts; therefore, accurate country-specific analyses can provide key insights into the dynamics of the epidemic and the effects of interventions. The pandemic was first detected in the city of Wuhan, in the Chinese province of Hubei in December 2019, with isolated cases recorded in European and East Asian countries until the second half of February, when large outbreaks were detected at the same time in Lombardy, Italy, and in Qom, Iran. While the early phases of the pandemic in China and Italy have been widely studied, the early Iranian epidemic has not attracted the same attention, possibly due to the lower amount of data available. As a consequence, very little is known about the origins of the outbreak and pattern of spread across the country. Some early reports suggested that the first two confirmed cases in Qom, one of the thirty-one provinces of Iran, could have been infected by a merchant who had reportedly travelled from China [2] . However, with COVID-19, identifying the so-called 'patient zero' is problematic due to high rates of asymptomatic and pauci-symptomatic infections [3] . As an example, the COG-UK consortium has shown that there have likely been more than one thousand unique introductions of the virus into the UK, possibly many of which are from individuals who are asymptomatic or show mild symptoms [4] . Therefore, the most likely route of virus spread from China to Iran, and indeed many other countries, was via asymptomatic, presymptomatic, or mildly symptomatic travellers. Infected individuals typically show no symptoms for about 5 days [5] [6] [7] or sometimes no symptoms at all [8] [9] [10] , while silently spreading the virus [11] . Using aggregate data from the influenza surveillance networks has been proposed as one of the ways to detect early arrival of SARS-CoV-2 into a country [12] . However, a major flu outbreak in Iran which reportedly stretched from November 2019 to early January 2020 may have conflicted with the early diagnosis of COVID-19 patients across the country, thereby hindering control and diagnosis efforts [13, 14] . By the time the first cases were reported, the outbreak was large enough to overrun the testing capacity of the Iranian health system, similar to a pattern seen in other countries that were caught off-guard by the rapid spread of the virus outside China. The Iranian COVID-19 epidemic is an interesting case study because Iran has not only been one of the first countries to face an outbreak, but also acts as a major source of the spread in Central Asia and the Middle East during the early stages of the pandemic. While struggling with under-reporting issues, the Iranian health system has been able to collect informative clinical and epidemiological data, which can be combined with external sources to improve our understanding of the likely epidemic dynamics in other Central Asian countries. It has also been the first country to relax public health measures by reopening businesses and public facilities. This untimely relaxation of NPIs led to a second COVID-19 incidence, showing, for the first time, that the so-called "second wave" was not just a theoretical prediction. In this study, we provide a detailed analysis of the COVID-19 outbreak in Iran. By gathering genetic and air travel data from passengers with a link to Iran, we estimate the start date of the epidemic and its prevalence across the country. Coupling this with the estimated number of excess deaths and other key clinical and epidemiological information, such as the age-stratified infection fatality ratios and delays from onset of infection to death, enables us to reconstruct the full transmission dynamics of COVID-19 despite significant levels of under-reporting in prevalence and deaths. First, we use epidemiological and genetic data from air travellers with a travel history to Iran along with the first whole genomes from inside the country to determine the start date and early growth rate of the epidemic. Next, we provide a province-level analysis on the pattern of spread during winter and spring, highlighting the degree of under-reporting in both point prevalence and deaths. We also assess the risk of importation of infected individuals from China into the country. Finally, we evaluate the impact of non-pharmaceutical interventions (NPIs) on the growth rate of the epidemic and use an SEIR model to reconstruct the early transmission dynamics in the country. We use the time series data for the number of confirmed cases and deaths (see Data S2) from the Johns Hopkins University Centre for Systems Science and Engineering COVID-19 GitHub repository (accessed on 2020-06-04). We also obtain time series data on confirmed cases in all the 31 provinces of Iran from the Ministry of Health website (behdasht.gov.ir). We note that the Ministry stopped releasing province data from 2020-03-23 onward. They also did not release province data on 2020-03-02 and 2020-03-03. We obtain seasonal mortality statistics from the National Organization for Civil Registration (NOCR) of Iran (sabteahval.ir). From a globally representative sample of 802 genomes, including 20 from passengers with a travel link to Iran and additional three genomes that we sequenced from inside Iran, we construct a maximum-likelihood tree using RAxML [15] to examine the number of independent introductions into the country from the start of the pandemic until January 2021. The samples are randomly selected from 11 major clades as identified by NextStrain in 2020 (i.e., 19A, 19B, 20A, 20B, 20C, 20D, 20E, 20F, 20G, 501Y.V1, and 501Y.V2). We enrich the sampled set with 143 samples from clade 19A and 157 samples from 20A to ensure that there is enough resolution in identifying the lineages of interest related to samples from Iran. We bootstrap the maximum-likelihood tree 100 times and use an HKY+G substitution model [16] to construct the tree. All the genomes were downloaded on 2021-01-21 from GISAID's EpiCoV TM Database (gisaid.org), a public database for rapid data sharing hosted by the global initiative on sharing all Influenza dataa complete metadata table with detailed information about all the samples and acknowledgements of all of the authors and institutions involved in sequencing is available in Data S1. We use MUSCLE v3.8.425 [17] for sequence alignment and mask sites in the first and last 130 bp of the alignments. We only include complete sequences (>29 Kbp) with high coverage as determined by GISAID's default search options. To jointly infer the Time to the Most Recent Common Ancestor (TMRCA) of the earliest genomes linked to Iran and the early doubling times in cases from a monophyletic tree using J o u r n a l P r e -p r o o f BEAST [18] , we first remove genomes that are epidemiologically linked. We then use the known sampling times with a Continuous-Time Markov Chain reference prior on the substitution rate, an exponential population coalescent model with a log-normal prior on size (mean=1, SD=2), Laplace prior on growth rate (scale=100), and an HKY+G substitution mode [19, 20] and allow the Markov chain Monte Carlo to run for a hundred million steps and discard the first 10% steps as burn-in to find the parameters of interest. Effective sample sizes of all parameters are >1,000 ensuring that they are well-sampled [21] . We use FigTree for tree visualisations [22] . . To find the probability, () pt , that a traveller on board the plane is infected on a given day, t, we assume only asymptomatic or pre-symptomatic cases are on board the plane. This probability is given by where  is the inverse of the proportion of ascertained infections (e.g. 1   implies complete ascertainment), () Sk is the CDF of the time from infection to symptoms (incubation period) or to detection (incubation period plus delay from onset of symptoms to having a positive RT-PCR test in an infected individual), () it is the incidence on day t, and M is the catchment population size. The probability that the infected individual is detected upon travelling is given by where () nt is the number of infected air-travellers that are detected on day t, total number of cases on day t, ( ) / ( ) D n t p t  is the daily number of outbound international travellers, and T is the mean time from exposure to detection. Given that the incidence over time in Iran is unknown, we made the approximation that () pt only depends on the incidence on day t such that ( ) ( ) / p t Ti t M   . Finally, we estimate the total number of cases, i  , by finding the maximum likelihood of a binomial function with the number of successful outcomes, i N , and success probability, i  , for country i, given by is the total number of imported cases in country i and P is the proportion of undetected asymptomatic and mildly symptomatic air travellers. By taking the point-prevalence estimates for each country, we can find a line of best fit for the prevalence over time, assuming an exponential regression model, to estimate the start date of the epidemic and early doubling time in cases. We consider air travel data from passengers in four of the busiest airports in Iran (Tehran, Mashhad, Isfahan, and Shiraz) who flew to Oman, Lebanon, UAE, Kuwait, and China (see Data S3). We allow for 50% (20% -100%) of the 55 million residents in provinces near the airports to represent their catchment population size (see Data S4). There is typically 5 (4-6) days of incubation period [5-7, 23, 24] and an additional 5 (3-7) days of delay from symptom onset to detection [24, 25] . We further assume that an additional 45% (30%-55%) of asymptomatic and 15% (10%-25%) mildly symptomatic cases were among the undetected exported carriers of COVID-19 [26, 27] . We use the publicly available data from the NOCR which records the all-cause registered deaths in Iran per province per season according to Solar Hijri (SH) calendar [28] . We assemble this data for the last 5 years to compare the excess mortality during winter and spring 2020 to previous years. Excess deaths refer to the number of deaths above expected seasonal baseline levels, regardless of the reported cause of death, and can be used as a nonspecific measure of the severity of the epidemic and provide a more accurate measure of its burden on the healthcare system [29] . We analyse the NOCR data from 1394 SH to 1399 SH (from 2015-12-22 to 2020-06-20 in the Gregorian calendar). For each province, we apply a least-square regression model on the seasonal deaths from previous years to find the 'expected' seasonal death in 2020 (see Fig. S2 and S3) and then calculate the difference with respect to the observed death toll for each province to calculate excess mortality (see Table S2 and S3). We attribute excess deaths in those provinces with significant deviations (i.e., two standard deviation units) with respect to their expected seasonal value to COVID-19related deaths. We define the importation risk as the mean number of infectious individuals travelling from China to Iran over the span of approximately one month, from when China started reporting cases on 2020-01-22 until when the epidemic in Iran started. We assume importations to Iran only occur via air travel from infected areas in China. The main routes of travel to Iran are from airports in Beijing, Shanghai, and Guangdong. We allow a fraction, f, of passengers to come from the Hubei province. To calculate f we take the ratio of total passenger traffic from airports in Hubei to 100 busiest airports across the country according to the 2018 report from the Civil Aviation Administration of China [30] . To calculate the probability, () pt , that an individual on board the plane from a given province is infected, we correct for ascertainment bias in reported cases in China during the early stages of the pandemic,   1 0.23 95%CI : 0.14 0.42     [31] . We further assume that the catchment population size is equal to the population of the province, the incubation period is log-normally distributed (mean=4.8 days and s.d.=1.9 days) [24] , and the delay from the onset of symptoms to having a positive RT-PCR test in an infected individual is Gamma distributed (shape parameter=2.12 and scale parameter=0.39) [24] . Finally, to calculate the expected number of imported cases, i N , to Iran from 2020-01-22, t=0, to a later date, t*, we have * Hubei Hubei {Beijin,Shanghai,Guangdong} 0 where Di is the average number of passengers flying from province i per day. We use a generalised SEIR model with age-stratified compartments to reconstruct the early transmission dynamics of the epidemic in Iran (covid19-scenarios.org) [31] . In this model, susceptible individuals can be exposed to the virus through contact with an infected individual. They then progress towards the infectious stage where they can either recover without hospitalisation or progress towards severe illness that requires hospitalisation. For the latter group, individuals either recover or transition to the ICU stage at which point they either die or return back to the hospitalisation state (see the schematic diagram of the model in Fig. 4a) . We use the line of best fit for prevalence over time based on our air travel analysis on exported cases to parameterise the basic reproduction number, 0 R , for the SEIR model. To do so, we fix the start date of the epidemic according to the line of best fit and re-fit the exponential regression model to the pointprevalence estimates to capture the variation in growth rate and estimate 0 R using the following equation: is the generation time, and I T is the infectious period (see Tables 2 and 3 ) [32, 33] . The effective reproduction number, Rt, is modelled as a piecewise constant function that changes immediately upon a new NPI. The expected reduction in Rt upon the implementation of new NPIs is bound to be within the same range as estimates (of the same NPI) from other studies [34] [35] [36] and the exact value of the mean Rt is found by finding the line of best fit that matches the mean cumulative deaths with our estimates for COVID-19-related deaths based on excess seasonal mortality in winter and spring. The lower bound in the effectiveness of each NPI is set such that the modelled cumulative deaths does not drop below the confirmed COVID-19-related deaths as reported by the Ministry of Health and Medical Education (MoHME) from the first day of case reporting, 2020-02-19, until the day that the last NPI, mandatory face-covering, went into effect on 2020-07-05. The SEIR model, covid19scenarios.org, allows the user to set the range of transmission reduction during each intervention. Therefore, by fixing the mean and lower bound of Rt, the upper bound can also be found. From the time of the announcement of the last NPI (i.e., mandatory face covering) until the last day of the simulation run, the reconstructed transmission dynamics are solely based on an assumed effectiveness of mandatory face covering which is derived from previous studies (see Table 2 ). Thus, any inference beyond 2020-07-05 should be treated as part of the model's projection of the actual dynamics. The duration of hospital/ICU stay is taken from the 2020-03-14 report by MoHME (Fig. S3) [37]. We also assume all infected individuals are immune to reinfection over the course of the simulation. First reported cases of COVID-19 in Iran were of two infected individuals in the province of Qom on 19 February. In only a few days, reported cases grew rapidly in almost a dozen provinces (see Movie S1 and Figure 1a ) which raised concerns that the virus had already been widespread by the time the first two cases were detected in Qom. During the early days of the epidemic, Pasteur Institute of Iran (PI) was the primary source of testing suspect COVID-19 cases and had enough resources to perform only a few hundred RT-PCR tests per day. By mid-March, WHO and the Chinese Government delivered several shipments of emergency medical supplies along with more than 200,000 additional test kits which helped with the diagnosis of more hospitalised patients [40] . Nevertheless, some of the preliminary analyses showed that Iran was only reporting less than 10% of its symptomatic cases during its first peak in late March [41] . The extent of underreporting of COVID-19 cases and deaths also led to speculations on the possibility of data manipulations, which are however not substantiated by an analysis based on Benford's law (see supplementary materials). From early April, as the country ramped up its testing capacity, MoHME started to report outpatients and carried out limited levels of contact tracing in several provinces (see Fig. 1b ). However, despite the growing testing capacity, since the government announced plans for re-opening high-risk activities in late May, the second peak in new cases emerged and the number of outpatients started to shrink over time. This raised concerns that, like the first peak when hospitals in many provinces were at near maximum capacity, the degree of under-reporting in both prevalence and deaths have risen substantially higher again. Another point of concern is that as the delay from the onset of symptoms to testing increases, the probability that an infected individual tests positive likely decreases. This is because by the time tests are taken from hospitalised cases of COVID-19, there could be typically 3 to 7 days past since the onset of their symptoms and many could be clearing the infection by then. As a result, a significant portion of their tests (up to 50%) may come out as negative [5, 42] . Since the virus spread undetected during the initial phase of the Iranian epidemic, no epidemiological information is available during this period. Molecular epidemiology provides a powerful tool to reconstruct the early epidemic trajectory a posteriori, especially when direct epidemiological data are absent, incomplete or biased. The report on 2020-03-16 by NextStrain and findings from returned travellers to Australia and New Zealand showed that sequences from cases with a travel history to Iran correspond to a distinct clade of SARS-CoV-2 [43, 44] . We first construct a maximum likelihood phylogenetic tree of all sequences that were linked to Iran (see Fig. 2a ) to identify the most basal clade (i.e., earliest introduction) related to genomes from Iran. After identifying the basal clade, we use these genomic samples to determine the TMRCA of the clade and characterise the initial epidemic growth rate in Iran. Consistent with other studies [20] , our inferred substitution rate is 3 1.06 10   (95% Highest Posterior Density (HPD): 53 9.9 10 2.14 10     ) per site per year and the exponential growth rate of 41 in units of years, corresponding to a doubling time of around 6.14 (95% HPD: 3.34 -38.33) days. The age of the root is placed on 2019-12-21 (95% HPD: 2019-09-07 -2020-02-14), nearly two months before when the first cases in Qom were reported. High levels of under-reporting have been noticed in several countries during the early phase of the COVID-19 pandemic. Such under-reporting creates biases that hinders direct estimates of the actual incidence of the disease and are likely to be present in MoHME's report as well. A more reliable estimate of the size of the Iranian epidemic can be obtained from the number of exported cases detected abroad [45] . In late February, a total of 8 internationally exported cases of COVID-19 with direct flights from Iran were detected in Lebanon, UAE, Oman, and Kuwait. Similarly, in early March, an additional 28 cases were identified in China (see Table 1 ). By finding the approximate number of passengers per week travelling from Iran to each of these countries, we use a binomial likelihood estimation (see Methods section) to find the nationwide incidence on the dates when the exported case was reported (see Fig. 2b ). We then use an exponential regression fit to the estimated nationwide incidence and find that the outbreak size by 2020-02-25 and 2020-03-06 to be 45,981 (95%CI: 1 -96,840) and 258,661 (95%CI: 171,443 -345,879) , respectively, which is aligned with estimated outbreak size in late February from other studies [46, 47] and is more than 100 times higher than confirmed cases that MoHME reported at the timesee the sensitivity analysis on the parameters of our model in Table S1 . These estimates also suggest a doubling time of 4.0 (95%CI: 1.4 -6.7) days, which is close to the growth rate in the initial phase of European COVID-19 epidemics [48] . We then use this range of doubling times to extrapolate the number of cases back in time and find 2019-12-25 (95%CI: 2019-12-11 -2020-02-24) to be the approximate start date of the epidemic, which is also well-aligned with the estimated root age of our phylogenetic analysis. Given the high levels of under-reporting, excess mortality data represent a more reliable source to estimate for COVID-19-related deaths. By analysing the seasonal reports from NOCR on provincelevel all-cause deaths, we find that five provinces in winter and twenty-eight provinces in spring 2020 had significantly higher recorded deaths compared to previous years (see Fig.3a and 3b ). During winter, the five provinces in central and northern Iran (Qom, Gilan, Golestan, Qazvin, and Mazandaran) showed a 24% rise in mortality with 3,558 (95% CI: 3,171 -3,949) deaths in excess of previous years. In spring, our estimates show a 22% increase with a total of 18,359 (95% CI: 13,506 -23,212) excess deaths in twenty-eight provinces compared to previous years. Qom and Gilan, two of the hardest-hit provinces in winter, show a lower percentage of excess deaths in spring possibly indicating that the largest part of their outbreak occurred back in winter while in many other provinces such as Mazandaran, Khuzestan, and Qazvin the excess deaths continued to increase during spring. This is also corroborated by numerous reports from various provinces across the country and is aligned with the fact that the first nationwide epidemic peak occurred in late-March. Given the debate on travel restrictions, it is worth assessing what was the actual early-stage risk of COVID-19 importation to Iran. Our analysis shows that the expected number of imported cases into the country from 2020-01-22 to 2020-02-14 (the upper bound of our TMRCA estimate for the start date of the epidemic in the country) was 0.17 (95%CI: 0.04 -0.28) which suggests that the Iranian epidemic was most likely seeded by one or a few imported cases flying from China. The fact that the epidemic in Iran started despite the low number of importations suggests a strong overdispersion in transmission and that the infected individuals likely belonged to a high-risk subgroup of the population, such as businesspeople, politicians, or other individuals who frequently travel to the country and tend to meet a large number of contacts [49] . We first evaluate the effectiveness of key NPIs (see Methods section) implemented on set dates in Iran and their impact on reducing the effective reproduction number, Rt (Table 2 and Figure 4a ). Our analyses indicate that while intervention measures, particularly the nationwide lockdown during the New Year's holiday month, were effective in curbing the spread (Rt < 1), the reopening J o u r n a l P r e -p r o o f of high-risk businesses in May led to the emergence of a new peak of infection across the country (Rt > 1). They also suggest that mandatory face covering in public spaces coupled with social distancing measures and other health and safety protocols can be effective enough to stop the exponential growth pattern (Rt <1) and reduce the transmission rates by late-July/early-August conditioned on having no other major change in NPIs or social behaviour (e.g., travelling for holidays or mass gatherings) during this time period. We show that if this intervention is 70% (95%CI: 60 -85) effective, i.e. at least 60% effective, it can reduce Rt from 1.4 (1.2 -1.5) to 0.9 (0.7 -1.0) which is enough to curb the spread. This corresponds to a relative reduction of 0.64 (95%CI: 0.58 -0.80) in Rt which is in agreement with other studies reporting the reduced risk of transmission for mask wearers compared to a control group [50] . Our model successfully recreates the nationwide trend of the outbreak until mid-July and forecasted the insurgence of the second epidemic peak (see Figure 4c and 4d) . We predict that a total of 15. Countries in different regions of the world with a lack of adequate epidemic surveillance systems can suffer greatly from under-reporting of prevalence and deaths associated with an infectious disease epidemic and, as a result, may not have a timely public health response to contain the spread or closely monitor local outbreaks. In this work, we provided a number of approaches that can help to understand the pattern of spread and to gauge the level of under-reporting in a region, in this case Iran, where detailed epidemiological data from local surveillance is insufficient. We combined air travel and genomic data from confirmed cases with a history of travel to the country to estimate the true prevalence and growth rate of the epidemic. Our analysis shows that it is possible to reliably infer the epidemic size and its early growth dynamics by combining indirect sources of information. Phylogenetic analysis (clades with genomes related to Iran), coupled with epidemiological (cases with a recent travel history to Iran in late February and early March) and clinical data (date of symptom onset), further helped with identifying the earliest introduction of SARS-CoV-2 into Iran and estimating the start date and early growth rate of the epidemic in the country. The overall low importation rate of new cases from China suggests that importation to Iran likely occurred via a high-risk individual with frequent travels to the country. It also suggests that risk of importation from travellers is higher than expected, most likely because of their mobility patterns and high number of social contacts. By further combining our epidemiological and phylogenetic analyses, we were able to fully reconstruct the outbreak of SARS-CoV-2 across the country. The first version of this manuscript made a clear prediction on 2020-04-23 about the risk of a second peak due to relaxed NPI measures, based on well-accepted models for nowcasting and forecasting of COVID-19 epidemics [51] . Iran was in fact the first country to experience a second epidemic peak from the beginning of May (Figure 1) , confirming both the validity of our epidemic modelling and the relevance of NPIs to control the spread. Our SEIR modelling analysis shows that by 2020-06-10 7.7% (95%CI: 2.18 -13.9) of the population have recovered from COVID-19 which is in general agreement with seroprevalence measurements of SARS-CoV-2 in the general population of 18 cities across the J o u r n a l P r e -p r o o f country by early June [52] . This also implies that a large fraction of the population is still vulnerable to contracting COVID-19which has significant implications both for the possibility of the virus becoming endemic to the country and its likely return during winter this year which, if coupled with seasonal flu, can significantly overwhelm the hospitals. Furthermore, the continuation of underreporting in prevalence due to limited testing of suspect cases and tracing their contacts will likely lead to several undetected superspreading events that can spark new outbreaks in different parts of the country making nowcasting and forecasting of the COVID-19 epidemic in Iran extremely challenging. Furthermore, it highlights the major risk imposed by exported cases from any region of the world with an uncontrolled outbreak of COVID-19 in seeding new epidemics in different countries. Thus, it is prudent for countries that have successfully controlled the epidemic to implement more careful screening of travellers from such regions of the world to avoid new bursts of epidemic in their countries. The combination of air travel and genomic data with the all-cause seasonal mortality data enabled us to assess the level of under-reporting in cases and deaths across the country by comparing them to surveillance data. We assessed the internal consistency of reported numbers by MoHME (both in daily reported cases and deaths due to COVID-19) using Benford's law. However, this test alone cannot be used to rule out some systematic or random patterns of absence of data on case or death counts, e.g. from certain hospitals. The number of confirmed COVID-19 cases in each country may vary depending on the transparency to report correct statistics and the capacity of the healthcare system to detect new cases. The latter also depends on the accuracy of laboratory test kits and accessibility of diagnostic and screening tests. Indeed, many countries with limited testing capacities may have to prioritise testing that informs policy decisions, e.g. they may not test suspected cases with mild symptoms or those who are asymptomatic and are a close contact of a confirmed case. As a result, the true number of cases is always significantly higher than those reported by the health agencies. In addition, the burden of the epidemic can significantly impact the performance of the healthcare system to properly allocate cause of deaths in individuals with underlying health conditions such as diabetes or heart disease. Therefore, tracking all-cause registered deaths and estimating excess mortality during the outbreak provides a more sensitive measure of COVID-19-associated deaths than would be recorded by counting confirmed deaths. We note that all-cause deaths may also include factors that are not causally associated with SARS-CoV-2 that might affect death rates such as the circulation of the 2019-20 seasonal influenza. According to the United Nations Statistical Division the estimated coverage of registered deaths in Iran is 92% which could potentially be suffering even more from under-counting during the peak of the outbreak when there are likely going to be further delays in death registrations. Also, excess seasonal mortality does not take into account the catalysing role of COVID-19 in deaths among individuals with underlying comorbidities who would have likely died during a particular season even without contracting COVID-19 (as part of the 'background' deaths). Our data have many limitations. We investigated air travel data only to countries with direct flights to Iran and discarded information from detected cases in countries such as Qatar and Canada since we were not able to independently verify the fraction of passengers on board the planes that travelled from Iran to those countries. Also, given the lack of mobility data from Iran, we were unable to investigate possible international exportation of cases to Afghanistan, Iraq, Syria, Azerbaijan, Turkey, and other countries with significant flow of ground transportation (i.e. trains, buses, and cars) from Iran. We did not have access to the province-level number of confirmed COVID-19 deaths which is a significant source of information to assess excess deaths in the winter and spring 2020. Also, there is likely extreme heterogeneity in the geographical spread of COVID-19 across the country due to various factors such as demographic structure of the provinces' populations, the pattern of social contacts between age groups, and the quality of healthcare and effectiveness of NPIs in different districts. Despite these drawbacks, our approach successfully integrated multiple sources to obtain a more reliable picture of the COVID-19 epidemic. This kind of study should be extended to countries with unreliable local epidemic surveillance systems in order to evaluate the local stage of the epidemic, inform local policies and highlight countries in need of international support to control the epidemic. Fig. S1 ) . The scale bar shows the mean number of substitutions per site. (B) Estimating the start date of the epidemic and its initial growth trajectory using an exponential regression fit based on the likelihood analysis on exported cases to UAE, Lebanon, Kuwait, Oman, and China (see Table 1 ). The blue dashed line and gray area show the line of best fit and its corresponding mean prediction bands, respectively. Table 2 and Table S4 ). (C) Shows the estimated number of weekly infections (orange line), deaths (blue line), confirmed cases (orange dots), and confirmed deaths (blue dots). (D) Shows the estimated cumulative deaths (gray line), total number of individuals recovered (green line), cumulative reported deaths (gray dots), estimated excess mortality during winter and spring (red dots). Lines represent model predictions and shaded areas are their 95% confidence intervals. Open circles show the data points that are not used for fitting to the model. *based on annual reports on vital statistics from the Statistical Center of Iran (amar.org.ir). †We use the statistics from the Chinese CDC [55] for age-stratified IFR which are also broadly compatible with estimates from other studies [56] ‡Latency period is the delay from infection to onset of infectiousness. Infectiousness typically starts from 2.5 days and peaks at 1 day before the onset of symptoms. Together with infectious period, latency period sets the serial interval [11] . COVID-19): situation report, 176. 2020, World Health Organization How Iran became a new epicenter of the coronavirus outbreak Substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (SARS-CoV-2) Preliminary analysis of SARS-CoV-2 Importation & Establishment of UK Transmission Lineages Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia The Incubation Period of Coronavirus Disease 2019 (COVID-19) From Publicly Reported Confirmed Cases: Estimation and Application Estimating the generation interval for coronavirus disease (COVID-19) based on symptom onset data Prevalence of Asymptomatic SARS-CoV-2 Infection A Narrative Review The Role of Asymptomatic SARS-CoV-2 Infections: Rapid Living Systematic Review and Meta-Analysis. medRxiv Presumed Asymptomatic Carrier Transmission of COVID-19 Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing Using influenza surveillance networks to estimate statespecific prevalence of SARS-CoV-2 in the United States World Health Organization, Influenza situation in the Eastern Mediterranean Region, W11/2020 In search of the murder: Making sense of Iran's reported deaths, in Medium RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies Dating of the Human Ape Splitting by a Molecular Clock of Mitochondrial-DNA MUSCLE: multiple sequence alignment with high accuracy and high throughput Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10 Phylogenetic analysis of nCoV-2019 genomes. Virological.org Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7 Temporal dynamics in viral shedding and transmissibility of COVID-19 Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo'. Nature Estimating pre-symptomatic transmission of COVID-19: a secondary analysis using published data. medRxiv, 2020. 28. National Organization for Civil Registration, Registered Deaths by Province Preliminary Estimate of Excess Mortality During the COVID-19 Outbreak Civil Aviation Administration of China, 年民航机场生产统计公报 Reconstruction of the full transmission dynamics of COVID-19 in Wuhan Model-consistent estimation of the basic reproduction number from the incidence of an emerging infection A practical generation-interval-based approach to inferring the strength of epidemics from their speed Estimating the effects of non-pharmaceutical interventions on COVID-19 in Europe Estimating COVID-19 cases and reproduction number in Brazil. medRxiv State-level tracking of COVID-19 in the United States WHO Director-General's opening remarks at the media briefing on COVID-19 WHO ships emergency medical supplies to the Islamic Republic of Iran Reconstructing the early global dynamics of under-ascertained COVID-19 cases and infections Estimating the false-negative test probability of SARS-CoV-2 by RT-PCR. Eurosurveillance Genomic analysis of COVID-19 spread An emergent clade of SARS-CoV-2 linked to returned travellers from Iran Report 2: Estimating the potential total number of novel Coronavirus cases in Wuhan City Estimation of Coronavirus Disease 2019 (COVID-19) Burden and Potential for International Dissemination of Infection From Iran Preliminary estimation of the novel coronavirus disease (COVID-19) cases in Iran: A modelling analysis based on overseas cases and air travel data Challenges in control of Covid-19: short doubling time and long delay to effect of interventions Transmission heterogeneities, kinetics, and controllability of SARS-CoV-2. Science Modeling COVID-19 scenarios for the United States COVID-19 Scenarios: an interactive tool to explore the spread and associated morbidity and mortality of SARS-CoV-2 SARS-CoV-2 antibody seroprevalence in the general population and high-risk occupational groups across 18 cities in Iran: a population-based cross-sectional study Temporal dynamics in viral shedding and transmissibility of COVID-19 Suppression of a SARS-CoV-2 outbreak in the Italian municipality of Vo The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China. Chinese Center for Disease Control and Prevention Estimates of the severity of coronavirus disease 2019: a model-based analysis TreeAnnotator v2.6.0 The law of anomalous numbers Newcomb-Benford law and the detection of frauds in international trade Quick Anomaly Detection by the Newcomb-Benford Law, with Applications to Electoral Processes Data from the USA, Puerto Rico and Venezuela The effective use of Benford's law to assist in detecting fraud in accounting data Data Quality of Chinese Surveillance of COVID-19: Objective Analysis Based on WHO's Situation Reports This includes a list of all countries with confirmed cases of COVID-19 travelling from Iran via direct flights (see Data S3 for more information) Acknowledgments: Funding: MG and SD are funded by the Biotechnology and Biological Science Research Council (BBSRC), grant number BB/M011224/1. Author contributions: MG, AL, and AK conceptualised and developed the work. MG wrote the original draft and all other authors reviewed and edited the manuscript. BH and MG investigates data manipulation and carried out the Benford analysis. AKar collected the air travel data for confirmed exported cases. AKar, LF, AL, and MG analysed the air travel data to measure incidence across the country. AKad and MG collected and analysed the excess mortality data. MAK, MA, and SZ collected and validated three genomic samples from Iran. AK and MG carried out the phylogenetics analysis. MH, SD, and MG collected the data regarding daily reports on testing and set dates for non-pharmaceutical intervention measures. Competing interests: Authors declare no competing interests. Patient and public involvement: Patients or the public were not involved in the design, or conduct, or reporting, or dissemination plans of our research. Data and materials availability: All data and codes used for the analysis are available online on our GitHub repository (github.com/mg878/Iran_study). *This intervention is followed by the closure of all business across the country due to the Iranian New Year's holiday period which started in the week leading to 2020-03-19 and has the highest impact on reducing Rt. †Older interventions end as soon as a new one takes into effect. ‡We assume that the relative reduction in Rt because of this NPI is aligned with the estimated odds ratio of 0.6 (95%CI: 0.5 -0.8) in reducing risk of transmission for mask wearers compared to a control group from other studies [50] ..