key: cord- -oj v x authors: catala, m.; pino, d.; marchena, m.; palacios, p.; urdiales, t.; cardona, p.-j.; alonso, s.; lopez-codina, d.; prats, c.; alvarez lacalle, e. title: robust estimation of diagnostic rate and real incidence of covid- for european policymakers date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: oj v x policymakers need a clear and fast assessment of the real spread of the epidemic of covid- in each of their respective countries. standard measures of the situation provided by the governments include reported positive cases and total deaths. while total deaths immediately indicate that countries like italy and spain have the worst situation as of mid april , on its own, reported cases do not provide a correct picture of the situation. the reason is that different countries diagnose diversely and present very distinctive reported case fatality rate (cfr). the same levels of reported incidence and mortality might hide a very different underlying picture. here we present a straightforward and robust estimation of the diagnostic rate in each european country. from that estimation we obtain an uniform unbiased incidence of the epidemic. the method to obtain the diagnostic rate is transparent and empiric. the key assumption of the method is that the real cfr in europe of covid- is not strongly country-dependent. we show that this number is not expected to be biased due to demography nor the way total deaths are reported. the estimation protocol has a dynamic nature, and it has been giving converging numbers for diagnostic rates in all european countries as of mid april . from this diagnostic rate, policy makers can obtain an effective potential growth (epg) updated everyday providing an unbiased assessment of the countries with more potential to have an uncontrolled situation. the method developed will be used to track possible improvements on the diagnostic rate in european countries as the epidemic evolves. 
the evolution of the epidemic in europe has affected spain and italy more strongly than in other countries so far. this is clear from reported cases and fatalities in these however, they lack the recipe-type nature needed sometimes to direct a policy response. the focus of this paper is, thus, to introduce a method to compute the real diagnostic rate and the real incidence of covid- in each european country, testing that the key hypothesis of the method is fulfilled and that, if they were to be slightly off, they would affect all countries in the same direction. in other words, we provide a recipe for policymakers that we have tested to be correct, unbiased across countries and useful to make cross-country comparison provided the evolution and prognosis of the disease in a patient is not strongly dependent on socio-economic factors and only on age, sex and previous clinical history. we must recall here that the ability to determine the diagnostic ratio is essential to evaluate what the real number of infected people is. knowledge of this number is not only useful to visualize the full scope of the epidemic but also to properly estimate the number of people with probable short-term immunity. in this sense, our method can be added as an empirical take of other assessments about the real incidence of the disease and to study the possibility of developing herd immunity. a large number of real infected people would be a positive scenario for policymakers while a low number will be negative. it is thus very important to err on the side of caution in all our estimates giving always the less optimistic take. the basic structure of the paper is the following. first, we give a general overview of our framework in the methods section. then we discuss our key assumption: the real case fatality rate (cfr) in european countries experiencing a significative incidence will be roughly the same, given the similar structure of the population. 
if the real cfr were to be lower, or higher, it would affect all countries in the same way and would not affect most policy decision-making since it will move all countries in the same direction. we take this real cfr to be % and proceed to test that, effectively, there is a strong correlation between the day of reported deaths with the number of cases taken - days before. once a given value for the real cfr is taken, one must consider that people do not die immediately from the disease, as it takes roughly days after infection [ ] [ ] [ ] . in other words, the present values of the death toll can provide an estimation of the number of infected people days ago. knowing the number of infected people at present, not days in the past, is crucial. we attack this problem considering that people who become infected are usually diagnosed a few days after the onset of the symptoms, which can be to days after infection occurs. by comparing the number of people diagnosed on a certain date with our estimation of the real number of infected people, we can estimate what percentage of the cases are being diagnosed. we can calculate this for different countries and regions and test how this ratio has changed dynamically as the epidemic advanced. in the results section, we provide a full detailed description of how this fraction has become steady in the last weeks. we demonstrate that the percentage of diagnosis throughout the development of the epidemic has taken values that gradually converge for most countries. this gives a final clear picture showing the rate of diagnosis for each country. using this rate is straightforward to give a present-day estimate of the incidence given the number of reported infected people in each country as long as we can observe that the rate of diagnosis remains fairly constant. 
for policymakers, we have constructed an index named effective potential growth (epg) that combines this information with the growth rate of the epidemic to provide insight regarding which countries are, comparatively and in the short-term, in the most potentially complicated situation [ ] .

framework of our methodology

our analysis will be applied to european countries with a minimum of deaths on april so that we can guarantee a minimum statistical significance. the analyzed countries are: belgium, france, germany, italy, netherlands, portugal, spain, sweden, switzerland and united kingdom. our two core assumptions are that the real cfr in all european countries is roughly the same and that the reporting of deaths due to covid- is uniform in all european countries under consideration. we will address these two hypotheses in the following sections. with these assumptions we need to carry out four steps, as indicated in fig. , to obtain the percentage of diagnosis. first, using a common reference cfr = % and given the reported death count, we estimate the number of cases days ago. according to medical reports people die between and days after the development of the first symptoms [ ] . this time to death, ttd, after the development of the first symptoms will not be country-specific for demographic reasons. the estimated number of infected people with the disease at time t (see process in fig. ) is obtained by dividing the cumulative number of reported deaths at time t + ttd by the cfr. this allows us to estimate the number of cases days ago. this value can be compared with the number of cases detected days ago, obtaining a diagnostic rate. depending on the availability of tests, saturation of the health system and other external factors, countries show a great variability in the delay of diagnosis. countries accumulate some delay that may reach days in the case that a country detected people as late as they were detected on death.
this delay to detection (dd) due to lags in diagnosis corresponds to the time between the patient having the first symptoms and being reported by the health system. in fact, this time may vary in some countries throughout the course of the infection. therefore we cannot assume that the estimated and the reported cases are comparable and we need to know what the diagnostic time was for each of the countries studied. we can compare the reported deaths with the reported cases to find the maximal correlation, see process in fig. (a) , to estimate the dd, see process in fig. (a) . finally, the ratio between the reported cases at dd and the estimated cases, see below, provides an estimation of the percentage of diagnosis, see process in fig. (a) . note that the usual development of the reporting of a new case/death, see fig. (b), depends on the particular country under consideration, which determines dd. in fact, dd also includes a delay in reporting the diagnosis or the death to official information systems. the cornerstone of our analysis is that the real cfr in all european countries will not be biased against any country in particular. we should point out immediately that we are not arguing that there are no important uncertainties in the real cfr. what we do claim and check in this methodology is that these uncertainties will not generate any bias against particular countries and should not affect policy decisions. we take the cfr of covid- in europe to be between . - % and we assume % to be the benchmark scenario. this value ( %) is the cfr observed in the initial stages of the epidemic in south korea and on the diamond princess cruise. in both cases, it was found to be around - . % and, in both, error margins came from different sources [ , ] . in south korea, the ability to test all the population in contact with infected people and the tracking of contagion chains was thorough. despite that, the reported cfr increased from initial values around . - .
% to higher values around %. in the diamond princess cruise, cfr for confirmed cases was % but the estimation of false negatives and the possibility that a fraction of the passengers never developed symptoms and was never tested put the cfr again around %. both south korea and the diamond princess cruise provide complementary evidence, one coming from a natural experiment and another from a country with the ability to perform half a million tests/day from the very beginning of the transmission chain [ ] . if we accept the two measurements of the cfr as independent, the most likely interval of the real cfr is between . and %. recent experimental results from random testing in the german city of gangelt [ ] and preliminary results from iceland [ , ] indicate the presence of a layer of fully asymptomatic people who are normally not detected. this group of people, who have passed the disease without knowing it, seems to be larger than previously thought. these preliminary studies point to a cfr of around . % in zones where the epidemic had not fully spread. we cannot disregard the possibility that, just as the cfr increased with time even in south korea, similar studies in countries with more cases could show a higher real cfr. it is thus reasonable to consider a cfr of % as an easy policy guiding principle and not to use the more positive scenario of . %.

unbiased nature of cfr in europe

there are three possible sources of bias in the cfr across countries. the disease affects elderly people with comorbidity problems more strongly than healthy younger ones, and more men than women. in all european countries the male/female ratio is unbiased except for older people. this is precisely the group with the higher mortality rate. it is thus very important to assess how the different demographic structure of european countries could affect our central benchmark [ ] . the same must be said about the relative prevalence of other comorbidity factors.
we proceed to show that, with the data we have today on the demographic and comorbidity structure, none of these possible sources of bias can have anything but a small effect. to do so, we will do a comparison with the cfr of south korea on april , . %. table shows the demographic structure of south korea and the corresponding cfr for each analyzed age group reported on april . the first row shows the demographic structure according to eurostat, but the analysis has been performed using only the three age groups shown in the second row: ≤ , − and ≥ years. this was done because in many countries reported cases and fatalities use different age groups, and some countries even report these two figures using different age groups. the three age groups considered in the analysis were the only ones that could be applied to all the analyzed countries. as can be observed in the

to analyze the role played by the differences in demography in europe in the covid- cases and fatalities we have downloaded from eurostat the demographic distribution by age (see table ). we can readily assess that, when comparing with south korea, all the countries have a larger percentage of population above years ( % larger for italy) and a larger median age except sweden and united kingdom, but the relative differences in each of the cohorts between the european countries shown in the table are small. only italy presents a notably larger than average ratio of people over . using this demographic data and assuming each european country presents the same cfr by age group as south korea on april , we have computed the cfr for each country. table shows the results of this analysis and the cfr officially reported by the different european countries on the same date. both values are presented relative to the cfr reported by south korea on april , . %.
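the demographic comparison described above can be sketched in a few lines of python. this is only an illustrative sketch: the function names, the age-group labels and all numbers are hypothetical, and it assumes that infections are spread across age groups in proportion to population share, which the text does not state explicitly.

```python
def crude_cfr(population_share, cfr_by_age):
    """Crude CFR obtained by weighting age-specific CFRs with population shares.

    Assumption (not stated in the text): infections are distributed across age
    groups in proportion to each group's share of the population.
    """
    if set(population_share) != set(cfr_by_age):
        raise ValueError("both inputs must use the same age groups")
    total = sum(population_share.values())
    return sum(population_share[g] / total * cfr_by_age[g] for g in population_share)


def relative_cfr(country_share, reference_share, cfr_by_age):
    """CFR expected from a country's demography, relative to the reference country."""
    return crude_cfr(country_share, cfr_by_age) / crude_cfr(reference_share, cfr_by_age)


# Made-up numbers for illustration only (not data from the paper).
cfr_by_age = {"young": 0.0, "middle": 0.01, "old": 0.10}
reference = {"young": 0.5, "middle": 0.3, "old": 0.2}
older_country = {"young": 0.4, "middle": 0.3, "old": 0.3}

ratio = relative_cfr(older_country, reference, cfr_by_age)
print(f"cfr relative to the reference country: {ratio:.2f}")
```

a ratio close to one for every country would support the claim that demography alone introduces only a small bias.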
as can be observed in the first column, when demography is the only difference between countries, between the worst and best case of the relative cfr the differences

april , / . cc-by-nc . international license it is made available under a is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted may , . .

having previously a very bad prognosis. we know this group is strongly affected by the virus [ ] . in blunt terms, we must examine the possibility that different countries are counting the raw number of dead people differently. before entering into the detail of the analysis, let us point out that two indications go against this possibility. first, health care systems in europe can have different resources in different countries with different focus and priorities, but they attend there is a single exception that we know of: belgium [ ] . belgium seems to be reporting unconfirmed cases from nursing homes without tests as due to covid- . it is quite clear that this includes a good number of people who either did not die from covid- or for whom covid- was not an important factor in the prognosis. therefore, we will include a reminder that belgium data is biased compared with other countries, being anywhere from % to % lower given the number of reported deaths from nursing homes compared with hospitals. there is a second argument regarding the treatment of the elderly population in other countries. if there were large undercounting, it should be noticeable in the mortality rate for people years and older, which is not observed in the countries where we have data. in this framework, spain becomes a key country.
if spain were not to have an important undercounting, it is highly implausible to think that other countries would. we proceed to analyze the data of the national epidemiology center (instituto de salud march to april for the whole of spain, they see that, as expected, mortality is much higher than in previous years. an increase of % is observed. however, it is interesting to compare this with the data reported for covid- deaths. the reported deaths by covid- are roughly , depending on how deaths are attributed to a particular day in the calendar. on the other hand, the reported excess of deaths by the momo surveillance system is . we think that the assessment of around % underreporting can indeed be taken as a worst-case scenario for a highly impacted country. it seems reasonable to expect underreporting in other countries to be well below or slightly below this level. all the data available right now indicate that undercounting due to a different treatment of the very fragile population is highly unlikely across europe, and at most introduces changes in cfr around ± %. having shown that the real cfr should not present a bias larger than % in european countries, we now address how to deal with the real sources of bias in the diagnostic rate for each country. to estimate dd we look for a correlation between the number of reported cases (see fig. a ) and the number of reported deaths (see fig. b ) [ , ] . to deal with noise effects we apply a weighted moving average filter to the data of both cases and deaths. the correlation time between reported cases and reported deaths will be named time from diagnosis to death (dtd), and: ttd = dd + dtd. ( )

correlation between reported cumulative cases and reported cumulative deaths exploring different delays between diagnosis (reported) and death for germany (red), spain (green) and switzerland (green). (d) maximum correlation is marked with a red square for each country. % correlation interval can be seen with black bars.
in fig. c we can see the correlation [ ] between reported cases and reported deaths assuming different dtd for germany, spain and switzerland. as you might expect, correlations have values close to . in most cases the correlation has a concave parabolic shape with a clearly defined maximum. we assume this maximum represents dtd for each country. the correlation interval is estimated as the points where the correlation is greater than % of the observed maximum. we decided to set a lower limit of days and a higher limit of days [ ] because we believe that time outside

diagnostic rate by country

as discussed in the methods, we use the same cfr = % in all european countries instead of making small corrections for demography. the bias due to demography was shown to be around - %, precisely the same order of magnitude we obtain for the possible bias in the counting of reported deaths. given that our aim is to provide a clear method for policymakers and that there is no data on how, or even if, both correlate, a common cfr allows us to homogenize the results with the clear limitation that we will obtain reasonable estimations and not exact results. the resulting picture is expected to be closer to reality than using purely reported data, but worse than correcting properly for age and diagnosis if the data of cfr for all age brackets and locations (nursing homes, hospitals, individual homes) were available, which is not the case. the estimation of the diagnostic rate is straightforward. from the cumulative number of deceased each day, and multiplying by ( % cfr) we get the cumulative number of people with symptoms days ago [ ] [ ] [ ] simply by rescaling and displacing backward in time the cumulated death curve of any country.
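the time-displaced correlation used above to obtain dtd can be illustrated with the following sketch. it is not the authors' code: the function names are hypothetical, the smoothing filter is omitted, a plain pearson correlation from numpy is assumed, and the synthetic curves, the range of lags and the fraction defining the correlation interval are illustrative values only.

```python
import numpy as np


def lagged_correlation(cum_cases, cum_deaths, lags):
    """Pearson correlation between cum_cases[t - lag] and cum_deaths[t] for each lag."""
    cases = np.asarray(cum_cases, dtype=float)
    deaths = np.asarray(cum_deaths, dtype=float)
    if cases.shape != deaths.shape:
        raise ValueError("both series must cover the same days")
    result = {}
    for lag in lags:
        if lag < 0 or lag >= len(cases) - 2:
            raise ValueError("lag out of range for the available data")
        x = cases[: len(cases) - lag]  # cases reported `lag` days earlier
        y = deaths[lag:]               # deaths reported on the later day
        result[lag] = float(np.corrcoef(x, y)[0, 1])
    return result


def best_delay(correlations, fraction):
    """Lag with maximal correlation and the range of lags above fraction * maximum."""
    best = max(correlations, key=correlations.get)
    threshold = fraction * correlations[best]
    accepted = sorted(lag for lag, r in correlations.items() if r >= threshold)
    return best, (accepted[0], accepted[-1])


# Synthetic example: deaths follow cases with a 4-day delay (illustrative only).
days = np.arange(40)
cum_cases = 1000.0 / (1.0 + np.exp(-(days - 20) / 3.0))
cum_deaths = np.zeros_like(cum_cases)
cum_deaths[4:] = 0.02 * cum_cases[:-4]

correlations = lagged_correlation(cum_cases, cum_deaths, range(0, 10))
dtd, interval = best_delay(correlations, fraction=0.99)
print("estimated dtd:", dtd, "interval:", interval)
```

given dtd from this procedure, dd follows from the relation ttd = dd + dtd stated in the text.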
to give an initial realistic and homogeneous diagnostic rate we must establish how many days are needed as a bare minimum to be able to detect a patient from the onset of symptoms. first, the patient has to feel sufficiently sick and then contact the health service. from this contact, the doctor needs to suspect that the person has the disease and request a test. then, this test must be available and performed, and the result received and recorded. it is clear that a bare minimum of one week is needed for this process. we use the name -days symptoms and then days forward to be detectable/diagnosable. from this curve, we can obtain the ratio between the cumulated number of people who had symptoms for or more days and the cumulated number of people detected days ago. it is thus clear that this homogeneous analysis across countries could be performed assuming d-dr or d-dr and different cfr. it gives a proper first estimation of the situation. we argue, however, that there is indeed bias in the way people deal with the health care system in normal situations and, especially, under an epidemic. different countries and populations are in fact behaving very differently. we have observed that this is the case in the methods section by checking the delay between diagnosis and death using time-displaced correlation analysis. this is the reason why we also define the delay to detection diagnostic rate (dd-dr) as the diagnostic rate computed using a time delay between the appearance of symptoms and detectability that is different for each country. we proceed to use fig. , with spain as an example, to explain the concept behind dd-dr.
for spain, the maximum correlation between cumulated death curves and cumulated reported cases appears when cumulated deaths are displaced days backward. this suggests a dd of around two weeks ( − = days). this makes sense in a situation like the one in spain during march . a population receiving news that the health care system is under stress may decide to delay reporting symptoms unless they are very serious. additionally, there is the possibility that tests are not available to people who report with symptoms to primary health care centers, and that the delay between the test, its positive result, and its recording in official information systems is not negligible either. it is thus important to correct for this bias in the estimation of the diagnostic rate. a time delay from symptoms to detection of days is clearly not the same as one of . dd-dr can be computed for spain just like we did before for the d-dr, using the same rescaling of the cumulated death curve as before but with a displacement backward of days instead of days. fig. shows how the dd-dr is obtained in different countries depending on the delay between symptoms and detectability. countries with a lower dd, such as germany, have the same d-dr as dd-dr precisely because they diagnose as early as realistically possible. we notice now that both d-dr and dd-dr can be tracked over time: as the epidemic advances we can check how these diagnostic rates change. each new day we can look days back for the d-dr and compute the diagnostic rate. dd-dr can be tracked similarly. in fig. we show the evolution of both as a function of time for three selected countries. we observe that the dd-dr reaches a steady state after the initial stages of the disease while d-dr seems more affected by trends. this can be expected since dd-dr uses, precisely, the maximum of the correlation delay so it is expected to fluctuate less.
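a minimal sketch of how such a diagnostic rate could be tracked day by day is given below. the function name and its arguments are hypothetical, and the values of the cfr, ttd and the detection delay in the example are placeholders rather than the values used in the paper. the same function covers both the fixed-delay rate and the dd-dr: only the `delay` argument changes.

```python
def diagnostic_rate_series(cum_reported, cum_deaths, cfr, ttd, delay):
    """Diagnostic rate for every day on which it can be evaluated.

    cum_deaths[t] / cfr estimates the cumulative number of people who had
    developed symptoms by day t - ttd. Those people are assumed detectable
    `delay` days after symptom onset, i.e. on day t - (ttd - delay), so the
    estimate is compared with the cumulative cases reported on that day.
    """
    if not 0 < cfr <= 1:
        raise ValueError("cfr must be a fraction in (0, 1]")
    if not 0 <= delay <= ttd:
        raise ValueError("delay must lie between 0 and ttd")
    if len(cum_reported) != len(cum_deaths):
        raise ValueError("both series must cover the same days")
    shift = ttd - delay  # corresponds to dtd when delay is the country's dd
    rates = []
    for t in range(shift, len(cum_deaths)):
        estimated = cum_deaths[t] / cfr
        # The rate is undefined while no deaths have been reported.
        rates.append(cum_reported[t - shift] / estimated if estimated > 0 else None)
    return rates


# Placeholder data and parameters, for illustration only.
reported = [10, 20, 40, 80, 160, 320]
deaths = [0, 0, 0, 1, 2, 4]
rates = diagnostic_rate_series(reported, deaths, cfr=0.01, ttd=4, delay=1)
print(rates)
```

a series that settles around a constant value corresponds to the steady state of the dd-dr described in the text.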
the dd-dr is not only more stable but it also allows us to produce a proper assessment of the errors involved. the main one is the fact that the april as above million in spain and close to half million in germany. the table in fig. shows a list of the d-dr as of mid-april of , and the dd-dr, which seems stable, together with the associated error.

to interpret estimated cumulative cases and estimated attack rate we must take into account detection delay, because they are computed using the reported data. data updated on april . belgium data is biased due to reporting of unconfirmed death cases [ ] . best estimations might shift - %

once the diagnostic rate is known, it is straightforward to establish a real incidence no longer affected by the presence of important differences in the time delays to diagnosis in different countries (see the table in fig. ). the level of diagnosis and the real incidence are indeed useful for policymakers since they give a clear general picture. however, the policy response needed to improve the diagnostic rate is limited, in the short-term, by the ability to increase the production of pcr kits and other diagnostic tools. policymakers have more ability to immediately affect mobility patterns and social contact. in this sense, a key number for policymakers would be a reliable and robust estimation of the number of infected people in each country that can propagate the disease. providing an exact number is, right now, impossible. we can, however, produce an index of the effective potential growth using the dd-dr and the guidelines used by the ecdc to track the epidemic.
even if the precise number of people with the disease were known, and the distribution of symptoms by sex and age was reported, there is no clear knowledge regarding the level of infectivity of the different types of person and symptoms. for instance, it is not known for how many days a person with mild symptoms can transmit the disease. the same can be said for people with serious symptoms. virus loads in the throat seem to be rather high across the board [ ] , but data on how this influences contagion is unclear. the only way to assess the situation is to use a general unbiased broad measure, which is indicative of the potential for infection. the ecdc uses the number of newly infected people in the last days [ ] . we use this same criterion. this number can only be obtained properly some days in the past, on the day we have a typical diagnosis. after that, we would need input from new data to properly compute how many people are diagnosed. so the number i is strictly a measure of the recent past, but good enough to give the proper picture that the system will face in the following days.

fig . schematics of the procedure to obtain incidence a , recovered and estimated cases using germany as an example. incidence of estimated cases (blue), contagious incidence (red) and total estimated recovered cases (green). blue shaded part is the number of cases used to compute the estimated contagious incidence. to interpret final number of total cumulative cases, recovered cumulative cases and estimated attack rate we must take into account detection delay, because they are computed using the reported data. similar figures for all countries are shown in si fig.

we also consider those undetected cases which appear earlier than days as recovered r i .
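as a rough illustration of the two quantities just introduced, the sketch below computes a windowed incidence and the corresponding "recovered" count from a series of estimated daily new cases. the function names, the window length and the population scaling are hypothetical parameters chosen for the example and are not taken from the paper.

```python
def contagious_incidence(estimated_new_cases, population, window, per):
    """Estimated new cases in the last `window` days, expressed per `per` inhabitants."""
    if window < 1:
        raise ValueError("window must be at least one day")
    if population <= 0:
        raise ValueError("population must be positive")
    return sum(estimated_new_cases[-window:]) / population * per


def recovered_cases(estimated_new_cases, window):
    """Estimated cases that appeared before the last `window` days."""
    if window < 1:
        raise ValueError("window must be at least one day")
    return sum(estimated_new_cases[:-window])


# Made-up daily series, population and window, for illustration only.
daily = [1, 2, 3, 4, 5]
a = contagious_incidence(daily, population=1000, window=2, per=100)
r = recovered_cases(daily, window=2)
print(a, r)
```

every estimated case falls in exactly one of the two groups, so the two sums always add up to the total number of estimated cases.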
notice that here we use the word recovered loosely. it does not mean literally that all of them are fully recovered, since most of them never fell ill to begin with and some of them may not yet have negative tests, but that those infected and undetected for more than two weeks do not seem to pose a serious risk. a alone, however, does not give a full picture of the situation. it is not the same to have contagious per inhabitants when the number of contacts is high than when the number of contacts is low. it is important to take into account the spreading velocity of the epidemic, related to the effective reproductive number (r t ). the effective reproductive number depends on multiple factors, from the properties of the virus itself to the number and type of contacts. those, again, depend on different social behavior and structure such as mobility, density or the typical size of the family unit sharing a house, to name a few. the only feasible way to estimate r t is using fits from seir models. complex seir models which include spatial and contact processes have a large number of parameters which, due to the present lack of knowledge, are

( ) and epg.ρ ( ) is computed using the mean value for the last three days. epg: effective potential growth described in the text. to interpret table data we must take into account detection delay, because they are computed using the reported data. data updated on april . * belgium data is biased due to reporting of unconfirmed death cases [ ] . best estimations might shift - %.

given the partially empirical nature of present r t , we prefer to take a fully empirical surrogate as a quantitative evaluation of the level of infections. we define an alternative reproductive number as the number of new cases detected today divided by the number of new cases detected five days ago, n t /n t− .
however, the high fluctuations in these quantities impose the use of averaged values over three days [ ] : where n t stands for new cases reported at day t. this rate is one if the number of new cases is constant. it will be below if new cases are decreasing and larger than if the number of cases is increasing. we take days as the key delay unit since this is roughly the time it takes infected people to develop symptoms, if they do develop them. there are still clear day-to-day fluctuations of this measure ρ t due to common delays and irregularities in reporting. most fluctuations can be eliminated by taking the average of ρ t during three days, ρ ( ) , which is normally enough to get a rather smooth measure. it is not uncommon to still find some fluctuations, and one-week averages can be used if required. we propose the following day-to-day index epg: epg is just the multiplication of the growth rate of the disease ρ ( cases is biased by diagnosis protocols and ratios in each country, as well as by the pool of asymptomatic cases. moreover, any attempt to improve the diagnosis percentage requires an economic, infrastructural and logistical effort that is not always possible. in addition, the structure of the health system is a strong constraint that limits the possible actions to carry out in this direction. the reported number of deaths, if uniformly and properly recorded, provides very relevant information as a first general overview. even in countries where there is a bias in death reporting, the effort that should be made to improve this data collection is much lower than the effort necessary to increase data about cases.
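the growth ratio and the epg index discussed above can be sketched as follows. this is an illustration under stated assumptions, not the authors' formula: the averaged equation is not reproduced in the text, so a trailing window (the `span` days ending at day t) is assumed, and the function names and parameters are hypothetical. the lag of five days and the three-day averages are the values mentioned in the text.

```python
def rho(new_cases, t, lag, span):
    """New cases summed over `span` days ending at day t, divided by the same
    sum ending at day t - lag. Trailing windows are an assumption of this sketch."""
    if t >= len(new_cases) or t - lag - span + 1 < 0:
        raise ValueError("not enough data around day t")
    recent = sum(new_cases[t - span + 1 : t + 1])
    past = sum(new_cases[t - lag - span + 1 : t - lag + 1])
    if past == 0:
        raise ValueError("no cases in the reference window")
    return recent / past


def mean_rho(new_cases, t, lag, span, days):
    """Average of rho over the `days` most recent days ending at day t."""
    return sum(rho(new_cases, t - k, lag, span) for k in range(days)) / days


def epg(new_cases, incidence, t, lag, span, days):
    """Effective potential growth: averaged growth ratio times the incidence."""
    return mean_rho(new_cases, t, lag, span, days) * incidence


# Made-up series for illustration only.
flat = [10] * 12
growing = list(range(1, 13))
print(epg(flat, incidence=50.0, t=11, lag=5, span=3, days=3))
print(rho(growing, t=11, lag=5, span=3))
```

with a constant number of new cases the ratio equals one and the index reduces to the incidence itself, which matches the interpretation given in the text.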
the assumption of a common lethality, which has been situated around %, allows for using the cfr as an indicator of real incidence. current information on cfr is still not complete, since many countries do not report the distribution of deaths by age or sex, nor do they provide covid- mortality outside hospitals. however, we argue that the kingdom ( , ), france ( , ) and belgium ( , ). if we estimate the cases that should have been diagnosed by that time, the ranking is led by italy ( , , ) and followed by france ( , , ), spain ( , , ), united kingdom ( , , ), belgium* [ ] ( , ) and germany ( , ). thus, differences in diagnostic rate are highly significant when analyzing the global situation in europe. countries like germany, portugal and switzerland would be diagnosing around % of cases, while belgium, france, sweden and united kingdom would be at the level of %. assessing the risk of countries entering or remaining in the epidemic growth phase is essential. in this sense, the epg index is a valuable tool for policy makers. a high epg in the situation where there is a high growth rate of the epidemic and a large number of active cases is a clear situation of danger, while a very low epg because both the

reported epg vs estimated real epg. different european countries in terms of the epg computed using the reported data on the attack rate vs the epg using our estimation of the real attack rate. the order of the different countries should be read from right to left (for the reported state of the index) and from top to bottom (for the estimated value of the index). we observe how the comparative situation of the different countries changes as of april . * belgium data is biased due to reporting of unconfirmed death cases [ ] . best estimations might shift - %.

although ρ ( ) is quite independent of the diagnostic rate, reported i directly depends on the level of diagnosis. thus, if epg is evaluated with reported data, it can provide a wrong picture of the situation.
based on reported epg, the worst situation in europe as of april would be that of belgium, followed by spain, united kingdom, netherlands and portugal. if risk is evaluated with estimated epg, the highest value would still correspond to belgium, but followed by sweden, united kingdom, spain, netherlands and italy. portugal is in a much better position than its reported data suggest. in fact, countries with similar reported epg, like portugal and netherlands, have totally different estimated epg, the latter being at significantly higher risk than the former. we have shown in the methods section that the basis for obtaining estimated i and a is not biased due to demographic differences and, right now, there is no indication that it is biased due to different ways of accounting for the cumulative death toll of the epidemic. there is also no indication that comorbidity factors differ greatly between countries, or that cfr is higher in some countries because icu units and hospitals are not available for people who would need them, at least so far. if this were the case, under any scenario where the situation occurs, the epidemic in that country would have such a large number of cases, attack rate and growth that the epg would be extremely high. the only real limitation is that social and environmental issues could affect the prognosis of the infected. if living in a small house with other infected people could lead to a worse prognosis than staying in a large house alone, a new analysis of the unbiased nature of the cfr would need to be done. it is important to indicate that not only is i unbiased, as analysed in previous sections, but that ρ( ) is unbiased as well. even though the absolute number of reported cases is biased, as we have shown, ρ_t deals with ratios and their evolution.
as long as the diagnosis and recording of people with the disease follow roughly the same criteria over time in each country, ρ_t is a good measure of the growth of the epidemic. indeed, if the evaluated diagnosis percentage is more or less constant in time, we can assume that ρ_t correctly reveals tendencies in contagiousness. if a change in the criteria for reporting cases occurs (e.g., a large increase in the number of tests per day leading to an increase in cases due to more testing), ρ_t will be temporarily affected but will return to being a good measure once the new criteria are established. in this case, epg will provide a wrong picture for a while as well, until stationary conditions in diagnosing and reporting are achieved again. there is another important point to address in order to guarantee that ρ_t is a robust measure. since we are estimating the real number of cases, we can determine the associated ρ_t. both ρ_t are expected to behave similarly but with a certain delay. this delay can be determined by shifting both ρ_t in time until the error between them is minimized. we show this detailed analysis in the supplementary material (si file), where we find that the reported ρ_t and the inferred ρ_t are indeed different but follow the same type of evolution once the proper delay is accounted for (si fig. ).

the third important outcome of this analysis is the estimation of recovered people. this is an important number to assess the possibility of herd immunity, discussed as a possible exit strategy. the idea is that those who recover might have immunity and act as a barrier to the transmission of the disease. a recent study from fudan university in shanghai [ ] has analyzed antibody titers of adult covid-19 recovered patients. the study is based on the detection in plasma of spike-binding antibodies using rbd, s1 and s2 proteins of sars-cov-2 with an elisa technique.
it is also the first study that looks for neutralizing antibodies (nabs) specific for sars-cov-2 using a gold standard for evaluating the efficacy of vaccines against smallpox, polio and influenza viruses. the study highlights the correlation between nab titers and spike-binding antibodies, which were detected in patients from day - after the onset of the disease and remained afterwards. middle-aged and elderly patients had higher titers than young patients, among whom in cases the titers were under the limit of detection. nab titers had a positive correlation with c-reactive protein (crp) levels and a negative correlation with lymphocyte counts. this indicates that the severity of the disease, in terms of inflammatory response (crp levels), usually worse in middle and elderly age, favors the increase of antibody titers. equally, the negative correlation with lymphocyte counts suggests an association between cellular and humoral response. therefore, it is possible that the immunity reached by young people, who were mostly asymptomatic, is residual. in that case, this sub-population would continue to be carriers of covid-19. the serological studies that many countries are designing and carrying out should provide further information on post-infection immunity. even if the entire recovered population acquires middle-term immunity, current incidence places european countries far from herd immunity. nevertheless, it is feasible that the most affected regions are closer to being able to use herd immunity as a strategy for de-confinement. governments might wish to explore the possibility of local de-confinement.

there are two possible limitations of the present study. it could be possible, in theory, that some countries present an intrinsically different cfr if they are able to isolate their elderly population completely and significantly more than others.
the real cfr of the epidemic is a measure of the case fatalities if the whole population, or a representative sample of it, has become infected. if one country effectively and permanently prevented all infections among its elderly population, it would certainly have a different cfr. right now, it is impossible to assess whether this is indeed the case in different countries, given the lack of reported cases and mortality rates by age and sex. we should note, however, that if this disaggregation were provided, we could proceed with exactly the same methodology but, instead of treating the country as a whole, we would divide it into age brackets and treat them separately. the second limitation is related to the first one but comes from a more structural perspective. a clear possibility is that countries under stress could be failing to provide the same medical support, thereby changing the cfr. we must note that health care in european countries, even under stress, has been able to dramatically increase its number of health personnel, beds and hospitalizations at short notice [ , ]. italy and spain present some regions under stress but not the whole country [ ]. finally, one cannot disregard the possibility that complex mechanisms of mutation and repeated exposure to the virus may make the prognosis depend on the type of residence and, hence, on socio-economic factors, which are clearly different across countries. if any proof emerges that a closed environment not only increases the level of infections, which it obviously does, but also changes the evolution of the disease in the patient, one should again test that the uniform/unbiased cfr hypothesis holds with the proper knowledge at hand.

[...] to obtain dtd for each country and the corresponding evolution of the diagnostic rate. we also provide for each country the evolution of recovered and the attack rate in the last days a .
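the delay between the reported and the estimated growth rates, which the discussion describes as found by shifting both series in time until their error is minimal, can be sketched as a simple scan over candidate delays. this is an illustrative sketch only: the use of the mean squared error, the assumption that the reported series lags the estimated one, and the function name are choices made for the example, and the series are synthetic.

```python
from typing import Dict, Sequence, Tuple


def best_delay(reported_rho: Sequence[float], estimated_rho: Sequence[float],
               max_delay: int) -> Tuple[int, Dict[int, float]]:
    """Scan delays 0..max_delay and return the one with the smallest error.

    For a delay d, estimated_rho[i] is compared with reported_rho[i + d],
    i.e. the reported series is assumed to lag the estimated one.
    The error is the mean squared difference over the overlapping days.
    """
    errors: Dict[int, float] = {}
    for d in range(max_delay + 1):
        pairs = [(reported_rho[i + d], estimated_rho[i])
                 for i in range(len(estimated_rho))
                 if i + d < len(reported_rho)]
        if not pairs:
            continue  # no overlap left for this delay
        errors[d] = sum((r - e) ** 2 for r, e in pairs) / len(pairs)
    if not errors:
        raise ValueError("series do not overlap for any candidate delay")
    best = min(errors, key=errors.get)
    return best, errors


if __name__ == "__main__":
    # Synthetic example: the reported series repeats the estimated one 3 days later.
    estimated = [1.0, 1.2, 1.5, 1.3, 1.1, 0.9, 0.8, 0.7]
    reported = [1.0, 1.0, 1.0] + estimated
    delay, errors = best_delay(reported, estimated, max_delay=5)
    print(delay)
```

returning the whole error curve alongside the best delay makes it possible to check whether the minimum is sharp or shallow, which is the kind of information a plot of error against delay conveys.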
we also demonstrate that ρ( ) is unbiased as well by showing the correlations between real and estimated growth rates.

fig. series of figures showing the evolution of the estimated cases for different european countries. in blue, incidence of estimated cumulative cases. in green, estimated incidence of cumulative recovered cases. in red, estimated attack rate over the last days (a ). day is taken as the first day on which cumulative cases exceeded cases; it is different for each country. data extend until april .

[...] growth rate and, in blue, reported cases growth rate. (b) the growth rate of estimated cases is shifted in time to find a better match with the growth rate of reported cases. (c) error between estimated and reported growth rates using different delays. the delay with minimum error is marked and is the one used in (b).

european centre for disease prevention and control. download today's data on the geographic distribution of covid- cases worldwide
situación actual covid-
ministerio della salute republica italiana. covid- , i casi in italia
coronavirus testing: how are the hardest-hit countries responding?
suppression of covid- outbreak in the municipality of vo
transmission potential and severity of covid- in south korea
correcting under-reported covid- case numbers: estimating the true scale of the pandemic
using a delay-adjusted case fatality ratio to estimate under-reporting. centre for mathematical modelling of infectious diseases. london school for hygiene and tropical medicine
analysis and prediction of covid- for eu-efta-uk and other countries. dpt. of physics.
universitat politècnica de catalunya
estimating clinical severity of covid- from the transmission dynamics in wuhan, china
real estimates of mortality following covid- infection
case-fatality estimates for covid- calculated by using a lag time for fatality
clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study. the lancet
estimating the asymptomatic proportion of coronavirus disease (covid- ) cases on board the diamond princess cruise ship
estimating the infection and case fatality ratio for coronavirus disease (covid- ) using age-adjusted data from the outbreak on the diamond princess cruise ship
central disease control headquarters. coronavirus disease- , republic of korea
vorläufiges ergebnis und schlussfolgerungen der covid- case-cluster-study
large scale testing of general population in iceland underway
spread of sars-cov- in the icelandic population
demographic science aids in understanding the spread and fatality rates of covid-
neutralizing antibody responses to sars-cov- in a covid- recovered patient cohort and their implications
chain safety and environment belgium government. the covid- figures: collection, verification and publication
vigilancia de los excesos de mortalidad por todas las causas. momo
world health organization. coronavirus disease (covid- ) situation reports
correlation (in statistics
virological assessment of hospitalized patients with covid-
european centre for disease prevention and control. coronavirus disease (covid- ) in the eu/eea and the uk - eighth update
el ministerio de sanidad amplía las medidas para el refuerzo de personal sanitario
hospital fira salut ja està a disposició del sistema sanitari per si calen llits addicionals
incidencia de la covid- en las camas uci en españa

key: cord- - uxwojzo authors: the gibraltar covid- research group health systems,; goyal, d.
title: oxygen and mortality in covid-19 pneumonia: a comparative analysis of supplemental oxygen policies and health outcomes across countries. date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: uxwojzo

introduction: hypoxia is the main cause of morbidity and mortality in covid-19. during the covid-19 pandemic some countries have reduced access to supplemental oxygen (e.g. oxygen rationing), whereas other nations have maintained and even improved access to supplemental oxygen. we examined whether such variation in access to supplemental oxygen had any bearing on mortality in covid-19.

methods: three independent investigators searched for, identified and extracted the nationally recommended target oxygen levels for the commencement of oxygen in covid-19 pneumonia from the worst affected countries. mortality estimates were calculated from three independent sources. we then applied linear regression analysis to examine for potential association between national targets for the commencement of oxygen and case fatality rates.

results: of the nations included, had employed conservative oxygen strategies to manage covid-19 pneumonia. of them, belgium, france, usa, canada, china, germany, mexico, spain, sweden and the uk guidelines advised commencing oxygen when oxygen saturations (spo2) fell to % or less. target spo2 ranged from % to % in the other nations. linear regression analysis demonstrated a strong inverse correlation between the national target for the commencement of oxygen and national case fatality rates (spearman's rho = - . , p < . ).

conclusion: our study highlights the disparity in oxygen provision for covid-19 patients between the nations analysed, and indicates that such disparity in access to supplemental oxygen may represent a modifiable factor associated with mortality during the pandemic.

sars-cov-2 causes covid-19 (coronavirus disease 2019). as of may , the total number of reported cases of covid-19 was over million, with , deaths over five months [ ].
more than half of these deaths have occurred in the last month. whilst there has been a slight reduction in the rate of growth of new infections globally, this is most likely due to strict infection control policies (e.g. case-isolation, social distancing and 'lockdown') [ ]. with the seroprevalence of sars-cov-2 being reported as between < % and % [ ], it is most likely that the majority of infections are yet to come, and that the rate of infections will once again increase as infection control measures are balanced against economic pressures. the true covid-19 mortality rate is difficult to ascertain during the outbreak. background infections, asymptomatic infections, testing criteria, reporting of fatalities and the time-lag between new cases and outcome are all potential confounders [ ]. this makes measuring the effects of national interventions difficult. it is, though, reasonable to expect that a nation's covid-19 mortality rate will depend on access to healthcare, and likely also on the type of healthcare offered. the need for effective healthcare can be reasonably inferred from the marked disparity between mortality rates during a surge of cases and mortality post-surge [ ].

oxygen is a cornerstone of treatment for patients with covid-19 pneumonia. indeed, the major mechanism for injury and death in covid-19 relates to hypoxia [ ]. despite the critical nature of oxygen therapy in covid-19 pneumonia, there remains marked variation between national guidelines for when to offer supplemental oxygen. many nations seem to have implemented conservative oxygen strategies during the pandemic, effectively limiting the access of patients to supplemental oxygen. others seem to have actively increased their capacity to offer supplemental oxygen for patients with covid-19 pneumonia. here, we examine the national guidelines from nations in an effort to understand the potential impact that the varying thresholds for commencing supplemental oxygen have on covid-19 outcomes.
we followed the advice for global reporting on health estimates as per the gather statement [ ]. all countries with more than , cases as of / / were assessed. three investigators independently identified the specific national recommendations for the target oxygen saturations (spo2) to commence oxygen in patients with covid-19. two investigators were blinded as to the reason for the study. each nation's ministry of health, national guideline bodies, respiratory medicine bodies and national health service were searched for relevant covid-19 clinical guidelines. the european society of respiratory medicine was a useful resource with direct links to a number of covid-19 specific clinical guidelines from across the world. literature databases were also used as a means of identifying links to national guidelines. if guidelines were not available in one of the languages spoken by the investigators, on-line translation services were utilised, specifically for guidelines on 'supplemental oxygen' or 'oxygen therapy'; the entire guideline was not translated. note that only guidelines applicable to the majority of the population were extracted, and guidelines for patients with underlying conditions such as chronic obstructive airways disease were not recorded. if guidelines were unclear, the instruction was to exclude the country from further analysis. where there was more than one recommendation, the investigator was to make a determination as to the most likely guideline to be followed (figure ). where there was divergence between the three investigators, the consensus value was used. results were tabulated and compared.

case fatality rate (cfr) is the percentage ratio of deaths to total cases. it is a crude figure subject to a number of potential confounders. for most nations it is likely to be numerically incorrect [ ]. cfr is, though, likely to maintain a relationship to the actual infection fatality rate (ifr) [ , ], and as such was used in this study.
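the definition of cfr as the percentage ratio of deaths to total cases, and the idea of cross-checking it across several data sources, can be written down directly. the following is a minimal sketch; the source labels, the counts and the helper names are hypothetical and serve only to illustrate the arithmetic.

```python
from typing import Dict, Tuple


def case_fatality_rate(deaths: int, cases: int) -> float:
    """Crude case fatality rate, in percent: 100 * deaths / total cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    if deaths < 0 or deaths > cases:
        raise ValueError("deaths must lie between 0 and cases")
    return 100.0 * deaths / cases


def cfr_by_source(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """CFR for each source, given (deaths, cases) pairs keyed by source name."""
    return {source: case_fatality_rate(d, c) for source, (d, c) in counts.items()}


def max_disagreement(cfrs: Dict[str, float]) -> float:
    """Largest difference, in percentage points, between any two sources."""
    values = list(cfrs.values())
    return max(values) - min(values)


if __name__ == "__main__":
    # Made-up counts for one country from three hypothetical sources.
    counts = {"source_a": (50, 1000), "source_b": (52, 1000), "source_c": (51, 1010)}
    cfrs = cfr_by_source(counts)
    print(cfrs)
    print(max_disagreement(cfrs))
```

a small spread between sources says nothing about whether the figure is close to the infection fatality rate; it only shows that the sources agree with each other.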
cfr was calculated and cross-referenced from three different sources: the who, johns hopkins university and worldometer. there was no significant difference between the calculated cfr across the three sources. it was not possible to include patients or the public in the present study. linear regression was performed to identify a potential trend between cfr and target spo2, and presented using scatter-plots. due to the sample size (n= ), it was not clear whether the assumptions of normality and linearity were met, so the statistical significance of the possible relationship between cfr and target spo2 was established using the non-parametric spearman's rho test. we also explored the effect of potentially confounding variables using the non-parametric spearman's rho test rather than a parametric mancova test to adjust for confounding, given the small sample size and uncertainty regarding linearity and normality.

in total there were countries with total case numbers over , on th may . of those, countries had accessible clinical guidelines referring to target oxygen levels for the commencement of supplemental oxygen in covid-19. uae (united arab emirates) was excluded from further analysis as the national guidelines advised (at page ) admitting all patients with covid-19 to hospital, and commencing oxygen when 'needed' [ ]; the country's low cfr is noted. the netherlands and belarus were also excluded because all three investigators failed to find clear national guidelines regarding oxygen targets. of the remaining countries there was concordance between all three investigators identifying the same national target oxygen levels in countries. of the remaining countries (uk, pakistan and qatar), determination of national target spo2 in covid-19 was made by consensus. for links to national guidelines please see supplementary file.
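spearman's rho, the non-parametric statistic named above, is the pearson correlation computed on ranks, with tied values sharing their average rank. the sketch below shows that calculation using only the standard library; it does not compute the p-value, and the target spo2 and cfr values are synthetic numbers chosen for illustration, not data from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Synthetic values: higher oxygen targets paired with lower CFR.
    target_spo2 = [94, 94, 92, 92, 90, 90, 88]
    cfr = [1.0, 2.0, 3.0, 5.0, 6.0, 9.0, 12.0]
    print(spearman_rho(target_spo2, cfr))
```

because only ranks enter the calculation, the statistic is insensitive to how non-linear the relationship is, which is the reason given for preferring it with a small sample.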
of the nations analysed, six recommended commencing oxygen if spo2 fell to below % (singapore, peru, switzerland, ireland, qatar and pakistan), five made recommendation for below % (saudi arabia, chile, brazil, india and russia), five for below % (portugal, iran, turkey, bangladesh and italy), six for below % (canada, belgium, france, uk, usa and china) and four for below % (germany, mexico, spain and sweden). cfr ranged from · % (qatar) to · % (belgium). there was a strong inverse correlation between recommended target oxygen saturations and national case fatality rates (r_s = - · , p < · ). a scatter graph with linear best-fit line is at figure .

cc-by . international license it is made available under a is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted july , . .

national guidelines for target saturations were relatively clear for most countries. together with the high rate of consensus amongst investigators, it seems unlikely that investigator bias was a significant factor. the main confounders are more likely to stem from the many variables associated with cfr. we found no correlation between cfr and cases/million inhabitants (figure b), tests/thousand inhabitants (figure c), or overall positivity rate (figure c), suggesting that testing strategy between the countries examined did not have a significant visible relationship with our mortality measure, cfr (table ). we could not examine the potential impact of national-level reporting bias on the cfr from the data available.
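the linear best-fit line drawn through the scatter of cfr against target spo2 can be obtained by ordinary least squares. the sketch below is one standard way of computing it and is not taken from the study; the function name is hypothetical and the data points are synthetic.

```python
from typing import Sequence, Tuple


def least_squares_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Slope and intercept of the ordinary least-squares line y = slope * x + intercept."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0:
        raise ValueError("x values are all identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Synthetic points: CFR (percent) falling as the oxygen target (percent) rises.
    target_spo2 = [90.0, 92.0, 94.0, 96.0]
    cfr = [8.0, 6.0, 4.0, 2.0]
    slope, intercept = least_squares_line(target_spo2, cfr)
    print(slope, intercept)
```

a negative slope in such a fit describes the direction of the association only; with a small sample, the rank-based test is what carries the claim of statistical significance.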
national guidelines for when to commence supplemental oxygen in patients with covid-19 varied significantly between the countries examined. combined, the target spo2 for the commencement of oxygen and the target spo2 for ongoing treatment varied from % to %. countries that implemented conservative oxygen strategies in response to the pandemic, effectively limiting access to supplemental oxygen, had statistically significantly higher case fatality rates. there are a number of potential reasons for the variation in oxygen policies between countries and the association with mortality. it has been established that a delay in identifying and correcting hypoxia in pneumonia leads to increased disease severity, an increased rate of mechanical ventilation and increased mortality [ , ]. whilst there are no controlled studies in covid-19 specifically examining the duration spent hypoxic and subsequent disease burden and mortality, sun et al. have reported a reduction in the need for mechanical ventilation where hypoxia was detected and corrected early in patients with covid-19 [ ]. in relation to conservative oxygen strategies generally, there are four key mortality studies. the iota meta-analysis published in examined conservative versus liberal oxygen strategies across a range of studies. whilst none of the studies analysed in the iota meta-analysis related to pneumonia, and the majority of studies examined oxygen as a treatment rather than as a means to correct hypoxia, the authors suggest that the optimal target spo2 for all acute medical patients might be - % [ ].
since the iota study, there have been three clinical studies, two of which were randomised controlled trials (rct), examining the mortality effect of conservative oxygen. the icu-rox trial suggests there may be no mortality effect at the higher target levels of spo2 ( - % versus - %) in mechanically ventilated patients from any cause (n= ) [ ]. another study, a retrospective analysis published in march , examined over , intensive care patients and found an optimum spo2 target of - %. the authors note that patients who were in the optimal range for only % of the time had nearly twice the mortality of those who spent % of the time within the optimal target [ ]. the most recent rct, and the most well-controlled study of true conservative oxygen strategies to date (and the most relevant to covid-19), examined patients with acute respiratory distress syndrome (ards). patients were randomised to either a conservative arm (actual spo2 of - %) or a liberal arm (spo2 of - %), and then followed up for days. the study was halted early due to excessive deaths in the conservative oxygen group, with a % increase in intensive care deaths and a % increase in day mortality [ ]. based on our current understanding of the effects of hypoxia on inflammation [ ] and coagulation [ ], there is a good scientific basis for the increased mortality associated with sub-optimal oxygen strategies and/or a delay in correcting hypoxia. there are direct effects of hypoxia leading to increased mortality, such as cardiac arrhythmias and ischaemia-related pathologies (as identified in the aforementioned ards study [ ]). it is also quite plausible, indeed quite likely, given that hypoxia is pro-inflammatory, that a delay in correcting hypoxia leads to more severe disease. this of course raises the possibility that rationing, or a conservative oxygen approach, or a failure to provide access to supplemental oxygen in covid-19 pneumonia, actually increases healthcare burden and resource consumption.
whilst sometimes arguably unavoidable, the decision to limit access to supplemental oxygen should be undertaken mindful of the likely mortality impact, and of the possibility of perpetuating a healthcare crisis. implementing oxygen early is likely to prevent disease progression, as suggested by sun et al. [ ], and as is consistent with established practice relating to pneumonia generally [ , ]. for some nations there was a resource limitation issue, or at least a fear of resource limitation, with secondary implementation of conservative oxygen strategies. for example, the uk directive to ration oxygen supply in april reduced the normal national target for the commencement of oxygen from an spo2 of % to a new value of %. the reason for rationing was related to the surge of infections and subsequent concern over the supply of oxygen [ ]. if such practices are common in other nations, the relationship between national guidelines' spo2 and national cfr identified here may be a representation of the demands on healthcare during a surge of covid-19 cases. there are a number of reasons why mortality increases during a surge of infections. patients are less likely to attend hospital or seek medical care, either for fear of contracting covid-19 or of over-burdening their health service [ ]. triage systems during a surge can be set with high thresholds for onward referrals [ ]. another mortality factor is a potential lack of resources, both staff and consumables. the overall delay to treatment that ensues prevents early correction of hypoxia, implementation of vte (venous thromboembolism) prophylaxis, readjustment of medications (e.g. nephrotoxics) and the detection of secondary bacterial infection, and thus likely increases mortality [ ]. so then, the association between target spo2 and cfr identified here may be more related to target spo2 being an indicator of an overwhelmed healthcare service.
if this is the case, a lower than usual target spo2 may still contribute to the higher mortality experienced in nations that suffered an overwhelming surge. in a uk cohort, during the initial surge of infections, and during oxygen rationing, approximately % of patients presented hypoxic, and of those that died % ( / ) presented hypoxic (defined broadly in this study as spo2 < %) [ ]. in a cohort from mexico, % of the patients who died presented with spo2 under % [ ]. part of the increased mortality seen during a surge of infections may be related to the secondary conservative oxygen policies and delay in correction of hypoxia. the issuing of national guidelines recommending lower target oxygen saturations than would be typical for viral pneumonias [ ] may relate more to the overall approach of a national response to covid-19, and as such, it is this 'national approach' that relates to mortality rate. all three investigators noted the quite different approaches between nations, as set out in their national guidelines. some followed a 'stay home' approach, whereas others defaulted to clinical assessment of patients either with covid-19 or with any risk factor associated with it. for example, singapore guidelines default to clinical assessment [ ], whereas a country with a similar prevalence burden, the uk, has much higher thresholds for referral onward for assessment [ ] (table ). in this situation, where the national guideline target spo2 is part of an overall strategy of avoiding admissions, whilst it remains likely that conservative oxygen approaches contribute to higher mortality, other policies may also contribute. in the uk versus singapore example, a target spo2 of less than % is likely to be harmful, but equally, failing to account for the age of the patient or the duration of fever may also be harmful. as such, the relationship identified here between cfr and target spo2 may be more a relationship between cfr and national strategy.
target spo2 may be more of an indicator of national policy. this study highlights the variation in national guidelines for when to commence supplemental oxygen in patients with covid-19. in and of itself, this raises important questions as to the optimal response to covid-19. attempting to delineate the interventions and strategies that are potentially beneficial between nations is difficult without using a mortality estimation, which carries inherent confounders. cfr depends on many factors, not least of which is the accurate reporting of covid-19 related deaths. whilst we found no correlation between cfr and rates of testing or crude case burden, we could not account for disparities in reporting of deaths. we were aware of this limitation prior to the study, but agreed with the view that, while cfr is unlikely to be numerically correct, the difference between countries will remain [ , ], and as such the association is likely to be accurate. we undertook an analysis of the national guidelines using three independent investigators. the consensus amongst the investigators supports the accuracy of the target spo2 extracted. the possibility remains that localities within a country, or individual doctors and nurses, chose not to follow their national guidelines. even if such local differences were significant, the national guidelines permit not implementing oxygen therapy until the target oxygen level is reached; therefore triage systems, nurses and physicians can avoid admissions, and thus limit access to supplemental oxygen. despite the prospect of local variations in following the guidelines, the presence of the guidelines will shape and likely reflect practice nationally.
there is clear disparity between national guidelines for target oxygen saturations (spo2) in covid-19 across the countries analysed here, and such disparity is associated with national case fatality rates (cfr). whilst there are multiple confounders to the cfr, the overall relationship between increasing cfr and decreasing target spo2 is likely real. the cause of the relationship may be a true causative effect of hypoxia on mortality. it may also be an indirect effect of delayed or reduced access to supplemental oxygen stemming from an overwhelmed or under-resourced health service, or a similar policy-related reduction in access to supplemental oxygen associated with the differing overall national approaches to covid-19 (stay home versus clinical assessment). all three possibilities highlighted here implicate delayed initiation of oxygen in the excess mortality associated with covid-19. further research is needed to explore the population risk associated with delayed correction of hypoxia in covid-19. additionally, it would be useful to undertake a health-risk analysis, from a resource allocation perspective, on the benefits of increasing access to supplemental oxygen for patients with covid-19 versus other interventions. as it stands currently, our results support the position that managing covid-19 pneumonia should not differ from the management of other pneumonias, inasmuch as access to supplemental oxygen is necessary to prevent excessive mortality.

the authors declare no conflicts of interest.

contribution: dg, hd, ak, jn, sb and fb contributed to the conception and/or design of the study and contributed to the manuscript. dg, hd and fb conducted the analysis of the national guidelines. statistical analysis was undertaken primarily by jn. dg wrote the majority of the manuscript. sb and fb undertook final edits. all contributors reviewed the final manuscript before submission.
international license it is made available under a is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. (which was not certified by peer review) the copyright holder for this preprint this version posted july , . . have deaths from covid- in europe plateaued due to herd immunity? the infection fatality rate of covid- inferred from seroprevalence data -novel coronavirus ( -ncov): estimating the case fatality rate -a word of caution potential association between covid- mortality and health-care resource availability clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study guidelines for accurate and transparent health estimates reporting: the gather statement united arab emirates ministry of health and prevention. national guidelines for clinical management and treatment of covid- initial diagnosis and management of adult communityacquired pneumonia: a -day prospective study in shanghai effects of delayed oxygenation assessment on time to antibiotic delivery and mortality in patients with severe community-acquired pneumonia lower mortality of covid- by early recognition and intervention: experience from jiangsu province british thoracic society emergency oxygen guideline group; bts emergency oxygen guideline development group. bts guideline for oxygen use in adults in healthcare and emergency settings mortality and morbidity in acutely ill adults treated with liberal versus conservative oxygen therapy (iota): a systematic review and meta-analysis the search for optimal oxygen saturation targets in critically ill patients: observational data from large icu databases liberal or conservative oxygen therapy for acute respiratory distress syndrome hypoxia-induced inflammation in the lung: a potential therapeutic target in acute lung injury? 
impact of respiratory symptoms and oxygen saturation on the risk of incident venous thromboembolism-the tromsø study clinical guide for the optimal use of oxygen therapy during the coronavirus pandemic clinically significant fear and anxiety of covid- : a psychometric examination of the coronavirus anxiety scale calculated decisions: covid- calculators during extreme resource-limited situations early intervention likely improves mortality in covid- infection covid- management in a uk nhs foundation trust with a high consequence infectious diseases centre: a detailed descriptive analysis medrxiv . . the low-harm score for predicting mortality in patients diagnosed with covid- : a multicentre validation study diseases-and-conditions/documents/ treatment% guidelines% for% covid- % % % apr% % % -final.pdf . nice (uk) guideline [ng ] covid- rapid guideline: managing suspected or confirmed pneumonia in adults in the community published date turkey % turkey ministry of health covid- (sars-cov enfeksİyonu) rehberİ aipo managing the respiratory care of patients with covid- clinical guide for the optimal use of oxygen therapy during the coronavirus pandemic belgium % institute of tropical medicine antwerp interim clinical guidance for adults with suspected or confirmed covid- dijon, france procedure for the pulmonary management of non-icu patients hospitalized in the context of the covid- pandemic canada % government of canada clinical management of patients with moderate to severe covid- -interim guidance peru lineamientos de manejo hospitalario del paciente con covid- society of medical intensive care and emergency medicine german recommendations for critically ill patients with covid- iran % health and treatment deputy of the ministry of health and medical education ( ) control division directorate general of health services ministry of health & family welfare government of the people's republic of bangladesh national guidelines on clinical management of coronavirus 
disease (covid- ) version oxygen and mortality in covid- pneumonia: a comparative analysis of supplemental oxygen policies and health outcomes across countries. goyal link to national guidelines all viewed between th may and th june key: cord- -d cf ms authors: gupta, sourendu title: epidemic parameters for covid- in several regions of india date: - - journal: nan doi: nan sha: doc_id: cord_uid: d cf ms bayesian analysis of publicly available time series of cases and fatalities in different geographical regions of india during april is reported. it is found that the initial apparent rapid growthin infections could be partly due to confounding factors such as initial rapid ramp-up of disease surveillance. a brief discussion is given of the fallacies which arise if this possibility is neglected. the growth after april is consistent with a time independent but region dependent exponential. from this, r is extracted using both known cases and fatalities. the two estimates are seen to agree in many cases; for these cfr is reported. it is seen that cfr and r increase together. some public health implications of this observation are discussed, including a target doubling interval if medical facilities are to remain adequate. sars-cov- is a virus which has newly entered the global human population [ ] . as this host-parasite system evolves towards an equilibrium, its epidemiology has been studied extensively, but with some conflicting results [ , ] . the true extent of its penetration into the population is as yet open to question [ ] , since testing is fairly restricted in most countries. nor is the progression of the disease, covid- , or its method of spread completely clear [ ] . since the virus is already so widely established, it seems unlikely that it will be totally eliminated soon. so it is important to extract basic epidemiological parameters as cleanly as possible. 
india has managed to geographically contain the spread of the covid- epidemic with the nation-wide lock-down which started on march, . at the end of april the proportion of identified cases in india as a whole was a few tens per million, with - orders of magnitude more in hot spots. even if this were wrong by an order of magnitude, it would still mean that the epidemic remains at an early stage in india. this, combined with the lock-down, presents an opportunity to examine the growth of the epidemic in multiple isolated regions which implement essentially the same policy with regard to testing. this study examines the heterogeneity in the growth rate of the disease in several ways. first, the doubling intervals, τ , of the cumulative number of identified cases, c(t), and of the cumulative number of fatalities, d(t), are examined. from τ it is possible to extract the basic reproduction rate, r , within epidemic models. marked heterogeneity is observed. after this, the correlation between the case fatality ratio, cfr, and r is studied. epidemic data, especially at the beginning, is never clean. the public health system has to gear up for disease surveillance. the continuing recurrence of cholera epidemics [ ], the spread of dengue and chikungunya [ ], and the successful surveillance and elimination of nipah [ ] and zika [ ] show that india has a mixed record on epidemic surveillance. in addition to a possible lag between the beginning of the epidemic and its surveillance, there could be a problem of incomplete surveillance during the time the health service ramps up. any examination of data has to allow for the identification of confounding factors such as these. for the covid- surveillance data, there are further cautionary remarks. the icmr guidelines for testing [ ] specify that only symptomatic cases should be tested using rrt-pcr. this part of the policy has been unchanged since the middle of march.
depending on the fraction of cases which are symptomatic, this could miss the actual prevalence of the disease in the population. estimates of the fraction of asymptomatic infections range as high as % [ ] , implying that, in this extreme case, the testing policy can never reveal more than % of the cases. the social stigma attached to covid- [ ] also means that some fraction of infections may be cryptic. there are uncertainties in the statistics of fatalities also. it has been reported that in europe and the us the number of fatalities due to covid- may have been underestimated by a factor of - . indian cities have fairly complete registries of deaths, so miscounting of covid- fatalities could come mainly from mistaken or incomplete reports of the cause of death. for larger regions, say districts and whole states, where most deaths happen at home and death certificates are not common [ ] the errors in counting fatalities may be significantly larger, and hard to estimate at this time. one point about the quality test that is developed here is that absolute numbers are not as important for it as the check that fatalities and identified cases are independently tracing the same rate of growth of the epidemic. this is expected at the beginning of the epidemic, when all epidemic models become linear, and the growth of generic measures is driven by the maximum eigenvalue of the linearized models. however, in the extraction of the cfr, the absolute counts do matter. in spite of the uncertainties, the correlation of cfr and r holds important lessons for public health in the inevitable later stages of this epidemic in india, and the middle and low income countries of the world. data has been extracted from official sources where possible. for ahmedabad city, the data is made publicly available by the municipal council of the city [ ] . this well-organized site corrects data retrospectively for up to about days. 
for chennai city, the data has been collected from daily tweets by the municipal council [ ] . for indore city, the data was collected from daily bulletins of the chief medical and health officer and the collection is available for public use [ ] . for mumbai city the data has been collated from the daily tweets by mcgm [ ] into a publicly available site [ ] . for pune district the data was collated from the daily tweets by the district authority [ ] and the collection is publicly available [ ] . for delhi and all other states, the data was taken from the publicly available collection at covid india [ ] . this site corrects data retrospectively for over a week. only data on the cumulative number of identified cases, c(t), and the cumulative number of known fatalities, d(t), are used in this analysis. for this work data collection stopped on may , , and retrospective corrections made after this are not included. the unquantifiable parts of the errors in the counts of cases and fatalities due to covid- were discussed in the previous section, along with the reasons why their estimates need not be included in this analysis. however, there is another part of the errors in the daily counts of cases and fatalities which come from backlogs of tests or hospital records. these shuffle a fraction of the numbers from one day to another, and therefore cause errors in the daily counts. as long as the number of facilities keep pace with the growth of the epidemic, these errors remain proportional to the number of cases and fatalities. since the wait time for hospital beds for covid- cases has remained roughly constant during the period of study, this argument is expected to hold. in view of this, of % of the reported values of c(t) and d(t) are assigned as errors. the specific fraction, %, was chosen to in order to cover the long range fluctuations visible in the time series (for example in those visible on days and of figure ). 
it has been seen that official reports and independent estimates of these numbers are generally within this range. at such an early stage in the infection, it is reasonable to assume exponential growth, i.e., doubling every τ days. within this assumption one can check how well the lock-down is working by letting the doubling interval become time dependent. the simplest function to try is a linear change in τ , i.e., $d(t) = d_0\, 2^{t/(\tau_0 + \tau_1 t)}$, and a similar set of three parameters ($c_0$, $\tau_0$, $\tau_1$) for c(t). note that $\tau_0$ has dimensions of time, whereas $\tau_1$ is dimensionless. a fitting form with constant doubling interval was also used; this is denoted τ , dropping the subscript. the fitting procedure follows the methods of [ ], with gamma distributions used as prior probability distribution functions (pdfs) for $\tau_0$ and $c_0$. the additional parameter $\tau_1$ is allowed to take positive and negative values, by taking its prior pdf to be a gaussian. for all these distributions, the widths are taken large enough that the posterior distribution is insensitive to the choice of priors. the appendix contains details of the relation between a time-varying doubling interval and the time variation of the basic reproductive rate r . this requires choosing a model of the epidemic. using the seir model, and the median interval between the appearance of symptoms and the time of fatalities, $T$ = . days [ ], one has $r_0 = 1 + T \ln 2/\tau_0$, with the time-dependent part of r fixed by $\tau_0$ and $\tau_1$ as described in the appendix. when a constant τ is used, one can set $\tau_1 = 0$ in the above formulae and write r and τ instead of $r_0$ and $\tau_0$. exactly the same procedure is followed for fits to the time series for d(t). estimates of the median values of the parameters, along with interquartile ranges (iqr) and % credible intervals (cri), are quoted for the doubling intervals as well as r . the analysis of the time series for c(t) and d(t) leads quite naturally to the case fatality ratio, cfr. this is defined as the ratio cfr = d(t)/c(t). if c(t) is underestimated, then cfr is overestimated, and conversely, when d(t) is underestimated, then cfr is also underestimated.
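The constant-doubling-interval analysis described above can be illustrated with a minimal sketch. It is not the Bayesian fit used in this work: it swaps in ordinary least squares on the base-2 logarithm of the counts, which is a reasonable stand-in when errors are proportional to the counts, since these become constant errors in the logarithm. The conversion to r assumes the linearized relation r = 1 + T ln 2/τ that follows from di/dt = (r − 1)i with time measured in units of T. All function names and the synthetic numbers in the usage note are hypothetical.

```python
import math


def fit_doubling_interval(days, counts):
    """Least-squares fit of log2(count) = log2(c0) + day / tau.

    Returns (c0, tau). A simple stand-in for the Bayesian fit in the text;
    it assumes the counts are positive and actually growing (slope > 0).
    """
    if len(days) != len(counts) or len(days) < 2:
        raise ValueError("need at least two (day, count) pairs")
    logs = [math.log2(c) for c in counts]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    slope = sxy / sxx                      # doublings per day
    intercept = mean_y - slope * mean_x    # log2 of the count at day 0
    return 2.0 ** intercept, 1.0 / slope


def reproduction_number(tau, resolution_time):
    """Assumed linearized relation r = 1 + T ln2 / tau (same time units)."""
    return 1.0 + resolution_time * math.log(2.0) / tau


def case_fatality_ratio(deaths, cases):
    """cfr = d(t) / c(t), evaluated on the same day."""
    return deaths / cases
```

For example, synthetic counts `[100 * 2 ** (d / 5) for d in range(10)]` return a doubling interval of about 5 days, and a doubling interval equal to the resolution time gives r = 1 + ln 2, so any longer doubling interval gives a smaller r under this assumed relation.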
this was regularized using a bayesian estimator. since the outcome is binomial, the prior pdf used is a beta distribution with α = and β = . these choices make the posterior distribution insensitive to doubling or halving the values of the priors. the posterior distribution is of the same form with α = + d(t) and β = + c(t) − d(t), with t taken to be the final day of the analysis. since c(t) and d(t) are both large, the following approximations for the median, µ, and standard deviation, σ, may be used:

iii. results

the time series of c(t) and d(t) are shown for the example of delhi in figure . of the regions that we analysed, most cities show an initial rapid growth followed by a tempered growth. the exceptions are ahmedabad and chennai among cities, and the states of gujarat, kerala, and west bengal. note that day one is taken to be march , , which is days after the beginning of the national lock-down. since t = . days, it might be expected that the growth rate of cases in the pre-lock-down period could manifest itself in that of fatalities until around day . in case of a successful lock-down, d(t) could then show an initial exponential growth, tempered after day . the initial data for fatalities in delhi, indore, mumbai and pune can indeed be described by an exponential. however, the doubling interval in pune turns out to be half of that in mumbai, although the average population density of mumbai is about times larger than the average in pune city. the ansatz of eq. ( ), i.e., a linearly varying doubling interval, was also examined for urban regions. the results are collected in table i . in most locations the initial doubling interval seems to be between half a day and two days. when converted to r , one obtains extremely high values, far in excess of what has been quoted in the literature.
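The beta-binomial update for the cfr described above can be sketched as follows. The prior parameters default to 1 purely as a placeholder, since the values used in this work are not reproduced here; the mean and standard deviation are the exact textbook expressions for a beta distribution, and the median uses a standard closed-form approximation that holds when both posterior parameters exceed 1. These are not necessarily the approximations used in this work, and the function name is hypothetical.

```python
import math


def cfr_posterior(deaths, cases, prior_alpha=1.0, prior_beta=1.0):
    """Summarize the Beta posterior for the cfr after a binomial update.

    prior_alpha and prior_beta are placeholder values (an assumption).
    Returns a dict with the posterior parameters, mean, approximate
    median, and standard deviation.
    """
    if not 0 <= deaths <= cases:
        raise ValueError("deaths must lie between 0 and cases")
    alpha = prior_alpha + deaths
    beta = prior_beta + cases - deaths
    total = alpha + beta
    mean = alpha / total
    sd = math.sqrt(alpha * beta / (total ** 2 * (total + 1.0)))
    # Closed-form approximation to the median, valid for alpha, beta > 1.
    median = (alpha - 1.0 / 3.0) / (total - 2.0 / 3.0)
    return {"alpha": alpha, "beta": beta,
            "mean": mean, "median": median, "sd": sd}
```

With large counts the prior barely matters: doubling or halving the placeholder prior values shifts the posterior mean by far less than one standard deviation, which is the insensitivity property mentioned in the text.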
certainly r could vary from place to place, since it depends on the infectivity of the virus as well as on the social networks in each location, and the latter may change from one place to another. however, τ for pune is one third that of mumbai, when mumbai has six times the average population density. the wrong dependence of the doubling time for fatalities on population density, together with the observation that c(t) shows a growth till the same date, supports the idea that there could be a more parsimonious explanation for this common period of growth. this is discussed in the next section. at the moment, any statistical evidence for a gradual slowing of the growth rate of the epidemic is hidden by confounding factors. in view of this, the analysis was continued with a constant doubling interval, τ , applying it to the period after day or . for this part of the analysis, data from three states, namely gujarat, kerala, and west bengal, were also used. from figure , one sees that this simpler model provides as good a description of the data as the model of eq. ( ). furthermore, it yields more realistic values of r . this implies that during the lock-down each of these places has seen a location-dependent constant doubling interval. the values of τ , along with inferred values of r , are collected in table ii . these are the primary results of this analysis. it was noted that the number of known cases, c(t), is definitely missing cases among those who have not been tested. this could include a possibly large fraction of asymptomatic and non-critical or pre-symptomatic cases [ , ]. however, india's disease surveillance mechanism has concentrated on identifying critical cases and contact tracing, which could be a good tracer of the growth of epidemics. if this reasoning is correct, then, during the early growth of the epidemic, one should be able to obtain reliable doubling intervals from the cumulative counts of test positives [ ].
the results of this analysis are also given in table ii . the two independent estimates of r agree well enough that a closer look is worthwhile. the scatter plot in figure of r obtained in the two different ways shows several interesting patterns. first, there seem to be two groups of outbreaks. most regions have r below . among the regions that we studied, ahmedabad and gujarat were a separate group, which saw a faster epidemic growth, with r above . finally there is kerala, a different outlier, whose doubling intervals are longer than t , and therefore with very low values of r . the case of kerala merits a separate remark. the cumulative number of fatalities reached at the end of the period of study. with such low counts of fatalities the assumption of exponential growth cannot be well tested. the count of total infections was larger, and supported the hypothesis of exponential growth over the period studied. second, one sees that most estimates lie close to the diagonal line. if the data were perfect, and the epidemic grew steadily, the estimates would lie exactly on this line. with this requirement we can separate the regions into two groups. the first, consisting of ahmedabad, chennai, delhi, gujarat, and west bengal, lies on this line within statistical uncertainties. the second group, with indore, kerala, mumbai, and pune, does not. this could indicate some issues with the data. on the other hand, if the data is as good as for the other regions, then the fact that they are off the diagonal line should be understood. kerala, which is the only region which lies below the diagonal, is perhaps seeing a lower growth in new cases than fatalities, which could be indicative of a gradual slowing down of the epidemic. due to the lag by t , fatalities would see the slowing down later.
conversely, the regions which lie above the diagonal (namely indore, mumbai, and pune, and, possibly, chennai) could be seeing an increased growth in infections, not yet visible in fatalities because of the same time lag. whether these scenarios are true, or the data quality is not dependable, should be known to the health agencies now, and would become visible to the public later. when there is a statistically significant difference between the doubling intervals determined by c(t) and d(t), then the ratio gives a time-dependent cfr. this is usually understood to be a transient phenomenon. in view of this, the analysis was restricted to ahmedabad, chennai, delhi, gujarat, kerala, and west bengal, i.e., the regions which lie on the diagonal line of figure , and therefore are seeing a steady growth of identified cases as well as fatalities. the case fatality ratios for these regions are plotted against the r inferred from d(t) in figure . the most obvious feature is that for the group of three cities there is an overall trend towards smaller cfr with decreasing r . this is also true of the two states. however, the cfr for states is displaced upwards from that for cities. both trends have strong implications for the public health outlook and will be discussed further in the next section. counts of known cases and fatalities of covid- from five cities (ahmedabad, chennai, delhi, indore, and mumbai), one district (pune), and three states (gujarat, kerala and west bengal) were investigated in this work. in two of the groups, there was one case each where the epidemic was not severe at the end of april (chennai among cities, and kerala among states). the others were known hot spots. kerala is special because the number of fatalities is too low for statistical tests to be meaningful. there is strong regional heterogeneity in the course of the epidemic, indicating the necessity of looking at its spread at extremely local scales in order to check and control it.
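The diagonal test used above, which compares the value of r inferred from cases with the value inferred from fatalities, can be sketched in a few lines. The criterion below (overlap of the two credible intervals) is an assumption standing in for "within statistical uncertainties"; the class, the function names, and the numbers in the usage note are hypothetical and are not taken from table ii.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """Median and credible interval of an inferred quantity such as r."""
    median: float
    low: float
    high: float


def intervals_overlap(a, b):
    """True when the two credible intervals share at least one point."""
    return a.low <= b.high and b.low <= a.high


def classify_region(r_from_cases, r_from_deaths):
    """Place a region relative to the diagonal of the scatter plot.

    'above' means faster growth in cases than in fatalities,
    'below' means slower growth in cases than in fatalities.
    """
    if intervals_overlap(r_from_cases, r_from_deaths):
        return "on diagonal"
    if r_from_cases.median > r_from_deaths.median:
        return "above diagonal"
    return "below diagonal"
```

A region with, say, `Estimate(1.8, 1.7, 1.9)` from cases and `Estimate(1.75, 1.6, 1.9)` from fatalities would be classified as on the diagonal, and only such regions would be carried forward to the cfr comparison.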
the time series both of known cases and fatalities in four out of the six urban centers showed a rapid rise for about days after the lock-down. this was followed by a much slower growth. since fatalities track cases with a delay of . days on the average, the early part of this data could track the growth in the time before the lock-down. however, it turns out that the data grows faster in less dense urban areas. moreover, this hypothesis is not tenable for the growth in the number of known cases. a possibility which resolves these difficulties is that this rapid rise of numbers in the early days tracks the rapid improvement of disease surveillance rather than the epidemic. the fact that the positive cases in kerala does not show such a rapid initial growth is consistent with reports that the state activated disease surveillance after the first infections came from abroad [ ] . this could also be true of ahmedabad and gujarat, two other centers which show no such initial increase, since the state had passed through the surveillance challenge of zika virus in recent years [ ] . due to this confounding factor, it is not possible to use the data until april or to make any statistically valid measurement of the growth of the epidemic before the lock-down. neglecting this leads to multiple fallacies, which i remark on next. the apparent slowing down of the growth in later stages may be falsely interpreted as a transition to polynomial growth. as shown in eq. (a ) and eq. (a ), this is equivalent to a time dependent doubling interval. it has been discussed in the previous subsection that this leads to highly unlikely properties of the covid- epidemic. the same apparent slowing down of the growth rate in india has also been interpreted within the homogeneous sir model with constant, time invariant, parameters [ ] . in such a simple model the time dependence can only come from early evolution towards herd immunity. 
this gives rise to the unlikely conclusion that herd immunity will be reached for covid- while % of the population remains susceptible. a misrepresentation of data also arises when "instantaneous doubling intervals" or similar measures of exponential growth are constructed using c(t) for one day, or averaged over small windows of time [ ]. this shows a spurious gradual slowing of growth during the first three weeks of the lock-down. in later weeks these estimates are also plagued by spurious effects which result when delayed reports are dumped into cumulative numbers on one day instead of being assigned to the correct past dates. these appear as evidence of local spurts or slumps in growth. evidence of retroactive corrections from [ ] shows that delays of as much as ten days may occur. when these artifacts are averaged over a moving window, this gives the mistaken appearance of peaks and troughs, and may create misplaced pressure to change policies. due to the reasons discussed in the previous subsections, the period after april or constitutes the base data for the main part of this analysis. as shown in figure , a constant growth rate in each locality during the lock-down models the data as well as a growth rate which changes linearly with time. this is also the most parsimonious hypothesis about the growth of the epidemic. the observed doubling interval, and the derived quantity r , fall into three groups (see table ii and figure ). several geographical regions have r less than . kerala has r ≃ . (a doubling interval larger than t , the interval from the emergence of symptoms to death). gujarat and ahmedabad have r higher than . since this is the growth rate during the lock-down, population density effects are unlikely to be the major determinant of r . it would be worthwhile to consider the role of individuals with an extremely large number of contacts in this context, or a significant tail of the distribution with a small number of contacts, but still above three.
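The reporting artifact described earlier in this section, where delayed reports are dumped into the cumulative count on a single day, can be reproduced with a small simulation. This is a sketch under simple assumptions: the true epidemic grows with a constant doubling interval, a fixed fraction of new cases is withheld during a backlog period, and the whole backlog is released on one day. The function names and parameters are hypothetical.

```python
import math


def cumulative_exponential(n_days, c0, tau):
    """True cumulative counts with a constant doubling interval tau."""
    return [c0 * 2 ** (t / tau) for t in range(n_days)]


def with_backlog(cumulative, hold_from, release_on, held_fraction):
    """Reported cumulative counts when some reports are delayed.

    From day hold_from up to (not including) release_on, held_fraction
    of each day's new cases is withheld; on release_on the whole backlog
    is added at once. held_fraction is assumed to be below 1.
    """
    daily = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
    reported, backlog, total = [], 0.0, 0.0
    for day, new in enumerate(daily):
        if hold_from <= day < release_on:
            held = held_fraction * new
            backlog += held
            new -= held
        elif day == release_on:
            new += backlog
            backlog = 0.0
        total += new
        reported.append(total)
    return reported


def instantaneous_doubling(cumulative):
    """One-day-window doubling interval, 1 / log2(C(t+1) / C(t))."""
    return [1.0 / math.log2(b / a) for a, b in zip(cumulative, cumulative[1:])]
```

Applied to a series with a true doubling interval of 5 days, the one-day estimate rises above 5 while reports are being withheld (an apparent slump), drops well below 5 on the day the backlog is released (an apparent spurt), and returns to 5 afterwards, even though the underlying growth never changed.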
five regions pass the following data quality test: the values of r obtained from the growth of fatalities and of cases are equal. this does not mean that the number of cases is correctly counted. rather, it indicates that the effort to find the cases requiring critical care, and to trace their contacts, has successfully resulted in tracking a constant fraction of all infected persons. it may miss, for example, a large fraction of asymptomatic cases. for the five geographical regions which pass the quality test described in the previous section, a further study was performed. the dependence of the case fatality ratio, cfr (i.e., the ratio of the observed number of fatalities and cases), on r was investigated. although the number of cases identified may be much smaller than the actual number of cases, the chance that cases are identified in these five regions is expected to be similar, since the rate of testing is about the same. a positive correlation between r and cfr is observed. one possible reason for this is that with lower r the number of critical cases grows more slowly, giving medical practitioners time to figure out good practices which prevent critical care patients from progressing to fatality. deeper studies of this factor, comparing case data from different regions, are called for in future. it is possible that this is one of the most positive, and least discussed, outcomes of the lock-down. another possibility may also be conjectured. careful maintenance of social distancing, necessary to reduce r , results in evolutionary pressure on the virus. lock-down and similar methods force the virus to evolve in a direction which maximizes its ability to reproduce, which it can do if the disease becomes less critical or asymptomatic, and the chances of fatality decrease. it would be interesting to compare different regions across the globe for changes in the serial time and cfr. at the observed rate of growth, and with the current rate of testing, more than .
% of the population in hot spots will begin to test positive for infections in about a month. a constant rate of growth of infections means that the number of hospital beds will also grow at the same rate, for as long as the epidemic is growing. even if the rate is slowed down heavily, as it is already in delhi, mumbai, and chennai, the demand for hospital facilities will keep on growing, as long as the epidemic grows. this demand is already beginning to outstrip resources in the larger cities. the mean interval between the start of symptoms and discharge was estimated to be . days [ ] . this means that unless the doubling interval is kept above days (= . / ln ), the demand on hospitals will keep rising. of the places we studied, only kerala has begun to approach this break-even point. cfr is currently small, partly because medical facilities have been able to cope with the rate of growth. if the number of cases exceeds the capacity of the medical system, cases which might have recovered will be harder to treat. inevitably in such cases cfr will climb. it is useful to note that in figure the statewide figures for cfr are higher than those for cities. this is a reflection of the relative paucity of medical services outside cities, and is a pointer to what might happen when the number of infections rises beyond the sustainable capacity of hospitals. i thank rahul banerjee, prahlad harsha, d. indumathi, and r. shankar for sharing collated data on various cities. i thank jayasree subramanian for providing me with the reference [ ] . in this form of the equation time is measured in units of the case resolution time. this equation assumes that the fraction of susceptible persons is close to unity, and the fraction of persons in any other compartment is very small. as argued before, this is a reasonable assumption to make. the cumulative number of infections is then found by integration. there is no closed form result for the general case. 
if only the linear term in the expansion of r is retained, then the function erfi is defined through the integral it is possible to use an expansion for t ≪ t , which gives the form i(t) = i( ) λ e λ t + ǫ( − λ t + λ t ) + o(ǫ ) (a ) where the notation λ = r − , and ǫ = r /(λ t ) are introduced. the imaginary part vanishes exponentially. this is easy to match to the phenomenological form i(t) = i( ) t/(τ +τ t/t ′ ) = i( ) t/τ − ln τ where an artificial expansion parameter t ′ is introduced. it is set to unity after expansion. matching these two expansions is accurate only when λ t is large. then the phenomenological parametrization of eq. (a ) can be connected to the parameters of (non-autonomous) evolution equations for the epidemic. note that t and t ′ are both regularization scales, in the sense of a renormalization group, whose numerical value need not be specified. in order to change units of time to days, it is necessary to choose a model of the epidemic. if one uses the seir model, then the unit of time would be the median interval between the appearance of symptoms and the time of fatality or recovery, whichever is earlier. this quantity, t = . days [ ] . if one instead uses the sir model, then it is appropriate to choose the unit of time to be the median interval between the beginning of the infection and the earlier of the time of fatality or recovery. this is t + t , where t is the median pre-symptomatic period, t = . days. here the conversion is made within the seir scheme. this gives when a constant τ is used, one can set τ = in the above formulae and write r and τ instead of r and τ . 
evolutionary origins of the sars-cov- sarbecovirus lineage responsible for the covid- pandemic changes in contact patterns shape the dynamics of the covid- outbreak in china effect of non-pharmaceutical interventions to contain covid- in china first antibody surveys draw fire for quality, bias coronavirus disease (covid- ): a literature review identification of burden hotspots and risk factors for cholera in india: an observational study emergencies preparedness, response who, zika virus infection -india, emergencies preparedness, response strategy for covid testing in india global covid- case fatality rates ministry of health and family welfare, government of india, addressing social stigma associated with covid- undated advisory nationwide mortality studies to quantify causes of death: relevant lessons form india's million death study amdavad municipal corporation, covid- website greater chennai corporation, official twitter page collection of press releases by praveen jadiya, chief medical and health officer municipal commission of greater mumbai, health department, official handle district information office, pune, official twitter account covid- india api estimates of the severity of coronavirus disease : a model based analysis inferring epidemic parameters for covid- from fatality counts in mumbai estimating the asymptomatic proportion of coronavirus disease (covid- ) cases on board the diamond princess cruise ship estimation of the asymptomatic ratio of novel coronavirus infections (covid- ) coronavirus: surveillance is the key, kerala shows the way singapore univ. of tech. and design, data driven innovation lab, predictive monitoring of covid- coronavirus (covid- ) cases our world in data in this appendix the unit of time will be taken to be the inverse of the mean rate of fatality of the infected. in these units, r is the average number of new infections caused by an infected person. 
r depends on the infectivity of the virus, as well as the average degree of the contact network. as a result, it may be affected by public health policies, such as a lockdown. say a policy measure has a time scale t. due to this, r may become time-dependent, and one may write a taylor series expansion. one may introduce this into a typical epidemic model equation to obtain di/dt = (r − 1)i, which gives log[i(t)/i(0)] = t r − +

key: cord- -omruua n authors: hick, john l.; thorne, craig d. title: personal protective equipment date: - - journal: disaster medicine doi: . /b - - - - . - sha: doc_id: cord_uid: omruua n

personal protective equipment (ppe) recently has become a rather common acronym in the lexicon of healthcare providers, even though it has been common in the fire services, emergency medical services (ems), and military for quite some time. essentially, ppe helps ensure that individuals are safe from physical hazards that they may encounter in their work environment. ppe may be used to protect workers from general environmental threats (e.g., temperature extremes, noise), specific work-related threats (e.g., falling objects, falls from heights), or threats faced in an emergency situation (e.g., hazardous chemical and infectious agents). no equipment is appropriate for all individuals and threats; it must be selected and properly used according to the setting of use and the level of risk. the critical problem with most ppe, particularly in regard to chemically protective suits and respirators, is that with higher levels of protection come not only higher prices and required training levels, but also a higher physiological and physical burden to the user. thus, a structured approach to assessment of risk and selection of proper equipment is important to achieve a reasonable level of protection in relation to the hazard.
this chapter reviews the concepts of ppe, recent lessons learned in regard to ppe, types of respirators, key regulations, and issues in the selection of ppe for emergency medical care and decontamination operations. until recently, ppe for medical providers received little attention short of the "standard precautions" of gloves, with the addition of simple masks and barrier precautions, when needed. the severe acute respiratory syndrome (sars) pandemic, the tokyo subway sarin attack, the murrah federal building bombing in oklahoma city, and the terrorist attacks of september are some examples of situations in which the lack of proper ppe resulted in adverse health effects for healthcare providers and thus focused attention on ppe as a critical issue in disaster response. in march , a crude form of the nerve agent sarin was released in the tokyo subway system on separate cars bound for a common downtown station. this attack resulted in deaths and more than persons presenting to the hospital for medical evaluation. none of the casualties was decontaminated before treatment or transport. retrospectively, prehospital and hospital personnel reported symptoms consistent with nerve agent exposure. fortunately, none required emergency treatment. eleven physicians caring for the sickest victims (including one in cardiac arrest and one in respiratory arrest) were most affected, and six of them required antidotal therapy. fortunately, all recovered fully and did not have to cease their patient care efforts due to symptoms. approximately % of victims self-referred to hospitals, which is consistent with u.s. experiences indicating that few victims of chemical contamination events undergo decontamination before arrival at a medical facility. this has caused most jurisdictions to reconsider historical plans that contaminated patients would not be in contact with medical care personnel until they were "clean."
ems and hospital personnel need to be prepared for contaminated patients presenting directly to them and to recognize that in certain situations, ppe may be required to safely provide care. sars posed unique risks and challenges to healthcare workers. this novel viral agent with incompletely defined transmission characteristics was controlled in with aggressive quarantine measures and use of ppe. in the first wave of sars in toronto, . % of all cases were acquired in a healthcare setting. aggressive use of ppe, including n masks, barrier precautions, and gloves, was generally effective at preventing spread, although during one difficult and prolonged intubation attempt, at least six providers contracted sars from a patient despite complying with ppe recommendations. this case led to recommendations that higher levels of ppe may be required during procedures that are likely to generate aerosols or provoke coughing, such as intubation, airway suctioning, positive pressure ventilation, and nebulization treatments. the national institute for occupational safety and health (niosh) and the rand corporation produced a comprehensive "lessons learned" report summarizing issues from the terrorist bombings at the world trade center (wtc), anthrax incidents, and the oklahoma city murrah federal building bombing. the report, titled "protecting emergency responders: lessons learned from terrorist attacks" describes in detail many of the challenges responders faced (box - ). it is clear from the wtc events that a large number of jurisdictions responding, conflicting messages regarding use of ppe and safety of the environment, and lack of a plan to implement respiratory precautions can complicate a response and potentially place providers at risk. wtc responders continue to suffer respiratory symptoms attributable to exposures at "ground zero." 
selection of appropriate ppe begins with an analysis of the hazards that responders may encounter and an assessment of responders' roles and responsibilities. hazard vulnerability analyses (hva) are required for community emergency planning grants and are required of healthcare facilities that are accredited by the joint commission on accreditation of healthcare organizations (jcaho). the hva uses a numerical ranking of factors for specific threats (e.g., chemical release), including the risk of the event occurring, the current preparedness for the threat, and the risk to life. the numerical score determines the gravity of each threat to the community. each community's hva will reflect the unique risks that must be considered by its emergency responders. choice of ppe may be affected by factors within the hva such as:
• population density of the community and surrounding area
• high- or moderate-risk terrorist targets in the community (e.g., government buildings, centers of commerce, or another symbolic site)
• chemical hazards posed by community industry (e.g., use of cyanide and hydrofluoric acid in the electronics industry)
• risk of transportation incidents and major transportation routes, particularly highways and railroads
• proximity of healthcare facilities, schools, or other key locations to these potential targets and industrial and transportation hazards
• frequency of hazardous materials (hazmat) incidents in the community
• resources available to respond to hazmat incidents (e.g., rapid access to on-site decontamination may decrease, but not eliminate, contaminated persons leaving the scene)

stakeholders in emergency response, including ems and healthcare facilities and fire and rescue, emergency management, and law enforcement agencies, must clearly define the responsibilities of each entity and the support and resources that each may need or offer during an emergency, particularly one involving a hazmat release.
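The numerical ranking behind an HVA can be illustrated with a minimal sketch. The 0 to 3 scale, the simple additive score, the use of "unpreparedness" (so that weaker preparedness raises the score), and the example threats are all assumptions made for illustration; the chapter does not specify a scoring formula.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    """One row of a hypothetical hazard vulnerability analysis."""
    name: str
    probability: int     # risk of the event occurring, 0 (none) to 3 (high); assumed scale
    unpreparedness: int  # 0 (well prepared) to 3 (unprepared); assumed scale
    risk_to_life: int    # 0 (none) to 3 (high); assumed scale

    def score(self) -> int:
        # Assumed additive scoring: a higher total means a graver threat.
        return self.probability + self.unpreparedness + self.risk_to_life


def rank_threats(threats):
    """Return threats ordered from gravest to least grave."""
    return sorted(threats, key=lambda t: t.score(), reverse=True)


# Hypothetical community profile, for illustration only.
threats = [
    Threat("hazmat incident at local plant", 3, 1, 2),
    Threat("chemical release on rail corridor", 2, 3, 3),
    Threat("infectious disease outbreak", 1, 2, 2),
]
ranked = rank_threats(threats)

for t in ranked:
    print(f"{t.score():2d}  {t.name}")
```

A ranking of this kind only orders the threats; the choice of ppe for the top-ranked threats still depends on the community factors listed above.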
box - : challenges responders faced
• physical hazards including fires, burning jet fuel and explosions, rubble piles with sharp rebar and heated metal, falling debris (which resulted in the death of a nurse in oklahoma city), hazardous materials, electrical hazards, structures prone to collapse, heat stress, exhaustion, and respiratory irritants
• heat-related seizures while wearing chemically protective suits
• eye injuries (usually related to particulate exposure), which accounted for % of all wtc disaster response worker injuries
• potential for secondary hazards, including explosive devices and chemical, biological, and radioactive agents
• ppe shortcomings:
• heavy helmets hindered performance
• self-contained breathing apparatus (scba) was heavy and cumbersome
• scba face pieces fogged (reducing visibility), and the equipment hindered verbal and radio communication
• scba air bottle made it difficult to enter small spaces, and the limited air supply (up to hour) necessitated leaving the operation to exchange the air bottle
• air tanks and/or filters were not interchangeable between teams, and teams worked under different standards
• powered air-purifying respirator (papr) filters became clogged and were uncomfortable for long duration use. many workers instead opted to use dust masks (which offered little protection and caused nose-bridge chafing) or to wear the masks/hoods around their necks ("neck protectors")
• use of respirators made it difficult for workers to communicate with each other, often resulting in users breaking the face seal to talk
• turnout gear (the common protective garments used by firefighters) increased heat stress and physical fatigue
• at the wtc, the rubble pile was so hot in places that it melted the soles of workers' boots; providing wash stations to cool the boots resulted in wet feet and serious blisters for many workers; some wtc disaster response workers sought treatment for blisters
• steel-reinforced boots (soles and toes) protected against punctures by sharp objects but conducted and retained heat, which contributed to blisters and burns
• structural firefighting gloves worked well until they got wet and hardened, reducing their dexterity
• wtc disaster response workers did not consistently protect their hands against potential hazards such as human remains and bodily fluids
• safety glasses were readily available but often were open at the sides and did not offer adequate protection against airborne particles
• goggles were uncomfortable, hindered peripheral vision, tended to fog, and did not fit well in conjunction with half-face respirators
• many disaster response workers at the wtc (especially law enforcement officers) did not consistently use hearing protection, even around heavy machinery, because they needed to hear their radios and voices and listen for tapping when they were searching for survivors
• most volunteers at the wtc, pentagon, and oklahoma city did not receive pre-event training on ppe and hazardous materials
• although firefighters generally received detailed pre-event training, this was less true for law enforcement officers
• accurate "real-time" hazard information was not readily available, especially during the anthrax incidents
• protection from falls was available at some sites (in the form of ropes and harnesses) but was inconsistently used

ems roles in a hazmat event vary depending on jurisdictional planning. fire services personnel may or may not be able to provide treatment in a "warm zone" (i.e., the area of reduced contamination outside of the immediate release zone) depending on their training. non-fire based ems personnel may require ppe to triage and treat victims in the warm zone. in the event of a mass chemical exposure, victims will likely self-refer to visible ambulances, call from sites removed from the site of release, or make their way to hospitals, bypassing organized ems and fire services. this movement of contamination on the bodies of patients essentially causes a "migrating" warm zone, causing contamination of previously clean ("cold") areas. this migrating contamination may require protective equipment for ems responders, and appropriate plans and equipment should be in place. the roles and responsibilities of the responders, as well as the equipment required, need to be defined and drilled in advance of an incident. hospitals, until very recently, usually relied on fire services for patient decontamination at the hospital. these resources, however, are often deployed to the scene of the event and are thus unavailable to support the hospital. most hospitals have now recognized the need for at least some internal capacity for patient decontamination and are equipping their teams with ppe appropriate for decontaminating self-referred contaminated patients. a few hospital teams integrate with community hazmat teams, necessitating additional training and equipment as the mission then changes from a defensive decontamination response to an offensive response at the scene of release. hazmat releases seldom cause serious injury, but the potential exists for both scene responders and hospital receivers to suffer serious consequences of exposure.
the agency for toxic substances and disease registry (atsdr) maintains a multistate voluntary accounting of hazardous substance releases, excluding petroleum-related incidents. the hazardous substances emergency events surveillance (hsees) database currently involves states. from to , , events were recorded: ( . %) of the incidents caused injuries, and % of victims were transported to a healthcare facility. in another analysis of hsees data, only % of victims required admission to a healthcare facility. the vast majority had self-limited respiratory symptoms. in , the chemicals with highest potential for injury were chlorine (injury occurred in . % of releases), ammonia ( . %), acids ( . %), and pesticides ( %). hsees data from to show responder injuries in incidents out of a total of , incidents ( . %). law enforcement officers and firefighters accounted for the vast majority of responder injuries, which usually consisted of nausea and respiratory irritation. hospital admission occurred in . % of cases. no deaths were reported in this -year period. hospital personnel were injured in . % of the total hazmat events and represented . % of the victims. six events involved emergency department staff contact with contaminated patients, and five events were hazmat releases at the healthcare facility itself. no provider required hospital admission, and no chemical ppe was used. other reports of emergency department evacuation and/or provider illness due to off-gassing from contaminated patients have been summarized. the most serious of these incidents involve patients with suicidal ingestions of organophosphate pesticides. exposures to these patients caused at least one provider to require intubation and receive aggressive antidotal therapy due to contact with pesticide in emesis and vapors during patient resuscitation. patients who have ingested organophosphate may off-gas for days and present an ongoing risk to healthcare workers.
niosh has documented healthcare worker injuries from pesticide agents between and . in conjunction with the information from the tokyo subway sarin attack and the chemical terrorism risk posed by these agents, it is clear that these pesticides present a substantial risk of toxicity from secondary exposures. limited research is available to document the degree of the off-gassing that occurs from the bodies and clothing of contaminated patients. clothing removal and control may be expected to remove % of the contaminant and thus should be a priority. ideally, this should take place in an open-air environment. providers may not initially recognize a chemical release when they arrive at a scene. even though structural firefighting ensembles with self-contained breathing apparatus (scba) offer some chemical protection that may be sufficient for victim rescue, the incident commander must determine what actions are appropriate for the situation. protective suits, gloves, and boots and appropriate respiratory protection must be donned as soon as possible when a chemical threat is recognized. the occupational safety and health administration (osha) and environmental protection agency define four basic levels of ppe for hazmat scene responses ( cfr . , appendix b). generally, as the level of protection increases (a being the highest level), so do the weight, cost, and physiological burden. increasing protection also generally means decreasing mobility, dexterity, and scope of vision. risks inherent to ppe include trip and fall hazards; a reduced ability to complete tasks; heat stress; anxiety; and seizures, which, although rare, have been reported. cardiovascular demand is dramatically increased as ensemble weight and heat retention increase. ppe must be selected on the basis that it does not impose unnecessary risks on the provider while at the same time offering an appropriate margin of safety against the chemical hazard.
because the selection of ppe usually revolves around the selection of the respiratory component, various types of respirators must be reviewed. each respirator has an assigned protection factor that reflects the degree of protection afforded to the user. simply put, /protection factor equals the amount of exposure for the wearer. for example, a provider wearing a powered air-purifying respirator (papr) with an assigned protection factor (apf) of is exposed to / the level of contaminant as compared with wearing no protection. atmosphere-supplying respirators provide breathable fresh air to the user independent of the environment via an air supply hose and/or tank and thus offer a high level of respiratory protection. this type of respirator is required for entry into environments where the identity of and/or the potential quantity of a hazardous substance are unknown or where the quantity of oxygen in the air is unknown. scba is the most common atmosphere-supplying respirator for emergency responses. it provides air via a tank, usually worn on the back. the operational time is limited by the capacity of the tank (usually less than hour). fire services personnel routinely use this form of respiratory protection, and fire-based ems services personnel generally incorporate this ppe into their chemical protection planning. limitations include the equipment's weight (approximately to pounds), cost, need for fit-testing, duration of air supply, and need to refill air bottles. even though scba provides excellent protection, its limitations make it inappropriate for many situations (e.g., caring for a patient with an infectious disease, providing hospital-based decontamination, or securing a perimeter in the warm zone). scba has an apf of about , , the highest of any type of respirator. supplied-air respirators (sars) provide air via a hose line from a nearby clean air source (e.g., compressor or hospital supply line). 
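The protection factor relationship described in this section is a simple reciprocal: the wearer's exposure is the ambient level divided by the assigned protection factor. The following sketch works through that arithmetic. The function name, the ambient concentration, and the APF values are hypothetical and chosen only to show the calculation; they are not the figures assigned to any respirator class in this chapter.

```python
def exposure_inside_respirator(ambient_concentration: float, apf: float) -> float:
    """Estimate the concentration reaching the wearer.

    Uses exposure = ambient concentration / assigned protection factor,
    i.e. the wearer receives 1/APF of the ambient level.
    """
    if apf < 1:
        raise ValueError("a protection factor below 1 is not meaningful")
    return ambient_concentration / apf


# Illustrative values only (not taken from the chapter).
ambient_ppm = 50.0
for apf in (10, 50, 1000):
    inside = exposure_inside_respirator(ambient_ppm, apf)
    print(f"APF {apf:>4}: wearer exposed to {inside:g} ppm "
          f"({100 / apf:g}% of ambient)")
```

The same calculation can be run in reverse when planning: dividing the ambient level by the acceptable exposure gives the smallest protection factor that would be adequate, provided the respirator type is otherwise suitable for the environment.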
to meet osha requirements for level b, respirators must have a tight-fitting face piece and an emergency supply of air in case of line failure or problems. loose-fitting hoods with a supplied air source do not meet level b standards but are used by some decontamination teams when an additional level of protection is desired due to institutional preference or local hazard profile. advantages include a potentially unlimited supply of fresh air and longer duration of use. limitations are primarily mobility and thus flexibility of response. these respirators are best suited to healthcare provider use in a decontamination room or a well-defined area in which the air lines are unlikely to be tangled, stretched, or become a trip hazard. the apf of a typical tight-fitting face piece sar is , although there may be variability among models and types (e.g., tight-fitting mask versus loose-fitting hood). air-purifying respirators (aprs) have cartridges that filter the air in the user's environment to remove particulate matter and specific chemicals that the filter is designed to capture. these filters do not affect the oxygen concentration of the ambient air and thus cannot be used in potentially oxygen-deficient environments. only those chemicals for which the filter is designated are removed. also, the capacity of the filter can be exceeded by large amounts of contaminant; thus, these respirators are designed for situations in which the concentration of the agent is either established to be or assumed to be below the threshold for the canister. nonpowered aprs use the wearer's work of breathing to pull ambient air through the filter. examples include dust masks and military and civilian "gas masks." the apf of a nonpowered full face piece apr is when appropriate quantitative fit-testing is performed. of note, this type of mask is used by the military for battlefield protection against lethal levels of nerve and other chemical agents.
advantages include low cost and long duration of use. disadvantages include increased work of breathing and physiological stress, mask fogging, and the need for fit-testing. a papr uses a motor to pull air through the filter canisters, thus decreasing the work of breathing and the risk of air entrainment around the respirator face piece. paprs are often supplied with a loose-fitting disposable or reusable hood that eliminates the need to perform fit-testing and allows use by a broad range of individuals. hooded paprs with "stacked" canisters that offer protection against common hazardous chemical and biological agents encountered by first responders and hospital personnel are in widespread use due to their relatively low cost, weight, and the increased flexibility of response allowed. dependence on battery power, shelf life of the filters, and the need to be able to match the filter to the agent are limiting factors. the currently proposed apf for a papr is . directions for use must be carefully followed; one particular model provides a protection factor of , when properly donned, but when the inner hood is not tucked in, the protection level declines to and less (personal communication, ). battery packs are usually either single-use or rechargeable. rechargeable battery packs require ongoing attention to ensure a proper charge, but they offer the flexibility of allowing papr re-use during an infectious disease event. particulate filter masks such as those commonly used for patient care to protect against tuberculosis and other organisms are also considered aprs. masks are classified n (not oil resistant), r (oil resistant), and p (oil proof). n refers to a filter (the entire mask) that removes % of a particulate challenge in the -to -μm range.
n respirators filter % of the same challenge, yet simple half-face respirators offer an apf of only due to the entrainment of air around the mask and other factors; therefore, changing from an n to an n offers little additional protection unless a more robust mask ensemble, rather than a simple half-face mask, is used. respiratory protection technologies are rapidly evolving, and respiratory program administrators should make sure they are familiar with the available options and their relative advantages/disadvantages. regional cooperative planning and purchases may be helpful to allow for sharing of resources during an incident. chemically protective suits must be tailored to the type of use. suits for hot zone entry where direct contact with a hazardous material is likely must be much more robust than suits for patient decontamination activities. selection should be guided by national fire protection association (nfpa) standards and for site-of-release response activities and by recent osha guidance for hospital decontamination activities. chemicals commonly found in local transit, agriculture, or industrial use should also guide selection. appropriate ppe for perimeter control and ems warm zone operations remain topics of debate at this time. generally, suits should be sized far more generously than standard work clothing to prevent tearing during squatting and other activities (e.g., an average -kg man should plan to wear a size xxl suit). many suit configurations are possible, and the optimal configuration will depend on the mission and other equipment in the ensemble. for example, suits without "feet" are preferred when worn with boots (to allow taping over the boot), but those with integrated bootie "feet" are preferred when pull-on "sock" type butyl booties are to be used. these integrated feet should not be used as primary footwear at any time because they have poor abrasion resistance.
boots supplied in sizes medium, large, and extra large rather than fitted sizes may be preferred when equipment is purchased for a group (e.g., hospital decontamination team) rather than being purchased for an individual responder (e.g., firefighter). butyl or other rubber boots probably afford appropriate protection for warm zone operations. butyl "sock" type booties may be used on very low abrasion surfaces (e.g., internal hospital decontamination room) but are not generally appropriate for outside use. nitrile undergloves with butyl overgloves provide protection against a broad range of hazards for warm zone activities. silver shield gloves are more expensive but may be better suited for particular compounds when the agent is known. overglove selection should balance the need for abrasion resistance with dexterity required to perform tasks (e.g., to administer intramuscular antidotes). the u.s. army center for health promotion and preventive medicine (usachppm) recommends -mm thickness butyl gloves (standard examination gloves are mm) as a minimum for working with patients contaminated by chemical warfare agents or toxic industrial chemicals. very few situations require physical decontamination of patients exposed to biological agents. an exception would be patients who present after contamination with biological agents (e.g., anthrax spores) from a dissemination device. ppe for decontamination should consist of the same chemical protective suit and high level of respiratory protection, including a high-efficiency particulate air (hepa) filter or sar, that would be used for chemical decontamination activities. ppe for biological agents in relation to care of patients who are already infected and symptomatic is discussed in the following. categories of ppe for biological agents include:
1. standard precautions: use of gloves and proper hand hygiene to prevent disease transmission for any potentially infectious patient. gowns and eye protection are added only when patient care activities are likely to result in splashing or soiling.
2. contact precautions: standard precautions plus use of barriers during all patient care activities to protect face, arms, and front torso to prevent contact with secretions, emesis, feces, etc. (e.g., enteric infections, many hemorrhagic fever viruses).
3. droplet precautions: standard precautions with the addition of a droplet respirator (e.g., surgical mask) when working within feet of the patient to prevent transmission of infectious agents that travel by large droplet spread (e.g., droplet precautions are used against plague); may not be protective against all droplet nuclei.
4. airborne precautions: standard precautions with an n or higher protection respirator to prevent transmission of infectious agents that are spread by aerosols (e.g., airborne precautions are used against chickenpox, smallpox, and tuberculosis).
5. "special pathogen precautions": based on the sars experiences, a high-risk pathogen with respiratory spread probably requires greater levels of protection than previously recommended. constant use of both contact and airborne precautions has generally been advised with the optional use of a papr rather than an n mask during "high-risk" interventions likely to generate aerosols or provoke coughing (e.g., suctioning, intubation, positive pressure ventilation). these precautions are the subject of current discussion.

patient care providers should have routine access to nonsterile examination gloves, barrier gowns that protect the arms and front torso, standard surgical (droplet) masks, and a face shield that provides adequate splash protection (which may be integrated with the mask, a separate face shield, or goggles) according to the osha bloodborne pathogens standard.
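The precaution categories build on one another, which makes them easy to express as a small lookup table. The sketch below is one possible encoding under stated assumptions: the item names are paraphrases of the chapter's wording, the table and function names are hypothetical, and the sketch treats the optional switch to a papr during high-risk interventions as if it had been adopted.

```python
# Baseline items applied to any potentially infectious patient.
STANDARD = frozenset({"gloves", "hand hygiene"})

# Item names are paraphrases chosen for this sketch, not official terms.
PRECAUTIONS = {
    "standard": STANDARD,
    "contact": STANDARD | {"barrier gown", "face protection"},
    "droplet": STANDARD | {"surgical mask"},
    "airborne": STANDARD | {"fit-tested particulate respirator"},
}
# Special pathogen precautions combine contact and airborne precautions.
PRECAUTIONS["special pathogen"] = PRECAUTIONS["contact"] | PRECAUTIONS["airborne"]


def required_ppe(category: str, high_risk_procedure: bool = False) -> set:
    """Return the set of ppe items for a precaution category.

    high_risk_procedure models aerosol-generating interventions such as
    intubation or suctioning; the papr substitution is optional in the text
    and is applied here only as an assumption of this sketch.
    """
    items = set(PRECAUTIONS[category])
    if category == "special pathogen" and high_risk_procedure:
        items.discard("fit-tested particulate respirator")
        items.add("papr")
    return items


print(sorted(required_ppe("droplet")))
print(sorted(required_ppe("special pathogen", high_risk_procedure=True)))
```

Keeping the categories as sets makes the "standard precautions plus ..." structure explicit and avoids repeating the baseline items in every entry.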
providers should have ready access to higher levels of protection when needed. "bad bug bags" may be assembled with appropriate gowns, gloves, face shields/goggles, n or papr respirators, and other supplies so that healthcare providers do not have to assemble the recommended components. instruction sheets for donning/doffing and disinfection procedures can be included in the bag. practitioners fitted for n respirators may use these for patient care, and others should have access to a papr until they are fitted for an n respirator. plans to rapidly fit-test additional employees during an event that might require prolonged use of airborne precautions (e.g., sars) should be in place. all ppe must be part of an ongoing program of respiratory protection and hazmat/decontamination response within the agency or institution to ensure that employees who are expected to use these protections are competent and comfortable with the indications, use, and limitations of their equipment. numerous regulations apply to the selection and proper use of ppe. all persons using ppe must conform to osha standards on respiratory protection ( cfr . ), ppe ( cfr . ), eye and face protection ( cfr . ), hand protection ( cfr . ), hazard communication ( cfr . ), and bloodborne pathogens ( cfr . ). state osha agencies may have stricter requirements than the federal standards. most occupational or employee health services of agencies/facilities where ppe is used are very familiar with these standards and their application to employees. the nfpa has numerous standards for the training and equipping of responders (including ems personnel) to a hazmat incident (e.g., nfpa standards , , , , , and ). specific guidance is also provided for urban search and rescue teams (nfpa standard ). responders to hazmat releases are covered by osha's hazwoper (hazardous waste operations and emergency response) standard cfr .
, which is perhaps the most comprehensive standard guiding hazardous materials responses. osha requires use of a minimum of level b equipment (i.e., an atmosphere-supplying respirator and chemically protective suit with sealed seams) during a response into a contaminated environment until the concentration of the agent is shown via air monitoring to be below the threshold required for the safe use of an apr or other lesser degree of protection. this requirement presents difficulty for ems and hospital providers because the agent is often unknown at the time that medical care is provided in the warm zone (i.e., an area where the level of contamination is minimal and controlled). particularly for hospitals, confusion existed as to what constituted appropriate protection for decontamination team members who provide medical care for contaminated patients and to what degree the hazwoper standard applied to community responders geographically separate from the site of release. osha clarified this issue for healthcare facility providers in two letters of interpretation and a comprehensive guidance document on ppe and training released in . in this document, osha codifies use of paprs as the minimum level of respiratory protective equipment for hospitals under certain conditions:
• the facility acts as a "first receiver" for self-referred contaminated casualties, not as a responder to a release zone.
• the facility itself is not the site of the hazardous substances release.
• an hva has been conducted to identify specific hazards to the community and facility.
• the victims must present at least minutes after exposure (to allow time for some of the contaminant to evaporate or dissipate). it will usually take at least this long to get personnel into ppe at the facility.
• the victims' clothing must be rapidly removed and contained.
• decontamination must occur in a well-ventilated area, preferably outdoors.
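Because every one of the conditions above has to hold before a papr can serve as the minimum respiratory protection, the list reads naturally as a single checklist. The sketch below encodes it that way. The class, field, and function names are hypothetical, and the minimum delay after exposure is passed in by the caller rather than hard-coded; the value used in the example is a placeholder, not a figure taken from this text.

```python
from dataclasses import dataclass


@dataclass
class FirstReceiverSituation:
    """Facts a hospital would assess before choosing respiratory ppe (hypothetical model)."""
    acts_as_first_receiver: bool          # receiving self-referred casualties, not entering a release zone
    facility_is_release_site: bool        # True if the release happened at the facility itself
    hva_completed: bool                   # hazard vulnerability analysis has been conducted
    minutes_since_exposure: float         # time between exposure and patient arrival
    clothing_removed_and_contained: bool  # victims' clothing rapidly removed and contained
    well_ventilated_decon_area: bool      # decontamination area is well ventilated


def papr_minimum_applies(s: FirstReceiverSituation, min_delay_minutes: float) -> bool:
    """Return True only when every listed condition is satisfied."""
    return (
        s.acts_as_first_receiver
        and not s.facility_is_release_site
        and s.hva_completed
        and s.minutes_since_exposure >= min_delay_minutes
        and s.clothing_removed_and_contained
        and s.well_ventilated_decon_area
    )


# Example with a placeholder delay threshold supplied by the caller.
situation = FirstReceiverSituation(True, False, True, 25.0, True, True)
print(papr_minimum_applies(situation, min_delay_minutes=10.0))
```

A False result in this sketch means only that the papr minimum cannot be assumed; the more protective default described earlier in the paragraph would then need to be considered.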
when these conditions are met, and absent any particular threats within the community that require higher levels of protection (such as close proximity to a specific chemical production, storage, or disposal site), the minimum level of respiratory ppe is a papr with a protection factor of or greater, which filters organic vapor, acid gas, particulate matter, and biological agents (at the hepa level). hazwoper also defines training requirements for responders. the application of these regulations to hospital decontamination teams was also clarified in recent osha guidance. awareness training is required for individuals involved in a hazmat response who will not be using ppe or taking actions beyond recognizing and reporting an incident (emergency department staff, law enforcement officers). at a minimum, all responders who will use chemical ppe must be trained to the operations level ( hours or to competency) so that each responder can: • understand his or her role in the response and the emergency response plan. • identify the presence of a hazardous substance through signs and symptoms of exposure. • assess site safety, including risks to self. • select and safely use appropriate ppe. • understand decontamination procedures. hazmat awareness educational competencies must also be met by providers trained to the operations level. the awareness competencies may be included in the hours of operations training or conducted separately. in addition, any personnel using respiratory protective equipment must be in compliance with osha's respiratory protection standard ( cfr . ). key features of this standard are: • respirator selection procedures. • proper use of respirators in routine and reasonably foreseeable emergency situations. • medical clearance before use (at minimum, a screening questionnaire; see appendix c of the standard). • fit-testing before use and annually thereafter (see appendix a and b of the standard). 
• inspecting, cleaning/disinfecting, storing, repairing, and maintaining the equipment. • training and education on topics such as the types of respiratory hazards they might be exposed to, proper use (including donning and doffing), limitations, and maintenance. most medical facilities and response agencies have a respiratory protection program in place. this existing foundation and the subject matter experts in occupational safety and health, infection control, or other related disciplines can assist with implementation of new technologies and protocols. ppe technology continues to change rapidly. hopefully, technologies that are lighter weight, less expensive, and less heat-retaining can be developed. technology change is occurring far more rapidly than the current approvals process and new standards that have arisen in the wake of the events of . clear guidance on appropriate technologies for warm zone activities is lacking at this time. this can lead to confusion and difficult choices for agencies and facilities, knowing that their ppe selection may be either too much or too little to satisfy future standards. currently, there is no recommendation or consensus on the level of ppe that is required for hospital-based personnel, much to the consternation of hospital preparedness leaders. some have proposed a ppe level "h" to meet this need. more research is clearly needed regarding safe but comfortable ppe, methods of decontamination, modeling of airborne concentrations of specific agents, and ppe selection. further, detection technologies are needed that can provide better environmental screening for a wide range of hazardous substances and quantitative assessment of agent concentration. currently, incident commanders may remain confused about appropriate ppe, and this may result in ppe selection that is overly conservative (which risks provider noncompliance and adverse effects from the ppe) or overly liberal (which risks provider injury from the contaminant). 
finally, providers need to be educated about the consequences of not using ppe appropriately, including acute chemical effects and delayed pulmonary effects. in general, communities and regions can help to reduce issues of ppe interoperability by planning, purchasing, and training together whenever possible. this also allows for caches of materials to be deployed that are true replacements for usual materials and thus will be better accepted and require minimal training. for too long, jurisdictions have been reluctant to share their problems, issues, and roadblocks in the area of ppe, lest the agency be seen as having problems protecting its responders. better dialogue and sharing of best practices and lessons learned are of immense value to better hazmat response planning and should be encouraged. the recent niosh/rand report and release of select after-action reports are welcome changes in this history. defining hazards in this age of potential chemical terrorism is fraught with peril because we are unable to truly assess the scope of the threat. thus, ppe must be chosen that will protect appropriately against a broad range of threats without being so restrictive that in the heat of the moment, the provider decides to forgo the ppe and is at risk of becoming a casualty of the event. balancing cost, ease of use, and scope of protection concerns are delicate decisions with few answers at this time, particularly for those who may have long-duration job tasks in a warm zone environment. we can only hope that we are not forced to learn too many more harsh lessons about ppe use in the future. in the meantime, however, we should strive to prepare our communities by selecting appropriate protective technologies in relation to perceived threats and practicing our responses so that our personnel are comfortable using their ppe and understand the consequences of not doing so. 
key: cord- -ll pnl authors: saberi, m.; hamedmoghadam, h.; madani, k.; dolk, h. d.; morgan, a.; morris, j. k.; khoshnood, k.; khoshnood, b. title: accounting for underreporting in mathematical modelling of transmission and control of covid- in iran date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: ll pnl background: iran has been the hardest hit country by the outbreak of sars-cov- in the middle east with , confirmed cases and , deaths as of april . with a relatively high case fatality ratio and limited testing capacity, the number of confirmed cases reported is suspected to suffer from significant under-reporting. therefore, understanding the transmission dynamics of covid- and assessing the effectiveness of the interventions that have taken place in iran while accounting for the uncertain level of underreporting is of critical importance. we use a mathematical epidemic model utilizing official confirmed data and estimates of underreporting to understand how transmission in iran has been changing between february and april . methods: we developed a compartmental transmission model to estimate the effective reproduction number and its fluctuations since the beginning of the outbreak in iran. we associate the variations in the effective reproduction number with a timeline of interventions and national events. 
the estimation method also accounts for the underreporting due to low case ascertainment by estimating the percentage of symptomatic cases using delay adjusted case fatality ratio based on the distribution of the delay from hospitalization to death. findings: our estimates of the effective reproduction number ranged from . to . between february and april , with a median of . . we estimate a reduction in the effective reproduction number during this period, from . ( % ci . - . ) on march to . ( % ci . - . ) on april , due to various non-pharmaceutical interventions including school closures, a ban on public gatherings including sports and religious events, and full or partial closure of non-essential businesses. based on these estimates and given that a near complete containment is no longer feasible, it is likely that the outbreak may continue until the end of the if the current level of physical distancing and interventions continue and no effective vaccination or therapeutic are developed and made widely available. interpretation: the series of non-pharmaceutical interventions and the public compliance that took place in iran are found to be effective in slowing down the speed of the spread of covid- within the studied time period. however, we argue that if the impact of underreporting is overlooked, the estimated transmission and control dynamics could mislead the public health decisions, policy makers, and general public especially in the earlier stages of the outbreak. funding: nil. evidence before this study since the outbreak of sars-cov- in late , several studies have attempted to understand its transmission and control dynamics. the majority of the existing studies reported the dynamics and initial estimates of the effective reproduction number from china followed by a few other studies using data from other countries including italy, spain, south korea, germany, france and iran among others. 
however, none of the previous work has taken into account the impact of possible underreporting of cases in estimation of the effective reproduction number. also, no other study reported the time-dependent association between the interventions and variations in the effective reproduction number for iran as the hardest hit country in the middle east. we use a mathematical model to estimate the transmission and control dynamics of covid- in iran, taking into account the significant underreporting of cases. we estimated the time-dependent effective reproduction number in association with a timeline of events and interventions that took place. we showed that if underreporting is overlooked, the estimated dynamics could mislead the public health decisions and general public. the impact of control measures on the effective reproduction number could also significantly be overestimated unless under-reporting is taken into account. the estimation of transmission and control dynamics of covid- in any country highly depends on the quality of the reported data. however, in the presence of high uncertainty in the number of confirmed cases, it is of crucial importance to take into account the impact of underreporting when interventions are to be introduced or lifted. the outbreak of sars-cov- in iran was first officially announced in february , two months after the initial outbreak in wuhan, china. iran's patient zero is believed to have been a merchant from qom who had travel history to china. despite the initial signs of a spread in qom, the government declined to place the city under quarantine to contain the epidemic at an early stage for various technical, socio-economic, religious, and security reasons. the first local non-pharmaceutical interventions such as schools and universities closure were put in place a few days after the official acknowledgement of the first cases in qom and tehran. 
since then, various public health control measures at the local and national levels were taken that are believed to have altered the course of the outbreak. see figure for a spatial illustration of the spread throughout the country by province in the first week since the official announcement of the first case (appendix p ). the relatively high case fatality ratio (cfr), defined as the total number of deaths over the total number of infected cases, in iran's official reports after the first week since the official declaration of the first case ( . %) has raised questions on the true number of cases in the country. the testing protocol in iran at the early stages of the outbreak was limited to hospital admissions of the patients with severe symptoms. while iran has extended the covid- diagnostic testing capacity later on to patients with milder symptoms, it is believed that under-ascertainment of cases still remains high. this study aims to understand the transmission dynamics of covid- in iran and to assess the effectiveness of the control measures that were put in place over time through estimation of the effective reproduction number $R(t)$, defined as the average number of susceptible persons infected by an infected person during its infectious period at a given time in the course of the epidemic. we assessed $R(t)$ in relation to a timeline of national events and non-pharmaceutical interventions. in the absence of timely and reliable data, modelling can provide helpful answers, including the degree of plausible uncertainty in different estimates and the effectiveness of non-pharmaceutical interventions. by providing explicit and clear information about model assumptions and parameters, modelling can also foster scientific discussion of data gaps and what can be done to improve outbreak-related estimates by borrowing information available elsewhere. 
finally, models can be developed and presented using both average estimates and measures of their uncertainty or, alternatively, as scenarios that can illustrate possible developments of the epidemic under various conditions. we use official time-series reports of the number of confirmed cases, recovered, and deaths from the world health organization (who) and iran's ministry of health and medical education. the first confirmed case was reported on february , which is assumed to be the beginning of the outbreak of covid- in iran. we describe the dynamics of spread using a variation of the susceptible-exposed-infected-recovered (seir) model, distinguishing between fatality and recovered cases, combined with an estimate of the percentage of symptomatic cases using delay-adjusted cfr (appendix p ). see figure . . cc-by-nc-nd . international license it is made available under a is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. (which was not certified by peer review) the copyright holder for this preprint this version posted may , . . https://doi.org/ . / . . . doi: medrxiv preprint. the model accounts for the time from exposure to onset of symptoms (or confirmation), also known as the incubation period, assuming a gamma distribution with an average of . days and a standard deviation of . days. we also assume that the time from symptom onset to death and the time from symptom onset to recovery each follow a gamma distribution, with an average of . days and a standard deviation of . days and an average of . days and a standard deviation of days, respectively. the size of the initial susceptible population is assumed to be million. to estimate the parameters of the developed seir model, we formulate an ordinary least squares (ols) minimization. 
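as a minimal sketch of the compartmental structure described above, the following python code integrates an seir model with separate recovered and deceased compartments. it is not the authors' implementation: it assumes constant transition rates (exponentially distributed delays) rather than the gamma-distributed delays used in the paper, and the function name `simulate_seird`, the parameter names, and every numerical value are hypothetical and chosen only for illustration.

```python
def simulate_seird(beta, sigma, gamma_r, gamma_d, n, e0, i0, days, dt=0.1):
    """Forward-Euler integration of an SEIR model with recovered and dead compartments.

    beta    : transmission rate per day (hypothetical value supplied by the caller)
    sigma   : rate of leaving the exposed compartment (1 / mean incubation period)
    gamma_r : rate at which infectious people recover
    gamma_d : rate at which infectious people die
    Returns a list with one (S, E, I, R, D) tuple per day, including day 0.
    """
    s, e, i, r, d = float(n - e0 - i0), float(e0), float(i0), 0.0, 0.0
    steps_per_day = round(1 / dt)
    trajectory = [(s, e, i, r, d)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt      # S -> E
            new_infectious = sigma * e * dt          # E -> I
            new_recovered = gamma_r * i * dt         # I -> R
            new_dead = gamma_d * i * dt              # I -> D
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_recovered - new_dead
            r += new_recovered
            d += new_dead
        trajectory.append((s, e, i, r, d))
    return trajectory


if __name__ == "__main__":
    # All values below are illustrative assumptions, not estimates from the paper.
    run = simulate_seird(beta=0.5, sigma=0.2, gamma_r=0.09, gamma_d=0.01,
                         n=1_000_000, e0=10, i0=5, days=120)
    peak_day = max(range(len(run)), key=lambda day: run[day][2])
    print("day of peak infectious count:", peak_day)
```

replacing the constant rates with gamma-distributed delays would require splitting each compartment into stages or tracking cohorts by day of entry, which this sketch deliberately leaves out.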
we use pattern search, a derivative-free global optimization algorithm, to find the model parameters that minimize the sum of the normalized root mean squared errors (rmse) of the numbers of infected, recovered, and removed cases. the basic reproduction number, $R_0$, a fundamental measure in infectious disease epidemiology and public health, is defined as the average number of susceptible persons infected by an infected person during its infectious period in a fully susceptible population. $R_t$ is defined similarly to $R_0$ but is not limited to the assumption of a completely susceptible population. here, we use empirical data from iran to trace changes in $R_t$ over a rolling day period since the beginning of the covid- outbreak and describe its association with various interventions (e.g. school closures, social distancing, and bans on public gatherings) undertaken by the public and government. various methods exist to estimate $R_0$ (and $R_t$). here, we use the same framework described in figure , in which the parameters of the formulated seir model are inferred through an optimization problem. $R_t$ is calculated using a rolling time window of days to capture the evolving trend of the spread over time due to various changes in the social network contact rate. the calculated $R_t$ may be overestimated during the early stage of an outbreak for several reasons, including the impact of imported cases and heterogeneity in subpopulations (e.g. older than years old) with higher transmission rates. we account for the under-reporting of the number of infected cases in the official confirmed data using a delay-adjusted case fatality ratio (cfr) approach. this approach assumes that the time from hospitalization-to-death has a known statistical distribution and uses this distribution to estimate when the people who died from covid- would have been reported as being infected. 
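the fitting step described above (a derivative-free pattern search over a sum of normalized rmse terms) can be sketched as follows. this is only an illustration under stated assumptions: the source does not say which pattern-search variant or software was used, so the code uses compass search, the simplest member of that family, which is a local method; the rmse is normalized by the mean of the observations, which is one common convention and not necessarily the authors'; and the two-parameter linear "model" and its data are toy placeholders, not the seir model.

```python
import math


def normalized_rmse(model, observed):
    """RMSE divided by the mean of the observations (an assumed normalization)."""
    mse = sum((m - o) ** 2 for m, o in zip(model, observed)) / len(observed)
    return math.sqrt(mse) / (sum(observed) / len(observed))


def pattern_search(objective, x0, step=0.5, tol=1e-6, max_iter=10_000):
    """Compass search: poll +/- step along each coordinate, halve step if nothing improves."""
    x = list(x0)
    fx = objective(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for k in range(len(x)):
            for direction in (1.0, -1.0):
                trial = list(x)
                trial[k] += direction * step
                f_trial = objective(trial)
                if f_trial < fx:          # accept any strict improvement
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2                     # refine the mesh around the current point
    return x, fx


# Toy data (hypothetical): two observed series that grow linearly in time.
T = range(1, 11)
OBS_INFECTED = [3.0 * t for t in T]
OBS_REMOVED = [1.5 * t for t in T]


def toy_objective(params):
    """Sum of normalized RMSE terms, one per observed series."""
    growth_i, growth_r = params
    return (normalized_rmse([growth_i * t for t in T], OBS_INFECTED)
            + normalized_rmse([growth_r * t for t in T], OBS_REMOVED))


if __name__ == "__main__":
    best, err = pattern_search(toy_objective, [1.0, 1.0])
    print("fitted parameters:", best, "objective:", err)
```

in a real fit, `toy_objective` would run the epidemic model for the candidate parameters and compare its output with the reported series; repeating the fit on successive time windows would give a rolling estimate.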
the case fatality ratio is the ratio of the number of deaths to the number of reported infections, calculated at the time of reporting, not the time of death. this is extremely important for rapidly evolving epidemics. the method, however, does not account for underreporting in fatality cases. the distribution of the time from confirmation-to-death is assumed to follow the same distribution as the time from hospitalization-to-death, a lognormal distribution with a mean of days and a standard deviation of . days. here, we assume that the estimate for percentage of symptomatic covid- cases reported for iran follows a lognormal distribution (the same distribution type as the time from hospitalization-to-death) with a mean of . % and a standard deviation of % based on the latest estimates in the literature. this is based on the assumption of a baseline cfr of . %. we assume the underreporting level remains constant over time. we estimated that the cfr on february , before adjusting for the time from diagnosis-to-death, was . %. with more data emerging after the second week, the cfr dropped to . % on the th day since the declared beginning of the outbreak on february , . later, between march and april , the cfr stabilized between . % and . % with a mean of . %. see figure (c). with the wider spread of covid- across the country, the cfr increased to . % on march . however, the cfr declined and plateaued around . % between and april . the relatively high cfr could correspond to a significant level of under-reporting of the infected cases and an overwhelmed health system. given the wide distribution of the time from confirmation-to-death of covid- , we also explore the delay-adjusted cfr with - and -day delay periods, as examples. 
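the fixed-delay variant of the delay-adjusted cfr mentioned above can be illustrated with a short python sketch. the idea is to divide the deaths reported by day t by the cases reported by day t minus the delay, so that deaths are compared with the cohort of cases they plausibly came from. the function names, the synthetic counts, and the baseline cfr of 1% are all hypothetical; the conversion from adjusted cfr to a reporting fraction (baseline divided by adjusted) is an assumption commonly used in the literature, since the text here does not spell out the exact formula.

```python
def naive_cfr(cum_cases, cum_deaths, t):
    """Cumulative deaths on day t divided by cumulative cases on the same day."""
    return cum_deaths[t] / cum_cases[t]


def fixed_delay_cfr(cum_cases, cum_deaths, t, delay):
    """Cumulative deaths on day t divided by cumulative cases on day t - delay.

    Returns None when the shifted day falls before the series starts
    or when no cases had been reported by then.
    """
    if t - delay < 0 or cum_cases[t - delay] == 0:
        return None
    return cum_deaths[t] / cum_cases[t - delay]


def reporting_fraction(baseline_cfr, adjusted_cfr):
    """Assumed share of cases that were reported, capped at 100%."""
    return min(1.0, baseline_cfr / adjusted_cfr)


# Synthetic cumulative counts (hypothetical), doubling daily with deaths lagging cases.
CUM_CASES = [10, 20, 40, 80, 160, 320]
CUM_DEATHS = [0, 0, 1, 2, 4, 8]

if __name__ == "__main__":
    t = 5
    print("naive cfr:", naive_cfr(CUM_CASES, CUM_DEATHS, t))
    adjusted = fixed_delay_cfr(CUM_CASES, CUM_DEATHS, t, delay=2)
    print("2-day delay-adjusted cfr:", adjusted)
    print("implied reporting fraction:", reporting_fraction(0.01, adjusted))
```

in a growing epidemic the naive ratio understates the cfr because recent cases have not yet had time to die; shifting the denominator back corrects for this, and a full treatment would weight the denominator by the whole delay distribution instead of a single fixed delay.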
the dynamics of the delay-adjusted cfr with a -day delay suggest that the cfr has been gradually decreasing in iran, from . % on march to . % on april . when underreporting of infected cases is overlooked, the estimated effective reproduction number started at . ( % ci . - . ) on march and fell to . ( % ci . - . ) on april , suggesting that the outbreak peak occurred on april , when $R_t$ went below , about days after the confirmation of the first case. the outbreak is also likely to continue until the end of . see figure . the estimates of the effective reproduction number were consistently larger during the early stage of the outbreak when underreporting is overlooked compared to when underreporting is taken into account. however, the estimated effective reproduction numbers converged as the number of infected cases approached the peak. the convergence of the estimates can be partly explained by the fact that the effective reproduction number depends more on the rate of change in the infected and recovered cases than on their absolute numbers. results also suggest that the impact of control measures on the effective reproduction number is significantly overestimated when under-reporting is overlooked. with the gradual reduction of the effective reproduction number to below one and the increasing pressure on an already fragile economy because of the implemented control measures, the government is seeking an exit strategy and is considering easing some of the restrictions. here, we conduct a scenario analysis to understand how three different scenarios could change the projected outlook of the outbreak in iran: i) maintaining the same level of control measures as of april , ii) intensifying the measures to increase physical distancing represented by a % reduction in the reproduction number, and iii) partially lifting the restrictions to ease physical distancing represented by a % increase in the reproduction number. 
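the three scenarios above amount to scaling the reproduction number and re-running the projection. the python sketch below shows that mechanic with a deliberately simple daily-step sir model in place of the paper's fitted seir model. every number in it is a hypothetical placeholder: the baseline reproduction number of 1.1, the 20% scaling factors, the 10-day infectious period, the population and seed sizes, and the 5% share of active cases assumed to need intensive care.

```python
def peak_infectious(r_eff, infectious_days=10.0, n=1_000_000, i0=1_000, days=365):
    """Daily-step SIR projection; returns the peak number of concurrently infectious people."""
    gamma = 1.0 / infectious_days          # removal rate per day
    beta = r_eff * gamma                   # transmission rate implied by r_eff
    s, i = float(n - i0), float(i0)
    peak = i
    for _ in range(days):
        new_infections = min(s, beta * s * i / n)
        new_removals = gamma * i
        s -= new_infections
        i += new_infections - new_removals
        peak = max(peak, i)
    return peak


# Hypothetical scenario multipliers applied to a hypothetical baseline.
SCENARIOS = {"maintain": 1.0, "intensify": 0.8, "ease": 1.2}
BASELINE_R = 1.1
ICU_FRACTION = 0.05                        # assumed share of active cases needing icu care

ICU_PEAK = {
    name: ICU_FRACTION * peak_infectious(BASELINE_R * factor)
    for name, factor in SCENARIOS.items()
}

if __name__ == "__main__":
    for name, beds in ICU_PEAK.items():
        print(f"{name}: peak icu demand of about {beds:.0f} beds")
```

with these placeholder values the intensified scenario pushes the reproduction number below one, so demand never rises above its starting level, while easing produces a much higher peak; comparing each curve with an assumed bed capacity is then a simple threshold check.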
to estimate the number of icu beds needed, we assume % of the confirmed cases require intensive care. we found that in all scenarios the projection of patients requiring icu admission exceeds the original icu capacity. note that no official information is available on the expanded icu capacity. as of april , both projected curves start above the icu capacity. easing the restrictions can quickly push the peak to a level that is five times higher than the scenario where the current level of control measures is maintained and puts additional pressure on the health system. results clearly suggest that with further restrictions (scenario ii), the projected curve quickly goes back under the icu capacity, while it takes more than days for the curve to go back under the icu capacity in scenarios i and iii. figure caption (beginning lost in extraction): with % ci over a -day rolling window when underreporting is taken into account. (c) estimated cfr with and without delay adjustment over the same time period in a semi-log scale. figure caption: the projected number of icu beds needed under the three scenarios. the original icu capacity is assumed between to beds. in this study, we used a mathematical epidemic model to provide the first estimates of the changing transmission of sars-cov- infection in iran when underreporting of cases is considered. we used official data and adjusted our estimates for underreporting based on delay-adjusted cfr. we used a variation of the seir model combined with an estimate of the percentage of symptomatic cases using delay-adjusted cfr based on the distribution of the time from confirmation-to-death. in contrast to previous models of the epidemic, we did not assume any prior information on the distribution of the effective reproduction number. this, combined with the use of a series of distributions, allowed us to take into account more appropriately the uncertainty and the random variation in both input data and model outcomes. the outbreak of covid- in iran is claimed to have started in qom province, with the first case officially reported on february . as shown earlier in figure (a), according to the official data, it took only two days for four more provinces, including tehran, markazi, mazandaran, and gilan (guilan), to report their first cases. the first non-pharmaceutical intervention took place on february with the closure of schools and universities in qom, followed by the capital city of tehran a day later. with the school closures in the capital and a still low level of social awareness about the potential risks of covid- , a surge in the number of holiday trips from tehran to the northern provinces of mazandaran and gilan was observed. soon after, with no restriction on inter-city travel, the number of identified cases in gilan and mazandaran grew rapidly, and the infection spread throughout the rest of the country. the second major non-pharmaceutical intervention occurred on february with a wide campaign to disinfect public spaces and the closure of all schools and universities across all provinces. on march , the government announced an increase in the covid- testing capacity. 
on march , sporadic closure or reduction of working hours in government offices and banks across the country was reported. on march , two often crowded religious shrines in qom and mashhad were closed to visitors. the increased physical distancing resulting from the interventions and from greater public awareness of the crisis, raised through major official and unofficial information campaigns, especially on social media, gradually showed its impact by slowing down the spread, as shown earlier in figure (b). the estimated effective reproduction number increased from . on march to . on march , perhaps due to the increase in case ascertainment and the delay before early interventions showed their impact. the effective reproduction number then decreased consistently to . on march . on march , only a few days before the persian new year (nowruz, march ), in the absence of strict travel restrictions, millions of iranians began making road trips to various destinations across the country, despite warnings from the government and the refusal of many hotels, restaurants and the wider hospitality industry to serve travellers. this is believed to have increased the speed of the spread of covid- in iran, raising the effective reproduction number to . on march . on march , the government announced more restrictive non-pharmaceutical interventions, leading to the closure of all non-essential businesses for at least two weeks, followed by further intensified interventions on march , including restrictions on entry to and exit from affected provinces and cities, closure of parks, pools and all recreational places, and a ban on public gatherings including sport, cultural, and religious events. these restrictions pushed the effective reproduction number further down to . on april and . on april . iran has had one of the highest case fatality rates (cfr) among the affected countries in the world. 
while the cfr is known to vary significantly between countries for various reasons, including testing frequency and population age distribution, the cfr is still higher in iran despite its younger population, with a median age of . years, compared to china with % cfr and a median population age of . and south korea with % cfr and a median population age of . . perhaps a comparable country in the middle east region with similar population characteristics (median age of . ) is turkey. the cfr in turkey as of april was %, three times lower than that of iran. while the evidence is indirect, it suggests that the official number of cases reported by iran may have been significantly under-reported, possibly due to relatively low case-ascertainment and under-reporting of identified cases. the continuous reduction in the delay-adjusted cfr shows a different trend compared to the cfr with no consideration of the time from confirmation-to-death. the continuous reduction could be explained by the improving case identification practice in iran over time with various initiatives including a national coronavirus helpline (the " " service), established in late february to self-report symptoms and identify suspected cases. our results confirmed the significant impact of underreporting in describing the story of the covid- outbreak in iran, especially in the early stages. we showed how overlooking underreporting can drastically affect estimation of $R(t)$ and overestimate the impact of control measures. our results also showed the reduction in effective reproduction number, a measure of infection transmission, during this period. this decrease was most likely due to the increased physical distancing as the result of multiple non-pharmaceutical public health interventions, including school closures, ban on public gatherings, travel restrictions, full/partial closure of non-essential businesses, as well as major awareness campaigns over social media. 
based on the latest trends, while the first peak of covid- in iran occurred on april , the post-peak period may continue to the end of . however, these projections assume the continuation of the current level of control measures and the absence of effective therapeutic treatment or vaccination programs. hence, they can be subject to important shifts depending, in particular, on the public's willingness to continue and the government's success in implementing social distancing measures or easing the restrictions. our model provides tangible evidence of the association between the different non-pharmaceutical interventions and their impact on the course of the outbreak. the results showed how the acceptance and hence effectiveness of the interventions endorsed by the iranian government aimed at "flattening the curve" depended in part on the public's level of awareness of the principles behind governmental policies and their trust in the government and its control measures. for example, closure of schools and businesses in the capital city of tehran near the time of the persian new year was followed by a surge of holiday trips to gilan and mazandaran and a subsequent increase in the number of cases in these provinces and in other parts of the country. this observation shows how interventions may be associated with unintended and at times counterproductive consequences. these negative consequences can be prevented when there is open and credible communications by competent officials and mutual trust between public and government. moreover, intervention measures need to be developed, implemented and enforced as a whole, for example by strict reinforcement of travel restrictions in conjunction with school and workplace closings. there is no substitute for high quality data -complete, accurate, and timely -as a basis for public policy. 
however, in the absence of such data, modelling of the type presented here can help provide reasonable estimates as well as realistic bounds of their uncertainty. the range of uncertainty can be viewed as the margin of error in the model's predictions of the number of cases, icu admissions or deaths. modelling can also illustrate different scenarios (pessimistic vs. optimistic vs. realistic) of how the epidemic may evolve in relation to current and future public health measures and the possible compliance of the public over time. in conclusion, using a stochastic model of the sars-cov- epidemic in iran, we assessed the dynamics of the epidemic in relation to public health measures to increase social distancing. we took into account both the inherent uncertainty in the data and the possible impact of under-reporting of true cases due to low case ascertainment and reporting. in the absence of consistently reliable data, the modelling approach presented here can help generate reasonable estimates of key public health metrics such as the number of cases and case fatality ratio. in turn, these metrics and scenarios can serve the dual purpose of informing public policy and the public and fostering discussions and improvements of epidemic modelling. medrxiv preprint, this version posted may , . doi: https://doi.org/ . / . . . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license. coronavirus disease (covid- ) world economic forum. finding 'patient zero': the challenges of tracing the origins of coronavirus coronavirus could break iranian society iran takes emergency measures after two coronavirus deaths in qom cross-country comparison of case fatality rates of covid- /sars-cov- why is iran's reported mortality rate for coronavirus higher than in other countries? 
iran's ministry of health and medical education. daily covid- epidemic reports the incubation period of coronavirus disease (covid- ) from publicly reported confirmed cases: estimation and application report : severity of -novel coronavirus (ncov) inverse problem for coefficient identification in sir epidemic models analysis of generalized pattern searches complexity of the basic reproduction number (r ) effective reproduction numbers are commonly overestimated early in a disease outbreak the effective reproduction number as a prelude to statistical estimation of time-dependent epidemic trends the estimation of the basic reproduction number for infectious diseases on the exact measure of disease spread in stochastic epidemic models incubation period and other epidemiological characteristics of novel coronavirus infections with right truncation: a statistical analysis of publicly available case data centre for mathematical modelling of infectious disease, london school of hygiene & tropical medicine the centre for evidence-based medicine characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention centre for mathematical modelling of infectious disease, london school of hygiene & tropical medicine key: cord- -zpdnzwle authors: zhao, jinqiu; li, xiaosong; huang, wenxiang; zheng, junyi title: potential risk factors for case fatality rate of novel coronavirus (covid- ) in china: a pooled analysis of individual patient data date: - - journal: am j emerg med doi: . /j.ajem. . . sha: doc_id: cord_uid: zpdnzwle background and objective: since the first case of the pneumonia caused by novel coronavirus (covid- ) is found in wuhan, there have been more than , cases reported in china. this study aims to perform the meta-analysis of risk factors for the case fatality rate (cfr) of the novel coronavirus (covid- ). 
design and methods: we searched pubmed, google scholar and medrxiv for cohort studies involving risk factors for the cfr of covid- . this meta-analysis compares the risk factors for cfr between fatal patients and non-fatal patients. results: two cohort studies are included in this study. after comparing fatal cases with non-fatal cases, several important factors are found to significantly increase the cfr in patients with covid- , including age ranging – (or = . ; % ci = . to . ; p < . ) and especially ≥ (or = . ; % ci = . to . ; p < . ), male sex (or = . ; % ci = . to . ; p = . ), occupation as a retiree (or = . ; % ci = . to . ; p < . ), and severe cases (or = . ; % ci = . to , . ; p = . ). with the advancement of early diagnosis and treatment, the cfr of covid- after january (or ), is substantially lower than before (or = . ; % ci = . to . ; p < . ). conclusions: several factors are confirmed to significantly increase the cfr in patients with covid- , which is very important for the treatment and good prognosis of these patients. sarbecovirus, orthocoronavirinae subfamily), and has some features similar to those of severe acute respiratory syndrome coronavirus (sars-cov) [ ] . covid- is strongly suspected to be associated with huanan seafood wholesale market and to have been transmitted to humans from illegally sold wild animals [ ] . according to the first confirmed ncip cases in wuhan from december to january , the epidemiologic characteristics confirmed human-to-human transmission among close contacts since the middle of december and revealed that the epidemic doubled in size every . days in the early stage [ ] . the human-to-human transmission of ncip was also confirmed by case reports and family settings [ ] [ ] [ ] [ ] . the novel coronavirus was found in stool samples of patients with abdominal symptoms, indicating that fecal-oral transmission might occur for ncip [ ] . 
some studies have helped clarify the molecular, clinical and epidemiological features of covid- [ , , ] . one cohort study conducted in jin yin-tan hospital (wuhan, china) first reported the epidemiological, clinical, laboratory and radiological characteristics, as well as clinical outcomes in ncip patients [ ] . the clinical features mainly include fever, cough, dyspnea, myalgia, fatigue, sputum production, headache, haemoptysis, and diarrhoea [ , ] . kidney injury and even death [ , ] . the presentation of some severe covid- patients resembles that of sars-cov [ , ] . another cohort study also systematically reported the epidemiological and clinical features of patients with ncip in zhongnan hospital of wuhan university (wuhan, china) [ ] . nonpharmaceutical interventions such as shutdown of public gathering places, wearing of facial masks and social distancing still effectively slow the spread of the disease. there is currently no antiviral treatment or vaccine specifically designed for this virus with field-proven effectiveness, and supportive therapies are mainly used for these patients [ ] . there have been more than , deaths in china. this meta-analysis is conducted to reveal the risk factors for cfr in patients with covid- , which is valuable for improving the treatment and prognosis of these patients. ethical approval and patient consent were not required because this was a meta-analysis of previously published studies. two investigators independently searched the following databases (inception to february ): pubmed, google scholar, medrxiv and cnki. the electronic search strategy used the following keywords: "novel coronavirus" or "covid- ", and "epidemiological" or "clinical features" or "clinical characteristics" or "death" or "case fatality rate" or "cfr". 
the following inclusion criteria were applied: (i) patients were diagnosed with novel coronavirus disease (covid- ); (ii) the study design was a cohort study comparing fatal patients with non-fatal patients (or severe cases versus non-severe cases). we used a piloted data-extraction sheet, and collected the following information: publication year, first author, number of patients, age, gender and the number of non-severe/severe cases in two groups. data were extracted independently by two investigators. this meta-analysis focused on the effect of risk factors, including baseline characteristics and severity, on the cfr of covid- . furthermore, only risk factors that were reported in both included studies were analyzed. odds ratio (or) with % confidence intervals (ci) was used for all dichotomous outcomes. the random-effects model was used regardless of heterogeneity, which was assessed by the I² statistic. I² > % indicated significant heterogeneity [ ] . sensitivity analysis was needed when encountering significant heterogeneity. p< . indicated a statistically significant difference between two groups. all analyses were conducted using review manager version . . a detailed flowchart of the search and selection results is shown in figure . one hundred potentially relevant articles were identified initially, and two cohort studies involving , patients are finally included in this study [ , ] . the data in the two studies were collected from the information system for infectious disease reporting through february th , [ ] . the main characteristics (e.g. age, gender and severity of patients) of the two cohort studies are presented in table . after carefully analyzing the two studies, age, sex, occupation and severity are selected for analysis of their association with the cfr of covid- . two studies revealed that severe cases are significantly older than non-severe cases [ , ] [ ] . 
these indicate that old age may also result in an increase in cfr in these patients. the association analysis between age range and cfr is shown in figure . age ≤ (or= . ; % ci= . to . ; p< . ) and age ranging ~ (or= . ; % ci= . to . ; p< . ) are associated with a relatively lower cfr, while age ranging ~ (or= . ; % ci= . to . ; p< . ) and especially ≥ (or= . ; % ci= . to . ; p< . ) results in a significant increase in cfr. these results suggest that age ≥ can be regarded as a risk factor for cfr in patients with covid- . in this study, we mainly examine the association between sex and cfr of these patients ( figure ). this meta-analysis indicates that female sex is associated with a relatively lower cfr (or= . ; % ci= . to . ; p= . ), while male sex leads to an obvious increase in cfr for covid- (or= . ; % ci= . to . ; p= . ). the cdc study reported five kinds of occupation, including service industry, farmer/worker, medical worker, retiree, and others [ ] . in order to perform the analysis between occupation and cfr, service industry, farmer/worker, and medical worker are generally regarded as employed persons, while others are generally thought to be unemployed persons. in our meta-analysis between occupation with cfr (figure ) it is widely accepted that severe cases receive intensive care unit (icu) care, the time periods are generally divided into three periods: before january (or ), , after january (or ), and the middle time period between them. in our meta-analysis ( figure ), the cfr before january (or ), is relatively high (or= . ; % ci= . to . ; p= . ), but the cfr is significantly reduced after january (or ), (or= . ; % ci= . to . ; p< . ). the transmissibility of covid- is similar to that of sars-cov in the range of . - . % [ ] . the overall adjusted cfr is estimated to be . % for covid- , which is lower than those of sars-cov ( . %) and mers-cov ( . %) [ ] . 
one included study demonstrates that patients in the icu group are significantly older than those in the non-icu group ( ( - ) versus ( - ), median (iqr), p<. ) [ ] , which is consistent with the study conducted by guan et al [ ] . furthermore, the old age limit ≥ years may be defined as the risk factor for exacerbation of covid- (p< . ) [ ] . our results reveal that age ranging ~ and especially ≥ is associated with notably increased cfr in these patients, and thus age ≥ can be regarded as an important risk factor for increased cfr. in addition, retirees are found to have a higher cfr than patients with covid- in other occupations in this meta-analysis, which may be attributed to the older age of retirees. patients with low immune function, such as those with old age, obesity, comorbidity, hiv infection or long-term use of immune-suppressive agents, and pregnant women, may have a higher cfr [ ] . prompt administration of antibiotics to prevent infection and immune support treatment may reduce the complications and cfr of these patients [ ] . reduced lymphocyte counts were found in most patients, suggesting that -ncov may mainly damage lymphocytes, especially t lymphocytes, which was similar to sars-cov. substantially decreased t lymphocytes might be an important factor for predicting the exacerbations of patients [ ] . a descriptive study reported cases of ncip from wuhan jinyintan hospital from jan to jan , , and demonstrated that older men with comorbidities were more likely to suffer from ncip and ards [ ] . in contrast, the proportion of men and women showed no statistical difference between icu patients and non-icu patients in another study [ ] . there are conflicting results regarding the relationship between sex and severity of covid- . in this meta-analysis, male patients have significantly virus acts mainly through binding to ace receptors, which may account for the gender difference [ ] . 
it is generally known that severe cases and patients receiving icu care have a higher probability of death than other patients. one cohort study involving patients reported deaths in severe cases ( . %) and death in non-severe cases ( . %), and a significant difference in cfr is observed between severe cases and non-severe cases [ ] . these results are also confirmed in this meta-analysis. severe cases of covid- have a higher cfr than non-severe cases. regarding the sensitivity analysis, there is significant heterogeneity for occupation, severity and time period. several reasons may account for this heterogeneity. firstly, in the analysis of the cdc study, service industry, farmer/worker, and medical worker are generally regarded as employed persons, while others are generally thought to be unemployed persons [ ] , which may produce the heterogeneity for occupation. secondly, one study reported mild pneumonia/non-pneumonia versus severe pneumonia [ ] , while the cdc study reported mild cases versus severe/critically ill cases [ ] , and thus there is a lack of a clear definition of non-severe versus severe cases, which may cause the heterogeneity in the severity analysis. thirdly, different patient populations in the three time periods are selected in the two studies, which may explain the heterogeneity in the analysis of time periods. fourthly, these two studies are retrospective, which also produces some heterogeneity. one study also confirmed that the patients in the icu group had more comorbid diseases than those in the non-icu group [ ] . older age and comorbidity may be risk factors for the exacerbation of ncip [ ] . furthermore, comorbid diseases such as cardiovascular diseases, copd and hypertension may also increase the cfr in covid- patients. existing antiviral treatments such as lopinavir/ritonavir and remdesivir have been evaluated and used for treating sars-cov and mers-cov infections [ , ] . 
they are also considered for the treatment of covid- infections [ ] . clinical trials with large patient sample should be carefully designed and implemented to assess their efficacies. this meta-analysis has several potential limitations. firstly, there are only two retrospective cohort studies included, and more studies with larger sample should be conducted to investigate this issue. secondly, there is significant heterogeneity for occupation, severity and time period, which may be caused by different definition of occupation, severe cases and different patient population in the three time periods. thirdly, there may be some repetitive data in these two studies, which may have some influence on the pooling results. fourthly, there may be some confounding relationship between the occupation of retiree and age, but it is not available to adjust the occupation of the retiree by age based on current limited data. in conclusion, this study reveals the several factors including age≥ , sex of male, occupation of retirees and severe cases can substantially increase the cfr in patients with covid- . these findings are of crucial importance for timely treatment and good prognosis of these patients. huang conducted the study planning, data analysis and data interpretation, junyi zheng and junyi zheng wrote and revised the article. all authors read and approved the final manuscript. all relevant data are within the manuscript. medical university (pyjj - , https://www.cqmu.edu.cn/) and natural science foundation of chongqing(cstc jcyj-msxmx , http://www.csti.cn/govwebnew /). zjq conducted the study design, data collection and analysis, decision to publish, and preparation of the manuscript. we declare no conflict of interest. the association analysis between age with cfr. the association analysis between sex with cfr. the association analysis between occupation with cfr. the association analysis between severity with cfr. the association analysis between time period with cfr. 
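the odds ratios, confidence intervals and heterogeneity statistic named in the statistical methods can be sketched in a few lines. this is a minimal illustration under stated assumptions, not the review manager implementation used by the authors: it computes a single-study odds ratio with a woolf (logit) confidence interval and an I² value from cochran's q with inverse-variance weights. the function names and all counts are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (logit) confidence interval for a 2x2 table.

    a: deaths with the factor      b: survivors with the factor
    c: deaths without the factor   d: survivors without the factor
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def i_squared(log_ors, std_errors):
    """I^2 (in percent) from Cochran's Q with inverse-variance weights."""
    weights = [1 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical counts: 20 of 100 patients with the factor died,
# 10 of 100 patients without it died.
print(odds_ratio_ci(20, 80, 10, 90))
# Two hypothetical study-level log odds ratios with equal precision.
print(i_squared([0.0, 1.0], [0.1, 0.1]))
```

with only two included studies, as here, I² is estimated from a single degree of freedom and is therefore very imprecise, which is one reason to read the reported heterogeneity cautiously.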
the continuing -ncov epidemic threat of novel coronaviruses to global health-the latest novel coronavirus outbreak in wuhan, china coronavirus infections-more than just the common cold early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia outbreak of pneumonia of unknown etiology in wuhan china: the mystery and the miracle transmission and epidemiological characteristics of novel coronavirus ( -ncov)-infected pneumonia (ncip): preliminary evidence obtained in comparison with -sars, medrxiv a novel coronavirus from patients with pneumonia in china genomic characterisation and epidemiology of novel coronavirus: implications for virus origins and receptor binding epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study a familial cluster of pneumonia associated with the novel coronavirus indicating person-to-person transmission: a study of a family cluster importation and human-to-human transmission of a novel coronavirus in vietnam transmission of -ncov infection from an asymptomatic contact in germany nowcasting and forecasting the potential domestic and international spread of the -ncov outbreak originating in wuhan, china: a modelling study the digestive system is a potential route of -ncov infection: a bioinformatics analysis based on single-cell transcriptomes, biorxiv the -new coronavirus epidemic: evidence for virus evolution clinical characteristics of novel coronavirus infection in china clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in the first case of novel coronavirus pneumonia imported into korea from wuhan, china: implication for infection prevention and control measures a novel coronavirus outbreak of global health concern first case of novel coronavirus in the united states a novel coronavirus ( -ncov) causing pneumonia-associated respiratory syndrome epidemiological and clinical features of the novel 
coronavirus outbreak in china quantifying heterogeneity in a meta-analysis the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) in china clinical features of three avian influenza h n virus-infected patients in s hanghai t-cell immunity of sars-cov: implications for vaccine development against mers-cov single-cell rna expression profiling of ace , the putative receptor of wuhan -ncov role of lopinavir/ritonavir in the treatment of sars: initial virological and clinical findings comparative therapeutic efficacy of remdesivir and combination lopinavir, ritonavir, and interferon beta against mers-cov none. all authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest (such as honoraria; educational grants; participation in speakers' bureaus; membership, employment, consultancies, stock ownership, or other equity interest; and expert testimony or patent-licensing arrangements), or non-financial interest (such as personal or professional relationships, affiliations, knowledge or beliefs). not applicable. not applicable. not applicable. the authors declare no conflict of interest. not applicable. jinqiu zhao and xiaosong li conducted the design, junyi zheng and wenxiang huang key: cord- - vp fwv authors: simonsen, lone; higgs, elizabeth; taylor, robert j; wentworth, deborah; cozzi-lepri, al; pett, sarah; dwyer, dominic e; davey, richard; lynfield, ruth; losso, marcelo; morales, kathleen; glesby, marshall j; weckx, jozef; carey, dianne; lane, cliff; lundgren, jens title: using clinical research networks to assess severity of an emerging influenza pandemic date: - - journal: clinical infectious diseases doi: . /cid/ciy sha: doc_id: cord_uid: vp fwv background: early clinical severity assessments during the influenza a h n pandemic (ph n ) overestimated clinical severity due to selection bias and other factors. 
we retrospectively investigated how to use data from the international network for strategic initiatives in global hiv trials, a global clinical influenza research network, to make more accurate case fatality ratio (cfr) estimates early in a future pandemic, an essential part of pandemic response. methods: we estimated the cfr of medically attended influenza (cfr(ma)) as the product of probability of hospitalization given confirmed outpatient influenza and the probability of death given hospitalization with confirmed influenza for the pandemic ( – ) and post-pandemic ( – ) periods. we used literature survey results on health-seeking behavior to convert that estimate to cfr among all infected persons (cfr(ar)). results: during the pandemic period, . % ( . %– . %) of ph n -positive outpatients were hospitalized. of ph n -positive inpatients, . % ( . %– . %) died. cfr(ma) for ph n was . % ( . %– . %) in the pandemic period – but declined -fold in young adults during the post-pandemic period compared to the level of seasonal influenza in the post-pandemic period – . cfr for influenza-negative patients did not change over time. we estimated the pandemic cfr(ar) to be . %, -fold lower than cfr(ma). conclusions: data from a clinical research network yielded accurate pandemic severity estimates, including increased severity among younger people. going forward, clinical research networks with a global presence and standardized protocols would substantially aid rapid assessment of clinical severity. clinical trials registration: nct and nct . in , uncertainty about the emerging ph n virus' clinical severity hindered the early global response. although the rapid spread of the virus around the world fulfilled the traditional pandemic definition, its global mortality impact in the end proved to be smaller than any th century pandemic [ , ] . however, its relative mildness was not known in the early months of the outbreak. 
the earliest estimate of the case fatality ratio (cfr) was on par with the rating for the catastrophic pandemic, and a june assessment put it in the pandemic range (table ) [ ] . an evaluation of the pandemic response ordered by the world health organization's (who) director general [ ] found that a systematic way to assess both transmissibility and clinical severity-also known as its "seriousness" [ ] -is needed in the early phase of a future pandemic to assess the level of threat accurately and to mobilize resources appropriately. cfr is one important measure of clinical severity; others include the risk of admission to the intensive care unit (icu) and the need for mechanical respiratory support. a who task force is currently developing the data inputs and study designs needed to generate timely estimates of clinical severity [ ] . the centers for disease control and prevention has proposed a scheme for comparing pandemic and seasonal influenza graphically, plotting attack rates against clinical severity [ ] . in , uk public health england spearheaded what has become a standard first-line approach to assessing the clinical severity of a pandemic, known as the "first few hundred" (ff ) [ ] . these and similar studies gather data on the earliest cases that come to medical attention through outpatient facilities and hospitals and provide important descriptive data about symptoms, risk factors, and risk of progression to severe illness or death [ ] [ ] [ ] [ ] [ ] [ ] [ ] . these data can in turn be combined with other data on population attack rates to forecast national and global hospitalization and mortality estimates using a pyramid modeling strategy [ , ] . standard ff studies, however, lack historic controls in the form of a baseline from recent seasonal influenza seasons. they are also subject to selection bias, as the first cases that come to attention are likely to be more severe [ ] . 
unless an ff study is set in an existing surveillance system or ongoing clinical research data collection scheme, there is no obvious seasonal influenza baseline against which to compare the clinical severity of the pandemic virus. moreover, unless the pandemic is severe, an ff study in the outpatient setting alone will not have the statistical power to accurately estimate the cfr unless many thousands of patients are enrolled. global clinical research networks that study mild and severely ill influenza patients could be used to overcome many of these problems. two ongoing clinical cohort studies of influenza are conducted under the international network for strategic initiatives in global hiv trials (insight) umbrella, sponsored by the national institutes of health. since , insight has undertaken cohort studies- outpatient (flu ) and inpatient (flu )-specifically to address gaps in clinical research on the emerging influenza pandemic, including factors linked to disease progression and severe outcomes [ ] . insight annually enrolls hundreds of patients with suspected or confirmed influenza, with intake sites in countries. at these sites, experienced teams use a standardized protocol to collect extensive clinical data, perform long-term follow-up (at and days for inpatients, days for outpatients), and bank patient samples for further study. several articles on influenza have been published using insight data, including protocol descriptions and preliminary data [ ] , an exploration of biomarkers of influenza case severity [ ] , patient outcomes after ph n infection [ ] , and phylogeography of the ph n virus [ ] . we used insight data collected in the pandemic period ( - ) to retrospectively demonstrate how clinical research networks can provide essential early insights into pandemic clinical severity and other epidemiological parameters. 
to "leverage" the cfr computation, we multiplied the conditional probability of progression from outpatient to hospitalization by that of progression from hospitalization to death. to underscore the importance of having baseline data, we compared the estimated ph n clinical severity to that of seasonal influenza types and subtypes and noninfluenza respiratory patients in the post-pandemic period ( ) ( ) ( ) ( ) . our cfr estimates were in reasonable agreement with final global cfr estimates based on excess mortality estimates from time series of nationwide vital statistics data and seroepidemiology data-final estimates of a type that would only be available several years after the next pandemic emerges [ , , ] . here, we discuss what it would take to move a clinical research network like insight from routine research operation into emergency mode to generate timely and robust clinical severity assessments. the national institute of allergy and infectious diseases (niaid)-funded insight network initially focused solely on hiv but expanded first to include ph n and then all influenza types and subtypes and emerging respiratory pathogens such as middle east respiratory syndrome and severe acute respiratory syndrome. sites, located in of world regions (figure ), consecutively enroll adult patients aged ≥ years with suspected influenza. flu recruits patients who present at a physician's office or clinic with influenza-like illness (ili), defined as fever with either cough or sore throat. flu recruits patients with known or suspected influenza who require hospitalization. at enrollment, patient medical history and demographic information are recorded, and blood and oropharyngeal swabs are analyzed and stored. testing for influenza is done both locally and at an insight central laboratory. all patients are followed up, regardless of influenza test result, at days after enrollment in flu and at and days in flu . 
we extracted insight data on demographics, illness onset, medical history, and vital status at follow-up visit from the protocol databases. we defined the pandemic period as the to follow-up were treated as missing and removed from the analysis. we identified relevant case series in the literature reporting data on patients aged > years. after excluding studies with fewer than patients or with a specialty population (such as high-risk patients), we chose outpatient studies, set in the united states [ ] and in the united kingdom [ ] , and inpatient studies [ , ] , both set in the united states, for comparison with flu and flu ph n laboratory-confirmed patients during the pandemic period (table ) . we calculated the medically attended cfr (cfr ma ) from the probability that a medically attended ili (flu ) patient would progress to hospitalization by day and the probability that a hospitalized (flu ) patient would die by day : $\mathrm{CFR}_{MA} = P(H \mid \mathrm{ILI}) \times P(D \mid H)$, where h = hospitalization and d = death. to estimate cfr among all infected persons (cfr ar ), we used findings from a uk health behavior survey that found that % of patients aged ≥ years with ili sought care for their illness [ ] and a uk serology study that found that % of influenza-infected adults aged - years were symptomatic [ ] . assuming that the nonmedically attended and asymptomatic influenza cases would not progress to severe illness, we have: $\mathrm{CFR}_{AR} = \mathrm{CFR}_{MA} \times P(\text{care sought} \mid \mathrm{ILI}) \times P(\text{symptomatic} \mid \text{infection})$, where "infection" is defined as a person who responded immunologically. the % confidence intervals (cis) on the cfr estimate were generated from the variance of the product of the proportions, $P(H \mid \mathrm{ILI}) \times P(D \mid H)$, using the delta method, that is, a first-order taylor series expansion. we assumed the proportions were independent. in small samples with large variability, this may not be a good approximation. in some cases, negative values for the cis may be obtained. data analysis was done using sas, version . , and excel. 
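the calculation described above can be sketched as follows. this is a minimal illustration, assuming independent binomial proportions and a first-order delta-method variance for their product; the conversion to an infection-based cfr simply multiplies by the care-seeking and symptomatic fractions, which is our reading of the stated assumption rather than the authors' published code. all function names and counts are hypothetical.

```python
import math


def cfr_ma_with_ci(n_hospitalized, n_outpatients, n_deaths, n_inpatients, z=1.96):
    """Medically attended CFR as P(H | ILI) * P(D | H), with a delta-method CI."""
    p_h = n_hospitalized / n_outpatients   # P(H | ILI)
    p_d = n_deaths / n_inpatients          # P(D | H)
    cfr = p_h * p_d

    # Binomial variances of the two proportions.
    var_h = p_h * (1 - p_h) / n_outpatients
    var_d = p_d * (1 - p_d) / n_inpatients

    # First-order Taylor expansion of the product, assuming independence.
    var_cfr = p_d ** 2 * var_h + p_h ** 2 * var_d
    half_width = z * math.sqrt(var_cfr)
    return cfr, cfr - half_width, cfr + half_width


def cfr_ar(cfr_ma, p_seek_care, p_symptomatic):
    """CFR among all infected, assuming unattended and asymptomatic cases do not die."""
    return cfr_ma * p_seek_care * p_symptomatic


# Hypothetical counts: 5 of 100 outpatients hospitalized, 10 of 100 inpatients died.
estimate, lower, upper = cfr_ma_with_ci(5, 100, 10, 100)
print(estimate, lower, upper)
print(cfr_ar(estimate, p_seek_care=0.5, p_symptomatic=0.5))
```

with these small illustrative counts the lower bound comes out slightly below zero, which is the behaviour the text warns about for small samples with large variability; in practice such a bound would be truncated at zero or replaced by an interval computed on the log scale.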
the flu and flu protocols were approved by the institutional review boards or institutional ethics committees at the university of minnesota and at each of the participating clinical sites. all patients (or their proxies) gave signed informed consent prior to enrollment. during the pandemic period (october through september ), ili and hospitalized patients tested influenza ph n positive. of these, . % of ph n -infected flu outpatients were aged - years compared to only % of the flu inpatients. during the post-pandemic period (october through september ), ili and hospitalized patients were ph n positive; of these, % of ili outpatients and % of hospitalized patients were aged - years. in the pandemic period about / of outpatients and / of inpatients were from european sites, while during the post-pandemic period, after the network expanded to sites in world regions, these figures were / of outpatients and / of inpatients. we found that demographic and clinical characteristics of insight pandemic period ph n patients were similar to those described in published ff -like studies of adult ph n patients [ ] with respect to mean age, prevalence of symptoms and underlying diseases, mortality rates, and other characteristics ( table ) . five percent of ph n -confirmed ili patients were hospitalized, and . % of ph n -positive inpatients died (table , figure ). this yielded a ph n cfr ma of . % ( . %- . %) both for all adults and for adults aged - years. the cfr ma for patients aged ≥ years could not be established with confidence due to the small number of older outpatients in the study. as a nonhistoric control, the all-ages cfr ma of influenza testnegative patients was . % during the pandemic period, albeit with wide cis. it was not possible to establish a seasonal influenza comparison for the pandemic period because non-ph n influenza cases (h n , b) in the pandemic period were rare. the cfr ma for ph n cases in the post-pandemic period was . 
% for patients aged - years, -fold lower than the value for the pandemic period and comparable to the influenza-negative patients of the same age. we could not reliably assess ph n cfr ma for the ≥ years age group due to small numbers in the post-pandemic period; however, cfr ma was . % for seniors aged ≥ years positive for any influenza virus in the post-pandemic period vs . % for younger adults positive for any influenza virus. for the post-pandemic period (any subtype), we also estimated the conditional probabilities and the cfr ma by region (table ) . because the final who cfr estimate from the pandemic was based on attack rates as revealed by serology data, we sought to convert our medically attended cfr to one based on the attack rate. to do so, we used data from a study that indicated that approximately % of all cases are asymptomatic [ ] and from survey data that indicate that approximately % of adult ili cases sought medical attention [ ] . we found the cfr ar to be . % ( . %- . %; table ), or -fold lower than the cfr ma . who has recently expanded its pandemic definition to include clinical severity. this means that rapid and accurate estimates of pandemic clinical severity are needed to characterize the threat level and guide the global response. our analysis combining data from inpatient and outpatient insight cohorts demonstrates how preestablished global research networks could immediately begin rigorous studies to estimate the cfr, a key parameter of clinical severity of an emerging pandemic. assessments of the clinical severity in the pandemic became less dire as time passed [ ] . the earliest estimate of cfr, an ff -like case series of hospitalized patients in mexico, was a disturbing % of influenza-positive patients. 
however, as studies of the first (summer) wave in the united states, the complete southern hemisphere season in new zealand, and further studies from mexico were completed, it became clear that the pandemic would be relatively mild (table ) . several factors contributed to the early confusion in . the most important was probably selection bias toward sicker patients in the earliest ff -type case series studies [ ] . another factor was simply that studies reported on different types of cfr-either as a proportion of medically attended cases (cfr ma ) or as a proportion of all infected individuals (cfr ar ). most early assessments were of the cfr ma type, but these were not directly comparable. our method, retroactively applied to insight databases, yielded a cfr ma estimate of . %. using literature values that indicated that the probability of symptomatic people seeking medical treatment was % [ ] and that the probability of infected individuals being asymptomatic was also % [ ] , our cfr ma value would be equivalent to a cfr ar of . %, which is in reasonable agreement with the final global who cfr ar estimate of . % [ , , ] . [table note: data are for the pandemic and post-pandemic periods, computed as the product of the risk of flu influenza-like illness outpatients getting hospitalized and the flu hospitalized patients having died at day . abbreviations: p (d|h), probability of death given hospitalization; p (h|ili), probability of hospitalization given influenza-like illness. *case fatality rate not calculated when fewer than outpatients or inpatients contained in any stratum.] in addition to an absolute measurement of cfr, data from previous seasons can provide a relative comparison of pandemic to seasonal influenza severity; even if the absolute estimate of cfr is uncertain, it would be useful to know if an emerging pandemic has a cfr far higher than previous seasonal influenza experiences.
thus, we also estimated cfrs for influenza patients from seasonal influenza epidemics - , as a surrogate for pre-pandemic baseline seasons. age greatly influences both seasonal and pandemic clinical severity estimates. in all influenza pandemics since , mortality was higher than normal in younger people and lower than normal in seniors, sometimes dramatically so [ ] . in the post-pandemic period ( - ) we found that the cfr ma of ph n for patients aged - years had fallen -fold from the pandemic period value, becoming similar to that of a/h n and b. this suggests that the emerging virus had settled into a seasonal epidemic pattern due to accumulated population immunity. moreover, in the post-pandemic period patients aged ≥ years with any influenza virus had a cfr ma approximately -fold higher than patients aged < years. these results corroborate a previous meta-analysis of ff studies that concluded that age is an important confounder of cfr estimates for ph n pandemic influenza [ ] . they also show how important it is to take into account both the age group and the type of cfr being calculated when comparing across regions and time. it is also possible that discrepancies in early assessments of cfr may in fact have reflected true geographical differences. for example, a comprehensive study of pandemic mortality that applied a uniform methodology to different regions found that the mortality impact in central and south american countries was approximately -fold higher than in europe [ ] . this indicates that early reports of higher severity in mexico than in new zealand may not solely have been the result of ascertainment bias. clinical severity can even increase substantially over time, as was seen in the influenza pandemic when a milder summer wave preceded the severe autumn waves [ ] . the best way around the measurement problems that occur early in a pandemic would be to compute the same type of cfr with the same protocol in multiple geographical settings.
if possible, estimates should be stratified by risk factors, such as pregnancy and chronic illness, and baseline data should be collected during seasonal epidemics. while some countries have created ff protocols since the pandemic, a global standard along the lines we have outlined here would be helpful. we recognize limitations to our approach to computing cfr by multiplying conditional probabilities of disease progression. first, we used distinct groups of outpatients and inpatients who were recruited under different circumstances at different sites, often in different countries. it is therefore possible the cohorts differed in age composition, health status, or other important respects that could bias the result. however, we argue that the approach, while not ideal, would nonetheless supply timely and useful data, especially if it could be compared to baseline seasons. we also note that the characteristics of the insight ph n outpatients and inpatients in the pandemic period - are reassuringly similar in terms of age, symptoms, comorbidities, and outcomes to published uk and us ff studies of adult ph n influenza outpatients and inpatients ( table ) . a second possible caveat-that insight inclusion criteria might have varied over time and explained the drop in cfr ma over time-could be dismissed on the grounds that the influenza-negative patients did not have a significant drop in cfr ma between the pandemic and post-pandemic period. this means that the measured decrease in ph n clinical severity was real and not due to ascertainment or other bias. our retrospective analysis of pandemic clinical severity indicates that it is possible to use research networks to assess both the absolute magnitude of the clinical severity of a future pandemic and the relative increase compared to a seasonal influenza baseline. 
even if the seroepidemiology and health-seeking behavior surveys needed to convert cfr ma to cfr ar could not be done rapidly, comparison of cfr ma to previous seasons would reveal much about the relative magnitude of the emerging threat. to be useful in a prospective scenario, however, it would be necessary to ramp up the network's pace of operations from routine to emergency mode. for insight, that would mean, at a minimum, enhancing enrollment in sites located in areas initially affected by the emerging pandemic and increasing the tempo of laboratory processing of specimens and data analysis. in addition to assessing clinical severity, global research networks could play other key roles in pandemic response including studies of comorbidity patterns, risk factors, hospital and icu utilization, and mortality risk of hospitalized patients. moreover, protocols that enroll children could be used to understand the pathogen in this key age group. once a future pandemic outbreak begins, studies set in these networks could both characterize pathophysiology to optimize clinical management and provide a platform for rigorous clinical trials of new therapeutics. we suggest, therefore, that a specific role for clinical research networks carrying out ongoing rigorous research compliant with international standards be added to the international health regulations that govern international and national responsibilities for public health emergencies of international concern. notes acknowledgments. we thank and acknowledge all the patients who participated in this study. disclaimer. the funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. potential conflicts of interest. l. s. and r. j. t. report earning consulting fees from sage analytica, llc, during the conduct of the study. r. l. 
reports serving as co-editor on a book on infectious disease surveillance published by wiley blackwell, with royalties donated to the minnesota department of health. all remaining authors: no reported conflicts of interest. all authors have submitted the icmje form for disclosure of potential conflicts of interest. conflicts that the editors consider relevant to the content of the manuscript have been disclosed. global mortality estimates for the influenza pandemic from the glamor project: a modeling study estimated global mortality associated with the first months of pandemic influenza a h n virus circulation: a modelling study case fatality risk of influenza a (h n pdm ): a systematic review report of the review committee on the functioning of the international health regulations ( ) in relation to pandemic (h n ) infection fatality risk of the pandemic a(h n ) virus in hong kong novel framework for assessing epidemiologic effects of influenza epidemics and pandemics pandemic (h n ) influenza in the uk: clinical and epidemiological findings from the first few hundred (ff ) cases human infection with new influenza a (h n ) virus: clinical observations from mexico and other affected countries pandemic potential of a strain of influenza a (h n ): early findings epidemiologic analysis of the laboratory-confirmed cases of influenza a(h n )v in colombia pandemic influenza a(h n )v in new zealand: the experience from the severity of pandemic h n influenza in the united states estimating age-specific cumulative incidence for the influenza pandemic: a meta-analysis of a(h n ) pdm serological studies from countries. 
influenza other respir viruses pandemic influenza a (h n ) in saudi arabia: description of the first one hundred cases critically ill patients with influenza a(h n ) in mexico insight flu : an anti-influenza virus hyperimmune intravenous immunoglobulin pilot study hospitalized patients with h n influenza in the united states systematic review of clinical and epidemiological features of the pandemic influenza a (h n ) factors associated with death or hospitalization due to pandemic influenza a (h n ) infection in california utility of the first few approach during the influenza a(h n ) pandemic in the netherlands optimizing the precision of case fatality ratio estimates under the surveillance pyramid approach potential biases in estimating absolute and relative case-fatality risks during outbreaks surveillance of illness associated with pandemic (h n ) virus infection among adults using a global clinical site network approach: the insight flu and flu studies the association between serum biomarkers and disease outcome in influenza a(h n )pdm virus infection: results of two international observational cohort studies outcomes of influenza a(h n )pdm virus infection: results from two international cohort studies extensive geographical mixing of human h n influenza a virus in a single university community emergence of a novel swine-origin influenza a (h n ) virus in humans using an online survey of healthcare-seeking behaviour to estimate the magnitude and severity of the h n v influenza epidemic in england comparative community burden and severity of seasonal and pandemic influenza: results of the flu watch cohort study preliminary estimates of mortality and years of life lost associated with the a/h n pandemic in the us and comparison with past influenza seasons epidemiologic characterization of the influenza pandemic summer wave in copenhagen: implications for pandemic control strategies flu clinical site investigators: argentina: laura barcan key: cord- -zl url z 
authors: pearce, n.; moirano, g.; maule, m.; kogevinas, m.; rodo, x.; lawlor, d.; vandenbroucke, j.; vandenbroucke-grauls, c.; polack, f. p.; custovic, a. title: is death from covid- a multistep process? date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: zl url z covid- death has a different relationship with age than is the case for other severe respiratory pathogens. the covid- death rate increases exponentially with age, and the main risk factors are age itself, as well as having underlying conditions such as hypertension, diabetes, cardiovascular disease, severe chronic respiratory disease and cancer. furthermore, the almost complete lack of deaths in children suggests that infection alone is not sufficient to cause death; rather, one must have gone through a number of changes, either as a result of undefined aspects of aging, or as a result of chronic disease. these characteristics of covid- death are consistent with the multistep model of disease, a model which has primarily been used for cancer, and more recently for amyotrophic lateral sclerosis (als). we applied the multi-step model to data on covid- case fatality rates (cfrs) from china, south korea, italy, spain and japan. in all countries we found that a plot of ln (cfr) against ln (age) was approximately linear with a slope of about . as a comparison, we also conducted similar analyses for selected other respiratory diseases. sars showed a similar log-log age-pattern to that of covid- , albeit with a lower slope, whereas seasonal and pandemic influenza showed quite different age-patterns. thus, death from covid- and sars appears to follow a distinct age-pattern, consistent with a multistep model of disease that in the case of covid- is probably defined by comorbidities and age producing immune-related susceptibility. identification of these steps would be potentially important for prevention and therapy for sars-cov- infection. 
respiratory viral infections are a leading cause of mortality globally ( ) . this has been highlighted by the huge global impact of coronavirus disease (covid- ) ( ) , and the enormous public health threat it poses. covid- is a new and devastating disease, in which the long-term effects of infection are still being discovered. primary infection with sars-cov- and a relatively mild disease followed by successful clearance of the virus seems to represent the norm. however, in a minority of patients, there is systemic inflammation and an exaggerated immune response (cytokine storm) resulting in respiratory and multi-organ failure and potentially death ( ) . sars-cov is a particularly aggressive virus for elderly individuals, who represent between and % of fatal cases in different regions of the world ( , ) . mortality, need for intensive care, mechanical ventilation, oxygen requirement, and hospitalizations increase with advancing age in patients with covid- ( ) ( ) ( ) . median age among , patients hospitalized with covid- in uk hospitals was years, and more men were admitted than women ( % vs. %) ( ) . in hospitalized patients older than years, mortality is . %, and in intensive care units (icu) it can reach a rate of . % ( ) . it is currently unclear whether children have similar infection rates to adults ( , ) , or are equally infectious; however, when infected they have fewer symptoms, and deaths in children are very rare ( ) . overall, death from covid- increases monotonically and steeply with age. this is in contrast with other respiratory pathogens, such as respiratory syncytial virus ( ) and seasonal influenza ( , ) , which cause substantial numbers of deaths in children and have a j-shaped pattern with age. pandemic influenza viruses, for their part, often cause severe disease in middle-aged adults ( ) ( ) ( ) , even without pre-existing co-morbidities ( ) , suggesting a different pathogenesis from that for seasonal influenza.
one exception was mortality from pandemic influenza in , which had a different age-distribution from other influenza pandemics ( ) ; severe disease was relatively high in young adults, and appeared to be driven by a non-protective anamnestic response by previously acquired antibodies to the new virus. thus, there are some notable features of covid- death which suggest that it has a markedly different relationship with age than is the case for most other severe respiratory pathogens. the mechanisms explaining excess covid- mortality and icu admissions in the elderly are unknown ( ) . having an underlying condition such as hypertension, diabetes, cardiovascular disease, severe chronic respiratory disease and cancer increases the risk ( , ); e.g. an italian study ( ) found that % of cases and % of covid- deaths had at least one comorbidity, whereas a similar study in china ( ) found estimates of % and % respectively. however, a strong predictor of mortality after adjusting for major comorbidities is increasing age itself ( ) . furthermore, the extreme rarity of deaths in children suggests that infection alone is not sufficient to cause death; rather, one must have gone through a number of changes, either as a result of unknown aspects of aging, or as a result of chronic disease (effectively speeding up the aging process), increasing susceptibility, leading to severe disease and death. these characteristics of covid- death are consistent with the multi-step model of disease, a model which has primarily been used for cancer ( , ) , and more recently for amyotrophic lateral sclerosis (als) ( ) .
medrxiv preprint notice: this version posted june , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license. https://doi.org/ . / . . .
briefly, if one assumes that a disease outcome occurs due to the accumulation of a number of discrete steps throughout life, then the incidence of the outcome (this could be the incidence of disease occurrence, or in the case of covid- the incidence of death) will be proportional to age to the power of n-1, where n is the total number of steps involved. thus, a plot of log (incidence) against log (age) will be a straight line, and the slope (n-1) will be one less than the number of steps (n). for epithelial tumours, the number of steps appears to be about (this is a population average since some subtypes may involve different numbers of steps), and in some instances (e.g. colorectal cancer), the relevant steps have been identified ( ). similar findings have been obtained for als, whereas some other chronic neurological conditions (e.g. multiple sclerosis) do not follow a multistep pattern ( ) . it thus appears that some diseases are 'digital' and involve discrete steps which produce the log-log linear relationship with age, whereas others are 'analogue' and do not show these same age patterns. given what is known about the age-distribution of covid- death, and its clinical characteristics, we considered that it would be useful to explore whether covid- death followed an age-pattern consistent with the multistep model. we were also interested in comparing the findings in males and females, since males have a higher death rate in each age-group ( , ) . the mechanisms explaining excess covid- mortality in males are unknown ( ) , and it has been suggested that this may be from an inherent characteristic of being male ( ). in multistep model terminology, this would mean that males were born with one step already in place, analogous to being born with a particular genetic mutation which accounts for one step of a multistep cancer process. in this scenario, males would have higher death rates at each age-group, but their overall age slope would be one lower (e.g. instead of ), since one step had already been acquired at birth.
as we have discussed elsewhere ( ) , there are a number of potential problems with reported covid- infection and mortality data. these include non-random population sampling for testing (mostly symptomatic people get tested), which means that diagnosed cases are likely to be more severe, misclassification of infection and/or misclassification of cause of death. moreover, there is little valid information on the underlying population infection rates. this means that estimates are available for the case fatality rate (cfr=deaths/diagnosed cases), but few estimates are available for the infection fatality rate (ifr=deaths/total infected). for the current analyses, using ifrs may have been preferable, but in almost all cases only cfrs were available in our data sources. even for the cfr, there are significant problems in estimating it accurately ( , ), as is the case for pandemic influenza ( ). underascertainment of infections is most likely to be a problem in younger age-groups where there is likely to be less testing, because there are fewer symptomatic infections ( ) . we needed data on covid- cfrs or ifrs by age-group, ideally separately for males and females. we sought potential data sources through literature searches, examination of national statistics online, and through enquiries with colleagues in different countries. in some instances, population death rates (i.e.
with total population denominators) were available, but these are not appropriate for the multistep model in the current analysis, which focusses on case fatality. therefore, we restricted the analysis to data with cfrs (cases as denominators) or ifrs (infections as denominators). china: the data for estimates of the cfr and ifr for china were taken from the publication by verity et al ( ) . briefly, these involved , pcr-confirmed cases in wuhan and elsewhere in china during the period st january to th february , extracted from the who-china joint mission report ( ); during the same period , covid-related deaths were reported across china. the main analyses in the publication were adjusted for censoring, demography and under-ascertainment. we report the findings using the unadjusted estimates, since these are more comparable to the data from other countries. however, only the adjusted ifr is available in the report, and therefore, to compare the findings for the cfr with those for the ifr in the china data, we used the adjusted estimates for both. as a comparison, we also obtained similar data for three other respiratory diseases (seasonal influenza, pandemic influenza and severe acute respiratory syndrome-sars). the details of the data sources, and findings for these, are given in the supplementary appendix. under-ascertainment of cases is most likely to be a problem in younger age-groups; deaths are also rare in these age-groups, and the rates are therefore unstable (ln (cfr) cannot be calculated when there are zero deaths).
we therefore included all ages ( - years) in the descriptive analyses, but restricted the multistep analyses to ages years and over (an approach that has been used in some other analyses of covid- mortality ( )). we analysed age-specific (and where available, sex-specific) death rates (i.e. cfrs and ifrs) in ten-year age-groups (for publications which reported five-year age-groups we amalgamated these into ten-year groups in order to standardize the analysis). we used the mid-point of each age-group and regressed the natural log of the cfr or ifr against the natural log of age, using standard linear least-squares regression. as in similar previous analyses ( ), we used unweighted regression so that the estimated slope was not biased towards the older age-groups where there were many more deaths. the multistep model predicts a slight flattening of the curve in the youngest and oldest age-groups. this occurs because deaths in young age-groups often involve people who have inherited at least one step, and therefore have a high risk but a lower slope; in the oldest age-groups many people will already have accumulated - steps, so once again the risk is higher but the slope is lower ( ). for the multistep model, we are interested in the overall slope with equal weight on all of the age-groups (essentially it is the middle age-groups which are predicted to provide the slope which is one less than the number of steps). we did the regressions for each country, by sex (data available for spain and italy), and we also did combined analyses adjusting for country; these were weighted by the total number of cases in each country's dataset.
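the unweighted log-log regression described above can be sketched in python as follows. this is a minimal illustration rather than the authors' analysis code: the function name, the age mid-points and the synthetic rates (generated from an assumed power law with exponent 5) are hypothetical and are not the study data. the last two lines also illustrate that multiplying every rate by a constant, as would happen if the cfr were a fixed multiple of the ifr at every age, shifts the intercept but leaves the slope unchanged.

```python
import math


def loglog_fit(age_midpoints, rates):
    """unweighted least-squares fit of ln(rate) on ln(age).

    returns (slope, intercept); under the multistep model the implied
    number of steps is slope + 1. age-groups with a zero rate are dropped
    because ln(0) is undefined.
    """
    pairs = [(math.log(a), math.log(r))
             for a, r in zip(age_midpoints, rates) if r > 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# hypothetical ten-year age-group mid-points and synthetic rates
ages = [35, 45, 55, 65, 75, 85]
rates = [1e-10 * a ** 5 for a in ages]  # assumed power law, exponent 5

slope, intercept = loglog_fit(ages, rates)
print(f"slope = {slope:.3f}, implied steps = {slope + 1:.3f}")

# doubling every rate changes the intercept only, not the slope
slope_2x, intercept_2x = loglog_fit(ages, [2 * r for r in rates])
print(f"slope after doubling = {slope_2x:.3f}")
```

every age-group gets equal weight here, matching the unweighted choice in the text; a case-weighted version would instead pull the slope towards the older age-groups.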
for all analyses, we found a linear log-log relationship between the cfr and age, with a small flattening of the curve in the youngest and oldest age-groups (table ). all five countries had an overall slope of about . figures and show the findings for the cfr by age, and the plot of ln (cfr) against ln (age), for all five countries combined. the country-specific findings are shown in figures s -s (supplementary appendix) . for the two countries where we had data separately for males and females (italy and spain), the male death rates were higher at each age-group, but the slope was lower - by . ( figures and ) . in one country, china, we had data for both cfr and ifr, and these yielded very similar findings (not shown in table); these used adjusted data (see above), and yielded slopes of . for the cfr and . for the ifr (figures s and s , supplementary appendix). the findings for selected other respiratory virus diseases (seasonal influenza, pandemic influenza and sars) are shown in figures s -s (supplementary appendix) . seasonal influenza ( figure s ) and pandemic influenza ( figure s ) showed age-patterns that were different from each other, and markedly different from that for covid- ( figure ). although seasonal influenza generally showed an exponential increase with age, there were also high death rates in the - age-group (i.e. the overall pattern is j-shaped). data on mortality in hospitalised influenza cases in england from january to august ( ), i.e. during non-pandemic years, did not clearly show a log-log pattern, and the fitted slope is only . ( figure s ). pandemic influenza (h n ) showed a very different age-pattern, with peak mortality in adulthood.
modelling data from england ( ) on mortality due to h n influenza pandemic shows a j-shaped pattern mortality in the youngest age-group, with no log-log pattern, and the fitted slope is close to zero ( figure s ). sars showed a roughly linear log-log relationship ( figure s ), but with a lower slope ( . ) than was observed for covid- ( figure ) . thus, sars showed a similar log-log age-pattern to that of covid- , albeit with a lower slope; in contrast, seasonal and pandemic influenza showed quite different age-patterns, with little evidence of a log-log relationship, and with substantially lower slopes. as hypothesized, we found a linear log-log relationship between the cfr and age, with a small flattening of the curve in the youngest and oldest age-groups (as predicted by the multistep model ( )). all five countries gave similar values for the overall slope (table ), i.e. about , indicating a multistep process of about steps. as with other applications of the multistep model (e.g. cancer), this is a population average, and it may well be that some pathways to the outcome may involve a different number of steps( ). a number of limitations of the data should be recognized. firstly, the appropriate denominator depends on what the hypothesis is. for cancer, where the multistep model has been most commonly used, the hypothesis is that there are steps involved in developing cancer, so the outcome/numerator is incidence and the denominator is total population; most risk factors for cancer that follow the multistep model (e.g. a genetic mutation) affect incidence and usually not survival ( ). in the current context, the outcome is covid- death, and the hypothesis relates to mortality in those who become infected. however, in most cases, only case fatality rates (cfrs) were available, and the denominator was diagnosed cases rather than all infections. 
a related problem is selective case identification: most cases are identified by testing symptomatic people, and nonsymptomatic cases are largely missed, and even in those tested there will be false positive and false negative results (this is also a problem with other respiratory diseases such as pandemic influenza ( )). given all of these uncertainties, the available data that we have used for these analyses are likely to be subject to inaccuracy and misclassification (of outcome, denominator, or both). however, these uncertainties will not bias the analysis unless they operate with different strengths at different age-groups. for example, if the problem is just that the cfr is double the ifr at each age-group greater than years (because at each age-group, only one-half of the infections are tested and diagnosed), this will affect the absolute value of the cfr but will not affect the slope of the log-log linear trend. furthermore, it is noteworthy that we have obtained very similar findings for the cfr in several different populations and contexts, with differing methods for ascertaining cases. if we can assume that the findings presented here are valid, given all the uncertainties of the data, what do they mean? the multistep model has been used for cancer since , but with a few exceptions (e.g. colorectal cancer ( )) the relevant stages have not yet been identified. those that have been identified typically involve mutations in dna or other cellular changes which lead to cell proliferation. the model has produced a number of testable hypotheses, e.g.
relating to the stage (step) at which various carcinogens act ( ), the dose-response relationships for factors such as smoking, and the changes in the age-incidence patterns after smoking cessation ( ). the application of the multistep model to als ( ) is beginning to yield similar benefits, including clarifying the role of genetic susceptibility, which increases the disease rate at each age-group but produces a lower slope compared with persons without the genetic susceptibility ( ). thus, identifying diseases that show a multistep pattern provides the foundations for identifying the steps and being able to intervene across the lifecourse. importantly, if a multistep pattern is identified for covid- death, it suggests that identifying the final (trigger) steps in (elderly) patients could prevent severe disease and death. we also conducted similar analyses for selected other respiratory diseases. sars showed a similar log-log age-pattern to that of covid- , albeit with a lower slope (indicating a smaller number of steps); in contrast, seasonal and pandemic influenza showed quite different age-patterns. thus, death from covid- and sars appears to follow a distinct age-pattern, consistent with a multistep model of disease. given that the stages involved in multistep cancer causation have mostly not yet been identified, and that sars-cov- is a newly discovered virus, it is not surprising that the stages involved in covid- mortality are not currently readily identifiable. however, they clearly involve changes associated with chronic conditions such as hypertension, diabetes, cardiovascular disease, severe chronic respiratory disease and cancer. they also involve other cumulative changes associated with aging (e.g. one change may be the nasal expression of ace- receptors with aging ( )), since age itself is a risk factor for covid- death. collectively, these comorbidities, and age itself, appear to be markers of immune-related susceptibility, which makes the difference between experiencing a mild illness, or a severe illness possibly resulting in death.
given the rapidity of covid- death following infection, it is likely that sars-cov- itself acts at a late stage of the multistep process, with the previous stages having occurred before infection, throughout the life-course. simplistically, if someone has already accumulated a number of the necessary steps (from having one of these chronic conditions, or from other cumulative changes associated with aging), then they are 'primed' to have a serious case of covid- if and when infection occurs. if they are not so 'primed', then infection can still occur, but is usually mild. this is most obvious for infections in children, who have not had one of these chronic conditions (and are not old), where the infection is almost always mild. however, it is not so clear what these steps may be. does having diabetes or a high bmi, for example, involve a specific biological change (analogous to those observed for cancer and als) which is an intrinsic part of the process leading to covid- mortality? if this is the case, are these conditions clearly-defined 'steps' which can potentially be identified? if so, one would expect that people who have one of these conditions (e.g. diabetes) would have a higher cfr for covid- death at each age-group, but a lower slope; this is a hypothesis that can be tested in future as more data with sufficient numbers of the underlying conditions emerge. on the other hand, are these conditions, like age, just markers of more general increases in immune-related susceptibility which accumulate across the life-course?
in which case there must be other steps that 'prime' someone for severe disease when they are infected. whether or not distinct risk factors can be identified for the underlying steps, the clear log-log patterns of the cfr with age that we have identified indicate that there is a potential to identify population subgroups which are 'primed' for severe disease if they experience a sars-cov- infection. it is also striking that we see a similar pattern with sars (another coronavirus), perhaps indicating a similar mechanism for mortality, whereas we see different age-patterns for other severe respiratory infections, perhaps indicating that different mechanisms are operating. we found that the male cfr was higher than that in females at each age-group, but that the slope of the regression was . lower in males. this is close to the difference of . which was the a priori hypothesis. a difference of would suggest that males are born with one step in place. we propose that this step may be related to a relative deficiency in innate immune responses to viruses in males compared to females. recent analysis in a population-based birth cohort has shown that interferon (ifn)-α, -β and -γ responses to three common respiratory viruses and two viral mimics are deficient in males compared to females, indicating that the excess covid- mortality in males is likely at least in part explained by impaired innate anti-viral immune responses in males compared to females (prof s johnston, personal communication). endosomal-expressed toll-like receptors (tlrs) / / are engaged by positive strand rna viruses such as sars-cov- , which require endosomal processing as part of viral entry into cells ( ).
their activation results in production of type i ifns, which is an important step for the induction of antiviral immunity. tlrs and are encoded by loci on the x chromosome, and biallelic expression of x-linked genes could enhance tlr - expression in female immune cells ( ), thereby providing a mechanistic explanation for our observation. we propose that the notion that biallelic expression of x-linked tlr - is unlikely to be strictly binary, and will not occur in every female, could explain the . (rather than ) difference. our findings should be regarded as preliminary, and require further replication in other populations. moreover, their etiological significance is not yet clear, though as described above specific hypotheses, such as whether diabetes is likely one of the steps, can be tested as more data emerge. nevertheless, the patterns are strikingly consistent across the countries studied. these findings are consistent with a multistep model of disease involving a six-step process that in the case of sars-cov- is probably defined by comorbidities and age producing immune-related susceptibility. identification of these steps would be potentially important for prevention and therapy for sars-cov- infection. d.a.l reports support from medtronic ltd and roche diagnostics for biomarker research unrelated to this publication. ac reports personal fees from novartis, personal fees from thermo fisher scientific, personal fees from philips, personal fees from sanofi, personal fees from stallergenes greer, outside the submitted work. . cc-by-nc-nd . international license it is made available under a is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. (which was not certified by peer review) the copyright holder for this preprint this version posted june , . . https://doi.org/ . / . . . doi: medrxiv preprint
lung infection--a public health priority a new coronavirus associated with human respiratory disease in china the cytokine storm in covid- : an overview of the involvement of the chemokine/chemokine-receptor system the health of populations: general theories and particular realities is social capital the key to inequalities in health? clinical characteristics of coronavirus disease in china estimating clinical severity of covid- from the transmission dynamics in wuhan, china epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study uk patients in hospital with covid- using the isaric who clinical characterisation protocol: prospective observational cohort study hospitalization rates and characteristics of patients hospitalized with laboratory-confirmed coronavirus disease -covid-net, states epidemiology and transmission of covid- in shenzhen china: analysis of cases and , of their close contacts covid- ) infection survey pilot: england severe acute respiratory syndrome coronavirus (sars-cov- ) infection in children and adolescents: a systematic review global, regional, and national disease burden estimates of acute lower respiratory infections due to respiratory syncytial virus in young children in : a systematic review and modelling study severe respiratory disease concurrent with the
circulation of h n influenza the burden of influenza in england by age and clinical risk group: a statistical analysis to inform vaccine policy age-specific mortality during the influenza pandemic: unreavelling the mystery of high young adult mortaloty the age distribution of mortality due to influenza: pandemic and perr-pandemic age-dependence of the pandemic severe pandemic h n influenza disease due to pathogenic immune complexes covid- : the gendered impacts of the outbreak comorbidity and its impact on patients with covid- in china: a nationwide analysis the age distribution of cancer and a multi-stage theory of carcinogenesis dynamics of cancer analysis of amyotrophic lateral sclerosis as a multistep procss: a population-based modelling study gender differences in patients with covid- : focus on severity and mortality -novel coronavirus ( -ncov): estimating the case fatality rate -a word of caution likelihood of survival of coronavirus disease . lancet infectious diseases. . . nishiura h. case fatality rate of pandemic influenza estimates of the severity of coronavirus disease : a model-based analysis korean centre for disease control; . . iss tfc-ddmiesdi, istituto superiore di sanità. epidemia covid- , aggiornamento nazionale. dipartimento malattie infettive e servizio di informatica, istituto superiore di sanità actualización nº madrid: centro de coordinación de alertas y emergencias sanitarias estimating excess -year mortality from covid- according to underlying conditions and age in england: a rapid analysis using nhs health records in . 
million adult mortality due to pandemic (h n ) influenza in england: a comparison of the first and second waves multistage modelling of lung cancer mortality in asbestos textile workers the multistep hypothesis of als revisited: the role of genetic mutations nasal gene expression of angiotensin-converting enzyme in children and adults targeting the endocytic pathway and autophagy process as a novel therapeutic strategy in covid- tlr escapes x chromosome inactivation in immune cells
we thank elizabeth brickley, stephen evans, matthew fox and judith glynn for their comments on the draft manuscript. we also thank yasuyuki gondo for supplying the data from japan.
key: cord- - x dwk authors: fisman, david n.; greer, amy l.; tuite, ashleigh r. title: age is just a number: a critically important number for covid- case fatality date: - - journal: ann intern med doi: . /m - sha: doc_id: cord_uid: x dwk in their article, sudharsanan and colleagues show the importance of adjusting for the age distribution of cases of coronavirus disease before doing cross-country comparisons of case-fatality rates. the editorialists explore the effect of age distribution on these rates and other determinants of between-country variation in the severity of this disease. a pandemic, by definition, represents worldwide, simultaneous epidemics caused by a novel pathogen. the multinational nature of such an event inevitably leads to cross-national comparisons of epidemic growth, impact, and public health response. such comparisons lend themselves to ecological research: for example, the apparently slower epidemic growth rates in countries that use bacille calmette-guérin (bcg) vaccine have caused some researchers to assert that bcg vaccination may affect susceptibility to severe acute respiratory syndrome coronavirus ( ). others have made similar observations about higher mean temperature and slower growth of the coronavirus (covid- ) epidemic ( ).
these national-level comparisons are vulnerable to "ecological fallacy," or attribution of individual-level outcomes to aggregate exposures ( ). however, they also represent "unfair comparisons" ( ), because the countries in question differ fundamentally on a confounder known to be associated with covid- severity: age. the association among age, disease severity, and covid- case recognition has been clear since february ( ) . older cases are more likely to be represented in surveillance data owing to greater severity and hence ascertainment. failure to recognize younger, milder cases diminishes the denominator in case-fatality ratio (cfr) calculations (that is, deaths/cases), so between-country differences in age structure explain some fraction of observed between-country variation in epidemic severity and case-fatality. an analysis by sudharsanan and colleagues ( ) used data from countries to demonstrate the importance of adjusting for the age distribution of cases before doing cross-country comparisons of cfr. to ensure that between-jurisdiction comparisons are fair comparisons, the authors used the epidemiologic tool of standardization ( ) . direct standardization by age requires estimation of age-specific risk from different populations, which are then applied to a standard population, such that resultant differences in overall risk cannot be due to differences in population age structure. in their analysis, the authors show that adjusting for differences in population age structure substantially reduces the observed differences between country-specific cfrs. to further explore the effect of age distribution on cfr, we can take the standardization approach used by sudharsanan and colleagues and turn it on its head. 
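a minimal python sketch of direct standardization as described above: age-specific cfrs from each population are weighted by the case age distribution of one shared standard population, so that remaining differences cannot come from age structure. all numbers, the three age bands, and the function names `weighted_cfr`, `crude_young`, and so on are hypothetical placeholders, not values from sudharsanan and colleagues.

```python
def weighted_cfr(age_specific_cfr, case_weights):
    """Weighted average of age-specific CFRs using the given case counts."""
    total = sum(case_weights)
    return sum(r * w for r, w in zip(age_specific_cfr, case_weights)) / total

# Hypothetical age bands: under 40, 40 to 69, 70 and over.
# Assumption: both countries share the same age-specific risk.
cfr_by_age = [0.002, 0.02, 0.15]

cases_young_country = [600, 300, 100]   # cases skewed toward younger ages
cases_old_country = [200, 400, 400]     # cases skewed toward older ages
standard_cases = [400, 350, 250]        # shared standard age distribution

# Crude CFR: each country weighted by its own case age distribution.
crude_young = weighted_cfr(cfr_by_age, cases_young_country)
crude_old = weighted_cfr(cfr_by_age, cases_old_country)

# Directly standardized CFR: both countries weighted by the standard.
std_young = weighted_cfr(cfr_by_age, standard_cases)
std_old = weighted_cfr(cfr_by_age, standard_cases)
```

in this toy setup the crude cfrs differ only because the case age distributions differ, and the standardized cfrs coincide; with real data the age-specific cfrs would differ between countries and some residual gap would remain after standardization.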
we can apply age-specific risk from a single epidemic to other countries, to observe how an identical epidemic, from an age-specific attack rate point of view, might be perceived differently in different places, simply due to different age structures. here, we use data from mainland china for the -day period from december to february ( ). the analyses described below can be further explored in an associated app (https://art-bd.shinyapps.io/time_to_outbreak_detection/). when the reported cfr for february in mainland china ( . %) is age-standardized using population pyramids from other countries ( ), standardization to a country with a younger population structure, such as indonesia, markedly reduces observed cfr ( . %), whereas adjustment to a country with an older population, such as italy, increases the cfr ( . %). we can also estimate epidemic size using this approach but need to adjust for population size as well (larger countries, for a given attack rate, will have larger epidemics). we define the ratio of population in the other, comparator country ($p_o$) to the chinese population ($p_c$) as $r_p = p_o/p_c$. the ratio of the epidemic size in the other, comparator country ($e_o$) to observed chinese epidemic size ($e_c$) is defined as $r_e = e_o/e_c$. the "ratio of ratios" is $r_e/r_p$, which can be interpreted as the relative apparent outbreak size when an outbreak with identical age-specific attack rates occurs in a population with an age-structure that differs from that of mainland china. just as cfr for an identical epidemic is expected to be higher in countries with older populations, the ratio of ratios, $r_e/r_p$, is greater than for countries with older populations ( . for italy) and less than for countries with younger populations ( . for indonesia).
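the ratio of ratios defined above can be sketched in python as follows. the sketch assumes that apparent epidemic size is the sum over age bands of an age-specific observed attack rate times the population in that band; this decomposition, the function names, and every number below are illustrative assumptions rather than the editorial's data.

```python
def apparent_epidemic_size(attack_rate_by_age, population_by_age):
    """Expected observed cases: sum of age-specific rate times population."""
    return sum(a * n for a, n in zip(attack_rate_by_age, population_by_age))

def ratio_of_ratios(attack_rate_by_age, pop_other, pop_reference):
    """Return r_e / r_p for a comparator country against the reference."""
    r_p = sum(pop_other) / sum(pop_reference)
    r_e = (apparent_epidemic_size(attack_rate_by_age, pop_other)
           / apparent_epidemic_size(attack_rate_by_age, pop_reference))
    return r_e / r_p

# Hypothetical observed attack rates that rise with age (three age bands).
attack = [0.0001, 0.0005, 0.002]

# Hypothetical populations by age band, in millions.
reference_pop = [500, 400, 100]
older_pop = [20, 25, 15]       # larger share in the oldest band
younger_pop = [150, 100, 20]   # smaller share in the oldest band

ror_older = ratio_of_ratios(attack, older_pop, reference_pop)
ror_younger = ratio_of_ratios(attack, younger_pop, reference_pop)
ror_same = ratio_of_ratios(attack, reference_pop, reference_pop)
```

dividing by $r_p$ removes the effect of population size, so the result is simply the ratio of per-capita apparent epidemic sizes; under these assumptions it depends only on how the age shares of the two populations differ.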
in other words, identical epidemics, adjusted for population size, appear smaller in countries with younger populations (shorter life expectancy) than in those with older populations (increased life expectancy), even with identical age-specific attack rates. age structure may also affect the time to recognize an epidemic. countries with younger populations are likely to have more silent spread and be slower to identify epidemics. this may have been the root cause of a controversy that emerged early in the covid- pandemic: indonesia was predicted by models to have early importation of covid- cases, but this was not consistent with indonesian observations ( ) . critical illness and death associated with covid- may result in initial outbreak identification and are more likely to occur in older individuals; we can arbitrarily define "older" as age greater than years. we can calculate the incidence rate for observed infection, and the rate of transition to death, among susceptible older individuals in the mainland chinese population in the early days of the epidemic by using an exponential failuretime model combined with published natural history data ( , ) . when we simulate the mainland china epidemic in other countries, deaths accumulate more quickly in countries with high life expectancy (older populations) and more slowly in those with low life expectancy (younger populations). this is not to say that age distribution is the only determinant of between-country variation in epidemic severity. as sudharsanan and colleagues ( ) show, once age-related effects are removed, variability in cfr estimates remains. differential outbreak responses are likely responsible for some of this variability ( ): weak public health responses that result in overwhelmed intensive care units will cause case fatality to inflect upward. failure to adequately protect long-term care facilities from covid- will swell cfr estimates as well. 
availability of testing is another key determinant of observed case fatality: a recent analysis found that more testing increases the case numbers in the cfr denominator, resulting in lower cfr, with residual variability in cfr explained by age structure and country per capita gross domestic product ( ) . finally, decisions about which deaths to classify as "covid- attributable" vary across countries. serologic testing will ultimately help determine the true infection fatality ratio for covid- and better quantify underrecognition of cases by age, but until such data are widely available, standardization provides a straightforward first step to ensure that between-country comparisons are fair comparisons. correlation between universal bcg vaccination policy and reduced morbidity and mortality for covid- : an epidemiological study. medrxiv. preprint posted online impact of climate and public health interventions on the covid- pandemic: a prospective cohort study ecological fallacy and aggregated data: a case study of fried chicken restaurants, obesity and lyme disease vital surveillances: the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- )-china, . china centers for disease control weekly the contribution of the age distribution of cases to covid- case fatality across countries. a -country demographic study standardization: a classic epidemiological method for the comparison of rates r: a language and environment for statistical computing. r foundation for statistical computing. . accessed at www.r-project using predicted imports of -ncov cases to determine locations that may not be identifying all imported cases. medrxiv. preprint posted online mathematical modelling of covid- transmission and mitigation strategies in the population of ontario estimating the global infection fatality rate of covid- . medrxiv. 
preprint posted online key: cord- -pkjov authors: faust, jeremy samuel title: towards a better case fatality estimate for sars-cov- during the early phase of the united states outbreak date: - - journal: clin infect dis doi: . /cid/ciaa sha: doc_id: cord_uid: pkjov nan given the above considerations, data from the model as proposed by kou et al can be harnessed for calculating a denominator statement with acceptable face validity for symptomatic disease at three time points, ranging from march to march . we can further posit an alternative denominator that takes asymptomatic infection into account-which their model does not. while some estimates state that only percent of cases are ultimately asymptomatic, other estimates are closer to percent. but presymptomatic disease comprises a substantial fraction of infections at any given time and should therefore also be considered. universal screening among one healthy population detected that the rate of asymptomatic or presymptomatic disease was as high as percent. another study of older patients who were sicker at baseline found that percent of patients with a positive sars-cov- swab were asymptomatic at the time of testing, and only developed symptoms later (median time from test to symptoms = days). such patients would not be picked up in the final data point in use. taken together, a reasonably conservative a c c e p t e d m a n u s c r i p t attempt to add symptom-free cases to numbers proposed by kou et al could include a percent addition to their estimates. numerator statements, meanwhile, can reasonably be assumed to be sufficiently close to the running cumulative total number of counted covid- deaths as recorded at least two weeks after the day used to estimate the denominator. these counts are relatively reliable because covid- is currently a reportable cause of death in all us states and territories. 
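the numerator and denominator choices described above can be sketched in python. the sketch assumes a daily series of cumulative counted deaths and a single model-based estimate of symptomatic cases on a chosen day; the function name `lagged_cfr`, its parameters, and all numbers are hypothetical and are not taken from kou et al or from the author's table.

```python
def lagged_cfr(cumulative_deaths, symptomatic_cases, case_day,
               lag_days=14, symptom_free_addition=0.0):
    """CFR using deaths counted `lag_days` after the denominator day.

    symptom_free_addition is the fractional increase applied to the
    symptomatic case estimate to account for pre- or asymptomatic
    infections (0.5 means a 50 percent addition).
    """
    death_day = case_day + lag_days
    if death_day >= len(cumulative_deaths):
        raise ValueError("death series does not extend far enough")
    denominator = symptomatic_cases * (1.0 + symptom_free_addition)
    return cumulative_deaths[death_day] / denominator

# Hypothetical inputs for illustration only.
deaths = [10 * day for day in range(30)]   # cumulative deaths by day index
cases_on_day_10 = 100_000                  # model-based symptomatic estimate

cfr_symptomatic = lagged_cfr(deaths, cases_on_day_10, case_day=10)
cfr_all = lagged_cfr(deaths, cases_on_day_10, case_day=10,
                     symptom_free_addition=0.5)
```

the lag reflects the delay between infection and death, and the optional addition only enlarges the denominator, so the all-infection figure can never exceed the symptomatic-only figure.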
while excess deaths may ultimately offer an attractive alternative for use as the numerator, expected lags in all-cause mortality reporting render these numbers incomplete for several weeks. once those numbers are available, they may serve as a partial measure of quality for numerator statements based on counted covid- deaths, which are prone to some degree of error. thus, the use of excess mortality may at some point provide another lens through which to verify the accuracy of these counts, as excess all-cause mortality figures do not rely on the subjective judgement of those filling out death certificates. march st appears to be the best available date upon which to estimate a denominator for the cfr of sars-cov- using the model provided by kou et al. this date has the advantage of being both the peak of ili reporting to the cdc and directly prior to the time when the effects of many of the mitigation strategies and changes in public behavior mentioned above began to become noticeable on a systemic level. as of april th , public covid- trackers reported a crude cfr of . percent worldwide. using the kou model as a source for the denominator (cases as of march st ) and all deaths through april th as the numerator (including all deaths that occurred on us soil prior to march ), the calculated cfr appears to have been approximately percent of estimates on public-facing covid trackers, and this only accounts for symptomatic cases ( table , column ). further, allowing for the addition of pre-or asymptomatic cases into the denominator reveals a cfr of just percent of the figures published on covid- trackers ( table , column ). these figures mirror estimates obtained in closed systems where universal testing was achieved, such as the diamond princess cruise ship. while the crude cfr on the diamond princess appears to have settled at around . percent, passengers aged or older were over-represented as compared to other cohorts by a factor of approximately four.
this implies an age-adjusted cfr for the diamond princess of . , which is remarkably similar to implied rates we calculate here using the denominator based on kou et al with adjustment for symptom-free infection ( table , column ). these numbers are higher, though not astronomically, than estimates given in the increasingly controversial santa clara county serology study. if we instead use some of the higher reported numbers of pre-or asymptomatic cases found in the emerging literature, the estimated cfr we might calculate would indeed approach the . percent figure proposed by the authors of the santa clara study. together, these data imply that a more accurate cfr for sars-cov- may rest between . and . percent for symptomatic cases, and . and . percent for all cases including pre-and asymptomatic infections. however, this would also appear to imply that sars-cov- has a cfr that is between one and eight times greater than reported figures for seasonal flu. based upon recent ground conditions during the covid- outbreak compared to the peak of the worst flu seasons from recent years (as well as the h n pandemic), no credible case can be covid- fatality is likely overestimated fact sheet medicare telemedicine health care provider fact sheet universal screening for sars-cov- in women admitted for delivery: nejm department of medicine. asymptomatic transmission, the achilles' heel of current strategies to control covid- : nejm. . available at estimating the asymptomatic proportion of coronavirus disease (covid- ) cases on board the diamond princess cruise ship excess deaths associated with covid- field briefing: diamond princess covid- cases. available at covid- antibody seroprevalence key: cord- -y ck lo authors: simon, perikles title: robust estimation of infection fatality rates during the early phase of a pandemic date: - - journal: nan doi: . / . . .
sha: doc_id: cord_uid: y ck lo during a pandemic, robust estimation of case fatality rates (cfrs) is essential to plan and control suppression and mitigation strategies. at present, estimates for the cfr of covid- caused by sars-cov- infection vary considerably. expert consensus of . - % covers in practical terms a range from normal seasonal influenza to spanish influenza. in the following, i deduce a formula for an adjusted infection fatality rate (ifr) to assess mortality in a period following a positive test adjusted for selection bias. official datasets on cases and deaths were combined with datasets on number of tests. after data curation and quality control, a total of ifrs (n= ) were calculated for countries for periods of up to days between registration of a case and death. estimates for ifrs increased with length of period, but levelled off at > days with a median for all countries of . ( %-ci: . - . ). an epidemiologically derived ifr of . % ( %-ci: . %- . %) was determined for iceland and was very close to the calculated ifr of . % ( %-ci: . - . ), but . - -fold lower than cfrs. ifrs, but not cfrs, were positively associated with increased proportions of elderly in age-cohorts (n= , spearman's ρ =. , p =. ). real-time data on molecular and serological testing may further displace classical diagnosis of disease and its related death. i will critically discuss why, how and under which conditions the ifr provides a more solid early estimate of the global burden of a pandemic than the cfr. in the early phase of a pandemic caused by a novel pathogen, it is difficult to estimate the final burden of disease. in the case of the ongoing pandemic caused by sars-cov- , it has been proposed, on the one hand, that it will be the most serious seen for a respiratory virus since the - h n influenza pandemic . this pandemic contributed to the premature death of percent of the world population at that time .
on the other hand, despite all non-refutable morbidity, covid- may fall short of provoking a comparable impact on mortality as the seasonal influenza, which is estimated to contribute to , deaths per year on average . experts conclude that there is still a range for cfr of . up to . , which practically spoken is reflecting the margin from normal seasonal influenza to the lower boundary cfr estimate of - h n influenza . case fatality rates (cfr) can be helpful to critically control and reflect the outcome of robust modelling to estimate the global burden by mortality . during the phase of an outbreak cfrs are preliminary and should be communicated and used with caution . even in the case of the well-known and frequently studied seasonal influenza a, it is a matter of debate on how to estimate a cfr during the phase of the pandemic, which could be calculated by the total number of "deaths" divided by the number of "cases". there are ongoing discussions, what can and should be regarded as a "case" and a reasonably causal-related death . a "case" could ideally be a confirmed case of the infectious disease according to strict diagnostic guidelines, requiring symptoms and confirmatory testing. moreover, the death would ideally be a causally related death and not a death caused by superinfection over the course of hospitalization, for instance. whether we impose less or more strict guidelines to define a suitable "case" and its causally related death, either way will inevitably introduce selection bias. this may lead to both, either substantially higher, or substantially lower estimation of cfrs , . on the one hand, it can be argued that a strict procedure for confirmation of official cases and deaths may underestimate the effect of disease on mortality, since we will miss out both, deaths and cases, for a significant proportion of the population . 
on the other hand, an infection with a pathogen of mild to medium virulence, like sars-cov- , can be completely asymptomatic, or may cause only minor symptoms in a majority of infected persons. at present the size of the denominator of total community infections is unknown, but there is increasing evidence that a major part of the younger population rarely becomes symptomatic and even more rarely dies of the disease if diagnosed with covid- . at the beginning of a pandemic with a novel pathogen, central aspects of previously acquired immunity or genetic resistance against viruses or other pathogens are often unknown or only vaguely explored. if they existed, but were not taken into account, the estimated cfr would not reflect the burden of an infectious disease from a macro perspective. in such a setting, looking preferentially at those who have the fully symptomatic disease or are subjected to the surveillance system can severely overestimate the burden of disease. therefore, in the early phase of an outbreak, infectious disease epidemiologists will rather base their estimates of the potential burden of a pandemic on models requiring basic assumptions on the basic reproduction number r0, the latent period of the infectious period, the interval of half-maximum infectiousness and many other factors. these all need to be derived from early field studies that often need to be conducted under sub-optimal conditions, in the heat of a pandemic. these basic assumptions of epidemiological key figures are then used for modelling the almost uncharted. from a stochastic point of view, such modelling is prone to an exponential propagation of imputation error, which can in principle be controlled for and reported with these models.
however, as in the present case, such error propagation will finally arrive at conclusions via modelling that are based on assuming merely all or nothing and thus would need to be communicated with the utmost caution to the public and, foremost, to health politicians. under these circumstances, it could be helpful to look at alternative ways to assess a robust figure for the cfr as a typical, hard-to-predict estimate for the global burden of a pandemic. alternative ways may allow interdisciplinary abductive reasoning to arrive at an estimate for the global burden of a disease. particularly under ill-defined and dynamic circumstances, abductive reasoning can be more helpful than the best medical evidence employing inductive statistical inference alone. there has just recently been a promising approach to arrive at a more robust measure of mortality. this involved calculating an infection fatality rate (ifr), which not only takes the asymptomatic population and their relevance for mortality into account, but is also adjusted for censoring and ascertainment bias. however, this approach again required an immense workload for retrieving and curating valid data from cohorts studied under specific conditions, and for making prior assumptions, which again makes it prone to error propagation. here i will deduce that an ifr adjusted for selection bias in favour of more morbid persons can be determined with the help of available official data in conjunction with the testing figures. i will show that the ifr adjusts the cfr for some essential sorts of bias. in order to cross-validate the computed figures, i will estimate infection fatality from an ongoing large-scale testing pool of citizens representative for the general population in iceland. at the time of finishing the dataset ( th of april) . % of the general population in the representative cohort and . % of the typically symptomatic part of the population had been tested.
finally, i will compare how cfrs and the calculated ifrs are suited to reflect essential epidemiological aspects already known to be associated with covid- . the estimation of an ifr is based on two different procedures to calculate a cfr from infection-related population data, which diverge with regard to the influence of selection bias. the first formula ( ) is a variation of the non-adjusted cfr, in the following termed "classic cfr", which divides the sum of deaths by the sum of cases on a given day. formula ( ) takes into account the persons that passed away or recovered, and the period d0-n in days from reported positive testing (d0) to death (dn). for a given time interval of n days, the sums of test positives (tp0), deceased persons (dp0) and recovered persons (rp0) are given at d0, and the sum of deceased persons (dpn) is known at dn. then a case fatality ratio for the interval d0-n, cfrn-0, can be calculated: $\mathrm{cfr}_{n-0} = \frac{\mathrm{dp}_n - \mathrm{dp}_0}{\mathrm{tp}_0 - \mathrm{rp}_0}$. for the data provided by johns hopkins university (ju) the recovered persons (rp0) can be included, while for the data provided by the european center of disease control (ecdc) a more simplified version can be calculated by substituting tp0 − rp0 with tp0. therefore, data of the ecdc were mostly used for quality control. they served to critically revise some negative case and death numbers in the data set of ju, which arose where numbers had been officially reported on one day to the who and the ecdc but were corrected later on by national health authorities. this was for instance the case for the data of iceland. as i mentioned in the introduction, the denominator of total community infections is unknown, but it is a rather critical factor of uncertainty in the ongoing pandemic. here i will put this into a likewise simple mathematical term by calculating the cfr as a cfr', taking this unknown denominator into account. at the beginning of a pandemic rp0 is often low or zero.
if the total number of tests (nt) conducted at d0 is known, a cfr' can be calculated by taking into account the total population np of the country or region in which testing had been conducted. notably, this is not a typical formula to calculate a cfr, since it may tend to underestimate the true fatality until the infectious disease has stopped spreading in the population. however, just like the cfr formula above, in the end it depends only on the registered deaths and the number of cases, and it is subject to the same factors that bias the estimation of a mortality figure, only in the opposite direction. moreover, since the size of the denominator of total community infections is unknown but appears to be highly relevant, this equation now puts the cfr into context with the general population. the prevalence (p) for non-recovered infected persons in a total population (np) can be calculated from the test pool as $p = \frac{\mathrm{tp}_0 - \mathrm{rp}_0}{\mathrm{nt}}$. based on this prevalence of infected persons in a population, a cfr' for the interval d0-n (cfr'n-0) can be calculated: $\mathrm{cfr}'_{n-0} = \frac{\mathrm{dp}_n - \mathrm{dp}_0}{p \cdot \mathrm{np}}$. both formulas have their shortcomings. i will briefly discuss why the cfr formula will most likely overestimate and the cfr' formula underestimate an ifr under the unavoidable premise that official testing will tend to test more morbid persons. in both equations the pool of newly infected persons is subject to selection bias. the cfr' formula typically underestimates the ifr, because the prevalence p of active cases is typically determined too high to be generalized to the total population. during an outbreak, this is unavoidable for testing strategies solely based on the health care system, since guidelines for testing require, or at least favour, preferential testing of persons with an accumulation of risk factors, like specific disease-related symptoms, or a stay in or visit to an endemic region.
accordingly, active cases are overrepresented in a pool nt that is too small to be representative for the p in the general population np. furthermore, some countries or regions may have limited resources as a pandemic proceeds and will adjust their testing guidelines to detect positive cases with as few tests nt as possible. to control for this distortion by selection bias, which enriches positive cases in the test pool nt, we need to adjust the overestimated prevalence p with the unknown factor f to turn the cfr'n-0 into the ifrn-0: $\mathrm{ifr}_{n-0} = \mathrm{cfr}'_{n-0} \cdot f$. in the cfr formula the calculated cfr is determined rather too high and does not represent an ifr, because of just the same distortion factor f. since testers selectively draw diseased persons into the pool of persons tested nt, they also increase the risk of death relative to the risk that would be representative for the population of all infected persons, which should be reflected by our ifr. likewise, we need to correct cfrn-0 by the same factor f. in this case, ifrn-0 can be calculated as: $\mathrm{ifr}_{n-0} = \mathrm{cfr}_{n-0} / f$. at this point, i suggest that the reader jump to table in the results section to understand better why cfr is divided by f and cfr' multiplied by f. typical potential distortion factors, like the gross domestic product of a country, but first and foremost the age composition of the population, are inversely correlated with cfr or cfr'. to adjust these two equations for all the factors that act in divergent ways on the figures they yield, we can equate them, $\mathrm{cfr}_{n-0} / f = \mathrm{cfr}'_{n-0} \cdot f$, and solve for the common distortion factor f to estimate an ifrn-0. solving for f: $f = \sqrt{\mathrm{cfr}_{n-0} / \mathrm{cfr}'_{n-0}} = \sqrt{\mathrm{np} / \mathrm{nt}}$. accordingly, a cfr adjusted with the factor $f = \sqrt{\mathrm{np} / \mathrm{nt}}$ resembles an ifr, which can be calculated by either of the two ifr equations, with both delivering the same outcome. to compare estimates of the ifr i analysed the data on cases, deaths and recoveries published in real time and once daily after correction by jh.
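the relationship between the two case fatality figures and the correction factor can be checked numerically. the following is a minimal sketch, not the author's implementation: the function and variable names are hypothetical, the input counts are invented for illustration, and the formulas are assumptions reconstructed from the definitions in the text (cfr as deaths in the period over non-recovered test positives, cfr' as deaths over the test-pool prevalence scaled to the population, f as the square root of np divided by nt).

```python
import math


def adjusted_ifr(tp0, rp0, dp0, dpn, nt, np_total):
    """Return (cfr, cfr_prime, f, ifr) for one country and one period d0-n.

    tp0, rp0, dp0: cumulative test positives, recovered and deceased at d0
    dpn:           cumulative deceased at dn
    nt:            cumulative number of tests conducted at d0
    np_total:      total population of the country or region
    """
    active = tp0 - rp0                   # non-recovered test positives at d0
    deaths = dpn - dp0                   # deaths registered within the period
    cfr = deaths / active                # expected to overestimate the ifr
    p = active / nt                      # prevalence within the test pool
    cfr_prime = deaths / (p * np_total)  # expected to underestimate the ifr
    f = math.sqrt(np_total / nt)         # common distortion factor
    ifr = cfr / f                        # same value as cfr_prime * f
    return cfr, cfr_prime, f, ifr


# invented example values, for illustration only
cfr, cfr_prime, f, ifr = adjusted_ifr(
    tp0=1_000, rp0=100, dp0=5, dpn=23, nt=40_000, np_total=4_000_000
)
```

with these invented inputs f equals 10, so the cfr of 2 % and the cfr' of 0.02 % both map to the same ifr of 0.2 %, which illustrates why dividing cfr by f and multiplying cfr' by f deliver the same outcome.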
data between the th and nd of march were combined and validated with the respective data from the ecdc . for the numbers of death i used the data for the training data set in its final corrected version from jh for the end of the th of march, for the enlarged final dataset the st of march, and for the data set for the validation with the epidemiolocal project in iceland the version th of april . for the calculation of the epidemiologically derived ifrdecode we did not need to apply the correction factor f, because the prevalence p in the population for revealing a positive test has been determined experimentally as pdecode which was used to determine the ifr for the general population of iceland with the formula: this formula is not relying anymore on cases reported in the official databases of jh or ecdc and it served as a cross-validation figure for the ifr and the cfrs, which are solely based on these data and the population data of iceland in the validation part of the results section. for the final analysis in the training and the final dataset only countries were included which reported at least one death by the nd of march in the ecdc and the jh database. to avoid undue variance by too few case numbers only data points from countries were included with more than a sum of cases, and at least two official reports of the numbers of tests performed in the period th of march till nd of march. data on test frequencies were originally obtained from the open-source "our world in data" . population data of nations were imported from world bank and data on gross domestic products and age cohort compositions from the un in their version for to be comparable with the population data from or as age cohort estimates for . for countries, sufficient data were controlled, corrected and updated with information from the official data source pages of the national health agencies as listed in supplemental table . two more datasets were obtained for a validation study. 
first, data from a testing cohort of the normal population of iceland led by the genetic company decode . the project is planned as a clinical project with ethical permit and so far, there is to the best of my knowledge no other data available on representative study cohorts from populations without being pre-selected for pathological symptoms. second, data mining for a final enlarged dataset was done on the pages of the official national health agencies, wikipedia, and within the data mining community on github using archived webpages if necessary in order to enable a large-scale cross-country assessment and comparison of ifr-values. in subset-analysis on multiple entries for the log-normalized cumulative testing data, there was a pearson's correlation coefficient between data entries across the three different data sources > . (data not shown). most of the data nested into groups showed signs of unequal variance by barlett and levene testing. therefore, they were log-normalized and in case data included an offset of . was added before log-normalization, taking care that this did not distort the distribution of data, as analysed by shapiro-wilk-testing. normal distributed data with equal variance across groups were then compared using one-way anova f-testing. global significance was followed up by all-pairs tukey-kramer testing as post-hoc test. for reporting, data were de-normalized adding the offset where necessary and reported as means with -% cis, if not specified otherwise. if signs of non-normal distribution or unequal variance prevailed, a wilcoxon-test on rank sums for group comparisons, or a spearman's rank correlation coefficient was reported. for the testing data set descriptive median values and their interquartile -ranges were reported. the combined datasets from ecdc and jh contained information on deaths, cases, and in the jh dataset information on recoveries. for ecdc , data entries from countries starting on the st of december . 
and for jh , entries from the same countries starting on st of january were combined. cumulative cases and deaths were significantly correlated between both data sets (pearson's r > . , p < . , for both, data not shown). no data on test frequencies were reported in the official international data repositories. from the platform "our world in data" different data entries for countries were retrieved. after combination with the official data on test figures, fourteen countries fulfilled inclusion criteria. testing data for these countries were controlled by visiting the official test report pages of the fourteen countries (supplemental table ), which enabled adding another data points. the calculation of the classic cfr by dividing deaths by cases revealed a large range for the respective medians of the countries and cfr calculated for the period from the nd to the th of march surpassed % for countries (fig. ) . excess mortality was present throughout most points in time for italy, uk, france, the philippines, and canada, except for one data point, which was related to a period from case to death of only one week. excess mortality or mortality too close to the point, or even with the point of death is a bias, which will not be corrected by the factor f and will inevitably lead to an overestimation of cfr with both formulas ( ) and ( ). cfr values above % are theoretically impossible, while cfr values over % are at least highly unrealistic. therefore, i excluded these data points with a cfr > from further analysis. noteworthy, this led to the removal of all data points from italy (classic cfr = . ,) france ( . ), uk ( . ) and philippines ( . ). noteworthy, selection did not exclude japan with a classic cfr of . ( table ). . values > % were excluded from further analysis as explained in the results section. upper right shows the classic cfr calculated as total deaths at one day divided by active cases at same day for the remaining countries. 
the classic cfr is the higher, the more recent data were assessed (legend for all upper parts). ifr was calculated for the remaining countries and is shown at lower left to vary also depending on period. while the ifr increases with period, this increase declines significantly with increasing period as shown in the lower right. in comparison with the cfr (blues curve) with % ci on splined data with moderate , the classic cfr red curve, the ifr green curve shows higher values in countries that have the who status ( . . ) "in local transmission". values reached those of the countries either the status "outbreak" after the occurrence of the first reported death. though it is plausible that conducting more tests per day, can contribute to artificially increasing both types of cfrs, countries may also respond with increasing their test numbers, once they notice increases in the test positive ratio as a sign for focusing testing too much on the more morbid part of the population (table , last line). in table in the following we will use f as an adjustment factor for determining the ifr − of sars-cov- and comparing the obtained values with the three different cfrs (tab. ). the median ifr values of the nine remaining countries lie in a close margin between . for south korea and . for denmark, while classic cfr and cfr values still show high variance for the remaining countries. especially for south korea and japan as two comparable countries that are in a phase of stagnation the cfr values calculated are roughly - times higher and still rising and they are without correction by f not in the margin of the current expert consensus for a cfr for covid- (table ) . table shows the medians and quartiles of the three different cfr values and the ifr in contrast to the classic cfr, cfr and cfr' the ifr values, including japan, were in the lower range of expert consensus. 
like the classic cfrs, the ifr can depend on the length of the period between the cases included and the deaths they are related to (figure , lower part). correlating the day-to-day percent increases of all ifrs computed for the countries shown in table with the length of the period in days showed a significant negative trend (pearson's r = . ; n = ; p = . ). therefore, dependence on the period between cases and deaths seemed to become more moderate over time and was rather related to the state of the pandemic categorized as "in local transmission" or "outbreak" (fig. , lower right). as can be observed from the curve of the classic cfr (ccfr), data which rely on cases assessed before the first death related to the outbreak was registered tended to be either far too low (ccfr) or too high (cfr and ifr). therefore, for the following validation of data from iceland and the evaluation of the final validation dataset, care was taken to include only data after the first reported death, which ideally also reflected at least a period of days for calculation of the ifrs. for validation of my procedure, i analyzed data from two different testing cohorts in iceland. up to the last data entry for the th of april, . % of the general population in the representative cohort had been tested for sars-cov- by the genetics company decode (decode). additionally, a second, rather typical test cohort of persons with increased risk of infection, representing . % of the total population, had been tested by the directorate of health in iceland via the laboratory of the national university hospital iceland (nuhi). only data following the first death on the th of march and allowing for at least a day period of the ifr were included. prevalence of sars-cov- positive tested persons was . fold-lower (ci: . - . ) in the decode collective, which was highly significantly different from the correction factor f at . (ci: . - . , f = , df = , p = . ; figure ).
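the comparison above between the prevalence ratio of the two icelandic test pools and the correction factor f can be sketched as follows. this is an illustration under stated assumptions, not the author's code: all names are hypothetical, all numbers are invented and are not the icelandic data, and the survey-based fatality formula (deaths in the period divided by the screening prevalence times the population) is an assumption, since the displayed formula did not survive in the text.

```python
import math


def prevalence(positives, tests):
    """Share of positive tests in a test pool."""
    return positives / tests


def p_quotient(pos_targeted, tests_targeted, pos_screening, tests_screening):
    """Prevalence in the targeted pool divided by prevalence in the screening pool."""
    targeted = prevalence(pos_targeted, tests_targeted)
    screening = prevalence(pos_screening, tests_screening)
    return targeted / screening


def ifr_from_screening(deaths_in_period, p_screening, population):
    """Fatality estimate based on a prevalence measured in a representative cohort.

    No correction factor f is applied here, because the prevalence is assumed
    to be free of the selection bias of a symptom-driven test pool.
    """
    return deaths_in_period / (p_screening * population)


# invented numbers, for illustration only
population = 400_000
total_tests = 25_000
q = p_quotient(800, 10_000, 100, 10_000)  # 8 % versus 1 % positives
f = math.sqrt(population / total_tests)   # correction factor from test volume
ifr_screen = ifr_from_screening(8, prevalence(100, 10_000), population)
```

in this invented case the prevalence ratio is 8 while f is 4, so the two quantities need not coincide, in line with the difference reported in the text.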
the upper panel shows four different estimates for mortality of the population of iceland fitted by a spline function with moderate and %-cis shaded. the ifrdecode is the figure derived from testing the general population of iceland and served to cross validate the mortality figures cfr and classic cfr that have been calculated from the data repositories of jh and the ifr that used this repository in conjunction with the test data published by iceland's department of public health. the lower left shows different mortality figures calculated compared to the data ifrdecode that is epidemiologically derived and calculated by formula ( ) the data still seemed to rise with an increase in the period of the ifr as indicated by the change in colour with increasing period from red to blue. the lower left shows the comparison of the distortion factor f compared with the p-quotient, which is the quotient for the prevalence of a positive test results in the test pool of the health officials of iceland compared with the general population. for the decode collective an epidemiologically derived prevalence of being tested positive can be calculated according to formula ( ) for general population, which served to calculate an ifrdecode according to formula ( ) representative for the general population and independent from the cases reported by nuhi or in the ju database. the group comparison for ifrdecode with other calculated fatality rates showed a significant global group difference (n = , df = , f = . , p = . ), which was followed by all-pairs tukey kramer post hoc testing (fig. ) . the ifrdecode with . (ci . - . ) did neither differ from the ifr calculated from ju data (ifrju) . (ci: . - . ), nor the one calculated from nuhi data ifrnuhi = . (ci: . - . ). while the classic cfr (ccfr) . ( . - . ; p = . ), the cfrju . (ci . - . , p = . ), and the cfrnuhi . (ci: . - . , p = . ) tended to overestimate fatality of infection roughly . -fold up to -fold. 
this margin of overestimation is relatively low, when compared to other observed fold-differences between the cfr and the ifr in the training dataset described in table . in order to validate the concept of the ifr on a larger and more heterogenous collection of countries, from more continents than europe and asia, and in order to compare the estimates with expert consensus and conventional cfrs, a more comprehensive data set was composed. this was achieved by connecting the data from jh and ecdc up to the st of march with the data on test numbers conducted as retrieved from the following internet sources (suppl. table ): data found on wikipedia, in the covid- tracking project on github, the cross validated data on "our world in data" (owid), and non-validated data from owid relying on press releases for instance, but reporting its sources rigorously . double entries in different data sources were cross validated. while cross-validation indicated a high data reliability (r > . , p = . ), highly significantly more data that did not pass quality control of staying below a cut-off for the cfr of % had been retrieved from unofficial data sources (data not shown). this cannot be taken as a sign of higher inaccuracy of unofficial data per se, since the following difficulties were encountered by controlling the data on testing frequencies. though referenced correctly, the data in unofficial sources for one country were sometimes referring to different starting points of cumulative assessment. after finding, visiting and translating the original reports page from japanese authorities (supplemental table ) data point entries could be increased from two to , but it became also evident that there were data reporting cumulative test figures starting th of march and data starting from the very beginning of case and death reporting. 
there is the question whether to exclude or include the cases from an international cruise ship that was under quarantine in a japanese harbor, since most of the ship's passengers were not japanese. this is relevant for the early reporting in japan, while, fortunately, the number of tests done on those individuals is rather negligible for later points with higher cumulative total test figures, which are analyzed in the following. some other countries did not report cumulative figures, but solely daily or weekly reports of their testing figures and only in their national language, which could sometimes unintentionally be misinterpreted as cumulative totals when figures were high and rapidly increasing, as for instance in germany. semiofficial resources started to report first estimates on testing figures more than a week before official sources in germany. these figures were too low and have been corrected by official resources, but the official data available up to now are still incomplete, which is only revealed if one translates, reads and understands the complete report in german (supplemental table ). moreover, reporting by unofficial sources can sometimes be more precise than official data, but points towards a new field of uncertainty. for the us, the unofficial data-tracking project on github published the data differentiated into reported positive, negative and pending tests. the "test pending" category could be very relevant. a set of countries emerged (table ). by data mining, i was able to retrieve the following data on cumulative test numbers. data points came from the national official reporting organ of the countries, data points were originally retrieved via owid, but controlled, and then updated with official national data, data points came from owid, for the usa from the tracking project, where i relied on the confirmed test numbers, excluding the pending ones, to avoid partial doubling of data.
finally, data points were retrieved from wikipedia on the covid- pandemic information pages, where also pdfs and links to the national sources of data are published. these data points belonged originally to countries, but only fulfilled inclusion criteria. these countries are listed with their ifrs in table , which also provides means and %-cis for the means of japan and korea are both at the end of a consolidation phase and did neither show values for the classic cfr nor for cfr in line with expert estimates, but their ifr estimates are again in line with that of all other countries (fig. ) . the ifr is with . - . a bit lower than the current expert consensus, but the margin reflected ( fig. and table ) is the narrowest for all countries. the median for all countries for the ifrs was . ( %-ci: . - . ) and significantly different from cfr with . ( %-ci: . - . , p = . ) and the classic cfr of . ( %-ci: . - . , p = . . ; figure upper left). an epidemiologically derived ifr of . % ( %-ci: . %- . %) was determined for iceland and was very close to the calculated ifr of . % ( %-ci: . - . ), but highly significantly . - fold lower than cfrs. ifrs, but not cfrs, were positively associated with the medians of the countries ifrs were significantly positively associated in men with the proportion of elderly people in the respective countries age cohorts > years (r = . , p = . ), > years (r = . , p = . ), > years (r = . , p = . ), while only significantly associated with the age cohort > years (r = . , p = . ) in females (figure lower part). table : presented are the ifrs for the countries with their means and %-cis in the validation data set. in the lines in bold, the ifrs are nested into three groups according to progressive increase in time period, ranging from below days over - days to weeks and more. in the two lower lines the estimates for the cfr and the classic cfr are shown. 
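the association between country-level ifrs and the share of elderly people reported above is a rank-based one (the abstract names spearman's ρ). a minimal sketch of such a rank correlation is given below. it is an illustration only: the helper names are hypothetical, the two data series are invented and are not values from the tables, and no p-value is computed.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# invented example: share of elderly (%) and median ifr (%) for five countries
share_elderly = [4.0, 9.5, 14.0, 17.5, 21.0]
median_ifr = [0.02, 0.10, 0.08, 0.15, 0.30]
rho = spearman(share_elderly, median_ifr)
```

because only the ordering of the values enters the calculation, a single country with an extreme ifr influences ρ no more than any other country, which suits a small and heterogeneous set of countries.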
during the outbreak of a pandemic, it is difficult to estimate and then communicate a cfr precisely enough from epidemiological data, while situations where many countries may run out of health resources nevertheless require guidance and recommendations by experts. in the current situation of the sars-cov- outbreak, the general public is confronted with large differences in the estimation of conventional cfrs between countries like italy (> %), south korea (> %) or germany ( . %). experts' estimates for the cfr range from . up to , which reflects a broad range of pandemic scenarios and a broad range of possible mitigation and suppression strategies that experts could derive. in this situation, modelling of scenarios is applied, relying on key parameters of epidemic spread. the most crucial imputation for such models is the basic reproductive number r0, which can be assessed from data on the early outbreak of a pandemic, but is compromised by a significant level of uncertainty on top of any epidemiologically derived level of statistical confidence, due to essential uncertainties about how such data can be transferred from one scenario to another. to put this in simple terms, a cruise ship's field conditions during a quarantine are not comparable to an unanticipated new pandemic outbreak in china, which in turn is not comparable to europe. applying modelling now requires even more imputation on the latent period of the infectious period and the interval of half-maximum infectiousness. these values could again all be derived from environmental observations, and all are prone to a substantially unknown extent of error, especially if assessed for a new pathogen for the very first time. in principle, imputation values can be subject to modelling themselves. however, modelled values used as an assumption for imputation into the next model will not avoid further inflating the level of uncertainty.
what comes out in the end is the most precise we can get with modelling: we will end up with a range of scenarios from the spanish influenza down to the seasonal flu and with a range of mitigation or suppression strategies that will all be supportable in principle. improving modelling substantially would now require narrowing the range of fatality down to a margin at which modelling makes sense. a very recent publication describes a way in which we could achieve this goal. in this publication, a high number of imputations again had to be fed into a model, field conditions which are not comparable with each other again had to be chosen, and basic assumptions had to be made to model an estimate for the ifr. while this estimate may indeed be more precise than former estimates, the basic problem of requiring a lot of proper field work and a lot of basic assumptions that are as precise as possible, to avoid excessive error propagation of unknown extent, had not been solved or dealt with. the ifr was adjusted for census and for a problem which at first glance my ifr is not capable of coping with, namely ascertainment aspects. however, a more morbid person who ends up preferentially in a test pool during the outbreak of a pandemic may not only be a more elderly person or a person having better access to testing. we are simply not aware of the many factors that all may contribute to preferentially testing certain people in the heat of a pandemic. we are trusting in assumptions derived a posteriori and confirm them with a model. here i propose a way to adjust for this one particular problem in infection epidemiology, the preferential selection of persons that will show up in a test pool, provided there was equal access to the test pool and enough testing capacity. i deduced that my approach only required sequentially monitored confirmed cases, recovered cases, and death events in conjunction with total numbers of diagnostic tests performed in a given population.
these data, except the total number of tests conducted, are already subject to official reporting and data collection by national and international centres of disease control. i showed that this approach successfully stabilized against selection bias, with a validation against field data in iceland and by comparing cfrs and calculated ifrs for plausibility and for their ability to reflect an association with census not only within countries, but also across countries. the latter is important, if we assume that biological aspects of a virus are valid across the boundaries of nations. this approach required deducing a correction for selection bias, here termed f, and validating the effect of applying this correction factor to empirical data, to arrive at preliminary estimates for countries of the world with % confidence intervals for ifrs. it is a preliminary estimate with all its shortcomings, but it is a single, potentially relevant variable for crude mortality, logically deduced, requiring only data imputations that potentially could be delivered with high certainty during future pandemics, with low effort, and at low cost. more crucially, it does not require exponential modelling or substantial expert knowledge to arrive at a readout for crude mortality that appears to be robust between countries and appears to reflect viral biology. the correction factor f is simply the square root of the quotient of the total population divided by total tests conducted. even if countries would either go through periods of rapid test rate growths or experience limitations with their testing capacities, the distortion provoked will not lead to huge uncertainty ranges by a substantial unknown error propagation. correcting cfrs with f is capable of harmonizing differences in cfrs between countries that would otherwise be difficult to explain. 
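as stated above, f is the square root of the total population divided by the total tests conducted, so a change in test volume moves f less than proportionally. the short sketch below only illustrates this arithmetic; the helper name is hypothetical, the numbers are invented, and it is not meant as evidence for the robustness claimed in the text.

```python
import math


def correction_factor(population, tests):
    """f as the square root of total population over total tests conducted."""
    return math.sqrt(population / tests)


# invented example: a country of ten million doubling its cumulative tests
population = 10_000_000
f_before = correction_factor(population, 100_000)
f_after = correction_factor(population, 200_000)
relative_change = f_after / f_before  # equals 1 / sqrt(2)
```

doubling the cumulative number of tests lowers f by a factor of 1/√2, that is by about 29 %, rather than halving it.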
amongst these candidate countries are japan, south korea, iceland, and norway, which have done meticulous work in dealing with their testing data, documenting everything transparently and in a timely manner for the public, and which moreover have strong economies and strong health care systems to cope with the current pandemic. amongst those that report their testing data almost in real time and comprehensively is pakistan. pakistan is a country which seems to fall out of the range of ifrs, with an ifr of . that is roughly -fold lower than the one reported for the so-called developed countries. since testing was reported transparently and in a timely manner, it is important to understand whether this extremely low ifr figure reported in table could possibly be realistic or not. a comparison of the population statistics of this country with those of any of the developed countries is very informative in this regard. as of , . % of the population in pakistan were over years old and % were younger than . in germany % were younger than years old and % were older than years. the cfr for people aged - compared to people aged below has been published to be roughly -fold higher. even though this figure is most likely too high, because the cfr is prone to be inflated by selecting a more morbid population into the testing pool, there is agreement amongst scientists that sars-cov-2 mortality at least differs strongly within countries or regions, with higher values for older people. by using the correction factor f it is now possible to show a significant association of a sars-cov-2 related mortality figure with age composition not only within, but also between countries. a crucial problem with testing data persists at the present level of accuracy of official reporting. even for countries like the us, italy and the uk, with very timely and detailed test reporting, a calculation of an ifr could not have delivered any more meaningful outcome than a cfr calculation or, essentially, guessing.
in the heat of an ongoing pandemic, it is often still possible to report a death almost in real time. at the same time, cases will be prone to unreported delays once testing reaches maximum capacity. during exponential growth, this will cause a severe distortion if a cfr or ifr is calculated. this might happen just because we think that our cases are reported with the day of testing, whereas in fact the case may appear as a reported case many days after the death of the person, leading to a cfr of sometimes more than , % (for an example, see uk, figure ). conversely, germany had presumably been able to expand its testing volume (pending data revision as of april th , suppl. table for reference) by a factor of . from week to week of the current year, which inflated the case number. such bias will limit not only the validity of the cfr, but also the validity of the ifr calculation. however, in contrast to the problems of unknown error propagation in modeling approaches, such a limitation could in principle be dealt with if a similar pandemic outbreak were to occur again with sufficiently high testing capacity in a sufficiently large, well-informed and health-educated cohort. the numbers generated here for the ifrs need to be critically taken into consideration by abductive reasoning in an interdisciplinary committee of experts. they are not standalone figures for mortality, since they mainly reduce one particular sort of bias amongst the many in empirical work in the field of infection epidemiology. with the decisions of almost all developed nations to follow certain mitigation or suppression strategies, there will be problems to be solved around the globe. there is one last question, which i will not discuss here. provided that there was a defined place with more than .
people, and provided that at that place everybody knew what covid-19 is, would not miss a single death, would take care of people coughing, and provided that there was enough testing capacity at hand: could it be that my formula ( ) to correct for selecting more morbid persons into a test pool will also correct for all other sorts of selection bias, therefore delivering an ultra-precise early estimate for mortality?
reassessing the global mortality burden of the influenza pandemic
global mortality associated with seasonal influenza epidemics: new burden estimates and predictors from the glamor project
covid-19 - navigating the uncharted
communicating the risk of death from novel coronavirus disease (covid-19)
case fatality ratio of pandemic influenza. the lancet. infectious diseases
potential biases in estimating absolute and relative case-fatality risks during outbreaks
estimating the asymptomatic proportion of coronavirus disease (covid-19) cases on board the diamond princess cruise ship
early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
case-fatality rate and characteristics of patients dying in relation to covid-19 in italy
quantifying sars-cov-2 transmission suggests epidemic control with digital contact tracing
approaches to uncertainty in exposure assessment in environmental epidemiology. annual review of public health
current anti-doping crisis: the limits of medical evidence employing inductive statistical inference. sports medicine (auckland
estimates of the severity of coronavirus disease : a model-based analysis. the lancet. infectious diseases
data on the two different populations in iceland and description
an interactive web-based dashboard to track covid-19 in real time
data provided for the covid-19 infection by johns hopkins university via
the united nations probabilistic population projections: an introduction to demographic forecasting with uncertainty
incubation period and other epidemiological characteristics of novel coronavirus infections with right truncation: a statistical analysis of publicly available case data
none to be declared. all data used for this study are freely available and accessible via the given references. composed datasets are available upon request. no funding supported the work presented. i would like to thank the health authorities of iceland and decode genetics for making their data freely available to the public. many thanks to the open data source community in its broadest sense. there are massive amounts of work going into the curation of pages like the one we can now find on wikipedia about testing. i thank my colleagues, dear friends, and family members, who critically read the manuscript.
key: cord- -huyl vz authors: shagam, lev title: untangling factors associated with country-specific covid-19 incidence, mortality and case fatality rates during the first quarter of date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: huyl vz
at the early stages of the covid-19 pandemic which we are experiencing, the publicly reported incidence, mortality and case fatality rates (cfr) vary significantly between countries. here we aim to untangle factors that are associated with the differences during the first quarter of the year . the number of performed covid-19 tests has a strong correlation with country-specific incidence (p < × - ) and mortality rate (p = . × - ). using multivariate linear regression we show that incidence and mortality rates correlate significantly with gdp per capita (p = . × - and . × - , respectively), country-specific duration of the outbreak ( .
× - and . ), fraction of citizens over years old (p = . and . × - ) and level of press freedom (p = . and . ), which cumulatively explain % of the variability of incidence and more than % of the variability of mortality of the disease during the period analyzed. country hemisphere demonstrated significant correlation only with mortality (p = . and . ), whereas population density (p = . and p = . ) and latitude (p = . and . ) did not reach significance in our model. case fatality rate is shown to rise as the outbreak progresses (p = . ). we rank countries by covid-19 mortality corrected for incidence and the factors that were shown to affect it, and by cfr corrected for outbreak duration, yielding very similar results. among the countries where the outbreak started after the th of february and with at least registered patients during the period analyzed, the lowest corrected cfrs are seen in israel, south africa and chile. the ranking results should be considered with caution as they do not consider all confounding factors or data reporting biases.
the coronavirus disease (covid-19) that emerged in china in late and is caused by the sars-cov-2 virus from the coronaviridae family (andersen et al. ) has led to almost thousand registered deaths during the first quarter of alone and is having a huge impact on the world economy. examination of factors influencing covid-19 epidemiological parameters and comparison of the latter across countries is of crucial importance during and following the pandemic. here we focus on covid-19 incidence, mortality and case fatality rate (cfr) and examine their correlations with selected variables characterizing demographic, healthcare, financial, geographical and political aspects of the countries. specifically, mortality and case fatality rate for covid-19 have previously been shown to be associated with age, comorbidities and sex (onder, rezza, and brusaferro ).
some studies suggest that country-specific cfr and incidence can depend on latitude (braiman ) and temperature (triplett ), respectively. apart from testing the abovementioned correlations in our study, here we also examine the following assumptions: (i) the number of laboratory tests for the infection could correlate with incidence, mortality or cfr, (ii) covid-19 incidence at the beginning of the pandemic could be associated with the countries' economic situation (for instance, being impacted by travel activity of the citizens) as well as with population density (potentially facilitating spread of the virus), (iii) finally, incidence, mortality and thus cfr could be misreported in countries with limited press freedom. all three abovementioned epidemiological parameters vary substantially across countries during the period of interest. comparison of the countries' healthcare systems with regard to cfr could shed light on best practices of pandemic management (khafaie and rahim ). it is widely discussed, however, that reported cfr could be biased due to a number of factors. countries might use different death reporting policies. hospitalization and laboratory testing capacities can also vary both across and within countries (onder, rezza, and brusaferro ), which would create inaccuracies within the data that are very difficult to account for. also, the cfr, which can only be computed using censored data during the course of an outbreak, depends significantly on the way it is calculated. an assessment based on the total number of cases underestimates it, whereas a calculation based on closed cases significantly overestimates it (spychalski, błażyńska-spychalska, and kobiela ). according to who, the time between symptom onset and death ranges from about to weeks (who, ); however, dividing the number of deaths by the total number of cases weeks before (baud et al.
) turned out to be biased too, especially during the first days of the pandemic (spychalski, błażyńska-spychalska, and kobiela ). here we show that country-specific cfr depends on the number of days that have passed since the outbreak start. for comparison of the cfr across countries we correct the fatality rate for the duration of the epidemic. we do not consider potentially different virulence that may arise from mutations of the virus, or differences in health profile and inherited susceptibility across populations, which could also affect all the epidemiological parameters being compared.
data on the number of confirmed cases of coronavirus as well as the number of deaths per country were obtained as of april , (dong, du, and gardner ). data on press freedom ranking across the globe for the year were obtained from (reporters without borders, ). population, area, gdp per capita and percentage of population over the age of were taken from (world bank open data, ). information on the number of performed covid-19 tests per country was obtained from (our world in data, ) (when the number was not known for april st, it was extrapolated using the information for adjacent dates). we collected and used the following country-specific covariates for analysis: daily records of the number of covid-19 cases as well as deaths, country centroid latitude, population size, area, press freedom ranking measure, number of tests performed and fraction of people over years old. calculations were performed in r version . . using packages stats, tibble and data.table; visualization was done using package ggplot2. covid-19 incidence was calculated as the ratio of the total number of detected cases by a certain date to the total population of the country. mortality was calculated as the ratio of the total number of covid-19-related deaths by a certain date to the total population of the country.
for some calculations and plots (as indicated below), incidence and mortality rate were subjected to base-ten logarithmic transformation. case fatality rates were calculated as country-specific ratios of total deaths to total number of diagnosed cases for all days of monitoring till the date of interest (typically april, ) and then subjected to base-ten logarithmic transformation. the p-value significance cut-off across all tests was set at . . univariate regression was preferred over multivariate whenever the number of countries with data available was less than - . we analyzed data for a total of countries or territories (overseas territories of france, the united kingdom and the netherlands were considered separately in our analysis) with at least one patient reported. in order to increase the accuracy of the case fatality analysis, we considered only countries/territories with at least registered covid-19 patients, which was a total of for this specific part of the analysis. countries known to perform extensive testing of the healthy population for covid-19 during the period of interest (iceland) were excluded from the analysis. for each country, we calculated the time from the moment when the number of registered cases of covid-19 was at least till the st of april, (also referred to as duration of the outbreak/epidemic below). as shown in fig. , the distribution of the epidemic duration times divides countries into two groups: with duration of ≤ and of ≥ days. for reasons of uniformity, the group of countries with outbreak start date before th of february (see fig. a ) was excluded from further statistical analysis.
the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. this version posted april , . all rights reserved. no reuse allowed without permission.
figure . number of days from the first registered cases of covid-19 till april st, across countries; a , all countries with at least reported cases; b , all countries with at least cases as of april st, . based on the observed bimodal distributions of epidemic durations, countries can be divided into groups with sizes of and (a); and countries each (b).
number of performed tests correlates with gdp per capita.
the number of performed tests is an important factor that could both determine the reported number of cases or mortality and be influenced by the two indicators via actions of medical authorities and testing facilities. however, testing activity is so far reported by only a limited number of countries. in a multivariate linear regression model, we checked its correlation (log-transformed) with log-transformed gdp per capita, latitude, hemisphere (northern or southern), press freedom ranking measure, population density, log-transformed fraction of people over the age of and outbreak duration (n = ). the number of tests turned out to correlate strongly with gdp per capita (p = . × - ), whereas the other variables did not influence it (total adjusted r² = . ). we examined the dependence of covid-19 incidence and mortality for the period of january-march, on some country-specific variables that could potentially influence them. to avoid multicollinearity as well as the significant sample size reduction mentioned above, this was done using two linear regressions: (i) the model of the log-transformed dependent variable explained by country log-transformed gdp per capita, latitude, hemisphere (northern or southern), press freedom ranking measure, population density, log-transformed fraction of people
over the age of and outbreak duration (adjusted r² = . for incidence and adjusted r² = . for mortality, n = and ; countries with no registered covid-19 cases or deaths were excluded from the corresponding analyses). results are summarized in table (columns and ) and fig. and include, among others, a correlation of all three epidemiological parameters examined with duration of the outbreak; (ii) the model of the log-transformed dependent variable corrected for outbreak duration explained by the log-transformed number of tests (adjusted r² = . for incidence, n = , and adjusted r² = . for mortality, n = ; results are summarized in table , columns and ). gdp per capita, fraction of senior citizens, outbreak duration and press freedom rank are correlated with both incidence and mortality. the hemisphere of the country centroid is associated with mortality; however, its association with incidence is not significant. the association of population density and of country centroid latitude with either incidence or mortality did not reach significance in our analysis when other explanatory variables were considered. the number of tests is strongly correlated with both incidence and mortality. given the correlation of gdp per capita with the number of performed tests, as well as a positive feedback loop of increasing testing capacity as the number of patients increases, the true causality between these three variables is difficult to dissect. however, we further aimed to shed light on it by understanding whether the influence of gdp per capita on the reported incidence and mortality is mediated solely by the number of tests performed or, vice versa, the observed influence of the number of tests is a byproduct of its correlation with gdp. we explore this by comparing the additional variance explained by each of the components. the model of incidence explained by gdp per capita and country-specific epidemic duration explains . % of variation (adjusted r²). with the number of tests added, the model explains . % of variation (p = . × - ; . ; .
× - for gdp, epidemic duration and number of tests, respectively), whereas the model of incidence by number of tests and epidemic duration alone explains . % of variation (n = ). at the same time, when mortality is modeled by the three variables, the influence of the number of tests is no longer significant (p = . ) compared to gdp per capita (p = . ) and time since the first ten patients were reported (p = . ). together these three variables explain . % (adjusted r²) of variation, which even increases to . % when the number of tests is not considered, compared to a decrease by . % when gdp is eliminated (adjusted r² = . %). from this we conclude that, although they are strongly correlated, both gdp per capita and the number of performed tests contribute to the reported incidence; however, the reported covid-19 mortality across countries is not significantly influenced by the number of tests performed per se.
table . covid-19 epidemiological parameters' correlation with country-specific explanatory variables for the countries with outbreak start after the th of february, : regression coefficients (for significant correlations only) and p-values.
table . covid-19 epidemiological parameters' correlation with number of performed tests for the countries with outbreak start after the th of february, : regression coefficients and p-values. incidence, mortality and cfr were corrected for outbreak duration.
fig. .
country-specific covid-19 incidence (rows and ) and mortality rate (rows and ) depending on explanatory variables: outbreak duration ( a,d ), number of tests performed ( b,e ), gdp per capita ( c,f ), press freedom ranking measure (countries with limited press freedom have higher rank; g,j ), fraction of senior citizens ( h,k ) and position of the country's centroid ( i,l ). each dot corresponds to a country (only countries with outbreak start after th of february are plotted). the linear approximation in the corresponding univariate model (red) as well as the % confidence interval (grey) are shown.
all in all, the determined country-specific covariates which are correlated with covid-19 mortality (gdp, press freedom, age structure of the population, time since the outbreak start, hemisphere) as well as covid-19 incidence explain as much as about % of the variance of mortality (adjusted r² = . ) for the countries with a sufficient number of cases and epidemic start after the th of february. in fig. we rank the countries by mortality, original (a) as well as corrected for the abovementioned factors and incidence (b). the correction significantly alters the order of the countries.
fig. . covid-19 mortality and case fatality rate across countries as of april st, (outbreak start after th of february; at least registered cases by the st of april), n= ; a , deaths per million people; color of the bars corresponds to relative position of the country in (b); b , log-transformed mortality rates corrected for covid-19 incidence, gdp, press freedom, fraction of senior citizens, hemisphere and country-specific duration of the outbreak; c , case fatality rate (measured as a ratio of total deaths to total number of cases, as of april st, ), color of the bars corresponds to relative position of the country in (d); d , log-transformed cfrs corrected for country-specific duration of the outbreak.
correlation of case fatality rate with country-specific covariates.
to reduce the noise that is most significant at the initial stages of the outbreak, cfr has been evaluated only for the countries with at least registered covid-19 cases during the first months of the year . similarly to the analysis of incidence and mortality, we worked with countries characterized by an outbreak duration of no more than days (n = for the countries fulfilling the two requirements, see fig. b ). for out of them, the number of performed tests was available. we checked the correlation of all of the explanatory variables (see table , column ) with case fatality rate using univariate linear regressions. only outbreak duration demonstrated significant correlation with cfr (p = . ). for further analysis, the log-transformed case fatality rate was corrected for the outbreak duration using a linear model. the case fatality rates across countries are summarized in fig. c (original) and d (corrected). the total case fatality varies hugely, from . - . % in the top countries with the lowest cfr to . - . % in the top countries with the highest. we expected that ranking the countries by the corrected cfr and ranking them by mortality corrected for incidence and all other country-specific variables correlated with it should give similar results characterizing the corrected case fatality rate across countries. indeed, scores of countries in these two rankings correlate well with each other (pearson r = . , p = . × - ). fig. illustrates the change of case fatality rate during the course of the pandemic for countries with the highest and lowest corrected cfrs. the plot shows that the reported log-transformed cfrs gradually grow as the outbreak progresses in each country. fig. .
covid-19 daily cumulative case fatality rate depending on country for the top countries with lowest and highest corrected values (outbreak start after th of february; at least registered cases by the st of april). the abscissa shows country-specific outbreak duration.
the parameters of the covid-19 pandemic analyzed herein are important for managing and modelling the ongoing outbreak. exploring them during the course of the outbreak faces certain limitations based on data availability as well as standardization problems, but it is crucial for a timely understanding of the spread and effect of the virus. being a likely issue for the current analysis, the anticipated data bias has also been addressed here as a phenomenon that we are aiming to untangle. here we show a number of correlations of the epidemiological parameters with variables characterizing the economic, demographic, healthcare and political organization of the countries (or overseas territories) analyzed. incidence, mortality and case fatality rates are, as expected, correlated with the duration of the covid-19 outbreak. the other correlations demonstrated here are for incidence and mortality of the infection (cfr was herein analyzed only for countries that fulfil the requirements on number of people affected and outbreak start date, which substantially decreases sensitivity compared to incidence and mortality, where sample sizes are much bigger). the correlation with the fraction of citizens over years old is concordant with the previous findings of case fatality rate dependence on patient age (onder, rezza, and brusaferro ). the observed correlation of gdp per capita with incidence and mortality requires further investigation of its origins. one of the contributing factors could be the higher mobility of the population in 'rich' countries, which led to quicker spread of the virus. another one, influencing the reported covid-19 incidence (but not significantly per se influencing mortality), is the testing activity.
the other interesting finding herein is the correlation of incidence as well as mortality with freedom of the press. this supports our hypothesis that there is likely to be a practice of manipulating numbers within the medical authorities of particular countries characterized by restricted press freedom, and thus the numbers should be handled with caution. we also examined the possible association of incidence and mortality with country centroid latitude as well as with the hemisphere it is located in. although the correlations are evident when the linear model is fitted with only one explanatory variable, multiple linear regression does not support the idea of an influence of latitude (p = . ) or hemisphere (p = . ) on incidence of the disease. in contrast, our analysis suggests that hemisphere might influence mortality (the latter seems to be higher in the northern hemisphere, p = . ) and thus does not contradict the idea that outdoor temperature could impact chances for patient recovery. finally, no significant association of any of the parameters analyzed with population density has been observed. due to the drastic variability in the reported covid-19 case fatality rates across countries (from . % in italy to . % in south africa as of january-march, ), the practice of ranking states by cfr is tempting, but the results should be interpreted with extreme caution due to different testing capacities, cause-of-death assignment practices and rates of undiagnosed covid-19 deaths. here we demonstrate that it also depends on the country-specific outbreak duration and suggest that it should be corrected for it. another approach to comparing the efficiencies of healthcare systems in the times of the epidemic that we implement here is the comparison of country-specific mortalities corrected for variables that are not directly linked to healthcare service quality but have been shown to correlate with mortality: population age, wealth, press freedom, hemisphere, outbreak duration as well as covid-19 incidence.
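the correction of cfr for outbreak duration can be sketched as follows. this is an illustration under stated assumptions, not the study's r code: it assumes that "corrected using a linear model" means taking the residuals of an ordinary least-squares fit of log10(cfr) on outbreak duration, and the function names and data layout are hypothetical.

```python
import math


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all durations are identical; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def duration_corrected_log_cfr(durations, deaths, cases):
    """Residuals of log10(deaths / cases) after regressing on outbreak duration.

    Assumption: the residual is used as the 'corrected' value, so a negative
    number means a lower cfr than expected for that outbreak duration.
    """
    log_cfr = [math.log10(d / c) for d, c in zip(deaths, cases)]
    slope, intercept = fit_line(list(durations), log_cfr)
    return [y - (intercept + slope * x) for x, y in zip(durations, log_cfr)]
```

countries can then be ranked by the returned residuals. a country whose cfr is high only because its outbreak started early ends up near zero, whereas one that is high relative to countries at the same stage keeps a positive value.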
the two approaches give very similar results. the abovementioned findings shed light on the course of the covid-19 pandemic and might contribute to facilitating its management across the globe.
real estimates of mortality following covid-19 infection
latitude dependence of the covid-19 mortality rate - a possible relationship to vitamin d deficiency
an interactive web-based dashboard to track covid-19 in real time
cross-country comparison of case fatality rates of covid-19/sars-cov-2
case-fatality rate and characteristics of patients dying in relation to covid-19 in italy
estimating case fatality rates of covid-19
to understand the global pandemic, we need global testing - the our world in data covid-19 testing dataset
evidence that higher temperatures are associated with lower incidence of covid-19 in pandemic state, cumulative cases reported up to
world bank open data
reporters without borders . " world press freedom index | rsf." n.d. rsf. accessed april , . https://rsf.org/en/ranking_table .
andersen, kristian g., andrew rambaut, w. ian lipkin, edward c. holmes, and robert f. garry. . "the proximal origin of sars-cov-2." nature medicine ( ): - .
i would like to thank prof. michel georges and prof. yurii aulchenko for useful discussions, ideas and extensive help with manuscript revision.
key: cord- -pjycxlse authors: shah, m. r. t.; ahammed, t.; anjum, a.; chowdhury, a. a.; suchana, a. j. title: finding the real covid-19 case-fatality rates for saarc countries date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: pjycxlse
crude case fatality rate (cfr) is the most accurate when the pandemic is over. adjustments to the crude cfr measure can better explain the pandemic situation by improving the cfr estimation. however, no study has thoroughly investigated the covid-19 adjusted cfr of the south asian association for regional cooperation (saarc) countries.
in this study, we estimated both the survival interval adjusted and the underreporting adjusted cfr of covid-19 for the saarc countries and observed the cfr changes due to the imposition of fees on covid-19 tests in bangladesh. using the daily records up to th october, we implemented a statistical method to remove both biases in the crude cfr, i.e., the bias due to the delay between disease onset and outcome, and the bias due to reporting rates lower than % ( % ci: %- %) caused by asymptomatic or mildly symptomatic cases. according to our findings, afghanistan had the highest cfr, followed by pakistan, india, bangladesh, nepal, maldives, and sri lanka. our estimated crude cfr varied from . % to . %, survival interval adjusted cfr varied from . % to . % and further underreporting adjusted cfr varied from . % to . %. we have also found that crude cfr increased from . % to . % after imposing the covid-19 test fees in bangladesh. therefore, the authorities of countries with higher cfr should look for strategic counsel from the countries with lower cfr to equip themselves with the necessary knowledge to combat the pandemic. moreover, caution is needed when reporting the cfr.
covid-19 is a highly contagious disease, and the outbreak went global within three months of being first discovered. the disease kept spreading so uncontrollably that even the most adequate healthcare systems around the world were overwhelmed by it. developing countries are struggling even more [ ] . the nature of the disease forced the world to ask questions about the case fatality rate (cfr) of this disease [ ] . cfr is an important readout for understanding pandemic severity, and, in the media, cfr is often used to describe the situation regarding covid-19, as well as any other pandemic. however, during a pandemic, cfr can be misleading [ ] . the cfr of a disease is the total number of deaths divided by the total number of cases, i.e., the ratio of fatal cases of a specified condition within a specified time [ ] .
in cfr calculation during a pandemic, cases might be defined as the total number of confirmed cases, which does not account for the delay between onset of the disease symptoms and outcome, i.e., recovery or death. therefore, the cfr calculation becomes an underestimate of the actual cfr. by contrast, if we only consider the closed cases where patients have either recovered or died, the real-time cfr estimate remains consistently high throughout [ ] . while the crude cfr can give us an approximate idea about the risk of death during the pandemic, it is the most accurate after the pandemic is over [ ] . an adjustment to the crude cfr measure can significantly improve the cfr estimates and give us a better idea about the pandemic situation [ ] . [ , ] . a study from april found that the cfr of covid- in italy was . %, while in germany, it was just . % [ ] . the variation in preventive measures and government policies can be responsible for this difference in cfr [ , ] . for example, about three months into the covid- outbreak, the bangladesh government imposed fees on covid- tests at all government labs and hospitals from june . prior to that, all government-run facilities offered covid- tests for free, and % of all the country's tests were being conducted at government-controlled sites [ ] . with the imposition of fees on covid- testing, bangladesh became the only south asian country to charge for tests. the bangladesh government's official stance was that fees were imposed to ensure better management and discourage unnecessary tests. health experts in bangladesh believed the imposition of any fee on covid- tests might increase the outbreak size [ ] [ ] [ ] . the cfr difference between countries can provide much-needed information for combating the pandemic, such as what factors are responsible for speeding up or slowing the outbreak's progression. moreover, it will give us a better idea about the fatality rate of covid- in the countries of interest.
therefore, it is of the utmost importance to calculate the cfr of a country with a high degree of representativeness, which highlights the importance of calculating the adjusted cfr. however, no study has thoroughly investigated the covid- adjusted cfr of saarc countries, a regional union of eight nations: afghanistan, bangladesh, bhutan, india, maldives, nepal, pakistan, and sri lanka. therefore, this study's objective was to calculate and compare the covid- cfr for saarc countries adjusted by the disease's survival interval and reporting rates. moreover, we explore the covid- cfr of bangladesh before and after the imposition of test fees.

we collected the daily record of confirmed cases and deaths attributed to covid- for all member countries of saarc up to october , . bhutan was not considered in this study since no covid- death had been recorded there. the crude cfr, based on confirmed cases at time point $t$, was then calculated as $\mathrm{cfr}_{\mathrm{crude}}(t) = D(t) / C(t)$, where $D(t)$ and $C(t)$ denote the cumulative numbers of deaths and confirmed cases up to $t$. we then adjusted the crude cfr value by considering the survival interval of covid- . at any point of an ongoing epidemic, the denominator of the crude cfr contains all confirmed patients, some of whom may yet die of the disease. deaths that have not yet occurred cannot be counted in the calculation of cfr. therefore, we applied a statistical technique to reduce the bias in the crude cfr calculation. the technique accounted for the uncertainty in the interval between disease emergence and death by allowing the probability distribution of the survival interval to vary within a wide range. a gamma distribution with a mean of . days and a standard deviation of . days (shape parameter: a= . , rate parameter: b= . ) was used to estimate the adjusted cfr [ ] . to allow the mean survival interval to vary between to weeks [ , ] , the mean of the gamma distribution was sampled from a normal distribution with a mean of . days and a standard deviation of days.
likewise, the standard deviation was sampled from a normal distribution with a mean and a standard deviation of . days and day, respectively. a maximum likelihood method was then used to estimate the growth rate (r) of the covid- outbreak in the selected saarc countries. in monte-carlo simulations (with independent replications), the gamma distribution's moment generating function was used to calculate the correction factor for the adjusted cfr on each calendar day [ ] . finally, we calculated the adjusted cfr using the following formula: furthermore, assuming reporting rates lower than % ( % ci %- %) for covid- [ , ] due to asymptomatic cases or mild symptoms, we further adjusted the adjusted cfr. the probability of underreporting (u) was sampled from a beta distribution with shape parameter a = and scale parameter b = for each monte-carlo replication. the parameters of the distribution were selected so that the daily reporting rates may vary from % to %. the % confidence interval for u was . - . , while u was drawn in the range - . the true incidence (t) was then estimated by: using the sampled incidence modified for a given probability of low reporting rates, we then again estimated the adjusted cfr.

this preprint version was posted october , . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license.

table shows the total confirmed cases and deaths and the confirmation date of the first case of covid- in the selected saarc countries. as of october , a total of , , confirmed cases of covid- and , deaths were recorded. india had the highest number of confirmed cases ( , , ) and deaths ( , ), whereas sri lanka had the lowest numbers ( , confirmed cases and deaths) among these countries.

table overview of the covid- (up to october ) situation in the selected saarc countries.
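the adjustment steps described in the methods can be sketched in code. this is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the correction factor is the gamma moment generating function evaluated at minus the growth rate (the exponential-growth form), and that the underreporting step divides reported cases by a sampled reporting rate. all function names, parameter names and numeric inputs are hypothetical, since the paper's own values are not recoverable from the text.

```python
import random


def crude_cfr(cum_deaths, cum_cases):
    """Crude CFR: cumulative deaths divided by cumulative confirmed cases."""
    if cum_cases <= 0:
        raise ValueError("cum_cases must be positive")
    return cum_deaths / cum_cases


def gamma_shape_rate(mean, sd):
    """Convert a gamma mean and standard deviation to (shape, rate)."""
    return (mean / sd) ** 2, mean / sd ** 2


def delay_correction_factor(growth_rate, shape, rate):
    """Gamma moment generating function at -growth_rate (assumed form).

    Under exponential growth this approximates the fraction of confirmed
    cases whose outcome is already known.
    """
    base = 1.0 + growth_rate / rate
    if base <= 0:
        raise ValueError("growth_rate must be greater than -rate")
    return base ** (-shape)


def positive_normal(rng, mu, sigma):
    """Draw from a normal distribution, rejecting non-positive values."""
    while True:
        x = rng.gauss(mu, sigma)
        if x > 0:
            return x


def summarize(samples):
    """Return the (2.5%, 50%, 97.5%) empirical quantiles."""
    s = sorted(samples)
    n = len(s)
    return tuple(s[min(n - 1, int(q * n))] for q in (0.025, 0.5, 0.975))


def simulate_adjusted_cfr(cum_deaths, cum_cases, growth_rate,
                          mean_mu, mean_sigma, sd_mu, sd_sigma,
                          beta_a, beta_b, n_rep=1000, seed=0):
    """Monte Carlo sketch of the two adjustments (hypothetical inputs)."""
    rng = random.Random(seed)
    crude = crude_cfr(cum_deaths, cum_cases)
    delay_adjusted, fully_adjusted = [], []
    for _ in range(n_rep):
        # Survival-interval uncertainty: sample the gamma mean and sd.
        mean = positive_normal(rng, mean_mu, mean_sigma)
        sd = positive_normal(rng, sd_mu, sd_sigma)
        shape, rate = gamma_shape_rate(mean, sd)
        factor = delay_correction_factor(growth_rate, shape, rate)
        adjusted = min(crude / factor, 1.0)
        # Assumption: true cases = reported cases / reporting_rate,
        # so the CFR is scaled down by the sampled reporting rate.
        reporting_rate = rng.betavariate(beta_a, beta_b)
        delay_adjusted.append(adjusted)
        fully_adjusted.append(adjusted * reporting_rate)
    return summarize(delay_adjusted), summarize(fully_adjusted)
```

with a positive growth rate the correction factor is below one, so the delay-adjusted cfr exceeds the crude cfr, and the reporting-rate step then pulls the estimate down, which matches the ordering of the three estimates reported in the results.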
figure . case fatality rates ( % confidence interval) of the selected saarc countries (up to october ).

figure shows the crude cfr, adjusted cfr (accounting for the survival interval), and further adjusted cfr (also considering reporting rates of less than %) of the selected south asian countries on october . in all three scenarios, afghanistan had the highest cfr, and sri lanka had the lowest. the crude cfr varied from . % (afghanistan) to . % (sri lanka), while adjusted cfr varied from . % ( % ci: . %- . %) to . % ( % ci: . %- . %). when we further adjusted the cfr considering the underreported cases, the cfr varied from . % ( % ci: . %- . %) to . % ( % ci: . %- . %).

figure indicates that the cfrs of maldives, nepal, and sri lanka were relatively low throughout. for maldives and nepal, the cfrs were consistent throughout the pandemic period, whereas for sri lanka, the cfr decreased considerably over time. in bangladesh, there was a sharp spike in cfr, which became stable over time. moreover, we found that crude cfr increased from . % to . % after the imposition of the covid- test fees in bangladesh (figure ) .
understanding the case fatality rate of covid- allows policymakers to mitigate the outbreak impact by implementing efficient and effective interventions for disease control. therefore, in this study, we estimated the adjusted cfr of the covid- outbreak for the selected saarc countries, i.e., afghanistan, bangladesh, india, maldives, nepal, pakistan, and sri lanka. to our knowledge, this is by far the most comprehensive estimation of the covid- cfr for these selected countries. the estimated cfrs differed among saarc countries. ranking the countries, we observed the highest cfr in afghanistan, followed by pakistan, india, bangladesh, nepal, maldives, and sri lanka. the variation can be attributed to the public health system, preparedness, and effective interventions of each country. for example, sri lanka has a free public health system and has been ranked th on global response to infectious disease, while bangladesh is th [ ] . according to our findings, the survival interval adjusted cfr values were slightly greater than the respective countries' crude cfr values. the reason is that during an epidemic the crude cfr underestimates the actual cfr [ ] . however, because of both the limited number of tests and the presence of asymptomatic or mild cases, there exist unreported cases [ ] [ ] [ ] [ ] . therefore, after further adjustment for reporting rates lower than %, the estimated cfrs became less than one-third of the crude cfr and of the survival interval adjusted cfr. in agreement with previous studies [ , ] , our estimated cfrs for the selected countries were lower than most european countries' cfrs.
several factors, such as temperature, the proportion of young people, and genetic factors, can be responsible for this variation in cfr [ , , [ ] [ ] [ ] . moreover, we found that the cfr of the covid- pandemic is lower than those of sars, mers, bird flu, and ebola [ , ] . however, as it is highly infectious, and there are many mild or asymptomatic cases, public health concerns must be addressed. in our estimation, the crude cfr of bangladesh increased after the imposition of the covid- test fees, which were intended to discourage unnecessary tests and thereby ensure better management. as a result of the fees, in the first days after their imposition there was a decrease of , tests in total compared with the previous -day period. more tests can detect more asymptomatic or mild cases, which lowers the calculated mortality rate [ , ] . however, as the decision had affected poorer citizens' ability or willingness to be tested [ ] , the bangladesh government decided to cut the test fees by almost half on august [ ] . since the government-imposed fee for covid- tests is increasing the country's cfr, immediate steps should be taken to remove the fees so that the tests are affordable to everybody.

the first limitation of this study is that the calculated case-fatality rates refer to the countries' entire population. patients with critical health conditions, populations with a higher proportion of older adults, inadequate resources and unorganized health care systems can have a higher cfr [ , [ ] [ ] [ ] . second, we could not use the country-wise mean survival time of covid- patients for the adjusted cfr estimation. third, we assumed that there were no age-specific and country-specific differences in under-reporting.
children and youths with mild symptoms are tested less often. moreover, factors such as testing capacity and awareness of the importance of reporting symptoms directly affect the reporting rates of the disease [ ] . future research on similar issues should address these limitations.

survival intervals of the patients and a large number of underreported cases affect the cfr estimation, and therefore affect the countries' policies. in this regard, the bias-adjusted cfr measure can provide better information to health professionals and policymakers. therefore, survival interval and underreported cases should be considered when calculating the covid- cfr. this will equip us with much better knowledge of the covid- scenario worldwide.

charting the challenges behind the testing of covid- in developing countries: nepal as a case study the many estimates of the covid- case fatality rate using a delay-adjusted case fatality ratio to estimate underreporting a dictionary of epidemiology estimating case fatality rates of covid- a demographic adjustment to improve measurement of covid- severity at the developing stage of the pandemic temporal estimates of case-fatality rate for covid- outbreaks in canada and the united states epidemiology working group for ncip epidemic response chinese center for disease control and prevention, the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) in china covid- and older people in asia: asian working group for sarcopenia calls to actions estimation of the basic reproduction number, average incubation time, asymptomatic infection rate, and case fatality rate for covid : meta analysis and sensitivity analysis apparent difference in fatalities between central europe and east asia due to sars-cov- and covid- : four hypotheses for possible explanation estimation of novel coronavirus (covid- ) reproduction number and case fatality rate: a systematic review and
meta-analysis the covid- pandemic in the us: a clinical update covid- mortality is negatively associated with test number and government effectiveness have testing charges affected the number of covid- tests in bangladesh? govt sets fee for covid- test the daily star, levying fees on covid- testing unacceptable the novel coronavirus, -ncov, is highly contagious and more infectious than initially estimated updated understanding of the outbreak of novel coronavirus likelihood of survival of coronavirus disease early epidemiological assessment of the virulence of emerging infectious diseases: a case study of an influenza pandemic a bayesian estimate of the underreporting rate for covid- based on the experience of the diamond princess cruise ship tracking the global leadership response in the covid- crisis covid- antibody seroprevalence in confirmed and unreported covid- -like illness death counts: an assessment of reporting discrepancy report : estimating the potential total number of novel coronavirus cases in wuhan city underreporting covid- : the curious case of the indian subcontinent national incidence and case-fatality rates of novel coronavirus (covid- ) across countries and territories: a systematic assessment of cases reported from covid- mortality increases with northerly latitude after adjustment for age suggesting a link with ultraviolet and vitamin d effects of temperature variation and humidity on the death of covid- in wuhan sars-cov- genomic variations associated with mortality rate of covid- mers--an uncertain future cross-country comparison of case fatality rates of covid- /sars-cov- govt reduces covid- testing fees epidemiology of covid- in a long-term care facility in case-fatality rate and characteristics of patients dying in relation to covid- in italy covid patients' clinical characteristics, discharge rate, and fatality rate of meta analysis estimating the infection and case fatality ratio for coronavirus disease (covid- ) using age-adjusted 
data from the outbreak on the diamond princess cruise ship we acknowledge musaddiqur rahman ovi for his support in visualization. there are no conflicts of interest to declare. key: cord- -b s es authors: kelso, joel k; halder, nilimesh; postma, maarten j; milne, george j title: economic analysis of pandemic influenza mitigation strategies for five pandemic severity categories date: - - journal: bmc public health doi: . / - - - sha: doc_id: cord_uid: b s es background: the threat of emergence of a human-to-human transmissible strain of highly pathogenic influenza a(h n ) is very real, and is reinforced by recent results showing that genetically modified a(h n ) may be readily transmitted between ferrets. public health authorities are hesitant in introducing social distancing interventions due to societal disruption and productivity losses. this study estimates the effectiveness and total cost (from a societal perspective, with a lifespan time horizon) of a comprehensive range of social distancing and antiviral drug strategies, under a range of pandemic severity categories. methods: an economic analysis was conducted using a simulation model of a community of ~ , in australia. data from the pandemic was used to derive relationships between the case fatality rate (cfr) and hospitalization rates for each of five pandemic severity categories, with cfr ranging from . % to . %. results: for a pandemic with basic reproduction number r( ) = . , adopting no interventions resulted in total costs ranging from $ per person for a pandemic at category (cfr . %) to $ , per person at category (cfr . %). for severe pandemics of category (cfr . %) and greater, a strategy combining antiviral treatment and prophylaxis, extended school closure and community contact reduction resulted in the lowest total cost of any strategy, costing $ , per person at category . this strategy was highly effective, reducing the attack rate to %. 
with low severity pandemics costs are dominated by productivity losses due to illness and social distancing interventions, whereas higher severity pandemic costs are dominated by healthcare costs and costs arising from productivity losses due to death. conclusions: for pandemics in high severity categories the strategies with the lowest total cost to society involve rigorous, sustained social distancing, which are considered unacceptable for low severity pandemics due to societal disruption and cost. keywords: pandemic influenza, economic analysis, antiviral medication, social distancing, pandemic severity, case fatality ratio background while the h n virus spread world-wide and was classed as a pandemic, the severity of resulting symptoms, as quantified by morbidity and mortality rates, was lower than that which had previously occurred in many seasonal epidemics [ ] [ ] [ ] .
the pandemic thus highlighted a further factor which must be considered when determining which public health intervention strategies to recommend, namely the severity of symptoms arising from a given emergent influenza strain. the mild symptoms of h n resulted in a reluctance of public health authorities to use rigorous social distancing interventions due to their disruptive effects, even though modelling has previously suggested that they could be highly effective in reducing the illness attack rate [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] . had the h n influenza strain been highly pathogenic, more timely and rigorous responses would have been necessary to mitigate the resultant adverse health outcomes. furthermore, there is continuing concern that a highly pathogenic avian influenza a(h n ) strain may become transmissible between humans. this scenario is highlighted by the large reservoir of influenza a(h n ) in poultry in south-east asia [ ] , and recent experimental results which have shown that the a (h n ) virus may be genetically modified to become readily transmissible between ferrets, a commonly used animal model for human influenza transmission studies [ ] [ ] [ ] . the severity of a particular influenza strain directly impacts on the cost of any pandemic; increased severity increases health care costs and escalates productivity losses due to a) absenteeism arising from increased illness and b) increased mortality rates. in this study, the role which pandemic severity has on the total cost of a pandemic for a range of potential intervention strategies is analysed, and for highly pathogenic influenza strains inducing significant morbidity and mortality, as occurred during the pandemic [ , ] , the results suggest which intervention strategies are warranted in terms of reduction of illness and total pandemic cost. 
this study adopts a societal perspective on the cost of a pandemic, with the time horizon being the lifetime of individuals experiencing the pandemic. we used a detailed, individual-based simulation model of a real community in the south-west of western australia, the town of albany with a population of approximately , , to simulate the dynamics of an influenza pandemic. comparing simulations with and without interventions in place allowed us to analyse the effect which a range of interventions have on reducing the attack rate and on the health of each individual in the modelled community. epidemic outcome data produced by the simulation model were used to determine health outcomes involving hospitalisation, icu treatment, and death. in turn, these healthcare outcomes, together with productivity losses due to removal from the workforce, were used to estimate the overall cost of interventions. figure provides an overview of this analysis methodology, showing each of the processes that make up the methodology, their input parameters and the resulting data generated by the process. the simulation model captures the contact dynamics of the population of albany, western australia using census and state and local government data [ ] . these data allowed us to replicate the individual age and household structure of all households in this town of approximately , individuals, and also allowed for the construction of an explicit contact network linking households, schools, workplaces and other meeting places by allocating individuals to workplaces and schools. the modelled community was chosen so as to be representative of a developed world population, and self-contained in the sense that all major locales for interpersonal mixing were represented within the community. the model includes both urban and rural components, a central commercial core, a complete set of schools (covering all age groups), and a mix of large and small employers.
the community is also of a size where public health interventions could be uniformly implemented based on local information. the model captures explicit person-to-person contact with the contact network describing population mobility occurring between households, schools, workplaces and the wider community as shown in figure . the virus spreads through the community due to this mobility, as transmission occurs between individuals when they are co-located, possibly following a move from one location to another. for example, an infectious child moves from household to school on a given day, and infects two further children; they return to households and and, following virus incubation, become infectious and may infect other household members in their households. note that these households may be geographically separate, but are connected via contact of children at school.

figure overview of pandemic cost analysis methodology. input parameters are shown on the left in boxes with blue text, with arrows indicating to which part of the cost analysis methodology they apply. boxes with white text represent different processes of the methodology; each process is described in the methods section under a subsection of the same name. boxes with green text appearing at the bottom and on the right represent results generated by the analysis.

each household contains uniquely identified individuals. children and adults were assigned by an allocation algorithm to school classes and workplaces, respectively. the assignment of children to classes was based on age, school class size data, and proximity between schools and households; the assignment of adults to workplaces was based on workplace size and commuter survey data. in addition to contact occurring in households and mixing hubs, community contact was introduced to capture mixing which occurs outwith these locales and in the wider community.
the number of contacts made by each individual each day in school, work and community settings was adjusted to reproduce the proportion of cases occurring in different settings as reported by empirical studies, specifically % of infections occurred in households, % in schools and workplaces, and % in the wider community [ ] [ ] [ ] . contacts within schools and workplaces occurred in fixed-size mixing groups of maximum size . within mixing groups contact was assumed to be homogeneous. community contacts occurred between randomly selected individuals, weighted toward pairs of individuals located in neighbouring households. a simulation algorithm, realised in the c++ programming language, manipulates the underlying demographic model and captures both population mobility and the time-changing infectivity profile of each individual. each individual has their infectivity status denoted by one of the four (susceptible, exposed, infectious, recovered) states at any time point during the simulated period. the simulation algorithm captures the state of the whole population twice per day, a daytime point-in-time snapshot and an evening snapshot, with individuals (possibly) moving locations between successive day or night periods, such as household to school or workplace for the day phase, returning to home for the night period. individuals come into contact with other individuals on a one-to-one basis in each location, with possible influenza transmission then occurring. individuals in each household and contact hub make contacts within a close-contact mixing group, taken to be the entire household or a subset of larger hubs, and also make additional non hub-based random community contacts. the attributes of the various locations in which individuals come into potentially infectious contact are summarized in table . using the contact, mobility and transmission features described above, stochastic simulations of influenza spread were conducted.
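the twice-daily cycle described above can be illustrated with a short sketch. this is a deliberately simplified toy, written in python rather than the c++ of the original model, and it is not the authors' algorithm: it ignores fixed-size mixing groups, community contacts and symptomatic withdrawal, and the names `Person`, `location_for` and `potential_contacts` are hypothetical. it only shows how co-location in a day or night phase determines which infectious and susceptible individuals can come into contact.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from itertools import product
from typing import Optional


class State(Enum):
    SUSCEPTIBLE = "S"
    EXPOSED = "E"
    INFECTIOUS = "I"
    RECOVERED = "R"


@dataclass
class Person:
    pid: int
    household: str
    day_hub: Optional[str]  # school or workplace, None if unallocated
    state: State = State.SUSCEPTIBLE


def location_for(person, phase, weekday=True):
    """Nominal location: the hub on weekday days, otherwise the household."""
    if phase == "day" and weekday and person.day_hub is not None:
        return person.day_hub
    return person.household


def potential_contacts(people, phase, weekday=True):
    """List (infectious_id, susceptible_id) pairs sharing a location."""
    by_location = defaultdict(list)
    for person in people:
        by_location[location_for(person, phase, weekday)].append(person)
    pairs = []
    for members in by_location.values():
        infectious = [p for p in members if p.state is State.INFECTIOUS]
        susceptible = [p for p in members if p.state is State.SUSCEPTIBLE]
        pairs.extend((i.pid, s.pid) for i, s in product(infectious, susceptible))
    return sorted(pairs)


# An infectious child (1) shares a school with child 2 and a home with adult 3.
people = [
    Person(1, "household_a", "school_x", State.INFECTIOUS),
    Person(2, "household_b", "school_x"),
    Person(3, "household_a", None),
]
day_pairs = potential_contacts(people, "day")
night_pairs = potential_contacts(people, "night")
```

in this toy population the day phase links the two children through the school, while the night phase links the infectious child to the adult at home, which is the household-to-school-to-household pathway the text describes.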
all simulations were repeated times with random numbers controlling the outcome of stochastic events (the locality of seeded infected individuals and the probability of transmission) and the results were averaged. analysis of this simulation model has shown that the -run mean attack rate is highly unlikely ( % confidence) to differ by more than . % from the mean attack rate of a much larger set of experiment repeats. one new infection per day was introduced into the population during the whole period of the simulations, and randomly allocated to a household. this seeding assumption of case per day was chosen to reliably begin a local epidemic in every stochastic simulation. for the transmission characteristics described above, analysis shows that seeding at this rate for days results in a sustained epidemic in > % of the simulation runs and % with two weeks of seeding, with higher percentages for the higher transmissibility scenarios. seeding at this rate is continued throughout the simulation in order to capture the case where an epidemic may be initially suppressed by a rigorous intervention strategy, but may subsequently break out if intervention measures are relaxed. after the beginning of a sustained local epidemic, any subsequent variation in the amount of seeding has very little effect on the progress of the local epidemic, as the number of imported cases is much smaller than those generated by the local epidemic. preliminary analyses using the present model have shown that even if the seeding rate is increased to infections per day, after days the number of infections generated from the self-sustained local epidemic is twice the number of imported infections, and by days local infections outnumber imported infections by a factor of . the simulation period was divided into hour day/night periods and during each period a nominal location for each individual was determined.
this took into consideration the cycle type (day/night, weekday/weekend), infection state of each individual and whether child supervision was needed to look after a child at home. individuals occupying the same location during the same time period were assumed to come into potential infective contact. details of the simulation procedure are presented in [ ] . in the simulation model, we assumed that infectious transmission could occur when an infectious and susceptible individual came into contact during a simulation cycle. following each contact a new infection state for the susceptible individual (either to remain susceptible or to become infected) was randomly chosen via a bernoulli trial [ ] . once infected, an individual progressed through a series of infection states according to a fixed timeline. the probability that a susceptible individual would be infected by an infectious individual was calculated according to the following transmission function, which takes into account the disease infectivity of the infectious individual i i and the susceptibility of susceptible individual i s at the time of contact.

table . attributes of contact locations.
maximum group size is .
tertiary and vocational education institutions: number and size determined from state education department data. active period: weekdays during day cycle. members: young adult and adult individuals who are allocated into the hub, if they are active*. maximum group size is .
workplace: number and size determined from local government business survey data. active period: weekdays during day cycle. members: adult individuals who are allocated into the hub, if they are active*. maximum group size is .
community: represents all contact between individuals in the community that is not repeated on a daily basis. active period: every day during day cycle. members: all individuals make contacts if they are active*; contact is random but weighted towards pairs with nearby household locations.
* all individuals are active during day cycles unless: he/she is symptomatically infected and chooses to withdraw to household ( % chance for adults, % for children); or if his/her school or workplace is affected by social distancing interventions; or if he/she is a parent of a child who is inactive (only one parent per family is affected this way). the baseline transmission coefficient β was initially chosen to give an epidemic with a final attack rate of . %, which is consistent with seasonal influenza as estimated in [ ] (in table three of that paper). to achieve simulations under a range of basic reproduction numbers (r ), β was increased from this baseline value to achieve epidemics of various r magnitudes; details of the procedure for estimating β and r are given in [ ] . a reproduction number of . was used as a baseline assumption, and the sensitivity of results to this assumption was gauged by repeating all simulations and analyses for alternative reproduction numbers of . and . . a pandemic with a reproduction number of . corresponds to some estimations of the basic reproduction number of the pandemic [ ] [ ] [ ] [ ] , while a reproduction number of . corresponds to an upper bound on estimates of what may have occurred in the pandemic, with most estimates being in the range . - . [ , ] . the disease infectivity parameter inf(i i ) was set to for symptomatic individuals at the peak period of infection and then to . for the rest of the infectivity period. the infectiousness of asymptomatic individuals was also assumed to be . and this applies to all infected individuals after the latent period but before onset of symptoms. the infection profile of a symptomatic individual was assumed to last for days as follows: a . day latent period (with inf(i i ) set to ) was followed by day asymptomatic and infectious, where inf(i i ) is set to . ; then days at peak infectiousness (with inf(i i ) set to . ); followed by . 
days reduced infectiousness (with inf(i i ) set to . ). for an infected but asymptomatic individual the whole infectious period (of . days) was at the reduced level of infectiousness with inf(i i ) set to . . this infectivity profile is a simplification of the infectivity distribution found in a study of viral shedding [ ] . as reported below in the results section for the unmitigated no intervention scenario, these assumptions regarding the duration of latent and infectious periods lead to a mean generation time (serial interval) of . days, which is consistent with that estimated for h n influenza [ , , ] . following infection an individual was assumed to be immune to re-infection for the duration of the simulation. we further assumed that influenza symptoms developed one day into the infectious period [ ] , with % of infections being asymptomatic among children and % being asymptomatic among adults. these percentages were derived by summing the age-specific antibody titres determined in [ ] . symptomatic individuals withdrew into the home with the following probabilities: adults % and children %, which is in keeping with the work of [ , ] . the susceptibility parameter susc(i s ) is a function directly dependent on the age of the susceptible individual. it captures age-varying susceptibility to transmission due to either partial prior immunity or age-related differences in contact behaviour. to achieve a realistic age-specific infection rate, the age-specific susceptibility parameters were calibrated against the serologic infection rates for seasonal h n in - in tecumseh, michigan [ ] . the resulting age-specific attack rates are consistent with typical seasonal influenza, with a higher attack rate in children and young adults (details of the calibration procedure may be found in [ ] ).
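the per-contact transmission step described above can be illustrated with a minimal sketch. the product form beta * inf * susc * avf is an assumption made here for illustration, since the transmission function itself is not reproduced in this text; the antiviral factor follows the definition avf = (1 - ave i) * (1 - ave s) given in the paragraph that follows. all function names and numeric values are hypothetical placeholders, not the calibrated model parameters.

```python
import random


def antiviral_factor(ave_i: float, ave_s: float) -> float:
    """AVF = (1 - AVE_i) * (1 - AVE_s); both are 0 when no antivirals are used."""
    return (1.0 - ave_i) * (1.0 - ave_s)


def transmission_probability(beta: float, inf_i: float, susc_s: float,
                             ave_i: float = 0.0, ave_s: float = 0.0) -> float:
    """Assumed product form of the transmission function, clipped to [0, 1]."""
    p = beta * inf_i * susc_s * antiviral_factor(ave_i, ave_s)
    return min(max(p, 0.0), 1.0)


def contact_infects(p: float, rng: random.Random) -> bool:
    """Bernoulli trial: returns True (new infection) with probability p."""
    return rng.random() < p


# Placeholder values only; beta, inf_i, susc_s and ave_i are not the paper's values.
rng = random.Random(0)
p = transmission_probability(beta=0.1, inf_i=1.0, susc_s=0.8, ave_i=0.5)
newly_infected = contact_infects(p, rng)
```

in a full simulation this draw would be repeated for every infectious-susceptible pair sharing a location in a given cycle, with inf(i i ) read from the infectivity timeline of the infectious individual.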
the antiviral efficacy factor avf(i i ,i s ) = ( -ave i )*( -ave s ) represents the potential reduction in infectiousness of an infected individual (denoted by ave i ) induced by antiviral treatment, and the reduction in susceptibility of a susceptible individual (denoted by ave s ) induced by antiviral prophylaxis. when no antiviral intervention was administered the values of both ave i and ave s were assumed to be , indicating no reduction in infectiousness or susceptibility. however, when antiviral treatment was being applied to the infectious individual the value of ave i was set at . , capturing a reduction in infectiousness by a factor of % [ ] . similarly, when the susceptible individual was undergoing antiviral prophylaxis the value of ave s was set to . , indicating a reduction in susceptibility by a factor of % [ ] . this estimate is higher than that used in most previous modelling studies [ , , ] , which assume an ave s of %. this common assumption appears to stem from an estimate made in [ ] based on - trial data. our higher value is based on a more comprehensive estimation process reported in [ ] , which also incorporated data from an additional study performed in - [ ] . it is also in line with estimates of %- % reported in [ ] . we examined a comprehensive range of intervention strategies including school closure, antiviral drugs for treatment and prophylaxis, workplace non-attendance (workforce reduction) and community contact reduction. these interventions were considered individually and in combination, and social distancing interventions were considered for either continuous periods (that is, until the local epidemic effectively ceased) or periods of fixed duration ( weeks or weeks). antiviral drug interventions and social distancing interventions were initiated when specific threshold numbers of symptomatic individuals were diagnosed in the community, and this triggered health authorities to mandate the intervention response. this threshold was taken to be .
% of the population. this threshold was chosen based on a previous study with this simulation model, which found that it represents a robust compromise between early, effective intervention and "premature" intervention, which can result in sub-optimal outcomes when limited duration interventions are used [ ] . it was assumed that % of all symptomatic individuals were diagnosed, and that this diagnosis occurred at the time symptoms appeared. for continuous school closure, all schools were closed simultaneously once the intervention trigger threshold was reached. for fixed duration (e.g. weeks or weeks) school closure, schools were closed individually as follows: for a primary school the whole school was closed if or more cases were detected in the school; in a high school only the class members of the affected class were isolated (sent home and isolated at home) if no more than cases were diagnosed in a single class; however if there were more than cases diagnosed in the entire high school the school was closed. note that these school closure policies were only activated after the community-wide diagnosed case threshold was reached; cases occurring in schools before this time did not result in school closure. this policy of triggering school closure based on epidemic progression avoids premature school closure which can reduce the effectiveness of limited duration school closure [ , , ] ; see [ ] for a detailed description of school closure initiation triggering strategies. two primary antiviral drug strategies have been examined; antiviral drugs used solely for treatment of symptomatic cases (strategy t), and treatment plus prophylaxis of all household members of a symptomatic case (strategy av). a further strategy was also examined, in which prophylaxis was also extended to the contact group (school or workplace contacts) of a symptomatic case (strategy t + h + e). 
due to the logistical resources required, it is unlikely that this extended strategy could be implemented throughout a pandemic, and we do not report the results of this strategy in the main paper; full results are however given in (additional file ). antiviral treatment (and prophylaxis for household or work / school group contacts) was assumed to begin hours after the individual became symptomatic. it was assumed that an individual would receive at most one prophylactic course of antiviral drugs. further details of antiviral interventions are given in [ , ] . workforce reduction (wr) was modelled by assuming that for each day the intervention was in effect, each worker had a % probability of staying at home and thus did not make contact with co-workers. community contact reduction (ccr) was modelled by assuming that on days when the intervention was in effect, all individuals made % fewer random community contacts. the most rigorous social distancing interventions considered in this study, which we denote as strict social distancing, involve the combined activation of school closure with workforce reduction and/or community contact reduction, and for this to occur for significant time periods; continuous and weeks duration were considered. in the present study we simulated a total of intervention scenarios (for each of three reproduction numbers . , . and . ). to simplify the results, we only present those interventions that reduce the unmitigated illness attack rate by at least %. we defined five severity categories based on those proposed by the cdc [ ] . the cdc pandemic index was designed to better forecast the health impact of a pandemic, based on categories having cfrs ranging from < . % to > = . %, and allow intervention recommendations to match pandemic severity. the discrete cfrs used are listed in table . 
we extend the cdc categories to further include rates of hospitalisation and icu treatment, as described below using data collected during the pandemic in western australia, by the state department of health. these data permit case hospitalisation (icu and non-icu) and case fatality ratios (cfr) to be related, as described below. the least severe pandemic considered (category ) has cfr of . % which is at the upper end of estimates for the pandemic. initially, the pandemic cfr was estimated to be in the range . % - . % [ ] ; however recent reanalysis of global data from suggests a cfr (for the - age group) in the range . % - . % [ ] . cost analysis results for a pandemic with h n characteristics using a similar simulation model to the one described here can be found in [ ] . calculation of costs arising from lost productivity due to death and from hospitalisation of ill individuals requires that individual health outcomes (symptomatic illness, hospitalisation, icu admission, and death) be estimated for each severity level. the pandemic data from western australia was used to provide this relationship between the mortality rate and numbers requiring hospitalisation and icu care. these data indicated a non-icu hospitalisation to fatality ratio of : and an icu admission to fatality ratio of : . these values align with those in a previous study by presanis et al. in [ ] , which estimated the ratios in the ranges - to and . - . to , respectively. the economic analysis model translates the age-specific infection profile of each individual in the modelled population, as derived by the albany simulation model, into the overall pandemic cost burden.

symptomatic infectiousness timeline: . day latent (non infectious), day asymptomatic; days peak symptomatic; . days post-peak symptomatic [ ]
asymptomatic infectiousness timeline: . day latent; . days asymptomatic [ ]
asymptomatic infectiousness: . [ ]
peak symptomatic infectiousness: .
post-peak symptomatic infectiousness: . [ ]
probability of asymptomatic infection: . [ ]
probability
average school closure cost (per student per day): $ . [ ]
average gp visit cost: $ . [ ]
average hospitalization cost (per day): $ [ ]
average icu cost (per day): $ [ , ]

total costs involve both direct healthcare costs (e.g. the cost of medical attention due to a gp visit, or for hospitalisation) and costs due to productivity loss [ , ] . pharmaceutical costs (i.e. costs related to antiviral drugs) are also estimated. all costs are reported in us dollars using consumer price index adjustments [ ] . us dollar values are used to make the results readily convertible to a wide range of countries. age-specific hospitalisation costs are obtained by multiplying the average cost per day by average length of stay for each age group [ , ] . hospitalisation costs, including icu costs, those involving medical practitioner visits, and antiviral drug (and their administration) costs are taken from the literature and are presented in table [ , , ] . the antiviral costs include the costs of maintaining an antiviral stockpile. this was calculated by multiplying the antiviral cost per course (but not the dispensing cost per course, which was included separately) by the expected number of times each antiviral course would expire and be replaced between pandemics, assuming a mean inter-pandemic period of . years (based on the occurrence of pandemics in , , and ) and an antiviral shelf life of years [ ] . treatment costs, lengths of stay in hospital (both icu and non-icu), and other cost data used in establishing the overall cost of mitigated and unmitigated epidemics in the modelled community are given in table . productivity losses due to illness and interventions (e.g.
necessary child-care due to school closure and workforce reduction) were calculated according to the human capital approach, using average wages and average work-days lost; the latter being determined from day-to-day outbreak data generated by the simulation model. assumed average wages are given in table . school closure is assumed to give rise to two costs. the first, following the work of perlroth et al. [ ] , is a $ per student school day lost. this is intended to approximate the cost of additional education expense incurred in the future, which might occur for example in the form of additional holiday classes. the second component is lost productivity of parents staying at home to supervise children. the simulation model calculates whether this occurs for every day for every household, based on what interventions are in force (school closure and/or workforce reductions), whether children or adults are ill, the number of adults in the household, whether it is a school day, etc., and accumulates the cost accordingly. indirect production losses due to death were also derived using a human capital approach, based on the net present value of future earnings for an average age person in each age group. this was calculated by multiplying the age-specific number of deaths due to illness by the average expectancy in years of future earnings of an individual and by an average annual income [ ] . we assumed a maximum earning period up to age . productivity losses due to death were discounted at % annually, which is a standard discounting rate used to express future income in present value [ ] . to provide an alternative analysis, total costs were also calculated without this long-term productivity loss due to death component. overview figure presents the final attack rate (ar) and the total cost of the epidemic for each intervention strategy applied, for a pandemic with a basic reproduction number of r = . .
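the death-related productivity loss described in the costing method above can be sketched as follows. this is a minimal illustration, assuming end-of-year discounting and a single average annual income; the function names, the age-group labels and every numeric value are hypothetical and are not the figures used in the study.

```python
def present_value_of_earnings(annual_income: float, years: int,
                              discount_rate: float) -> float:
    """Net present value of `years` of future income, discounted annually.

    Assumes income is received at the end of each year (an illustrative
    convention, not one stated in the text).
    """
    return sum(annual_income / (1.0 + discount_rate) ** t
               for t in range(1, years + 1))


def death_productivity_loss(deaths_by_age_group: dict,
                            earning_years_by_age_group: dict,
                            annual_income: float,
                            discount_rate: float) -> float:
    """Sum over age groups of deaths times discounted remaining earnings."""
    total = 0.0
    for group, deaths in deaths_by_age_group.items():
        years = earning_years_by_age_group[group]
        total += deaths * present_value_of_earnings(
            annual_income, years, discount_rate)
    return total


# Hypothetical inputs for illustration only.
deaths = {"young adult": 2, "adult": 5, "older adult": 9}
earning_years = {"young adult": 40, "adult": 20, "older adult": 0}
loss = death_productivity_loss(deaths, earning_years,
                               annual_income=40_000.0, discount_rate=0.03)
```

age groups beyond the assumed maximum earning age contribute zero remaining earning years, so deaths in those groups add nothing to this cost component.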
although costs are calculated from the whole-of-society perspective, total costs are presented as a cost per person in the community, calculated by dividing the simulated cost of the pandemic by the population of ~ , , in order to make the results more easily transferable to communities of various sizes. strategies are ordered from left to right by increasing effectiveness (i.e. their ability to decrease the attack rate), and only intervention strategies that reduce the attack rate by at least % are included. figure shows three distinctive features. firstly, for an epidemic with basic reproduction number r = . , no single intervention is effective in reducing the attack rate by more than %, and thus no single intervention appears in figure . this finding is consistent with previous modelling studies which found that layering of multiple interventions is necessary to achieve substantial attack rate reductions [ ] [ ] [ ] [ ] [ ] [ ] , , ] . secondly, higher severity pandemics have higher total costs. total costs of unmitigated pandemics range from $ to $ per person for pandemics from category to category (see table ). thirdly, for high severity pandemics total costs are lower for the more effective intervention strategies. figure presents the constituent components that make up the total cost of each intervention and severity category, measured in terms of cost per person in the modelled community. three distinctive features can be seen in figure . firstly, for high severity pandemics costs are dominated by productivity losses due to death and health care costs. secondly, for low severity pandemics costs are dominated by social distancing and illness costs. thirdly, for all severity categories antiviral costs are low compared with all other cost components of antiviral based intervention strategies. antiviral costs never constitute more than % of the total cost, and for all severity categories greater than (cfr > .
%) antiviral costs are always the smallest cost component. below we report on effectiveness, total costs and cost components of interventions for pandemics with high and low severity. these cost data are presented in table . figure summarises the characteristics of key intervention strategies. for high severity pandemics (categories and , with case fatality rates above . %) the least costly strategy combines continuous school closure, community contact reduction, antiviral treatment and antiviral prophylaxis. at category this strategy has a total cost of $ , per person, a net benefit of $ per person compared to no intervention. this strategy is also the most effective intervention strategy, reducing the attack rate from % to . %. the results indicate that strategies with the lowest total costs are also the most effective. for a category pandemic the most effective strategies, all of which reduce the attack rate to less than %, have total costs ranging from $ , to $ , per person, which is less than one-third the cost of the unmitigated pandemic ($ , ), showing the substantial net benefit of effective interventions for high severity pandemics. these strategies all feature continuous school closure, with either continuous community contact reduction or antiviral treatment and prophylaxis. the ability of highly effective interventions to reduce the total cost of a high severity pandemic is due to the largest component of the overall cost being productivity losses arising from deaths. this is illustrated in figure which shows the cost components for each intervention. it can be seen that the majority of the cost for an unmitigated pandemic of severity category and is due to death-related productivity losses (shown in purple). 
although highly effective interventions incur large intervention-related productivity losses (shown in green), for high severity pandemics these intervention costs are more than outweighed by the reduction in medical costs and death-related productivity losses. the most costly intervention considered (i.e. which still reduced the attack rate by at least %) is continuous school closure combined with continuous workforce reduction, which costs $ , per person. for low severity pandemics (in category , having cfr < = . %) the intervention strategy with the lowest total cost considered is weeks school closure combined with antiviral treatment and prophylaxis, costing $ per person which represents a net saving of $ per person compared to no intervention. however, this strategy is not as effective as other intervention strategies, reducing the attack rate to only %. the most effective intervention (combined continuous school closure, community contact reduction, and antiviral treatment and household prophylaxis), which reduces the attack rate to . %, costs $ per person, a net benefit of $ per person compared to no intervention. figure shows that for category and pandemics, although highly effective intervention measures reduce medical costs and death-related productivity losses, they incur larger costs due to intervention-related lost productivity. the most costly intervention considered is continuous school closure combined with continuous workforce reduction, which costs $ , per person, a net cost of $ per person compared to no intervention. this is due to the large cost associated with % workforce absenteeism. an important subset of intervention strategies are those consisting of purely social distancing interventions. in the case that antiviral drugs are unavailable or ineffective, only these non-pharmaceutical intervention strategies will be available.
the most effective non-pharmaceutical strategy is the continuous application of the three social distancing interventions (school closure, workforce reduction, and community contact reduction), which reduces the attack rate to %. this intervention has a total cost ranging from $ , to $ , per person for severity categories ranging from to respectively. the least costly non-pharmaceutical strategy omits workforce reduction, resulting in a slightly higher attack rate of %. this intervention has a total cost ranging from $ to $ , per person for severity categories ranging from to respectively. the costing model used for this analysis includes future productivity losses from deaths caused by the pandemic. this long-term cost is often not included in cost-utility analyses. the inclusion of death-related productivity losses greatly increases the total costs of severe pandemics. however, even if these costs are not included, medical costs (due to hospitalisation and icu usage) play a similar, although less extreme, role. if long-term productivity losses due to death are not included in the costing model, the total cost of the pandemic is, unsurprisingly, lower. however, the effectiveness and relative total costs of intervention strategies, that is, the ranking of intervention strategies by total cost, remain the same whether or not death-related productivity losses are included (spearman's rank correlation coefficient r = . , p = . for a null hypothesis that rankings are uncorrelated). full cost results of an alternate analysis that omits death-related productivity losses are contained in an additional file accompanying this paper (additional file ), and are summarised below. for category , when death-related productivity losses are not included the total cost of intervention strategies ranges from $ to $ , . this range is much smaller than if death-related productivity losses are included, in which case total cost ranges from $ , to $ , .
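the rank comparison reported above can be illustrated with a short sketch of spearman's rank correlation, computed here as the pearson correlation of average ranks. this is a generic textbook construction rather than the authors' analysis code, the p-value calculation is omitted, and the cost figures in the example are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-person costs for four strategies, with and without
# death-related productivity losses (illustrative values only).
cost_with_death_losses = [1200.0, 900.0, 1500.0, 700.0]
cost_without_death_losses = [300.0, 250.0, 400.0, 200.0]
rho = spearman_rho(cost_with_death_losses, cost_without_death_losses)
```

a coefficient close to 1 indicates that the two costing variants order the strategies in nearly the same way, which is the property the comparison above relies on.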
for lower severity pandemics with lower case fatality ratios, the contribution of death-related productivity losses is naturally smaller. for category , when death-related productivity losses are not included total cost ranges from $ to $ , ; with death-related productivity losses the range is $ to $ , . if death-related productivity losses are not included, social distancing and illness costs dominate the total cost of each intervention strategy for low severity pandemics, while health care costs dominate the cost profile for high severity pandemics. sensitivity analyses were conducted to examine the extent to which these results depend upon uncertain model parameters that may impact on the cost or effectiveness of interventions. the methodology adopted was to identify assumptions and model parameters known to have an effect on intervention outcomes, taken from previous studies with this simulation model [ , , , , , ] , and to perform univariate analyses on each, examining parameter values both significantly higher and lower than the baseline values. alternative parameter settings were analysed for transmissibility (as characterised by the basic reproduction number r ), voluntary household isolation of symptomatic individuals, antiviral efficacy, compliance to home isolation during school closure, degree of workforce reduction, and degree of community contact reduction. a common finding across all sensitivity analyses was that alternative parameter settings that rendered interventions less effective resulted in strategies that not only had higher attack rates, but also had higher total pandemic costs, with this effect being most pronounced for pandemics of high severity. further details and results of the sensitivity analysis can be found in an additional file accompanying this paper (additional file ).

figure . breakdown of pandemic cost components. breakdown of pandemic costs shown as horizontal bar, for each intervention strategy and each severity category. coloured segments of each bar represent cost components as follows: (blue) health care; (red) antiviral drugs, including dispensing costs; (green) productivity losses due to illness and social distancing interventions; (purple) productivity losses due to deaths. note that horizontal scale is different for each severity category. values are for a pandemic with unmitigated transmissibility of r = . . interventions abbreviated as: sc - school closure; ccr - % community contact reduction; wr - % workforce reduction; , intervention duration in weeks; cont - continuous duration; av - antiviral treatment of diagnosed symptomatic cases and antiviral prophylaxis of household members of diagnosed symptomatic cases.

the need for an unambiguous, extended definition of severity has been noted in the world health organization report on the handling of the pandemic [ ] , which highlights the impact pandemic severity has on health care provision and associated costs. in the absence of such definitions, an extended severity metric is presented. this extends the case fatality ratio (cfr) severity scale devised by the cdc [ ] , with hospitalisation and intensive care unit (icu) data collected in australia during the pandemic. these data have been used to generate a more extensive notion of pandemic severity, relating actual age-specific attack rates with age-specific hospitalisation and mortality rates, thereby contributing to the realism of both the simulation model and the economic analysis. this pandemic severity scale together with a pandemic spread simulation model allows the calculation of the total cost of a pandemic, and to estimate the relative magnitude of all the factors that contribute to the pandemic cost, including not only pharmaceutical and medical costs, but also productivity losses due to absenteeism and death. the severity of a future pandemic is shown to have a major impact on the overall cost to a nation.
unsurprisingly, high severity pandemics are shown to be significantly more costly than those of low severity, using a costing methodology which includes costs arising from losses to the economy due to death, in addition to intervention and healthcare costs. a key finding of this study is that at high severity categories, total pandemic costs are dominated by hospitalization costs and productivity losses due to death, while at low severities costs are dominated by productivity losses due to social distancing interventions resulting from closed schools and workplaces. consequently, findings indicate that at high severity, the interventions that are the most effective also have the lowest total cost. highly effective interventions greatly reduce the attack rate and consequently the number of deaths, which in turn reduces productivity losses due to death. although highly effective interventions incur significant intervention-related productivity losses, for severe pandemics having high cfr, these intervention costs are more than compensated for by the reduction in death-related productivity losses, resulting in lower overall costs. conversely, for low severity pandemics, although highly effective intervention measures do reduce medical costs and death-related productivity losses, these savings can be smaller than costs incurred due to intervention-related lost productivity, resulting in total costs that are higher than the unmitigated baseline. antiviral strategies alone are shown to be ineffective in reducing the attack rate by at least %. however, the addition of antiviral case treatment and household prophylaxis to any social distancing strategy always resulted in lower attack rates and lower total costs when compared to purely social distancing interventions. 
the cost of all antiviral interventions constitutes a small fraction of total pandemic costs, and these costs are outweighed by both the healthcare costs prevented, and productivity gained, by their use in preventing illness and death. it should be noted that the lowest severity category considered, pandemic category , has a cfr of . % which is at the upper end of cfr estimates for the pandemic, which has been estimated to have a cfr of between . % and . % [ ] . thus, the cost results are not directly applicable to the pandemic. vaccination has been deliberately omitted from this study. the effectiveness and cost effectiveness of vaccination will depend crucially on the timing of the availability of the vaccine relative to the arrival of the pandemic in the community; vaccination cannot be plausibly modelled without considering this delay, and how it interacts with the timing of introduction and relaxation of other, rapidly activated interventions. the examination of these timing issues for realistic pandemic scenarios that include both vaccination and social distancing / antiviral interventions is an important avenue for future work. as they stand, the results of this study, specifically the "continuous" duration social distancing strategies, can be considered to be models of interim interventions to be used prior to a vaccination campaign. the results are based on the community structure, demographics and healthcare system of a combined rural and urban australian community, and as such may not be applicable to developing world communities with different population or healthcare characteristics. although the cost and effectiveness results are directly applicable to pandemic interventions in a small community of , individuals, we expect that the per-capita costs and final attack rate percentages derived in this study can be extended to larger populations with similar demographics, provided a number of conditions are met.
for the results to be generalisable, it needs to be assumed that communities making up the larger population implement the same intervention strategies, and instigate interventions upon the arrival of the pandemic in the local community (according to the criteria described in the methods section). the assumption is also made that there are no travel restrictions between communities. it should be noted that the single-community epidemic results do not predict the overall timing of the pandemic in the larger population. the simulation model used in this study has been used in previous studies to examine various aspects of social distancing and pharmaceutical (antiviral and vaccine) pandemic influenza interventions [ , , , , , ]. this simulation model shares characteristics with other individual-based pandemic influenza simulation models that have been employed at a variety of scales, including small communities [ , , , , , ] , cities [ , ] , countries [ , , , ] and whole continents [ ] . several related studies which also used individual-based simulation models of influenza spread coupled with costing models are those of sander et al., perlroth et al., brown et al., and andradottir et al. [ , , , ] . the current study extends the scope of these studies in several ways: five gradations of pandemic severity are considered, more combinations of interventions are considered, social distancing interventions of varying durations are considered, and probabilities of severe health outcomes for each severity category are based on fatality, hospitalization and icu usage data as observed from the pandemic. also in contrast with those models, we have chosen to include a cost component arising from productivity loss due to death, though a similar costing without death-related productivity losses has been included in (additional file ).
for a pandemic with very low severity, with a cfr consistent with mild seasonal influenza, and that of the pandemic, previous results with the simulation and costing model used for this paper coincide with the studies mentioned above [ ] . specifically, they showed that antiviral treatment and prophylaxis were effective in reducing the attack rate and had a low or negative incremental cost, and that adding continual school closure further decreased attack rates, but significantly increased total cost. for high severity pandemics the inclusion of productivity loss following death, as presented in this study, leads to a markedly different assessment of total costs when compared to the two studies quoted above that considered severe pandemics [ , ] . for example, perlroth et al. found that the incremental cost of adding continuous school closure to an antiviral strategy was always positive, even for pandemics with high transmissibility (r = . ) and a cfr of up to %, meaning that adding school closure always increased total costs. similarly sander et al. found that the addition of continuous school closure to an extended antiviral strategy also increased total costs, including pandemics with a % cfr. in contrast, we found that adding continuous school closure to an extended prophylaxis strategy reduced total costs where the cfr was . % or greater (i.e. category and above), for a pandemic with r = . . the study of smith et al. estimated the economic impact of pandemic influenza on gross domestic product for a range of transmissibility and severity values [ ] . consistent with our study was the finding that at low severity the largest economic impacts of a pandemic would be due to school closure (effective but costly) and workplace absenteeism (largely ineffective and costly). like the other two studies mentioned above, the study of smith et al. did not include future productivity losses due to death. 
as a result, in contrast to our findings, they did not find that, for severe pandemics, the high short-term costs of rigorous social distancing interventions were outweighed by future productivity of people whose lives were saved by the intervention. in this study we considered the case of a pandemic that infects a significant proportion of the population, and thus incurs significant direct costs stemming from medical costs and productivity losses. however, in the case of a pandemic perceived by the public to be severe, there are likely to be additional indirect macroeconomic impacts caused by disruption of trade and tourism, consumer demand and supply, and investor confidence [ , ] . in the case of a pandemic of high severity (i.e. high case fatality ratio) but low transmissibility, these indirect effects and their resulting societal costs may constitute the main economic impact of the pandemic, an effect seen with the sars outbreak in [ ] . the results of this study are relevant to public health authorities, both in the revision of pandemic preparedness plans, and for decision-making during an emerging influenza pandemic. recent modelling research has shown that combinations of social distancing and pharmaceutical interventions may be highly effective in reducing the attack rate of a future pandemic [ , , , , , , , , ] . public health authorities are aware that rigorous social distancing measures, which were used successfully in some cities during the pandemic [ , ] , when pharmaceutical measures were unavailable, would be highly unpopular due to resulting societal disruption, and costly due to associated productivity losses [ ] . the results of this study give guidance as to the pandemic characteristics which warrant the use of such interventions. the results highlight the importance of understanding the severity of an emergent pandemic as soon as possible, as this gives guidance as to which intervention strategy to adopt. 
in the likely situation where the severity of an emerging pandemic is initially unknown (but is suspected to be greater than that of seasonal influenza), the results indicate that the most appropriate intervention strategy is to instigate school closure and community contact reduction, combined with antiviral drug treatment and household prophylaxis, as soon as transmission has been confirmed in the community. if severity is determined to be low, public health authorities may consider relaxing social distancing measures. in the case of a category pandemic (cfr approximately . %), little is lost by the early imposition and subsequent relaxation of social distancing interventions: results indicate that even if schools are closed for weeks while severity is being determined, the total cost of the pandemic is lower than if no interventions had been enacted. if severity is determined to be high, extending the duration of social distancing interventions results in both net savings to society and reduction in mortality. 
anzic influenza investigators: critical care services and h n influenza in australia and new zealand europe's initial experience with pandemic (h n ) -mitigation and delaying policies and practices mortality from pandemic a/h n influenza in england: public health surveillance study analysis of the effectiveness of interventions used during the h n influenza pandemic developing guidelines for school closure interventions to be used during a future influenza pandemic mitigation strategies for pandemic influenza in the united states targeted social distancing design for pandemic influenza reducing the impact of the next influenza pandemic using household-based public health interventions effective, robust design of community mitigation for pandemic influenza: a systematic examination of proposed us guidance a small community model for the transmission of infectious diseases: comparison of school closure as an intervention in individual-based models of an influenza pandemic simulation suggests that rapid activation of social distancing can arrest epidemic development due to a novel strain of influenza school closure is currently the main strategy to mitigate influenza a(h n )v: a modelling study modelling mitigation strategies for pandemic (h n ) nature outlook: influenza airborne transmission of influenza a/h n virus between ferrets experimental adaptation of an influenza h ha confers respiratory droplet transmission to a reassortant h ha/h n virus in ferrets the potential for respiratory droplet-transmissible a/h n influenza virus to evolve in a mammalian host statistics of influenza morbidity with special reference to certain factors in case incidence and case fatality emerging infections: pandemic influenza australian census a bayesian mcmc approach to study transmission of influenza: application to household longitudinal data estimating the impact of school closure on influenza transmission from sentinel data strategies for mitigating an influenza pandemic 
papoulis a: probability, random variables and stochastic processes tecumseh study of illness xiii. influenza infection and disease, - pandemic potential of a strain of influenza a (h n ): early findings pandemic (h n ) influenza community transmission was established in one australian state when the virus was first identified in north america estimating the reproduction number of the novel influenza a virus (h n ) in a southern hemisphere setting: preliminary estimate in new zealand epidemiological and transmissibility analysis of influenza a (h n ) v in a southern hemisphere setting: peru time lines of infection and disease in human influenza: a review of volunteer challenge studies early transmission characteristics of influenza a (h n ) v in australia: victorian state serial intervals and the temporal distribution of secondary infections within households of pandemic influenza a (h n ): implications for influenza control recommendations influenzavirus infections in seattle families, - containing pandemic influenza at the source design and evaluation of prophylactic interventions using infectious disease incidence data from close contact groups temporal factors in school closure policy for mitigating the spread of influenza the impact of case diagnosis coverage and diagnosis delays on the effectiveness of antiviral strategies in mitigating pandemic influenza a/h n containing pandemic influenza with antiviral agents management of influenza in households: a prospective, randomized comparison of oseltamivir treatment with or without postexposure prophylaxis neuraminidase inhibitors for influenza simulating school closure strategies to mitigate an influenza epidemic interim pre-pandemic planning guidance: community strategy for pandemic influenza mitigation in the united states cost-effectiveness of pharmaceutical-based pandemic influenza mitigation strategies health outcomes and costs of community mitigation strategies for an influenza pandemic in the united states 
economic evaluation of influenza pandemic mitigation strategies in the united states using a stochastic microsimulation transmission model direct medical cost of influenza-related hospitalizations in children responding to pandemic (h n ) influenza: the role of oseltamivir world health organization: making choices in health: who guide to cost-effectiveness analysis. geneva: world health organization the severity of pandemic h n influenza in the united states estimated global mortality associated with the first months of pandemic influenza a h n virus circulation: a modelling study cost-effective strategies for mitigating a future influenza pandemic with h n characteristics vaccination against pandemic influenza a/h n v in england: a real-time economic evaluation usa historical consumer price index economics of neuraminidase inhibitor stockpiling for pandemic influenza economic analysis of pandemic influenza vaccination strategies in singapore reactive strategies for containing developing outbreaks of pandemic influenza measures against transmission of pandemic h n influenza in japan in : simulation model who: report of the review committee on the functioning of the international health regulations ( ) in relation to pancemic (h n ) . geneva: world health organization an influenza simulation model for immunization studies individual--based computational modeling of smallpox epidemic control strategies modeling targeted layered containment of an influenza pandemic in the united states mitigation measures for pandemic influenza in italy: an individual based model considering different scenarios the role of population heterogeneity and human mobility in the spread of pandemic influenza would school closure for the h n influenza epidemic have been worth the cost?: a computational simulation of pennsylvania the economy-wide impact of pandemic influenza on the uk: a computable general equilibrium modelling experiment globalization and disease: the case of sars. 
asian economic papers global macroeconomic consequences of pandemic influenza. sydney australia: lowy institute for international policy the effect of public health measures on the influenza pandemic in u.s. cities quantifying social distancing arising from pandemic influenza economic analysis of pandemic influenza mitigation strategies for five pandemic severity categories

additional file : additional results and sensitivity analyses. "milne pandemiccostadditionalfile .doc".

competing interests gjm has received a travel grant from glaxosmithkline to attend an expert meeting in boston, usa; mjp has received travel grants from glaxosmithkline and wyeth to attend expert meetings in reykjavik, iceland, boston, usa and istanbul, turkey. jkk and nh have no potential competing interests.

key: cord- -jdizyzbl authors: bertschinger, nils title: visual explanation of country specific differences in covid- dynamics date: - - journal: nan doi: nan sha: doc_id: cord_uid: jdizyzbl

this report provides a visual examination of covid- case and death data. in particular, it shows that country specific differences can to a large extent be explained by two easily interpreted parameters, namely the delay between reported cases and deaths and the fraction of cases observed. furthermore, this allows one to lower-bound the actual total number of people already infected.

the unfolding covid- pandemic requires timely and finessed actions. policy makers around the globe are hard pressed to balance mitigation measures such as social distancing and economic interests. while initial studies [ ] predicted millions of potential deaths, newer findings hint at a much more modest outcome [ , ]. especially the case fatality rate (cfr) and the number of unobserved infections are crucial to judge the state of the pandemic as well as the effectiveness of its mitigation.
yet, their estimates are plagued with high uncertainties, as exemplified by the quick revisions even from the same institution [ , ]. most studies are based on elaborate epidemic modeling using either stochastic or deterministic transmission dynamics. in particular, the susceptible-infected-recovered (sir) model [ ] forms a basic building block and has been extended in several directions in order to understand the dynamics of the ongoing covid- pandemic [ , , , ]. in this context, it has not only been compared with more phenomenological growth models [ ], e.g. logistic growth, but also been used to quantify the effectiveness of quarantine and social distancing [ , ]. mitigation measures, e.g. social distancing, can be easily included by replacing the infection rate parameter with a function allowing it to change over time. [ ] assumes one or several (soft) step functions where the infection rate drops in response to different measures after these had been implemented. such detailed modeling is required in order to capture and forecast the temporal dynamics of the epidemic spreading. yet, substantial care is needed as to which parameters can be learned from the data and which cannot. indeed, i show here that sir type models, and others exhibiting similarly flexible growth dynamics, are non-identified with respect to the cfr and the fraction of observed infections. instead, a direct visual exploration of the data leads to valuable insights in this regard. in particular, much of the variability relating reported case and death counts can be explained by two easily interpreted parameters. furthermore, based on three simple assumptions, a lower bound on the number of actual infections, including observed and unobserved cases, can be obtained. this in turn confirms recent estimates without the need for complex and possibly questionable modeling choices.

covid- data are published by several sources, most notably the johns hopkins university and the european centre for disease prevention and control (ecdc).
here, data from ecdc as available from https://opendata.ecdc.europa.eu/covid /casedistribution/csv are used. figure shows the total cumulative case and death counts of selected countries. these countries are among the eight most affected countries in terms of absolute and relative deaths. in the following, i will focus on relative counts as these are arguably more meaningful when comparing different countries, which could differ widely in terms of population size.

assumption . death counts are more reliable than case counts.

by assumption , the analysis will start from relative cumulative death counts d t in the following. furthermore, in order to facilitate country comparisons, dates are shifted relative to the first day that relative death counts exceed a threshold θ of , , or deaths per million inhabitants respectively, i.e. t = 0 is defined such that d t ≥ θ for t ≥ 0 and d t < θ for t < 0. figure shows the resulting time course of relative case and death counts. aligning dates in this fashion shows that several countries exhibit similar time courses, e.g. belgium and spain or china and south korea. as shown in the supplementary figure s , the remaining country specific differences can be explained by differences in growth rates. re-scaling time according to the estimated doubling time indeed leads to a data collapse as complete as often observed in physical systems exhibiting scaling laws [ ]. here, these differences in the precise temporal dynamics of epidemic growth are not required. instead, the relation between relative death and case counts is considered. while relative death counts exhibit similar time courses, the corresponding relative case counts c t are more variable when aligned in the same fashion, i.e. relative to the first day that d t exceeds a given threshold. as i will argue now, most of this variability can be explained with two readily interpretable parameters.

assumption . there is a well defined country specific delay between reported cases and deaths.
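the date alignment described above can be sketched in a few lines. this is only an illustrative sketch, not the author's code: the function names and the list-based input format (one cumulative value per day, in deaths or cases per million inhabitants) are assumptions made for the example.

```python
from typing import List, Optional, Tuple


def first_day_at_threshold(deaths_per_million: List[float], theta: float) -> Optional[int]:
    """Index of the first day on which cumulative deaths per million reach theta.

    Cumulative counts never decrease, so every later day also satisfies
    d_t >= theta and every earlier day satisfies d_t < theta.
    """
    for day, value in enumerate(deaths_per_million):
        if value >= theta:
            return day
    return None  # the threshold has not been reached yet


def align_to_threshold(
    series: List[float], deaths_per_million: List[float], theta: float
) -> List[Tuple[int, float]]:
    """Re-index a daily series so that relative day 0 is the threshold crossing.

    The same shift is applied to cases and deaths, because the alignment
    is based on the death counts alone.
    """
    t0 = first_day_at_threshold(deaths_per_million, theta)
    if t0 is None:
        return []
    return [(day - t0, value) for day, value in enumerate(series)]
```

applying `align_to_threshold` to both the case and the death series of each country, with the same `theta`, gives curves that can be overlaid on a common relative time axis.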
figure : estimated cfr cfr τ for germany (left) and italy (right) using different delays of τ = , . . . , days, plotted against relative days since one death per million. note that in each case, there exists a characteristic delay such that estimates are almost constant over time. further note that estimates for all delays will eventually converge to the same final value when enough data are available.

figure suggests that relative case counts are not aligned, as some countries, e.g. germany, systematically lead the counts reported in other countries, e.g. italy. such a difference could mean that individuals survive longer, e.g. due to differences in medical care, until they eventually die. it could also just reflect reporting delays due to bureaucratic reasons. in any case, individuals clearly do not die immediately, but some days after they have tested positive. this delay also needs to be taken into account when estimating the case fatality rate (cfr). commonly the cfr is defined as $\mathrm{cfr} = d_t / c_t$. not surprisingly, this estimate is highly variable and changes systematically over time, especially at the beginning of an epidemic. the observation captured in assumption also explains the surprisingly low cfrs initially announced in austria and germany, where reported death counts are simply some days older compared to other countries! thus, taking into account that individuals that had been tested positive will usually not die on the same day but after some delay τ (if at all), i define $\mathrm{cfr}_\tau = d_t / c_{t-\tau}$, i.e. comparing current death counts with previous case counts. figure shows the cfrs estimated for germany and italy in this fashion, i.e. for different delays τ. the estimate using τ = rises over time, simply reflecting that due to the reporting delay death counts have not yet caught up with the exponentially growing case counts.
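as a small illustration of the delayed estimate (current deaths divided by the case count τ days earlier), the following sketch computes it for one country. the function name and the plain-list input are assumptions for the example; the numbers in the usage lines are synthetic and not taken from the ecdc data.

```python
from typing import List, Tuple


def delayed_cfr(deaths: List[float], cases: List[float], tau: int) -> List[Tuple[int, float]]:
    """Delayed CFR estimate: deaths on day t divided by cases on day t - tau.

    Both inputs are cumulative daily series on the same time axis.
    Days without a case count tau days earlier, or with zero cases, are skipped.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    estimates = []
    for t in range(tau, len(deaths)):
        earlier_cases = cases[t - tau]
        if earlier_cases > 0:
            estimates.append((t, deaths[t] / earlier_cases))
    return estimates


# Synthetic example: deaths follow cases with a two-day lag and a 2% fatality rate.
cases = [10, 20, 40, 80, 160, 320]
deaths = [0, 0, 0.2, 0.4, 0.8, 1.6]
for day, value in delayed_cfr(deaths, cases, tau=2):
    print(day, round(value, 4))
```

plotting the output for a range of `tau` values against time gives curves of the kind described for germany and italy above.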
interestingly, for each country there exists a characteristic delay at which the estimated cfrs are essentially constant, thus reflecting the hypothesized delay between reported cases and deaths. this delay can either be estimated by visual inspection or by fitting a linear model on each delay and picking the one with minimal absolute slope. figure shows the delays τ and corresponding cfrs cfr τ, i.e. the median cfr value at this delay, estimated for each country in this fashion. in order to fully relate the observed case counts with the death counts, an additional, and stronger, assumption is needed.

assumption . the true case fatality rate is the same for all countries.

while assumption ignores medical, demographic and other differences between countries, i believe it unlikely that the cfr is very different across different countries. in the end, it is the same type of virus spreading in all countries. this suggests that differences in estimated cfrs simply reflect differences in the ability of countries to actually observe all infected individuals, i.e. due to more or less effective tracking and testing procedures. to illustrate this effect, a true cfr of % is assumed in the following. this is consistent with current knowledge and had also been used in other studies [ ]. just from the estimated values, any cfr below the minimum of all estimates (about % found for austria and south korea) and above . % (which would imply an observed fraction above one for belgium) is compatible with the data. figure shows the country specific estimates of reporting delay, cfr and fraction of observed cases (assuming a true cfr of %) obtained in this fashion. in turn, figure shows the implied relative case counts when shifted by the estimated delays and scaled to reflect the unobserved fraction of cases for each country. notably, these implied counts all align nearly as well as the death counts in figure (right panel) even though the initial threshold was based on the death counts alone.
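the delay selection described above (fit a line to the delayed cfr estimates for each candidate delay, keep the delay with the flattest line, and report the median cfr at that delay) can be sketched as follows. this is a hedged sketch: the helper names, the ordinary least-squares fit and the range of candidate delays are assumptions for the example. the observed fraction follows from the stated assumption of a common true cfr, since an estimated cfr of true cfr divided by the observed fraction is what results when only a fraction of the infections is counted as cases.

```python
from statistics import median
from typing import List, Tuple


def delayed_cfr_values(deaths: List[float], cases: List[float], tau: int) -> Tuple[List[int], List[float]]:
    """Days and delayed CFR estimates deaths[t] / cases[t - tau]."""
    days, values = [], []
    for t in range(tau, len(deaths)):
        if cases[t - tau] > 0:
            days.append(t)
            values.append(deaths[t] / cases[t - tau])
    return days, values


def ols_slope(xs: List[float], ys: List[float]) -> float:
    """Slope of the least-squares line through the points (xs, ys)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def characteristic_delay(deaths: List[float], cases: List[float], max_tau: int) -> Tuple[int, float]:
    """Delay with minimal absolute slope, and the median CFR estimate at that delay."""
    best = None
    for tau in range(max_tau + 1):
        days, values = delayed_cfr_values(deaths, cases, tau)
        if len(values) < 3:  # too few points for a meaningful line fit
            continue
        slope = abs(ols_slope(days, values))
        if best is None or slope < best[1]:
            best = (tau, slope, median(values))
    if best is None:
        raise ValueError("not enough data for any candidate delay")
    return best[0], best[2]


def observed_fraction(true_cfr: float, estimated_cfr: float) -> float:
    """Fraction of infections observed as cases, given an assumed true CFR."""
    return true_cfr / estimated_cfr
```

an `observed_fraction` above one signals that the assumed true cfr is too high for that country, which is the compatibility check mentioned in the text.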
the supplementary figure s shows that this holds also when re-scaling time according to the growth rate of deaths. overall, the collapse of implied case dynamics convincingly illustrates that the relation between case and death counts is fully and reliably captured by two parameters, compatible with three reasonable assumptions. in reality, an additional delay between an infection and its corresponding positive test result can be assumed. therefore, the fraction of observed cases will be even lower than obtained by the analysis above. unfortunately, under a sufficiently flexible model for the growth of the actual cases, even the cfr and the fraction of observed cases, let alone an additional delay, are not jointly identifiable.

the basic sir model [ ] assumes that an infection unfolds when susceptible (s) individuals become infected (i), which in turn infect further susceptible individuals. finally, infected individuals recover (r) (or die) and are no longer susceptible. in continuous time, the dynamics can be described by the following system of ordinary differential equations (odes):

$$\frac{ds_t}{dt} = -\beta \frac{s_t i_t}{n}, \qquad \frac{di_t}{dt} = \beta \frac{s_t i_t}{n} - \gamma i_t, \qquad \frac{dr_t}{dt} = \gamma i_t$$

where $n \equiv s_t + i_t + r_t$ is constant over time. model parameters are
• the infection rate β
• and the recovery rate γ.

in this model, the average time of infection is $\gamma^{-1}$, giving rise to a basic reproduction number of $R_0 = \beta\gamma^{-1}$. sir models and extensions are widely used in epidemic modeling. they have also been applied to understand the dynamics of the ongoing covid- pandemic [ , , , ]. in particular, models including the possibility of unobserved cases or including a reporting delay have been developed. within the sir framework, both effects can be included in several ways, most easily by assuming that observed cumulative infections are simply a fraction α ∈ [0, 1] of previous total infections i t + r t, i.e. α(i t−τ + r t−τ).
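to make the basic sir dynamics and the simple observation model concrete, here is a minimal simulation sketch. it uses a fixed-step forward euler scheme, which is a choice made only for this example (the source does not say how the odes were integrated), and all parameter values in the usage lines are illustrative assumptions.

```python
from typing import List, Tuple


def simulate_sir(
    beta: float, gamma: float, n: float, i0: float, days: int, dt: float = 0.1
) -> List[Tuple[float, float, float]]:
    """Forward-Euler integration of the basic SIR model.

    Returns one (s, i, r) tuple per day, starting with the initial state.
    """
    s, i, r = n - i0, i0, 0.0
    trajectory = [(s, i, r)]
    steps_per_day = int(round(1.0 / dt))
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


def observed_cumulative(
    trajectory: List[Tuple[float, float, float]], alpha: float, tau: int
) -> List[float]:
    """Observed cumulative cases: a fraction alpha of total infections tau days earlier."""
    observed = []
    for t in range(len(trajectory)):
        if t < tau:
            observed.append(0.0)  # nothing has been reported yet
        else:
            _, i, r = trajectory[t - tau]
            observed.append(alpha * (i + r))
    return observed


# Illustrative values only: R0 = beta / gamma = 3 in a population of one million.
traj = simulate_sir(beta=0.3, gamma=0.1, n=1_000_000, i0=10, days=200)
obs = observed_cumulative(traj, alpha=0.2, tau=5)
print(round(traj[-1][0] / 1_000_000, 3), round(obs[-1]))
```

the first printed value is the fraction of the population still susceptible at the end of the run, the second the number of cases that would have been observed under the assumed `alpha` and `tau`.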
a more elaborate attempt instead considers more detailed dynamics of the form where a fraction α of infected individuals i t is observed (o t ) after an initial delay γ i . in any case, whether observed or not, individuals recover (or die) after an additional delay. in general, the infection rates β i , β o , β u could be different for initial infections and observed vs unobserved cases. in addition, mitigation measures, e.g. social distancing, can be easily included by assuming that the β's are functions of time. e.g. αγ i i t . now assume a second model with a larger observed fraction α′ > α which nevertheless exhibits the same dynamics with an additional time shift τ. by using a time varying β′(t) such that we obtain exactly the same number of observed cases, i.e. o′ t−τ = o t . note that as α′ > α, we have that s t < s′ t−τ and s t is a sigmoidal function of time due to the sir dynamics. furthermore, when the population is large, i.e. n and s ≈ n, the resulting β′(t) is mostly driven by the drop in s t+τ as compared to the much smaller change in s′ t . indeed, figure shows the dynamics of the above model with β = . , γ i = γ r = , α = . starting from (n = , , , , ). in turn, assuming α′ = and τ = , the time varying infectivity β′(t) is approximated by the best-fitting logistic sigmoid of the form β + (β − β )σ( t−τ t ). note that the number of observed cases is identical, just shifted by τ, whereas the final fraction of susceptible individuals is vastly different. indeed, in the first case the epidemic is stopped by group immunity whereas in the second case effective mitigation measures are imposed. correspondingly, policy implications would be vastly different in the two situations even though they are observationally indistinguishable.

instead of detailed modeling of epidemic dynamics, which is further complicated by policy actions requiring flexible models with delicately chosen parameters, the present analysis is based on visual inspection of the reported data.
overall, relative case and death counts (observed for country c) seem to be related as follows: where a c t denotes the actual infections, of which a fraction α c ∈ [0, 1] is observed. a suitable reporting delay τ c can be estimated by visual inspection of the data, but again the fraction of observed cases α c and the cfr are not jointly identifiable if there exist sets of parameters such that a t−τ = αa t , as is the case for dynamic sir type models. in the end, any epidemic modeling implicitly or explicitly chooses a parametric form for the latent growth process a t and will not be identified if sufficiently flexible. yet, assumption three of a constant cfr across all countries allows one to derive (i) a range of values consistent among all countries, and (ii) the corresponding fraction of observed cases in each country. thereby, assuming a reasonable true cfr value, i.e. from the model implied range . % to %, which is also consistent with current knowledge, and using the estimated delay, the actual case numbers can be reconstructed. figure shows the resulting actual relative infection counts across several countries. note that despite the simplicity of this analysis, the estimated numbers compare favorably with [ ]. indeed, i would rather trust these even more as they do not rely on complex modeling assumptions but follow from visual inspection of the data.

overall, i have illustrated that much of the variability between observed case and death counts between different countries can be explained by two parameters, namely the reporting delay τ and the fraction of observed cases. especially the reporting delay exhibits crucial differences between countries and needs to be taken into account when comparing data and planning actions. in particular, containment is challenging when long incubation times are involved [ ], but a combination of case tracing and isolation policies could be effective [ , ].
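the reconstruction of actual infection counts can be illustrated with a short sketch. it rests on one reading of the relations described above, which is an assumption of this example rather than a formula quoted from the source: observed cases are a fraction α of actual infections on the same day, and deaths are the true cfr times the actual infections τ days earlier. function names and the synthetic numbers are likewise hypothetical.

```python
from typing import List, Tuple


def implied_infections_from_cases(
    cases: List[float], alpha: float, tau: int
) -> List[Tuple[int, float]]:
    """Scale observed cases by 1 / alpha and shift them forward by tau days.

    After the shift, the series is on the same time axis as the deaths
    that these infections are assumed to produce.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    return [(t + tau, c / alpha) for t, c in enumerate(cases)]


def implied_infections_from_deaths(deaths: List[float], true_cfr: float) -> List[Tuple[int, float]]:
    """Infections (tau days earlier) implied by the death counts and an assumed true CFR."""
    return [(t, d / true_cfr) for t, d in enumerate(deaths)]


# Synthetic check: a quarter of infections observed, one-day delay, 1% true CFR.
actual = [100, 200, 400, 800, 1600]
cases = [0.25 * a for a in actual]
deaths = [0.0] + [0.01 * a for a in actual[:-1]]
print(implied_infections_from_cases(cases, alpha=0.25, tau=1)[:2])
print(implied_infections_from_deaths(deaths, true_cfr=0.01)[1:3])
```

when the assumed delay and observed fraction fit the data, the two implied series agree on overlapping days, which is the alignment seen in the adjusted case curves.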
thus, detailed epidemic modeling is certainly needed in order to judge the effectiveness of current mitigation measures across different countries [ , ]. on the other hand, important parameters need to be fixed based on additional knowledge as they cannot be identified within sufficiently flexible models. in the end, data analysis and detailed modeling alone can only get us so far, and more extensive testing is urgently needed to obtain reliable knowledge about the current progression of the covid- pandemic.

figure s : aligned data as in figure , but time is additionally re-scaled to match local growth rate of the epidemics.

a data collapse by re-scaling time

aligning the data as in figure still shows country-specific differences in the temporal course of epidemic spreading. much of this difference can be attributed to the speed at which the epidemic spreads in different countries. estimating the local growth rate of deaths $\frac{d \log d_t}{dt}$ by the three day running average of observed changes $\log d_{t+1} - \log d_t$, relative time, i.e. relative to the threshold of total deaths reached, is re-scaled to match local growth rates. figure s shows the resulting data collapse for d t and the corresponding c t dynamics. further, taking the estimated relation between cases and deaths via cfr and country specific delays into account, an almost complete data collapse for the cases is obtained. note that, as in the main text, data are aligned according to relative death counts only. furthermore, the temporal re-scaling is based on the estimated growth rate from the death counts as well. yet, shifting and scaling case data according to the estimated country specific delay and fraction of observed cases leads to an almost complete data collapse as well.

as individual countries can be hard to identify in figures and , the ny times featured panel views where each country is highlighted above a background of all countries.
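the growth-rate estimate used for the time re-scaling in the appendix above (a running average of daily changes in log deaths) can be sketched as follows. whether the three-day window is trailing or centred is not stated in the text, so the trailing window here is an assumption, as are the function names.

```python
import math
from typing import List


def local_growth_rate(deaths: List[float], window: int = 3) -> List[float]:
    """Running average of daily log-differences log d[t+1] - log d[t].

    Uses a trailing window of up to `window` days; all counts must be positive.
    """
    diffs = [math.log(deaths[t + 1]) - math.log(deaths[t]) for t in range(len(deaths) - 1)]
    rates = []
    for k in range(len(diffs)):
        chunk = diffs[max(0, k - window + 1): k + 1]
        rates.append(sum(chunk) / len(chunk))
    return rates


def doubling_time(rate: float) -> float:
    """Doubling time in days implied by a per-day exponential growth rate."""
    return math.log(2) / rate
```

re-scaling relative time by the estimated doubling time then puts countries with different epidemic speeds on a comparable axis.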
here, i provide similar figures for relative death and case counts using a threshold of two deaths per million inhabitants.

figure s : details of aligned and adjusted case counts for threshold of two deaths per million (x-axis: relative days since two deaths per million; y-axis: relative count).

note that an sir model already includes a natural delay between infections and recovery (or death). indeed, the total number of cases is given by c t = i t + r t while the cumulative death toll is obtained as cfr · r t , i.e. modeling that a fraction of individuals does not recover but dies instead. assuming that only a fraction α of cases is observed, the model is estimated with the following sampling distribution. thus, observed daily changes are related to the model-implied changes via an over-dispersed poisson, also known as negative binomial, distribution. figure s shows the resulting estimates assuming β t = β + (β − β )σ( t−τ t ) and cfr = %. the sir model assuming a single change point in the infectivity, via the logistic sigmoid σ(·) in β t reflecting the implementation of social distancing, is clearly able to capture the epidemic dynamics. yet, parameter uncertainties, especially about the reporting delay, can be large. bayesian estimates have been carried out using stan (full code available from my https://github.com/bertschi/covid repository) and using weakly informative broad normal or student-t prior distributions on all parameters. due to the non-identifiability derived in the main text, either α or the cfr needs to be fixed. the high uncertainty could also reflect that the sir dynamics are misspecified in that they correspond to an exponential delay distribution. such additional model assumptions need to be carefully chosen in order to obtain meaningful parameter estimates.

figure s : model predictions and estimated parameters from sir model fitted to data from italy (top) and germany (bottom).

mitigation and herd immunity strategy for covid- is likely to fail.
medrxiv inferring covid- spreading rates and potential change points for case number forecasts impact of nonpharmaceutical interventions (npis) to reduce covid- mortality and healthcare demand estimating the number of infections and the impact of non factors that make an infectious disease outbreak controllable a retrospective bayesian model for measuring covariate effects on observed covid- test and case counts substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov ) fundamental principles of epidemic spread highlight the immediate need for large-scale serological surveys to assess the stage of the sars-cov- epidemic. medrxiv effective containment explains subexponential growth in confirmed cases of recent covid- outbreak in mainland china scaling, universality, and renormalization: three pillars of modern critical phenomena rational evaluation of various epidemic models based on the covid- data of china. medrxiv modeling the epidemic dynamics and control of covid- outbreak in china. medrxiv key: cord- - tjhdczt authors: green, manfred s.; peer, victoria; schwartz, naama; nitzan, dorit title: the confounded crude case-fatality rates (cfr) for covid- hide more than they reveal—a comparison of age-specific and age-adjusted cfrs between seven countries date: - - journal: plos one doi: . /journal.pone. sha: doc_id: cord_uid: tjhdczt background: crude case-fatality rates (cfrs) for covid- vary widely between countries. there are serious limitations in the cfrs when making comparisons. we examined how the age distribution of the cases is responsible for the covid- cfr differences between countries. methods: covid- cases and deaths, by ten-year age-groups, were available from the reports of seven countries. the overall and age-specific cfrs were computed for each country. 
the age-adjusted cfrs were computed by the direct method, using the combined number of cases in all seven countries in each age group as the standard population. a meta-analytic approach was used to obtain pooled age-specific cfrs. findings: the crude overall cfrs varied between . % and . % in the seven countries and the variation in the age-specific cfrs was much smaller. there was wide variation in the age distribution of the cases between countries. the ratio of the highest to the lowest country cfr was ( . ) for the crude cfrs and much lower ( . ) for the age-adjusted cfrs. conclusions: the age structure of the cases explains much of the difference in the crude cfrs between countries, and adjusting for age substantially reduces this variation. other factors such as the definition of cases, coding of deaths and the standard of healthcare are likely to account for much of the residual variation. comparing the crude covid- cfrs between countries is misleading and should be avoided. at the very least, age-specific and age-adjusted cfrs should be used for comparisons.

, who declared covid- as a pandemic. by october, , almost all countries had been affected, and globally there were reports of more than million cases and more than a million deaths. early estimates indicated that the average proportion of deaths among the diagnosed cases, defined as the case-fatality rate (cfr), was around . % [ ]. however, subsequently, the reported crude covid- cfrs varied widely between countries [ , ]. the limitations of comparing crude cfrs in general have been evaluated previously [ ]. the strong positive association between the covid- cfr and age has been demonstrated both in observational studies [ ] and in a model-based analysis [ ], particularly over the age of . the interpretation of the cfr depends on the context in which it is used. in a single cohort of patients, it can indicate the severity of the disease at a single point in time.
it can also be used to assess trends in the impact of changes in health care over time. in the current context of the covid- pandemic, cfrs are commonly compared between countries, and that may lead to speculation on differences in healthcare. for example, it may be concluded that the cfrs somehow reflect the successes and failures of the different countries in the treatment of serious cases. in general, a number of factors could affect both the numerator and the denominator of the cfr. this is especially important for the cfr for covid- . there could be misclassification of causes of death, and there are likely to be variations in the definition of cases in the denominator. substantial confounding by age could affect cfr comparisons between different groups. a related concept is the infection fatality rate (ifr), which includes asymptomatic cases in the denominator; these cases need to be identified by screening tests. since the ifr is rarely available for covid- , in this paper we consider only the cfr. we examined the contribution of the age distribution of the cases when comparing the covid- cfrs between seven countries with widely varying cfrs. we studied published crude and age-specific cfrs in cohorts of cases of covid- in seven countries, with varying periods of follow-up. the countries were chosen based on the accessibility of the data. the first case of covid- in israel was confirmed on february . the first case in south korea was announced on january . covid- spread from hubei province, china, after december . sars-cov- was confirmed to have reached spain, sweden, italy, and canada at the end of january . the data on cases and deaths by age group were available from china on february , . for south korea, cases and deaths were updated to the end of april. for spain, the cases and deaths were updated to mid-may . the information for israel, italy, canada and sweden was updated during august .
the data for each country were not necessarily updated to the time of the study, and the cfrs may have changed over time. age-specific data on the cases and deaths by ten-year age groups ( - , - , ... +) were available for seven countries: italy [ ] , spain [ ], sweden [ ] , china [ ] , s korea [ ] , israel (ministry of health, personal communication) and canada [ ] . the outcome variable was the crude cfr, defined as the number of deaths divided by the number of reported cases. the exposure variable was the individual country. age group was considered as a confounding variable. age-adjustment was carried out by the direct method, using the distribution of the combined cases of all seven countries by age group as the standard population. % confidence intervals were computed for each age-adjusted rate using winpepi [version . , aug, ]. open-access, aggregated and anonymous data were used, so there was no need for ethics committee approval. the age-specific number of cases, number of deaths and the crude cfrs by country are given in table . the distributions of the cases vary markedly between the countries. for example, israel and south korea are heavily weighted in the - age group, china has a more balanced distribution and italy, sweden and canada are heavily weighted in the over age groups. the distributions of the cases for each country are shown in fig . it is clear that the distributions vary widely and are not necessarily related to the age distribution of the population of the country. for example, for south korea, there is a relatively large number of cases in the age group - , due to an outbreak affecting that age group in particular. the age-specific cfrs are shown in fig . while there are differences in the age-specific cfrs between countries, the trend of steeply increasing cfrs in the oldest age groups is evident. age groups - and - were excluded from the figure, since there were almost no deaths.
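the direct method of age-adjustment described above can be illustrated with a minimal python sketch. all counts below are hypothetical and the function names (`crude_cfr`, `age_adjusted_cfr`) are ours, not taken from the paper or from winpepi; the sketch only shows the arithmetic of weighting each age-specific cfr by the share of that age group in a standard population built from the combined cases.

```python
from typing import List


def crude_cfr(cases: List[int], deaths: List[int]) -> float:
    """Crude CFR: total deaths divided by total reported cases."""
    return sum(deaths) / sum(cases)


def age_adjusted_cfr(cases: List[int], deaths: List[int],
                     standard_cases: List[int]) -> float:
    """Direct standardization: weight each age-specific CFR by the
    share of that age group in the standard population of cases."""
    total_standard = sum(standard_cases)
    adjusted = 0.0
    for c, d, s in zip(cases, deaths, standard_cases):
        age_specific = d / c if c > 0 else 0.0
        adjusted += age_specific * s / total_standard
    return adjusted


# Hypothetical counts for two countries and three age bands
# (young, middle, old). Age-specific CFRs are identical by design;
# only the age distribution of the cases differs.
countries = {
    "A": {"cases": [800, 150, 50], "deaths": [0, 3, 10]},
    "B": {"cases": [100, 300, 600], "deaths": [0, 6, 120]},
}

# Standard population: combined cases of all countries per age band.
standard = [sum(c["cases"][i] for c in countries.values()) for i in range(3)]

crude = {k: crude_cfr(v["cases"], v["deaths"]) for k, v in countries.items()}
adjusted = {k: age_adjusted_cfr(v["cases"], v["deaths"], standard)
            for k, v in countries.items()}

for name in countries:
    print(name, "crude:", round(crude[name], 4),
          "age-adjusted:", round(adjusted[name], 4))
```

in this constructed example the crude cfrs differ almost tenfold although the age-specific cfrs are the same, and the adjusted cfrs coincide; real data would of course retain some residual variation.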
the crude (%) and age-adjusted cfrs (%) are compared in table . the crude cfrs varied from . % for israel to . % for italy, and the ratios of each crude cfr to the lowest crude cfr varied between . and . . the age-adjusted cfrs varied from . % for china to . % for italy, and the ratios of each age-adjusted cfr to the lowest age-adjusted cfr varied between . and . . fig shows graphically how markedly the differences between the seven countries shrink when moving from the crude (%) to the age-adjusted cfrs (%). we used meta-analytic methods to obtain weighted pooled estimates of the cfrs by age group. the results of the meta-analysis are presented in the forest plot in fig . in our study, we examined crude, age-specific and age-adjusted cfrs for covid- in seven countries with widely varying crude cfrs. the trends in the age-specific cfrs were remarkably similar in the seven countries, with the cfrs increasing steeply in those over . after adjusting for age, the marked differences in the crude cfrs were substantially reduced. these findings demonstrate the importance of accounting for age when comparing rates in general and cfrs in particular. the results of this study are strengthened by the use of national data or large datasets from a number of countries, with considerable differences in the extent of the pandemic in each country. it should be stressed that the age distribution of the cases was used to compute the age-adjusted cfrs, not the age distribution of the total population in each country. in addition to the age distribution of the cases, the use of the cfr for comparisons between countries has other important limitations. selection bias is clearly present when calculating the denominator on the basis of reported cases. as mentioned, the cfr must be distinguished from the ifr, which includes asymptomatic cases identified by deliberate or incidental screening with diagnostic tests.
in addition, if only those with more severe symptoms are tested, this will affect the denominator of the cfr, and the effect will depend on the testing strategy of each country. if more mild cases are identified, this is likely to reduce the cfr. there is a lag between the reporting of a case and the death, which can occur up to weeks later. in the country reports, cases and deaths are usually reported at the same time, so the cases in the denominator are usually an overestimate of the true denominator, which should be the number of cases reported some time earlier [ , ] . this will have a more dramatic effect when the number of cases is rising rapidly. selection bias may also affect the numerator if only deaths occurring in hospital are reported. information bias can be present in both the numerator and denominator of the cfr. the definition of the cases may be biased due to the variability of the sensitivity and specificity of the diagnostic tests for covid- . information bias in the numerator can occur when the cause of death is coded. this could be particularly problematic in elderly people with multiple co-morbidities. the purpose of this paper is to demonstrate the dramatic effect of confounding by the age distribution of the cases when using crude overall cfrs for country comparisons. this was shown in an earlier paper when comparing six countries, and we have extended it to a comparison of seven countries with widely different cfrs. the age structures of the populations of the seven countries used in this study vary markedly. the percentage of the population aged and over is % in israel, % in italy, . % in s korea, . % in spain, % in sweden, % in china and . % in canada [ ] . however, the main impact of confounding by age was due to the differences in the age distribution of the cases. this was largely due to the specific circumstances of exposure. for example, most of the cases in italy occurred in an area with a particularly old population [ ] .
in some countries, many of the cases were medical personnel, a large number of whom were relatively young women [ ] . in south korea, a large percentage of the cases were young women associated with a specific religious group [ ] . in germany, many of the cases were relatively young people returning from skiing holidays in austria and italy [ ] . in israel, the largest outbreaks occurred in the ultra-orthodox jewish community, where the number of children per family is much higher than in the general population. other factors affecting the age distribution of the cases, depended on the frequency of outbreaks in homes for the elderly [ ] . the results of this study once again demonstrate the pitfalls of comparing unadjusted rates. the assumption that differences between countries in testing policies or standard of treatment accounted for the wide discrepancies in cfrs, is not well-founded. this does not mean that there are no differences. for example, it is possible that where the health services were overloaded, younger patients were more likely to be admitted to intensive care units with better chances of survival. clearly, the data are incomplete and other factors affecting cfrs such as case definitions, use of different denominators, underlying health conditions and the standard of health services are likely to play important roles. in order to assess the impact of these factors, age-specific and age-adjusted cfrs must be used. in addition to the selection and information biases inherent in computing cfrs, the age structure of the cases dramatically impacts on the differences in the crude cfrs between countries. failure to account for this source of confounding markedly distorts the country comparisons. the substantial reduction in the differences in the age-adjusted cfrs suggest that differences in the standard of healthcare between these countries may not play as important a role in affecting the death rates, as some have hypothesized. 
crude covid- cfrs have no real use for between-country comparisons and should be avoided. in general, for comparisons between groups and countries, age-adjusted cfrs can be used, but age-specific covid- cfrs are generally far more meaningful.
similarity in case fatality rates (cfr) of covid- /sars-cov- in italy and china
case-fatality rate and characteristics of patients dying in relation to covid- in italy
early estimation of the case fatality rate of covid- in mainland china: a data-driven analysis
potential biases in estimating absolute and relative case-fatality risks during outbreaks
covid- : death rate is . % and increases with age, study estimates
estimates of the severity of coronavirus disease : a model-based analysis
aggiornamento nazionale
the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- )-china, . the novel coronavirus pneumonia emergency response epidemiology team
a &bid = &list_no = &act = view
"coronavirus disease (covid- ): epidemiology update". public health agency of canada
real estimates of mortality following covid- infection
estimating case fatality rates of covid-
impact on mental health and perceptions of psychological care among medical and nursing staff in wuhan during the novel coronavirus disease outbreak: a cross-sectional study
korean society of infectious diseases; korean society of pediatric infectious diseases report on the epidemiological features of coronavirus disease (covid- ) outbreak in the republic of korea from
a german exception? why the country's coronavirus death rate is low
we express our appreciation to the official institutions of all countries for providing their data on covid- .
conceptualization: manfred s. green, dorit nitzan.
key: cord- -o aosx q authors: lipsitch, marc; donnelly, christl a.; fraser, christophe; blake, isobel m.; cori, anne; dorigatti, ilaria; ferguson, neil m.; garske, tini; mills, harriet l.; riley, steven; van kerkhove, maria d.; hernán, miguel a.
title: potential biases in estimating absolute and relative case-fatality risks during outbreaks date: - - journal: plos negl trop dis doi: . /journal.pntd. sha: doc_id: cord_uid: o aosx q estimating the case-fatality risk (cfr)—the probability that a person dies from an infection given that they are a case—is a high priority in epidemiologic investigation of newly emerging infectious diseases and sometimes in new outbreaks of known infectious diseases. the data available to estimate the overall cfr are often gathered for other purposes (e.g., surveillance) in challenging circumstances. we describe two forms of bias that may affect the estimation of the overall cfr—preferential ascertainment of severe cases and bias from reporting delays—and review solutions that have been proposed and implemented in past epidemics. also of interest is the estimation of the causal impact of specific interventions (e.g., hospitalization, or hospitalization at a particular hospital) on survival, which can be estimated as a relative cfr for two or more groups. when observational data are used for this purpose, three more sources of bias may arise: confounding, survivorship bias, and selection due to preferential inclusion in surveillance datasets of those who are hospitalized and/or die. we illustrate these biases and caution against causal interpretation of differential cfr among those receiving different interventions in observational datasets. again, we discuss ways to reduce these biases, particularly by estimating outcomes in smaller but more systematically defined cohorts ascertained before the onset of symptoms, such as those identified by forward contact tracing. finally, we discuss the circumstances in which these biases may affect non-causal interpretation of risk factors for death among cases. the case-fatality risk (cfr) is a key quantity in characterizing new infectious agents and new outbreaks of known agents. 
the cfr can be defined as the probability that a case dies from the infection. several variations of the definition of "case" are used for different infections, as discussed in box . under all these definitions, the cfr characterizes the severity of an infection and is useful for planning and determining the intensity of a response to an outbreak [ , ] . moreover, the cfr may be compared between cases who do and do not receive particular treatments as a way of trying to estimate the causal impact of these treatments on survival. such causal inference might ideally be done in a randomized trial in which individuals are randomly assigned to treatments, but this is often not possible during an outbreak for logistical, ethical, and other reasons [ ] . therefore, observational estimates of cfr under different treatment conditions may be the only available means to assess the impact of various treatments. however, observational studies conducted in the early phases of an outbreak, when public health authorities are appropriately concentrating on crisis response and not on rigorous study design, are challenging. a common problem is that disease severity of the cases recorded in a surveillance database will differ, perhaps substantially, from that of all cases in the population. this issue has arisen in the present epidemic of ebola virus disease in west africa and in many previous outbreaks and epidemics [ ] [ ] [ ] [ ] [ ] [ ] and will continue to arise in future ones. here we outline two biases that may occur when estimating the cfr in a population from a surveillance database, and three more biases that may occur when comparing the cfr between subgroups to estimate the causal effect of medical interventions. we also briefly consider the applicability of these biases to a different application: comparing the cfr across different groups of people, for example, by geography, sex, age, comorbidities, and other "unchangeable" risk factors. 
such factors are "unchangeable" in the sense that they are not candidates for intervention in the setting of the outbreak, though some could, of course, change over longer timescales. the goal of estimating the cfr in groups defined by such unchangeable factors is not to understand the causal role of these factors in mortality, but to develop a predictive model for mortality that might be used to improve prognostic accuracy or identify disparities. such box . definition of the cfr. the cfr itself is an ambiguous term, as its definition and value depend on what qualifies an individual to be a "case." several different precise definitions of cfr have been used in practice, as have several imprecise ones. the infection-fatality risk (sometimes written ifr) defines a case as a person who has shown evidence of infection, either by clinical detection of the pathogen or by seroconversion or other immune response. such individuals may or may not be symptomatic, though asymptomatic ones may go undetected. the symptomatic case-fatality risk (scfr) defines a case as someone who is infected and shows certain symptoms. infection in many outbreaks is given several gradations, including confirmed (definitive laboratory confirmation), probable (high degree of suspicion, by various clinical and epidemiologic criteria, without laboratory confirmation), and possible or suspected (lower degree of suspicion). this paper describes issues in estimating any of these risks or comparing them across groups, but does not go into the details of each possible definition. furthermore, unlike risks commonly used in epidemiologic research (e.g., the -year mortality risk), the length of the period during which deaths are counted for the cfr is rarely explicit, probably because it is considered to be short enough to avoid ambiguity in the definition of cfr. however, a precise definition of the cfr would need to include the risk period, e.g., the -month cfr of ebola. 
clearly, the definition of cfr for a particular investigation should be specified as precisely as possible. predictions may be affected by survivorship bias and selection bias, but not by confounding, as we discuss. two biases that may affect the estimation of an overall cfr are presented in table : for diseases that have a spectrum of clinical presentation, those cases that come to the attention of public health authorities and are entered into surveillance databases will typically be people with the most severe symptoms, who seek medical care, are admitted to hospital, or die. therefore, the cfr will typically be higher among detected cases than among the entire population of cases, given that the latter may include individuals with mild, subclinical, and (under some definitions of "case") asymptomatic presentations. laboratory confirmation as an inclusion criterion may reduce this bias if it is able to detect a wider spectrum of presentations, or may exacerbate it if the probability of receiving a laboratory test is higher for more severe cases and/or if test sensitivity is higher for more severe cases. the magnitude of this bias may be uncertain for a long period because the spectrum of clinical presentations is itself uncertain at the start of an outbreak of a new disease [ , ] . all proposed approaches to estimate and correct for this bias (table ) require auxiliary data sources to estimate how the reported subset of cases compares with the overall population of cases. the availability of such auxiliary data sources will depend on the context of the outbreak. during an ongoing epidemic, there is a delay between the time someone dies and the time their death is reported. therefore, at any moment in time, the list of cases includes people who will die and whose death has not yet occurred, or has occurred but not yet been reported. 
thus dividing the cumulative number of reported deaths by the cumulative number of reported cases at any moment will underestimate the true cfr. the key determinants of the magnitude of the bias are the epidemic growth rate and the distribution of delays from case-reporting to death-reporting; the longer the delays and the faster the growth rate, the greater the bias. heuristically, the underestimate will be proportionate to the expansion of the epidemic during the delay between the time a case enters the database to the time the death of that case enters the database (if it occurs). fig illustrates an example where the delay is weeks, the epidemic doubling time is weeks, and the underestimate is by a factor of / . . this bias may be corrected for in various ways, and to varying degrees, using information on the growth rate of the epidemic, the distribution of times from case-report to death-report, and the distribution of times from case-report to recovery-report (i.e., report that the case is no longer at risk of dying of the infection). a simple approach is to limit analysis to those cases with sufficiently long follow-up for a death to have been recorded had a death occurred, but this approach may result in an exceedingly small sample size if applied early in the epidemic. several such strategies are described in table . here, and in table , we discuss the sources of three biases that threaten the validity of a causal interpretation of a difference in cfr between groups who have received different interventions. such a difference might be measured as a risk ratio (rr), the ratio of cfr in group a to that in group b, or as an odds ratio (or), the ratio of the odds of dying in group a and group b, or as table . potential biases that can affect the estimation of cfr (and thereby also the comparison of cfr across groups). 
bias: preferential ascertainment of severe cases. in an infection with a range of manifestations from relatively mild to highly severe, milder cases are less likely to appear in surveillance databases than more severe ones; therefore, the cfr among ascertained cases will be higher than that among all cases.
direction: spuriously increases estimate of cfr.
outbreaks in which analysts have noted this bias may be operating: influenza h n pdm [ ] [ ] [ ] , influenza h n [ ] , influenza h n [ ] (though this hypothesis has been refuted [ ] ), middle east respiratory syndrome [ ] , ebola (this article) [ ] .
possible solutions (note: these solutions are listed in approximately the temporal order in which they may be practical, from early in the outbreak to later on; details will depend on the epidemiology of the outbreak):
use sentinel surveillance sites to estimate multipliers between various levels of severity and extrapolate to a larger population [ ] .
survey- or health-facility-based surveillance for symptomatic infection [ ] in a defined population, combined with enhanced surveillance for severe outcomes (particularly death) in the same population.
use travelers from high-burden areas with low ascertainment to low-burden areas with higher ascertainment to estimate incidence of infection in source population [ , ] , thereby providing a more accurate denominator for comparison to deaths in source population.
surveillance pyramid approaches: reconstruct conditional probabilities of appearing at one severity level conditional on reaching a lower severity level; combine data sources that have relatively complete ascertainment of higher severity levels (e.g., hospitalization, icu, death) with those having relatively complete ascertainment of lower levels (e.g., seeking medical attention, hospitalization) [ , ] . cfr can then be estimated as a product of conditional probabilities with associated uncertainties [ ] .
serologic ascertainment of infection [ , ] to provide a population denominator for infections regardless of symptoms, combined with active surveillance for more severe outcomes.
individuals ascertained by a different mechanism, e.g., named healthy contacts of cases who subsequently test positive, could be a more representative group in whom to assess severity [ ] .

bias: bias due to delayed reporting. during an ongoing epidemic, at any week w the persons who have died up to time w will not be the only ones to die of the infection among those who became cases by w. the denominator of the cfr (cases) includes persons who have not yet died of the infection, but will do so in the future. thus the cfr by w will be less than the true cfr. this bias will be particularly severe for infections that are increasing rapidly in incidence and for which the infection-death time interval is long.
direction: spuriously decreases estimate of cfr.
outbreaks in which analysts have noted this bias may be operating: sars [ ] , influenza h n pdm [ ] , ebola [ , ] .
possible solutions:
limit analysis to those cases with sufficiently long follow-up for a death to have been recorded had a death occurred. while this may lead to extremely small sample sizes near the beginning of an epidemic, this strategy is more feasible after a local epidemic wave, including reporting delays, has passed or nearly passed [ , ] .
limit analysis to those cases known either to have died or recovered, but exclude those with unknown outcome (biased if severity affects outcome ascertainment) [ ] [ ] [ ] .
apply a competing-risk kaplan-meier-like method or a parametric mixture model to the full dataset (biased if the times to death and time to recovery have different distributions) [ , ] .
fit the distribution of times to death and to recovery to estimate the true cfr [ , ] , or inverse-probability weight deaths using the conditional probability of having survived by w, given that one dies [ ] (biased if the probability distribution is incorrect).
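the delayed-reporting heuristic in the text (the underestimate scales with the number of epidemic doubling times that pass between case reporting and death reporting) can be illustrated with a small python sketch. it assumes purely exponential growth of cumulative cases and a single fixed delay, which the source itself describes as a heuristic rather than an exact model. the function names and all numerical values (true cfr, delay, doubling time) are hypothetical choices for illustration, not values from the paper.

```python
def cumulative_cases(week: float, doubling_weeks: float,
                     initial_cases: float = 100.0) -> float:
    """Cumulative reported cases under pure exponential growth."""
    return initial_cases * 2 ** (week / doubling_weeks)


def naive_cfr(true_cfr: float, delay_weeks: float,
              doubling_weeks: float, week: float) -> float:
    """Cumulative reported deaths / cumulative reported cases at `week`.

    Deaths reported by `week` come only from cases reported by
    `week - delay_weeks` (fixed-delay assumption).
    """
    reported_cases = cumulative_cases(week, doubling_weeks)
    reported_deaths = true_cfr * cumulative_cases(week - delay_weeks,
                                                  doubling_weeks)
    return reported_deaths / reported_cases


def underestimate_factor(delay_weeks: float, doubling_weeks: float) -> float:
    """Heuristic factor by which the naive CFR falls short of the true CFR."""
    return 2 ** (delay_weeks / doubling_weeks)


# Hypothetical scenario: true CFR 50%, 2-week delay, 2-week doubling time.
observed = naive_cfr(true_cfr=0.5, delay_weeks=2, doubling_weeks=2, week=10)
factor = underestimate_factor(delay_weeks=2, doubling_weeks=2)
corrected = observed * factor

print("naive cfr:", observed)
print("underestimate factor:", factor)
print("corrected cfr:", corrected)
```

with a distribution of delays rather than a fixed one, or with sub-exponential growth, the simple multiplication by the factor no longer recovers the true cfr exactly; the methods listed above are meant for those cases.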
a risk difference (rd), the difference between the cfr in group a and group b. we use the term relative cfr to refer to any of these measures, and call a relative cfr non-null when it differs from (ratio) or (difference). when these biases are present, a relative cfr, different from the null value in group b compared with a does not imply a causal effect of group. for example, if group a is non-hospitalized patients and group b is hospitalized patients, an odds ratio of death less than may not imply a beneficial effect of hospitalization on the odds of death. similarly, a relative cfr greater than may not imply that hospitalization is harmful. we use the estimation of the causal impact of hospitalization on mortality as our example throughout this section. note that exactly the same reasoning applies to assessment of another intervention or to a comparison of two interventions, for example, a comparison of treatment at center a versus treatment at center b. the first bias arises in a naïve comparison of mortality between those who have and those who have not been hospitalized. if some individuals die before they can be admitted to a hospital, they will by definition not become hospitalized. therefore, even in the absence of any effect of hospitalization on the risk of death, there will be fewer deaths among those hospitalized than among those not hospitalized. we will refer to this bias as "survivorship bias." in an ongoing epidemic, there will typically be a delay between the reporting of a case and the reporting of the death of that case, if the infected person dies. thus, at any moment, there will be some cases reported who will die of the infection but who have not yet died, or whose deaths have not yet been reported. simple division of the number of deaths reported by week w (green), by the number of cases reported by week w (blue) will underestimate the cfr because the numerator does not include all those cases in the denominator who will eventually die. 
with a reporting delay of weeks for deaths compared to cases, the reported deaths curve will be shifted weeks to the right, relative to the curve of the total number of cases reported by week w who will die (red). if the epidemic doubling time is weeks, as shown here, the underestimate of cfr will be by a factor of about / . , with the exponent being the number of epidemic doubling times that pass between case reporting and death reporting. in reality, there will be a distribution of reporting delays rather than a fixed delay, making this a heuristic rather than exact approach. the problem is ameliorated in an epidemic that grows more slowly or less than exponentially. for more details, see references in table . this bias can be eliminated using data on the time d since the person became a case. the analysis would then compare the risk of death between those individuals who became table . potential biases that can affect the comparison of cfr across groups (relative cfr), using the example of comparing the cfr among hospitalized and non-hospitalized persons to assess the relative cfr for hospitalization. direction outbreaks in which analysts have noted this bias may be operating possible solutions/means of detecting the bias survivorship bias: those who die before being hospitalized cannot, by definition, be hospitalized; a crude comparison of deaths among hospitalized and non-hospitalized cases will therefore reflect the "protective" effect of death against hospitalization. this is an example of reverse causality because for these individuals, death prevented hospitalization, rather than hospitalization preventing death. spurious protective effect of hospitalization on risk of death ebola (this article) conditioning analysis on survival up to day d of symptoms, and analyzing hospitalization on day d as the intervention, will avoid this bias, as individuals who die before hospitalization will not be included in the analysis. 
this analysis can be repeated for different values of d and potentially combined in a parametric model. individuals identified before becoming cases (e.g., as healthy contacts of infected persons) and actively followed regardless of clinical severity could be analyzed separately as a prospective cohort for whom the course of disease could be observed and this restriction readily made. confounding: if individuals are hospitalized in response to predictors of poor prognosis, hospitalization will be noncausally associated with poor outcome. this problem is common in the pharmacoepidemiology literature [ ] . alternatively, in situations of triage, when beds or other resources are limited, individuals with better prognosis may receive hospitalization (or another intervention), creating a spurious beneficial effect of hospitalization. may be in either direction, depending on whether those receiving the intervention have better or worse prognosis. ebola (this article), h n pdm (effect of antiviral treatment on death) [ ] , influenza h n (effect of antiviral treatment on death) [ , , ] in principle, analysis can adjust for prognostic factors that also predict hospitalization via matching, stratification, or multivariable analysis. in practice, such information may be unavailable [ ] . such adjustments will be more readily made if data are obtained prospectively from a cohort of cases identified before becoming cases. selection bias occurring because mortality and hospitalization both affect the probability a case will appear in the database [ ] : when inclusion in a database can occur as a result of either of two (or more) factors, the association between these two factors within the database will be biased relative to that in the source population. 
for example, if death and hospital admission are both means by which cases are ascertained and enter a database, as may be the case for ebola datasets, hospitalization will be spuriously associated with death in the dataset even if hospitalization has no causal effect in preventing death. direction of bias depends on the probabilities of inclusion in the dataset depending on exposure and outcome. ebola (this article) without knowledge of how cases came to enter a dataset, the magnitude of this bias cannot be evaluated. under assumptions about the proportion of cases entering the dataset for various reasons, a sensitivity analysis could be performed to assess the plausibility of assigning any observed protective effect to this bias [ ] . this bias too may be avoided by prospectively following a cohort of individuals who are identified before becoming cases. the second source of bias is confounding. severity of disease will likely affect the probability of hospitalization and the probability of death. as a common cause of the exposure of interest (hospitalization) and the outcome (death), disease severity is a confounder of the causal effect of hospitalization on death. if hospitalization is offered to especially severe cases or-in the setting of extreme triage-to especially mild cases, then hospitalization would spuriously appear harmful (if hospitalization went to especially severe cases) or beneficial (if it went to especially mild cases). there may be other confounders of this effect besides disease severity. individuals living in rural areas may be at greater risk of mortality (e.g., due to malnutrition) and also less likely to be hospitalized (due to longer travel time to hospital). place of residence (or travel time to hospital) in this setting would be a confounder of the effect of hospitalization on death. 
the standard approach to reducing confounding is to stratify, restrict, or adjust for prognostic factors that affect the propensity to receive the treatment (in this case to be hospitalized) [ ] . however, such information may frequently be limited or unavailable in databases compiled during outbreaks, especially in resource-limited settings. the third source of bias is selection occurring because mortality and hospitalization both affect the probability a case will appear in the database. during an outbreak, many cases may not appear in the database because they are not ascertained or because information about them is not obtainable. in particular, cases who are not hospitalized, and cases who do not die, may be less likely than other cases to appear in the database because they are less likely to come to medical or public health attention. if appearance in a database is the common effect of hospitalization and death, then the association between hospitalization and death among cases in the database may be non-null even if hospitalization and death were independent in the population of all cases. the direction and magnitude of the association between hospitalization and death among cases in the database will then be the result of combining the association due to this selection bias, the association due to a potential effect of hospitalization on mortality, and the association due to confounding. hypothetical examples are shown in tables - . in these tables, the association in the population between hospitalization on day (an arbitrarily chosen day) and death is negative; individuals hospitalized on day (an arbitrarily chosen day) of symptoms have a lower probability of death than those who are not hospitalized on day of symptoms. 
if we assume that this analysis has avoided survivorship bias by limiting analysis to cases still alive on day , then the population-level association would reflect a combination of the causal effect of hospitalization on day on risk of death, and confounding by severity or other factors. this population-level association is the same in tables , , and , but different probabilities are assumed for inclusion in the database, depending on whether an individual is hospitalized on day , dies, or both. relative cfrs on the rr, or, and rd scales for hospitalization on day are calculated for each hypothetical example. the hypothetical data in these tables show that selection bias in such a circumstance may be either positive or negative on each of the three scales, depending on the specific probabilities of selection in each of the four states. table shows an example of negative bias on the rr, or, and rd scales (overestimating the protective effect of hospitalization on day expressed as a lower value of each relative risk measure). table shows an example of a positive bias on the rr and rd scales and a negative bias on the or scale. table shows an example of positive bias (underestimating the protective effect of hospitalization on day expressed as a higher value of each measure) on all three scales. from experience, it seems that when databases are assembled in this way, it is rarely possible to tell why an individual case has come into the database. in the absence of such information, it is difficult to imagine how adjustments could be performed. however, sensitivity analyses could be performed to assess how strong such biases are likely to be [ ] . we have stated already that survivorship bias can be avoided by limiting analyses of the intervention to those who remain alive on a certain day after becoming a case. 
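the effect of selection on the three scales can be sketched numerically. the population counts and selection probabilities below are hypothetical and are not the values used in the tables; s[i, j] denotes the assumed probability that a case with hospitalization status i and death status j enters the database, and the selection odds ratio is taken here as s11*s00/(s10*s01).

```python
# hypothetical population among cases alive on the index day:
# key (i, j) = (hospitalized, died), with 1 = yes and 0 = no.
population = {(1, 1): 300, (1, 0): 700, (0, 1): 500, (0, 0): 500}

# assumed probabilities of entering the database for each (i, j) cell.
selection = {(1, 1): 0.9, (1, 0): 0.8, (0, 1): 0.9, (0, 0): 0.3}


def measures(counts):
    """Relative CFR for hospitalization on the RR, OR and RD scales."""
    risk_hosp = counts[1, 1] / (counts[1, 1] + counts[1, 0])
    risk_not = counts[0, 1] / (counts[0, 1] + counts[0, 0])
    odds_ratio = (counts[1, 1] * counts[0, 0]) / (counts[1, 0] * counts[0, 1])
    return {"rr": risk_hosp / risk_not, "or": odds_ratio, "rd": risk_hosp - risk_not}


# expected cell counts in the database after selection.
database = {cell: population[cell] * selection[cell] for cell in population}

# selection odds ratio: the factor by which the database OR differs
# from the population OR.
or_s = (selection[1, 1] * selection[0, 0]) / (selection[1, 0] * selection[0, 1])

pop = measures(population)
db = measures(database)
for scale in ("rr", "or", "rd"):
    print(scale, "population:", round(pop[scale], 3), "database:", round(db[scale], 3))
print("selection odds ratio:", round(or_s, 3))
```

with these assumed probabilities, non-hospitalized survivors are the least likely to be recorded, so the database exaggerates the protective association on all three scales. changing the four selection probabilities changes the direction of the bias, which is the point the hypothetical tables make.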
one strategy that would help to resolve the other two sources of bias is to limit analysis to a cohort of cases who were identified before they became cases; for example those who were identified as healthy contacts of known cases, and were followed prospectively. confounding occurs because individual factors like severity of infection or place of residence (which could affect both the probability of exposure-receiving the intervention-and the probability of the outcome-mortality) are not accounted for in the analysis through stratification, restriction, or adjustment. selection bias in this setting occurs because the exposure and the outcome both affect the probability of inclusion in the database. follow-up of a cohort of contacts ascertained before becoming cases could eliminate hospitalization and mortality as predictors of inclusion in the database, thus eliminating the form of selection bias we have discussed. it would provide an opportunity for gathering data on severity and other predictors of exposure and outcome, which would facilitate control of confounding, though not guarantee to eliminate it. such a cohort would also provide a natural setting for analyses that avoid survivorship bias. the cost of such improvements in inference would be the need to ascertain such contacts and maintain surveillance of those individuals, following them to obtain data on relevant covariates.

(table footnote) subscript p represents the population values, while subscript d represents the values measured for those cases included in the data base; selection bias produces the discrepancy. the extent of selection bias may be measured as $\mathrm{OR}_S = \frac{s_{11}\, s_{00}}{s_{10}\, s_{01}}$, where $s_{ij}$ is the probability a case with exposure (hospitalization at day ) i and outcome (mortality) j appears in the database. in this example, selection bias spuriously enhances the negative association between hospitalization on day and death, on all scales: rr, or, and rd.
such a strategy-which has been followed in cases of exposed health-care workers in settings with high resources and few cases-would likely have benefits for the individuals followed (e.g., increasing the probability they receive care if infected) and for reducing transmission (if such individuals were promptly isolated upon evidence of infection). however, it has not been possible so far in the large ebola outbreaks in west africa to do this routinely. it is often of interest to predict the probability of mortality for an individual case of an infectious disease based on that individual's demographic and clinical data, without placing any causal interpretation on the factors used to predict outcome. for example, in , there was much interest in whether morbid obesity (or obesity in general) was predictive of worse outcome in infection with the novel pandemic strain of influenza a/h n [ ] .the primary goal was to improve estimates of clinical prognosis, although observations about prognosis could later be used to generate causal hypotheses for further testing. similarly, observations of disparate rates of severe outcomes by geography within new york city did not initially involve causal judgments about why certain areas had worse outcomes, although they could be used to guide enhancement of services in areas with worse outcomes [ ] . even for a well-understood disease like polio, it may be necessary to identify unusual demographic patterns of mortality in order to understand and respond effectively to an outbreak [ ] . prognostic exercises such as these cannot suffer from confounding bias because no causal interpretation is attached to the conclusions. they can, however, suffer from selection bias. returning to the ebola context, one might wish to know whether pregnant women infected with ebola are at greater risk of death from ebola infection than other cases [ ] , for example, in order to give them greater supportive care. 
if the probability of entering the database depends on whether an ebola patient is pregnant and on whether she ultimately dies of the infection, then the probability of death given pregnancy will likely differ in the database from the value in the population of direct interest for a clinical or public health decision maker. if the goal of analysis is to inform public health decision makers on the value of efforts to prevent infection in pregnant women, then the population-wide cfr among pregnant women is the value of direct interest. if, on the other hand, the goal of analysis is to inform health care providers at a treatment center to make a better clinical decision based on an accurate prognosis of the patient presenting to them, the quantity of direct interest is the probability of death among pregnant women in the population they encounter-those admitted to the treatment center. this value, again, will differ from that in the database, which may (in our running example) have been enriched for individuals entered in the database because they died of the infection. it will also differ from that in the overall population. the general point is that selection bias can be operative if the population on which analysis is performed is not a representative sample of the population for which the value of the cfr is sought, and selection bias of this form can lead to spurious conclusions in prognostic estimates as well as in causal ones. as in the case of causal inference, prognostic estimates will avoid selection bias to the extent they can be performed on a randomly chosen cohort of cases, identified via tracing of healthy contacts, for example.

table . effect of selection bias on estimates of relative cfr on the risk ratio (rr), odds ratio (or) and risk difference (rd) scales. joint frequencies of hospitalization and death in the whole population among those alive at day of symptoms.
to determine the appropriate scope and magnitude of public health response to an infectious disease outbreak, it is important to estimate the cfr and the determinants of its variation [ , ] . for example, in the influenza pandemic, early point estimates of the cfr ranged over orders of magnitude, from a value below that of seasonal influenza, which would have justified a modest response, to values around %, approximately half that of the pandemic, which would have indicated the need for massive interventions to protect public health [ , [ ] [ ] [ ] ] . to a large degree, this variation reflected judgments that one or the other of the biases in table was more important, judgments that were difficult to make accurately and confidently on the rapid timescale required for decision making [ ] . in other situations, accurate assessment of the cfr is not as crucial for decisions about the scale of response required; for example, in the ongoing ebola epidemic in west africa, the uncertainty about the cfr is limited to a range between high values and very high values, and it is not clear that any greater response would be indicated by a % cfr than a % cfr [ ] . either way, a rapid and massive response is warranted. even when the overall cfr is not a key input to decision making, there is obvious value to inferences about which conditions lead to a lower cfr, whether these be specific treatment, particular types of supportive care, or hospitalization in general. moreover, treatment facilities might be evaluated by the proportion of their patients who survive; here the relative cfr calculated would be for treatment in one facility versus treatment in another. there will be a temptation to conclude that treatment facilities with higher cfr are doing a worse job-that is, to apply a causal interpretation to observed differences in the cfr. 
even in settings with more resources to measure covariates, methods of risk-adjustment of comparative outcomes to account for the mix of patients seen are complex and controversial [ ] . in an emergency setting, with few covariates available to characterize the "case mix" of a health care provider, causal interpretation of differences in cfr would be particularly prone to error, potentially producing conclusions that mislead and thereby damage control efforts. for instance, if through confounding, larger referral treatment centers primarily receive patients who have survived infection for some time and are therefore less likely to die, independently of treatment, this may be erroneously interpreted as more effective treatment in these centers. similarly, if certain treatment centers preferentially admit the most symptomatic patients, they may falsely appear to be less effective or even harmful to patient outcome. with at least five separate sources of bias in cfr or relative cfr estimates, and only imperfect solutions typically available for most due to lack of data, separating causal from non-causal factors in relative cfr estimation seems extremely risky. this is not to deny that data should be gathered or analyzed; on the contrary, the biases here suggest that more thorough data gathering is necessary before analyses of such quantities as relative cfr are relied upon for any decision. there has been much debate, particularly in the area of ebola treatments, about whether randomized studies comparing a treatment to a placebo are ethical [ , ] . whatever one's view on this debate, it seems likely that some observational (non-randomized) studies of the effectiveness of particular therapies, or the comparative effectiveness of two or more therapeutic approaches will occur, whether for ethical reasons, logistical reasons, or both. 
such studies-in which a key endpoint will be mortality-will be vulnerable to the sorts of biases described in this article, particularly in cases in which the true effect size of the treatment is limited. the biases described here should be kept in mind when evaluating the conclusions of such studies, and wherever possible, studies should be designed to minimize them. small studies conducted using systematic approaches to enrollment and follow-up of patients may be more precise and less biased than studies with larger sample sizes that use databases collected for other reasons. similarly, there may be situations in which efforts are made to administer scarce therapeutic agents to those most likely to benefit from them. such efforts rely on estimates, formal or informal, of the prognosis of patients with and without the treatment, depending on variables such as the time since they became symptomatic. these estimates, too, may be affected by the biases discussed. in the current ebola outbreak in west africa, such data gathering has not routinely occurred, for a number of reasons, including lack of health system infrastructure [ ] and prioritization of crisis response and other directly lifesaving activities. in future outbreaks of other diseases, as in the past with pandemic influenza, setting up systematic approaches to gather data useful for such assessments should be a priority [ , ] . meanwhile, emphasis on recording for each patient in a database the time, place, and circumstances (e.g., hospital, clinic, funeral, contact tracing) under which the information is being gathered can substantially improve our ability to account for biases induced by a database with unplanned entry criteria. to reduce the impact of the biases identified on causal and (where applicable) prognostic inference, it appears desirable when possible to limit analysis to a subset of cases who have been followed prospectively since they became cases. 
these individuals might most likely be identified by forward contact tracing, in which cases are asked to name healthy individuals with whom they have had contact, and those individuals are followed to identify further infections. it has previously been noted that cases identified by contact tracing are more representative of cases in the general infected population than those identified because of symptoms, medical need, or death [ , ] . use of such a sample does not guarantee to eliminate biases, as there may be residual confounding not adequately controlled in the analysis or subtler forms of selection bias (e.g., differential loss to follow up within the sample) [ ] , but should significantly reduce them. we have emphasized the relevance of several biases to interpretation of datasets gathered in an emergency, such as the early phases of an emerging infection. while the downward bias in estimation of the cfr due to delayed reporting of deaths is most acute in rapidly growing epidemics, the other biases described may apply regardless of the overall trajectory of an epidemic, and thus may apply to endemic diseases as well as emerging ones. nonetheless, due to the sense of urgency to gather data and scale-up a response simultaneously, datasets assembled during infectious disease outbreak or emergency settings are especially prone to include unplanned mixes of cases who enter the dataset for various reasons. biases of the sorts described here should be systematically considered whenever one attempts to extract causal inferences from such observational data, and alternative, more systematic data collection should be considered when possible. 
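the downward bias from delayed reporting of deaths can be sketched with a deliberately simplified model. the assumptions are all illustrative: new cases grow exponentially at a constant daily rate, every fatal case dies exactly `delay_days` after being reported, and the naive cfr is cumulative deaths divided by cumulative cases at the time of analysis. the function name and parameter values are hypothetical.

```python
import math


def naive_cfr(true_cfr, growth_rate, delay_days, horizon_days):
    """Expected cumulative deaths / cumulative cases on day `horizon_days`.

    Assumes exponential growth in new cases and a fixed delay between
    case report and death (a simplification made for illustration).
    """
    new_cases = [math.exp(growth_rate * t) for t in range(horizon_days + 1)]
    cum_cases = sum(new_cases)
    # only cases reported at least `delay_days` ago can have died by now
    resolved = new_cases[: max(0, horizon_days + 1 - delay_days)]
    cum_deaths = true_cfr * sum(resolved)
    return cum_deaths / cum_cases


for rate in (0.0, 0.05, 0.10):
    estimate = naive_cfr(true_cfr=0.5, growth_rate=rate, delay_days=10, horizon_days=60)
    print(f"daily growth rate {rate:.2f}: naive cfr {estimate:.3f} (true 0.500)")
```

under these assumptions the naive estimate falls further below the true value as the growth rate increases, because a larger share of counted cases is too recent to have had an outcome recorded.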
• datasets available at the onset of new epidemics of infectious diseases are often collected for reasons other than epidemiologic analysis of absolute and comparative case-fatality risks (cfr), and estimates of such quantities based on these data may be subject to biases, the relative magnitudes of which are difficult to ascertain and vary by situation.
• major sources of bias affecting the estimation of absolute cfr are differences in severity between all cases and the subset of cases who enter the dataset, typically leading to inflated estimates of cfr, and more rapid reporting (less delay) in reporting cases than in reporting the deaths of those cases, typically leading to underestimates of cfr.
• biases affecting the causal interpretation of relative cfr (causal attribution of different cfr in different groups to a particular intervention in one group, e.g., hospitalization) may arise from survivorship bias, in which individuals who survive longer may be more likely to receive the intervention; from confounding, in which a common factor (e.g., disease severity) affects the probability of both the intervention and mortality; and from selection bias, in which individuals are more or less likely to enter the dataset as a function of whether they receive the intervention and whether they have the outcome.
• these biases may be severe enough to lead to qualitatively mistaken inferences about the severity of the infection or about the impact of interventions (such as hospitalization) on mortality, and may be particularly misleading when comparing, for example, the effect of hospitalization at different centers, given that cases hospitalized at different centers may enter the dataset for different reasons.
• methods exist to identify and reduce these biases.
in particular, the use of small but carefully defined cohorts of individuals who are followed from the time of infection or symptom onset (perhaps those identified via contact tracing) may ameliorate many of these biases.

epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in hong kong
basic methods for sensitivity analysis of biases
assessing the severity of the novel influenza a/h n pandemic
a structural approach to selection bias
improving the evidence base for decision making during a pandemic: the example of influenza a/h n
studies needed to address public health challenges of the h n influenza pandemic: insights from modeling
interim pre-pandemic planning guidance: community strategy for pandemic influenza mitigation in the united states-early targeted layered use of nonpharmaceutical interventions. department of health and human services
randomised controlled trials for ebola: practical and ethical issues
middle east respiratory syndrome coronavirus: quantification of the extent of the epidemic, surveillance biases, and transmissibility
human infection with avian influenza a h n virus: an assessment of clinical severity
seroevidence for h n influenza infections in humans: meta-analysis
comment on "seroevidence for h n influenza infections in humans: meta-analysis"
the severity of pandemic h n influenza in the united states
changes in severity of pandemic a/h n influenza in england: a bayesian evidence synthesis
pandemic influenza a(h n )v in new zealand: the experience from
under-reporting and case fatality estimates for emerging epidemics
notes from the field: outbreak of pandemic influenza a (h n ) virus at a large public university in delaware
pandemic potential of a strain of influenza a (h n ): early findings
use of cumulative incidence of novel influenza a/h n in foreign travelers to estimate lower bounds on cumulative incidence in mexico
optimizing the precision of case fatality ratio estimates under the surveillance pyramid approach
estimating infection attack rates and severity in real time during an influenza pandemic: analysis of serial cross-sectional serologic surveillance data
the infection attack rate and severity of pandemic h n influenza in hong kong
transmission scenarios for middle east respiratory syndrome coronavirus (mers-cov) and how to tell them apart
ebola virus disease in west africa-the first months of the epidemic and forward projections
case fatality rate for ebola virus disease in west africa
methods for estimating the case fatality ratio for a novel, emerging infectious disease
non-parametric estimation of the case fatality ratio with competing risks data: an application to severe acute respiratory syndrome (sars)
managing and reducing uncertainty in an emerging influenza pandemic
assessment and control for confounding by indication in observational studies
effectiveness of neuraminidase inhibitors in reducing mortality in patients admitted to hospital with influenza a h n pdm virus infection: a meta-analysis of individual participant data
effectiveness of antiviral treatment in human influenza a(h n ) infections: analysis of a global patient registry
strengthening observational evidence for antiviral effectiveness in influenza a (h n )
determinants of antiviral effectiveness in influenza virus a subtype h n
risk factors for severe outcomes following influenza a (h n ) infection: a global pooled analysis
pandemic (h n ) surveillance for severe illness and response
investigation of elevated case-fatality rate in poliomyelitis outbreak in pointe noire
ebola hemorrhagic fever and pregnancy
use of risk adjustment in setting budgets and measuring performance in primary care ii: advantages, disadvantages, and practicalities
evaluating novel therapies during the ebola epidemic
ebola: the teaching and learning moment
household transmission of pandemic influenza a (h n ) virus in the united states

we thank lina nerlander for helpful suggestions on an earlier draft.

key: cord- -n i authors: ioannidis, john p. a. title: coronavirus disease : the harms of exaggerated information and non‐evidence‐based measures date: - - journal: eur j clin invest doi: . /eci. sha: doc_id: cord_uid: n i

the evolving coronavirus disease (covid- ) epidemic is certainly cause for concern. proper communication and optimal decision-making is an ongoing challenge, as data evolve. the challenge is compounded, however, by exaggerated information. this can lead to inappropriate actions. it is important to differentiate promptly the true epidemic from an epidemic of false claims and potentially harmful actions.

the evolving coronavirus disease (covid- ) pandemic is certainly cause for concern. proper communication and optimal decision-making are an ongoing challenge, as data evolve. the challenge is compounded, however, by exaggerated information. this can lead to inappropriate actions. it is important to differentiate promptly the true epidemic from an epidemic of false claims and potentially harmful actions. based on altmetric scores, the most discussed and most visible scientific paper across all + million papers published in the last years across all science is a preprint claiming that the new coronavirus' spike protein bears "uncanny similarity" with hiv- proteins. the altmetric score of this work has reached an astronomical level of points as of march . the paper was rapidly criticized as highly flawed, and the authors withdrew it within days. regardless, major harm was already done.
the preprint fuelled conspiracy theories of scientists manufacturing dangerous viruses and offered ammunition to vaccine deniers. refutation will probably not stop dispersion of weird inferences. the first report documenting transmission by an asymptomatic individual was published in the new england journal of medicine on january . however, the specific patient did have symptoms, but researchers had not asked. understanding the chances of transmission during the asymptomatic phase has major implications for what protective measures might work. lancet published on february an account from two chinese nurses of their front-line experience fighting coronavirus. the authors soon retracted the paper admitting it was not a first-hand account. these examples show how sensationalism affects even top scientific venues. moreover, peer review may malfunction when there is little evidence and strong opinions. opinionbased peer review may even solidify a literature of spurious statements. as outlined below, for the main features of the epidemic and the response to it, circulating estimates are often exaggerated, even when they come from otherwise excellent scientists. an early speculation that %- % of the global population will be infected went viral. early estimates of the basic reproduction number (how many people get infected by each infected person) have varied widely, from . to . . these estimates translate into manyfold difference in the proportion of the population eventually infected and dramatically different expectations on what containment measures (or even any future vaccine) can achieve. the fact that containment measures do seem to work, means that the basic reproduction number is probably in the lower bound of the . - . range, and can decrease below with proper measures. the originator of the " %- % of the population" estimate tweeted on march a revised estimate of " %- % of adults," but this is probably still substantially exaggerated. 
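the sensitivity of the eventually infected proportion to the basic reproduction number can be sketched with the classical final-size relation z = 1 - exp(-r0 * z). this relation is an assumption brought in for illustration: it holds for a simple homogeneous-mixing epidemic model with no interventions and no prior immunity, and it is not the method behind any estimate quoted here. the r0 values in the loop are arbitrary illustrative inputs, not the published range.

```python
import math


def final_size(r0, tol=1e-12, max_iter=10_000):
    """Fraction eventually infected under the final-size relation
    z = 1 - exp(-r0 * z), solved by fixed-point iteration.

    Assumes homogeneous mixing and no control measures.
    """
    if r0 <= 1.0:
        return 0.0  # no large outbreak in this simple model
    z = 0.5
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r0 * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


# illustrative r0 values only
for r0 in (1.2, 1.5, 2.0, 3.0):
    print(f"r0 = {r0}: fraction eventually infected = {final_size(r0):.2f}")
```

even within this idealized model, modest differences in r0 near 1 produce large differences in the projected proportion infected, and measures that lower the effective reproduction number below 1 remove the large-outbreak solution altogether.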
even after the %- % quote was revised downward, it still remained quoted in viral interviews. early reported cfr figures also seem exaggerated. the most widely quoted cfr has been . %, reported by who dividing the number of deaths by documented cases in early march. this ignores undetected infections and the strong age dependence of cfr. the most complete data come from diamond princess passengers, with cfr = % observed in an elderly cohort; thus, cfr may be much lower than % in the general population, probably higher than seasonal flu (cfr = . %), but not much so. observed crude cfr in south korea and in germany, countries is about . %. some deaths of infected, seriously ill people will occur later, and these deaths have not been counted yet. however, even in these countries many infections probably remain undiagnosed. therefore, cfr (or, more properly called, infection fatality rate, counting as cases all infected individuals) may be even lower rather than higher than these crude estimates. at face value, the epidemic curve of new cases outside china since late february is compatible with exponential community spread. however, reading this curve is very difficult. part of the growth of documented cases could reflect rapid increases in numbers of coronavirus tests performed. the number of tests done depends on how many test-kits are available and how many patients seek testing. even if bottlenecks in test availability are eventually removed, the epidemic curve may still reflect primarily population sensitization and willingness for testing rather than true epidemic growth. china data are more compatible with close contact rather than wide community spread being the main mode of transmission. under alarming circumstances, extreme measures of unknown effectiveness are adopted. china initially responded sluggishly, but subsequently locked down entire cities. 
school closures, cancellation of social events, air travel curtailment and restrictions, entry control measures and border closure are applied by various countries. italy adopted country-level lockdown on march , and many countries have been following suit. evidence is lacking for the most aggressive measures. a systematic review on measures to prevent the spread of respiratory viruses found insufficient evidence for entry port screening and social distancing in reducing epidemic spreading. plain hygienic measures have the strongest evidence. frequent hand washing and staying at home and avoiding contacts when sick are probably very useful. their routine endorsement may save many lives. most lives saved may actually be due to reduced transmission of influenza rather than coronavirus. most evidence on protective measures comes from nonrandomized studies prone to bias. a systematic review of personal protective measures in reducing pandemic influenza risk found only two randomized trials, one on hand sanitizer and another on facemasks and hand hygiene in household members of people infected with influenza. given the uncertainties, one may opt for abundant caution and implement the most severe containment measures. by this perspective, no opportunity should be missed to gain any benefit, even in the absence of evidence or even with mostly negative evidence. this reasoning ignores possible harms. impulsive actions can indeed cause major harm. one clear example is the panic shopping that depleted supplies of face masks, escalated prices and created a shortage for medical personnel. masks, gloves and gowns are clearly needed for medical personnel, and their lack puts healthcare workers' lives at risk. conversely, they are meaningless for the uninfected general population. however, a prominent virologist's comment that people should stock surgical masks and wear them around the clock to avoid touching their nose went viral.
policymakers feel pressure from opponents who lambast inaction. also, adoption of measures in one institution, jurisdiction or country creates pressure for taking similar measures elsewhere under fear of being accused of negligence. moreover, many countries pass legislation that allocates major resources and funding to the coronavirus response. this is justified, but the exact allocation priorities can become irrational. for example, undoubtedly research on coronavirus vaccines and potential treatments must be accelerated. however, if only part of resources mobilized to implement extreme measures for covid- had been invested towards enhancing influenza vaccination uptake, tens of thousands of influenza deaths might have been averted. only %- % of the population in china is vaccinated against influenza. even in the united states, despite improvements over time, most adults remain unvaccinated every year. as another example, enhanced detection of infections and lower hospitalization thresholds may increase demands for hospital beds. for patients without severe symptoms, hospitalizations offer no benefit and may only infect health workers causing shortage of much-needed personnel. even for severe cases, effectiveness of intensive supportive care is unknown. excess admissions may strain health care systems and increase mortality from other serious diseases where hospital care is clearly effective. an argument in favour of lockdowns is that postponing the epidemic wave ("flattening the curve") gains time to develop vaccines and reduces strain on the health system. however, vaccines take many months (or years) to develop and test properly. maintaining lockdowns for many months may have even worse consequences than an epidemic wave that runs an acute course. focusing on protecting susceptible individuals may be preferable to maintaining countrywide lockdowns longterm. the potential consequences on the global economy are already tangible. 
february - was the worst week for global markets since , and the worse may lie ahead. moreover, some political decisions may be confounded with alternative motives. lockdowns weaponized by suppressive regimes can create a precedent for easy adoption in the future. closure of borders may serve policies focused on limiting immigration. regardless, even in the strongest economies, disruption of social life, travel, work and school education may have major adverse consequences. the eventual cost of such disruption is notoriously difficult to project. a quote of $ . trillion is totally speculative. much depends on the duration of the anomaly. the global economy and society is already getting a major blow from an epidemic that otherwise (as of march ) accounts for . % of all million annual global deaths from all causes and that kills almost exclusively people with relatively low life expectancy. leading figures insist that the current situation is a once-ina-century pandemic. a corollary might be that any reaction to it, no matter how extreme, is justified. this year's coronavirus outbreak is clearly unprecedented in amount of attention received. media have capitalized on curiosity, uncertainty and horror. a google search with "coronavirus" yielded results on march and results on march . conversely, "influenza" attracted -to -fold less attention although this season it has caused so far more deaths globally than coronavirus. different coronaviruses actually infect millions of people every year, and they are common especially in the elderly and in hospitalized patients with respiratory illness in the winter. a serological analysis of cov e and oc in adult populations under surveillance for acute respiratory illness during the winters of - (healthy young adults, healthy elderly adults, high-risk adults with underlying cardiopulmonary disease and a hospitalized group) showed annual infection rates ranging from . % to % in prospective cohorts, and prevalence of . %- . 
% in the hospitalized cohort. case fatality of % has been described in outbreaks among nursing home elderly. leaving the well-known and highly lethal sars and mers coronaviruses aside, other coronaviruses probably have infected millions of people and have killed thousands. however, it is only this year that every single case and every single death gets red alert broadcasting in the news.
box:
• a highly flawed non-peer-reviewed preprint claiming similarity with hiv- drew tremendous attention, and it was withdrawn, but conspiracy theories about the new virus became entrenched
• even major peer-reviewed journals have already published wrong, sensationalist items
• early estimates of the projected proportion of global population that will be infected seem markedly exaggerated
• early estimates of case (infection) fatality rate may be markedly exaggerated
• the proportion of undetected infections is unknown but probably varies across countries and may be very large overall
• reported epidemic curves are largely affected by the change in availability of test kits and the willingness to test for the virus over time
• of the multiple measures adopted, a few have strong evidence, and many may have obvious harms
• panic shopping of masks and protective gear and excess hospital admissions may be highly detrimental to health systems without offering any concomitant benefit
• extreme measures such as lockdowns may have major impact on social life and the economy (which also means lives lost), and estimates of this impact are entirely speculative
• comparisons with and extrapolations from the influenza pandemic are precarious, if not outright misleading and harmful
some fear an analogy to the influenza pandemic that killed - million people. retrospective data from that pandemic suggest that early adoption of social distancing measures was associated with lower peak death rates. however, these data are sparse, retrospective and pathogen-specific.
moreover, total deaths were eventually little affected by early social distancing: people just died several weeks later. importantly, this year we are dealing with thousands, not tens of millions deaths. the box summarizes the problems with inaccurate and exaggerated information in the case of covid- . even if covid- is not a -recap in infection-related deaths, some coronavirus may match the pandemic in future seasons. thus, we should learn and be better prepared. questions about transmission, duration of immunity, effectiveness of different containment and mitigation methods, the role of children in viral spread, and assessment of the effectiveness of vaccines and drugs are essential to settle timely. this research agenda requires carefully collected, unbiased data to avoid unfounded inferences. larger-scale diagnostic testing should help get more unbiased estimates of cases, basic reproduction number and infection fatality rate. the research agenda also deserves proper experimental studies. besides candidate vaccines and drugs, randomized trials should evaluate also the real-world effectiveness of simple measures (eg face masks in different settings), least disruptive social distancing measures and healthcare management policies for documented cases. if covid- is indeed the pandemic of the century, we need the most accurate evidence to handle it. open data sharing of scientific information is a minimum requirement. this should include data on the number and demographics of tested individuals per day in each country and the demographics and background diseases of patients requiring hospital care and intensive care and those who die. proper prevalence studies and trials are also indispensable. if covid- is not as grave as it is depicted, high evidence standards are equally relevant. exaggeration and overreaction may seriously damage the reputation of science, public health, media and policymakers. 
it may foster disbelief that will jeopardize the prospects of an appropriately strong response if and when a more major pandemic strikes in the future. characteristics of and important lessons from the coronavirus disease (covid- ) outbreak in china: summary of a report of cases from the chinese center for disease control and prevention uncanny similarity of unique inserts in the -ncov spike protein to hiv- gp and gag study claiming new coronavirus can be transmitted by people without symptoms was flawed how many people might one person with coronavirus infect? an updated estimation of the risk of transmission of the novel coronavirus ( -ncov) coronavirus may infect up to % of world's population, expert warns experts: rapid testing helps explain few german virus deaths early containment strategies and core measures for prevention and control of novel coronavirus pneumonia in china physical interventions to interrupt or reduce the spread of respiratory viruses effectiveness of personal protective measures in reducing pandemic influenza transmission: a systematic review and meta-analysis coronavirus could cost the global economy $ . trillion. here's how responding to covid- -a once-in-a-century pandemic? global mortality associated with seasonal influenza epidemics: new burden estimates and predictors from the glamor project clinical impact of human coronaviruses e and oc infection in diverse adult populations an outbreak of human coronavirus oc infection and serological cross-reactivity with sars coronavirus transmissibility of pandemic influenza public health interventions and epidemic intensity during the influenza pandemic key: cord- -muetf l authors: okpokoro, e.; igbinomwanhia, v.; jedy-agba, e.; kayode, g.; onyemata, e.; abimiku, a. title: ecologic correlation between underlying population level morbidities and covid- case fatality rate among countries infected with sars-cov- date: - - journal: nan doi: . / . . . 
sha: doc_id: cord_uid: muetf l background: the ongoing coronavirus disease (covid- ) pandemic is unprecedented in scope. high income countries (hic) seemingly account for the majority of the mortalities considering that these countries have screened more persons. low middle income countries (lmic) countries may experience far worse mortalities considering the existence of a weaker health care system and the several underlying population level morbidities. as a result, it becomes imperative to understand the ecological correlation between critical underlying population level morbidities and covid- case fatality rates (cfr). method: this is an ecological study using data on covid- cases, prevalence of copd, prevalence of tobacco use, adult hiv prevalence, quality of air and life expectancy. we plotted a histogram, performed the shapiro-wilk normality test and used spearman correlation to assess the degree of correlation between covid- case fatality rate (cfr) and other covariates mentioned above. result: as at the st of march , there were a total of , cases of covid- from countries and a global case fatality rate of % (range % to %). angola and sudan both had the highest cfr of %, while italy had the highest number of deaths (i.e. , ) as at st of march . adult hiv prevalence has a significant but weak negative correlation with cfr (correlation coefficient = - . , p value = . ) while all the other variables have positive correlation with cfr due to covid- though not statistically significant. of the countries analyzed, only countries (i.e. %) had complete datasets across all population level morbidities (i.e. prevalence of copd, prevalence of tobacco use, life expectancy, quality of air, and adult hiv prevalence variables). correlations of cfr from these countries were similar to that from the countries except for the correlation with quality of air and prevalence of tobacco use. 
conclusion: while we interpret our data with caution given that this is an ecological study, our findings suggest that population level factors such as prevalence of copd, prevalence of tobacco use, life expectancy and quality of air are positively correlated with cfr from covid- , whereas adult hiv prevalence has a weak and negative correlation with covid- cfr that would require extensive research. medrxiv preprint (which was not certified by peer review), this version posted may , . the copyright holder for this preprint is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc . international license. https://doi.org/ . / . . . the ongoing coronavirus disease (covid- ) pandemic is unprecedented in scope. as at the st march , , persons from countries have been infected. of these, , were seriously or critically ill and , had died. high-income countries (hic) seemingly account for the majority of the mortalities worldwide. incidentally, these countries have screened a larger proportion of their populations for covid- . despite the presence of efficient health care systems in these countries, they have been unable to curb the persistent rise in mortalities. this scenario suggests that, should there be an exponential rise in covid- cases among low and middle income countries (lmics), these countries may experience far worse mortalities considering their weaker health care systems and their several underlying population level morbidities. as a result, it becomes imperative to understand the ecological relationship between critical underlying population level morbidities globally and mortalities associated with sars-cov- infection.
according to anecdotal evidence, most mortalities following covid- occur among aging populations but, to our knowledge, there has been no documented evidence to confirm this. in addition, considering the high burden of hiv in lmics, particularly in sub-saharan africa (ssa), there are serious concerns about the impact of hiv on mortalities associated with covid- . covid- is caused by an rna virus (i.e. sars-cov- ) which affects mainly the respiratory system; therefore, countries with a high burden of underlying respiratory diseases such as chronic obstructive pulmonary disease (copd) may experience higher mortalities compared to those with a lower prevalence of respiratory diseases. factors such as cigarette smoking and quality of air may affect the respiratory system, leading to higher mortalities due to covid- in such countries. generating quick preliminary evidence through an ecologic study could provide vital insight, at the population level, into factors correlated with covid- case fatality rates, especially as the incidence rapidly increases in low and middle income countries. this is an ecological study using data from several open sources on the internet and from systematic reviews. we obtained data on covid- cases and mortality from worldometers as at st march and life expectancy data of from online sources . we also obtained data on prevalence of copd, adult hiv, tobacco use and quality of air from other online sources and journals [ ] [ ] [ ] [ ] . we plotted a histogram to assess the distribution of the data and performed the shapiro-wilk normality test. thereafter, we provided appropriate summary statistics. we also performed two-way scatterplots involving key variables
and thereafter used a spearman correlation test to assess the degree of correlation between covid- case fatality rates and other covariates (table ). in our study, we define case fatality rate (cfr) as the number of deaths due to covid divided by the number of confirmed cases officially reported. as shown in table , a total of countries were identified with cases of covid- as at st march . the mean number of deaths per day was , while the mean number of new cases was , . the global case fatality rate was % and ranged from % to %. angola and sudan had the highest cfr while italy had the highest number of deaths (i.e. , ). of the countries infected with covid- , ( %) had mortality rates greater than % (i.e. above the global average). table footnote: * correlation strength was classified as . -. "very weak", . -. "weak", . -. "moderate", . -. "strong" and . - . "very strong"; the correlation of adult hiv prevalence with cfr was classified as weak and negative. of the countries, only countries (i.e. %) had complete datasets covering all population level morbidities (i.e. copd prevalence, tobacco use prevalence, life expectancy, quality of air and adult hiv prevalence variables). our findings from analyzing the dataset of these countries (table ) were similar to those from the countries (table ), except for the correlation between quality of air, tobacco use and cfr due to covid- .
figure: scatterplot between cfr and prevalence of copd among countries. this ecological study has demonstrated that a substantial amount of the variation observed in case fatality rate associated with covid- across countries as at st march could be explained by five population level morbidities, namely prevalence of copd, prevalence of tobacco use, adult hiv prevalence, quality of air and life expectancy. our findings suggest that cfr was highest in angola and sudan as at st march . we found varying levels of positive correlation of copd prevalence, life expectancy and quality of air with cfr in covid- disease. our findings suggest that adult hiv prevalence is seemingly protective. however, we must interpret this finding with caution given the weak negative correlation with cfr in covid (correlation coefficient= - . , p= . ). this negative relationship was maintained even in a closer analysis involving a subset of only out of countries which had complete datasets across the variables (see fig ) . thus countries with higher prevalence of hiv may experience lower mortalities compared with those with lower hiv prevalence. this was a rather surprising finding as we expected that populations with higher adult hiv prevalence would have more immunocompromised persons and therefore have higher covid- associated cfr.
in accordance with our findings, some experts have suggested the possibility of using antiretroviral therapies such as lopinavir/ritonavir against covid, and clinical trials involving antiviral medications are currently ongoing . however, more extensive research is needed in this area in order to confirm or refute this finding. in one clinical trial involving severely ill covid- patients, lopinavir-ritonavir treatment had no significant effect on reducing mortality, increasing clinical improvement or decreasing throat viral rna detectability. thus, combining lopinavir-ritonavir with other antiviral agents to boost its effects might be an option in future studies. we found life expectancy to have a very weak but positive correlation with cfr (correlation coefficient . ; p= . ). however, this correlation was strongly positive when we reviewed a subset of countries (see fig ) . thus countries with higher life expectancy were likely to experience a higher cfr when compared with countries with lower life expectancy. similar demographics or findings have been informally reported by researchers in other countries. in south africa, out of ( %) deaths occurred among those aged years and over, while the average age was years . similarly, in italy, about . % of deaths were among those over years, while in spain a rate of % was reported among those over years of age , . considering the fact that covid affects the respiratory system, we found a weak but positive correlation between the prevalence of copd and cfr due to covid- (correlation coefficient = . , p value = . ).
this correlation was stronger when we reviewed a subset of only countries with complete data (see fig ) . this relationship may be due to the underlying relationship between copd and older populations . thus, countries with older populations or higher life expectancy would likely have a higher prevalence of copd which in turn could lead to higher cfr due to covid- . our findings on the relationship between cfr due to covid- and prevalence of tobacco use and/or quality of air showed that both factors were very weakly and positively correlated with cfr (correlation coefficient= . and . respectively). however, this finding was not statistically significant (p value = . and . respectively). thus increased tobacco consumption and a higher ranking on quality of air (higher ranking meaning higher air pollution) are correlated with higher cfr due to covid- . previous studies have reported a link between respiratory viral infection and air pollution. according to these studies, respiratory viral infection increased with air pollution . also, a recent study by researchers at harvard university reports a higher cfr due to covid- among counties within the us with higher air pollution . surprisingly, when we reviewed data of a subset of only out of countries with complete datasets, there was a reversal from positive to negative correlation between cfr and the prevalence of tobacco use and/or quality of air. we noticed a strong negative correlation between these factors and cfr due to covid- (correlation coefficient = - . and - . respectively), though it was not statistically significant. this may be due to the small sample size but it is worthy of future research. lmics are increasing the number of persons being tested for sars-cov- infection through innovative new technologies for sars-cov testing.
despite the rise in tests being conducted, it is likely that these countries may not experience the high mortality due to covid currently being reported from hic, for several reasons. firstly, owing to the few tests currently being done, the number likely to die from a confirmed diagnosis of covid- would be underestimated. secondly, the earlier awareness and lead time in preparations by lmics prior to the advent of covid- might reduce the spread and impact. moreover, findings from our study suggest that a higher prevalence of adult hiv as experienced in lmic might be protective against higher cfr due to covid- . lastly, and hopefully, ongoing clinical trials might be successful before lmic begin to experience a comparably high mortality. an important limitation of this study is its ecological nature; hence, findings must be interpreted with caution and should not be directly extrapolated to individual levels. secondly, covid- is a rapidly evolving disease and total numbers of confirmed cases and mortality figures are constantly changing. thirdly, the datasets used for our analysis were publicly available on the internet and may not be as accurate as officially published estimates by the respective countries included in our study. however, our study presents data from countries worldwide on , confirmed cases of covid- . additionally, this study provides useful information on population level factors that may contribute to the noticeable differences in mortality rates worldwide, which is an issue of significant interest to epidemiologists, clinicians, and public health policy makers worldwide.
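the correlation step described in the methods of this ecological study (cfr defined as deaths divided by officially reported confirmed cases, followed by a spearman correlation of cfr against each covariate) can be sketched in python as below. this is a minimal illustration under stated assumptions, not the authors' code: the function names and the four example countries with their figures are hypothetical, the rank correlation is computed as a pearson correlation on average ranks (a standard formulation that also handles ties), and the shapiro-wilk test and p-values are not reproduced.

```python
from statistics import mean


def case_fatality_rate(deaths, confirmed_cases):
    """CFR as defined in the study: deaths / officially confirmed cases."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical figures for four made-up countries (not study data):
    # (deaths, confirmed cases, adult HIV prevalence in percent)
    rows = [(10, 200, 2.0), (3, 100, 0.5), (8, 100, 5.0), (1, 50, 9.0)]
    cfr = [case_fatality_rate(d, c) for d, c, _ in rows]
    hiv = [h for _, _, h in rows]
    print("spearman rho between cfr and hiv prevalence:", spearman(cfr, hiv))
```

in practice a statistics library routine, for example `scipy.stats.spearmanr`, would likely be used instead, since it also returns a p-value.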
our findings suggest that population level factors such as copd prevalence, prevalence of tobacco use, life expectancy and quality of air are positively correlated with cfr from covid- , while adult hiv prevalence has a weak and negative correlation with covid- cfr. extensive research is required to investigate these population level factors at the individual level in order to provide information that can be generalizable. covid- coronavirus pandemic; list of countries by hiv/aids adult prevalence rate; world most polluted countries; global burden of copd: systematic review and meta-analysis; eacs & bhiva statement on risk of covid- for people living with hiv (plwh), eacs; a trial of lopinavir-ritonavir in adults hospitalized with severe covid- ; average age of sa's coronavirus victims is -here's what we know so far; statista. mortality rate of coronavirus; defining chronic obstructive pulmonary disease in an aging population; severity and mortality associated with copd and smoking in patients with covid- : a rapid systematic review and meta-analysis. medrxiv preprint; the short-term effects of air pollutants on influenza-like illness in jinan, china; air pollution linked with higher covid- death rates
key: cord- -kx wvw c authors: goh, h. p.; mahari, w. i.; ahad, n. i.; chaw, l.; kifli, n.; goh, b. h.; yeoh, s. f.; ming, l. c. title: risk factors affecting covid- case fatality rate: a quantitative analysis of top affected countries date: - - journal: nan doi: . / . . .
sha: doc_id: cord_uid: kx wvw c background: latest clinical data on treatment on coronavirus disease (covid- ) indicated that older patients and those with underlying history of smoking, hypertension or diabetes mellitus might have poorer prognosis of recovery from covid- . we aimed to examine the relationship of various prevailing population-based risk factors in comparison with mortality rate and case fatality rate (cfr) of covid- . methods: demography and epidemiology data which have been identified as verified or postulated risk factors for mortality of adult inpatients with covid- were used. the number of confirmed cases and the number of deaths until april , for all affected countries were extracted from johns hopkins university covid- websites. datasets for indicators that are fitting with the factors of covid- mortality were extracted from the world bank database. out of about affected countries, only top countries were selected to be analyzed in this study. the following seven variables were included in the analysis, based on data availability and completeness: ) proportion of people aged above, ) proportion of male in the population, ) diabetes prevalence, ) smoking prevalence, ) current health expenditure, ) number of hospital beds and ) number of nurses and midwives. quantitative analysis was carried out to determine the correlation between cfr and the aforementioned risk factors. results: united states shows about . % of confirmed cases in its country and it has about . % of cfr. luxembourg shows the highest percentage of confirmed cases of . % but a low . % of cfr, showing that a high percentage of confirmed cases does not necessarily lead to high cfr. there is a significant correlation between cfr, people aged and above (p = . ) and diabetes prevalence (p = . ). however, in our study, there is no significant correlation between cfr of covid- , male gender (p = . ) and smoking prevalence (p = . ). 
conclusion: older people above years old and diabetic patients are significant risk factors for covid- . nevertheless, gender differences and smoking prevalence failed to prove a significant relationship with covid- mortality rate and cfr. keywords: coronavirus, covid- , risk, epidemiology, fatality, age, diabetes deceased patients ( ). another risk factor for covid- mortality is in patients with existing comorbidities. a study by guan et al. shows that covid- are more commonly seen in patients with hypertension, diabetes, cardiovascular disease and a history of smoking ( ). not only were these patients susceptible to the disease, they also had a higher chance of obtaining poor health outcomes after immediate care unit (icu) admission and may lead to death ( ). moreover, a study on the correlation between covid- mortality and bcg vaccination suggested that early bcg vaccination could help to decrease the mortality rate ( ). other than that, malaria prevalence is also another risk factor of covid- mortality. according to the research conducted by spencer, there is a higher number of covid- cases reported in countries with low malaria prevalence than countries that had higher malaria prevalence ( ) . apart from addressing risk factors, there are also parameters that may affect the covid- mortality rate such as shortage of staff, lack of medical supply or equipment, insufficient hospital beds and the country's health expenditure. as of end of april , sars-cov- virus has resulted in more than . million infections and over , deaths globally ( ). as covid- has become a global pandemic issue, implementation of suitable interventions will be needed for the public, healthcare professionals and patients and to ensure all sectors to work together cohesively and efficiently. even though covid- origins from coronavirus, the sars-cov- has very different severity and contagion characteristics and much still needs to be learned about it. 
thus, it is imperative to evaluate the relationship of postulated or verified risk factors with covid- mortality. it is absolutely crucial to evaluate the risk factors of mortality among patients infected with covid- at population level. once the relationship is validated, patients with covid- who have a given risk factor can be treated more aggressively than those without it. the findings of the current study provide a clinical picture of
medrxiv preprint (which was not certified by peer review), this version posted may , . the copyright holder for this preprint is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license. https://doi.org/ . /
the following seven variables were included in the analysis, based on data availability and completeness: 1) proportion of people aged above, 2) proportion of male in the population, 3) diabetes prevalence, 4) smoking prevalence, 5) current health expenditure, 6) number of hospital beds and 7) number of nurses and midwives. data analysis: for each country, the percentage of confirmed covid- cases was calculated by dividing the number of confirmed covid- cases by the total population of that country. also, cfr was calculated by dividing the number of deaths related to covid- by the number of confirmed covid- cases. bar graphs were plotted to illustrate both measures. regression analysis was conducted to determine the risk factors of cfr for covid- . for this analysis, a few variables (cfr and number of hospital beds) were standardized due to differences in scale and very large range. standardization was done by subtracting the mean from each value and then dividing by the standard deviation. also, some variables (diabetes prevalence, current health expenditure, and number of nurses and midwives) were divided into four equal categories (i.e. in quartiles). all analyses were conducted using microsoft excel and r (ver. . . ). a p-value < . 
was considered as statistically significant. the proportion of people aged and above has a significant association with cfr (p = . , table ). the β coefficient of . tells us that for every -unit increase in the proportion of people
there are still many unknowns regarding the disease covid- . there is a steep learning curve about the virus, and this could take a couple of years to work out. however, we are not completely in the dark when it comes to risk factors. studies have shown that age is a clear risk factor for severe covid- disease and, consequently, for death. this is consistent with our study, where the proportion of people aged and above has shown a significant correlation with cfr. this indicates that countries with a higher proportion of people aged and above may have a higher covid- mortality rate. bhatraju et al. ( ) have shown that in seattle, us, more than % of covid- deaths occurred in patients aged years and above rather than in those younger than years old ( ). verity et al. ( ) have shown that the case fatality rate for those under age was . % while the fatality rate increases drastically to . % for people aged over years old ( ). this shows that the older the population, the higher the fatality rate. for those years old and over, covid- appears to have a . % fatality rate ( ). furthermore, deceased patients were found to be at an average age of years old while recovered patients were at an average age of years old ( ).
these studies show that covid- disproportionately impacts certain groups, and older people are one of the vulnerable groups. there is no single reason for this; it is believed that the immune system declines with age. deficiencies in t-cell and b-cell function and overproduction of type cytokines increase with age ( ). this may increase viral replication and extend the duration of pro-inflammatory responses, leading to poor health outcomes ( ). older people tend to have more underlying conditions that may also be risk factors for severe covid- ( ). even though no significant association was shown between covid- cfr and male gender, it is important to note that differences in gender may play a role in the severity of covid- . there are studies that have shown covid- affecting more males than females ( - ). this could be due to males having more underlying health risk factors than females, or to the fact that males tend to engage in more risky health-related behaviours, such as greater rates of smoking and drinking alcohol ( ). genetics and differences in immune response may also explain this phenomenon. studies have shown that many severe covid- patients also have underlying medical conditions, such as diabetes and cardiovascular diseases ( , ) . our study has confirmed that there is indeed an association between diabetes prevalence and cfr. however, it is important to note that, according to our study, diabetes prevalence may be an "unreliable" variable, as countries with high diabetes prevalence had lower covid- cfr than countries with low diabetes prevalence. further investigation is needed to define the actual association between diabetes prevalence and covid- . 
although smoking prevalence has shown no significant association with covid- , it cannot be assumed that there is no correlation between other co-morbidities and covid- cfr, since not all factors were considered in this study, such as hypertension and cardiovascular diseases ( ). patients with existing comorbidities, including hypertension, diabetes, cardiovascular disease and history of smoking, seem to be affected more severely by covid- ( ). in a retrospective study of deceased covid- patients, % of the patients had chronic hypertension and % had cardiovascular diseases ( ). in addition, hypertension in covid- patients was closely associated with poor health outcomes after hospital admission. this may be due to factors such as vascular aging, reduced renal function and medication interactions ( ). it is important to note that everyone is responsible for controlling this covid- pandemic. there are a number of limitations in this study that need to be acknowledged. firstly, some factors had to be excluded due to incomplete data, such as malaria prevalence and bcg vaccination. secondly, the years from which the data were collected were not consistent across indicators. thirdly, for some indicators, such as the number of hospital beds, the data were not all from the same year. lastly, some required data were unavailable, so an overall conclusion could not be drawn for some of the factors, including comorbidities. other comorbidities were proposed for analysis, but only two indicators' datasets were available in world bank data: diabetes and smoking prevalence. therefore, more research should be conducted to further understand the relationship between comorbidities and cfr. this would help to identify and better understand other possible factors that may also affect cfr. as covid- is such a new disease, much still needs to be learned about it. age is a clear risk factor for severe covid- and death. covid- is an illness that disproportionately impacts older people. however, the aforementioned risk factors should not be neglected, as they may play essential roles in flattening the curve and reducing healthcare burden. prediction alone is not sufficient; well-planned and suitable interventions should also be carried out. in addition, potential risk factors need much more research in order to understand the risks for the worst forms
world health organization. novel coronavirus ( -ncov) situation report - . some covid- vs. malaria numbers: countries with malaria have virtually no novel coronavirus (covid- ) cases. data - humanitarian data exchange.
key: cord- -i ozygkp authors: babacic, h.; lehtiö, j.; pernemalm, m. title: global between-countries variance in sars-cov- mortality is driven by reported prevalence, age distribution, and case detection rate date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: i ozygkp objective: to explain the global between-countries variance in number of deaths per million citizens (ndpm) and case fatality rate (cfr) due to severe acute respiratory syndrome coronavirus (sars-cov- ) infection. design: systematic analysis. data sources: worldometer, european centre for disease prevention and control, united nations. main outcome measures: the explanators of ndpm and cfr were mathematically hypothesised and tested on publicly-available data from countries with linear regression models on may st . the derived explanators - age-adjusted infection fatality rate (ifradj) and case detection rate (cdr) - were estimated for each country based on a sars-cov- model of china. the accuracy and agreement of the models with observed data were assessed with r² and bland-altman plots, respectively. sensitivity analyses involved removal of outliers and testing the models at five retrospective and two prospective time points. results: globally, ifradj estimates varied between countries, ranging from below . % in the youngest nations, to above . % in portugal, greece, italy, and japan. the median estimated global cdr of sars-cov- infections on april th was . 
%, suggesting that most of the countries have a much higher number of cases than reported. at least % and up to % of the variance in ndpm was explained by reported prevalence expressed as cases per million citizens (ncpm), ifradj, and cdr. ifradj and cdr accounted for up to % of the variance in cfr, but this model was less reliable than the ndpm model, being sensitive to outliers (r as low as . %). conclusions: the current differences in sars-cov- mortality between countries are driven mainly by reported prevalence of infections, age distribution, and cdr. the ndpm might be a more stable estimate than cfr in comparing mortality burden between countries. the severe acute respiratory syndrome coronavirus (sars-cov- ) has substantially affected the lives of billions of people. ( , ) an ongoing question in the public is how high is the direct mortality caused by sars-cov- . observations in the case fatality rate (cfr), i.e. the proportion of individuals with a confirmed sars-cov- infection who die, has raised concerns due to the high variability between countries, ranging from below · % in qatar and singapore to above % in belgium and france, on may th . ( , ) the global average case detection rate (cdr) on march th was estimated at %, suggesting that the true prevalence of infections is likely underestimated in most of the countries. ( ) studies suggest that the reported number of cases per million citizens (ncpm) is probably lower than the true number of infected individuals, and that this contributes to the varying cfr between countries.( , ) cfr appears higher than the true infection fatality rate (ifr), i.e. the true proportion of individuals with a sars-cov- infection who will die in the population regardless of whether they are confirmed or not. ( ) this was observed in china where the crude cfr estimate was . %, whereas the age-adjusted overall ifr (ifradj) was estimated at . %. 
( ) the number of confirmed deaths per million citizens (ndpm) is a population-normalised measure of mortality used to compare countries. however, the varying ndpm in countries with similar ncpm, population size and similar mitigation strategies has also raised fears of potentially varying virulence of the virus and different treatment capacity between countries. a recent multivariable model could explain only . % of sars-cov- mortality variance between countries.( ) explaining the remaining variance of the reported mortality as ndpm and cfr is extremely relevant for both the medical community and the public, to address public concerns. furthermore, it is important to assess whether the adjusted mortality differs substantially between countries, in order to track the success of different strategies. the aim of this study was to test, on real data, two mathematical hypotheses that explain the global between-countries variance in sars-cov- mortality expressed as ndpm and cfr. global data on cumulative number of cases (nc), cumulative number of deaths (nd), cumulative number of tests (nt), number of tests per million citizens (ntpm), number of cases per million citizens (ncpm), and ndpm per country were downloaded from worldometer. ( ) global data on number of new cases and deaths per day were downloaded from the european centre for disease prevention and control (ecdc).( ) global data on age distribution and gdp per country were obtained from united nations (un) statistics.( , ) the overall ifradj per country was estimated and weighted across nine age groups, following the equation: $IFR_{adj} = \frac{1}{N} \sum_{i=1}^{9} n_i \, IFR_i$, where $IFR_{adj}$ is the ifradj in percentages (%), $N$ is the total population size, $n_i$ is the number of susceptible individuals within age group $i$, and $IFR_i$ is the ifr for that age group in % as estimated by verity and colleagues. ( ) for the purposes of this study, the $IFR_{adj}$ serves as an age-adjustment factor. 
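as a minimal sketch of the age adjustment described above, the snippet below computes a population-weighted ifr from per-age-group counts and age-specific ifrs. the function name `age_adjusted_ifr`, the three-group example and every number in it are hypothetical placeholders (the study uses nine age groups and the age-specific estimates of verity and colleagues), and the total population is assumed to equal the sum of the susceptible counts.

```python
from typing import Sequence


def age_adjusted_ifr(susceptible: Sequence[float], ifr_percent: Sequence[float]) -> float:
    """Population-weighted IFR in %: sum(n_i * IFR_i) / N.

    Assumption: N is taken as the sum of the susceptible counts n_i.
    """
    if len(susceptible) != len(ifr_percent):
        raise ValueError("one IFR value is needed per age group")
    total = sum(susceptible)
    if total <= 0:
        raise ValueError("total population must be positive")
    return sum(n * f for n, f in zip(susceptible, ifr_percent)) / total


# Hypothetical age-specific IFRs in % (placeholders, not published estimates).
ifr_by_group = [0.01, 0.5, 5.0]

# Two hypothetical populations with the same size but different age structure.
young_country = age_adjusted_ifr([50, 30, 20], ifr_by_group)
old_country = age_adjusted_ifr([20, 30, 50], ifr_by_group)

print(f"younger population: {young_country:.3f} %")
print(f"older population:   {old_country:.3f} %")
```

with identical age-specific ifrs, the older hypothetical population gets the higher adjusted value, which is the sense in which the ifradj acts as an age-adjustment factor.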
the cdr per country was estimated as the percentage of the estimated cases that have been confirmed, following the approach of vollmer & bommer ( ): $CDR = \frac{N_c(t_1)}{N_d(t_2) / IFR_{adj}}$, where $CDR$ is the cdr in %, $IFR_{adj}$ is the ifradj in %, $N_c(t_1)$ is the cumulative number of confirmed cases at time $t_1$, and $N_d(t_2)$ is the cumulative number of confirmed deaths at time $t_2$. following the verity model ( ) , $t_1$ is days before $t_2$ in this approach, based on the estimate that on average · days pass from the onset of symptoms to death, holding a conservative assumption that on average . days pass from symptom onset to case detection. from these equations, $N_d(t_2)$ is implied to have an inverse relation with the $CDR$ and will depend on the cumulative number of cases days before the deaths have occurred, and the age-adjusted $IFR_{adj}$: $N_d(t_2) = \frac{N_c(t_1) \, IFR_{adj}}{CDR}$. assuming that the number of cases at the time of the deaths, $N_c(t_2)$, will have a constant dependence on $N_c(t_1)$, as observed repeatedly in epidemics, including sars-cov- , $N_c(t_2)$ can replace it in the equation. . cc-by-nc-nd . international license it is made available under a is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. (which was not certified by peer review) the copyright holder for this preprint this version posted june , . . https://doi.org/ . / . . . doi: medrxiv preprint in order to explain the population-normalised number of deaths (ndpm), one has to normalise $N_d(t_2)$ per population size with ncpm, deriving that: $NDPM(t_2) \propto \frac{NCPM(t_2) \, IFR_{adj}}{CDR}$, where ndpm at time $t_2$ will be higher in countries with higher ncpm at time $t_2$ and higher ifradj, whereas it will be lower in countries with the same ncpm and ifradj that have higher cdr. following that the cfr is the proportion of $N_d(t_2)$ from the $N_c(t_2)$, the subsequent relationship between the cfr, the ifradj and cdr is implied as: $CFR = \frac{N_d(t_2)}{N_c(t_2)} = \frac{IFR_{adj}}{CDR} \cdot \frac{N_c(t_1)}{N_c(t_2)}$, where $CFR$ is the cfr of a country, $IFR_{adj}$ is the ifradj of a country, $N_c(t_2)$ is the cumulative number of cases on the day of $N_d(t_2)$, and $N_c(t_1)$ is the cumulative number of cases two weeks before $t_2$. 
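the relationships above can be checked numerically with a short sketch. the function names and the example counts are hypothetical; the only assumptions taken from the text are that the expected number of infections equals the confirmed deaths divided by the ifradj, and that the confirmed cases are counted two weeks before the deaths.

```python
def case_detection_rate(cases_two_weeks_before: float,
                        deaths: float,
                        ifr_adj_percent: float) -> float:
    """CDR in %: confirmed cases divided by the infections implied by deaths."""
    if deaths <= 0 or ifr_adj_percent <= 0:
        raise ValueError("deaths and IFR must be positive")
    expected_infections = deaths / (ifr_adj_percent / 100.0)
    return 100.0 * cases_two_weeks_before / expected_infections


def implied_cfr(ifr_adj_percent: float,
                cdr_percent: float,
                cases_two_weeks_before: float,
                cases_now: float) -> float:
    """CFR as a proportion: (IFR / CDR) * (earlier cases / current cases)."""
    return (ifr_adj_percent / cdr_percent) * (cases_two_weeks_before / cases_now)


# Hypothetical country: all counts below are made up for illustration.
cases_earlier = 1000.0   # cumulative confirmed cases two weeks before
cases_now = 2500.0       # cumulative confirmed cases on the day of the deaths
deaths_now = 50.0        # cumulative confirmed deaths
ifr_adj = 1.0            # age-adjusted IFR in %

cdr = case_detection_rate(cases_earlier, deaths_now, ifr_adj)
cfr_from_model = implied_cfr(ifr_adj, cdr, cases_earlier, cases_now)
cfr_observed = deaths_now / cases_now

print(f"CDR: {cdr:.1f} %")
print(f"CFR from IFR and CDR: {cfr_from_model:.4f}, observed: {cfr_observed:.4f}")
```

because the cdr is itself computed from the deaths, the implied and observed cfr agree exactly in this toy case; across countries the relation is only expected to hold approximately, which is why the study tests it with regression.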
again, assuming that the cumulative number of cases on the day of the deaths will have a constant dependence on the cumulative number of cases two weeks earlier, they can be omitted from the equation, deriving that: $CFR \propto \frac{IFR_{adj}}{CDR}$, where the cfr will be higher in countries with higher ifradj and will have an inverse relation with the cdr. hypothesis implies that older countries will have higher cfr and countries with higher cdr will have lower cfr, and predicts that the cumulative number of cases (nc) will not drive cfr. to test hypothesis , we built linear regression model (ndpm model), to explain ndpm with ncpm, ifradj, and cdr. to test hypothesis , we built linear regression model (cfr model), to explain cfr with ifradj and cdr. only countries with more than , cases were included in the analyses. all variables were normalised by log transformation. we additionally tested whether gdp, ntpm, and duration of epidemic (as days from first case) could explain the mortality after being added to the models.( ) the accuracy was assessed with r², and the agreement was analysed graphically with the bland-altman mean difference plot.( ) to address uncertainty, we removed outliers outside of the % confidence intervals ( % ci) of the bland-altman plots, and reiterated the analyses retrospectively on april th , th , th , st , th , and prospectively on may th , th , th , and th . the study was conducted according to the guidelines for accurate and transparent health estimates reporting. ( ) the code, data, and results are publicly available at https://github.com/harbab/covid_ _mortality. all analyses were performed in r v. . . . as of may st , a total of countries in the world have reported sars-cov- infections. of these, countries have reported more than , sars-cov- infections. the estimated ifradj varied from below . % in the youngest nations of ivory coast, guinea, nigeria, uae, cameroon, and afghanistan, up to above . % in the world's oldest nations of germany, portugal, greece, italy, and highest in japan with . %. the global average cdr on april th was . % (median: . %, sd: . 
), suggesting that most of the cases were undetected. only two countries detected more than % of expected cases: iceland ( . %) and singapore ( . %). estimates for each country are shown in table s , supplementary information. univariate analyses showed that ncpm, ifradj, and ntpm could explain . %, . %, and . % of the variance in ndpm, respectively (p < . - ). the cdr was not univariately associated with ndpm (p = . ). however, combined together, ncpm, ifradj and cdr could explain . % of the variance in ndpm (p < . - ). introducing ntpm to the model only slightly improved the r² to . (p < . - ). all four variables were included in the final model ( table ). the relationship between the variables was as hypothesised mathematically. the model showed almost perfect accuracy ( figure a ) and high agreement ( figure b ) in explaining ndpm. some countries were outliers ( figure c ). univariately, ndpm had a positive association with gdp (p = . - ) and the duration of the epidemic (p = . ). however, neither variable had an association with ndpm when added to model (p = . and . , respectively). gdp had a positive correlation with all four explanators of model (p < . - ), most evidently with ntpm (r = . ), and was thus redundant in the model. gdp and duration of epidemic are possibly unstable explanators that can be useful in stratified analyses per continents and regions. univariately, ifradj and cdr explained . % and . % of the variance in cfr, respectively. when combined together, ifradj and cdr accounted for . % of the variance in cfr ( table ) . the predicted cfr also had high accuracy and high agreement with observed cfr (figure ). 
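the two checks reported here, accuracy as r² and agreement as bland-altman limits, can be illustrated with a small self-contained sketch. this is not the authors' r code: the data are synthetic, generated under the assumption that log cfr rises with log ifradj and falls with log cdr, the helper `fit_ols` is a hypothetical pure-python least-squares solver, and the 1.96 multiplier is the conventional choice for bland-altman limits rather than a value taken from the text.

```python
import random
import statistics


def fit_ols(rows, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination.

    rows: list of predictor rows (each including a leading 1.0 for the intercept).
    """
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


# Synthetic, already log-transformed values for 40 hypothetical countries.
random.seed(0)
n = 40
log_ifr = [random.uniform(-2.0, 1.0) for _ in range(n)]
log_cdr = [random.uniform(0.0, 4.0) for _ in range(n)]
log_cfr = [0.5 + a - b + random.gauss(0.0, 0.1) for a, b in zip(log_ifr, log_cdr)]

# CFR model: log CFR explained by log IFRadj and log CDR.
design = [[1.0, a, b] for a, b in zip(log_ifr, log_cdr)]
beta = fit_ols(design, log_cfr)
predicted = [sum(c * x for c, x in zip(beta, row)) for row in design]

# Accuracy: coefficient of determination.
mean_obs = statistics.fmean(log_cfr)
ss_res = sum((o - p) ** 2 for o, p in zip(log_cfr, predicted))
ss_tot = sum((o - mean_obs) ** 2 for o in log_cfr)
r2 = 1.0 - ss_res / ss_tot

# Agreement: Bland-Altman mean difference and limits of agreement.
diffs = [o - p for o, p in zip(log_cfr, predicted)]
mean_diff = statistics.fmean(diffs)
sd_diff = statistics.stdev(diffs)
limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
outliers = [i for i, d in enumerate(diffs) if not limits[0] <= d <= limits[1]]

print("coefficients (intercept, log IFRadj, log CDR):", [round(b, 3) for b in beta])
print("R^2:", round(r2, 3))
print("Bland-Altman limits:", tuple(round(v, 3) for v in limits))
print("indices outside the limits:", outliers)
```

in a sensitivity analysis of the kind described in the text, the indices in `outliers` would be dropped and the model refitted to see how much r² changes.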
neither ncpm nor nc was associated with the cfr when added to model independently, confirming the assumption on which hypothesis relies. none of the additional variables (ntpm, gdp, or duration of epidemic) was associated with cfr univariately or when added to model . ntpm and gdp were also not associated with cfr in a previous report ( ). reiterating the analysis at five retrospective and four prospective timepoints showed that model could robustly explain at least % of the variance in ndpm (at least % after removing outliers), but model had lower accuracy at earlier stages of the pandemic (figure ) . fewer countries had > , cases at earlier timepoints (range: on april th - on may th ). the ntpm was an unreliable explanator of ndpm and accounted for only a very small proportion of variance, so it can be omitted; the effect of ntpm is possibly underestimated due to its association with ncpm. the cfr model was more sensitive to outliers compared to the ndpm model, with a higher average decrease in r² of . % (range: . - . %, median: . %) compared to an average decrease of . % for the ndpm model (range: . - . %, median: . %) when including outliers (p = . ). the assumption of no effect of nc on cfr was violated at some time-points.
bland-altman plot. the mean of the predicted log-normalised ndpm and observed log-normalised ndpm is plotted on the x axis, whereas the difference on a log scale between the observed ndpm and predicted ndpm is plotted on the y axis. the mean difference between the observed ndpm and predicted ndpm was (blue, full line), with the % confidence intervals (red, dashed lines) containing most of the values. 
five countries were outliers in this model, having lower ndpm than predicted: russia, belarus, singapore, bangladesh, and kazakhstan; c. outlier countries. actual difference between observed ndpm and predicted ndpm in numbers. the labelled countries in the upper part of the boxplot (> th quantile) had much more observed ndpm than predicted, whereas the labelled countries in the lower part had much less ndpm than predicted by the model.
figure . agreement between observed and predicted case fatality rate (cfr). the countries are annotated with their country code. a. predicted log-normalised cfr (x axis) vs log-normalised observed cfr (y axis). the model could predict the cfr almost perfectly in a linear fashion. the blue line is the model fit and the shades are % ci; b. bland-altman plot. the mean of the predicted log-normalised cfr and observed log-normalised cfr is plotted on the x axis, whereas the difference on a log scale between the observed cfr and predicted cfr is plotted on the y axis. the mean difference between the observed cfr and predicted cfr was (blue, full line), with the % confidence intervals (red, dashed lines) containing most of the values. four countries were outliers in this model, having lower cfr than predicted: russia, belarus, singapore, and bangladesh.
on average, several countries had > more ndpm than expected: italy, belgium, switzerland, spain, the netherlands, and iran, whereas others had on average > less ndpm than expected: uk, peru, brazil, belarus, russia, canada, chile and kuwait. likewise, italy, algeria, iran, the netherlands, china, belgium, iraq, indonesia, spain, switzerland, and the philippines had on average > . % higher cfr than expected according to the model, whereas bangladesh, ukraine, brazil, bolivia, mexico, belarus, russia, peru, and honduras had on average > % lower cfr than expected. at prospective time-points in may, three countries had consistently higher than % cdr, reporting more cases than expected: singapore (range: - %), iceland (range: - %), and qatar (range: - %). detailed results from the sensitivity analyses are available in supplementary information. most of the global variance in ndpm between countries was explained by reported prevalence of sars-cov- infections (ncpm) and age distribution as expressed with the ifradj. this has to be further adjusted for the cdr, which has an inverse relation with the ndpm, but only in the context of using ncpm and ifradj to explain ndpm. as expected, the richer countries were better at testing and detecting cases, but were also older and had a higher infection mortality burden. the cfr is also dependent on the ifradj and the cdr, but does not depend on the prevalence or the total number of sars-cov- confirmed cases. some countries remain outliers, having consistently higher or lower mortality than expected according to the models. this is possibly due to consistent misreporting ( ), differences in reporting deaths, diagnostic bias, sex distribution and average age of individuals who died; countries with on average higher mortality than expected possibly had more older people and more men infected and dying. 
( ) the observation that several countries have detected a higher number of cases than expected and had lower observed cfr than ifradj (see supplementary information) supports the notion that the current ifr is overestimated and that the actual mortality is lower than estimated. to confirm these observations, serological surveys of populations will be essential in correctly estimating the true prevalence and mortality of sars-cov- . these models use very few explanators while maintaining high accuracy in explaining mortality. the sensitivity analyses demonstrated the robustness of the mathematical models when tested on real data. there is a remaining small proportion of variance that cannot be explained by the models, and this can be due to data mishandling or estimation errors, which limit the study. independent of these limitations, the ndpm model remained robust. the cfr model was more sensitive to outliers than the ndpm model, and cfr might be a less stable mortality outcome for following sars-cov- mortality burden over time and across countries. the models were somewhat less accurate at earlier stages, which can be due to the smaller amount of data (number of countries) used to build the models. overall, this study demonstrates that most countries are on a similar sars-cov- mortality trajectory as the number of cases increases, after adjusting for age distribution and cdr. these models should be used for less biased comparisons of mortality between countries. the ndpm model appears to be a more stable indicator of sars-cov- infection mortality burden and should be favoured in following and comparing mortality within and between countries. 
evidence before this study -verity and colleagues (lancet inf dis ) have estimated the sars-cov- infection fatality rates (ifr) per age groups, and vollmer & bommer ( ) have estimated that the average case detection rate (cdr) of sars-cov- infections in countries was below % end of march. -no studies have been published explaining the global sars-cov- variance in mortality. a medrxiv preprint by shagam ( ) reports that approximately % of sars-cov- mortality variance can be explained by gross domestic product per capita in united states dollars (gdp), latitude, hemisphere, press freedom, population density, fraction of citizens over years old, and outbreak duration. added value of this study -the models in this study demonstrate that most of the between-countries variance in sars-cov- mortality can be explained with two to three explanators, maintaining high accuracy. this can help to alleviate public concerns of potential varying virulence of the virus, and provide a less biased, standardised comparison of mortality burden between countries. -in the setting of lacking an effective and safe treatment and/or vaccine against sars-cov- , most of the countries will be on a similar sars-cov- mortality trajectory as the number of cases increases, after adjusting for the age distribution of the population and the case detection rate. disease control, civil liberties, and mass testing -calibrating restrictions during the covid- pandemic if the world fails to protect the economy, covid- will damage health not just now but also in the future european centre for disease prevention and control ecdc. geographic distribution of covid- cases worldwide substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov- ) real estimates of mortality following covid- infection. the lancet. infectious diseases. 
united states estimates of the severity of coronavirus disease : a model-based analysis average detection rate of sars-cov- infections is estimated around nine percent untangling factors associated with country-specific covid- incidence, mortality and case fatality rates during the first quarter of . medrxiv statistical methods for assessing agreement between two methods of clinical measurement guidelines for accurate and transparent health estimates reporting: the gather statement case-fatality rate and characteristics of patients dying in relation to covid- in italy we express gratitude to dr. petter brodin, dr. ioannis siavelis, and dr. emilie friberg for reading the draft and providing fruitful feedback. the study is conducted with publicly available data, and does not include individual patient or public involvement. the study was performed according to the ethical standards expressed in the declaration of helsinki. this study does not require ethical approval.contribution: hb designed the study, derived the hypotheses, collected, analysed and interpreted the data, and wrote and edited the manuscript. jl and mp assisted in design and interpretation of the study, supervised the work, reviewed and edited the manuscript. the corresponding author (hb) affirms that this manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned have been explained.dissemination: not applicable. the authors declare no conflict of interest.funding: the authors have not received funding for this work. key: cord- -fgjhli p authors: bignami, simona; ghio, daniela; van assche, ari title: estimates of covid- case-fatality risk from individual-level data date: - - journal: nan doi: . / . . . 
sha: doc_id: cord_uid: fgjhli p when calculated from aggregate data on confirmed cases and deaths, the case-fatality risk (cfr) is a simple ratio between the former and the latter, which is prone to numerous biases. with individual-level data, the cfr can be estimated as a true measure of risk as the proportion of incidence for the disease. we present the first estimates of the cfr for covid- by age and sex based on event history modelling of the risk of dying among confirmed positive individuals in the canadian province of ontario, which maintains one of the few individual-level datasets on covid- in the world. the severity profile of a novel pathogen is one of the most critical issues as it begins to spread, when assessing disease course and outcome is crucial for planning health interventions ( ) . for this reason, an important epidemiological indicator to monitor during the current outbreak of covid- is the case-fatality risk (cfr), the proportion of confirmed cases who result in fatalities. when calculated from aggregate data on confirmed cases and deaths, the cfr is a simple ratio between the former and the latter, which is prone to numerous biases ( ) ( ) ( ) ( ) ( ) . with individual-level data, the cfr can be estimated as a true measure of risk as the proportion of incidence for the disease. there are only two accessible individual-level datasets on covid- confirmed positive cases that include their clinical outcome (death or recovery), which are maintained by the canadian provinces of ontario and alberta . ontario is currently the second province in canada for number of covid- infections and fatalities. we present the first estimates of the cfr for covid- by age and sex based on event history modelling of the risk of dying among confirmed positive individuals in ontario. between january and april , , ontario recorded , confirmed positive cases of covid- , of which , had recovered and had died. 
the individual-level dataset we analyse contains basic information on age, gender, clinical outcome (death or recovery) and name of the reporting health facility for these cases. the dataset for ontario is available at: https://data.ontario.ca/dataset?keywords_en=covid- . the dataset for alberta is available at: https://covid stats.alberta.ca. the number of covid- -related deaths recorded in alberta ( as of april ) is too small for this type of analysis. all rights reserved. no reuse allowed without permission. (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. the copyright holder for this preprint this version posted april , . . individuals' age is recorded at the time of testing in ten-year groups, with information missing for cases (of which are recoveries and have an unresolved outcome). information on gender is missing for cases (of which are fatalities, are recoveries, and have an unresolved outcome). these observations were excluded, so that the analysis is based on , cases, of whom . % ( , ) are females. following the guidelines of the world health organization, in canada only individuals symptomatic with fever, cough and/or difficulty breathing are tested for covid- . in ontario, the case definition refers to these symptomatic individuals who tested positive for the virus and also to their close contacts, even if they were not tested for the disease. in our dataset, this latter category included , ( %) of cases. we estimate the cfr through event history modelling in order to take into account censoring that arises because, at the time of observation, the outcome is unknown for a non-negligible portion of infected individuals. in this framework, the cfr coincides with the cumulative incidence (ci) of mortality, with recovery as a competing risk. the ci is estimated after fitting the competing risk regression model, controlling for gender, with stcrreg in stata/se (version . 
, statacorp, llc). the cfr (cumulative incidence of mortality vs recovery) for covid- confirmed positive cases in ontario is presented in table . it can be seen that it increases exponentially with age, reaching . % for those aged +. by comparing the estimated cumulative incidence of death for covid- with the crude case-fatality risk calculated from aggregate data on the total number of cases and deaths, we can appreciate to what extent the latter over-estimates the risk of covid- -related mortality in each age group. this is especially the case in the - and - age groups, where the difference between the two estimates is . % and . %, respectively. at any age, female positive cases have a risk of mortality that is percent lower than male cases (shr=. ; ci=. , . ; p=. ). the comparison of the cumulative incidence of covid- -related mortality for males and females is presented in figure .
methods for estimating the case fatality ratio for a novel, emerging infectious disease
potential biases in estimating absolute and relative case-fatality risks during outbreaks
the many estimates of the covid- case fatality rate
a demographic adjustment to improve measurement of covid- severity at the developing stage of the pandemic
monitoring trends and differences in covid- case-fatality rates using decomposition methods: contributions of age structure and age-specific fatality
building an international consortium for tracking coronavirus health status
beware of covid- projections based on flawed global comparisons
key: cord- -honeavwj authors: mair-jenkins, john; saavedra-campos, maria; baillie, j. kenneth; cleary, paul; khaw, fu-meng; lim, wei shen; makki, sophia; rooney, kevin d.; beck, charles r.; nguyen-van-tam, jonathan s. title: the effectiveness of convalescent plasma and hyperimmune immunoglobulin for the treatment of severe acute respiratory infections of viral etiology: a systematic review and exploratory meta-analysis date: - - journal: j infect dis doi: . /infdis/jiu sha: doc_id: cord_uid: honeavwj background. administration of convalescent plasma, serum, or hyperimmune immunoglobulin may be of clinical benefit for treatment of severe acute respiratory infections (saris) of viral etiology. we conducted a systematic review and exploratory meta-analysis to assess the overall evidence. methods. healthcare databases and sources of grey literature were searched in july . all records were screened against the protocol eligibility criteria, using a -stage process. data extraction and risk of bias assessments were undertaken. results. we identified studies of sars coronavirus infection and severe influenza. narrative analyses revealed consistent evidence for a reduction in mortality, especially when convalescent plasma is administered early after symptom onset. exploratory post hoc meta-analysis showed a statistically significant reduction in the pooled odds of mortality following treatment, compared with placebo or no therapy (odds ratio, . ; % confidence interval, . –. ; I² = %). studies were commonly of low or very low quality, lacked control groups, and were at moderate or high risk of bias. sources of clinical and methodological heterogeneity were identified. conclusions. convalescent plasma may reduce mortality and appears safe.
this therapy should be studied within the context of a well-designed clinical trial or other formal evaluation, including for treatment of middle east respiratory syndrome coronavirus cov infection. as of may , the world health organization (who) had been informed of persons with laboratory-confirmed middle east respiratory syndrome coronavirus (mers-cov) infection, of whom ( %) have died [ ] . the current approach to clinical management of mers-cov infection centers on general supportive care, with provision of critical care and organ support when necessary [ ] . it has recently been suggested that administration of convalescent plasma or hyperimmune immunoglobulin will yield a clinical effect for treatment of mers-cov infection [ ] . however, numerous uncertainties remain because the clinical course, viral replication kinetics, and host interactions are yet to be fully established [ ] . furthermore, the underlying evidence is based on studies of varying size and quality that describe clinical experience in treating other viral infections, including those due to sars coronavirus (sars-cov), spanish influenza a(h n ), avian influenza a(h n ), and pandemic influenza a (h n ) (hereafter, "influenza a[h n ]pdm ") [ ] [ ] [ ] [ ] [ ] . we conducted a systematic review and exploratory meta-analysis to evaluate the clinical effectiveness of convalescent plasma, serum, or hyperimmune immunoglobulin for the treatment of severe acute respiratory infections (saris) of viral etiology, to help inform clinical management of mers-cov infection. this systematic review was conducted in accordance with the preferred reporting items for systematic reviews and meta-analyses (prisma) guidelines [ ] . the study protocol was registered with the national institute for health research international prospective register of systematic reviews [ ] . the study eligibility criteria are available elsewhere [ ] . 
briefly, the study population of interest was human subjects of any age or sex who were hospitalized with saris with a laboratory-confirmed or suspected viral etiology. the intervention of interest was convalescent plasma, serum, or hyperimmune immunoglobulin derived from convalescent plasma. comparator treatments included placebo, sham therapy, or no intervention; studies with no comparator group were also included. outcome measures were derived from the protocol research questions to ascertain the clinical effectiveness of therapy [ ] . two reviewers (j. m.-j. and m. s.-c.) executed the search strategy in july . the sources of information searched and search construct are available elsewhere [ ] . adaptations were made for search interfaces that did not allow use of complex constructs. all search records were imported to endnote x software (thomson reuters, san francisco, ca) or screened manually, using paper records. following the removal of duplicate entries, a -stage screening process was followed to identify eligible records through the sequential examination of each title, abstract, and full text. two reviewers (j. m.-j. and m. s.-c.) screened each record, with provision for arbitration from a third reviewer (c. r. b.). data were collected independently by paired reviewers, using a piloted form. consensus agreement for each extracted data item was reached by discussion, with provision for arbitration from a third reviewer (j. m.-j., m. s.-c., and c. r. b.). the data extraction form is available as an appendix to the study protocol [ ] . risk of bias assessments were performed at the outcome measure level during data collection. the cochrane collaboration tool was used for experimental and prospective cohort studies [ ] , the newcastle-ottawa scale was used for observational studies (excluding prospective cohort studies) [ ] , and a tool published by the us agency for healthcare research and quality was used for systematic reviews [ ] . 
records limited to abstracts were not assessed, because of the paucity of information contained therein. odds ratios (ors), case-fatality rates (cfrs), absolute differences in cfrs, and differences in means were calculated as summary statistics with % confidence intervals (cis). study characteristics and outcome measures were tabulated. a recognized framework for narrative synthesis was adopted [ ] . because of potential concerns with clinical heterogeneity, analyses were stratified by viral etiology for each research question in accordance with the protocol [ ] . an exploratory, post hoc, random-effects-model meta-analysis was conducted to describe the pooled odds ratio of mortality, irrespective of sari etiology, following treatment with convalescent plasma or serum, using the odds after receipt of placebo or no therapy as a reference. results were adjusted by adding . to each cell of the contingency table when no deaths occurred in the exposed group in individual studies [ ] . meta-analysis of crude cfrs, using a random-effects model, was undertaken. statistical heterogeneity was ascertained using the I² statistic, and meta-analyses were abandoned when this reached % [ ] . sensitivity analyses were undertaken to investigate the impact of excluding studies with ≤ patients in the exposed group. publication bias was assessed through construction of funnel plots and by use of the egger test. all statistical analyses were conducted using stata software, version . (statacorp, college station, tx), except for meta-analysis of pooled proportions, for which we used statsdirect software, version . . (statsdirect, altrincham, united kingdom). statistical significance was assumed at the % level.
the search process yielded records (figure ). after sifting unique records against the protocol eligibility criteria, we identified studies from reports (supplementary table ).
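the pooling procedure described in the statistical methods above can be sketched roughly as follows. this is a minimal illustration under stated assumptions, not the authors' stata code: the text says only "random-effects model", so the dersimonian-laird estimator used here is an assumption; the continuity correction value is elided in the text, so the default of 0.5 is a common convention rather than the paper's figure; the correction is applied whenever any cell is zero (which includes the no-deaths-in-exposed-group case the text names); the 1.96 multiplier assumes a 95% interval; and the study counts and function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# One study: (deaths_treated, n_treated, deaths_control, n_control)
Study = Tuple[int, int, int, int]


def log_or_and_var(study: Study, correction: float = 0.5) -> Tuple[float, float]:
    """Log odds ratio and its variance for one 2x2 table."""
    a, n_t, c, n_c = study
    b, d = n_t - a, n_c - c  # survivors in the treated and control groups
    cells = [float(a), float(b), float(c), float(d)]
    # Assumed convention: add the correction to every cell if any cell is zero.
    if 0.0 in cells:
        cells = [x + correction for x in cells]
    a, b, c, d = cells
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pooled_or_random_effects(studies: List[Study], z: float = 1.96) -> Dict[str, object]:
    """DerSimonian-Laird pooled odds ratio with I-squared heterogeneity (in %)."""
    ys, vs = zip(*(log_or_and_var(s) for s in studies))
    w = [1 / v for v in vs]  # inverse-variance (fixed-effect) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(studies) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return {
        "or": math.exp(y_re),
        "ci": (math.exp(y_re - z * se), math.exp(y_re + z * se)),
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical counts for illustration only.
example = [(2, 20, 8, 20), (0, 10, 3, 12), (5, 40, 9, 38)]
result = pooled_or_random_effects(example)
print(result["or"], result["ci"], result["i2"])
```

a pooled odds ratio below 1 with an interval that excludes 1 would correspond to the kind of "statistically significant reduction in the pooled odds of mortality" reported in the abstract; the I² value is what the stopping rule in the text is applied to.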
three studies could not be obtained [ ] [ ] [ ] , although results from a study by bass et al [ ] were reported elsewhere [ ] , which enabled their inclusion. french (n = ), german (n = ), and korean (n = ) records were screened by single reviewers because of a lack of multilingual collaborators. the study characteristics are summarized in supplementary table . three systematic reviews met our protocol eligibility criteria [ , , ] . data on patients from case studies [ ] [ ] [ ] [ ] [ ] [ ] , case series [ , , , [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] , case-comparison studies [ , ] , and prospective cohort [ ] were included. we identified observational studies published between and , which studied patients who received a clinical diagnosis of influenza-associated pneumonia or spanish influenza a (h n ) infection [ , , - , - , ] . it is unclear whether some of these studies recruited patients with secondary bacterial pneumonia. sixteen observational studies that met our protocol eligibility criteria were published between and . four studies reported outcomes for patients infected with avian influenza a(h n ) [ , , , ] , studies reported outcomes for patients infected with influenza a (h n )pdm [ , , , ] , and studies reported outcomes for patients with sars [ , , , , , , , ] . the clinical status of patients at the time of treatment administration varied, as did concomitant treatments and comorbidities. convalescent plasma was used in all observational studies of sars-cov, influenza a(h n )pdm , and avian influenza a(h n ) infections (supplementary table ). for spanish influenza a(h n ) infection, observational studies used convalescent plasma, and used convalescent serum (supplementary table ). no studies that used hyperimmune immunoglobulin met our protocol eligibility criteria. the use of sham treatments or placebos was not reported. 
two systematic reviews were at low risk of bias [ , ] , whereas one was at moderate to low risk of bias across most domains ( table ) [ ] . data extraction was judged to be a moderate source of bias in all systematic reviews. search strategies were also a moderate source of bias in systematic reviews, as grey literature and non-peer-reviewed sources were not considered [ , ] . the risks of bias of outcomes in a single prospective cohort study were considered to be moderate ( table ) [ ] . the lack of randomized treatment allocation may have introduced systematic error, and the viral load outcome was at high risk of bias because of incomplete follow-up of patients. figure summarizes the risk of bias assessments for outcomes from observational studies. studies reported outcomes that were either at moderate risk ( outcomes) or moderate to high risk of selection bias ( outcomes). the majority of studies lacked a comparator group, and studies were at high or very high risk of reporting bias. this suggests that the observational study data included are at moderate to high risk of bias. three studies were not assessed for risk of bias, because they presented insufficient data [ , , ] . table summarizes our narrative synthesis, and supplementary table shows results of the individual studies that included an all-cause mortality outcome. meta-analyses, sensitivity analyses, and assessments of publication bias, by viral etiology, proved unfeasible due to a paucity of suitable data. there were no data available to address study questions relating to organ failure and sepsis or to hospital readmission and recurrence of severe disease. table and supplementary summarize observational studies at moderate to high risk of bias that reported improved mortality after patients received various doses of convalescent plasma [ , , , , , , , ] . 
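the absolute reductions in cfr quoted with confidence intervals in these results are differences between two proportions. a minimal sketch of such a calculation is given below, assuming a simple wald interval and a 95% level (the methods do not state which interval was used, and small studies may call for an exact or score-based interval instead). the counts and the function name are hypothetical.

```python
import math
from typing import Tuple


def cfr_absolute_reduction(
    deaths_treated: int,
    n_treated: int,
    deaths_control: int,
    n_control: int,
    z: float = 1.96,  # assumed 95% level
) -> Tuple[float, Tuple[float, float]]:
    """Control CFR minus treated CFR, with a Wald confidence interval."""
    p_t = deaths_treated / n_treated
    p_c = deaths_control / n_control
    diff = p_c - p_t  # positive values mean fewer deaths with treatment
    se = math.sqrt(p_t * (1 - p_t) / n_treated + p_c * (1 - p_c) / n_control)
    return diff, (diff - z * se, diff + z * se)


# Hypothetical counts for illustration only.
diff, (low, high) = cfr_absolute_reduction(2, 20, 8, 20)
print(f"absolute reduction: {diff:.2f} (ci {low:.2f} to {high:.2f})")
```

an interval whose lower bound is below zero, as reported for some of the small studies here, means the data are compatible with no reduction at all.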
a retrospective case-comparison study showed a cfr reduction after plasma treatment that reached statistical significance (absolute reduction in cfr, %; % ci, %- %; p = . ) [ ] . a second study with a comparator group described a cluster of cases of sars-cov infection in which patient received convalescent plasma and survived (absolute reduction in cfr, %; % ci, − % to %; p = . ) [ , ] . three small studies reported treatment of patients with no deaths, and a case series by cheng et al reported a cfr of . % ( of patients) following treatment (supplementary table ) [ , , , , , , ] . within this series, a subgroup analysis of patients found that those treated when pcr-positive but seronegative for sars-cov were more likely to be discharged within days of admission than those who were seropositive at the time of plasma infusion ( % vs %; p = . ). a further subgroup analysis of patients found that receipt of convalescent plasma treatment < days after onset of symptoms improved the likelihood of discharge within days of admission ( % vs %; p < . ); this remained significant after adjustment for age, viral status, time of administration, and lactate dehydrogenase level, suggesting that early treatment with convalescent plasma may be beneficial. however, allocation of treatment was mostly based on the physician's decision and the availability of plasma, and this study was at high risk of bias. four observational studies [ , , , ] and systematic review [ ] reported data on severe cases of influenza a(h n )pdm infection treated with convalescent plasma (table and supplementary table ). hung et al [ ] performed a prospective cohort study in which patients received a single -ml dose of convalescent plasma with a neutralizing antibody titer of > : . univariate analysis showed a significant absolute reduction in cfr of % ( % ci, %- %; p = . ) after treatment. multivariable analysis also showed a significant reduction in the relative risk of mortality (or, . ; % ci, . -. ; p = . 
), although the factors adjusted for were not clearly stated. both groups received other treatments, such as neuraminidase inhibitors and steroids (supplementary table ). this nonrandomized study was at moderate risk of bias. a small study by chan et al [ ] at moderate risk of bias reported exclusively on patients who received extracorporeal membrane oxygenation (ecmo) and showed a nonsignificant absolute reduction of % ( % ci, − % to %) in the cfr after convalescent plasma treatment. in a case series at high risk of bias, in which of patients receiving convalescent plasma, a nonsignificant absolute reduction of % ( % ci, %- %; p = . ) in the cfr was observed (supplementary table ) [ ] . three case reports reported recovery among patients who were treated with convalescent plasma [ , , ] . the dose of convalescent plasma varied across each study, and the neutralizing antibody titer was reported for only case ( : ) [ ] . all studies were at high to moderate risk of bias and had patients who were given other therapies concomitantly (including steroids and antivirals), which could have influenced the reported clinical effect. a systematic review and meta-analysis by luke et al [ ] showed that treatment with convalescent plasma, serum, or blood was associated with a significant absolute reduction of % ( % ci, %- %) in the pooled cfr. statistical heterogeneity was low (i = . %), although interventions were clinically heterogeneous. of the studies included in the meta-analysis, reported use of convalescent whole blood; however, these studies only contributed patients ( %) in the treatment group. when timing of treatment was recorded, patients who received early treatment (< days from pneumonia onset) had a cfr of % ( of ), compared with % ( of ) for those treated later [ ] . only studies of convalescent serum reported a comparator group [ , ] . 
both reported absolute reductions in cfr after treatment, with a reduction of % ( % ci, %- %) in one and % ( % ci, %- %) in the other; the reduction in the latter reached statistical significance (p = . ). the remaining studies observed a cfr ranging from % ( of ) to % ( of ) after treatment (supplementary table ) . a significant absolute reduction in the cfr was observed in a case series of cases, of whom received convalescent plasma (absolute reduction in the cfr, %; % ci, % to %; p = . ) [ ] . a further study of patients treated with convalescent plasma reported a cfr of % ( of ) [ ] . the majority of studies on spanish influenza a(h n ) infection were found to have a high risk of bias due to the use of now archaic research methods and a risk of wartime censorship and publication bias [ ] . the post hoc meta-analysis evaluated pooled data from comparative studies: studies of sars-cov infection [ , ] , of influenza a(h n )pdm infection [ , ] , of avian influenza a(h n ) infection [ ] , and of spanish influenza a(h n ) infection [ , , ] . there was a statistically significant reduction in the odds of mortality in the group treated with convalescent plasma or serum (pooled or, . ; % ci, . to . ; p < . ; I² = %; figure ). examination of the funnel plot and findings of the egger test showed no evidence of publication bias. sensitivity analyses that excluded studies with ≤ cases demonstrated little variation in the pooled odds ratio and little change in statistical heterogeneity (figure ) . meta-analysis of the crude cfr in treated patients was rejected due to excessive statistical heterogeneity (I² = %). sensitivity analysis that excluded studies with ≤ cases did not account for this and was similarly abandoned (I² = %). convalescent plasma treatment was associated with a significant increase in the proportion of sars-cov-infected patients discharged within days of admission in center (absolute difference, %; % ci, %- %; p = .
) after excluding patients with comorbidities from the analysis (table ) [ ] . a further sars-cov infection case series [ ] reported that % of patients ( of ) were discharged by day , and initiation of therapy was significantly earlier among patients discharged by that time (mean number of days from symptom onset, . vs . ; p < . ). both studies were at moderate to high risk of selection bias and confounding by indication. a case-comparison study at moderate risk of bias [ ] reported no significant difference in length of hospital stay between treatment and control patients with severe pandemic influenza a (h n ) infection who required ecmo (table ) . a retrospective observational study [ ] reported that convalescent plasma treatment made nonsignificant reductions to the length of time spent in the intensive care unit, days of mechanical ventilation, or number of days of ecmo for patients with severe pandemic influenza a (h n )pdm infection (table ) . two other case reports of pandemic influenza a (h n )pdm infection [ ] and avian influenza a(h n ) infection [ ] also suggested that convalescent plasma may have aided clinical improvement and reduced the duration of mechanical ventilation.
table . summary of narrative synthesis (cell contents in extraction order): no data were reported in identified studies. significantly lower viral load after treatment was observed at days , , and after icu admission in subgroup analysis of prospective study, which was at moderate to high risk of bias. one noncomparative study found a reduction in viral load after treatment. no adverse events or complications were reported after treatment. nonsignificant benefits following intervention were reported in study with comparator data. three case reports reported no deaths. no comparative data were reported. the length of hospital stay was d in a case report at high risk of bias. no comparative data were reported. one case report, which had a high risk of bias, cited that treatment allowed discontinuation of mechanical ventilation. specific antibodies were detected between day and day after treatment in a case report at high risk of bias. no comparative data were reported. three studies reported reductions in viral load after treatment. no adverse events or complications were reported after treatment. a pooled absolute reduction of % ( % ci, %- %) in the cfr was reported by a meta-analysis at low risk of bias. this pooled studies, including studies using convalescent blood. subgroup analyses suggested that early treatment was beneficial. the absolute reduction in the risk of mortality ranged from . % ( % ci, . %- . %) to . % ( % ci, . %- . %) in studies at high risk of bias. ten noncomparative studies found that the cfr varied from % ( / ) to % ( / ) . no data were reported in identified studies. no data were reported in identified studies. no data were reported in identified studies. no data were reported in identified studies. three studies reported chills, increased temperature, and sweats after infusion.
abbreviations: cfr, case-fatality rate; ci, confidence interval; ecmo, extracorporeal membrane oxygenation; icu, intensive care unit; influenza a(h n )pdm , pandemic influenza a(h n ); sars-cov, severe acute respiratory syndrome coronavirus.
a. all studies reported use of convalescent plasma, except studies, in which convalescent serum was used to treat spanish influenza a(h n ) infection, and meta-analysis of studies, of which reported use of convalescent blood to treat spanish influenza a(h n ) infection. additional data pertaining to individual studies (including comparator data, where presented) are available in the supplementary materials.
we identified limited evidence relating to levels of viral antibodies after convalescent plasma treatment; studies did not use a comparator and were at high risk of bias. peaks in sars-cov antibody levels occurred - days following receipt of a single dose of convalescent plasma in healthcare workers (table ) [ ] .
however, it is likely that other treatments, such as intravenous immunoglobulin, ribavirin and steroids, influenced the relationship between plasma and antibody levels. a case report of a patient with avian influenza a(h n ) infection also found that virus-specific antibodies appeared - days following administration of convalescent plasma [ ] .
the viral load in the respiratory tract decreased at a higher rate in patients who received convalescent plasma in a subgroup analysis of patients with influenza a(h n )pdm infection in a prospective cohort study (table ) [ ] ; viral loads were significantly lower , , and days after intensive care unit admission. however, there was a high risk of selection bias for this outcome, and concomitant treatments, including oseltamivir, zanamivir, and corticosteroids, may have confounded the results. further studies reported that viral load became rapidly undetectable in the blood of patients with sars-cov infection [ ] and in respiratory tract specimens from a patient infected with influenza a(h n )pdm [ ] after treatment. similar decreases in viral loads in serum and respiratory tract specimens were observed in cases of avian influenza a(h n ) infection, with virus becoming undetectable - days after initiation of convalescent plasma treatment for cases and - days after treatment initiation for the third case (supplementary table ) [ , , ] .
no studies reported a serious adverse event, and few studies reported information about treatment complications, although minor complications may be underreported in the literature. two observational studies [ , ] concerned with sars-cov infection reported that treatments did not cause harm when administered to patients. one study involving influenza a(h n )pdm infection reported that no adverse events were observed in the treatment group [ ] .
three studies from - (involving - patients with influenza) reported minor infusion complications, including chills, increased temperature [ , ] , and sweats [ ] . a study of patients did not report chills or any serious complications. the methods and reporting of these studies reflect the period during which they were conducted, and the studies are therefore at high risk of bias. our analyses suggest that convalescent plasma may have a clinically relevant impact in reducing the rate of mortality and viral load in patients with sari of viral etiology. post hoc pooled meta-analysis across all viral etiologies showed a statistically significant % reduction in the odds of mortality among those who were treated with convalescent plasma or serum. we found no evidence of serious adverse events or complications due to therapy and limited evidence of a reduction in the use of critical care resources and the length of hospital stay. of interest is the evidence for a survival benefit after early administration. a recent multicenter, prospective, double-blind, randomized control trial compared the use of hyperimmune immunoglobulin (derived from influenza a(h n )pdm convalescent plasma) to intravenous immunoglobulin manufactured before the pandemic [ ] . for patients from this study who received treatment within days of symptom onset and were excluded per protocol, a multivariate subgroup analysis demonstrated that hyperimmune immunoglobulin had a protective effect (or, . ; % ci, . -. ) [ ] . evidence from studies of sars-cov infection [ ] and spanish influenza a(h n ) infection [ ] showed a survival benefit following convalescent plasma treatment within days and days of symptom onset, respectively. these findings suggest that early initiation of treatment may be of critical importance to reducing mortality in patients with sari of viral etiology. a lack of high-quality studies and a paucity in the volume of relevant literature limited our analyses. 
observational studies were predominantly case reports or series, had no control groups, and had a moderate to high risk of bias. findings were commonly at high risk of confounding by indication. although selection or reporting bias may favor the intervention, recruiting patients who are clinically deteriorating or moribund would bias the result in the opposite direction. adequate methodological or statistical measures were infrequently used to control bias and confounding, and we identified numerous sources of clinical and methodological heterogeneity. we cannot be certain that all spanish influenza a(h n ) infection studies were included, since our protocol did not include hand searching of literature from - . although our post hoc meta-analyses were undertaken to help inform clinical decision making, the theoretical rationale for pooling mortality data from different viral etiologies remains to be fully established. the results obtained must be considered experimental and interpreted with an appropriate level of caution. we did not identify any reports of convalescent plasma use for patients with mers-cov infection. the evidence for a reduction in mortality associated with convalescent plasma is strongest for sars and influenza a(h n )pdm infection. although it is clinically rational to consider novel therapies for critically ill patients, there is evidence that maximum benefit from convalescent plasma might be realized through early initiation of therapy. however, many treatment protocols currently mention convalescent plasma as a treatment of last resort. if this treatment is considered for mers-cov-infected patients with sari, it should ideally only be administered in acute centers able to manage potential treatment-related complications, such as transfusion-related acute lung injury. we consider this a precautionary approach because of the limited clinical experience of administering convalescent plasma to this patient group.
improved knowledge regarding the mode of action of convalescent plasma and the virologic and immunologic kinetics of novel respiratory infections that cause sari (such as mers-cov) is needed. this would help clarify the potential benefits and harms of treatment, identify optimal dosage, and ascertain whether repeated treatments are relevant for clinical practice. randomized controlled trials or observational studies that adopt a standardized minimum data set are needed to better evaluate convalescent plasma as a therapeutic option for mers-cov infection before it can be fully recommended or before refinements can be made to its current use, other than our current recommendation for early use. the who and the international severe acute respiratory and emerging infection consortium are currently developing a clinical trial protocol to investigate the effectiveness of passive immunotherapy for patients with sari. available evidence suggests that convalescent plasma is likely to reduce mortality during saris of viral etiology, with larger treatment effects if it is commenced early after symptom onset. however, this is based on predominantly low-quality, uncontrolled studies. our review supports the use of convalescent plasma in critically ill mers-cov-infected patients as part of a well-designed clinical trial or other formal evaluation.
we thank the following reviewers from the convalescent plasma study group, who evaluated non-english-language articles on the basis of protocol eligibility criteria and undertook risk of bias assessments and data extraction: dr ana l. p. mateus.
supplementary materials are available at the journal of infectious diseases online (http://jid.oxfordjournals.org). supplementary materials consist of data provided by the author that are published to benefit the reader. the posted materials are not copyedited. the contents of all supplementary data are the sole responsibility of the authors.
questions or messages regarding errors should be addressed to the author. disclaimer. the authors alone are responsible for the views expressed in this article, which do not necessarily represent the views, decisions, or policies of the institutions with which the authors are affiliated. the funder had no role in study design, data collection, analysis, or interpretation of the results; preparation of the manuscript; or decision to publish. financial support. this work was supported by the who pandemic and epidemic diseases department. potential conflicts of interest. w. s. l. has received funding from the national institute for health research to set up a pandemic influenza clinical trial and has received unrestricted funding from pfizer for a study in adult pneumonia. the university of nottingham health protection research group (with which j. s. n.-v.-t. and c. r. b. are affiliated) is an official who collaborating center for pandemic influenza and research and receives limited funding from the who in support for specific activities. all other authors report no potential conflicts. all authors have submitted the icmje form for disclosure of potential conflicts of interest.
conflicts that the editors consider relevant to the content of the manuscript have been disclosed. key: cord- - b d authors: thomas, b. s.; marks, n. a. title: estimating the case fatality ratio for covid- using a time-shifted distribution analysis date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: b d estimating the case fatality ratio (cfr) for covid- is an important aspect of public health. however, calculating cfr accurately is problematic early in a novel disease outbreak, due to uncertainties regarding the time course of disease and difficulties in diagnosis and reporting of cases. in this work, we present a simple method for calculating the case fatality ratio using only public case and death data over time by exploiting the correspondence between the time distributions of cases and deaths. the time-shifted distribution (tsd) analysis generates two parameters of interest: the delay time between reporting of cases and deaths and the case fatality ratio. these parameters converge reliably over time once the exponential growth phase has finished. analysis is performed for early covid outbreaks in many countries, and we discuss corrections to cfr values using excess-death and seroprevalence data to estimate the infection fatality ratio (ifr). while cfr values range from . - % in different countries, estimates for ifr are mostly around . - . % for countries that experienced moderate outbreaks and - % for severe outbreaks. the simplicity and transparency of tsd analysis enhance its usefulness in characterizing a new disease as well as the state of the health and reporting systems. the novel coronavirus sars-cov- , and its attendant disease, covid- , first appeared in late in wuhan, china. since then, studies and estimates of the transmissibility and virulence of covid- have abounded, with widely varying results [ ] [ ] [ ] [ ] [ ] [ ] . 
virulence is often measured using the case fatality ratio (also called case fatality rate or case fatality risk, cfr), which is the number of deaths due to a disease as a proportion of the number of people diagnosed with the disease. the cfr is dependent on the particular pathogen (and its mechanism of action) and the immune response of the host, which can depend on age, sex, genetic factors and pre-existing medical conditions. environmental factors such as climate and health system may also affect cfr. it is important to accurately quantify the cfr of a new disease to inform policy, communication and public health measures. calculating the case fatality ratio requires data on cases and deaths over time, either for individuals or populations. in general, the cfr is based on diagnosed cases of disease rather than the number of actual infections (which is difficult to measure); there may be many more infections than reported cases, depending on the expression of symptoms and the degree of testing. the simplest estimate of cfr is to divide the cumulative number of deaths by the cumulative number of cases at a given time, known as the crude (or naïve) cfr. however, the crude cfr tends to underestimate the cfr during an outbreak because at any given time, some of the existing known cases will later prove fatal and are not yet included in the death count. this bias is known as right-censoring and obscures the cfr of a new disease early in the course of the outbreak, particularly before the time course of the disease is characterised. further, even once the distribution of times from onset of disease to death is known, it can be difficult to use this information to accurately correct the crude cfr. an alternative method is to use data for closed cases only, once patients have recovered or died (e.g. [ , ] ), yet this information is also difficult to obtain during an outbreak and may be biased towards a particular demographic or skewed by delays in reporting of recoveries.
medrxiv preprint, this version posted october , . ; https://doi.org/ . / . . . the copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nc-nd . international license.
other biases in calculating cfr include underascertainment of mild or asymptomatic cases, time lags in testing and reporting, and the effects of intervention approach, demographics and reporting schemes [ ] . there are many published calculations of case fatality ratios for covid- using various datasets from different countries and using a range of methods. in some places, initial outbreaks have now concluded and the final crude cfr accurately reflects the overall ratio of reported deaths to cases. in many other places, outbreaks are continuing. questions remain regarding quality of data, methods of calculation and even the possibility of changes in the cfr over time. these continuing uncertainties make it necessary to improve estimates of the cfr by refining the methods used to calculate it. in essence, this means finding the best way to correct the crude cfr for biases due to time lags and other factors. most previously published studies make use of a parametrised distribution of times from onset (or hospitalisation) to death, determined from individual case data from early in the outbreak (largely from china) [ , [ ] [ ] [ ] , which is then used in combination with statistical methods to estimate the cfr using population-level data on cases and deaths [ , , , ] . various assumptions are made in these analyses, including the form (and transferability) of the time course of cases, time lags in reporting or testing or hospitalisation, and estimates of the proportion of cases being detected.
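the right-censoring bias of the crude cfr can be illustrated with a minimal python sketch. it assumes a synthetic outbreak in which cumulative cases follow a logistic curve and cumulative deaths are a fixed fraction of the cases reported a fixed number of days earlier; the names `crude_cfr`, `cases_at`, `TRUE_CFR` and `LAG`, and all numerical values, are hypothetical and are not taken from any dataset discussed here.

```python
import math

def crude_cfr(cum_cases, cum_deaths):
    """Crude CFR at each time: cumulative deaths / cumulative cases."""
    return [d / c if c > 0 else float("nan")
            for c, d in zip(cum_cases, cum_deaths)]

# Hypothetical synthetic outbreak, for illustration only.
TRUE_CFR, LAG, DAYS = 0.05, 10, 120

def cases_at(t):
    """Logistic cumulative case curve; no cases before day 0."""
    if t < 0:
        return 0.0
    return 10000.0 / (1.0 + math.exp(-(t - 40) / 6.0))

cum_cases = [cases_at(t) for t in range(DAYS)]
# Deaths lag cases by LAG days and are a fixed fraction of them.
cum_deaths = [TRUE_CFR * cases_at(t - LAG) for t in range(DAYS)]

series = crude_cfr(cum_cases, cum_deaths)
for day in (30, 40, 60, DAYS - 1):
    print(f"day {day:3d}: crude cfr = {series[day]:.4f} (true {TRUE_CFR})")
```

under these assumptions the crude cfr stays below the true ratio throughout the outbreak and approaches it only once case growth has stopped.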
early values of cfr obtained using such parametric methods range from - %, with the highest values obtained for china: - % early in the outbreak [ , ] , % in wuhan and as low as % outside hubei province [ ] . values reported outside china include - % for early cases in travellers [ , ] , and - % in korea [ ] . case fatality ratios have also been shown to vary greatly with the age of the patient [ ], which limits the transferability of parameters based on case studies. the specific data requirements and the range of approximations and assumptions required by statistical methods can make it difficult to interpret or rely on the results of such analyses, since biases can be obscured.
the time-shifted distribution (tsd) analysis method began with an observation that the shape of the evolving time distribution of covid- cases in a given country often closely matches the shape of the corresponding distribution of covid- deaths, simply shifted by a number of days and linearly scaled in magnitude. this is illustrated in figure for covid- cases and deaths in italy (data from [ ] , -day averaged data shown); the time-shifted relationship between case and death distributions can be seen in both cumulative and daily tallies. we can understand this shift from the perspective of the time delay between diagnosis and death or recovery. however, the closeness of the match reflects a much simpler apparent relationship than that suggested or assumed by conventional analyses, which relate deaths and cases using statistical parametric models that incorporate a broad distribution of expected times between diagnosis (or onset) and death, usually generated from case study data (e.g. [ ] ).
this observation suggests that there are two parameters of interest: the number of days separating the case and death distributions (called the delay time or $t_d$), and the scaling factor between the time-shifted case data and the death data, $\lambda$. for the optimal value of $t_d$, there is a simple linear relationship between the cumulative number of deaths at time $t$, $D(t)$, and the cumulative number of cases at time $t - t_d$, $C(t - t_d)$, with gradient $\lambda$:

$$D(t) = \lambda \, C(t - t_d)$$

to find the optimal value for $t_d$, we test integer values from zero to days. for each value of $t_d$, we plot $D(t)$ as a function of $C(t - t_d)$ (for all $t$) and perform a linear regression using matlab. the value of $t_d$ is chosen on the basis of the lowest root-mean-squared error in the linear regression analysis and the value of $\lambda$ is the gradient of the corresponding line. respectively), using a delay time of four days and a linear scaling factor of . . what do these parameters represent? the delay time is presumably a measure of the delay between reporting of confirmed cases and reporting of covid- -related deaths. while four days seems very short compared to current estimates of the mean delay between onset of covid- symptoms and death (or even between hospitalisation and death), which is around - days with a large variance [ , , , , ] , it is possible to rationalise the shorter apparent delay on the basis of delays in testing, diagnosis and reporting of the disease, particularly in countries where the outbreak is severe.
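the fitting procedure described above (test integer delays, regress cumulative deaths on time-shifted cumulative cases, keep the delay with the lowest root-mean-squared error) can be sketched in python as follows. this is a minimal sketch and not the authors' matlab code: the function names `linear_fit` and `tsd_fit`, the inclusion of an intercept in the regression, the `max_delay` argument and the synthetic test data are all assumptions made for illustration.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares y = a*x + b; returns (a, b, rmse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = my - a * mx
    sse = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(sse / n)

def tsd_fit(cum_cases, cum_deaths, max_delay):
    """Try each integer delay; keep the one with the lowest regression RMSE.

    Returns (delay, gradient, rmse); the gradient is the CFR estimate.
    """
    best = None
    for delay in range(max_delay + 1):
        x = cum_cases[: len(cum_cases) - delay]  # C(t - delay)
        y = cum_deaths[delay:]                   # D(t)
        slope, _, rmse = linear_fit(x, y)
        if best is None or rmse < best[2]:
            best = (delay, slope, rmse)
    return best

# Hypothetical synthetic data: deaths are 12% of the cases 4 days earlier.
def logistic(t):
    return 50000.0 / (1.0 + math.exp(-(t - 45) / 7.0))

cum_cases = [logistic(t) for t in range(120)]
cum_deaths = [0.12 * logistic(t - 4) for t in range(120)]

delay, cfr, rmse = tsd_fit(cum_cases, cum_deaths, max_delay=30)
print(f"delay = {delay} days, cfr = {cfr:.4f}, rmse = {rmse:.2e}")
```

on real data the series would first be smoothed and truncated to the outbreak of interest; the sketch only shows the search over delays and the choice of gradient.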
as an example of such reporting delays, in italy from late february, testing was prioritized for "patients with more severe clinical symptoms who were suspected of having covid- and required hospitalization" [ ] ; a subsequent delay in test results could account for the rather short delay between reported diagnosis and death. this shows the inherent danger in analysing such datasets using time-delay distributions from specific case data (which presume a much longer delay time). moreover, the delay time may provide some useful information about relative conditions in various countries. using a time delay of four days in italy, the scaling factor of . represents the ratio of deaths to cases, or in other words, an estimate for the case fatality ratio, converging towards the crude cfr with time. the calculated cfr of . % is almost identical to the crude cfr of . % at the end of june, which is a good estimate for the "true" cfr at the end of the outbreak. an interesting question is, at what point in the outbreak does the cfr calculated using the tsd analysis give a good approximation to the final value? this is important because early estimates of cfr are vital for informing public health decisions. figure shows the cfr calculated at various stages of the outbreak using data available to that point. errors represent uncertainty in the linear regression as well as in the delay time $t_d$. the earliest data predict a longer delay time, which results in a higher predicted cfr; once the value of $t_d$
has stabilised (from march), the predicted value of cfr is also very stable, and also remarkably accurate ( . %), compared to the crude value of . % at that time. even a week earlier, the calculated cfr of . % is a better estimate than the crude estimate of . %. it appears that this simple analysis generates two parameters of significant interest: the apparent delay between reporting of related cases and deaths, and the cfr. the estimates of these parameters (which can be determined unequivocally once an outbreak is concluded) can be calculated during the course of an outbreak and give a better approximation than the crude cfr. it should be noted that such an analysis cannot be applied during purely exponential growth, because time-shifting (horizontally) and scaling (vertically) an exponential function are equivalent operations, as:

$$e^{k(t - t_d)} = \left[ e^{-k t_d} \right] e^{kt} = A \, e^{kt},$$

where $A = e^{-k t_d}$ is a constant, which means that any value of the delay time $t_d$ will give an equivalent relationship between the time-shifted cumulative cases $C(t - t_d)$ and the cumulative deaths $D(t)$, with gradient depending on $t_d$. therefore, the tsd analysis is only valid once exponential growth ends and the daily case rate is approaching (or past) its peak. alternatively, an estimate for $t_d$ could be used, but this reduces the simplicity and transparency of the model. we note that others have calculated an "adjusted" cfr early in the covid- outbreak using an equivalent method with an assumed value of the time delay between onset and death (because the true value was not known): yuan and colleagues [ ] chose sample values of one, three and five days to give estimates from - % for italy in early march, while wilson and colleagues [ ] used days to give . - . % for china in early march.
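the degeneracy of the fit during purely exponential growth can be checked numerically. the sketch below assumes exactly exponential cumulative cases and deaths that follow them with a fixed delay; the growth rate `K`, the ratio `CFR`, the delay `TRUE_DELAY` and the trial delays are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical growth rate (per day), death-to-case ratio and true delay.
K, CFR, TRUE_DELAY = 0.2, 0.05, 7

def cases(t):
    """Purely exponential cumulative cases."""
    return math.exp(K * t)

def deaths(t):
    """Deaths are a fixed fraction of the cases TRUE_DELAY days earlier."""
    return CFR * cases(t - TRUE_DELAY)

gradients, spreads = {}, {}
for trial_delay in (0, 3, 7, 14):
    # Ratio D(t) / C(t - trial_delay) over a range of days.
    ratios = [deaths(t) / cases(t - trial_delay) for t in range(20, 60)]
    gradients[trial_delay] = ratios[0]
    spreads[trial_delay] = max(ratios) - min(ratios)
    print(f"trial delay {trial_delay:2d}: gradient {ratios[0]:.4f}, "
          f"spread over time {spreads[trial_delay]:.1e}")
```

every trial delay gives a ratio that is constant in time, so each one fits a straight line equally well; only the gradient changes, which is why the delay cannot be identified from exponential data alone.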
to test the tsd analysis method in determining cfr in the middle of an outbreak, and compare to alternative methods, we analyse data from the sars outbreak in hong kong (figure ). to compare, on april the crude cfr is . %, which is a significant underestimate of the true value. the delay time of days is consistent with observations that the delay between onset and death for sars is approximately three weeks [ ] . we also applied tsd analysis to sars data for other countries, giving a calculated cfr of % for singapore and canada (although data are noisy), and % for taiwan. we can compare these estimates of the cfr with the more complex mathematical models of nishiura and coworkers [ ] and ghani and coworkers [ ] for the same sars outbreak. the simple tsd analysis gives better predictions than both the parametric mixture model and modified kaplan-meier method described by ghani et al. [ ] , which use individual case data (dates of hospitalisation and death or discharge from hospital) to estimate cfr using statistical methods. such methods can provide earlier estimates (from april, giving around - % cfr) but are less accurate at this early stage than a simple estimate of cfr from data on closed cases (recoveries and deaths) at the same dates [ ] , and are later outperformed by our simple tsd analysis.
similarly, the model of nishiura et al. [ ] can provide much earlier estimates of cfr than our analysis but the accuracy of these estimates is uncertain and depends on the assumptions made. their analysis requires data on the dates of onset of confirmed cases and the distribution of times from onset to death; the latter, in particular, is poorly known at the start of an outbreak of a new disease. nishiura et al. [ ] analyse the hong kong sars data by assuming a simple exponential distribution for the time between onset and death, with a mean of days (from donnelly [ ] for sars cases in hong kong up to april, although donnelly used a gamma distribution), and using statistical sampling to predict the cfr. the fact that this model provides a reasonable prediction of cfr at a specific time (around the end of march) is likely fortuitous, given that it involves scaling the crude cfr by a constant factor and will therefore overestimate the cfr at later times (as well as very early times). further, this method requires the use of parametrised data (the time distribution for onset to death) that are not available at the time that the predictions are purported to be made. in fact, when nishiura et al.
[ ] apply the method to early h n (swine flu) data in they are forced to use a time distribution calculated from historical data for h n (spanish) influenza from - , which is problematic; a sensitivity analysis shows that the predicted cfr is sensitive to the choice of distribution parameters, making this method somewhat difficult to apply in the circumstances for which it is proposed. in comparison, the time-shifted distribution analysis is both transparent and straightforward to implement, using only publicly available data and no assumptions, and can provide a reasonably early estimate (once exponential growth has sufficiently slowed) of cfr that converges to the "true" value. if the value of the time delay is approximately known early in the outbreak, this could be used to constrain the fitting procedure, but as observed already, it is difficult either to know the time delay between onset and death or to apply it to the time delay between reporting of cases and deaths.
time-shifted distribution analysis was performed on covid- data from an extensive range of countries, using datasets from johns hopkins center for systems science and engineering [ ] , cross-checked and supplemented with data from worldometers.com and -day averaged. for most countries (as for italy), the analysis results in a robust linear fit and provides a stable estimate for cfr and delay time. these data are shown in table , organised by region. in table , most analyses use data up until the end of may, which is generally representative of the initial outbreak; for some countries with later outbreaks, later end dates are used.
in many countries, more recent outbreaks have had dramatically different cfr values to initial outbreaks (due largely to improved testing rates); these can be analysed independently by selecting the time frame studied, but values presented here are for the initial outbreak in each country. the most notable result is the huge range in both delay times and calculated cfr estimates over different countries: from zero to days' delay and from less than one to % case fatality ratio. the highest ratios are calculated in western europe (up to %), followed by north america (up to %), south america (up to %), africa (up to %), and lowest in the middle east, asia and oceania (up to %). it is problematic to draw conclusions about relative covid- virulence by comparing cfr values between countries, because of vast differences in testing and reporting regimes, in particular the under-reporting of cases (including mild or asymptomatic cases) due to inadequate testing, but also differences in classification or recognition of covid- -related deaths. however, it is instructive to calculate in this way, for any given country, the proportion of detected cases that are currently proving fatal, for the purposes of public health management and planning. for comparison, mazumder and colleagues [ ] calculated case fatality ratios for a range of countries using recovery and death data from closed cases.
they analysed eleven countries with high outcome rates and sufficient progression in the outbreak for analysis (at the end of april), but many of their calculated cfr values are much higher than our estimates (for example, estimated cfr above % for italy, france and usa at the end of april), probably due to delays in recovery reporting, whereas estimated cfr values for germany, china and south korea match ours. the tsd analysis provides more reliable estimates for a broader range of countries, due to the greater availability of death and case data over recovery data. the differences in delay times are also startling, ranging from zero to days with no clear pattern. this delay between reported cases and deaths may be informative regarding the state of reporting or testing in a country but it is difficult to interpret. the mean delay between onset of symptoms and death has been estimated at - days using case data [ , , , , , ] , but there are also delays between onset of symptoms and testing, between testing and reporting of results, and in reporting of deaths. for example, in sweden a mean delay of five days between onset of symptoms and the "statistical date" of a reported case (including one day from test to statistic) was reported [ ] . in some countries, tests are only administered to the sickest patients (many days after onset), and in others, test results can take up to a few weeks. we note that for australia and new zealand, where case numbers have been low and testing extensive and rapid, the calculated time delay is more than ten days, whereas many of the harder-hit countries in western europe and north america have much shorter calculated time delays. spain is an interesting case.
until august, tsd analysis using spanish data from the worldometer website [ ] gave a stable cfr of % with a delay time of one day. on that day, data were "adjusted retrospectively by national authorities: case counts adjusted from february to august and death counts adjusted from april to august" according to the world health organisation (who) [ ] . using the revised data, the tsd analysis provided an even more robust fit; the cfr was almost unchanged at % but the delay time was increased to days. this means that early data from spain, which were erratic, reflected a much shorter delay between reported cases and deaths. in fact, the death data were largely unaffected by the august revision, but the dates of reported cases had shifted nearly two weeks earlier, presumably to better capture the onset time. this shows that a short delay time can reflect late reporting of cases, due either to testing late in the progress of the disease (well after onset) or delays in providing test results (or both). this may explain the short delay times for the united kingdom, italy, the netherlands and the usa, as well as many other countries.
an important conclusion from this analysis concerns the perils in calculating the cfr using established time distributions for onset to death obtained from case studies, as is common. the beauty of this simple method is its transparency: nothing is assumed and the data are allowed to speak for themselves, and can therefore give us information that we might not expect, rather than merely reflecting our assumptions. we are interested not only in the versatility and simplicity of this method, but also in what conclusions may be drawn from the parameters calculated, namely the cfr and the delay time.
reported cfr values for covid- vary widely, but the best current estimates of the true infection fatality ratio or ifr (taking into account all infections including undiagnosed and asymptomatic) are around . - . % [ , ] based on cruise ship and population serology data. the very high cfr values calculated for many european countries in particular are probably vastly inflated due to the inadequate testing and overwhelmed health systems in these countries, which result in underestimation of case numbers. however, it is an oversimplification to assume that this is the only relevant factor that differs between countries, since we know that demographics and health systems (among other things) can also affect survival probability. such an assumption has been used in various studies, in order to compare the effectiveness of different countries' reporting systems and to correct case numbers [ , ] . however, by assuming that the ifr is identical everywhere at all times, valuable information is lost and conclusions may be misleading.
in this study, we estimate the ifr from the cfr for a subset of countries using seroprevalence data (to correct case numbers) and excess death data (to correct death numbers). along similar lines, ioannidis [ ] previously estimated the ifr for various countries using seroprevalence data and cumulative reported deaths at a corresponding date, although this does not account for either excess deaths or the relationship between cases and deaths over time; in fact, using seroprevalence and death data alone reintroduces the issue of the unknown time delay between cases and deaths, which must be approximated.
we note that studies from very early in the pandemic provided initial estimates for the true prevalence of covid- in specific places; a spatiotemporal transmission model applied to wuhan [ ] gave a prevalence factor of seven in january (published mid-march), and a statistical analysis study of testing data in the usa [ ] estimated a prevalence factor of nine in april (published in may). these early prevalence studies can be useful in roughly correcting the cfr to estimate the ifr before rigorous seroprevalence data are available. in australia, case numbers have been generally low (especially before june) and testing rates high. it is unlikely that there have been appreciable unreported covid-related deaths [ ] . however, even with robust testing many cases will be undiagnosed, especially asymptomatic cases, which could constitute half of all infections [ ] . a recent seroprevalence study of elective surgery patients in four states [ ] estimated that the number of true infections was around - times the number of reported cases, although the authors state that the study cohort may not reflect the general population (older individuals overrepresented). this prevalence ratio gives an approximate ifr for australia of . - . %. note that before june, most of australia's covid- cases were returned travellers, which may affect the age distribution and baseline health of cases compared to the general population. new zealand, taiwan and thailand are similarly circumstanced and have very similar cfr values, which are expected to reflect similar ifr values to australia. singapore, with its extremely low fatalities and extensive testing, did
not return a robust result from tsd analysis; nonetheless, the crude cfr of . % at the end of may is likely a lower bound for the ifr. the usa is an interesting case study. the tsd analysis is problematic because the relationship between cases and deaths changes over time, causing mismatch between case and death distributions and a downward drift in both cfr and delay time. this may be due to incomplete data, or changes in testing or reporting over time, which can affect both delay time and case numbers. alternatively, the case fatality ratio may be truly changing over time, due to changes in treatment approach or in the demographics (or location) of covid- cases [ ] . in the usa, there is also heterogeneity between states. to demonstrate, we present the time-shifted distribution analysis for the usa in figure , and for the state of new jersey (which has the highest mortality rate in the usa) in and illinois and new mexico give %. one potential reason for the mismatch of case and death data in the usa as a whole (and many of its states) is the under-reporting of cases due to the low level of testing, which varies over time. one measure of the adequacy of testing is the share of daily covid- tests that return a positive result, known as the positive test rate (ptr). the who has suggested a ptr around - % (or less) as a benchmark of adequate testing [ ] . in the usa, the positive test rate reached maximum levels in april, with values between and % from - april [ ] , which is the region of greatest discrepancy between case and death profiles in the initial outbreak, as seen in figure .
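the positive test rate defined above is straightforward to compute from daily counts. the following python sketch shows one way to flag days on which testing may have been inadequate; the function names, the daily counts and the benchmark value passed to `days_above` are hypothetical placeholders and do not come from the who guidance or the usa data cited here.

```python
def positive_test_rate(daily_positives, daily_tests):
    """Share of each day's tests that returned a positive result."""
    return [p / n if n > 0 else float("nan")
            for p, n in zip(daily_positives, daily_tests)]

def days_above(rates, benchmark):
    """Indices of days whose positive test rate exceeds the benchmark."""
    return [i for i, r in enumerate(rates) if r > benchmark]

# Hypothetical daily counts, for illustration only.
positives = [120, 400, 900, 600, 200]
tests = [4000, 5000, 6000, 6500, 8000]

rates = positive_test_rate(positives, tests)
# The benchmark here is an arbitrary placeholder, not a recommended value.
flagged = days_above(rates, benchmark=0.10)
print([round(r, 3) for r in rates], flagged)
```

days flagged in this way are those on which reported case numbers are most likely to understate true infections, which is where a mismatch between case and death curves would be expected.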
we would expect that such a high ptr indicates that case numbers during this time are greatly underestimated, which may explain the poor fit from tsd analysis and the high cfr. similar effects are seen in data from sweden and brazil, which also had low and variable testing rates and high ptr. recent seroprevalence studies in many states of the usa from march to may [ ] suggest that there were at least eleven times as many infections as reported cases before the end of may. excess death data indicate that covid-related deaths may be higher than reported by a factor of . for the same period [ ]. using these correction factors for the cfr, the estimated ifr for the usa is . % or below. for comparison, a worldometers calculation estimated an ifr of . % in new york city in may [ ], using a prevalence ratio of ten from an early antibody study [ , ]. in europe, many of the most affected countries have very high case fatality ratios, often combined with relatively short delay times. for some of these countries, seroprevalence studies provide estimates of the degree of undercounting of cases during the initial outbreak [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ], which can be utilised along with excess death data [ ] to estimate the infection fatality ratio. these ifr values are shown in table along with the correction factors used.
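the correction from cfr to ifr described above can be written as a short calculation. the sketch below is illustrative only: the function name `estimate_ifr` and the example numbers are hypothetical and are not the values used in this study. it assumes that the ifr equals the cfr multiplied by the death under-reporting factor (from excess death data) and divided by the prevalence ratio (true infections per reported case, from seroprevalence data).

```python
def estimate_ifr(cfr, prevalence_ratio, excess_death_factor=1.0):
    """Rough IFR estimate from a CFR and two correction factors.

    cfr                 -- case fatality ratio as a fraction (reported deaths / reported cases)
    prevalence_ratio    -- estimated true infections per reported case (assumed > 0)
    excess_death_factor -- estimated true deaths per reported death (1.0 means no correction)
    """
    if prevalence_ratio <= 0:
        raise ValueError("prevalence_ratio must be positive")
    # IFR = true deaths / true infections
    #     = (reported deaths * excess_death_factor) / (reported cases * prevalence_ratio)
    return cfr * excess_death_factor / prevalence_ratio


# Hypothetical inputs chosen only to show the arithmetic.
example_ifr = estimate_ifr(cfr=0.05, prevalence_ratio=10, excess_death_factor=1.2)
```

because both correction factors are themselves uncertain, a result from such a calculation is best read as an order-of-magnitude estimate.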
some of the seroprevalence data is preliminary, including studies of germany, sweden and italy, and others are for specific regions of the country and may not be representative. nonetheless, the estimated ifr values are reasonable: switzerland and germany are around . %, above australia and below sweden and usa at around . %; belgium, uk and spain are between one and two percent; and italy higher at around %. ioannidis [ ] also calculated the ifr for many of these countries using seroprevalence studies, but using only single-time seroprevalence and death data with an assumed delay time (generally a week after the midpoint of the seroprevalence survey); these are also shown in table and are broadly consistent with our values except where excess deaths are significant (e.g. spain). our value for germany is somewhat higher but we expect that it is more reliable, using the scaling factor for cases [ ] with our calculated cfr rather than the absolute number of deaths at a certain date in the german town of gangelt [ ], which is very low and reflects a date early in the german outbreak. although the calculated ifr values are only approximate and subject to revision, it is conceivable that higher ifr values may reflect higher fatality ratios in particular places at particular times, due to overwhelmed health systems in hard-hit areas or specific demographics or baseline health of affected populations. for example, it is reasonable to conclude that in lombardy, italy, the older population and overwhelmed health system caused a higher fatality ratio compared to other places. in fact, the difference in age distribution of cases between italy and australia up to the end of may (using data from [ ] and [ ]) can alone account for a factor of three in the ifr.
therefore, while differences in testing and reporting between different countries undoubtedly account for much of the variation in ifr between countries, we neither expect nor find that ifr is the same for all covid- outbreaks. country-specific factors that influence ifr and differ between countries include testing and reporting, age demographics [ ], health-care systems and treatments [ ], mask-wearing and other behaviours, climate and culture, transport infrastructure and community mobility [ ], and genetic factors or prevalence of particular antibodies that affect immune response [ ]. there is some evidence that the ifr might be decreasing over time in some countries, especially those experiencing a "second wave". this is observed, for example, in the data for the usa in figure , demonstrated in the increasing mismatch in case and death distributions later in the outbreak. we can use the tsd analysis to analyse the latter part of the outbreak (from july to september), giving a cfr of . - . % and a delay time of - weeks. similar analyses for individual states of the usa give stable cfr values from . - . % with delay times between four and days, with a mean of . % cfr and days' delay over states with robust fits. this later cfr is far lower than the value of % calculated early in the outbreak. we observe similar effects in various other countries post-july including japan (reduced to . % and days' delay) and spain, france and portugal (all reduced to . - . %, - days' delay).
these values are all similar and may reflect a reasonable estimate for cfr when testing is adequate; we would still expect the ifr to be lower by a factor of at least two due to undiagnosed and asymptomatic cases. a decrease in cfr over time may also indicate a change in the demographics of the case load or improvements in treatment or even an increasing time delay between reported cases and deaths, perhaps due to earlier diagnosis. we also find that, in countries where the time delay is significant, the tsd analysis can serve in a predictive capacity for numbers of deaths, using the linear relationship between deaths and time-shifted cases. figure shows this prediction for the second phase of the covid- outbreak in france from august. using parameters calculated from tsd analysis for august to mid-october, reported case data can be time-shifted and linearly scaled to predict daily deaths for france for the next three weeks. this is useful for public health planning, as well as decision-making regarding implementation of restrictions. the time-shifted distribution analysis is a straightforward way to predict cfr over time, using only publicly available data on cases and deaths and requiring no assumptions or parametrisations regarding the progress of the illness. the beauty of this method is in its transparency and simplicity; the lack of assumptions allows more to be gained from the data, including trends that may be unexpected or changing over time. this analysis method has particular utility early in an outbreak, once sufficient data are available for a robust fit (beyond the exponential growth phase).
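the idea of time-shifting and linearly scaling reported cases to predict deaths can be sketched in a few lines. this is a minimal illustration under stated assumptions and is not necessarily the fitting procedure used for the results above: the function names `fit_time_shift` and `predict_deaths` are hypothetical, the delay is restricted to whole days, and the scale factor (the cfr) is taken as the least-squares slope of a line through the origin.

```python
def fit_time_shift(cases, deaths, max_delay):
    """Scan integer delays and return (delay, cfr) with the smallest mean squared error.

    Assumes `cases` and `deaths` are daily counts of equal length and that
    deaths[t] is roughly cfr * cases[t - delay].
    """
    if len(cases) != len(deaths):
        raise ValueError("cases and deaths must have the same length")
    if max_delay >= len(cases):
        raise ValueError("max_delay must be smaller than the series length")
    best = None
    for delay in range(max_delay + 1):
        shifted_cases = cases[:len(cases) - delay]  # cases reported `delay` days earlier
        later_deaths = deaths[delay:]               # deaths they are paired with
        sxx = sum(c * c for c in shifted_cases)
        if sxx == 0:
            continue
        # Least-squares slope of a line through the origin.
        cfr = sum(c * d for c, d in zip(shifted_cases, later_deaths)) / sxx
        mse = sum((d - cfr * c) ** 2
                  for c, d in zip(shifted_cases, later_deaths)) / len(shifted_cases)
        if best is None or mse < best[0]:
            best = (mse, delay, cfr)
    if best is None:
        raise ValueError("no usable delay found")
    return best[1], best[2]


def predict_deaths(cases, delay, cfr, horizon):
    """Predict daily deaths for the `horizon` days after the last observed day.

    Only cases that are already reported are used, so horizon cannot exceed delay.
    """
    if horizon > delay:
        raise ValueError("horizon cannot exceed the delay")
    n = len(cases)
    return [cfr * cases[n - delay + k] for k in range(horizon)]
```

the prediction horizon in this sketch is limited by the fitted delay, which is consistent with the remark that the predictive use applies where the time delay is significant.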
without the benefit of hindsight, the tsd-calculated values for cfr and time delay between cases and deaths can shed light on the virulence of a disease and on the conditions that a particular country may be facing. excess death data (where available) may be used to correct death data, while positive test rates and other indicators or models of testing adequacy can often give an early rough idea of the true prevalence relative to reported case numbers. these data can be used to interpret the cfr calculated using tsd analysis early in an outbreak, and to approximate the ifr. our estimates of ifr range from . - %, with higher values observed for countries that experienced more severe outbreaks, perhaps reflecting the negative influence of overwhelmed health systems and the spread of disease to more vulnerable populations. the calculated time delay is also potentially informative; for example, the one-day delay calculated from early data in spain reflects the breakdown of testing and reporting systems at that time, whereas the revised delay time of days shows the recovery of the system and the likely delay between case diagnosis and death. in this way, tsd analysis of data from a particular place at a particular time can give useful local information on the progression of an outbreak to inform public health planning and policy. are from the economist's covid- excess deaths tracker repository at https://github.com/theeconomist/covid- -excess-deaths-tracker, and positive covid- test rates are from our world in data at https://ourworldindata.org/coronavirus-testing.
early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia incubation period and other epidemiological characteristics of novel coronavirus infections with right truncation: a statistical analysis of publicly available case data the many estimates of the covid- case fatality rate. the lancet on identifying and mitigating bias in the estimation of the covid- case fatality rate.
harvard data science review report : severity of -novel coronavirus (ncov) estimating risk for death from coronavirus disease, china updated understanding of the outbreak of novel coronavirus ( -ncov) in wuhan estimating the risk of covid- death during the course of the outbreak in korea an interactive web-based dashboard to track covid- in real time methods for estimating the case fatality ratio for a novel, emerging infectious disease clinical course and outcomes of critically ill patients with sars-cov- pneumonia in wuhan, china: a single-centered, retrospective, observational study case-fatality rate and characteristics of patients dying in relation to covid- in italy monitoring transmissibility and mortality of covid- in europe case-fatality risk estimates for covid- calculated by using a lag time for fatality world health organization. cumulative number of reported probable cases of severe acute respiratory syndrome (sars) epidemiological determinants of spread of causal agent of severe acute respiratory syndrome in hong kong early epidemiological assessment of the virulence of emerging infectious diseases: a case study of an influenza pandemic covid- ) mortality rate coronavirus disease (covid- ): log of major changes and errata in who daily aggregate case and death count data a novel comprehensive metric to assess covid- testing outcomes: effects of geography, government, and policy response infection fatality rate of covid- inferred from seroprevalence data. bulletin of the world health organization : article id blt substantial undocumented infection facilitates the rapid dissemination of novel coronavirus (sars-cov- ) substantial underestimation of sars-cov- infection in the united states how accurate are australia's coronavirus numbers? 
the answer lies in our death data suppression of a sars-cov- outbreak in the italian municipality of vo' a dual antigen elisa allows the assessment of sars-cov- antibody seroprevalence in a low transmission setting simpson's paradox in covid- case fatality rates: a mediation analysis of age-related causal effects our world in data. coronavirus (covid- ) testing seroprevalence of antibodies to sars-cov- in sites in the united states the economist. tracking covid- excess deaths across countries cumulative incidence and diagnosis of sars-cov- infection in new york amid ongoing covid- pandemic, governor cuomo announces results of completed antibody testing study of , people showing . percent of population has covid- antibodies infection fatality rate of sars-cov- infection in a german community with a super-spreading event seroprevalence of anti-sars-cov- igg antibodies in geneva, switzerland (serocov-pop): a population-based study public health agency of sweden). första resultaten om antikroppar efter genomgången covid- hos blodgivare antibody prevalence for sars-cov- in england following first peak of the pandemic: react study in , adults seroprevalence of igg antibodies against sars coronavirus in belgium: a prospective cross-sectional study of residual samples prevalence of sars-cov- in spain (ene-covid): a nationwide, population-based seroepidemiological study sars-cov- infection fatality risk in a nationwide seroepidemiological study primi risultati dell'indagine di sieroprevalenza sars-cov- seroprevalence of sars-cov- significantly varies with age: preliminary results from a mass population screening australian government: department of health.
coronavirus (covid- ) current situation and case numbers assessing the age specificity of infection fatality rates for covid- : meta-analysis & public policy implications determinants of the number of deaths from covid- : differences between low-income and high-income countries in the initial stages of the pandemic targets of t cell responses to sars-cov- coronavirus in humans with covid- disease and unexposed individuals
we thank dr nick golding (curtin university) for many helpful conversations and comments on the manuscript. the data used in this study are publicly available. covid- data are from the covid- data
key: cord- -tn muv authors: jen, tung-hui; chien, tsair-wei; yeh, yu-tsen; lin, jui-chung john; kuo, shu-chun; chou, willy title: geographic risk assessment of covid- transmission using recent data: an observational study date: - - journal: medicine (baltimore) doi: . /md. sha: doc_id: cord_uid: tn muv background: the us centers for disease control and prevention (cdc) regularly issues “travel health notices” that address disease outbreaks of novel coronavirus disease (covid)- in destinations worldwide.
the notices are classified into levels based on the risk posed by the outbreak and what precautions should be in place to prevent spreading. what objectively observed criteria of these covid- situations are required for classification and visualization? this study aimed to visualize the epidemic outbreak and the provisional case fatality rate (cfr) using the rasch model and bayes's theorem and developed an algorithm that classifies countries/regions into categories that are then shown on google maps. methods: we downloaded daily covid- outbreak numbers for countries/regions from the github website, which contains information on confirmed cases in more than chinese locations and other countries/regions. the rasch model was used to estimate the epidemic outbreak for each country/region using data from recent days. all responses were transformed by using the logarithm function. the bayes's base cfrs were computed for each region. the geographic risk of transmission of the covid- epidemic was thus determined using both magnitudes (i.e., rasch scores and cfrs) for each country. results: the top countries were iran, south korea, italy, germany, spain, china (hubei), and france, with values of { . , . , . , . , . . , . } and { . %, . %, . %, . %, . %, . %, and . %} for the outbreak magnitudes and cfrs, respectively. the results were consistent with the us cdc travel advisories of warning level in china, iran, and most european countries and of level in south korea on march , . conclusion: we created an online algorithm that used the cfrs to display the geographic risks to understand covid- transmission. the app was developed to display which countries had higher travel risks and aid with the understanding of the outbreak situation. 
since the outbreak of the novel coronavirus disease in wuhan city, china, on january , , [ , ] a total of , confirmed cases and deaths had been reported by march , , [ ] involving provinces/cities in china as well as countries/regions outside of china. [ ] the total number of deaths (= ) has substantially surpassed those from (final toll of deaths in ) and the middle east respiratory syndrome (final toll of deaths in ). [ ] [ ] [ ] . . travel information required for knowledge of covid- risk in an influenza pandemic, the strength of the increase in confirmed cases is a proxy for epidemic size and disease transmissibility. [ ] the us centers for disease control and prevention (cdc) has established geographic risk-stratification criteria for the purpose of issuing travel health notices for countries with covid- risk and guiding management decisions for people with potential travel-related exposure to covid- . [ ] four strata have been established: ( ) limited community transmission, ( ) sustained (ongoing) community transmission, ( ) widespread, sustained (ongoing) transmission, and ( ) widespread, sustained (ongoing) transmission and restrictions on entry to the united states. for instance, on march , , the entry of foreign nationals from china and iran was suspended. the cdc recommended that ( ) travelers avoid all nonessential travel to the following destinations (china, iran, and most european countries), and ( ) older adults or those with chronic medical conditions consider postponing traveling to south korea. these represent the levels of notice based on the risk presented by the outbreak and the precautions that are needed to prevent infection, including watch level , alert level , and warning level . 
although a number of factors were involved in publishing the geographic risk stratification, including size (e.g., the number of confirmed cases), geographic distribution, and epidemiology of the outbreak, [ ] none of these objectively observed criteria were provided to us for our assessment of the covid- situation for each country/region. as of february , , more than articles related to covid- were searchable with the keyword "covid- or -ncov" on pubmed central (pmc). [ ] the johns hopkins center for systems science and engineering (jhc) has built an online dashboard and regularly updates the data to track the worldwide spread of the -ncov outbreak [ ] with the hope of providing the public with a better understanding of the covid- outbreak. however, the jhc [ ] and other dashboards [ , , ] only provided visual dashboards of the world map and included little information on the outbreak and bubbles for counties/ regions. no solid geographic risk assessment for covid- transmission has been seen yet on the internet, including on those websites [ , , [ ] [ ] [ ] [ ] [ ] [ ] providing simple and widely available information (e.g., the number of confirmed, deaths, and recovered cases based on countries/regions along with death rate, transmission rate, incubation period, as well as discussions on age and demographics) to the public. none were found to be equipped with travel information that would fulfill the public's needs. rasch models, [ ] which were named after georg rasch, are a family of psychometric models for creating measurements from categorical data, such as answers to questions on a reading assessment or questionnaire responses with a function of the trade-off between a. respondent ability and b. task difficulty. [ ] in addition to psychometrics and educational research, the rasch model and its extensions have been used in other areas, including the health profession [ ] and market research, [ ] because of their general applicability. 
[ ] our goal was to determine whether rasch analysis could be used for inspecting epidemic magnitudes by observing the pattern of daily confirmed cases. the reasons for the use of the rasch model include that (i) all responses were ordinal within a specific range (e.g., from - on a likert-type scaling survey), (ii) all regions and days (like persons and items on a test) were on an equal interval continuum with a unit of logit (=log odds) in comparison, [ , ] and (iii) sequential assessments estimate the epidemic magnitudes and examine the covid- situation for each country/region, whereas the traditional method of using the cumulative confirmed cases ignores the recent cases, which have greater weight (i.e., importance) in determining the outbreak magnitudes. the cfr is related to the following questions: ( ) how deadly is this? and ( ) how many people will die in this outbreak? the severe acute respiratory syndrome, the middle east respiratory syndrome, ebola, and h ni yielded real cfrs of . %, . %, %, and . %, respectively, [ ] [ ] [ ] and the cfr for covid- has been discussed in numerous articles. [ ] [ ] [ ] the world health organization, in a press conference on january , , announced that the death rate of covid- was % based on the cfr calculation (= deaths/cases). [ , [ ] [ ] [ ] [ ] this figure was substantially underestimated because it assumed (i) no lag days from symptom onset to death (i.e., death tolls registered and confirmed many days ago) [ ] and (ii) all currently infected cases had totally (i.e., %) recovered. bayes's theorem (alternatively bayes's law or bayes's rule) describes the probability of an event based on prior knowledge of conditions that might be related to the event. [ ] it is necessary to use the post-cfr to adjust the prior-cfr for each country/region on covid- to examine the geographic risks.
this is because the post-cfr might be increased if the conditional probability of death is greater than the counterpart of recoveries according to the equation $P(A_1 \mid B) = \frac{P(B \mid A_1)\,P(A_1)}{P(B)}$, where the probability of ($P(A_1)$, cfr) is based on the shared portions of ( ) conditional deaths and recoveries: $P(B \mid A_1)$ and $P(B \mid A_2)$, and ( ) the total possibility (e.g., $P(B) = P(B \mid A_1) \times P(A_1) + P(B \mid A_2) \times P(A_2)$ for a particular region, $P(A_2) = 1 - \mathrm{cfr}$). the shared portions can be used to more accurately assess the probability of ($P(A_1)$, cfr), which can be done without the knowledge of the shares using bayes's theorem for estimation. in the current study, we were motivated to apply bayes's theorem to estimate the adjusted cfr for countries/regions on covid- . the aims of the current study were to: visualize (i) the outbreak magnitude and (ii) the adjusted cfrs for countries/regions in recent days; develop an algorithm that classifies countries/regions into categories of outbreak epidemics and shows them on google maps; and design an app for better interpreting the geographic risk of covid- transmission. we downloaded covid- outbreak numbers on march , , from github, [ ] a site that provides information on newly confirmed cases in more than chinese locations and other countries/regions. all downloaded data (in supplemental digital content file , http://links.lww.com/md/e ) were publicly displayed on the website. ethical approval was not necessary for this study because all the data were obtained via the internet. [ ] . . rasch model for obtaining the outbreak magnitudes the rasch analysis [ ] was performed online using author-developed codes. [ ] all responses were derived from ordinal scores using the logarithm functions (i.e., using the excel function round (ln(confirmed cases), ) from to ) for each region in china and other countries.
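the logarithmic transformation of confirmed cases into ordinal scores can be sketched as follows. this is a hedged illustration, not the authors' code: the function name `ordinal_score` is hypothetical, the number of digits passed to the excel round function is assumed to be zero, and counts below one are mapped to a score of zero because the logarithm is undefined there.

```python
import math


def ordinal_score(confirmed_cases):
    """Map a confirmed-case count to an ordinal score via round(ln(cases)).

    Assumptions (not stated in the text): rounding is to zero decimal places,
    and counts below 1 are scored 0.
    """
    if confirmed_cases < 1:
        return 0
    # floor(x + 0.5) rounds halves up, matching Excel's ROUND for positive values;
    # Python's built-in round() would round exact halves to the nearest even integer.
    return math.floor(math.log(confirmed_cases) + 0.5)


# Hypothetical daily counts for one region.
daily_counts = [0, 1, 20, 100, 1000]
scores = [ordinal_score(c) for c in daily_counts]
```

on this scale each additional point corresponds to roughly a 2.7-fold increase in confirmed cases, which is why regions with many cases receive relatively less weight per case.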
the geographic risks for covid- transmission were determined by both the outbreak magnitudes with a unit of logit (log odds) and the adjusted cfrs based on bayes's theorem. we defined the adjusted post-cfr, as shown in eqs. ( ) and ( ) as follows: $P(A_1 \mid B) = \frac{P(B \mid A_1)\,P(A_1)}{P(B)}$, with $P(B \mid A_1) = \frac{\text{deaths in the region}}{\text{total deaths in all regions}}$ ( ) and $P(B \mid A_2) = \frac{\text{recoveries in the region}}{\text{total recoveries in all regions}}$ ( ), where $P(A_1 \mid B)$ denotes the post-cfr, $P(B)$ stands for the burnouts (or loading dealing with those currently infected cases in the respective region) on covid- , and $P(B \mid A_i)$ represents the conditional probabilities observed from the structure (or pattern) in deaths ($=A_1$) and recoveries ($=A_2$). $P(A_1)$ and $P(A_2)$ are the prior-cfr (=deaths/confirmed cases) and the probability of recoveries ($= 1 - \mathrm{cfr}$), respectively; in ( ) and ( ), the adjusted post-cfr is higher if $P(B \mid A_1)$ is greater than $P(B \mid A_2)$. otherwise, the post-cfr is less than the prior-cfr. as such, the transmission risk can be denoted by the adjusted post-cfr because these two metrics in eqs. ( ) and ( ) are unequal. imagine that at the end of the outbreak course, both $P(B \mid A_1)$ and $P(B \mid A_2)$ converge to have identical values and lead both post/prior-cfrs to be equal. world maps have been used to show disparities in health outcomes across areas in many disciplines, [ , ] such as dengue outbreaks, [ , ] disease hotspots, [ ] and the global health observatory (gho) maps on major health topics. [ ] a kano diagram [ , ] was used to highlight the geographic risks of countries/regions. the kano diagram was used to divide areas into three groups; bubbles were colored by latitude (i.e., higher in green and below . in red) and sized by doubling days for the confirmed cases of covid- (i.e., days it takes to double the number of confirmed cases starting from at least cases). the formula of /d * was applied to transform the doubling days into a scale, with higher values meaning that fewer days are needed to increase the number of confirmed cases.
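the adjusted post-cfr defined above can be sketched numerically. the code below is an illustration under stated assumptions rather than the authors' implementation: the function name `post_cfr` and the example counts are hypothetical, and a region with no deaths and no recoveries is assumed to keep its prior-cfr because the denominator would otherwise be zero.

```python
def post_cfr(region_deaths, region_recoveries, region_confirmed,
             total_deaths, total_recoveries):
    """Bayes-adjusted CFR for one region, following the definitions in the text.

    prior-CFR  P(A1)   = region deaths / region confirmed cases
    P(B | A1)          = region deaths / total deaths in all regions
    P(B | A2)          = region recoveries / total recoveries in all regions
    post-CFR   P(A1|B) = P(B|A1) P(A1) / (P(B|A1) P(A1) + P(B|A2) (1 - P(A1)))
    """
    prior = region_deaths / region_confirmed
    p_b_given_death = region_deaths / total_deaths
    p_b_given_recovery = region_recoveries / total_recoveries
    p_b = p_b_given_death * prior + p_b_given_recovery * (1 - prior)
    if p_b == 0:
        # Assumption: with no deaths and no recoveries there is nothing to update.
        return prior
    return p_b_given_death * prior / p_b


# Hypothetical region: its share of all deaths (0.10) exceeds its share of all
# recoveries (0.02), so the post-CFR rises above the prior-CFR of 0.05.
higher = post_cfr(50, 100, 1000, total_deaths=500, total_recoveries=5000)

# Equal shares of deaths and recoveries leave the CFR unchanged.
unchanged = post_cfr(50, 500, 1000, total_deaths=500, total_recoveries=5000)
```

the second call illustrates the limiting case described in the text, in which both conditional probabilities coincide and the post-cfr equals the prior-cfr.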
rasch logit scores are on the axis x and adjusted cfrs on the axis y. the number of confirmed cases in the recent and days were transformed into ordinal scores from to , respectively, for comparison. on the other hand, we plotted countries/regions on the kano diagram, dividing them among four features represented by different colors: . ready to increase (yellow), . increasing (green), . starting to decrease (light green), and . decreasing (red). a specific algorithm was applied to the categorization of the features mentioned above. three types of line charts were provided to verify that the features were fully supported. a dashboard app was designed for a daily updating geological display of the epidemic situation for travelers. we examine whether the rasch model could be applied to evaluate the riskalert level for covid- by examining the advisories of the us cdc. the study flowchart is shown in figure and supplemental digital content file , http://links.lww.com/md/e . fig. ). if the last days were applied to measure the geographic risks for regions, the top seven were germany, iran, south korea, italy, spain, sweden, and norway, with { . , . , . , . , , , . , and . } and { . %, . %, . %, . %, . %, . %, and . %} for the rasch scores and cfrs, respectively (see fig. ). readers are invited to scan the qr codes in figures and to see details about the information on google maps, such as the doubling days for the confirmed cases on covid- : and days for hubei (china) and south korea. it is worth noting that hubei (china) has fallen behind on the outbreak magnitudes because the outbreak situation has been gradually improved if the data from the last seven days are used for reporting. the results were consistent with the us cdc travel advisories of warning level in china, iran, most european countries, and level in south korea on march , . 
the top countries/regions (italy, spain, and iran) with the highest covid- transmission risks were particularly highlighted with symbols from to using the confirmed cases in the recent seven days dated march , (fig. ). the bubbles were sized according to the number of confirmed cases and colored by feature (i.e., ready to increase, increasing, starting to decrease, and decreasing). we can see that countries in europe have green bubbles. in contrast, many regions (or provinces in china) have black bubbles, indicating that there has been no confirmed case in the last days. we suggest that readers scan the qr-code in figure and click the link about the -line charts for the region of interest. the features of the outbreak for each country/region are shown in figure . we can see that the bubbles were sized by the number of confirmed cases and colored by feature (e.g., increasing in green and decreasing in red). the line charts regarding the details appear when the bubble of interest has been clicked. we confirmed that the information in figure , obtained by using rasch analysis and the adjusted cfrs, could highlight the travel risk on covid- . the results were consistent with the us cdc travel advisories of warning level in china, iran, and most european countries, and level in south korea on march , . in an influenza pandemic, the strength of the increase in confirmed cases is a proxy for epidemic size and disease transmissibility. [ ] the us cdc has established geographic risk-stratification criteria for the purpose of issuing travel health notices for countries with the risk of covid- transmission and guiding public health management for people with potential travel-related exposures to covid- . [ ] however, there is no objective measurement system that can help us visualize the transmission risk of covid- for travelers.
in this study, we provided visual representations based on the risk posed by the outbreak using rasch analysis and the cfrs based on bayes' theorem, which was a rare strategy in the literature. many dashboards and websites [ , , [ ] [ ] [ ] [ ] [ ] [ ] provide daily covid- -related information. none of them display such sophisticated messages on the ongoing epidemic situations as those from the rasch modeling technique and the bayes' theorem (figs. - ) . although choropleth maps have been popularly applied in the healthcare setting, [ , ] the major features of outbreak magnitudes and cfrs are included in this study to display the high travel risk for covid- transmission, which differentiates this study from others [ , , [ ] [ ] [ ] [ ] [ ] [ ] ] that only provide the number of confirmed cases or other simple information, particularly with bubbles sized by the number of confirmed cases and merely colored without other meaningful features. we provide main algorithms that display the outbreak magnitudes and cfrs to highlight the regions with the highest transmission risk, which are rarely seen in the literature but are of importance to revealing the epidemic transmission risk. however, with complex computations, these algorithms can be routinely run on the internet, which allows us to easily examine the daily progress of the outbreak, as we have shown in the previous figures. qr codes have been provided to readers to examine the detailed information on any regions of interest on the dashboards via google maps. the post-cfrs were used to examine how the particular risks appeared in regions. in this case, the countries/regions were within our expectations and were listed on the us cdc website on march , , [ ] indicating that the results were reliable. two main strengths of the current study include . the epidemic trend displayed under the rasch measurement (x-axes in figs. and ); . cfrs based on bayes' theorem, which was enriched in this study (y axes in figs. and ); . 
the geographic risks shown on google maps (fig. ); . using features to display all countries/regions in four respective quadrants (fig. ); and . the creation of an app to demonstrate the covid- situation on dashboards that use google maps for display. our study has some limitations. first, we were more concerned with the transmission risk in certain regions. as such, the numbers of confirmed cases were transformed into ordinal scores (e.g., from to ) to fit the rasch model's requirements. whether the preliminary assumptions of the rasch model were met (e.g., local independence of items and a unidimensional scale) was not examined in this study, though rasch analysis can be performed on such repeated measures. [ ] [ ] [ ] second, although we applied cfrs to distinguish the geographic risks, the difference between the prior- and post-cfrs might emphasize the regions with higher risks based on death tolls. in contrast, the rasch logit scores were focused on the outbreak magnitudes: a greater number of confirmed cases yields a higher magnitude due to momentum. third, readers might be doubtful about the different weights used in the rasch analysis, which were created by transforming the original counts into ordinal scores using the logarithm function. areas with more confirmed cases have lower weights, similar to the law of diminishing marginal utility in economics. [ ] alternatively, the transformation function can be substituted with other functions, such as equal interval compression (e.g., compress cases/ into several categories), to meet the requirement of rasch measurement. fourth, the doubling days for the confirmed cases of covid- have not been discussed much in this study. the use of doubling days in estimating the number of confirmed cases in a region is worth studying in the future. 
for instance, when the doubling days and the average length of hospitalization for deaths (alhd) are known, the confirmed cases can be estimated by the formula of ^(alhd/dd) * death tolls in a region. furthermore, the online rasch rating scale model [ , ] was programmed by the authors. although many visualization models have been developed, other useful diagrams and algorithms, such as diagnosis maps and kidmap, [ , ] can be further elaborated and developed in the future. finally, we suggest using both outbreak magnitudes and cfrs to observe the transmission risk in regions. the former concerns the number of confirmed cases, and the latter relates to the death tolls. from these perspectives, we can understand the transmission risks with more confidence, making them worthy of further investigation in the future. we created an online rasch modeling algorithm to display a visual representation of the geographic risks of the covid- transmission. we are hopeful that the app will help us better understand travel risks and keep us updated on the situation of the current outbreak. twc developed the study concept and design. sc, jcj, and yt analyzed and interpreted the data. sc monitored the process of this study and helped in responding to the reviewers' advice and comments. th drafted the manuscript, and all authors provided critical revisions for important intellectual content. the study was supervised by wc. all authors read and approved the final manuscript. the rate of underascertainment of novel coronavirus ( -ncov) infection: estimation using japanese passengers data on evacuation flights preliminary estimation of the basic reproduction number of novel coronavirus ( -ncov) in china, from to : a data-driven analysis in the early phase of the outbreak jhc. coronavirus disease (covid- ) outbreak. 
available at coronavirus disease (covid- ) outbreak risks to healthcare workers with emerging diseases: lessons from mers-cov, ebola, sars and avian flu are coronavirus diseases equally deadly? comparing the latest coronavirus to mers and sars estimation of merscoronavirus reproductive number and case fatality rate for the spring saudi arabia outbreak: insights from publicly available data approximate bayesian algorithm to estimate the basic reproduction number in an influenza pandemic using arrival times of imported cases search covid- risk assessment by country articles related to covid- wuhan coronavirus outbreak has now surpassed mers (final toll of deaths in novel coronavirus (ncov) data repository novel coronavirus ( -ncov) outbreak cdc tests for -ncov european centre for disease prevention and control (ecdc) novel coronavirus ( -ncov) national health commission of the people's republic of china (nhc) probabilistic models for some intelligence and attainment tests (reprint, with foreword and afterword by the rasch model rasch measurement in health sciences generalizing the rasch model for consumer rating scales solving measurement problems with the rasch model applying the rasch model: fundamental measurement in the human sciences evaluation of mobile apps targeted to parents of infants in the neonatal intensive care unit: systematic app review the comparative effectiveness of mobile phone interventions in improving health outcomes: meta-analytic review mobile phone apps for quality of life and well-being assessment in breast and prostate cancer patients: systematic review epidemiological characteristics and low case fatality rate of pandemic (h n ) in japan -novel coronavirus ( -ncov): estimating the case fatality rate -a word of caution novel coronavirus ( -ncov) fatality rate: who and media vs logic and mathematics novel coronavirus ( -ncov) fatality rate is % the stanford encyclopedia of philosophy (spring a rating formulation for ordered response 
categories student's performance shown on google maps using online rasch analysis choropleth map legend design for visualizing the most influential areas in article citation disparities: a bibliometric study using google maps to display the pattern of coauthor collaborations on the topic of schizophrenia: a systematic review between dengue outbreaks and the geographic distribution of dengue vectors in taiwan: a -year epidemiological analysis recognizing spatial and temporal clustering patterns of dengue outbreaks in taiwan dot map cartograms for detection of infectious disease outbreaks: an application to q fever, the netherlands and pertussis who. global health observatory map gallery attractive quality and must-be quality using the kano model to display the most cited authors and affiliated countries in schizophrenia research available at https://ncov . live/data repeated measure designs (time series) and rasch rasch analysis of repeated measures rack and stack: time vs time or pre-test vs post-test value, cost, and marginal utility some notes on the term: "wright map kidmap: person-by-item interaction mapping (research memorandum # ) medicine ( ) : www.md-journal we thank aje (american journal experts at https://www.aje. com/) for the english language review of this manuscript. key: cord- -v tewoi authors: giorgi rossi, paolo; broccoli, serena; angelini, paola title: case fatality rate in patients with covid- infection and its relationship with length of follow up() date: - - journal: j clin virol doi: . /j.jcv. . sha: doc_id: cord_uid: v tewoi nan in their systematic review on the clinical characteristics of covid- , wu and colleagues report a . % case fatality rate (cfr), ranging from % to % with strong heterogeneity between studies (i = %). one study from the initial phase of the epidemic in wuhan showed higher cfr and was responsible of the heterogeneity of results. 
the authors suggest that the higher complication and fatality rates in wuhan could be due to the limited clinical experience in the initial phase of the epidemic. when comparing data from china to those from italy, cfr is the most impressive difference, with data from italy, and now also from other european countries, reporting rates three to ten times higher than in china. other studies tried to justify this difference as due to the extremely old italian population and provided similar age-specific cfr in the two countries. but this was in an initial phase of the epidemic, when official statistics reported . % cfr in italy. now the overall cfr reported by routine statistics in italy, spain, uk, the netherlands and france is over % and it is difficult to justify the difference only with the older age of patients. here we propose a simple explanation: the length of follow up. we report data on cases diagnosed from february to march and followed up to april in the emilia-romagna region (approximately . million inhabitants), taken from the covid- information system set up in italy by the national institute of health and described elsewhere. briefly, the dataset collects individual information on date of symptom onset, rt-pcr test, hospitalization, intensive care admission, death or recovery for all sars-cov- rt-pcr positive patients in italy. the cfr increases with the length of follow up of cases, from % for cases diagnosed between march and march , to about % for those diagnosed from february to march (table ). including only cases with symptom onset (or laboratory diagnosis, when symptom onset was not reported) before march , i.e., with at least days of follow up, we constructed a frequency distribution of the distance from symptom onset to death (figure a). the median in this subpopulation is days. the definition of clinically recovered patients includes patients with two consecutive negative swabs and those who had no symptoms in at least the preceding three days. 
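the dependence of the observed cfr on the length of follow up described above can be illustrated with a small sketch. the cohort below is entirely hypothetical (it is not the emilia-romagna data), and the function name `observed_cfr` and the day numbering are assumptions made only for this illustration: the same fixed set of cases yields a higher cfr the later the observation window closes, because deaths that occur after the cutoff are not yet counted.

```python
def observed_cfr(cases, cutoff_day):
    """Naive CFR among cases diagnosed on or before cutoff_day.

    cases: list of (diagnosis_day, death_day) tuples; death_day is None
    for patients who never die. Deaths after cutoff_day are not yet
    observed, so they are (wrongly) counted as survivors.
    """
    diagnosed = [(diag, death) for diag, death in cases if diag <= cutoff_day]
    if not diagnosed:
        raise ValueError("no cases diagnosed by the cutoff day")
    deaths = sum(
        1 for _, death in diagnosed
        if death is not None and death <= cutoff_day
    )
    return deaths / len(diagnosed)


# Hypothetical cohort: ten cases diagnosed on days 0..9, four eventual deaths.
HYPOTHETICAL_CASES = [
    (0, 12), (1, None), (2, 20), (3, None), (4, None),
    (5, 15), (6, None), (7, None), (8, 30), (9, None),
]

if __name__ == "__main__":
    for cutoff in (10, 20, 40):
        cfr = observed_cfr(HYPOTHETICAL_CASES, cutoff)
        print(f"follow up to day {cutoff}: observed cfr = {cfr:.0%}")
```

in this toy cohort all ten cases are already diagnosed by the first cutoff, so the denominator is fixed and only the numerator grows as follow up lengthens.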
the median time to recovery is days. given that the minimum follow up in this cohort is days, we are by far underestimating the median time to recovery. our data show that, according to the italian definition of covid-related death, , the cfr can reach about % if we follow up patients for a long enough time to observe the vast majority of deaths. these findings are identical to those in other italian regions. it is possible that italian surveillance is now testing only severe cases, thus overestimating cfr, but the increase with increasing observation time is probably generalizable to other case definitions. unfortunately, previous studies did not focus on this point. prevalence and severity of corona virus disease (covid- ): a systematic review and meta-analysis clinical features of patients infected with novel coronavirus in wuhan an interactive web-based dashboard to track covid- in real time clinical characteristics of coronavirus disease (covid- ) in china: a systematic review and meta-analysis case-fatality rate and characteristics of patients dying in relation to covid- in italy epidemiological characteristics of covid- cases in italy and estimates of the reproductive numbers one month into the epidemic the following are members of the emilia-romagna covid- working group: andrea mattivi, giulio matteo, key: cord- -o skj r authors: plouffe, joseph f.; martin, daniel r. title: re-evaluation of the therapy of severe pneumonia caused by streptococcus pneumoniae date: - - journal: infectious disease clinics of north america doi: . /j.idc. . . sha: doc_id: cord_uid: o skj r pneumonia caused by streptococcus pneumoniae is the most deadly form of community-acquired pneumonia. the death rate of bacteremic pneumococcal pneumonia has remained constant over the past years. 
several retrospective reviews of bacteremic pneumococcal pneumonia suggest that dual therapy with a beta-lactam and a macrolide antimicrobial agent is associated with a lower case fatality rate than therapy with a beta-lactam alone. these studies are reviewed, potential mechanisms are suggested, and future studies are discussed. with the advent of modern microbiology, streptococcus pneumoniae (pneumococcus) was identified as the cause of community-acquired pneumonia (cap) in most patients [ ]. the case fatality rate (cfr) of untreated bacteremic pneumococcal pneumonia was %. early studies defined the importance of opsonizing antibodies to the infecting serotype. serum therapy was instituted in the s and decreased the cfr to %. with the advent of antimicrobial therapy in the s, the cfr of bacteremic pneumococcal disease decreased further to %. the changing pattern of pneumococcal pneumonia was recognized [ ]. over the next years, even though the pneumococcus remained susceptible to penicillin, the cfr remained constant. modern icus failed to improve on the % cfr [ ]. to further complicate matters, in the s, some s pneumoniae strains developed resistance to penicillin and other antimicrobial agents used to treat pneumonia [ ]. several retrospective studies have suggested that combination therapy with a beta-lactam and a macrolide antimicrobial agent results in a lower cfr than does therapy with a beta-lactam alone [ ] [ ] [ ]. this article addresses the available data on the treatment of bacteremic pneumococcal pneumonia and discusses the biologically feasible explanations behind new therapy. guidelines for the treatment of the clinical syndrome of cap have been published by several pulmonary and infectious disease societies [ , [ ] [ ] [ ] [ ]. these guidelines are addressed in detail in other articles in this issue. 
to evaluate bacteremic pneumococcal pneumonia, series of cap cases should be examined, as physicians infrequently know whether a patient has pneumococcal pneumonia on initial presentation. over the years, many series of cap cases have been published and reflect the changing nature of cap [ , [ ] [ ] [ ] [ ] [ ] [ ]. in series of cap through the s, s pneumoniae was the predominant pathogen, accounting for more than % of cases. in each subsequent decade, another pathogen or group of pathogens has been identified as a cause of cap. mycoplasma pneumoniae was identified as the initial cause of atypical pneumonia in the s. the importance of anaerobic organisms in aspiration pneumonia was identified in the s. legionella pneumophila was discovered to be the cause of the epidemic of legionnaires disease in . chlamydia pneumoniae was identified as another cause of atypical pneumonia [ ]. the importance of atypical and other viral causes of cap in adults (ie, respiratory syncytial virus, parainfluenza, hantavirus, metapneumovirus, coronavirus [severe acute respiratory syndrome]) has been identified by various investigators, including those at the centers for disease control and prevention and world health organization [ ] [ ] [ ] [ ] [ ]. more recent series have been able to identify s pneumoniae in only % to % of patients with cap, and no specific cause was found in % to % of patients [ , ]. approximately one third of patients had taken at least one dose of antibiotics before presenting to the physician. the services of many microbiology laboratories have been scaled back because of hospital budgetary constraints. the consolidation of many hospitals has led to the use of centralized or reference laboratories, which prolongs the time from specimen collection to processing. these factors have decreased the ability to culture pyogenic organisms, such as s pneumoniae. 
centers that use methods in addition to culture for s pneumoniae (antigen detection, serological means) have reported finding more cases of pneumococcal pneumonia than cases of pneumonia caused by unidentified pathogens, suggesting that many patients without a definable cause have pneumococcal pneumonia [ , ] . patients with increased susceptibility to pneumococcus may be susceptible to other pulmonary pathogens, leading to dual infections. some pathogens, such as influenza virus, render the host more susceptible to the pneumococcus [ ] . predisposition to pneumococcal infection may hold true for patients with antecedent m pneumoniae and c pneumoniae infections [ ] [ ] [ ] [ ] . lessons learned from the series of patients with cap include the fact that it may be difficult to identify cases of pneumococcal pneumonia, patients with pneumococcal pneumonia may have additional infections [ , ] , and patients with pneumonia reflect the demographics of the changing u.s. population. studies of pneumococcal bacteremia suggest that the incidence of disease is increasing in the u.s. population [ , [ ] [ ] [ ] . pathophysiology s pneumoniae is acquired through inhalation of large droplets from a carrier. the pneumococcus must colonize the oropharyngeal epithelial cells and then be able to multiply. microaspiration of these organisms to the lungs causes the pneumonia. the efficiency of this process is low in most instances, as patients with pneumococcal pneumonia are not placed in respiratory isolation. in certain closed populations, such as jails, long-term care facilities, and day care centers, the process' efficiency is higher, and outbreaks can occur. the defense system of the host is helpful in controlling s pneumoniae attachment (conjugate vaccine), growth, and spread to lungs. factors that inhibit ciliary function, such as smoking or viral infections, increase the likelihood of acquiring pneumococcal pneumonia. 
once in the pulmonary parenchyma, the pneumococcus elicits an intense inflammatory reaction. phagocytosis is enhanced if type-specific opsonizing antibodies are present. bacteremia is more likely to occur in the absence of these antibodies (hypogammaglobulinemia), with diminished function of phagocytic cells (alcoholism), with a decreased inflammatory response (complement deficiencies), and in the absence of the clearing function of the spleen (sickle cell disease, splenectomy). before the early s, most s pneumoniae isolates were susceptible to most of the antimicrobial agents that were used to treat respiratory infections. since then, higher concentrations of penicillin have been required to inhibit growth of the pneumococcus [ ]. the changing susceptibilities of antimicrobial agents are discussed in detail in another article in this issue. in general, beta-lactam antibiotics effectively treat nonmeningeal (ie, pneumonia, bacteremia) pneumococcal disease in most cases. although several respiratory pathogens may have a higher cfr (rate for pseudomonas aeruginosa, %) than s pneumoniae ( %- %), the total number of cap-related deaths caused by pneumococci exceeds the number of deaths caused by all other pathogens [ ]. it seems logical that the changes in the treatment of cap that result in more favorable outcomes also would be beneficial in patients with pneumococcal pneumonia. changes that have been associated with improvements in cfr in some series of patients with cap include more rapid antibiotic delivery [ ], combination therapy with a cephalosporin with good pneumococcal activity and a macrolide (versus the cephalosporin alone), and therapy with a fluoroquinolone (ciprofloxacin; versus a cephalosporin alone) [ ]. culture of s pneumoniae from a normally sterile body fluid (blood, pleural fluid) in a patient with an acute pneumonia usually is accepted as a definite sign of pneumococcal pneumonia [ , ]. 
there is some debate as to the value of culturing s pneumoniae from expectorated sputum even with a compatible gram stain, although clinicians with experience in pneumococcal pneumonia value the information provided by high-quality pulmonary secretions [ ]. a rapid s pneumoniae urinary antigen test has been evaluated [ , ] and shown to have good specificity in the adult population (> % in most studies) and reasonable sensitivity ( %- % in most studies). the test was too sensitive in heavily colonized children and could not discriminate between infected and colonized children in underdeveloped countries [ ]. a study from spain examined patients who were hospitalized with acute cap in whom a s pneumoniae urinary antigen (spua) test was performed [ ]. pneumococci were found in cultures from only patients ( %; half from blood, half from sputum). the spua test was positive in of patients with positive cultures ( %); however, an additional patients had positive spua tests with cultures that were negative for pneumococcus. because the specificity has been reported to be greater than % in adults, most of these patients also had pneumococcal pneumonia. in this study, of patients ( %) would have pneumococcal pneumonia, a proportion of pneumococcal pneumonia cases that is similar to the proportion in other large series of hospitalized cases of cap. the sensitivity of the spua test would be of patients ( %) or at least would be four times greater than the combination of cultures of sputum and blood ( of patients [ %]). cultures still would be important in determining antimicrobial susceptibility. several retrospective studies suggest that monotherapy with an effective cephalosporin is not adequate treatment for pneumococcal pneumonia. mufson and stanek [ ] reported on patients with pneumococcal bacteremia in huntington, west virginia over years. the data were analyzed in -year periods. overall, the incidence of pneumococcal bacteremia increased, and the cfr decreased. 
in each -year period, a regimen including a macrolide and a beta-lactam resulted in a lower cfr than did regimens involving a beta-lactam alone or two antibiotics (excluding macrolides). no specifics were provided on the timing of the initial dose and changes in therapy. fluoroquinolones were used infrequently. although this study was retrospective and did not control for severity of illness, it may have offered the first clue that monotherapy of pneumococcal bacteremia with a cephalosporin is less efficacious than combination therapy with a cephalosporin and a macrolide. waterer et al [ ] reported data on antimicrobial therapy in patients with bacteremic pneumococcal pneumonia from hospitals in tennessee between january and july . immune-compromised patients were excluded. seven patients with s pneumoniae isolates resistant to empiric therapy also were excluded. patients received one antibiotic active against the patient's isolate (single effective therapy [set]), two effective antibiotics (dual effective therapy [det]), or more than two effective antibiotics (met). logistic regression analysis was used to calculate the odds ratio (or) for death adjusted for predicted mortality. compared with det, the or for set was . ( % confidence interval [ci], . - . ). all deaths occurred in cases with pneumonia severity index (psi) classes iv and v. even after excluding deaths that occurred in the first hours of hospitalization, set was an independent predictor of death (or, . ; % ci, . - . ). analysis was done to evaluate coverage for atypical pathogens. the cfr was . % ( of patients) in patients receiving atypical coverage and was . % ( of ) in patients not receiving atypical coverage; however, the predicted mortality rate was higher in the latter group of patients. multivariate analysis did not show that lack of atypical coverage was a predictor of death (p = . ). 
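to make the odds-ratio language above concrete, the following is a minimal sketch of an unadjusted odds ratio with a wald (woolf) confidence interval computed from a 2x2 table. it is not the analysis used by waterer et al, who report an or from logistic regression adjusted for predicted mortality; the function name `odds_ratio_ci` and all counts are hypothetical and chosen only for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald (Woolf) confidence interval.

    a: deaths in the comparison group (e.g. single effective therapy)
    b: survivors in the comparison group
    c: deaths in the reference group (e.g. dual effective therapy)
    d: survivors in the reference group
    z: normal quantile; 1.96 gives an approximate 95% interval
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Woolf interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from any study discussed in the text.
    or_, lo, hi = odds_ratio_ci(a=18, b=82, c=7, d=93)
    print(f"or = {or_:.2f}, 95% ci {lo:.2f}-{hi:.2f}")
```

an interval that excludes 1 would suggest an association in the crude table, but, as the text notes for these retrospective series, adjustment for severity of illness can change the estimate.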
the investigators suggest that prospective studies should address set versus det in patients with pneumococcal bacteremia in psi classes iv and v. they state that the s pneumoniae urinary antigen should help in rapidly identifying the subset of patients with pneumococcal pneumonia. martinez et al [ ] performed a retrospective analysis of a -year ( - ) database of patients with bacteremic pneumococcal pneumonia. of patients analyzed, ( %) received empiric therapy with a beta-lactam plus a macrolide, whereas ( %) received empiric therapy with a beta-lactam alone. potential risk factors for in-hospital death were identified in stepwise logistic regression analysis. multivariate analysis revealed that absence of a macrolide in the initial empiric regimen was independently associated with death (p = . ). other independent predictors of death included shock, age greater than years, and a blood culture isolate of an s pneumoniae strain resistant to penicillin and erythromycin. a total of patients ( %) died. even when the data were reanalyzed to exclude early deaths (those occurring < hours after presentation), the absence of a macrolide in initial therapy was associated with death (or, . ; % ci, . - . ). in this study, a macrolide could be combined favorably with a cephalosporin or a beta-lactamase inhibitor. a previous study of patients with cap, but not nonbacteremic pneumococcal pneumonia, found that treatment with beta-lactamase inhibitors and a macrolide was less effective than treatment with a cephalosporin alone [ ]. as with most retrospective studies, there were differences among the populations. the group receiving cephalosporin alone had higher incidences of comorbid conditions, hiv infection, hematologic malignancies, neutropenia, nosocomial bacteremia, and penicillin-resistant isolates. in the group receiving beta-lactamase inhibitors and a macrolide, more patients experienced shock and resultant admission to icu. 
the investigators caution that a prospective, randomized trial is necessary to definitively determine the effect of macrolides. bacteremic pneumococcal pneumonia remains a serious life-threatening infection. the incidence of pneumococcal bacteremia seems to be increasing. the cfr with bacteremic pneumococcal pneumonia has not changed much in the past years. there always have been unanswered questions with regard to severe pneumococcal disease. why does the cfr differ among different centers and countries [ , ]? why do some countries have many cases of nosocomial s pneumoniae infections [ ] and others (eg, the united states) have a minimal number of such cases [ ]? reports have suggested that combination antimicrobial therapy containing a macrolide is more effective than therapy with a cephalosporin or beta-lactam alone [ ] [ ] [ ]. this article addresses the published literature. why would a beta-lactam (cephalosporin) in combination with a macrolide be more efficacious than a beta-lactam (cephalosporin) alone in the treatment of patients with bacteremic pneumococcal pneumonia? are there interactions between the two antibiotics against streptococcus pneumoniae? although some antibiotic combinations have been shown to be synergistic in vitro and in vivo (ie, ampicillin and gentamicin against enterococci), no data suggest that such a synergistic activity exists between a cephalosporin or penicillin and a macrolide against pneumococci [ ]. there is evidence that the combination of penicillin and tetracycline has antagonistic effects in patients with pneumococcal meningitis [ ]. one possible explanation for the decreased mortality rate with combination therapy could be that the macrolide is somewhat antagonistic against the rapid killing of the pneumococci by the cephalosporin. this effect could slow the rapid lysis of pneumococci and abate the resultant intense inflammatory response. 
would the use of two empiric antibiotics make it more likely that at least one would be active against the pneumococcus? in their study, waterer et al [ ] excluded organisms resistant to empiric therapy and still demonstrated a benefit of macrolide use. lujan et al [ ] demonstrated that discordant therapy was associated with a higher cfr. this finding was seen only among physicians who did not use third-generation cephalosporins. pneumococcal resistance to ceftriaxone or cefotaxime was minimal ( % of patients). what is the possibility that the macrolide is treating a secondary infection in a patient with pneumococcal bacteremia? influenza infection predisposes to pneumococcal pneumonia and bacteremia through several mechanisms. co-infections with atypical pathogens that would be resistant to a cephalosporin but susceptible to a macrolide, including m pneumoniae and c pneumoniae, have been described [ ] [ ] [ ] [ ] [ ] [ ] [ ] . it is not clear whether patients with dual infections fare worse if only the pneumococcal bacteremia is treated. co-infection with s pneumoniae and l pneumophila has been described [ ] . in most epidemiologic studies of cap, an etiologic agent is not identified in a large proportion of patients ( %- %) [ , ] . it is possible that other pulmonary pathogens that are susceptible to macrolides have not been identified. mcnally et al [ ] screened acute and convalescent serum samples from patients with pneumonia of unknown cause. legionella bozemanii was identified as the potential cause in % of cases using the criterion of fourfold rise in antibody titers between acute and convalescent samples. it is possible that other legionella [ ] or legionella-like organisms requiring different growth medium will be identified [ , ] . what is the possibility that the immune-modulating activity of macrolides is important in reducing the mortality rate? the intense host inflammatory response with sepsis sometimes is deleterious. 
multiple studies have used different agents to try to diminish this exaggerated immune response [ ] [ ] [ ] [ ]. steroids were studied in multiple doses in many studies, and success was difficult to demonstrate in patients with sepsis. if given before antibiotics, however, steroids seemed to help reduce the morbidity rate in patients with bacterial meningitis [ , ]. studies of patients with difficult sepsis in various stages of illness who were treated with antibodies to endotoxin and tumor necrosis factor (tnf) have had differing results. one murine study showed that antibodies to tnf had a deleterious effect in mice with pneumococcal pneumonia that also were treated with ceftriaxone [ ]. review of human trials with antibody to tnf did not show any effect on the mortality rate in patients with severe sepsis and bacterial pneumonia [ ]. other components of the complex inflammatory response, such as granulocyte colony-stimulating factor (g-csf), have been investigated in mice and humans. local production of g-csf seems to occur at the site of infection in patients with unilateral pneumonia [ ]. macrolides have been shown to inhibit various factors in the inflammatory response, mostly in mice. no human studies have shown that the immune-modulating activity of macrolides has a beneficial effect. further investigation into the complex immune response and its salutary or deleterious effect on the mortality rate is important. there have been a large number of articles addressing the issue of in vivo susceptibility data with erythromycin and other macrolides [ ] [ ] [ ] and how it correlates with clinical outcome [ ]. a retrospective study ( ) ( ) ( ) ( ) from spain examined patients who were admitted with cap and treated with combination therapy [ ]. all of the patients received ceftriaxone. the type of macrolide therapy was chosen by the attending physician. 
the choices were mg of oral azithromycin daily for days (n = ) or mg of intravenous clarithromycin twice daily with a switch to oral treatment (total duration of treatment, days; n = ). the patients had similar ages, comorbidities, and psis. the length of stay (los) for the azithromycin group was days shorter (p < . ). the cfr was . % in the azithromycin group and . % in the clarithromycin group (p < . ). there was no obvious reason for the differences in the two treatment arms. the investigators suggested that compliance might have been an issue, because patients in the azithromycin arm received their -day course in the hospital, whereas many patients in the clarithromycin arm had to complete their course at home. other possibilities include differences in the anti-inflammatory attributes of the two drugs and the presence of an unknown pathogen that is susceptible to azithromycin and resistant to clarithromycin. because this analysis was retrospective, there may have been undiscovered biases. what is needed in the future? the retrospective studies discussed earlier [ ] [ ] [ ] suggest that combination therapy is better than cephalosporin monotherapy for elderly patients with cap and older, sicker patients with bacteremic pneumococcal pneumonia. caution about overinterpreting retrospective studies has been published [ ] [ ] [ ]. investigations into the inflammatory response in patients with severe pneumococcal pneumonia should incorporate recent advances in murine studies [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]. prospective studies aimed at testing the available hypotheses need to be developed. the s pneumoniae urinary antigen will be helpful in defining the subset of patients who should be studied intensively. 
tests are needed that assess the inflammatory response, resolution of illness (ie, rapidity in reduction of the magnitude of bacteremia), and importance of alternative pathogens (m pneumoniae, c pneumoniae, l pneumophila, other legionella spp and viruses). other variables that should be taken into account include the specific macrolide used and the effect of other classes of antimicrobial agents, such as fluoroquinolones. ideally, the study would include provisions for autopsies or postmortem pulmonary biopsies in fatal cases. as these studies are designed and performed, vigilance is needed in the immunization of appropriate patients with influenza and pneumococcal vaccines to prevent bacteremic pneumococcal pneumonia. pneumococcal bacteremia with special reference to bacteremic pneumococcal pneumonia bacteraemic pneumococcal pneumonia: a continuously evolving disease severe community-acquired pneumococcal pneumonia-what might be done better management of community-acquired pneumonia in the era pneumococcal resistance: a report from the drug-resistant streptococcus pneumoniae therapeutic working group bacteremic pneumococcal pneumonia in one american city: a -year longitudinal study, - monotherapy may be suboptimal for severe bacteremic pneumococcal pneumonia addition of a macrolide to a beta-lactam-based empirical antibiotic regimen is associated with lower in-hospital mortality for patients with bacteremic pneumococcal pneumonia canadian guidelines for the initial management of community-acquired pneumonia-an evidence-based update by the canadian infectious disease society and the canadian thoracic society: the canadian cap working group guidelines for the management of adults with community acquired pneumonia: diagnosis, assessment of severity, antimicrobial therapy, and prevention swedish infectious diseases society pneumonia study group: management of patients with community-acquired pneumonia treated in hospital in sweden update of practice guidelines for the 
management of community-acquired pneumonia in immunocompetent adults community-acquired pneumonia requiring hospitalization: -year prospective study new and emerging etiologies for community-acquired pneumonia with implications for therapy: a prospective multi-center study of cases multiple pathogens in adult patients admitted with community-acquired pneumonia: a one year prospective study of consecutive patients incidence of community-acquired pneumonia requiring hospitalization microbial etiology of community-acquired pneumonia in the adult population of municipalities in eastern finland community-acquired pneumonia in europe: causative pathogens and resistance patterns a new chlamydia psittaci strain, twar, isolated in acute respiratory tract infections respiratory syncytial virus (rsv) may be an important cause of community-acquired lower respiratory infection among hospitalized adults the role of atypical pathogens: mycoplasma pneumoniae, chlamydia pneumoniae and legionella pneumonia in respiratory infection clinical characteristics of c. 
pneumoniae infection as the sole cause of community-acquired pneumonia parainfluenza virus infection among adults hospitalized for lower respiratory tract infection importance of atypical pathogens of community-acquired pneumonia role of neuraminidase in lethal synergism between influenza virus and streptococcus pneumoniae moraxella catarrhalis bacteraemia associated with mycoplasma pneumoniae infection and pneumonia legionnaires disease with bacteremic coinfection cbpis study group: bacteremia with streptococcus pneumoniae: implications for therapy and prevention adult bacteremic pneumococcal pneumonia in a community teaching hospital, - : a detailed analysis of cases cost-effectiveness of vaccination against pneumococcal bacteremia among elderly people prognosis and outcomes of patients with communityacquired pneumonia quality of care, process and outcomes in elderly patients with pneumonia association between antimicrobial therapy and medical outcomes for hospitalized elderly patients with pneumonia the impact of blood cultures on antibiotic therapy in pneumococcal pneumonia blood cultures for community-acquired pneumonia: no place to skimp! 
bacteremic and nonbacteremic pneumococcal pneumonia: a prospective study evaluation of a rapid immunochromatographic test for detection of streptococcus pneumoniae antigen in urine samples from adults with community-acquired pneumonia rapid diagnosis of bacteremic pneumococcal infections in adults by using the binax now streptococcus pneumoniae urinary antigen test: a prospective, controlled clinical evaluation usefulness of urinary antigen detection by an immunochromatographic test for diagnosis of pneumococcal pneumonia in children evaluation of the immunochromatographic binax now assay for detection of streptococcus pneumoniae urinary antigen in a prospective study of community-acquired pneumonia in spain deaths in bacteremic pneumococcal pneumonia: a comparison of two populations bacteremic pneumococcal pneumonia mortality rate: is it really different in sweden? lack of synergy of erythromycin combined with penicillin or cefotaxime against streptococcus pneumoniae in vitro treatment of pneumococcal meningitis with penicillin compared with penicillin and aureomycin; studies including observations on an apparent antagonism between penicillin and aureomycin prospective observational study of bacteremic pneumococcal pneumonia: effect of discordant therapy on mortality potential importance of legionella species in community acquired pneumonia cap infection due to legionella species other than l. 
pneuophila legionella drozanskii sp nov, legionella rowbothamii sp nov and legionella fallonii sp nov: three new unusual legionella species legionella-like and other amoebal pathogens as agents of community-acquired pneumonia and the methylprednisolone severe sepsis study group: a controlled clinical trial of high-dose methylprednisolone in the treatment of severe sepsis and septic shock a controlled clinical trial of e murine monoclonal igm antibody to endotoxin in the treatment of gram-negative sepsis efficacy and safety of monoclonal antibody to human tumor necrosis factor in patients with sepsis syndrome: a randomized, controlled, double-blind, multicenter clinical trial a second large controlled clinical study of e , a monoclonal antibody to endotoxin: results of a prospective, multicenter, randomized, controlled trial the beneficial effects of early dexamethasone administration in infants and children with bacterial meningitis european dexamethasone in adulthood bacterial meningitis study investigators: dexamethasone in adults with bacterial meningitis anti-tumor necrosis factor antibody impairs the therapeutic effect of ceftriaxone in murine pneumococcal pneumonia is anti-tumor necrosis factor therapy associated with increased mortality in patients with severe sepsis caused by pneumonia? activation of neutrophils and inhibition of the proinflammatory cytokine response by endogenous granulocyte colony stimulating factor in murine pneumococcal pneumonia drug-resistant streptococcus pneumoniae trust surveillance program. factors associated with relative rates of antimicrobial resistance among streptococcus pneumoniae in the united states: results from the trust surveillance program influence of patient age on the susceptibility patterns of streptococcus pneumoniae isolates in north america international pneumococcal study group: an international prospective study of pneumococcal bacteremia. 
correlation with in vitro resistance, antibiotics administered, and clinical outcome is azithromycin the first-choice macrolide for treatment of community-acquired pneumonia? the best treatment of pneumonia: new clues, but no definitive answers what is optimal therapy for bacteremic pneumococcal pneumonia? monotherapy versus dual therapy for community-acquired pneumonia in hospitalized patients synergy between amoxicillin and gentamicin in combination against a highly penicillin-resistant and -tolerant strain of streptococcus pneumoniae in a mouse pneumonia model effective combination therapy for invasive pneumococcal pneumonia with ampicillin and intravenous immunoglobulins in a mouse model tnf-alpha compensates for the impaired host defense of il- type i receptor-deficient mice during pneumococcal pneumonia interleukin- gene-deficient mice show impaired defense against pneumococcal pneumonia il- improves the early antimicrobial host response to pneumococcal pneumonia alveolar macrophages have a protective antiinflammatory role during murine pneumococcal pneumonia improved host defense against pneumococcal pneumonia in platelet-activating factor receptor-deficient mice zmpb, a novel virulence factor of streptococcus pneumoniae that induces tumor necrosis factor alpha production in the respiratory tract

key: cord- - ykvl u authors: binns, colin; low, wah yun; kyung, lee mi title: the covid- pandemic: public health and epidemiology date: - - journal: asia pac j public health doi: . / sha: doc_id: cord_uid: ykvl u nan

in this issue of the journal, we publish a review of covid- infection by eminent virologists, mackenzie and smith (see in this issue). it is too early in the history of the covid- outbreak to write the full history, but their article provides a good outline of the emerging pandemic.
the disease is causing widespread social disruption in many countries, and it has just been announced that the asia pacific academic consortium for public health (apacph) conference has been postponed indefinitely. the media are full of daily totals of new cases, hospital and intensive care admissions, and deaths. the actual numbers are dependent on the testing regimes in use at different locations. in the first months of , there were hundreds of papers and commentaries published on corona viruses, including a major clinical review in the lancet that has already had almost citations. corona viruses are a large family of viruses that can cause human diseases, but usually mild in nature, such as a common cold. there have been previous severe outbreaks of novel corona viruses: severe acute respiratory syndrome (sars) in and middle east respiratory syndrome coronavirus (mers-cov) in , which together caused more than cases. the case fatality rates (cfrs) were % for sars-cov and % for mers-cov. in this commentary, we will discuss additional public health issues that will assist in teaching at our public health institutions. since the first cases of covid- , the reported numbers have increased rapidly with more than . million cases and deaths to the end of february , and cases are now being reported from all of the more populous countries. the symptoms of covid- infection are nonspecific and include elevated temperature and cough. this, then, progresses to shortness of breath. a clinical report of the first patients from wuhan (n = ) with covid- infection gave the following details of clinical symptoms: fever %, cough %, myalgia or fatigue %, sputum %, and dyspnea %. the average time from the first onset of symptoms to the development of dyspnea was days. in the first report from outside of wuhan (from zhejiang province, n = ), the symptoms were fever %, cough %, sputum %, headache %, and myalgia or fatigue %. 
the disease is more likely to occur in people who have a chronic illness, are immunocompromised, or are older. those who contracted the virus and were older than years of age, and particularly older than years, were more likely to require admission to an intensive care unit and to have an increased cfr (see table in mackenzie and smith, this issue). in wuhan, there were deaths from the confirmed cases, an overall cfr of . %. the initial clinical symptoms of covid- are the symptoms of a common cold and influenza. every adult gets to colds per year, and children get them more frequently. with a world population of around . billion, this suggests that there are billion cases every year. screening all of these cases for possible covid- would obviously be impossible, and so testing for the virus is confined to those with other risk factors, including contact with confirmed cases and travel from outbreak epicenters. covid- is the latest in a continuing series of infectious disease epidemics in the history of the human race. as population numbers and population density increase, the likelihood of epidemics increases. probably, the greatest epidemic of all time was the influenza epidemic of , which caused an estimated million deaths worldwide and had a cfr of more than . % and which could have been as high as %. this was in the pre-antibiotic days, and before an influenza vaccine and antiviral therapy had been developed. infectious diseases remain a major challenge for public health. to assist in defining, containing, preventing, and ultimately in successfully treating those who become ill in an epidemic, it is important to understand the basic epidemiology of the outbreak. experts from many institutions have collaborated in an effort to categorize the covid- outbreak. most infectious diseases have a spectrum of severity from clinically undetected disease through to death. figure is modified from the paper prepared by imperial college.
it is likely that only the cases with severe symptoms or other risk factors (eg, contact with a known case) are being identified at present, but this varies between countries. for some diseases, the base of the triangle in figure would be much wider, and this may be the situation with covid- . the base of subclinical cases with minimal symptoms may be much greater than the number of cases with disease severe enough for hospital admission. the history of public health contains a number of examples of infectious diseases that were initially thought to have a very high cfr, only for it to be revised downward later. lassa fever was initially described as a very severe disease, with a cfr close to %. however, as more widespread epidemiological studies were undertaken in west africa, it was found that % or more of the population of sierra leone and guinea were seropositive to the disease without showing any symptoms. under these circumstances, the cfr for lassa virus is now thought to be low (< %), but the cfr for severe disease remains very high. when assessing the likely impact of an infectious disease, two parameters are considered: the likelihood of transmission of the disease (its capacity to spread), and the severity of the disease and its capacity to kill (or disable) those infected. these are assessed using the reproduction rate and the cfr. reproduction rate (r0): the basic reproduction number (r0) for transmission indicates the number of secondary infections due to an initial case (the subscript 0 denotes the virgin state, a population with no previous exposure to this infection). in populations that have previously been exposed to the infection, and have some immunity, the reproduction rate may be lower. case fatality rate is defined as the proportion of reported cases of a specified disease that are fatal within a specified time. the cfr depends on the definition of disease, the accuracy of diagnosis (case detection), and the availability of treatment.
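the dependence of the cfr on case detection can be made concrete with a small calculation. the following is a minimal sketch in python, not a method from the commentary: the function name and all counts are hypothetical and chosen only to show how widening the denominator (as happened with lassa fever once seropositive but asymptomatic people were counted) lowers the cfr while the number of deaths stays the same.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Return the CFR as a percentage: fatal cases / all counted cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    if deaths < 0 or deaths > cases:
        raise ValueError("deaths must be between 0 and cases")
    return 100.0 * deaths / cases


# Hypothetical counts, for illustration only (not data from the text).
deaths = 50
severe_cases_only = 1_000        # only hospitalised cases are detected
all_infections = 10_000          # mild and asymptomatic infections included

cfr_severe = case_fatality_rate(deaths, severe_cases_only)
cfr_all = case_fatality_rate(deaths, all_infections)

print(f"CFR among detected severe cases: {cfr_severe:.1f}%")
print(f"CFR once mild infections are counted: {cfr_all:.1f}%")
```

the same number of deaths gives a tenfold lower cfr when the denominator grows tenfold, which is why cfr estimates should always be read together with the case definition and testing regime behind them.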
transmissibility and severity are the most critical factors that determine the public health impact of an epidemic. a disease that has a high transmission rate and is very severe is the greatest public health risk. covid- has a high transmission rate, and the cfr appears to be greater than for influenza epidemics, and it is, therefore, potentially a major public health threat. infectivity is the tendency to spread the infection from host to host. the period of infectivity for some diseases commences before symptoms appear, making it far more difficult to control spread. the spectrum of clinical disease (see figure ) makes it difficult to define a case and even more difficult to calculate a cfr, as the denominator is not readily defined. in february, the cfr for covid- infection was estimated by the world health organization (who) to be %, much lower than for mers and sars, but estimates of the cfr have changed over time as the criteria for counting the number of cases in the denominator have changed to include very mild or even asymptomatic infections. enlarging the denominator in this way decreases the cfr and increases the estimated reproduction rate. in some situations, the reproduction rate itself changes: during the quarantine of passengers on a cruise ship in yokohama, the r0 was times greater (as high as ) than the initial r0 in wuhan. the use of diagnostic tests has traditionally been reported using measures of sensitivity (proportion of true positives that are correctly identified by the test) and specificity (proportion of true negatives that are correctly identified by the test). at the present time, covid- disease is diagnosed by detecting the virus in throat and nasal swabs in patients who have symptoms of an upper respiratory tract infection and who have been in a region with disease transmission. no other screening tests are yet available, and in the absence of a "gold standard" for diagnosis, these parameters are not yet being calculated.
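the two test measures defined above follow directly from a table of test results against true disease status. the sketch below, in python, uses hypothetical counts (no such counts are given in the text, and no gold standard was available for covid- at the time of writing); the function and variable names are illustrative assumptions.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of people with the disease whom the test flags as positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of people without the disease whom the test reports negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical evaluation against a gold standard: 100 infected and
# 1000 uninfected people (illustrative numbers only).
tp, fn = 90, 10      # infected: test positive / test negative
tn, fp = 950, 50     # uninfected: test negative / test positive

print(f"sensitivity = {sensitivity(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
```

both quantities need the true disease status of every person tested, which is why they cannot be calculated until a reference ("gold standard") diagnosis exists.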
incubation period is the time between exposure to an infectious agent and the appearance of clinical symptoms (or physiological evidence of disease). it is not known if transmission of the virus occurs during this period and before the presence of clinical symptoms. the who estimates of the incubation period for covid- range from to days, most commonly around days. modeling of the role of contact tracing and case isolation suggests that these are effective in the control of epidemics such as covid- . in australia, the home isolation (quarantine) of contacts or suspected cases is recommended for days. if transmission is occurring before symptoms appear, it makes it more difficult to control an infectious disease. the virus is contained in droplets from coughing and nasal secretions. it can survive up to hours on surfaces under favorable conditions. most transmission will occur through spread by hands and not from the direct inhalation of droplets. the who recommends the following to prevent transmission of the virus:
• regularly and thoroughly clean your hands with an alcohol-based hand rub or wash them with soap and water.
• maintain at least m distance between yourself and others, and in particular from anyone who is coughing or sneezing.
• avoid touching eyes, nose, and mouth.
• make sure you, and the people around you, follow good respiratory hygiene. this means covering your mouth and nose with your bent elbow or tissue when you cough or sneeze. then dispose of the used tissue immediately.
• stay at home if you feel unwell. if you have a fever, cough, and difficulty breathing, seek medical attention and call in advance. follow the directions of your local health authority.
despite much research and success in some animal models, including primates, there is still no vaccine for sars. the first clinical cases of sars were noted in november , but it was not until months later that a causative agent was isolated.
transmission and dispersion around the world occurred by respiratory aerosols and contact via hands, and in several cases, this was from touching door handles. there were reported cases of sars and deaths from many different countries. while there is no doubt that the virus has the potential to reemerge, it is not a clinical issue at present. sars has a lower transmissibility than covid- . the development, testing, and mass production of vaccines are always time-consuming before they can be deployed on a population-wide scale. the development, testing, and distribution of a vaccine will take years. despite the interest in sars, there is still no vaccine, and in other cases, important diseases have defied vaccine development, notably malaria and dengue. will covid- decline in the northern summer following the pattern of influenza? in western countries, when it is cooler in winter, respiratory infections increase (eg, influenza). this may be because people are more likely to have closer contact with others because it is colder. it could also relate to relative humidity levels that are lower in winter. however, this seasonality does not apply to influenza in india. a study of a new strain of influenza in vietnam shows that in the early years of this variant, there was significant transmission throughout the first year and the usual seasonal transmission pattern evolved a few years later. in australia, it is summer and hot, and covid- is spreading here. this suggests that covid- transmission may not be related to climatic conditions and may not be seasonal in its early year(s) as the population is being exposed for the first time. the history of smallpox elimination provides an example of the role of epidemiology in defeating an infectious disease. the practice of variolation to prevent smallpox had been in existence for several centuries, particularly in the middle east, before edward jenner popularized the process and published his paper on the topic.
vaccination against smallpox developed for almost centuries before the disease was certified as being eliminated in . what was needed was a thorough understanding of the epidemiology of the disease. by studying seasonal variation and transmission within an infection cluster, it was shown that vaccinating perhaps % of the population, the immediate case contacts, was just as effective as vaccinating % of the population. understanding the epidemiology of outbreaks proved to be the effective way of eliminating the disease. recording all of the details of the covid- outbreak is basic to understanding its epidemiology and the natural history of the disease and could provide the key to defeating the disease outbreak. there are several ethical issues that have already been raised in this outbreak. preserving the physical and mental health and availability of health workers in an epidemic situation is very important. the quality of health care depends almost entirely on them having a professional service ethic that motivates them to provide the highest quality care to all with the resources available to them. this often leads them to put their own health and safety at risk, especially in infectious disease outbreaks. after the sars outbreak of that caused illness and deaths among health workers, the ethics of exposing staff to the disease were still being debated in . the history of public health is full of accounts of the heroism of health workers providing care to those in need, despite endangering themselves. however, experience has shown that risks associated with outbreaks of life-threatening infections only receive attention after health workers have suffered serious adverse consequences. institutions need to prepare for outbreaks and provide the best available protective equipment to their workers and volunteers. a further ethical issue is the development of vaccines.
if this proves possible, it will take several years to develop, test, and submit for approval. then issues of distribution and cost will need to be finalized. professor jeffrey sachs, health economist, public health advocate, and former advisor to the secretary general of the united nations for the development of the sustainable development goals, has written an editorial on the development of a vaccine for covid- . sachs states that in earlier public health emergencies, governments, nonprofit foundations, and international organizations took the lead in the development of preventive measures and made the vaccine or knowledge available freely. sachs discusses the example of jonas salk who did not patent the polio vaccine, making it affordable to programs all over the world. however, in the case of covid- , the present us administration is stating that commercial companies will develop the vaccine and its availability will depend on the market. will the profits of multinational companies come before the survival of the poor in lower income countries? reducing the peak of new cases by slowing down the spread of new cases is very useful as it
• reduces the load on diagnostic and treatment services
• reduces the number of health workers who contract the disease, so that more of them can continue working
• if the epidemic curve can be smoothed, results in a lower overall disease burden
there are examples from the historical control of epidemics using public health measures that may be applied to covid- . an analysis of records from cities in the united states during the influenza pandemic shows that cities that implemented public health measures (isolation, banning meetings, etc) were successful in reducing epidemic peaks and overall mortality. this was in a naive population not previously exposed to this virus. hatchett et al documented the substantial differences between st.
louis (which banned mass gatherings) and philadelphia, which allowed the massed celebratory marches following world war i to proceed and had higher death rates. useful public health interventions to reduce the peak of new cases include the following:
• personal hygiene: no handshaking, no direct physical contact with others, keeping m distance from others, no coughing in public, do not touch your face, wash hands frequently with soap and water, and eat cooked food that is still hot. these measures require clean water supplies (also important for nutrition and food safety)
• reducing person to person contact by banning public gatherings, closing schools, home quarantining, controlling public transport, and so on, has been effective, but disruptive, in some communities
• isolation of cases and quarantine (usually self-quarantine) of contacts. there is more information on the who and centers for disease control and prevention websites.
in their modeling, hellewell et al suggest that isolation of cases and tracing of contacts may be successful in controlling an outbreak within months. in the current global situation, it appears that some countries have been able to slow the epidemic while others have been overwhelmed by peaks of cases. universities face major challenges in managing the covid- epidemic. they are gathering places for thousands of young adults and for academics, who are usually older and more vulnerable to complications. obviously, large gatherings should be avoided and wider use be made of online teaching and tutorials. on campus, the usual precautions of avoiding contact, keeping distance, and frequent handwashing should apply. within the memory of one of us (cb), these personal precautions were applied with success to polio, influenza, and hepatitis. we hope that these precautions will lead to a containment of the epidemic allowing time to continue development and clinical trials of a vaccine.
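the argument that slowing transmission lowers both the epidemic peak and the overall disease burden can be illustrated with a textbook sir (susceptible, infectious, recovered) model. this is a minimal sketch in python under stated assumptions, not the modeling of hatchett et al or hellewell et al: the transmission rates, recovery rate, population size, and function name are all hypothetical, and the equations are integrated with a simple euler step.

```python
def simulate_sir(beta, gamma=0.1, population=1_000_000,
                 initial_infected=10, days=1000, dt=0.1):
    """Integrate a basic SIR model with Euler steps.

    beta  : transmission rate per day (assumed value, lowered by distancing)
    gamma : recovery rate per day (assumed; R0 = beta / gamma)
    Returns (peak number infectious at once, total ever infected).
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    peak = i
    for _ in range(int(days / dt)):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        peak = max(peak, i)
    return peak, r + i


# Hypothetical scenarios: no measures (R0 = 3) versus reduced contact (R0 = 1.5).
peak_open, total_open = simulate_sir(beta=0.30)
peak_distanced, total_distanced = simulate_sir(beta=0.15)

print(f"no measures:     peak {peak_open:,.0f}, total infected {total_open:,.0f}")
print(f"reduced contact: peak {peak_distanced:,.0f}, total infected {total_distanced:,.0f}")
```

in this simplified model, halving the transmission rate lowers the peak number of simultaneous cases (the load on treatment services) and also the total number ever infected, which matches the three benefits listed above. real epidemics depart from these assumptions, for example when measures are lifted early.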
the emergence of covid- is a serious global public health problem. the future direction of the epidemic is unknown. the size of the outbreak will depend on reducing transmission, which at the present time means using traditional public health measures. these include contact tracing and quarantine of cases, or sometimes the quarantine of localities. modeling suggests that these measures may be effective. health promotion programs should emphasize avoiding crowds, handwashing and hygiene, and extensive testing of at-risk persons. vaccine development is a slow process, and it will be a year or more before it can become a component of public health interventions. schools of public health and research institute members of apacph are actively involved in basic research, epidemiology of outbreaks, and health promotion. this journal welcomes submissions that document the outbreak and contribute to the control of the disease. for the next few issues, we will publish a selection of letters and short papers that we are receiving on the covid- epidemic. in this day and age, there are more rapid means of communication, but the printed word remains unsurpassed as a long-term record of what has happened. for our journal, the teaching of public health undergraduate and postgraduate students is important, and this will be a useful resource.
the editors

clinical features of patients infected with novel coronavirus in wuhan world health organization. who director-general's opening remarks at the media briefing on covid coronavirus disease (covid- ). symptoms of coronavirus the novel coronavirus pneumonia emergency response epidemiology team. vital surveillances: the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- )-china center for disease control and prevention. disease of the week. common cold.
accessed march remembering gustav klimt and million others: the year anniversary of the greatest human epidemic lassa fever: epidemiology, clinical features, and social consequences covid- outbreak on the diamond princess cruise ship: estimating the epidemic potential and effectiveness of public health countermeasures world health organization. cholera case fatality rate preparation for possible sustained transmission of novel coronavirus: lessons from previous epidemics coronavirus: covid- has killed more people than sars and mers combined, despite lower case fatality rate diagnostic tests. : sensitivity and specificity feasibility of controlling covid- outbreaks by isolation of cases and contacts australian government department of health. coronavirus (covid- ) health alert world health organization. coronavirus disease (covid- ) advice for the public the severe acute respiratory syndrome severe acute respiratory syndrome: historical, epidemiologic, and clinical features differences in influenza seasonality by latitude, northern india chronological, geographical, and seasonal trends of human cases of avian influenza a (h n ) in vietnam, - : a spatial analysis jenner and the history of smallpox and vaccination infections and public health: who will win? epidemiology of smallpox in west-pakistan. ii. determinants of intravillage spread other than acquired immunity smallpox: ten years gone from cooperation to competition in national health systems-and back? impact on professional ethics and quality of care can healthcare workers reasonably question the duty to care whilst healthcare institutions take a reactive (rather than proactive) approach to infectious disease risks? 
public health ethics damien de veuster ( - ): a life devoted to lepers the trump administration's ludicrous approach to coronavirus vaccine public health interventions and epidemic intensity during the influenza pandemic
colin binns, mbbs, phd , wah yun low, phd , , and lee mi kyung, phd

key: cord- -khdyxiwe authors: chakraborty, tanujit; ghosh, indrajit title: real-time forecasts and risk assessment of novel coronavirus (covid- ) cases: a data-driven analysis date: - - journal: chaos solitons fractals doi: . /j.chaos. . sha: doc_id: cord_uid: khdyxiwe

the coronavirus disease (covid- ) has become a public health emergency of international concern affecting countries and territories around the globe. as of april , , it has caused a pandemic outbreak with more than , , confirmed infections and more than , reported deaths worldwide. the main focus of this paper is two-fold: (a) generating short term (real-time) forecasts of the future covid- cases for multiple countries; (b) risk assessment (in terms of case fatality rate) of the novel covid- for some profoundly affected countries by finding various important demographic characteristics of the countries along with some disease characteristics. to solve the first problem, we presented a hybrid approach based on autoregressive integrated moving average model and wavelet-based forecasting model that can generate short-term (ten days ahead) forecasts of the number of daily confirmed cases for canada, france, india, south korea, and the uk. the predictions of the future outbreak for different countries will be useful for the effective allocation of health care resources and will act as an early-warning system for government policymakers. in the second problem, we applied an optimal regression tree algorithm to find essential causal variables that significantly affect the case fatality rates for different countries.
this data-driven analysis will necessarily provide deep insights into the study of early risk assessments for immensely affected countries.
in december , wuhan city of china became the centre of an outbreak of pneumonia of unknown cause, later named coronavirus disease (covid- ) , which raised intense attention not only within china but internationally [ ; ] . the covid- pandemic is the most significant global crisis since world war ii and has affected almost all the countries of our planet [ ] . as of april , , an outbreak of covid- has resulted in , , confirmed cases with reported deaths of , worldwide [ ] . on march , who publicly characterized covid- as a "global pandemic", and shortly after that, the united states declared covid- outbreaks a national emergency. covid- has caused a great threat to the health and safety of people all over the world due to its wide spread and potential harm. thus, the study of the novel covid- epidemic and its future development trend has become a cutting-edge research topic at this moment. we are therefore motivated to ask: (a) can we generate real-time forecasts of daily new covid- cases for countries like canada, france, india, south korea, and the uk? (b) what are the probable causal variables that have significant impacts on the case fatality rates for the profoundly affected countries? to answer the first question, we study classical and modern forecasting techniques for which the prediction accuracy largely depends on the availability of data [ ] . in outbreaks of covid- epidemics, there are limited data available, making predictions highly uncertain. from previous studies, it was evident that the timing and location of the outbreak facilitated the rapid transmission of the virus within a highly mobile population [ ] .
in most of the affected countries, the governments implemented a strict lockdown in the days following the initial transmission of the virus, and within hospitals, patients who fulfill the clinical and epidemiological characteristics of covid- are immediately isolated. the constant increase in the global number of covid- cases is putting a substantial burden on the health care systems of canada, france, india, south korea, and the uk. to anticipate the additional resources needed to combat the epidemic, various mathematical and statistical forecasting tools [ ; ] , including studies outside china [ ; ; ] , were applied to generate short-term and long-term forecasts of reported cases. these model predictions have shown a wide range of variations. since the time series datasets of covid- contain both nonlinear and nonstationary patterns, making decisions based on an individual model would be risky. in this study, we propose a hybrid modeling approach to generate short-term forecasts for multiple countries. in traditional time series forecasting, the autoregressive integrated moving average (arima) model is used predominantly for forecasting linear time series [ ] . but in recent literature, the wavelet transformation based forecasting model has shown excellent performance in nonstationary time series data modeling [ ] . thus, combining both models may accurately model such complex autocorrelation structures in the covid- time-series datasets and reduce the bias and variances of the prediction error of the component models. in the absence of vaccines or antiviral drugs for covid- , these estimates will provide an insight into the resource allocations for the exceedingly affected countries to keep this epidemic under control. besides shedding light on the dynamics of covid- spreading, the practical intent of this data-driven analysis is to provide government officials with realistic estimates for the magnitude of the epidemic for policy-making.
the second problem is connected with the global concern of health and mortality due to the significant covid- outbreaks. mortality is crudely estimated using a statistic, the case fatality rate (cfr), which divides the number of known deaths by the total number of identified cases [ ; ; ] . during the current phase of this global pandemic, it is critically important to obtain reliable estimates of the overall cfr. the estimates of cfr are highly dependent on several country-specific demographic parameters and various disease characteristics. a key differentiation among the cfr of different countries can be found by determining an exhaustive list of causal variables that significantly affect cfr. in this work, we attempt to identify critical parameters that may help to assess the risk (in terms of cfr) using an optimal regression tree model [ ] . the regression tree has a built-in variable selection mechanism from high dimensional variable space and can model arbitrary decision boundaries. the regression tree combines case estimates, epidemiological characteristics of the disease, and health-care facilities to assess the risks of major outbreaks for profoundly affected countries. such assessments will help to anticipate the expected morbidity and mortality due to covid- and provide some critical information for the planning of health care systems in various countries facing this epidemic. the rest of the paper is organized as follows. in section , we discuss the data, development of the hybrid model, and experimental results for short-term forecasts of covid- for canada, france, india, south korea, and the uk. in section , country-wise cfr datasets, method, and results for finding critical parameters are presented. we discuss the assumptions and limitations of our findings in section . finally, the discussions about the results and policy recommendations are given in section .
we focus on the daily figures of confirmed cases for five different countries, namely canada, france, india, south korea, and the uk. the datasets are retrieved from the global change data lab. all these datasets are collected from the starting date of the disease for the respective countries to april , . in this section, we first briefly discuss these datasets, followed by the development of the proposed hybrid model, and finally, the application of the proposed model to generate short-term forecasts of the future covid- cases for five different countries. all these datasets and codes to be used in this section are made publicly available at https://github.com/indrajitg-r/covid for the reproducibility of this work. five univariate time-series datasets are collected for the real-time prediction purpose of covid- cases for india, canada, france, south korea, and the uk. several previous studies have forecasted future covid cases for china and a few other countries using mathematical and traditional time series forecasting models; for details see [ ; ; ; ; ] . we try to nowcast the covid- cases of five different countries based on their past cases. for india and the uk, we consider the daily laboratory-confirmed cases from january , , through april , and from january , through april , , respectively, for model building. daily covid- cases data for canada, france, and south korea are taken for the time period january , through april , , january , through april , and january through april , respectively. the dataset for india contains a total of observations, observations for the uk, observations for canada, observations for france, and for south korea. for these five countries, the outbreaks of covid- started at almost the same time, and the epidemic curves are still not showing the sharp diminishing nature, just like china. we limit our attention to trended and non-seasonal models, given the patterns observed in table .
note that we follow a pragmatic approach in that we assume that the trend will continue indefinitely in the future, in contrast with other s-curve or deterministic sir modeling approaches, which assume convergence.
[table : training data, acf plots, and pacf plots]
to forecast confirmed cases of covid- , we adopt a hybrid time series forecasting approach combining arima and wavelet-based forecasting techniques. the proposed hybrid model overcomes the deficiencies of the single time series models. before describing the proposed methodology, we give a brief description of the individual models to be used in the hybridization. arima is a classical time series model, used for tracking linear tendencies in stationary time series data. the arima model is denoted by arima(p, d, q). the parameters p and q are the orders of the ar model and the ma model, respectively, and d is the level of differencing [ ] . the arima model can be mathematically expressed as follows:
$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \cdots - \theta_q \varepsilon_{t-q}$$
where $y_t$ denotes the actual value of the variable under consideration at time t, $\varepsilon_t$ is the random error at time t, $\theta_0$ is a constant, and the $\phi_i$ and $\theta_j$ are the coefficients of the arima model. the basic assumption made by the arima model is that the error series has zero mean and constant variance and satisfies the i.i.d. condition. building an arima model for any given time series dataset can be described in three iterative steps: model identification (achieving stationarity), parameter estimation (the autocorrelation function (acf) and the partial autocorrelation function (pacf) plots are used to select the values of parameters p and q), and model diagnostics checking (finding the 'best' fitted forecasting model using the akaike information criterion (aic) and the bayesian information criterion (bic)) [ ] . wavelet analysis is a mathematical tool that can reveal information within the signals in both the time and scale (frequency) domains [ ] .
this property overcomes the basic drawback of fourier analysis, and the wavelet transform maps the original signal data (especially in the time domain) into a different domain for data analysis and processing. wavelet-based models are most suitable for nonstationary data, unlike arima [ ] . most epidemic and climatic time-series datasets are nonstationary; therefore, wavelet transforms are used as a forecasting model for these datasets [ ; ] . when conducting wavelet analysis in the context of time series analysis, the selection of the optimal number of decomposition levels is vital to determine the performance of the model in the wavelet domain. the number of decomposition levels is selected using the formula $w_l = \mathrm{int}[\log(n)]$, where n is the time-series length. the wavelet-based forecasting (wbf) model transforms the time series data by using a hybrid maximal overlap discrete wavelet transform (modwt) algorithm with a 'haar' filter. daubechies wavelets can produce identical events across the observed time series in ways that most other time series prediction models cannot recognize [ ] . the necessary steps of a wavelet-based forecasting model, defined by [ ] , are as follows. firstly, the daubechies wavelet transformation and a decomposition level are applied to the nonstationary time series data. secondly, the series is reconstructed by removing the high-frequency component, using the wavelet denoising method. lastly, an appropriate arima model is applied to the reconstructed series to generate out-of-sample forecasts of the given time series data. for the covid- datasets, we propose a hybridization of the stationary arima and nonstationary wbf models to reduce the individual biases of the component models [ ] . the covid- cases datasets for five different countries are complex in nature. thus, the arima model fails to produce random errors or even stationary residual series, evident from figure .
the behavior of the residual series generated by arima is mostly oscillatory and periodic; thus, we choose the wavelet function to model the remaining series. several hybrid models based on arima and neural networks are available in the field of time series forecasting; see for example [ ; ; ; ; ; ] .
algorithm : proposed hybrid arima-wbf model
1. given a time series of length n, input the in-sample (training) covid- daily cases data.
2. determine the best arima(p, d, q) model using the in-sample (training) data.
   • arima parameters p, d, and q values are selected using the procedures described in section . . .
   • obtain the predictions using the selected arima(p, d, q) model for the in-sample data and generate the required number of out-of-sample forecasts.
   • obtain the residual series ($\varepsilon_t$) by subtracting the arima predicted values from the original training series.
3. train the residual series ($\varepsilon_t$) generated by arima with the wbf model, as described in section . . .
   • select the number of decomposition levels using the formula $w_l = \mathrm{int}[\log(n)]$; the boundary is chosen to be 'periodic'.
   • obtain in-sample predictions ($\hat{\varepsilon}_t$) using the wbf model and generate the required number of out-of-sample forecasts.
motivated by the above discussion, we propose a novel hybrid arima-wbf model which is a two-step pipeline approach. in the first step of the proposed hybrid approach, an arima model is built to model the linear components of the epidemic time series, and a set of out-of-sample forecasts are generated. in the second step, the arima residuals (oscillatory residual series) are remodeled using a mathematically grounded wbf model. here, wbf models the left-over autocorrelations (in this case, the oscillatory series in figure ) in the residuals which arima could not model. the algorithmic presentation of the proposed hybrid model is given in algorithm .
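the two-step pipeline in the algorithm above can be sketched in plain python as follows. this is a minimal illustration under stated assumptions, not the authors' r implementation: a drift-plus-ar(1) fit on first differences stands in for the selected arima(p, d, q) model, and an ar(1) fit on haar-smoothed residuals stands in for the wbf model. the natural logarithm in `decomposition_levels` is an assumption (it matches r's `log`), and all function names and the toy series are hypothetical.

```python
import math


def fit_ar1(x):
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    prev, curr = x[:-1], x[1:]
    n = len(prev)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    var = sum((p - mean_prev) ** 2 for p in prev)
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    phi = cov / var if var > 0 else 0.0
    return mean_curr - phi * mean_prev, phi


def ar1_forecast(last, c, phi, h):
    """Iterate the AR(1) recursion h steps ahead from the last observed value."""
    out = []
    for _ in range(h):
        last = c + phi * last
        out.append(last)
    return out


def decomposition_levels(n):
    """w_l = int[log(n)], with the natural logarithm assumed."""
    return int(math.log(n))


def haar_smooth(x, levels):
    """Keep only the Haar MODWT scaling coefficients (periodic boundary):
    after `levels` passes each value is a circular mean over 2**levels points."""
    s, n = list(x), len(x)
    for j in range(levels):
        step = 2 ** j
        s = [(s[t] + s[(t - step) % n]) / 2 for t in range(n)]
    return s


def hybrid_forecast(series, h):
    # Step 1: base model (ARIMA stand-in) on first differences, d = 1.
    diffs = [b - a for a, b in zip(series, series[1:])]
    c, phi = fit_ar1(diffs)
    # In-sample one-step predictions for series[2:].
    fitted = [series[t + 1] + c + phi * diffs[t] for t in range(len(diffs) - 1)]
    base_fc, level = [], series[-1]
    for d in ar1_forecast(diffs[-1], c, phi, h):
        level += d
        base_fc.append(level)
    # Step 2: residual model (WBF stand-in) on the smoothed residual series.
    resid = [series[t + 2] - f for t, f in enumerate(fitted)]
    smooth = haar_smooth(resid, decomposition_levels(len(resid)))
    rc, rphi = fit_ar1(smooth)
    resid_fc = ar1_forecast(smooth[-1], rc, rphi, h)
    # Step 3: final forecast = base forecast + residual forecast.
    return [b + r for b, r in zip(base_fc, resid_fc)]


toy = [2 * t for t in range(20)]  # hypothetical linear series
print(hybrid_forecast(toy, 3))    # -> [40.0, 42.0, 44.0]
```

on a perfectly linear toy series the residuals vanish, so the hybrid forecast reduces to the base forecast; on real data the residual forecast contributes the oscillatory correction that the base model misses.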
the proposed model can be looked upon as an error remodeling approach in which we use arima as the base model and remodel its error series by a wavelet-based time series forecasting technique to generate more accurate forecasts. this relates to model misspecification, in which disturbances in the nonlinear time series of covid- cases cannot be correctly modeled with the arima model. therefore, if the error series generated by arima is adequately modeled and incorporated with the forecasts, the performance of the out-of-sample estimates can be improved, even though marginally at times. remark. the proposed hybrid approach contrasts with other mathematical and traditional forecasting modeling approaches applied to covid- data. we choose two completely diverse models for hybridization, one from classical forecasting literature and another from modern forecasting approaches. five time series covid- datasets for canada, france, india, south korea, and the uk are considered for training the proposed model and the component models. the datasets are nonlinear, nonstationary, and non-gaussian in nature. we use the root mean square error (rmse) and the mean absolute error (mae) to evaluate the predictive performance of the models used in this study [ ] . since the number of data points in the datasets is limited, advanced deep learning techniques would simply over-fit the datasets [ ] . we start the experimental evaluation for all the five datasets with the classical arima(p,d,q) using the 'forecast' [ ] statistical package in r software. to fit an arima model, we first specify the parameters of the model. using the acf plot and pacf plot (see table ), we can decide the value of the parameters of the model. we have also performed unit root tests for stationarity check, and all the datasets were found nonstationary. the 'best' fitted arima model is chosen using aic and bic values for each training dataset.
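the two error measures named above, rmse and mae, can be computed as in the following sketch. the function names and the four-point example are hypothetical and are not taken from the paper's datasets.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: penalises large errors more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error: average size of the errors, ignoring sign."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


actual = [10, 12, 15, 11]      # hypothetical daily case counts
predicted = [12, 12, 14, 9]    # hypothetical model output
print(rmse(actual, predicted), mae(actual, predicted))  # -> 1.5 1.25
```

because rmse squares the errors before averaging, it is never smaller than mae on the same data, and the gap widens when a few forecasts miss badly.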
the fitted arima models for the five datasets are as follows: arima( , , ) for india, arima( , , ) for canada, arima( , , ) for france, arima( , , ) for south korea, and arima( , , ) for the uk. we employ a pre-defined box-cox transformation set to λ = to ensure the forecast values stay positive. as the arima model is fitted, forecasts are generated for -time steps ( april to april ) for all the five datasets. we also compute training data predicted values and calculate the residual errors. plots for the residual series are given in figure . it is interesting to see that the error series (residuals) generated by arima are oscillating and nonstationary for all the datasets. these seasonal oscillations can be captured through the wavelet transform, which can decompose a time series into a linear combination of different frequencies. these residual series (as in figure ) satisfy the admissibility condition (zero mean) that forces wavelet functions to wiggle (oscillate between positive and negative), a typical property of wavelets. thus, we remodel the residuals obtained from the arima model using the wbf model. the value of wavelet levels is obtained by using the formula mentioned in algorithm . the wbf model was implemented using the 'waveletarima' [ ] package in r software with 'periodic' boundary, and all the other parameters were kept as default. as the wbf model is fitted on the residual time series, predictions are generated for the next ten time steps ( april to april ). further, both the arima forecasts and wbf residual forecasts are added together to get the final out-of-sample forecasts for the next ten days ( april to april ). the hybrid model fittings (training data) for five countries, namely canada, france, india, south korea and the uk are displayed in figures (a) , (a), (a), (a) and (a) respectively.
the real-time (short-term) forecasts using arima, wbf, and hybrid arima-wbf model for canada, france, india, south korea, and the uk are displayed in figures (b) , (b), (b), (b) and (b) respectively. the predicted values for the training covid- cases datasets of the proposed hybrid model for five countries are further used for model adequacy checking, and based on actual and predicted test outputs, we computed rmse and mae for all the datasets and reported them in table . the performances of the proposed hybrid arima-wbf model are superior as compared to the individual models for canada, france, and the uk, whereas, for india and south korea, our results are competitive with arima. it is often true that no model can be universally employed in all circumstances, and this is consistent with the "no free lunch theorem" [ ] . even if in a very few cases the hybrid arima-wbf model gave lower information criteria values (in terms of rmse and mae for training data), we can still opt for the hybrid model given the asymmetric risks involved, as we believe that it is better to take decisions based on a hybrid model rather than depending on a single one, at least for this pandemic. we produced ten days ahead point forecasts based on all the three models discussed in this chapter and reported them in figures - . our model can easily be updated on a daily or periodic basis once the actual values are received for the country-wise covid- cases. remark. please note that this is not an ex-post analysis, but a real, live forecasting exercise. thus, these real-time short-term forecasts based on the proposed hybrid arima-wbf model for canada, france, india, south korea, and the uk will be helpful for government officials and policymakers to allocate adequate health care resources for the coming days.
at the outset of the covid- outbreak, data on country-wise case fatality rates due to covid- were obtained for affected countries.
the case fatality rate can be crudely defined as the number of deaths in persons who tested positive for covid- divided by the confirmed number of covid- cases. in this section, we are going to find out a list of essential causal variables that have strong influences on the cfr. the datasets and codes of this section are made publicly available at https://github.com/indrajitg-r/covid for the reproducibility of this work. in the face of rapidly changing data for covid- , we calculated the case fatality ratio estimates for countries from the day of starting the outbreak to april from the following website . a lot of preliminary analysis is done to determine a set of possible variables, some of which are expected to be critical causal variables for risk assessments of covid- in these affected countries. previous studies [ ; ; ; ] have suggested that the total number of cases, age distributions, and shutdown period have high impacts on the cfr values for some of the countries. along with these three variables, we also considered seven more demographic structures and disease characteristics for these countries as input variables that are likely to have a potential impact on the cfr estimates. therefore, the cfr modeling dataset consists of observations having ten possible causal variables and one numerical output variable (viz. cfr), as reported in table . 
the possible causal variables considered in this study are the following: the total number of covid- cases (in thousands) in the country till april, , population density per km for the country, total population (in millions) of the country (approx.), percentage of people in the age group of greater than years, lockdown days count (from the starting day of lockdown till april , ), time-period (in days) of covid- cases for the country (starting date to april , ), doctors per people in the country, hospital beds per people in the country, income standard (e.g., high or lower) of the country, and climate zones (e.g., tropical, subtropical or moderate) of the country. the dataset contains a total of numerical input variables and two categorical input variables. for the risk assessment with the cfr dataset for countries, we apply the regression tree (rt) [ ] , which has a built-in feature selection mechanism, is easy to interpret, and provides better visualization. rt, as a widely used simple machine learning algorithm, can model arbitrary decision boundaries. the methodology outlined in [ ] can be summarized into three stages. the first stage involves growing the tree using a recursive partitioning technique to select essential variables from a set of possible causal variables and split points using a splitting criterion. the standard splitting criterion for rt is the mean squared error (mse). after a large tree is identified, the second stage of the rt methodology uses a pruning procedure that gives a nested subset of trees starting from the largest tree grown and continuing the process until only one node of the tree remains. the cross-validation technique is popularly used to provide estimates of future prediction errors for each subtree. the last stage of the rt methodology selects the optimal tree that corresponds to a tree yielding the lowest cross-validated or testing set error rate.
to avoid instability of trees in this stage, trees with smaller sizes, but comparable in terms of accuracy, are chosen as an alternative. this process can be tuned to obtain trees of varying sizes and complexity. a measure of variable importance can be achieved by observing the drop in the error rate when another variable is used instead of the primary split. in general, the more frequently a variable appears as a primary split, the higher the importance score assigned. a detailed description of the tree building process is available in [ ] . the rationale behind the choice of rt as a potential model to find the important causal variables out of input variables for the cfr estimates is the simplicity, easy interpretability, and high accuracy of the rt algorithm. we apply an optimal rt model to the dataset consisting of different country samples and try to find out potential causal variables from the set of available variables that are related to the case-fatality rates. rt is implemented using the 'rpart' [ ] package in r with "minsplit" equal to % of the data as a control parameter. we have used rmse, the coefficient of multiple determination (r ), and adjusted r (adjr ) to evaluate the predictive performance of the tree model used in this study [ ] . an optimal regression tree is built with variables with 'minsplit' = with equal costs for each variable. the estimates of the performance metrics for the fitted tree are as follows: rmse = . , r = . , and adjr = . . a variable importance list from the rt is given in figure and the fitted tree is provided in figure . from the variable importance plot based on the complexity parameter of the rt model (also see figure ), seven causal variables are obtained out of potential input variables having higher importance.
these seven causal variables that significantly affect the cfr for most affected countries are the following: total number of covid- cases in the country (in thousands), percentage of people in the age group of greater than years, total population (in millions) of the country, doctors per people in the country, lockdown period (in days) for the country, time-period (in days) of covid- cases for the country, and hospital beds per people in the country. our results are consistent with previous results obtained by [ ; ; ] , where the authors suggested that the total number of cases, age distributions, and shutdown period have high impacts on the cfr estimates. but interestingly, we obtained four more essential causal variables that will provide some new insights into the study of risk assessments for covid- affected countries. out of these numerical input variables, there are four control variables (number of cases, people of age group > years, lockdown period, and hospital beds per people) that can be managed to fight against this deadly disease. once these variables are taken care of, the respective country may reduce its case fatality rate significantly.
[figure : variable importance percentages affecting the cfr based on a complexity parameter in rt]
figure shows the relationship between the important causal variables and cfr. in figure , the tree starts with the total number of covid- cases as the most crucial causal variable in the parent node. in each box, the topmost numerical value gives the average cfr estimate based on the tree. one of the key findings of the tree is the following rule: when the number of cases of a country is greater than , countries having a population between to million have the second highest case fatality rate, viz., %. similarly, one can see all the rules generated by rt to get additional information about the relationships between control parameters and the response cfr variable.
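the growing stage described for the regression tree, which picks split points by minimising squared error, can be illustrated on a single input variable. the sketch below is plain python rather than the 'rpart' implementation used in the paper; the function names and the toy data (case counts in thousands and cfr values in percent) are hypothetical and chosen only to make the split obvious. it also computes r² and adjusted r² for the resulting one-split tree.

```python
def sse(values):
    """Sum of squared deviations from the mean (n times the node's MSE)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Return (threshold, sse_after) for the rule x < threshold that
    minimises the summed squared error of the two child nodes."""
    pairs = sorted(zip(x, y))
    best = (None, sse(y))
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        total = sse([p[1] for p in pairs[:i]]) + sse([p[1] for p in pairs[i:]])
        if total < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, total)
    return best


def r_squared(sse_model, y):
    """Share of the total variation in y explained by the model."""
    return 1 - sse_model / sse(y)


def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for p predictors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


cases = [5, 10, 20, 80, 120, 150]       # hypothetical, in thousands
cfr = [1.0, 1.2, 1.1, 6.0, 6.5, 5.8]    # hypothetical, in percent
threshold, err = best_split(cases, cfr)
r2 = r_squared(err, cfr)
print(threshold, round(r2, 3), round(adjusted_r_squared(r2, len(cfr), 1), 3))
```

a full tree repeats this search over every input variable and then recursively inside each child node; pruning and cross-validation, the second and third stages, are not shown here.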
we made some simplifying assumptions to carry out the analysis of covid- datasets. the assumptions are listed as follows: (a) the virus mutation rates are comparable for different countries; (b) the recovered persons will achieve permanent immunity against covid- ; (c) we ignore the effect of climate change (also spatial data structures) during the short-term predictions. along these lines, we presented two different approaches to deal with two inter-connected problems on covid- . in the first problem of short-term predictions for covid- outbreak in five countries, we proposed a hybrid methodology combining arima and wbf models. in the second problem of risk assessment, we found some important factors affecting case fatality rates of covid for highly affected nations. however, there may exist a few more controllable factors and some disease-based characteristics that can also have an impact on the value of cfr for different countries; investigating them can be regarded as future scope of the study.
the covid- outbreaks globally present a significant challenge for modelers, as there are limited data available on the early growth trajectory, and epidemiological characteristics of the novel coronavirus have not been fully elucidated. in this study, we considered two alarmingly important problems relevant to the ongoing covid- pandemic. the first problem deals with the real-time forecasts of the daily covid- cases in five different countries. we proposed a hybrid arima-wbf model that can explain the nonlinear and nonstationary behavior present in the univariate time series datasets of covid- cases. ten days ahead forecasts are provided for canada, france, india, south korea, and the uk. the proposed model can be used as an early warning system to fight against the covid- pandemic.
below we present a list of suggestions based on the results of the real-time forecasts.
1. since we present a real-time forecast system rather than an ex-post analysis, one can regularly update the actual confirmed cases and the predictions, just as in weather forecasting.
2. the forecasts mostly show oscillating behavior for the next days and reflect the impact of the broad spectrum of social distancing measures implemented by the governments, which likely helped stabilize the epidemic.
3. the short-term forecasts do not show any steep decay in the near term; nor are these five countries expected to face any unexpected rise in the number of cases.
4. guided by the short-term forecasts reported in this paper, the lockdown period can be adjusted accordingly.
secondly, we assessed the risk of covid- by finding seven key parameters that are expected to have powerful associations with case fatality rates. this is done by designing an optimal regression tree model, a simplified machine learning approach. the model is very flexible and easily interpretable, and as more data arrive, one can simply incorporate the new datasets and rebuild the trees to get updated estimates. rt provides a better visual representation and can be understood by a broader audience. quantification of the outbreak risks and their dependencies on the key parameters will support the governments and policymakers in the planning of health care systems in different countries that face this epidemic. experimental results suggest four control variables out of seven highly influential variables that will have a significant impact on controlling cfr. below we present a point by point discussion of the control variables affecting cfr and preventive actions to be taken by the governments.
1. the number of covid cases of the country can be reduced by enforcing social distancing strategies.
2. people of age group > years should be specially taken care of and isolated.
3. the lockdown time period can be extended if the country faces a sharp increase in the number of cases and/or deaths.
4. the number of hospital beds should be increased by making special health care arrangements in other places to deal with this emergency due to covid- .
forecasting nonlinear time series with a hybrid methodology
forecasting time series using wavelets
wavelet-based nonlinear multiscale decomposition model for electricity load forecasting
modeling and forecasting of epidemic spreading: the case of covid- and beyond
risk assessment of novel coronavirus covid- outbreaks outside china
time series analysis: forecasting and control
classification and regression trees
forecasting dengue epidemics using a hybrid methodology
the analysis of time series: an introduction
analysis and forecast of covid- spreading in china, italy and france
a wavelet transfer model for time series forecasting
correcting and combining time series forecasters
clinical characteristics of coronavirus disease in china
the elements of statistical learning: data mining, inference, and prediction
forecasting: principles and practice. otexts
an introduction to statistical learning
real-time estimation of the risk of death from novel coronavirus (covid- ) infection: inference using exported cases
a comparative study of series arima/mlp hybrid models for stock price forecasting
early dynamics of transmission and control of covid- : a mathematical modelling study. the lancet infectious diseases
early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia
serial interval of novel coronavirus (covid- ) infections. international journal of infectious diseases
comparative study of wavelet-arima and wavelet-ann models for temperature time series data in northeastern bangladesh
ensembles for time series forecasting
a hybrid arima-svm model for the study of the remaining useful life of aircraft engines
wavelet methods for time series analysis
forecasting the novel coronavirus covid-
real-time forecasts of the covid- epidemic in china from february th to february th
cm-mid ncov working group, et al. estimating the infection and case fatality ratio for covid- using age-adjusted data from the outbreak on the diamond princess cruise ship. medrxiv
package 'rpart'. available online: cran
a novel coronavirus outbreak of global health concern
no free lunch theorems for optimization
nowcasting and forecasting the potential domestic and international spread of the -ncov outbreak originating in wuhan, china: a modelling study
time series forecasting using a hybrid arima and neural network model
preliminary estimation of the novel coronavirus disease (covid- ) cases in iran: a modelling analysis based on overseas cases and air travel data
key: cord- -s zv hr authors: narayanan, c. s. title: modeling the covid- outbreak in the united states date: - - journal: nan doi: . / . . . sha: doc_id: cord_uid: s zv hr
the covid- contagion has developed at an alarming rate in the us and as of april , , tens of thousands of people have already died from the disease. in an outbreak such as this, forecasting the extent of the mortality that will occur is crucial to aid the implementation of effective interventions. mortality depends on two factors: the case fatality rate and the case incidence. we combine a cohort-based model that determines case fatality rates along with a modified logistic model that evaluates the case incidence to determine the number of deaths in all the us states over time; the model is also able to include the impact of interventions.
both models yield exceptional goodness-of-fit. the model predicted a range of death outcomes ( k to k), all of which are considerably greater than the figures presented in mainstream media. this model can be used more effectively than current models to estimate the number of deaths during an outbreak, allowing for better planning.
the first case of coronavirus disease , or covid- , a respiratory infection caused by severe acute respiratory syndrome coronavirus (sars-cov- ), was identified in wuhan, china in late [ ]. subsequently, the outbreak has spread to [ ] countries, including the united states, where the first case of covid- was detected in washington state on january , [ ]. as of april , , the u.s. has reported cases and deaths [ ]. as the pandemic progresses, determining its prognosis is essential to inform the adoption of adequate mitigation efforts. many have attempted to forecast the trajectory of the epidemic in the united states, and at the forefront is the white house. dr. fauci suggested that the us would likely face , to , deaths, with millions of cases [ ]. however, on april he said the estimate had been revised down to , [ ]. moreover, a model by the university of washington, closely followed by the white house, projects , deaths as of april , [ ], in line with dr. fauci's statement [ ]. it is well known that there is a wide variance in the sizes of outbreaks between states as well as in the resulting incidence of deaths. the primary reason for the variance is the uncertainty in the prediction of infections and the fatality rate of infected individuals. we present improved models to determine both the cumulative case incidence and the case fatality rate. we calculated the case fatality rate (cfr) for each state using a cohort-based approach, which has demonstrated greater accuracy than traditional methods [ ]. additionally, the number of cumulative cases in each state is predicted using a modified logistic model.
combining the two, we are able to forecast the number of cumulative deaths by state. additionally, we analyze the drivers of deaths and discuss implications on policy formation. data sources the primary data for this study is publicly available and was obtained from the center for systems science and engineering (csse) at johns hopkins university [ ] . we obtained the data pertaining to daily new cases/deaths and cumulative cases/deaths for the period of january , , to april , . in addition, we obtained us state population data from the united states census bureau [ ] . the number of deaths is a product of the case fatality rate (cfr) and the population confirmed to have been infected [ ] . therefore, in order to determine cumulative fatalities due to covid- , it is necessary to first predict the cfr and the number of cases. cfr, case incidence, and deaths were evaluated for all fifty states as well as the district of columbia. calculating cfr there are three principal measures of disease lethality: the case fatality rate (cfr), infection fatality rate (ifr) and mortality rate (mr). the mortality rate is represented by the proportion of cumulative deaths to the total at-risk population. this is ultimately indicative of the probability of any individual's mortality among the total population. the cfr uses the same numerator (cumulative deaths) but instead divides it by the number of cumulative confirmed cases. the case fatality [ ] rate is the proportion of individuals who die from a disease among all individuals diagnosed with the disease within a specified timeframe [ ] . that is, it reveals the percentage of individuals that die among all individuals who test positive for the disease. the ifr is similar to the cfr, except it represents the ratio of deaths to the total number of people who are infected; it accounts for all infected individuals regardless of whether their disease is reported or not. 
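the three lethality measures defined above differ only in their denominator. the sketch below makes that explicit; the function name and all input numbers are hypothetical and chosen purely for illustration, not taken from the study.

```python
def lethality_measures(deaths, confirmed_cases, total_infections, population):
    """Return the three lethality measures as fractions.

    All three share the same numerator (cumulative deaths) and differ
    only in the denominator.
    """
    return {
        "cfr": deaths / confirmed_cases,    # deaths among diagnosed cases
        "ifr": deaths / total_infections,   # deaths among all infected, reported or not
        "mr": deaths / population,          # deaths among the whole at-risk population
    }


# Hypothetical numbers, for illustration only.
measures = lethality_measures(
    deaths=50,
    confirmed_cases=1_000,
    total_infections=5_000,
    population=1_000_000,
)
print(measures)
```

because confirmed cases cannot exceed total infections, and infections cannot exceed the population, the cfr is always at least as large as the ifr, which is at least as large as the mortality rate.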
in an ideal scenario, where zero individuals with covid- went unnoticed, and surveillance was faultless, the ifr and cfr would be equivalent. however, this is not truly plausible; testing is limited, asymptomatic infections commonly are not surveilled, and not all instances of the disease are accounted for in reality. as a result, the cfr that we calculate will be much higher than the ifr and the mortality rate. in this paper, we use a logistic function [ ] to describe the exponential growth and subsequent flattening of covid- cfr. the cfr depends on three parameters: the final cfr (L), the cfr growth rate (k), and the onset-to-death interval (t_0), and is expressed as:

$$\mathrm{CFR}(t) = \frac{L}{1 + e^{-k\,(t - t_0)}}$$

using this model, we calculate the number of deaths each day for each cohort or group of individuals infected on the same day. next, we build an objective function that minimizes the root mean square error between the actual and predicted values of cumulative deaths. we ran , simulations, using numerous values of the onset-to-death interval, the cfr, and the cfr growth rate. the cfr was kept in the range of . % to %, the slope was kept in the range of . to . , and the onset-to-death interval was bounded between and days. we assigned these bounds because, after in-depth explorations of the model, we were convinced the solutions would be within these parameters. we then identify the model parameters that best fit the data (top % of the best-fit rmse). with a kernel density distribution of case fatality rates, we determined the low cfr (the lowest value regardless of its frequency); the mode cfr (the most probable cfr); and the high cfr (the highest value regardless of its frequency). a key element of case incidence forecasting methods is the incorporation of the growth rate. in the sir model, this is r_0, the basic reproduction number, defined for a population that lacks immunity and in which there are no deliberate interventions to impede disease transmission.
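the cohort-based cfr fit described above can be sketched as follows. this is a minimal illustration under stated assumptions, not the author's implementation: it assumes that the expected cumulative deaths on a given day are the sum, over every daily cohort of new cases, of the cohort size times a logistic cfr evaluated at the cohort's age in days; it replaces the paper's simulations with a plain random search; and the parameter bounds, function names, and synthetic data are all hypothetical.

```python
import math
import random


def cfr_logistic(age_days, L, k, t0):
    """Logistic CFR reached by a cohort `age_days` after it was recorded."""
    return L / (1.0 + math.exp(-k * (age_days - t0)))


def predicted_cumulative_deaths(new_cases, L, k, t0):
    """Expected cumulative deaths per day, summed over all daily cohorts (assumed form)."""
    series = []
    for today in range(len(new_cases)):
        total = sum(
            n * cfr_logistic(today - day, L, k, t0)
            for day, n in enumerate(new_cases[: today + 1])
        )
        series.append(total)
    return series


def rmse(actual, predicted):
    """Root mean square error between two equally long series."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def random_search(new_cases, actual_deaths, n_trials=300, seed=0):
    """Sample parameters within hypothetical bounds and rank them by RMSE."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_trials):
        L = rng.uniform(0.001, 0.20)   # final CFR (hypothetical bounds)
        k = rng.uniform(0.05, 1.0)     # CFR growth rate (hypothetical bounds)
        t0 = rng.uniform(5.0, 30.0)    # onset-to-death interval in days (hypothetical bounds)
        error = rmse(actual_deaths, predicted_cumulative_deaths(new_cases, L, k, t0))
        results.append((error, L, k, t0))
    results.sort()
    return results


# Synthetic example: generate "observed" deaths from known parameters, then refit.
cases = [10, 20, 40, 80, 120, 150, 160, 150, 120, 90, 60, 40, 25, 15, 10] + [5] * 25
truth = predicted_cumulative_deaths(cases, L=0.05, k=0.3, t0=14.0)
ranked = random_search(cases, truth)
best = ranked[0]
print("best RMSE, L, k, t0:", best)
```

keeping the full ranked list rather than only the best entry mirrors the step in which a top fraction of best-fit parameter sets is retained to build a distribution of final cfr values.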
the number of infections will continuously rise in a population if r_0 > 1, will remain steady if r_0 = 1, and will decrease if r_0 < 1. to explicitly account for the impact of mitigation efforts, models must support gradual changes in the shape of the case growth rate. the logistic model forecasts the slow initial rise, exponential growth, and eventual decay of cumulative cases, but cannot account for the changes that result from interventions. it has three parameters: the terminal number of cumulative cases (C), the case growth rate (r), and the days to the inflection point (t_i). the inflection point indicates the day at which the number of daily cases reaches its maximum. the function that describes the change in case incidence I(t) over time can be expressed as:

$$I(t) = \frac{C}{1 + e^{-r\,(t - t_i)}}$$

the modified logistic model has five parameters; however, the terminal number of cumulative cases (C) and inflection point (t_i) remain unchanged. in the set of equations that describe the incidence using the modified logistic model, the parameter r_m is the modified growth function that changes over time, and f is a smoothing function that determines how quickly the rate diverges around the point of inflection, as well as the magnitude of the transformation.
the copyright holder for this preprint is the author/funder, who has granted medrxiv a license to display the preprint in perpetuity. it is made available under a cc-by-nd . international license. this version posted may , .
using this model, we calculated the number of cases for each state. next, we built an objective function that minimizes the root mean square error between the actual and predicted values of cumulative cases, and ran numerous simulations by varying the four parameters. we held p constant at . in all our simulations. we ran , simulations, using numerous values of the days to inflection, t_i, the terminal number of cases, C, the growth rate, r, and the growth rate multiplier, k. the terminal number of cases was kept in the range of .
% to % of each state's penetration, the growth rate was kept in the range of . to . , and the number of days to inflection was bounded between and days. the growth rate multiplier was kept in the range between and %. the cumulative mortality is the product of the case fatality rate and the cumulative case incidence. with the low, mode, and high values of both cfr and cumulative case incidence, we evaluate the nine possible death tolls for each state by finding the product of each cfr value and each value of the cumulative case incidence. to determine cumulative mortality on the national scale, we add up the respective cells for all states.
predicting case incidence
we calculated the case incidence for each jurisdiction. fig shows the goodness-of-fit between the forecasted cases and the true number of cases for new york. the two sets of figures show the cumulative case incidence and new case incidence. it demonstrates that there is an excellent fit for both new and cumulative cases. we calculated the r for all the states and the fit was excellent (greater than % for all states), indicating that the modified logistic function models the transition after the intervention well. more information regarding the model's goodness-of-fit for the states is provided in the supplemental information (see s table). however, new york is a substantial outlier: the model predicts , cases for the mode case. all the numbers we will quote in the rest of the document will be the mode case (unless otherwise specified), as it has the highest likelihood. even if the best-case scenario transpires, its case incidence will probably exceed the incidence of any other state by over . next, we evaluated the forecasted case incidence for the entire united states (fig ). the total number of cases predicted is above . m, and the number of new daily cases peaks at more than . as new york and new jersey contribute significantly toward the overall case incidence, the united states peak in daily cases is strongly dependent on the peak of these two states.
fig . forecasted case incidence for the top us states. the range of forecasted total case incidence for the top states that contribute % of all the deaths. the low, mode, and high cases are displayed. the low and high cases are determined as the low and high end of the % confidence interval.
in order to understand whether the different states have a case incidence proportionate to their populations, we calculated the discrepancy between the projected cases per capita for individual states and the u.s. average projected cumulative cases per capita (fig ). new york, new jersey, massachusetts, connecticut, and louisiana have much higher case incidence per capita. this means that the disease affected these areas disproportionately. in contrast, the states of california, texas, florida, and ohio did much better in controlling the spread of the infection.
fig . difference in forecasted case incidence between states. difference in forecasted case incidence between the top states that contributed % of all the deaths. this was calculated as the difference between projected cases per capita for individual states and the u.s. average projected cumulative cases per capita.
we previously calculated the case fatality rates for hubei province and showed that the goodness-of-fit was excellent [ ]. we used the same methodology and calculated the cfr for all the states. the model is able to fit the data extremely well, showing that both the model and the methodology are sound. we provide the r value for all the states in the supplemental information (see s table).
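the text reports a goodness-of-fit value per state without defining it. assuming it is the usual coefficient of determination (an assumption, since the symbol is truncated in the source), it could be computed as in the sketch below; the function name and the sample series are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical cumulative case counts and two candidate fits.
observed = [1.0, 2.0, 3.0, 4.0]
perfect_fit = list(observed)
mean_only_fit = [2.5] * len(observed)

print(r_squared(observed, perfect_fit))    # a perfect fit gives 1.0
print(r_squared(observed, mean_only_fit))  # predicting the mean gives 0.0
```

a value close to 1 means the fitted curve explains nearly all of the variance in the observed series, which is how the reported fits above a given percentage would be read under this assumption.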
we calculated the range of final case fatality rates for each state (fig ). the cfr for most states is between % and %. compared to the case incidence, case fatality rates have far less variability. massachusetts, connecticut, new york, and maryland have relatively higher cfrs. in contrast, texas, california and georgia have much lower cfrs. we also provide supplemental information for all states (see s table). in order to measure how well states are faring relative to each other, we calculate the difference between projected cfrs for each state and the average cfr for the us (fig ). positive (negative) values indicate that the cfr is worse (better). it clearly shows that there is a wide disparity between states' case fatality rates. furthermore, the difference in cfr closely corresponds with projected cumulative deaths. this shows even more dramatically how much greater new york's and connecticut's outbreaks are compared to other states.
fig . difference in forecasted case fatality rates between states. difference in forecasted case fatality rates between the top states that contributed % of all the deaths. this was calculated as the difference between projected cfr for individual states and the u.s. average cfr.
considering that cfr and case incidence are the factors of death, it is fitting now to discuss the projected incidence of deaths (fig ). we calculated the number of deaths for each of the top states that contribute % of the deaths. this reveals the tremendous disparity between the size of outbreaks in different states (see also s table). the majority of states will experience fewer than , deaths.
new york is once again a significant outlier; the model returns a minimum of , deaths and a mode of , deaths, % of the u.s. total and more than the next five jurisdictions combined. this can be traced back to the state's relatively high projected case fatality rate and is indicative of differences in cfr and case incidence. sources of variation include the chronology and success of mitigation efforts, the prevalence of testing, and the distribution of age [ ] and comorbidities [ ] within populations at risk of infection.
fig . forecasted deaths for the top us states. the range of forecasted deaths for the top states that contribute % of all the deaths. the low, mode, and high cases are displayed. the low and high cases are determined as the low and high end of the % confidence interval.
table summarizes the various possible death tolls under each of the conditions. there is a % likelihood that any of these results are possible. the lowest cumulative deaths the u.s. could experience is , , considerably higher than both fauci's estimate and the university of washington projection.
table . predicted death tolls for the nine different scenarios. these scenarios are constructed using the low, mode, and high cases for cfr and cases. the low and high cases are determined as the low and high end of the % confidence interval.
for the best-fit case in all the states, we calculated the number of deaths per day and the cumulative number of deaths (fig ). this shows that the number of deaths will reach an asymptote at the end of june and the number of daily deaths will peak in early may. we used a cohort analysis approach to estimate cfr and a modified logistic model (that explicitly accounts for the impact of mitigation efforts) to forecast case incidence on the state level, and afterwards calculated mortality on the state and national levels. our model showed a wide range of mortality, with , deaths on the low end and a maximum of , deaths.
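the nine scenarios come from pairing each of the three cfr cases with each of the three incidence cases and, for the national figure, adding the matching cells across states. a minimal sketch of that bookkeeping follows; the state names and every number are hypothetical placeholders, since the source values are not recoverable here.

```python
from itertools import product

CASES = ("low", "mode", "high")


def state_scenarios(cfr, incidence):
    """Nine death tolls for one state: each CFR case times each incidence case."""
    return {(c, i): cfr[c] * incidence[i] for c, i in product(CASES, CASES)}


def national_scenarios(states):
    """Add up the matching cells of every state's 3x3 grid."""
    totals = {key: 0.0 for key in product(CASES, CASES)}
    for cfr, incidence in states.values():
        for key, deaths in state_scenarios(cfr, incidence).items():
            totals[key] += deaths
    return totals


# Hypothetical inputs: (CFR cases, cumulative case incidence cases) per state.
states = {
    "state_a": (
        {"low": 0.02, "mode": 0.04, "high": 0.06},
        {"low": 10_000, "mode": 20_000, "high": 30_000},
    ),
    "state_b": (
        {"low": 0.01, "mode": 0.03, "high": 0.05},
        {"low": 1_000, "mode": 2_000, "high": 4_000},
    ),
}

national = national_scenarios(states)
for (cfr_case, incidence_case), deaths in national.items():
    print(f"cfr={cfr_case:<4} incidence={incidence_case:<4} deaths={deaths:,.0f}")
```

the smallest cell (low cfr, low incidence) and the largest cell (high cfr, high incidence) give the ends of the mortality range quoted in the text.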
every possibility predicted by the model exceeds the prognostications produced by both the white house and the university of washington model. our model also revealed the deep disparity in deaths among different states, which is attributable to differences in case fatality rate and case incidence. we postulate reasons for these variations. many states in the us northeast, including new york, new jersey, massachusetts, and connecticut are disproportionately represented in the cumulative death toll. this disparity is primarily because these states have much worse case incidence and case fatality rates. new york is forecasted to experience the largest outbreak, the greatest cfr, and the highest mortality of any state by far. one explanation for its high case fatality rate is the strain the epidemic has placed on its healthcare system. as a result of its high case incidence, more hospitalizations will be required, overwhelming the medical care system. this could result in diminished quality of medical care, resulting in a high case fatality rate. additionally, the disease has disproportionately impacted low-income, more vulnerable areas. [ ] the reason for the high case incidence itself is more perplexing. numerous factors are likely at play, such as the popularity of public transportation [ ] and the high population density of the new york city metropolitan area [ ] ] (where the vast majority of cases have been reported [ ]). however, it is difficult to find conclusive evidence that any of these factors are directly accountable for the outbreak. it is very likely that luck played a large role in determining where clusters appeared. there is ample evidence that super-spreading events, or sse's, can cause sizable outbreaks [ ] . for instance, officials in new york stated that as many as fifty infections could be traced back to a single man in westchester county [ ] . 
we propose two principal explanations for the discrepancy between the death tolls forecasted by the university of washington model and that of our model: the differences in both the procedure for calculating cfr and the procedure for calculating mortality. our cohort-based method for determining cfrs predicts case fatality rates more accurately at every stage of the outbreak than other models because it explicitly accounts for the onset-to-death interval. further, we forecast cumulative mortality by independently evaluating cfr and case incidence. in contrast, the university of washington model directly predicts deaths; this method is prone to greater errors. while both the cfr and case incidence models fit the data extremely well, there are several challenges with estimating the number of deaths accurately. our model assumes the scale and methods of surveillance do not significantly change between today and the future. any changes to the testing process will affect the number of confirmed cases. breakthroughs in leveraging telemedicine, for instance, would result in increased detection of infected individuals. in this case, the model's current forecasts for case incidence would be underestimates [ ]. additionally, if the shelter-in-place order is withdrawn from states too early, there will likely be an increase in both the case incidence and the mortality. the model itself has limitations; if our assumptions do not hold true, then our analysis will not hold true either.
this preprint was not certified by peer review.
i would like to thank chandra narayanan for all his guidance on building the best case incidence model.
first known person-to-person transmission of severe acute respiratory syndrome coronavirus (sars-cov- ) in the usa
who coronavirus disease
first case of novel coronavirus in the united states
who. coronavirus disease (covid- ) situation report -
npr fauci estimates that , to , americans could die from the coronavirus
npr the coronavirus crisis -fauci says u.s. coronavirus deaths may be 'more like
institute for health metrics and evaluation. covid- projections
a novel cohort analysis approach to determining the case fatality rate of covid- and other infectious diseases
jhu novel coronavirus covid- ( -ncov) data repository by johns hopkins csse
united states census bureau state population totals
methods for estimating the case fatality ratio for a novel, emerging infectious disease
how testing completely skews coronavirus case fatality rates
case fatality rate
the mathematics of infectious diseases
estimating epidemic exponential growth rate and basic reproduction number
estimation of the final size of the covid- epidemic
clinical characteristics of coronavirus disease in china
comorbidity and its impact on patients with covid- in china: a nationwide analysis
data suggests many new york city neighborhoods hardest hit by covid- are also low income areas
land area, and population density by county
identifying and interrupting superspreading events-implications for control of severe acute respiratory syndrome coronavirus . emerg infect dis
new york officials traced more than coronavirus cases back to one attorney
virtually perfect? telemedicine for covid-
key: cord- -e ey eo authors: patel, urvish; malik, preeti; mehta, deep; shah, dhaivat; kelkar, raveena; pinto, candida; suprun, maria; dhamoon, mandip; hennig, nils; sacks, henry title: early epidemiological indicators, outcomes, and interventions of covid- pandemic: a systematic review date: - - journal: journal of global health doi: . /jogh. .
sha: doc_id: cord_uid: e ey eo background: coronavirus disease- (covid- ), a pandemic that brought the whole world to a standstill, has led to financial and health care burden. we aimed to evaluate epidemiological characteristics, needs of resources, outcomes, and global burden of the disease. methods: systematic review was performed searching pubmed from december , , to march , , for full-text observational studies that described epidemiological characteristics, following moose protocol. global data were collected from the jhu-corona virus resource center, who-covid- situation reports, kff.org, and worldometers.info until march , . the prevalence percentages were calculated. the global data were plotted in excel to calculate case fatality rate (cfr), predicted cfr, covid- specific mortality rate, and doubling time for cases and deaths. cfr was predicted using pearson correlation, regression models, and coefficient of determination. results: from studies of patients, . % of patients died, . % recovered, . % were admitted to icu and . % required ventilation. covid- was more prevalent in patients with hypertension ( . %), smoking ( . %), diabetes mellitus ( %), and cardiovascular diseases ( . %). common complications were pneumonia ( %), cardiac complications ( . %), acute respiratory distress syndrome ( . %), secondary infection ( . %), and septic shock ( . %). though cfr and covid- specific death rates are dynamic, they were consistently high for italy, spain, and iran. polynomial growth models were best fit for all countries for predicting cfr. though many interventions have been implemented, stern measures like nationwide lockdown and school closure occurred after very high infection rates (> cases per population) prevailed. given the trend of government measures and decline of new cases in china and south korea, most countries will reach the peak between april - , if interventions are followed. 
conclusions: a collective approach undertaken by a responsible government, wise strategy implementation and a receptive population may help contain the spread of covid- outbreak. close monitoring of predictive models of such indicators in the highly affected countries would help to evaluate the potential fatality if the second wave of pandemic occurs. the future studies should be focused on identifying accurate indicators to mitigate the effect of underestimation or overestimation of covid- burden. viewpoints research theme : there are confirmed cases worldwide with ( . %) deaths and ( . %) recovered cases [ ] . new york is the current epicenter of covid- ( cases and deaths) of united states of america (usa) ( cases, ( . %) deaths and ( . %) recovered patients) while italy ( deaths) and spain ( deaths) being worst affected countries [ , ] . globally, the epidemiological scenario of covid- is changing on a daily basis. the origin of severe acute respiratory syndrome coronavirus (sars-cov- ) virus was linked to a seafood market in wuhan from the handling and close contact with animals [ ] . in usa, the first case was reported on january , , with a recent travel history to wuhan [ ] . according to emerging literature, covid- symptoms can range from mild respiratory illness causing fever, dry cough, dyspnea, myalgia and fatigue to more severe manifestation of pneumonia, cardiac complications requiring intensive care unit (icu) admission and mechanical ventilation [ ] . the median incubation period is around days (range: - days), requiring prolonged monitoring in extreme cases [ , ] . real-time reverse transcriptase polymerase chain reaction (rt-pcr) of nasopharyngeal and/or oropharyngeal swabs are usually used to confirm the diagnosis [ , ] . preliminary demographic data of the infected patients suggests that most patients have mild disease, with older adults (≥ years) appearing to be more susceptible to severe illness requiring hospitalization [ , ] . 
covid- shows evidence of human to human transmission via respiratory droplets and from contact with contaminated surfaces or objects, with estimated median basic reproduction number (r ) of . (range: . - . ) [ ] , making the spread of the disease tough to contain. while recently published observational studies have provided insights on the epidemiology of this pandemic, their sample sizes are too limited for any definitive conclusions. hence, we sought to conduct a systematic review and analysis of all available studies comparing outcomes. primary aim of the study is to evaluate the epidemiological characteristics, needs of resources, and patients' outcomes. secondary aim is to evaluate the global burden and interventions. we evaluated epidemiological characteristics, risk factors, laboratory and imaging findings, complications and treatment utilized. we also calculated the mortality, recovery, and needs of resources like icu beds and mechanical ventilators. in order to evaluate the primary outcome, we performed a systematic review of these observational studies according to moose guidelines [ , ] . we searched the pubmed database for original observational studies that described any details on epidemiological characteristics on patients with covid- . the database was searched from december , , to march , . the search was conducted using the following keyword/mesh terms: ((covid- [title/abstract]) or coronavirus [title/abstract]) or sars-cov- [title/abstract] or -ncov [title/abstract]. all studies that compared outcomes of interest in covid- patients were included. any literature other than observational studies was excluded. non-english literature, non-full text, and animal studies were excluded. abstracts were reviewed, and articles were retrieved accordingly. two independent reviewers performed the search and literature screening (up, pm), with disputes resolved by consensus following discussion with a third author (cp). 
for ease of understanding, we used a flow diagram to describe the literature search and study selection process in figure s in the online supplementary document. a prespecified data collection excel sheet was used to collect the data relating to study characteristics and outcomes of interest by two authors (pm and cp), and discrepancies were resolved by discussion with a third author (up). the following study characteristics were extracted: publication year, country of origin, sample size, age, sex, direct exposure to infection, travel history, signs and symptoms, risk factors and comorbidities, laboratory and radiology findings, treatment utilized, and complications. data on the following outcomes were extracted: mortality, recovery, need for icu beds and mechanical ventilators. all analyses were done in excel (microsoft inc, seattle wa, usa) and sas . (sas institute, cary, nc, usa). the frequencies and percentages of epidemiological characteristics and outcomes were calculated. we evaluated the global burden of covid- including case fatality rates (cfr), strength of association between deaths and cases to predict cfr, case doubling time, covid- specific mortality rates, and control measures by governments to prevent spread among usa, china, italy, iran, spain, germany, india, and south korea. for this purpose, data were taken from the johns hopkins university corona-virus resource center [ ], kff.org [ ], world health organization-covid- situation reports [ ], and worldometers.info [ ] up until march , . we evaluated changes in cases and deaths, cfr, created a predictive model for cfr, and evaluated the covid- specific mortality rate and the doubling time for cases and deaths.
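the text does not state how the doubling time was computed. a common choice, assumed here, is to treat growth between two observations as exponential, so that the doubling time equals the elapsed days times ln 2 divided by the log of the growth ratio. the function name and the counts below are hypothetical.

```python
import math


def doubling_time(count_start, count_end, days_elapsed):
    """Doubling time in days, assuming exponential growth between two observations."""
    if count_start <= 0 or count_end <= count_start or days_elapsed <= 0:
        raise ValueError("need positive, strictly growing counts and a positive interval")
    return days_elapsed * math.log(2) / math.log(count_end / count_start)


# Hypothetical example: cumulative cases grow from 100 to 400 in 6 days,
# i.e. two doublings, so the doubling time is about 3 days.
print(doubling_time(100, 400, 6))
```

the same function applies to cumulative deaths; a lengthening doubling time over successive windows indicates that growth is slowing.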
cfr was defined as the number of deaths divided by the number of diagnosed patients with covid- , and the covid- specific mortality rate was defined as deaths due to covid- infections divided by the total population of the country in , counted per population [ ]. the pearson correlation coefficient (r) was obtained to establish the strength of association between deaths and cases for individual countries. to predict cfr, we modelled the epidemic curves with simple linear regression, exponential growth, and polynomial growth models and used the coefficient of determination (r ) for model selection. the time of reporting the first death was used as the starting point for that country for all three models. we utilized government websites, national media, and other standard open sources to evaluate the governments' interventions during the covid- pandemic, the infection rate [(diagnosed cases/country's population in ) per population] [ ] at the time of interventions like nationwide school closure and lockdown, and the effects of such measures to predict the dates of the peak number of cases in each country. our search resulted in studies, out of which non-human studies, studies other than observational studies, non-full-text articles, and articles in a language other than english were excluded. full-text studies were screened and studies with insufficient clinical information or outcomes-related information were excluded. full-text articles were assessed for eligibility. the final analysis included full-text observational studies, presented in table , including a total of patients.
[ ] china jan , -feb ,
young, mar [ ] singapore jan , -feb , -
chang, feb [ ] china jan , -jan ,
wang, feb [ ] china jan , -jan ,
ng, mar [ ] singapore jan , -feb , -
spiteri, mar [ ] europe jan , -feb , -
covid- national incident room surveillance team, mar [ ] australia mar
xu, feb [ ] china jan , -jan , .
- bajema, feb [ ] usa jan , --
ki, feb [ ] south korea jan , --
chen, jan [ ] china jan , -jan ,
zhang, feb [ ] china jan , -feb , --
yang, feb [ ] china
figure s in the online supplementary document.
several models, including a simple linear regression, exponential and polynomial (quadratic) growth models, were used to determine the type of association between cumulative deaths and cumulative cases to predict cfr ( table ). the polynomial growth model had the best fit (higher r ) and indicates that for all countries the death rate increases with the number of cases, and this increase is steeper than a linear relationship. interestingly, while for the usa, italy, iran, spain, and india this association is always positive, for china, south korea, and germany the initial slope is negative but then is reversed as the number of cases continues to increase (figure ). figure s a in the online supplementary document). the daily covid- specific death rate is highest in spain (daily . deaths per population) and italy (daily . deaths per population) followed by usa (daily . deaths per population) ( figure s b in the online supplementary document). the country-specific timeline of doubling time for cases and deaths is shown in table and the increments in cases and deaths are plotted in (figure in the online supplementary document).
usa:
march barred entry of foreign nationals who had been to european countries within last days [ ]
march nationwide schools closed [ ], lockdown in new york [ ]
march a us$ trillion coronavirus stimulus bill was passed and signed by the president [ ]
march more than half of us states underwent lockdown [ ]
china:
january response to public health emergency launched by hubei [ ]
january the central government of china imposed a lockdown in wuhan and other cities in hubei province; public transport suspended. the wuhan airport, railway stations and metro were closed, not allowing residents to leave the city without permission [ ]; public health emergency response announced by mainland province of zhejiang [ ]
january mainland china has initiated public health emergency response [ ]; quarantined whole hubei province [ ]; curfew laws implemented in huanggang, wenzhou and other mainland cities [ ]
south korea:
an unlicensed covid- test authorized by the korea centers for disease control and prevention (cdc) [ ]; travel denied to foreign nationals from hubei province into south korea [ ]
february all kindergartens, elementary schools, middle schools, and high schools were announced to delay the semester start [ ]
february entire country opened drive-through testing [ ]
italy:
january state of emergency declared, flights to and from china suspended [ ]
february the council of ministers announced a new decree-law quarantining more than people from different municipalities in northern italy [ ]
march nationwide schools and universities closed [ ]
march prime minister imposed nationwide quarantine lockdown [ ]
march all commercial activities except pharmacies and supermarkets ordered to shut down [ ]; € billion allocated by the government [ ]
april drive-through testing began [ ]
iran:
all concerts and other cultural events cancelled for one week by ministry of islamic culture and guidance [ ]; closure of educational institutions in several cities and provinces announced by the ministry of health and medical education [ ]
march checkpoints placed between cities to limit travel [ ]
march fatima masumeh shrine, jamkaran mosque in qom city, and imam reza shrine in mashhad closed [ ]
germany:
new health security measures enacted to regulate air and sea travel that required passengers from china, south korea, japan, italy and iran to report their health status before entry [ ]; federal police stepped up checks within km of the border [ ]
march bavaria declared a state of emergency for days and measures to limit public movement and additional funds for medicine supplies were introduced [ ]; all flights from iran and china stopped by german ministry of transport [ ]; travelling in coaches, attending religious meetings, visiting playgrounds or engaging in tourism prohibited [ ]
finance minister announced us$ billion stimulus package [ ]
infection rate at the beginning of the major intervention (nationwide closure of school or major
table mentions the predicted dates of the peak number of cases based on strict interventions. in china and south korea, it took - days and - days respectively to reach the peak of the pandemic before the number of new cases began to decline. we have used a - days post-interventional model to calculate the peak of the pandemic, keeping in mind the effect of china's model of interventions.
covid- has significantly impacted the entire world both socially and economically. the rapid human-to-human transmission has posed a great public health threat. across studies included in this review, we found confirmed cases of covid- with the majority of the published studies from china. % of the cases had a history of direct exposure or being exposed to the seafood market in wuhan, % were china residents and % had a travel history to china. initially the virus was limited to wuhan only, and despite travel restrictions, the virus continued to spread across the world at a rapid rate from china, likely due to asymptomatic transmission in the initial stages of the outbreak with a median incubation period of only days [ , ], before travel restrictions. the covid- cases are increasing exponentially but are underestimated due to mild symptoms in a portion of cases, long incubation periods, and shortage of testing kits. in concurrence with other studies [ , ], we found that clinical characteristics of covid- are similar to those of sars and influenza virus.
fever ( %), cough ( %) and myalgia or fatigue ( %) were the most prominent symptoms. % of patients reported dyspnea and sputum production/expectoration. major comorbidities were hypertension, smoking, diabetes mellitus, and cardiovascular disease. patients with these comorbidities are at high risk for complications including pneumonia, ards and cardiovascular complications. we found that patients had increased inflammatory markers including elevated crp in %, lymphopenia in % and elevated esr in % which is similar to other respiratory infections (sars, influenza). few studies [ , ] , have reported abnormal liver function in covid- patients, and we found % of patients had elevated alt and ast. additionally, increased ldh ( %), d-dimer( %) may indicate the severity of the disease [ ] . some studies have also reported elevated neutrophil count and cytokine storm induced by virus leading to coagulation activation and sustained inflammatory response [ ] associated with higher mortality [ ] . there is no proven therapy available as of now for covid- [ ] . large scale clinical trials for these drugs are under way. % patients received oxygen and antibiotics ( %), antivirals ( %) and steroids ( %) as supportive therapies. the prognosis of patients after receiving these treatments is not yet clear. in people with compromised immune systems such as older age, hiv, malignancy, diabetes, chronic pulmonary disease if treated promptly with antibiotics, convalescent plasma to increase the immune support might reduce the risk of complications and mortality [ ] . in our analysis, % of the patients required icu admission, % needed mechanical ventilation, % died and % recovered and were discharged from the hospital. these findings are consistent with guan et al. and wang et al that present similar rates [ , ] . 
currently in the usa, covid- is in the acceleration phase, surpassing china and italy, and a national emergency was declared by the president, but the duration and severity may vary depending on the virus characteristics and public health response [ ] . if confirmed cases continue to grow at this rate, the covid- pandemic will soon cause shortages of ventilators. as per institute for health metrics and evaluation (ihme) projections, on a peak day in the usa, there would be a shortage of icu beds by and a need of ventilators [ ] . the growing number of cases will place a burden on the current capacity of hospitals, and hence it is essential to develop and implement strategies to close the gap by increasing capacity and allocating available resources fairly. as of march , cfr was . % in italy and . % in china. according to onder et al. [ ] , cfr stratified by age shows similar rates for - years ( %- . %) but higher rates in > years ( %- . %). this difference might be due to the high cfr reported in people > years in italy and the absence of data from china for the same age group [ ] . other reasons might be demographic differences between the two countries (≥ years population: italy- . % vs china- . %), an overwhelmed health care system, and a shortage of icu beds and ventilators, which might lead to prioritizing treatment of younger and otherwise healthy patients over older patients [ ] . in our analysis, cfr in italy increased from . % on february to . % on march , possibly due to the implementation of a strict policy of testing only suspected cases with severe symptoms [ ] . though widespread and drive-through testing is becoming more available in the usa, the cumulative number of tests conducted per million population lags behind that of germany, italy, south korea, and spain. our data-driven polynomial growth model predicts more deaths in future with an increase in cases in the usa [ ] , italy, iran, spain, and india.
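the model comparison described above can be illustrated with a minimal sketch. it fits cumulative deaths against cumulative cases with a straight line and with a quadratic, then compares the coefficients of determination. the data are synthetic, and the helper names (`solve`, `polyfit`, `r_squared`) are hypothetical; this is not the authors' code or their fitted model.

```python
# Compare a linear and a quadratic least-squares fit of cumulative deaths
# against cumulative cases, using plain R^2 for model selection.
# All numbers are synthetic and purely illustrative.

def solve(a, b):
    """Solve a small linear system a @ x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def polyfit(x, y, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    k = degree + 1
    a = [[sum(xi ** (i + j) for xi in x) for j in range(k)] for i in range(k)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(k)]
    return solve(a, b)

def r_squared(x, y, coeffs):
    """Coefficient of determination of a fitted polynomial."""
    mean_y = sum(y) / len(y)
    pred = [sum(c * xi ** p for p, c in enumerate(coeffs)) for xi in x]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

cases = [1, 2, 4, 7, 11, 16, 22, 29]                 # cumulative cases (thousands)
deaths = [0.02 * c + 0.003 * c * c for c in cases]   # cumulative deaths (thousands)

linear = polyfit(cases, deaths, 1)
quadratic = polyfit(cases, deaths, 2)
r2_linear = r_squared(cases, deaths, linear)
r2_quadratic = r_squared(cases, deaths, quadratic)
print(f"linear R^2 = {r2_linear:.4f}, quadratic R^2 = {r2_quadratic:.4f}")
```

because the synthetic deaths were generated from a quadratic, the quadratic fit wins by construction; with real data, a higher r² for the higher-order model is expected to some degree simply because it has more coefficients.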
as per our model predictions, the doubling time of cases in the usa, germany and india is decreasing, suggesting that they are inching towards the peak. different countries undertook interventions at different points in the timeline of the spread of the virus. the infection rates in the usa, italy, iran, spain, and germany were higher when they undertook substantial measures than those in china, south korea, and india, suggesting a delayed response and a failure to undertake timely measures. the aforementioned timelines for peaks look optimistic because multiple other factors may influence the trajectory of spread, ie, population density, economy, demographics, health care, religious beliefs, and legislation. for instance, despite the growing number of cases, iran kept its shrines open to pilgrims for a long time and closed them only recently, and no stringent curfew laws were imposed. also, many states in the usa have still not implemented strict quarantine measures. such practices can seriously impede efforts to contain the spread and skew the projections in many ways. restrictions have been neither homogeneously imposed nor simultaneously adopted throughout the country, making it difficult to predict the exact model of the spread. also, the covid- testing capacity of nations is limited, and the true number of infected people might have been higher than the estimated numbers at the time of our analysis. hence, an early phase covid- specific death rate would be a better estimate than cfr to compare the severity of the disease. many factors affect the accurate estimation of cfr, such as testing capacity, care seeking, and lack of understanding of the proportion of asymptomatic and presymptomatic cases [ , ] . limited knowledge of these factors in the early covid- phase might have contributed to overestimation of cfr in our study.
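the argument that a population-based death rate is less sensitive to testing capacity than cfr can be made concrete with a small sketch. all figures below are invented for illustration, the function names are hypothetical, and the per-100,000 scale is an assumption rather than the scale used in the study.

```python
# Two hypothetical countries with the same true epidemic but different
# testing coverage: CFR diverges, the population-based death rate does not.

def case_fatality_rate(deaths, diagnosed_cases):
    """CFR in percent: deaths divided by diagnosed cases."""
    return 100.0 * deaths / diagnosed_cases

def specific_mortality_rate(deaths, population, per=100_000):
    """Deaths per `per` inhabitants (the scale is an assumption)."""
    return per * deaths / population

deaths = 500               # same number of deaths in both countries
population = 10_000_000    # same population in both countries
diagnosed_wide = 40_000    # broad testing: 80% of 50,000 true infections found
diagnosed_narrow = 10_000  # narrow testing: 20% of 50,000 true infections found

cfr_wide = case_fatality_rate(deaths, diagnosed_wide)
cfr_narrow = case_fatality_rate(deaths, diagnosed_narrow)
mortality = specific_mortality_rate(deaths, population)

print(f"CFR with broad testing:  {cfr_wide:.2f}%")
print(f"CFR with narrow testing: {cfr_narrow:.2f}%")
print(f"deaths per 100,000 inhabitants (both): {mortality:.1f}")
```

the population-based rate still depends on how completely deaths are attributed and on how far the epidemic has progressed, so it is a complement to cfr rather than a replacement.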
the use of serological testing for the presence of igm or igg antibodies against sars-cov- will provide a better estimate of the cumulative prevalence of covid infection [ ] . as recommended by who, measuring the seroprevalence of antibodies to covid- is crucial and will help determine an accurate cfr and plan an adequate public health response [ ] . the research on covid- is rapidly evolving and new publications are becoming available daily. the majority of the epidemiologic data come from single centers with limited sample sizes. to overcome this limitation and provide a global view of the covid- pandemic, we have analyzed data on over patients from peer-reviewed studies. as a result, we provided more generalizable estimates of laboratory findings, clinical symptoms and complications of covid- patients. we have included data from several countries/regions; however, one limitation is that the majority of cohorts are from china, and as more data from other countries become available, additional meta-analyses will be essential. this is the first study rigorously tracking the timing of government interventions across multiple countries; however, as mentioned earlier, adherence to those interventions could vary from one country to another, making projections of their potential effectiveness challenging. we have not evaluated the duration of strict interventions in all these countries. the population prevalence data are based on symptomatic patients with confirmed rt-pcr testing. since some patients can be infected and present mild or no symptoms, or may not have undergone rt-pcr testing, serological antibody testing in the future may allow a more accurate understanding of the disease prevalence and death rates.
despite all the limitations, this is, to our knowledge, the first study highlighting and explaining the epidemiological indicators, testing capacity, interventions, and expected burden of covid at the early phase. we have reviewed the burden of this pandemic and the steps taken by the governments of different countries. though governments can continue strict lockdowns, this is not a long-term solution. good hand hygiene, widespread testing, detection and isolation of new cases, rigorous contact tracing in low-prevalence settings, early vaccine development and its quick distribution, strengthening the overburdened health care system, and protecting frontline health care workers may help to gradually relax the strict lockdowns and cope with the covid- pandemic. this would only be possible through a collective approach undertaken by responsible governments, wise strategy implementation, and receptive populations. future studies should focus on identifying accurate indicators to mitigate the effect of underestimation or overestimation of the covid- burden. close monitoring of such indicators in highly affected countries is crucial to evaluate the potential fatality if a second wave of the pandemic occurs. who director-general' s opening remarks at the media briefing on covid- - coronavirus covid- global cases by the center for systems science and engineering (csse) at johns hopkins university. (jhu).
first case of novel coronavirus in the united states clinical characteristics of coronavirus disease in china incubation period and other epidemiological characteristics of novel coronavirus infections with right truncation: a statistical analysis of publicly available case data the incubation period of coronavirus disease (covid- ) from publicly reported confirmed cases: estimation and application interim guidelines for collecting, handling, and testing clinical specimens from persons for coronavirus disease detection of novel coronavirus ( -ncov) by real-time rt-pcr clinical characteristics of coronavirus disease in china the novel coronavirus pneumonia emergency response epidemiology team. the epidemiological characteristics of an outbreak of novel coronavirus diseases (covid- ) -china estimation of the reproductive number of novel coronavirus (covid- ) and the probable outbreak size on the diamond princess cruise ship: a data-driven analysis covid- ) situation reports countries in the world by population the incubation period of coronavirus disease (covid- ) from publicly reported confirmed cases: estimation and application clinical features of patients infected with novel coronavirus in wuhan a comparative study on the clinical features of covid- pneumonia to other pneumonias epidemiologic features and clinical course of patients infected with sars-cov- in singapore epidemiologic and clinical characteristics of novel coronavirus infections involving patients outside wuhan, china clinical characteristics of hospitalized patients with novel coronavirus-infected pneumonia in wuhan, china evaluation of the effectiveness of surveillance and containment measures for the first patients with covid- in singapore first cases of coronavirus disease (covid- ) in the who european region clinical findings in a group of patients infected with the novel coronavirus (sars-cov- ) outside of wuhan, china: retrospective case series persons evaluated for novel coronavirus -united 
states epidemiologic characteristics of early cases with novel coronavirus ( -ncov) disease in korea. epidemiol health epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study clinical characteristics of patients infected with sars-cov- in wuhan clinical course and outcomes of critically ill patients with sars-cov- pneumonia in wuhan, china: a single-centered, retrospective, observational study clinical features of cases with coronavirus disease in wuhan, china clinical characteristics of refractory covid- pneumonia in wuhan, china characteristics and outcomes of critically ill patients with covid- in washington state risk factors associated with acute respiratory distress syndrome and death in patients with coronavirus disease clinical course and risk factors for mortality of adult inpatients with covid- in wuhan, china: a retrospective cohort study statement from the press secretary regarding the president' s coronavirus task force trump declares coronavirus a public health emergency and restricts travel from china trump signs emergency coronavirus package, injecting $ . billion into efforts to fight the outbreak trump administration announces measures to speed coronavirus testing here' s what trump' s coronavirus emergency declaration does new york launches drive-thru testing site for covid- see which states and cities have told residents to stay at home dhs notice of arrival restrictions on china, iran and certain countries of europe map: coronavirus and school closures. 
house passes $ trillion coronavirus stimulus bill, which includes direct payments to americans and business loans notice of the people ' s government of hubei province on strengthening the prevention and control of pneumonia infected by new coronavirus china coronavirus: lockdown measures rise across hubei province zhejiang: newly diagnosed cases of new coronavirus infection and pneumonia were launched, and the first-level response to major public health emergencies was initiated all provinces in mainland china have initiated first-level response to public health emergencies xiangyang railway station is closed, and the last prefecture-level city "hubei" in hubei province ningbo have implemented the most restrictive order south korea learned its successful covid- strategy from a previous coronavirus outbreak: mers korea bars foreigners traveling from hubei province s ministry of education opening on march south korea pioneers coronavirus drive-through testing station italy suspends all china flights as coronavirus cases confirmed in rome ten lombard municipalities: thousand people forced to stay at home. quarantine at the milanese hospital in baggio. italy orders closure of all schools and universities due to coronavirus italy extends emergency measures nationwide merkel warns virus could infect two-thirds of germany coronavirus emergency, the government' s plan rises to billion. gualtieri to the eu: 'stimulus is needed'. we will make a day long holiday' to contain coronavirus, as sixth victim dies coronavirus: iran limits travel and urges banknote avoidance. shiite hardliners in iran storm shrines that were closed to stop coronavirus spread germany enacts new health security measures against coronavirus infections these rules apply in bavaria. germany halts flights from iran and china over coronavirus: bild. deutschland im shutdown-modus -die alternativlos-kanzlerin kehrt zurück. bayern impose curfew! 
contact bans on more than two people, hairdressers too -the federal and state governments have agreed on this spain prohibits all direct flights from italy until the community of madrid decrees the mandatory closure of bars, restaurants and clubs until spain to impose nationwide lockdown -el mundo marlaska suspends free movement and reestablishes border controls predictions and role of interventions for covid- outbreak in india coronavirus: all international arrivals to india to share travel history at airports. icmr to test for community transmission of covid- coronavirus: icmr recommends hydroxychloroquine for high-risk population. coronavirus: india enters 'total lockdown' after spike in cases global airlines have completely stopped flying scheduled flights due to travel bans, airspace closures, and low demand for travel fm nirmala sitharaman announces rs . lakh crore relief package for poor. to understand the global pandemic, we need global testing -the our world in data covid- testing dataset early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia a comparative study on the clinical features of covid- pneumonia to other pneumonias remdesivir and chloroquine effectively inhibit the recently emerged novel coronavirus ( -ncov) in vitro breakthrough: chloroquine phosphate has shown apparent efficacy in treatment of covid- associated pneumonia in clinical studies hydroxychloroquine and azithromycin as a treatment of covid- : results of an open-label non-randomized clinical trial treatment of critically ill patients with covid- with convalescent plasma covid- ) -united states icu-days, ventilator days and deaths by us state in the next months case-fatality rate and characteristics of patients dying in relation to covid- in italy. jama. . 
online ahead of print italian doctors on coronavirus frontline face tough calls on whom to save the many estimates of the covid- case fatality rate evidence supporting transmission of severe acute respiratory syndrome coronavirus while presymptomatic or asymptomatic seroprevalence of immunoglobulin m and g antibodies against sars-cov- in china population-based age-stratified seroepidemiological investigation protocol for coronavirus (covid- ) infection key: cord- - km authors: iosa, marco; paolucci, stefano; morone, giovanni title: covid- : a dynamic analysis of fatality risk in italy date: - - journal: front med (lausanne) doi: . /fmed. . sha: doc_id: cord_uid: km italy was the second country in the world to face a wide epidemic of covid- after china. the ratio of the number of fatalities to the number of cases (case fatality ratio, cfr) recorded in italy was surprisingly high and increased in the month of march. the older mean age of population, the changes in testing policy, and the methodological computation of cfr were previously reported as possible explanations for the incremental trend of cfr, a parameter theoretically expected to be constant. in this brief report, the official data provided by the italian ministry of health were analyzed using fitting models and the linear fit method approach. this last methodology allowed us to reach two findings. the trend of the number of deaths followed a – -day delay of positive cases. this delay was not compatible with a biological course of covid- but was compatible with a health management explanation. the second finding is that the italian number of deaths did not increase linearly with the number of positive cases, but their relationship could be modeled by a second-order polynomial function. the high number of positive cases might have a direct and an indirect effect on the number of deaths, the latter being related to the overwhelmed bed capacity of intensive care units. 
the severe acute respiratory syndrome coronavirus (sars-cov- ) has developed worldwide into a pandemic ( , ) . there is a wide clinical debate on the different strategies required to minimize deaths and a political one on the economic impact of those strategies, but minimizing both fatalities and cost is proving to be quite difficult ( ) . in china, the epidemic seemed to be effectively contained by quarantine, social distancing, and the isolation of the infected population. conversely, on march , the spread of corona virus disease in italy largely increased despite the restrictions put in place by the government. in italy, the first case of sars-cov- was diagnosed in lombardy region on the th of february ( ). only month later, the number of deaths due to the covid- recorded in italy was the highest globally, even higher than that documented in china, and this was only recently exceeded by united states. many different mathematical models have been proposed to help governments to decide on what health policies they should follow. some models have been based on an exponential curve for fitting the number of infected cases and deaths. although mass media reported this initial exponential trend, it was conceivable to expect a deviation from that-rather than a plateau-followed by a progressive decrement, according to a bell-shaped curve ( ) . a recent study based on data recorded up to the th of march hypothesized for italy a trend similar to that observed in the hubei province in china, and it predicted a peak of cases at around the th of april ( ). anderson et al. ( ) have developed an illustrative simulation of the transmission model of covid- , showing that social distancing could flatten the curve of positive case frequency, retarding and reducing the peak of the curve estimation in case of no social restrictions ( ) . this theoretical modeling, reported by many mass media, suggested that a delay in contagions may reduce the number of deaths. 
this hypothesis was based on the idea that the number of beds in intensive care units (icus) could be sufficient only for a flattened curve of positive case frequency. otherwise, if the number of severely affected patients exceeded that of beds in icus, the number of deaths could dramatically increase. at the beginning, the italian case fatality rate (cfr) seemed to be similar to that of china, initially fixed at . % ( ) . the case fatality rate is the ratio of deaths caused by a given disease to the total number of cases that the disease generated in a specific time period ( ) . updated with the new data from the th of march, the italian cfr exceeded %. in a comparison report, a possible explanation was provided by the higher mean age of the italian population compared with the chinese one ( ) . but this may be only a partial explanation of the difference in the case fatality rate of covid- in italy with respect to china. an older population, such as the italian one, may suffer from comorbidities, which increase the risk of death and hence the cfr ( ). however, the italian cfr has been higher than the chinese one even after being corrected for age: in patients older than years, cfr was . % in italy and . % in china ( ) . furthermore, the older mean age of the population did not explain the incremental trend of cfr within the italian population during the month of march. the authors of that research also suggested other possible explanations for the high cfr, mainly related to methodological differences in case recording and case testing ( ) . in the early phase of the epidemic, italy carried out an extensive testing strategy by collecting swabs of both symptomatic and asymptomatic contacts of the infected patients, as was done in china. then, the italian ministry of health issued more stringent testing policies, prioritizing tests for patients with severe clinical symptoms who required hospitalization.
this could have caused an increase in the computed value of cfr through the underestimation of the number of asymptomatic or mildly affected patients, to whom the tests were often not administered. it means that, in italy more than in other countries, the full denominator of cfr remains unknown because asymptomatic cases or patients with mild symptoms might not be tested and hence will not be identified. recent studies have faced the problem of a correct computation and interpretation of cfr related to covid- . one of them suggested, in this dynamic situation, estimating the cfr as the number of deaths divided by the number of infected patients evaluated weeks before ( ) . this delay was suggested to be helpful for taking into account the incubation period and the median time from onset of symptoms to death ( , ) . a recent report, using a delay-adjusted cfr of . % (computed from a previous large study conducted in china), estimated that less than % of the contagions in italy were actually diagnosed. however, it is noteworthy that the italian policy change on tests occurred on the th of february, when the italian cfr was . %, and that the cfr then continued to increase, hitting % only month later on the th of march. italy was the first western country with a wide spread of covid- , and it could be important for other countries to analyze the italian case in depth. the italian cfr increased day by day, even though, from a theoretical point of view, the cfr is expected to be constant ( ) . a constant cfr means that the number of deaths increases proportionally (linearly) with the number of cases. the above studies seemed to suggest that cfr was only miscomputed: the more severe cases clinicians need to assist, the less time they have to test non-severe cases, causing an apparent increase of cfr ( ) .
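the delay-adjusted estimate mentioned above, deaths divided by the cases recorded some time earlier, can be sketched as follows. the lag of 2 days, the 2% fatality, and the doubling case series are all invented for illustration; `delay_adjusted_cfr` is a hypothetical helper, and the lag used in the cited work is not reproduced here.

```python
# Naive versus delay-adjusted CFR on a synthetic, fast-growing epidemic.

def delay_adjusted_cfr(cum_cases, cum_deaths, lag_days):
    """CFR in percent, dividing deaths on day t by cases on day t - lag_days."""
    result = []
    for t, deaths in enumerate(cum_deaths):
        past = t - lag_days
        if past < 0 or cum_cases[past] == 0:
            result.append(None)  # not enough history for this day
        else:
            result.append(100.0 * deaths / cum_cases[past])
    return result

LAG_DAYS = 2  # hypothetical lag between diagnosis and death
cum_cases = [100, 200, 400, 800, 1600, 3200, 6400, 12800]
# Synthetic deaths: 2% of the cases recorded LAG_DAYS earlier.
cum_deaths = [0] * LAG_DAYS + [0.02 * c for c in cum_cases[:-LAG_DAYS]]

naive = delay_adjusted_cfr(cum_cases, cum_deaths, 0)
adjusted = delay_adjusted_cfr(cum_cases, cum_deaths, LAG_DAYS)

print("naive CFR on last day:   ", naive[-1])
print("adjusted CFR on last day:", adjusted[-1])
```

in this growing series the naive ratio understates the underlying fatality because today's deaths are compared with cases that have not yet had time to resolve; the adjustment only helps if the assumed lag is close to the real one.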
in the present study, mathematical models were used to test whether the high italian cfr was only apparent, being related to an underestimation of positive cases, or whether it represented a real increment of covid- lethality, possibly related to the difficulties of the national health system in managing many cases in a short period and in a small region, as occurred in the north of italy. these possibilities have led to the different theoretical scenarios depicted in figure . the cfr computed day by day could be high because of the need to take into account a biological delay of about days between deaths and the recorded number of positive cases ( ) or because of the insufficient number of beds in icus. in the former case, there is a statistical problem, whereas, in the latter case, the health policy of other countries should take into account the italian lesson for covid- . the aim of this study was to provide a deeper insight into the italian cfr, testing the hypothesis that the number of deaths increased more than linearly with the number of positive cases. in this study, the data officially provided by the italian ministry of health and istituto superiore di sanità were used to monitor the increment of cases of contagion and death related to covid- in italy. data were collected from the th of february to the th of march (supplementary table ) . polynomial, logistic, and bell-shaped functions were applied to fit the data. the equation of a bell-shaped function was the following: [...] the adjusted coefficient of determination (r²) was preferred to the raw one to assess the goodness of the fitting models independently of the number of their coefficients. the approach of the linear fit method (lfm) was used to compare the number of cases and that of deaths. this method was previously validated for assessing the waveform similarity in clinical data. the lfm relies on the idea of plotting one dataset vs.
another one to compare the similarity of their waveforms, such as the contemporaneity of their peaks ( ) . in the rapid evolution of the covid- pandemic, the day-by-day cfr was computed. that is, for each day, the cfr was the percentage of deaths over the number of currently positive patients plus dead patients plus discharged patients. the theoretical scenarios are depicted in figure , which reports the case of a constant cfr as theoretically expected ( ) and that of a cfr computed to take into account a biological delay ( ) . a third case is reported, related to a dynamic perspective of cfr that takes into account a potential increase in the period in which the number of severe cases overwhelmed the capacity of intensive care units (icus), which was the worst-case scenario hypothesized by anderson et al. ( ) . the bell-shaped models of figure show that the number of positive cases in italy is still increasing day by day, as is that of deaths. although a prediction is very difficult, these models have exhibited very high values of the adjusted coefficient of determination r² ( . for currently positive, total infected, and dead patients, whereas it was . for discharged patients). independently of the goodness of the predictions, the trend of deaths seems to follow that of infections, with a delay of about days. the linear fit method approach has allowed us to compare the trends of real data, as reported in figure . the number of dead patients increased with the increment of infected patients (left panel). as clearly shown by the data, this increment has a second-order polynomial trend rather than the expected linear one. when the cfr was computed (right panel of figure ) , an initial, fairly constant, low value of cfr was observed, followed by a progressive increment. in fact, in the first days of data collection, the italian cfr was roughly constant and lower than . %. it then started to increase.
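the day-by-day cfr defined above (deaths over currently positive plus dead plus discharged patients) is straightforward to compute. the sketch below uses three invented days of counts; `daily_cfr` is a hypothetical helper name, and the numbers are not taken from the italian data.

```python
# Day-by-day CFR: deaths as a percentage of all recorded cases so far,
# where recorded cases = currently positive + dead + discharged.

def daily_cfr(active, deaths, discharged):
    """Return the CFR in percent for each day, or None if no cases exist."""
    rates = []
    for a, d, r in zip(active, deaths, discharged):
        total = a + d + r
        rates.append(100.0 * d / total if total else None)
    return rates

# Synthetic cumulative counts for three consecutive days.
active = [90, 180, 400]      # currently positive patients
deaths = [2, 10, 40]         # cumulative deaths
discharged = [8, 10, 60]     # cumulative discharged patients

cfr_by_day = daily_cfr(active, deaths, discharged)
for day, rate in enumerate(cfr_by_day, start=1):
    print(f"day {day}: CFR = {rate:.1f}%")
```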
the linear increment computed using the lfm showed that r = . . the model, based on a theoretical biological delay of days in the computation of deaths, showed a lower value r = . . furthermore, this model had a concavity opposite to that revealed by data. conversely, in this phase, a bell-shaped increment related to the overwhelmed icus showed that r = . in fitting the data. this last model coincided with a double bell-shape model with a delay of only day between positive tests and deaths. mathematical models and parameters are often used in epidemiology to generate insight into the transmission dynamics of infectious diseases and to assess the potential impact of the different intervention strategies. first of all, italian data and our models supported the theoretical prediction that the italian trend of infected patients could be similar to that of china. this prediction was previously suggested by remuzzi and remuzzi on the basis of italian data recorded up to the th of march, to which a trend similar to that observed in the hubei province, china, was applied ( ). our results indirectly suggested that the italian interventions, mainly based on social distancing, have been effective in reducing the speed of contagions, as occurred in china. these restrictions seemed to reduce the increment of infected patients (often incorrectly reported as an exponential growth), preventing the intensive care units in the rest of italy from being overwhelmed as occurred in lombardy ( ). however, the resulting italian cfr was very high and progressively increased throughout march. this could be due to a miscomputation of cfr ( ) . however, figure clearly shows that the number of deaths increased following a second-order polynomial function with respect to the number of positive cases. in a theoretical stationary situation, cfr is expected to be constant, meaning that the number of deaths increases proportionally (linearly) with the number of positive cases.
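the difference between the two regimes can be written out explicitly. as a sketch, let N denote cumulative positive cases and D cumulative deaths; the coefficients a, b and c are placeholder symbols rather than values fitted in the study, and the zero intercept in the second-order form is an assumed simplification.

```latex
% constant CFR: deaths proportional to cases
D = c\,N
\quad\Longrightarrow\quad
\mathrm{CFR}(N) = \frac{D}{N} = c

% second-order relationship (zero intercept assumed)
D = a\,N + b\,N^{2}
\quad\Longrightarrow\quad
\mathrm{CFR}(N) = \frac{D}{N} = a + b\,N
```

under the second form with b > 0, the cfr is no longer a constant but grows linearly with the number of cases, which is the kind of progressive increment described above.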
but the high number of positive cases that occurred in lombardy in a small period might have overwhelmed the icus, having a secondary effect on the number of deaths in that italian region. in the case of covid- , the case fatality rate might be relevant for optimizing a health policy. many recent studies investigating this cfr have tried to explain the high value recorded in italy and progressively in other western countries ( , ) . our study showed that the italian data had a different and unexpected second-order increment of the number of deaths related to covid- with respect to the relevant number of infected patients. some authors have suggested that it could be due to the change in testing policy ( ), but the increasing trend occurred even after this change. other authors have suggested a correction in cfr computation for taking into account the time of incubation and worsening ( ) , but it seemed to fail in modeling the italian data. in fact, our results, obtained with different data analysis, seemed to show a delay ranging from - days between the curve of positive cases and that of deaths. furthermore, the concavity of the -week delayed cfr seemed to be opposite to that of data. the small delay found in our analyses was not compatible with a biological explanation, but it could be compatible with a health management explanation. this hypothesis seemed to be confirmed by a bell-shaped increment of deaths related to the difficulties of icus in managing a high number of patients with severe symptoms. it is possible that, although all the possible miscomputation of cfr could be related to an underestimation of positive cases, the italian cfr was affected by what happened in lombardy region, the region most infected. it was a scenario of an unexpected high number of cases, most of them recorded in a small area and in a short period of time (about weeks). 
the italian health policy was conceivably effective in attenuating the lombardy trend in the other regions, reducing the velocity of contagions thanks to the imposed social distancing. furthermore, in lombardy and in other regions, the number of beds in icus was increased. this possible explanation did not exclude that the high cfr was also due to an underestimation of positive cases. the emergency might also have led clinicians to focus on severe cases, progressively reducing the number of tests in mildly affected and asymptomatic people ( ) . both these explanations, related to health policy, could be concomitant with the progressively increased high value of italian cfr. many other countries are now facing the emergence of covid- , and the computation of cfr could be misleading, even taking into account the biological delay. in an emergency and rapidly changing scenario such as the italian one, the cfr should be interpreted from a dynamic perspective, as it is potentially affected by many changing variables with effects that are not necessarily linear. direct and indirect effects of a wide contagion should be taken into account. the analysis of the evolution of the italian cfr trend could be of help to further develop a suitable health policy in other countries. for example, in further studies, it could be important to assess the complementary value of cfr, which is related to recovered patients. there could be an important percentage of them needing rehabilitation of motor and respiratory functions. some of these patients may not be able to wait for the end of the emergency, but the health policy should face the problem of rehabilitation with a respect for safety. even unaffected older people may have motor deficits related to the long period spent at home. another aspect could be the psychological effects of covid- in recovered patients, including the fear of being infected or the psychological effects of social distancing in uninfected people ( ) .
the datasets for this study can be found in the repository of frontiers and also in the official site of the italian ministry of health (www.salute.gov.it). ethical review and approval was not required for the study on human participants in accordance with the local legislation and institutional requirements. written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements. mi performed all the data analysis and wrote the first draft of the manuscript. gm and sp provided important clinical content to the manuscript and supervised the study. this study was financed by italian ministry of health in the framework of current research for the line d of irccs fondazione santa lucia. the supplementary material for this article can be found online at: https://www.frontiersin.org/articles/ . /fmed. . /full#supplementary-material time to use the p-word? coronavirus enter dangerous new phase early transmission dynamics in wuhan, china, of novel coronavirus-infected pneumonia how will country-based mitigation measures influence the course of the covid- epidemic? critical care utilization for the covid- outbreak in lombardy, italy. early experience and forecast during an emergency response covid- and italy: what next? 
encyclopedia of forensic and legal medicine case-fatality rate and characteristics of patients dying in relation to covid- in italy real estimates of mortality following covid- infection incubation period of novel coronavirus ( -ncov) infections among travellers from wuhan available online at assessment of waveform similarity in clinical gait data: the linear fit method the covid- outbreak and psychiatric hospitals in china: managing challenges through mental health service reform the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. copyright © iosa, paolucci and morone. this is an open-access article distributed under the terms of the creative commons attribution license (cc by). the use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. no use, distribution or reproduction is permitted which does not comply with these terms. key: cord- -m xx authors: qian, jie; zhao, lin; ye, run-ze; li, xiu-jun; liu, yuan-li title: age-dependent gender differences of covid- in mainland china: comparative study date: - - journal: clin infect dis doi: . /cid/ciaa sha: doc_id: cord_uid: m xx background: the ongoing pandemic of novel coronavirus disease (covid- ) is challenging the global public health system. sex-differences in infectious diseases are a common but neglected problem. methods: we used the national surveillance database of covid- in mainland china to compare gender differences in attack rate (ar), proportion of severe and critical cases (pscc) and case fatality rate (cfr) in relation to age, affected province, and onset-to-diagnosis interval. results: the overall ar was significantly higher in female population than in males ( . versus . per million persons; p ˂ . ). 
by contrast, pscc and cfr were significantly lower among female patients ( . % and . %) than among males ( . % and . %), with ors of . and . , respectively (both p ˂ . ). the female-to-male differences were age-dependent, which were significant among people aged – years for ar, and in the patients of -years or older for both pscc and cfr (all p ≤ . ). the ar, pscc and cfr varied greatly from province to province. however, female-to-male differences in ar, pscc and cfr were significant in the epicenter, hubei province, where . % confirmed cases and . % deaths occurred. after adjusting for age, affected province and onset-to-diagnosis interval, the female-to-male difference in ar, pscc and cfr remained significant in multivariate logistic regression analyses. conclusions: we elucidate an age-dependent gender dimorphism for covid- , in which the females have higher susceptibility but lower severity and fatality. further epidemiological and biological investigations are required to better understand the sex-specific differences for effective interventions. the coronavirus disease , caused by severe acute respiratory syndrome coronavirus (sars-cov- ), was first reported in wuhan, china in december [ ] , and is leading to a global health crisis. considering that gender differences in human infectious diseases are a common but neglected global health problem [ ] , there is a great need to investigate this question with regard to covid- . global health / , an independent health equity research organization based at university college london, compiled sex-disaggregated infection and mortality data available from tens of affected countries, and implied that the male patients were more likely to die than the female patients. however, data have so far provided no clear pattern in terms of who is more likely to become infected with sars-cov- [ ] . furthermore, no data on severity of the disease are available for them to do the comparative analyses. 
in mainland china, some reports have mentioned differences in fatality between male and female patients using data only from early reported cases or hospital settings [ ] [ ] [ ] . in this study, we used the surveillance data containing all confirmed cases in mainland china as of april , to evaluate gender-specific differences in attack rate, proportion of severe and critical cases, and case fatality in relation to age, affected province and onset-to-diagnosis interval, in order to provide evidence-based guidance for more effective and equitable interventions and treatments. according to the diagnosis and treatment protocol for novel coronavirus pneumonia (trial version ), which was updated by national health commission & state administration of traditional chinese medicine on march , (supplemental material) [ ] , confirmed cases were patients who had related epidemiological history and clinical manifestations, with one of the following etiological or serological evidences: sars-cov- nucleic acid detected by specific real-time rt-pcr assay, viral gene sequence homologous to sars-cov- , specific igm and/or igg detectable in serum, or a -fold increase in igg titer in convalescent serum compared with the acute phase. among the confirmed cases, if patients had mild symptoms but no sign of pneumonia on imaging, they were defined as mild cases. if patients presented fever and respiratory symptoms with radiological findings of pneumonia, they were defined as moderate cases. if adult patients met any of the following criteria, i.e. respiratory distress (≧ breaths per minute, bpm ), oxygen saturation ≤ % at rest, or arterial partial pressure of oxygen (pao ) / fraction of inspired oxygen (fio ) ≦ mm hg (1 mm hg = . kpa), they were defined as severe cases. 
the criteria for severe child cases were as follows: respiratory distress (≥ bpm for infants aged below months, ≥ bpm for infants aged - months, ≥ bpm for children aged - years, and rr ≥ bpm for children above years old), oxygen saturation ≤ % at rest, having labored breathing, cyanosis or intermittent apnea, showing lethargy and convulsion, or having difficulty for feeding and signs of dehydration. critical cases were defined if they had respiratory failure requiring mechanical ventilation, shock, or other organ failure that requires care in the intensive care unit. we collected data of all confirmed covid- cases reported to the china information system for diseases control and prevention (cisdcp), official reports by the national, provincial, and municipal health commissions as of april , . the surveillance data included the information on age, sex, occupation, residence location, date of illness onset, date of diagnosis, and disease classification. according to the regulations issued by the central government of mainland china, all the confirmed patients should be admitted to either general hospitals or temporary cabin hospitals until recovery from covid- or death. the disease classification was duly updated according to the change in clinical manifestations of each case. as this study constituted public health surveillance rather than research in human beings, ethical approval from institutional review boards was not required. all the information regarding individual persons had been anonymized. to estimate the differences between groups, the student's t test for a continuous variable, and the chi-square test or a fisher's exact test for a categorical variable were used where appropriate. the administrative divisions including provinces, autonomous regions, and municipalities of china were all referred to as provinces in the paper for simplicity. 
we evaluated the association between gender and ar in each age group and affected province, and then estimated risk ratio (rr) and its % ci by woolf method. we compared the psccs and cfrs between female and male patients in each age group and affected province, and then estimated odds ratio (or) and its % ci by maximum likelihood method. the gender difference in either pscc or cfr was validated by the multivariate logistic regression analysis using spss software (version . ) by including gender as an independent variable and age group, affected province, and onset-to-diagnosis interval as co-variables. a two-sided p value less than . was considered to be significant. as of april , , a total of , confirmed cases were reported, of which , ( . %) were female ( table ). the median age of the patients was years (iqr - ), with a mean (±sd) of . ± . years. there was no significant difference in age distribution between female and male patients. the number of health care workers (hcws) was , accounting for . % ( % ci, . - . %) of total cases. the female cases ( ) among hcws outnumbered males ( ) (female, . %; male, . %; p < . ). the mean (±sd) time from illness onset to diagnosis was . ± . days. the onset-to-diagnosis interval was significantly longer among female cases than among male cases. the overall ar was . per million persons ( % ci, . - . ), which was significantly higher in female population than in males ( . versus . per million persons; p ˂ . ). among the confirmed cases, , ( . %) were severe and ( . %) were critical, with an overall pscc of . %. the pscc was significantly lower in females ( / , . %) than in males ( / , . %) with a female-to-male or was . ( % ci, . - . ; p < . ). the overall cfr was . % ( / ), which was significantly lower among female patients ( / , . %) than among male patients ( / , . %), with an or of . ( %ci . - . ; p < . ). the pscc ( . %) and cfr ( . 
%) among hcws were significantly lower than other cases ( . % and . %; p < . ). similarly, pscc and cfr were significantly lower in female ( . % and . %) than male hcws ( . % and . %) ( both p < . ). the overall ar of covid- was significantly increased with age (χ² for trend test, p < . ), with people over years having an ar . times higher than those under years of age ( table ). the older the patients, the more severe their illness. the pscc was continuously increased with age (χ² for trend test, p < . ). the psccs were lower in female than male cases in all age groups except - years. the female-to-male ors were significant in the age groups older than years (all p < . ) (figure b, supplemental table ). the cfr sharply grew with age (χ² for trend test, p < . ). similar to psccs, cfrs were lower among female patients in all age groups, and the gender differences in cfr were significant in the patients -years or older (all p < . ) ( figure c , supplemental table ). table ). the pscc was lower among female than among male cases in all provinces except shandong province. the female-to-male differences in pscc were statistically significant in hubei, zhejiang, guangdong, hunan and jiangxi provinces ( figure b , supplementary table ). . % of all dead cases occurred in hubei province, where significant female-to-male difference in cfr was observed, with an or of . ( % ci . - . ; p < . ) ( figure c , supplementary table ). the cfr of female patients was comparable to that of male patients in rest of china (or = . ; % ci . - . ; p = . ). we conducted a multivariate logistic regression to validate the gender-differences in pscc and cfr by adjusting for age, affected province, and onset-to-diagnosis interval, which were significant in the univariate analyses. 
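the unadjusted female-to-male risk ratios and odds ratios reported above can be sketched from a 2x2 table as follows. this is a minimal illustration using the standard log-scale (woolf-type) confidence interval; it does not reproduce the spss analysis of the study, and all function names and counts are hypothetical.

```python
import math


def _log_scale_ci(estimate, se_log, z=1.96):
    """Confidence interval built on the log scale (z = 1.96 for 95%)."""
    return estimate * math.exp(-z * se_log), estimate * math.exp(z * se_log)


def odds_ratio(events_f, non_events_f, events_m, non_events_m):
    """Female-to-male odds ratio with a woolf (logit) confidence interval."""
    estimate = (events_f * non_events_m) / (non_events_f * events_m)
    se_log = math.sqrt(
        1 / events_f + 1 / non_events_f + 1 / events_m + 1 / non_events_m
    )
    low, high = _log_scale_ci(estimate, se_log)
    return estimate, low, high


def risk_ratio(cases_f, population_f, cases_m, population_m):
    """Female-to-male risk ratio (ratio of attack rates) with a log-scale CI."""
    estimate = (cases_f / population_f) / (cases_m / population_m)
    se_log = math.sqrt(
        1 / cases_f - 1 / population_f + 1 / cases_m - 1 / population_m
    )
    low, high = _log_scale_ci(estimate, se_log)
    return estimate, low, high


# hypothetical counts, not taken from the surveillance data
or_est, or_low, or_high = odds_ratio(10, 90, 20, 80)   # e.g. deaths vs survivors
rr_est, rr_low, rr_high = risk_ratio(10, 100, 20, 100)  # e.g. cases vs population
print(f"or = {or_est:.3f} ({or_low:.3f}-{or_high:.3f})")
print(f"rr = {rr_est:.3f} ({rr_low:.3f}-{rr_high:.3f})")
```

an interval that excludes 1 corresponds to a significant female-to-male difference at the chosen level; adjustment for age, province, and onset-to-diagnosis interval would require a regression model instead.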
considering both pscc and cfr were increased with age group and onset-to-diagnosis interval, we included them as continuous co-variables, while affected province as categorical co-variables in the models. we revealed that female-to-male or for pscc remained significant after adjusting for those possible confounding variables, with an adjusted or of . ( % ci . - . ). older age and longer onset-to-diagnosis interval were also identified as risk predictors for severity of illness (both p < . ) ( table ) . similarly, the adjusted female-to-male or of cfr was . with a % ci of . - . (p < . ). in the final model for cfr, the or for age was . ( % ci . - . ) with each -year increase (p < . ), and or for onset-to-diagnosis interval was . ( % ci . - . ) for each day longer (p < . ) ( table ) . as the world responds to the unprecedented pandemic of covid- , it is critical to recognize the populations at high-risk for sars-cov- infection and disease severity for creating effective surveillance and targeted interventions. because gender is a determinant of health [ ] , understanding the extent to which outbreaks affect women and men differently is a fundamental step to evaluating the primary and secondary effects of a health emergency on individuals and communities [ ] . we did comparative analyses using the national surveillance data containing all confirmed covid- cases of mainland china. the age-dependent gender differences in incidence, severity and fatality of covid- imply that more intensive public health surveillance and preventions should focus on women older than years especially in the epicenter to control the transmission more efficiently. on the other hand, more attention should be given to male patients especially those over years of age for enhanced clinical management. 
furthermore, our findings on gender differences have also provided evidence for addressing the health needs of men and women equally, so as to help policy makers and societies prevent future human tragedies [ , ] . at first glance, covid- seems to occur equally among women and men. because there are more men than women in the general population of mainland china, we look at the ar by taking sex constitution into consideration. as a result, the female tendency is significant especially in hubei province, where . % cases occurred. the gender difference in ar in mainland china is age-dependent, with the peak in individuals aged - years. this is disparate from that in the republic of korea, where the highest rate is among people aged - years, with a much greater female-to-male ratio of nearly : [ ] . these findings in the two early affected countries imply that women are in general more likely to be infected by sars-cov- , especially in some specific age groups. the infection of sars-cov- is primarily through angiotensin converting enzyme (ace ) receptor, which serves as a gateway for the virus's entry into tissues [ ] . the ace gene is located on the x chromosome, therefore female individuals should have higher ace levels [ ] , which might be the reason for their greater susceptibility to sars-cov- infection in comparison to males. further investigations into ace enzyme activity in correlation with sex are required to verify the hypothesis. in addition to the biological factors, age-related social and behavioral factors might have contributed to the age-dependent gender difference in covid- morbidity. lack of adherence to social distancing and self-quarantine recommendations initiated by korean health authorities is supposed to be the risk factor for the higher infection rates among the young adults and teenagers as well as the shincheonji religious community [ ] . 
lockdown of cities and closure of schools to control covid- transmission in china might have increased risk of sars-cov- infection in women, who have provided care in families and communities. as seen in the outbreak of ebola virus disease in west africa during to , women were more likely to be infected, given their predominant roles as caregivers within families and as front-line health-care workers [ ] . the higher covid- morbidity in female than male population might also come from women being more likely to see a doctor after symptom onset. an example is that the incidence rate of zika virus disease for persons seeking care was higher among women than among men during the outbreak in micronesia [ ] . systematic investigations are required to understand whether observed age-dependent gender difference in ar is due to differences in infection rates, development of disease, seeking medical care, or reporting bias. our analyses revealed that both pscc and cfr were lower among female patients than among male patients in nearly all provinces and all age groups. these findings are consistent with the results of previous reports based on hospital data in china and in other affected countries [ , , , ] . the reasons for gender difference in the severity and fatality of covid- might be attributed to underlying comorbidity and higher risk behaviors such as smoking [ , , ] . the higher female proportion of hcws might have some contribution to the gender difference in pscc and cfr, because hcws tend to have less severe illness as observed in our study and in the united states [ ] . female individuals generally have stronger innate and adaptive immune responses than males, because the x-chromosome contains more copy numbers of immune-related genes [ ] , which might lead to more prompt clearance of sars-cov- in women, and subsequently decrease the severity and fatality of the disease. 
in addition, sex-dependent production of steroid hormones may contribute to gender specific disease outcomes after virus infections [ , ] . a recent observation that the female patients have higher level of igg antibody against sars-cov- compared with male patients [ ] , provides direct evidence for sex differences in immune responses. further investigations on the association between stronger immune response and lower severity in females are warranted. sex-differences in ace might also play a role in pathogenesis, because ace can protect against lung damage through its anti-inflammatory function [ ] . therefore, the higher ace levels among women are supposed to protect them from more severe disease [ ] . the study had some limitations. first, we used the database of cisdcp, in which the individual characteristics relevant to gender, such as socioeconomic status, comorbidity, and immunological condition, were not recorded. lack of such information has prohibited us from further investigating their possible impacts on gender differences. second, we did comparative analyses using the surveillance data, which did not include the information on the clinical management. unfortunately, we could not compare any treatments given that may confound the results regarding pscc and cfr between female and male patients. in fact, treatments in different hospitals and areas varied, and even in the same hospital the treatments might be different among female and male patients. third, missed diagnosis is unavoidable due to lack of health facilities and/or laboratory capacity in the early stage of the outbreak. this situation has certainly led to under-estimates of covid- burdens, and might cause bias in some specific groups. in conclusion, this report raises awareness about the age-based gender differences in incidence, severity and fatality of covid- . 
interestingly, the females might be more prone to get the disease, but less likely to have a poor or fatal outcome. the age-dependent gender dimorphism in covid- might be attributable to various factors, and deserves further investigations on immune responses and other biological mechanisms for sex differences. policies and public health efforts have rarely addressed the gendered impacts of disease outbreaks [ ] . our gender analyses using the data from the first outbreak country have not only given insight into the gender differences, but also provided evidence for targeted treatment, jq, xjl and yll designed the study, performed the main data analysis, and wrote the paper. jq, lz, rzy and xjl managed the data and did the statistical analysis. this work is supported by the natural science foundation of china ( ) and peking union medical college education fund. we acknowledge the china cdc for their valuable assistance in coordinating data collection. we declare that we have no conflicts of interest. clinical features of patients infected with novel coronavirus in wuhan, china sex differences in infectious diseases-common but neglected epidemic update and risk assessment of novel coronavirus -china epidemiological and clinical characteristics of cases of novel coronavirus pneumonia in wuhan, china: a descriptive study clinical findings in a group of patients infected with the novel coronavirus (sars-cov- ) outside of wuhan, china: retrospective case series diagnosis and treatment protocol for novel coronavirus pneumonia addressing sex and gender in epidemic-prone infectious diseases on behalf of the gender and covid- working group. 
covid- : the gendered impacts of the outbreak better science with sex and gender: facilitating the use of a sex and gender-based analysis in health research the gendered dimensions of covid- disparities in age-specific morbidity and mortality from sars-cov- in china and the republic of korea soluble angiotensin-converting enzyme : a potential approach for coronavirus infection therapy? sex differences in angiotensin-converting enzyme modulation of ang ( - ) levels in normotensive wky rats a gendered human rights analysis of ebola and zika: locating gender in global health emergencies zika virus outbreak on yap island, federated states of micronesia gender differences in patients with covid- : focus on severity and mortality clinical characteristics of coronavirus disease in china sex difference and smoking predisposition in patients with covid- characteristics of health care personnel with covid- -united states the x chromosome and sex-specific effects in infectious disease susceptibility sex differences in susceptibility to viral infection igg antibody between male and female covid- patients: a possible reason underlying different outcome between sex overcoming the "tyranny of the urgent": integrating gender into disease outbreak preparedness and response pscc (%) ( % ci) cfr (%) ( % ci) abbreviations: iqr, interquartile range; sd, standard deviation ar, attack rate; pscc, proportion of severe and critical cases; cfr, case fatality rate key: cord- -zklwovba authors: jombart, thibaut; van zandvoort, kevin; russell, timothy w.; jarvis, christopher i.; gimma, amy; abbott, sam; clifford, sam; funk, sebastian; gibbs, hamish; liu, yang; pearson, carl a. b.; bosse, nikos i.; eggo, rosalind m.; kucharski, adam j.; edmunds, w. john title: inferring the number of covid- cases from recently reported deaths date: - - journal: wellcome open res doi: . /wellcomeopenres. . 
sha: doc_id: cord_uid: zklwovba we estimate the number of covid- cases from newly reported deaths in a population without previous reports. our results suggest that by the time a single death occurs, hundreds to thousands of cases are likely to be present in that population. this suggests containment via contact tracing will be challenging at this point, and other response strategies should be considered. our approach is implemented in a publicly available, user-friendly, online tool. as the coronavirus- (covid- , ) epidemic continues to spread worldwide, there is mounting pressure to assess the scale of epidemics in newly affected countries as rapidly as possible. we introduce a method for estimating cases from recently reported covid- deaths. results suggest that by the time the first deaths have been reported, there may be hundreds to thousands of cases in the affected population. we provide epidemic size estimates for several countries, and a user-friendly, web-based tool that implements our model . using deaths to infer cases covid- deaths start to be notified in countries where few or no cases had previously been reported . given the nonspecific symptoms , and the high rate of mild disease , a covid- epidemic may go unnoticed in a new location until the first severe cases or deaths are reported . available estimates of the case fatality ratio, i.e. the proportion of cases that are fatal (cfr, , ), can be used to estimate the number of cases who would have shown symptoms at the same time as the fatal cases. we developed a model to use cfr alongside other epidemiological factors underpinning disease transmission to infer the likely number of cases in a population from newly reported deaths. our approach involves two steps: first, reconstructing historic cases by assuming non-fatal cases are all undetected, and, second, model epidemic growth from these cases until the present day to estimate the likely number of current cases. 
we account for uncertainty in the epidemiological processes by using stochastic simulations for estimation of relevant quantities. two pieces of information are needed to reconstruct past cases: the number of cases for each reported death, and their dates of symptom onset. intuitively, the cfr provides some information on the number of cases, as it represents the expected number of deaths per case, so that $\mathrm{cfr}^{-1}$ corresponds to the expected number of cases per death. in practice, the number of cases until the first reported death can be drawn from a geometric distribution with an event probability equal to the cfr. note that while our approach could in theory use different cfr for each case (to account for different risk groups), our current implementation uses the same cfr for all cases in a simulation. dates of symptom onset are simulated from the distribution of the time from onset to death, modelled as a discretised gamma distribution with a mean of days and a standard deviation of . days . once past cases are reconstructed, we use a branching process model for forecasting new cases , . this model combines data on the reproduction number (r) and serial interval distribution to simulate new cases $y_t$ on day $t$ from a poisson distribution: $y_t \sim \mathrm{poisson}\left(r \sum_{s=1}^{t-1} y_s \, w(t-s)\right)$, where w(.) is the probability mass function of the serial interval distribution. more details on this simulation model can be found in jombart et al. . optionally, this model can also incorporate heterogeneity in transmissibility using a negative binomial distribution instead of poisson. the serial interval distribution was characterized as a discretized lognormal distribution with mean . days and standard deviation . days . we assume that past cases caused secondary transmissions independently (i.e. are not ancestral to each other), so that simulated cases for each death can be added. this assumption is most likely to be met when reported deaths are close in time. 
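the two steps described above can be sketched in python as follows. this is a simplified illustration and not the authors' r implementation: onset dates are not simulated from the onset-to-death distribution (all reconstructed cases are placed on a single day), the serial interval is a made-up discrete distribution, and the names `draw_cases_per_death`, `draw_poisson` and `project_cases`, as well as all parameter values, are hypothetical.

```python
import math
import random


def draw_cases_per_death(cfr, rng):
    """Geometric draw: number of cases up to and including the first fatal one."""
    if cfr >= 1.0:
        return 1
    u = rng.random()  # uniform on [0, 1)
    # inverse-transform sampling of a geometric distribution on 1, 2, 3, ...
    return max(1, math.ceil(math.log(1.0 - u) / math.log(1.0 - cfr)))


def draw_poisson(lam, rng):
    """Poisson draw: knuth's method, normal approximation for large means."""
    if lam <= 0:
        return 0
    if lam > 500:
        # approximation only, used to avoid underflow of exp(-lam)
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def project_cases(past_cases, r, serial_interval_pmf, n_days, rng):
    """Branching process: new cases ~ poisson(r * sum_s y_s * w(t - s)).

    serial_interval_pmf[k] is the probability of a serial interval of k + 1 days.
    """
    y = list(past_cases)
    for _ in range(n_days):
        t = len(y)
        force = sum(
            y[s] * serial_interval_pmf[t - s - 1]
            for s in range(t)
            if t - s - 1 < len(serial_interval_pmf)
        )
        y.append(draw_poisson(r * force, rng))
    return y


rng = random.Random(42)
serial_interval_pmf = [0.1, 0.2, 0.3, 0.2, 0.1, 0.1]  # hypothetical, sums to 1
n_past = draw_cases_per_death(0.02, rng)  # hypothetical cfr of 2%
trajectory = project_cases([n_past], 2.0, serial_interval_pmf, 10, rng)
print("reconstructed cases:", n_past, "total after 10 days:", sum(trajectory))
```

repeating the draw and the projection many times, and adding the trajectories obtained for each reported death, would give a distribution of plausible epidemic sizes under the independence assumption stated in the text.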
as the time between reported deaths increases, past cases may come from the same epidemic trajectory rather than separate, additive ones, in which case our method would overpredict epidemic size. further details on model design and parameters values are provided in supplementary material. our approach is implemented in the r software and publicly available as r scripts (see extended data) , as well as in a user-friendly, interactive web-interface available at: https://cmmid.github.io/visualisations/ inferring-covid -cases-from-deaths . we first used our model to assess likely epidemic sizes when an initial covid- death is reported in a new location. we ran simulations for a range of plausible values of r ( . , and ) and cfr ( %, %, % and %), assuming a single death on the st march . , epidemic trajectories were simulated for each parameter combination. simulations for an 'average severity' scenario with r = and cfr = % show that by the time a death has occurred, hundreds to thousands of cases may have been generated in the affected population ( figure ) . results vary widely across other parameter settings, and amongst simulations from a given setting (table ) , with higher r and lower cfr leading to higher estimates of the numbers of cases. however, a majority of settings give similar results to our 'average' scenario, suggesting that a single death is likely to reflect several hundreds of cases. results were qualitatively unchanged when incorporating heterogeneity in the model using recent estimates , but prediction intervals were wider (extended data). we applied our approach to three countries which recently reported their first covid- deaths (spain, italy, and france), using the same range of parameters as in the single-death analysis. in order to compare predictions to cases actually reported in these countries, projections were run until th march. 
overall, predictions from the model using the baseline scenario (r = , cfr = %) were in line with reported epidemic sizes.
several limitations need to be considered when applying our method. first, our approach only applies to the deaths of patients who have become symptomatic in the location considered, which should usually be the case in places where traveler screening is in place. we also assume constant transmissibility (r) over time, which implies that behavior changes and control measures have not taken place yet, and that there is no depletion of susceptible individuals. consequently, our method should only be used in the early stages of a new epidemic, where these assumptions are reasonable. similarly, the assumption that each death reflects independent, additive epidemic trajectories is most likely to hold true early on, when reported deaths are close in time (e.g. no more than a week apart). used on deaths spanning longer time periods, our approach is likely to overestimate epidemic sizes.
contact tracing has been shown to be an efficient control measure when imported cases can be detected early on, in addition to permitting the estimation of key epidemiological parameters. when the first cases reported in a new location are mostly deaths, however, our results suggest that the underlying size of the epidemic would make control via contact tracing extremely challenging. in such situations, efforts focusing on social distancing measures such as school closures and self-isolation may be more likely to mitigate epidemic spread.
underlying data: all data underlying the results are available as part of the article and no additional source data are required. this project contains the file 'extended_data' (pdf), which contains supplemental information and methodological details regarding the model described in this article. extended data are available under the terms of the creative commons attribution . international license (cc-by . ).
the shiny app using the model is available at: https://cmmid.github.io/visualisations/inferring-covid -cases-from-deaths. source code and r scripts are available at: https://github.com/thibautjombart/covid _cases_from_deaths.
cmmid covid- working group gave input on the method, contributed data and provided elements of discussion. all authors read and approved the final version of the manuscript. the following authors were part of the centre for mathematical modelling of infectious disease -ncov working group: mark jit, charlie diamond, fiona sun, billy j quilty, kiesha prem, nicholas davies, stefan flasche, alicia rosello, james d munday, petra klepac, joel hellewell. each contributed in processing, cleaning and interpretation of data, interpreted findings, contributed to the manuscript, and approved the work for publication.
what would the effect of a heterogeneous cfr be? (i believe this would correspond e.g. to a 'beta-geometric distribution', unless one instead wanted to treat it as a finite mixture of probabilities for discrete risk categories). it would be nice to have a little more detail (i.e. a few sentences) on the simulation procedure. i see how to get from cfr and deaths to a total number of preceding cases, and how to simulate times of symptom onset for the observed deaths. it's not completely obvious to me how to get from there to 'history of past cases' (i.e. incidence over time); does one run the renewal process backward in time? or use branching-process theory to find the time distribution of symptom onset of the index case given the current size of the epidemic? please clarify "we assume that past cases caused secondary transmissions independently (i.e. are not ancestral to each other), so that simulated cases for each death can be added."
does this mean that you assume that all observed deaths are from separate lineages/transmission chains? (the last sentence of the paragraph suggests that, but the initial statement could probably be clearer.) (does this assumption even matter if we are in the branching-process regime?). i appreciate that the authors are trying to keep things simple, and thus the scenario-based approach (try the model for a range of cfr/r values and see what is implied) is useful. i note that the confidence intervals are already very wide (that's part of the point), but there are several quantities that are treated as known (delay distribution, serial interval distribution); i wonder how sensitive the results are to these assumptions (probably not much - i'm guessing that with r specified they might only change the timing, not the numbers). given that the authors are already basing the answers on , solutions, it might not be too hard to construct point estimates and intervals based on a prior/uncertainty distribution of r and cfr (rather than constructing separate scenarios), and allowing for uncertainty in the delay and serial distributions.
minor comments/typos: intro, line ; methods, l. : extra comma inside parens before superscript refs?; "use [a] different cfr for each case"; "parameters" values; "theunderlying"; "schoolclosures"; in tables and consider stating " . % quantile, % quantile, % quantile, . % quantile" (rather than lower/upper x %/ %)?
is the description of the method technically sound? yes
who: novel coronavirus -china.
inferring covid- cases from recent death.
world health organization: coronavirus disease (covid- ).
world health organization: pneumonia of unknown cause--china.
the incubation period of -ncov from publicly reported confirmed cases: estimation and application. medrxiv.
estimating underdetection of internationally imported covid- cases. medrxiv.
estimating the case fatality risk of covid- using cases from outside china. medrxiv.
midas-network: covid- parameter estimates.
incubation period and other epidemiological characteristics of novel coronavirus infections with right truncation: a statistical analysis of publicly available case data.
a simple approach to measure transmissibility and forecast incidence.
the cost of insecurity: from flare-up to control of a major ebola virus disease hotspot during the outbreak in the democratic republic of the congo.
serial interval of novel coronavirus ( -ncov) infections. medrxiv.
r foundation for statistical computing.
extended data for: inferring the number of covid- cases from recently reported deaths.
pattern of early human-to-human transmission of wuhan novel coronavirus ( -ncov).
this article describes a statistical modeling method for estimating the number of covid- cases from the first reported deaths in a defined location. the described methodology can provide useful information for decision making, especially as a shiny app has been developed for facilitating quick application of the method by public health practitioners, and the r code has been made available. i would be interested to see in the text a few words about how many (and which) countries found themselves in the situation of observing no covid- case before the first deaths were reported. the reference provided (number ) is not really specific about this point. the statistical method is well described and seems sound. i have a minor comment: in practice, published estimates of the cfr and r will be used as input parameters for the model. these estimates are derived from samples and are usually published with a certain measure of uncertainty, typically the standard deviation or a confidence interval.
my understanding is that this estimation uncertainty on these input parameters is not taken into account in the prediction model: instead, the cfr and r are held constant for all simulations drawn with a set of parameters. taking into account the uncertainty on these input parameters may lead to even wider prediction intervals, but may reflect more completely the uncertainty about the total number of cases given the current knowledge about the disease at a certain point in time. this could be done, for example, by drawing the cfr from a beta distribution with parameters a and b derived from the published mean and sd instead of holding it constant. in the shiny app, the user could provide the confidence interval.
this is a useful, technically correct, and clearly written contribution. could the authors comment on how much extra mileage one gets/advantages of this approach relative to simply saying that the current number of cases is approximately equal to 1/cfr? that is, does one have to reconstruct the past history to know how much trouble one is currently in?
if any results are presented, are all the source data underlying the results available to ensure full reproducibility? yes
are the conclusions about the method and its performance adequately supported by the findings presented in the article? yes
competing interests: no competing interests were disclosed.
reviewer expertise: ecology, evolution, epidemiological modeling
i confirm that i have read this submission and believe that i have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.
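the reviewer suggestion above, drawing the cfr from a beta distribution whose parameters a and b are derived from a published mean and sd, can be sketched with a method-of-moments conversion. this is an illustration only: the function names are hypothetical, the mean of 0.02 and sd of 0.005 in the demo are placeholder values and not estimates from any cited study, and nothing here is part of the authors' shiny app or r scripts.

```python
import random


def beta_params_from_moments(mean, sd):
    """Method-of-moments beta parameters (a, b) for a proportion with the given mean and sd."""
    var = sd ** 2
    if not 0.0 < mean < 1.0:
        raise ValueError("mean must lie strictly between 0 and 1")
    if var <= 0.0 or var >= mean * (1.0 - mean):
        raise ValueError("sd is incompatible with a beta distribution for this mean")
    common = mean * (1.0 - mean) / var - 1.0  # equals a + b
    return mean * common, (1.0 - mean) * common


def draw_cfr(mean, sd, rng):
    """One cfr value per simulation, instead of holding the cfr constant."""
    a, b = beta_params_from_moments(mean, sd)
    return rng.betavariate(a, b)


if __name__ == "__main__":
    rng = random.Random(42)
    # Placeholder inputs: a hypothetical published cfr of 2% with sd 0.5%.
    a, b = beta_params_from_moments(0.02, 0.005)
    print("beta parameters:", a, b)
    print("five sampled cfr values:", [draw_cfr(0.02, 0.005, rng) for _ in range(5)])
```

each sampled value would then replace the fixed cfr as the event probability of the geometric draw for one simulation, so that the spread of the prediction intervals also reflects uncertainty in the cfr estimate itself.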