key: cord-0683802-t6w9y8wh authors: Wood, Simon N; Wit, Ernst C; Fasiolo, Matteo; Green, Peter J title: COVID-19 and the difficulty of inferring epidemiological parameters from clinical data date: 2020-05-28 journal: Lancet Infect Dis DOI: 10.1016/s1473-3099(20)30437-0 sha: 5a084304b9b355f26db3a38b1626c71abf4e84f5 doc_id: 683802 cord_uid: t6w9y8wh

We attempt to construct a model for the Diamond Princess (henceforth DP) data and aggregated data from China, with the intention that the DP data inform the absolute magnitude of the IFR while the China data contribute to the estimation of relative IFR by age class. For the Diamond Princess we lump the 80-89 and 90+ age groups into an 80+ group to match the China data, noting that there are no deaths in the 90+ group. We obtained the age at death of the 12 cases from the Diamond Princess Wikipedia page, checking the news reports on which the information was based. One case has no age reported except that he was an adult. Given that there was no mention of a young victim, we have assumed that he was 50 or over. We adopt the assumptions of Verity et al. (2020) of a constant attack rate with age, and that there is perfect ascertainment in one age class, but assume that this is the 80+ age group for the DP. The assumption seems more tenable for the DP population than for China, given that 4003 PCR tests were administered to the 3711 people on board, with the symptomatic and elderly tending to be tested first. However, given that the tests were not administered weekly, from the start of the outbreak, to all people who had not yet tested positive, and that the tests are not 100% reliable, the assumption is still unlikely to hold exactly, which may bias results upwards. Unlike Verity et al. (2020) we do not correct the case data, but adopt a simple model for under-ascertainment by age, allowing some, but by no means all, of the uncertainty associated with this assumption to be reflected in the intervals reported below.
We then model a proportion of the potentially detectable cases as being symptomatic, making a second strong assumption that this rate is constant across age classes. This assumption is made because the data only tell us that there were 314 symptomatic cases among 706 positive tests, but not their ages, so we have no information to further distinguish age-specific under-ascertainment from age-specific rates of being asymptomatic. We then adopt a simple model for the probability of death with age (quadratic on the logit scale). For the China data we necessarily use a different attack rate to the DP, but the same model as the DP to go from infected to symptomatic cases (on the basis that this reflects an intrinsic characteristic of the infectious disease). However, we assume that only a proportion of symptomatic cases are detected (at least relative to whatever threshold counted as symptomatic on the DP). Furthermore, we are forced to adopt a modified ascertainment model for China, and to correct for the difference between this and the DP ascertainment model within the sub-model for China. We assume the same death rates for symptomatic cases in China, but apply the Verity et al. (2020) correction for not-yet-occurred deaths, based on their fitted Gamma model, treating this correction as fixed.

Let $\alpha$ denote the attack rate on the DP, $p^c_i$ the probability that an infection in age class $i$ is detectable, $p^s_i$ the probability that a detectable case develops symptoms and $p^d_i$ the probability that a symptomatic case dies. Then $p^c_i p^s_i p^d_i$ is the IFR for age class $i$. Let $a_i$ denote the lower age boundary of class $i$. The models are (i) a model for the detectability probability $p^c_i$ (note the assumption that all cases in the two oldest age classes are ascertained); (ii) a constant symptomatic probability model, $p^s_i = \phi$; and (iii) a model for the probability $p^d_i$ that a symptomatic case dies, quadratic in age on the logit scale. For a case to be recorded on the DP, the person needed to have been attacked by the virus, to have become ill and to have been detected at the right moment.
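As a minimal sketch of how the three components combine into an IFR by age class, the R snippet below multiplies a detectability vector, a constant symptomatic probability and a death probability that is quadratic in age on the logit scale. The detectability values `p_c`, the coefficients `b0`, `b1`, `b2` and the centring of age at 40 are hypothetical placeholders chosen only for illustration; they are not the model's parametrisation or its estimates. The value used for `phi` is simply the crude ratio of 314 symptomatic cases to 706 positive tests quoted above, not a fitted value.

```r
## Illustrative only: all parameter values below are hypothetical placeholders,
## not estimates from the model described in the text.
a <- 0:8 * 10                        # lower age boundary of each class (0, 10, ..., 80)

## Hypothetical detectability probabilities, set to 1 in the two oldest classes
p_c <- c(0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1, 1)

## Constant symptomatic probability; crude ratio from the DP counts in the text
phi <- 314 / 706
p_s <- rep(phi, length(a))

## Probability that a symptomatic case dies: quadratic in age on the logit scale.
## Coefficients and the centring at age 40 are assumptions made for this sketch.
b0 <- -6; b1 <- 0.08; b2 <- 0.0005
p_d <- plogis(b0 + b1 * (a - 40) + b2 * (a - 40)^2)

## IFR for age class i is the product of the three probabilities
ifr <- p_c * p_s * p_d
print(round(100 * ifr, 3))           # IFR as a percentage by age class
```

Because the symptomatic probability is constant, any variation of the IFR with age in this sketch comes only from the detectability and death components.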
In principle, these requirements for a case to be recorded mean that the number of cases in age class $i$ is distributed as $\mathrm{binom}(p^c_i \alpha, n_i)$, where $p^c_i \alpha$ is the probability of becoming ill and being detected, and $n_i$ is the number of people in age class $i$ on the DP. However, as only 619 of the 706 cases have their age recorded, we split the cases into $C_i$, the observed cases of known age, and $C^+_i$, the additional cases, which are assumed to follow the same age distribution but were not actually recorded by age. Binomial parameters are rounded appropriately. We let $S_i$ denote the symptomatics among the cases in age group $i$. The distribution of the deaths, $D_i$, among the symptomatics of known age involves $h_i$, the probability of being of known age on death (this is treated as fixed at 1 for ages less than 50, and at 11/12 for 50+, given the one victim on the DP for whom no age was recorded, except that he was an adult). The deaths of unknown age, $D_{na}$ (there is one of these), are modelled as occurring among the symptomatics of unknown age (an artificial category), with their own probability of death. Finally, the total number of symptomatics is modelled as $S_t \sim N(\sum_i S_i, 5^2)$, allowing some limited uncertainty in the symptomatic/asymptomatic classification. The actual available data on the DP are $S_t$, $D_{na}$ and $\{C_i, n_i, D_i\}_{i=0}^{80}$.

Moving on to the Chinese data, the assumption is that the patterns with age with respect to detection ($p^c_i$), to being symptomatic ($p^s_i$) and to death ($p^d_i$) are similar, but that the attack rate $\tilde\alpha$ for China is different. Let $N_i$ be the population size in age class $i$ and $\tilde S_i$ the symptomatics. Unlike on the DP, only a fraction $\delta_i$ of the symptomatics are tested to become cases, and the distribution of the (observed) deaths involves $p^y_i$, the average probability of a case in class $i$ having died yet, given that they will die; this was treated as a fixed correction and is computed from the Verity et al.
(2020) estimated Gamma model of time from onset to death, and the known onset times for the cases. The scaling by $\delta_i$ ensures that $p^d_i$ maintains the same meaning between DP and China. We model $\delta_i$ as $\delta_i = \delta p^{cc}_i / p^c_i$, where $p^{cc}_i$ is an attempt to capture the shape of the actual China detectability with age and is defined as $p^{cc}_i = \exp\{-(a_i - 65)^2 / e^{\gamma_c}\}$. Normal densities in the priors are specified using precision, not variance. This structure uses the information from the DP to assess the symptomatic rate, the hidden case rate and the scale of the death probabilities, while the China data refine the information on how death rates change with age. It is possible to formulate a model in which the China data appear to contribute to inference about absolute levels of mortality, but this model is completely driven by the prior put on the proportion of cases observed (about which the China data are completely uninformative).

The model was implemented in JAGS 4.3.0. Mixing is slow, but $5 \times 10^7$ steps, retaining every 2500th sample (20,000 retained samples), give an effective sample size of about 660 for $\delta$, the slowest-moving parameter. We discarded the first 2000 retained samples as burn-in, although diagnostic plots show no sign that this is necessary. Posterior predictive distribution plots are shown in Figure 2. We note the problems with young Chinese detected cases, although even the most extreme mismatch corresponds only to a factor of 2 change in IFR, if it reflects incorrect numbers of actual cases. In older groups the model cases are a little high on average, but not by enough to suggest much change in IFR. These mismatches might be reduced by better models for the ascertainment proportion by age. Figure 1 shows the posterior predictive distribution for total Diamond Princess deaths, with the actual deaths as a thick red bar. The median and credible intervals for the IFR as percentages in various groups are in Table 1.
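The China detection fraction defined above can be sketched in a few lines of R. The formulas for $p^{cc}_i$ and $\delta_i$ follow the definitions in the text; the values of `gamma_c`, `delta` and the DP detectability vector `p_c` are hypothetical placeholders chosen only so that the example runs, not estimates from the fitted model.

```r
## Illustrative only: gamma_c, delta and p_c are hypothetical placeholder values.
a <- 0:8 * 10                              # lower age boundary of each class

gamma_c <- log(2000)                       # hypothetical; exp(gamma_c) sets the curve width
p_cc <- exp(-(a - 65)^2 / exp(gamma_c))    # shape of China detectability with age

p_c <- c(0.3, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1, 1)  # hypothetical DP detectability
delta <- 0.1                               # hypothetical overall detection scaling

delta_i <- delta * p_cc / p_c              # fraction of symptomatics detected, by class
print(round(cbind(age = a, p_cc = p_cc, delta_i = delta_i), 4))
```

Dividing by $p^c_i$ means that the product $\delta_i p^c_i$ equals $\delta p^{cc}_i$, so in this sketch the combined detection pattern for China follows the $p^{cc}_i$ curve, which is centred at age 65.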
The estimates of this crucial quantity differ from those of Verity et al. (2020), again emphasising the urgent need for statistically principled sampling data to measure prevalence directly, instead of having to rely on complex models of problematic data with strong built-in assumptions.

## Diamond Princess and China model - the two data sources that appear to
## contain information.
library(rjags)
load.module("bugs")
Sy)mptomatic total
## DNA is Deaths No Age, pa is probability of not knowing age.
-list(age = 0:8 * 10 , n = c )
## corrections for insufficient time to see all deaths
208)
###################
setwd("foo/bar") ## NOTE: set to jags file location
###################
jdp <- jags.model("dp-china.jags
system.time(um <- jags.samples(jdp, c
## burn-in
for (k in 1:4) {
  if (k==1) { pp <- um$Cpp; true <- dat$C; xlab <- "DP cases" }
  else if (k==2) { pp <- um$Dpp; true <- dat$D; xlab <- "DP deaths" }
  else if (k==3) { pp <- um$Cchpp; true <- dat$Cch; xlab <- "China cases" }
  else { pp <- um$Dchpp; true <- dat$Dch; xlab <- "China deaths" }
for
<- hist(log10(ifr), main=(i-1) * 10, xlab="log10(risk)")
DP-death-pp.eps
04) ## roughly China demography
## Wikipedia Indian demography
27) ## total pop statista
uk <- uk/sum(uk) ## 2018 UK demography
## Verity point estimate IFR by age
## DP deaths according to Verity and assuming all cases found
sum(uk * verity) ## UK IFR Verity point estimates
sum(uk * ci[2,]) ## UK IFR median point estimates
## overall IFR for various demographies
975)) * 100 ## China
pt <- uk %*% (um$ps
975)) * 100 ## UK
pt <- india %*% (um$ps
975)) * 100 ## India

Acknowledgements: we thank Jonathan Rougier and Guy Nason for helpful discussion of onset-to-death interval estimation and the individual level data.

Reference: Verity et al. (2020). Estimates of the severity of coronavirus disease 2019: a model-based analysis. The Lancet Infectious Diseases.