Title: Bounding Disease Prevalence by Bounding Selectivity and Accuracy of Tests: The Case of COVID-19
Author: Jörg Stoye
Date: 2020-08-14

Abstract: I propose novel partial identification bounds on disease prevalence from information on test rate and test yield. The approach broadly follows recent work by Manski and Molinari (2020) on COVID-19, but starts from user-specified bounds on (i) test accuracy, in particular sensitivity, and (ii) the extent to which tests are targeted, formalized as a restriction on the effect of true status on the odds ratio of getting tested and thereby embeddable in logit specifications. The motivating application is to the COVID-19 pandemic, but the strategy may also be useful elsewhere. Evaluated on data from the pandemic's early stage, even the weakest of the novel bounds are reasonably informative. For example, they place the infection fatality rate for Italy well above that of influenza by mid-April.

Prevalence of a novel disease like COVID-19 is a quintessential missing data problem: Only a small subset of the population has been tested, this subset is almost certainly selective, we do not even know the sensitivity of our tests, and our understanding of the pandemic is vague enough that we might not want to rely too much on heavily parameterized models. This is a natural application for partial identification analysis, i.e. the analysis of bounds on parameter values that can be inferred from imperfect data and weak but credible assumptions, without forcing statistical identifiability of a model. 1 In work that is already proving influential, Manski and Molinari (2020, MM henceforth) bring this analysis to prevalence estimation and illustrate one way to carry it out. I build on their work to propose a general framework for analyzing partial identification of disease prevalence, assuming that one has partially identifying information on the selectivity and sensitivity of diagnostic tests.

1 See Manski (2003) for an early monograph and Molinari (2020) for an extensive survey.

I strongly agree with the overall thrust of MM. I deviate from their approach because I refine worst-case bounds by placing a priori restrictions on test sensitivity and selectivity but not on negative predictive value (all terms will be defined later). These restrictions are arguably easier to relate to other literatures, and, unlike bounds on the negative predictive value, they can be asserted independently of prior bounds on prevalence itself. In the empirical application, bounds that only restrict the direction of selectivity are considerably more informative than the analogous bounds emphasized in MM, and yet I will argue that the underlying assumptions become more compelling. Bounds become much tighter if one decides to substantively restrict selectivity, though the fair comparison then is to other, also tighter, bounds in MM.

Consider first the problem of bounding prevalence of a disease in a stylized example where one has observed test rate and test yield for one population. I will call the disease COVID-19 henceforth, but the ideas are obviously more general. Thus, let C indicate true infection status (with C = 1 indicating infection), T test status (with T = 1 indicating having been tested), and R test result (with R = 1 a positive test result; we observe R only conditionally on T = 1). In particular, define the testing rate τ = Pr(T = 1) and the test yield γ = Pr(R = 1|T = 1).
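As a point of reference for the propositions below, the following minimal Python sketch (my own illustration; the paper's code is in MATLAB, and all counts here are hypothetical) computes τ and γ from counts and evaluates the worst-case bounds obtained when the test is treated as perfectly accurate and prevalence among the untested is left completely unrestricted:

```python
def rates_from_counts(n_tested, n_positive, population):
    """Testing rate tau = Pr(T=1) and test yield gamma = Pr(R=1|T=1)."""
    return n_tested / population, n_positive / n_tested

def worst_case_bounds(tau, gamma):
    """Bounds on rho = Pr(C=1) if the test is perfectly accurate:
    rho = tau*gamma + (1-tau)*Pr(C=1|T=0) with Pr(C=1|T=0) in [0, 1]."""
    return tau * gamma, tau * gamma + (1 - tau)

# Hypothetical counts: 200,000 tests, 30,000 positives, population 10 million.
tau, gamma = rates_from_counts(200_000, 30_000, 10_000_000)
print(worst_case_bounds(tau, gamma))  # roughly (0.003, 0.983)
```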
These objects are directly identified from the data, and we will assume that they are known; indeed, we abstract from inference theory throughout. We also maintain the assumption that (PCR-)tests for COVID-19 have specificity (= true negative rate Pr(R = 0|T = 1, C = 0)) of 1; thus, Pr(C = 1|R = 1) = 1.

I propose to refine the worst-case bounds (2.1) by asserting bounds on test sensitivity (i.e., the true positive rate Pr(R = 1|C = 1)) and on test selectivity (i.e., the relation of Pr(T = 1|C = 1) to Pr(T = 1|C = 0), but not either of these two probabilities by itself). I do not claim that any of these are context-independent, much less known; hence, prior information will be used in the form of bounds. However, test sensitivity relates directly to a large medical literature, and test selectivity can be easily related to econometric models of binary response. I next explain the approach and work out its implications.

Refinement: Allow for measurement error through bounding sensitivity. Sensitivity is a parameter that medical experts think about a lot. As will be discussed later, it is also the target parameter in much research on COVID-19.

Assumption 1: Sensitivity of the test is bounded:

π = Pr(R = 1|C = 1, T = 1) ∈ [π̲, π̄]. (2.2)

The effect of Assumption 1 on prevalence bounds is easily calculated.

Proposition 1: Suppose Assumption 1 holds. Then prevalence is sharply bounded:

ρ = Pr(C = 1) ∈ [τγ/π̄, τγ/π̲ + 1 − τ]. (2.3)

Proof. Write

Pr(C = 1) = Pr(C = 1|T = 1) Pr(T = 1) + Pr(C = 1|T = 0) Pr(T = 0). (2.4)

While no informative bound on Pr(C = 1|T = 0) is available, we have

Pr(R = 1|T = 1) = Pr(R = 1|C = 1, T = 1) Pr(C = 1|T = 1) + Pr(R = 1|C = 0, T = 1) Pr(C = 0|T = 1) = Pr(R = 1|C = 1, T = 1) Pr(C = 1|T = 1) = π Pr(C = 1|T = 1),

where the middle term vanishes by perfect specificity, implying (in the notation introduced above) that Pr(C = 1|T = 1) = γ/π ∈ [γ/π̄, γ/π̲]. The bounds follow by substituting into (2.4).

Remark 2.1: This result is easily extended to allow for specificity (= true negative rate Pr(R = 0|C = 0, T = 1)) to differ from 1. Indeed, the bounds simply adjust prevalence in the tested population through the well-known formula "prevalence = (yield + specificity − 1)/(sensitivity + specificity − 1)" and leave prevalence in the untested population unconstrained. This is not worked out to economize on notation.
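In code, Proposition 1 is a one-liner. The following sketch continues the illustration from above (the names are mine; clipping the bounds into [0, 1] is a safeguard I add, relevant only if π̲ < γ):

```python
def prevalence_bounds_prop1(tau, gamma, pi_lo, pi_hi):
    """Proposition 1: Pr(C=1|T=1) = gamma/pi with pi in [pi_lo, pi_hi],
    while Pr(C=1|T=0) remains unrestricted in [0, 1]."""
    lo = tau * gamma / pi_hi
    hi = tau * min(gamma / pi_lo, 1.0) + (1 - tau)
    return lo, min(hi, 1.0)

# Continuing the example: sensitivity between .7 and .95.
print(prevalence_bounds_prop1(0.02, 0.15, 0.70, 0.95))
# roughly (0.0032, 0.9843)
```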
Refinement: A "logit bound" on test selectivity. Consider also the following:

Assumption 2: The factor κ in

Pr(T = 1|C = 1) / (1 − Pr(T = 1|C = 1)) = κ × Pr(T = 1|C = 0) / (1 − Pr(T = 1|C = 0)) (2.5)

can be bounded as κ ∈ [κ̲, κ̄].

Assumption 2 bounds the relative odds ratio of being tested between true positives and true negatives. Of course, this is only one of many possible ways to constrain how targeted tests are. Considerations in favor of Assumption 2 are:

• In principle, bounds on κ can be asserted independently of any knowledge of, or bounds on, either prevalence ρ or the empirical test rate τ. This contrasts with the superficially simpler strategy of asserting bounds on Pr(T = 1|C = 1)/Pr(T = 1|C = 0). The latter quantity is bounded above by (1 − ρ)/(τ − ρ), so that the plausibility of an upper bound on it will depend on ideas about those other quantities. Similarly, Pr(C = 1|T = 1)/Pr(C = 1|T = 0) depends on τ and ρ. While one may plausibly restrict this fraction to exceed 1 (this is test monotonicity in MM), it may be difficult to convincingly assert a tighter bound.

• The bound is easily related to models of selection. In particular, bounding κ in the above is equivalent to bounding the exponentiated coefficient exp(β) in the logit model

Pr(T = 1|C = c) = exp(α + βc) / (1 + exp(α + βc)).

Logit models are well understood in econometrics and medical statistics, so this connection generates an interface to natural estimation strategies and perhaps to researchers' intuitions about plausible parameter values. Also, if covariates are included in the above logit, the equivalence is maintained conditionally on covariates (with κ constant across covariates).

The selectivity factor κ could be bounded from both above and below. In this paper's application, I will impose throughout that κ̲ ≥ 1, so that there is at least weak selection of infected subjects into testing, and I will consider values of κ̲ that force strict selection. Bounding selectivity from above, or also allowing for a lower bound below 1, may be interesting in other contexts, for example if getting tested is stigmatized or if tests are targeted, but not at the at-risk population.

The implications of bounding κ are slightly more involved.

Proposition 2: Suppose that Assumptions 1 and 2 hold. Then prevalence is sharply bounded by

ρ ∈ [ρ(π̄, κ̄), ρ(π̲, κ̲)], where ρ(π, κ) = (γ/π) × (π + (κ − 1)τ(π − γ)) / (κ(π − γ) + γ),

including the corresponding limiting expressions as κ̲ → 0 or κ̄ → ∞. In particular, if κ̄ = ∞ as in the empirical application, we have

ρ ∈ [τγ/π̄, (γ/π̲) × (π̲ + (κ̲ − 1)τ(π̲ − γ)) / (κ̲(π̲ − γ) + γ)]. (2.6)

Proof. To keep the algebra transparent, introduce the new notation τ_c = Pr(T = 1|C = c). Write

γ = Pr(R = 1 ∩ T = 1) / Pr(T = 1) = ρτ_1π/τ ⟹ τ_1 = γτ/(ρπ). (2.7)

Substituting the odds condition (2.5) into τ = ρτ_1 + (1 − ρ)τ_0, then substituting for τ_1 from (2.7), and some rearranging of terms leads to

ρ = (γ/π) × (π + (κ − 1)τ(π − γ)) / (κ(π − γ) + γ). (2.8)

This expression decreases in both π and κ, and the bounds follow by evaluating it at (π, κ) = (π̄, κ̄) respectively (π, κ) = (π̲, κ̲).

Note that these bounds effectively multiply sample prevalence by an adjustment factor that reflects test selectivity. As would be expected, the implied ρ decreases in κ and π. Note also (again as expected) that the expression simplifies to ρ = γ/π at κ = 1 (no selectivity would mean we estimate prevalence by prevalence in the tested subpopulation), to ρ = τγ/π as κ → ∞ (perfect targeting means we impute zero prevalence in the untested population; compare (2.6)), and also, for the record, to ρ = 1 − τ + τγ/π as κ → 0 (perfectly wrong targeting means we impute complete prevalence in the untested population).

Remark 2.2: Propositions 1 and 2 are separable in their effects on bounds: The first one restricts the relation between test yield and prevalence in the tested population, the second one restricts prevalence across tested and untested populations. Readers are encouraged to "pick and choose" and, of course, also to propose other approaches. For example, the sensitivity adjustment could be combined with MM's suggestion to restrict the rate of asymptomatic infections.

The negative predictive value NPV = Pr(C = 0|R = 0, T = 1) is the probability that a negative test result is accurate. It is of great importance in medical decision making (Eng and Bluemke, 2020; Manski, 2020; Watson et al., 2020). It can be bounded as follows:

Proposition 3: Suppose Assumption 1 holds. Then sharp bounds on the NPV η = Pr(C = 0|R = 0, T = 1) are given by

η = 1 − γ(1 − π)/(π(1 − γ)) ∈ [1 − γ(1 − π̲)/(π̲(1 − γ)), 1 − γ(1 − π̄)/(π̄(1 − γ))],

where γ = π Pr(C = 1|T = 1) was used. The expression γ(1 − π)/(π(1 − γ)) can be simplified and is easily seen to be decreasing in π, so that the bounds obtain at the endpoints of [π̲, π̄].

This result could again be easily generalized to allow for specificity of less than 1. In that case, there would also be nondegenerate bounds on the positive predictive value Pr(C = 1|R = 1, T = 1), which equals 1 here because of the assumption of perfect specificity.
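Continuing the illustrative Python sketches (function names and numbers are mine, not the paper's), Propositions 2 and 3 translate directly:

```python
import math

def rho_of(tau, gamma, pi, kappa):
    """Equation (2.8); the kappa -> infinity limit is tau*gamma/pi."""
    if math.isinf(kappa):
        return tau * gamma / pi
    return (gamma / pi) * (pi + (kappa - 1) * tau * (pi - gamma)) \
        / (kappa * (pi - gamma) + gamma)

def prevalence_bounds_prop2(tau, gamma, pi_lo, pi_hi, kappa_lo, kappa_hi=math.inf):
    """Proposition 2: rho decreases in pi and kappa, so the bounds
    obtain at the corners (pi_hi, kappa_hi) and (pi_lo, kappa_lo)."""
    return rho_of(tau, gamma, pi_hi, kappa_hi), rho_of(tau, gamma, pi_lo, kappa_lo)

def npv_bounds_prop3(gamma, pi_lo, pi_hi):
    """Proposition 3: eta = 1 - gamma(1-pi)/(pi(1-gamma)) increases in pi."""
    eta = lambda pi: 1 - gamma * (1 - pi) / (pi * (1 - gamma))
    return eta(pi_lo), eta(pi_hi)

# Continuing the example: test monotonicity only (kappa_lo = 1).
print(prevalence_bounds_prop2(0.02, 0.15, 0.70, 0.95, kappa_lo=1))
# roughly (0.0032, 0.2143)
print(npv_bounds_prop3(0.15, 0.70, 0.95))
# roughly (0.9244, 0.9907)
```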
Assumption 1 contrasts with MM's strategy of inputting ex ante bounds on the NPV. In each case, bounds on the respective other quantity become an output of the model, so the direction of logical inference is reversed. Notating the input bounds as η ∈ [η̲, η̄], the direct comparable to (2.6) is

ρ ∈ [τ(γ + (1 − γ)(1 − η̄)), γ + (1 − γ)(1 − η̲)]. (2.9)

The following are some methodological considerations as to why one might want to start from sensitivity and selectivity.

• By bounding the NPV, one necessarily directly bounds prevalence in the tested population. This is because Pr(C = 1|T = 1) = γ + (1 − γ)(1 − η), so the lower and upper bound on Pr(C = 1|T = 1) necessarily exceed the corresponding bounds on (1 − η). Since also the upper bound on overall prevalence is just the upper bound on prevalence in the tested population, the effect can be large: Bounds of [.22, .53] obtain in an example that is stark but not hypothetical; in fact, it is basically the first entry in Table 2 in MM (replicated in the first line of Table 1 below). In contrast, prior bounds on sensitivity do not directly imply anything about prevalence.

• Inputting sensitivity (and possibly specificity) generates an interface with the literature on diagnostic tests because that is what this literature focuses on. For a general example, see Table 1 in Paules and Subbarao (2017). With regard to COVID-19, practitioners' guides (Eng and Bluemke, 2020; Watson et al., 2020) likewise frame test accuracy in terms of sensitivity and specificity. MM seem to disagree when they write: "Medical experts have been cited as believing that the rate of false-negative test findings is at least 0.3. However, it is not clear whether they have in mind one minus the NPV or one minus test sensitivity." The technical definition of the false-negative rate is not in doubt, so the concern is about informal usage. This may be a valid point in general, especially as conflation of the two corresponds to base-rate neglect, but it did not occur to me with regard to the literature on COVID-19 tests. 3

3 The footnote accompanying the cited sentence links to a news piece that attributes an estimated false-negative rate of .3 to Yang et al. (2020). While the news piece has vague language, Yang et al. (2020) unambiguously estimates one minus sensitivity.

• The textbook view (Zhou et al., 2002, chapter 2) of sensitivity as a technological constant has been challenged (Leeflang et al., 2008). In the specific example, one could perhaps imagine that sensitivity and prevalence are related through the distribution of viral load among the infected. However, to justify the present analysis, it is not necessary that sensitivity be constant or even unrelated to prevalence, as long as it is always within the input bounds. This appears plausible enough, especially compared to the analogous restriction on NPV.

• Asserting bounds on NPV without taking targeting of tests into account may ignore constraining information that could lead to tighter bounds. Specifically, relatively low bounds on the NPV (i.e., asserting that a large fraction of negative test results are false) will be more plausible if one believes the test to be efficiently targeted. But in that same case, one would conclude that the constraint Pr(C = 1|T = 1) ≥ Pr(C = 1|T = 0) is far from binding. Therefore, the degree of targeting informally enters the bounds twice, in different directions, but the derivation of (2.9) does not force its value to be the same in both appearances. Assumption 2 is intended to allow for targeting of tests to affect bounds in a more disciplined manner.
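For a side-by-side comparison of the two starting points, here is a sketch of (2.9) as stated above, reusing prevalence_bounds_prop2 from the earlier snippet (again with my own naming and illustrative inputs):

```python
def prevalence_bounds_npv(tau, gamma, eta_lo, eta_hi):
    """NPV-based bounds (2.9): Pr(C=1|T=1) = gamma + (1-gamma)(1-eta);
    the upper bound uses test monotonicity, the lower bound imputes
    zero prevalence among the untested."""
    return (tau * (gamma + (1 - gamma) * (1 - eta_hi)),
            gamma + (1 - gamma) * (1 - eta_lo))

# Continuing the example: NPV in [.6, .9] versus sensitivity in [.7, .95].
print(prevalence_bounds_npv(0.02, 0.15, 0.60, 0.90))
# roughly (0.0047, 0.49)
print(prevalence_bounds_prop2(0.02, 0.15, 0.70, 0.95, kappa_lo=1))
# roughly (0.0032, 0.2143)
```

With these hypothetical inputs, the sensitivity-based upper bound is less than half of its NPV-based counterpart, previewing the pattern in the tables below.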
These bounds are mainly designed to process the information that is available early in a pandemic. For this reason, and to highlight some important differences, I first illustrate them on MM's data, i.e. daily counts of tests, test results, and fatalities for Illinois, New York, and Italy in March and April. 4 However, I also present some results for current hot spots.

4 MM's results were independently replicated from their original data. MATLAB code generating all tables is available from the author.

The first two columns of Tables 1-3 bound prevalence based on the assumption that NPV is in [.6, .9]. This replicates MM's Table 2. 5 The next two columns show implied bounds on test sensitivity, computed by inverting Proposition 3. The next two columns do not directly restrict NPV but restrict sensitivity to be in [.7, .95]. This is the sensitivity interval used by Frazier et al. (2020) in the analysis on which Cornell's Fall reopening plans are based, and it corresponds to my reading of the literature. 6 The final two columns show the implied bounds on NPV. All bounds also impose that prevalence is larger in the tested than in the untested population.

5 MM refine these bounds by imposing time monotonicity; that is, prevalence (and therefore both bounds on it) cannot decrease over time. I agree with that restriction and can provide tables that implement it. It is dropped here solely because those tables have many identical rows, obscuring some interesting comparisons.

6 UCSF (2020) base medical advice on a point estimate of .8. Watson et al. (2020) give .7 as the "lower end of current estimates from systematic reviews." Frazier et al. (2020) use a preferred point estimate of .9.

The new upper bounds are considerably more restrictive, and the lower bounds are slightly less so, though the relative effect on lower bounds is occasionally quite large. In sum, all bounds move down, and the dominating overall impression is one of tighter bounds. The difference is frequently large: many upper bounds are reduced by more than half. It is also meaningful: the new bounds would rather clearly have ruled out contemporaneous speculation that saturation was "around the corner."

Consider also the implied bounds on the infection fatality rate (IFR). The most informative NPV-based lower bounds (i.e., those evaluated on 4/24) equal .0003 for Illinois, .0013 for New York, and .0010 for Italy, close to "flu-like" numbers that were the subject of speculation. The comparison numbers for the novel bounds are .0005, .0016, and .0026; for Italy, the lower bound is above .001 starting on 3/29. In places where the data admittedly spoke very loudly, these numbers would have cast strong doubt on "just the flu" conjectures in real time.

Of course, tighter bounds are an unambiguous improvement only if the assumptions did not become less credible. I would argue that they did not. The tables reveal that the two sets of input assumptions were barely compatible: The NPV-based bounds frequently imply sensitivity below .7, and the sensitivity-based bounds imply NPV mostly close to, and frequently above, .9. Which numbers are more convincing is obviously a judgment call. But the former number seems out of step with expert opinion, including at the time, whereas the latter would probably not have raised any eyebrows. 7 Also, Table 2 reveals that, according to the NPV-based bounds, sensitivity increased in New York (the bounds fail to overlap). This might have happened, but forcing it by assumption is arguably against the spirit of weak and credible partial identification assumptions. I would contend that these observations corroborate my main methodological qualm: Prior bounds on NPV depend on prior guesses of prevalence, and it is difficult to get those right.

7 As part of a recent partial identification analysis, Sacks et al. (2020) provide an empirically informed NPV estimate for Indiana of .995. This comes with caveats: It corresponds to obviously lower prevalence than in the data considered here, so that MM would presumably have inputted different NPV bounds; also, it operationalizes NPV as test-retest validity. UCSF (2020) gives NPV as .972 for symptomatic and .998 for asymptomatic cases in the Bay Area, though using the sort of point-identifying assumptions that we seek to avoid here.

Table 4 repeats the exercise for data from current (as of 8/13) hot spots of the pandemic. 8

8 Test counts and results were retrieved from the COVID Tracking Project. State populations are U.S. Census estimates for 7/1/19.
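The IFR comparisons above divide cumulative deaths by the number of infections implied by the prevalence bounds. A minimal sketch of that arithmetic (my own illustration with hypothetical numbers, abstracting from the lag between infection and death):

```python
def ifr_bounds(deaths, population, rho_lo, rho_hi):
    """Implied IFR bounds: the lower bound pairs observed deaths with
    the largest number of infections compatible with the bounds."""
    return deaths / (population * rho_hi), min(deaths / (population * rho_lo), 1.0)

# Hypothetical: 25,000 deaths, population 60 million, prevalence in [.02, .1].
print(ifr_bounds(25_000, 60_000_000, 0.02, 0.10))
# roughly (0.0042, 0.0208)
```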
[Tables 1-4 about here. Column groups: "Using NPV to Bound Prevalence / Sensitivity" and "Using Sensitivity to Bound Prevalence / NPV".]

I deliberately restrict attention to states with high test yield because it seems that MM calibrated their input bounds to such places. NPV-based bounds continue to allow for very high prevalence but also force test sensitivity to be relatively low. Sensitivity-based upper bounds are at most half of their NPV-based counterparts, often much less, and other implications of the respective bounds are roughly as before. This comes with a caveat: As the pandemic progresses, credible bounds should differentiate between current and past infection and account for repeat tests. I leave these extensions to future research and emphasize that Table 4 is meant to be illustrative, especially with regard to "first wave" hot spots.

Tables 5-7 show the effect of increasingly restricting test selectivity through Assumption 2. The tables start with κ̲ = 1, i.e. test monotonicity, and progress through arguably weak restrictions up to κ̲ = 5, which is restrictive and may be more in the spirit of a sensitivity parameter. Upper bounds respond strongly. This is reflected in the implied lower bounds on the IFR; for Italy, these increase to (in order) .0036, .0046, .0065, and .0102. Of course, these numbers should not be read as contradicting MM's Table 3; to the contrary, MM reach similar conclusions when restricting the proportion of asymptomatic infections. The same exercise, but for current hot spots, is displayed in Table 8.

A brief remark on statistical inference, from which this paper abstracts: Inference might in principle be about very small probabilities, so that straightforward (bootstrap or normal approximation based) delta method approaches would not apply. Questions like this inform an exciting strand of current research (Rothe, 2020; Toulis, 2020). They are orthogonal to the thrust of this paper and also less salient in the application because the massive sample sizes (i) would presumably justify normal or bootstrap approximations after all and (ii) mean that identification dominates estimation as a source of uncertainty.
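The κ̲-progression of Tables 5-7 can be emulated with prevalence_bounds_prop2 from the sketch above; the grid and inputs below are illustrative only, not the paper's exact specification:

```python
# Sensitivity in [.7, .95] throughout; kappa_hi stays at infinity.
tau, gamma = 0.02, 0.25
for kappa_lo in [1, 2, 3, 5]:
    lo, hi = prevalence_bounds_prop2(tau, gamma, 0.70, 0.95, kappa_lo)
    print(f"kappa_lo = {kappa_lo}: prevalence in [{lo:.4f}, {hi:.4f}]")
# Upper bounds fall monotonically as kappa_lo rises; lower bounds are unchanged.
```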
This paper proposes new methods to bound the prevalence of a disease from partially identifying data and assumptions. It is mainly intended as a "think piece" to alert researchers to the possibly fruitful application of partial identification methods. I have no doubt that domain knowledge may inform further, and better, iterations. The conceptual innovation is to think of test accuracy as an (unknown, not necessarily constant, and possibly not even identifiable) technological parameter, and of test selectivity as something that econometric or epidemiological models can speak to. Bounds are therefore constructed with these as starting points, deriving bounds on the NPV by implication and not imposing any prior bound on prevalence in the tested population.

In the empirical application, it turns out that some of the more audacious speculations floated at the time were in tension with credible partial identification analysis even then. This illustrates the potential utility of such analysis in the early stages of a pandemic. At the current, more advanced stage of the pandemic, certain simplifications used in this paper are a stretch. Notably, I would want to distinguish between current and past infection. I leave the careful development of such bounds to future work, but would recommend using restrictions on test sensitivity as the primitive of the analysis. Once again, my main hope is to get the ball rolling.

References

Arevalo-Rodriguez, I., et al. (2020): "False-Negative Results of Initial RT-PCR Assays for COVID-19: A Systematic Review."
Eng, J., and D. A. Bluemke (2020): "Imaging Publications in the COVID-19 Pandemic: Applying New Research Results to Clinical Practice." Radiology.
Frazier, P. I., et al. (2020): "COVID-19 Mathematical Modeling for Cornell's Fall Semester." Working paper, Cornell University.
Leeflang, M. M. G., P. M. M. Bossuyt, and L. Irwig (2008): "Diagnostic Test Accuracy May Vary with Prevalence: Implications for Evidence-Based Diagnosis." Journal of Clinical Epidemiology.
Manski, C. F. (1989): "Anatomy of the Selection Problem." Journal of Human Resources.
Manski, C. F. (2020): "Bounding the Accuracy of Diagnostic Tests." Working paper.
Manski, C. F., and F. Molinari (2020): "Estimating the COVID-19 Infection Rate: Anatomy of an Inference Problem." Journal of Econometrics.
Molinari, F. (2020): "Microeconometrics with Partial Identification." Handbook of Econometrics.
Paules, C., and K. Subbarao (2017): "Influenza." The Lancet.
Rothe, C. (2020): "Combining Population and Study Data for Inference on Event Rates, with an Application to the Infection Fatality Rate of SARS-CoV-2." Working paper.
Sacks, D. W., et al. (2020): "What Can We Learn About SARS-CoV-2 Prevalence from Testing and Hospital Data?" Working paper.
Toulis, P. (2020): "Estimation of Covid-19 Prevalence from Serology Tests: A Partial Identification Approach." Working paper.
UCSF (2020): "COVID-19 Diagnostic Testing."
Watson, J., et al. (2020): "Interpreting a Covid-19 Test Result." BMJ.
Yang, Y., et al. (2020): "Evaluating the Accuracy of Different Respiratory Specimens in the Laboratory Diagnosis and Monitoring the Viral Shedding of 2019-nCoV Infections." medRxiv.
Zhou, X.-H., N. A. Obuchowski, and D. K. McClish (2002): Statistical Methods in Diagnostic Medicine. Wiley.