Measuring Diagnostic Test Performance Using Imperfect Reference Tests: A Partial Identification Approach
Filip Obradović
April 1, 2022

Abstract. Diagnostic tests are almost never perfect. Studies quantifying their performance use knowledge of the true health status, measured with a reference diagnostic test. Researchers commonly assume that the reference test is perfect, which is not the case in practice. When the assumption fails, conventional studies identify "apparent" performance, or performance with respect to the reference, but not true performance. This paper provides the smallest possible bounds on the measures of true performance - sensitivity (true positive rate) and specificity (true negative rate), or equivalently false positive and negative rates, in standard settings. Implied bounds on policy-relevant parameters are derived: 1) Prevalence in screened populations; 2) Predictive values. Methods for inference based on moment inequalities are used to construct uniformly consistent confidence sets in level over a relevant family of data distributions. Emergency Use Authorization (EUA) and independent study data for the BinaxNOW COVID-19 antigen test demonstrate that the bounds can be very informative. Analysis reveals that the estimated false negative rates for symptomatic and asymptomatic patients are up to 3.89 and 5.42 times higher than the frequently cited "apparent" false negative rate.

Diagnostic tests are almost never perfect. Test performance studies seek to quantify their accuracy, predominantly in the form of sensitivity and specificity. The definition is often attributed to Yerushalmy (1947), but Binney, Hyde, and Bossuyt (2021) note that their use dates back to the early twentieth century. The two parameters are also referred to as performance measures or operating characteristics. Sensitivity (true positive rate) is the probability that a test will return a positive result for an individual who truly has the underlying condition, while specificity (true negative rate) is the probability that a test will produce a negative result for an individual who does not have the underlying condition. Equivalently, one can measure false positive and false negative rates. The false negative rate and sensitivity sum to unity, as do the false positive rate and specificity. Determining sensitivity and specificity for a diagnostic test of interest, referred to as an index test, requires knowledge of the true health status of all participants in the study. The true health status is most often unobservable, so a reference test is commonly used in lieu of it. However, such tests are rarely perfect themselves. When the reference is imperfect, conventional studies only identify "apparent" sensitivity and specificity, or the so-called rates of positive and negative agreement with the reference. They measure performance with respect to the reference test and not true performance. Hence, they are typically not of interest. Furthermore, I show that true performance measures are usually partially identified. In other words, there exists a set of parameter values that are consistent with the observed data, called the identified set. The smallest such set under maintained assumptions, or the set that exhausts all information from the data, is known as the sharp identified set.
This paper addresses the issue of finding, estimating, and performing inference on the points in the sharp identified set for sensitivity and specificity, or equivalently false negative and false positive rates, under standard assumptions used in the literature. I provide the sharp joint identified set for the true performance measures without imposing any assumptions on the statistical dependence between the index (test of interest) and reference tests conditional on health status, assuming exact or approximate knowledge of the reference test characteristics. If the reference test performance is known exactly, this set is a line segment in the unit square [0, 1]². Otherwise, it is a union of line segments. The framework addresses the concerns raised in Boyko, Alderman, and Baron (1988): "When two tests are strongly suspected of being conditionally dependent, then the performance of one of these tests should probably not be compared with that of the other, unless better methods are developed to sort out the degree of bias caused by reference test errors in the presence of conditional dependence." I show how one can further reduce the size of the sharp identified set by layering assumptions regarding the dependence between the two tests conditional on health status. In doing so, I formalize an informally stated assumption in the literature, which I call the "tendency to wrongly agree". It maintains that if the reference test yields a false result for a particular health status, the index test is more likely than not to produce the same error. It is plausible in certain cases when the two tests share physical characteristics, such as sample types. I show how the derived identified sets may be estimated consistently. The FDA Statistical Guidance on Reporting Results Evaluating Diagnostic Tests requires that all diagnostic performance studies report confidence intervals for index test sensitivity and specificity to quantify the statistical uncertainty in the estimates. To conform to this practice, the paper demonstrates that all derived identified sets may be represented using moment inequalities. I rely on the procedure from Romano, Shaikh, and Wolf (2014) to construct confidence sets for points in the identified set that are uniformly consistent in level over a family of permissible distributions relevant in the application. Namely, the confidence sets asymptotically cover all points in the identified set uniformly over the family of population distributions with probability of at least 1 − α, where α is the chosen significance level. The methodological framework offers solutions to two issues in the current research practice guidelines set forth by the FDA Statistical Guidance, as explained in Remark 3: 1) the inability to measure true test performance in common settings; 2) the inability to demonstrate that the index test can outperform the reference. Given that sensitivity and specificity are frequently used to obtain other policy-relevant parameters, I present two use cases for the derived identified sets: 1) bounding prevalence, or the population rate of illness, in a screened population; 2) bounding predictive values, i.e., probabilities that a patient is sick conditional on observing a test result. The specific shape of the identified set for test characteristics is critical for the sharpness of bounds on prevalence.
Finally, I use the developed framework to revisit the results of the original Emergency Use Authorization (EUA) performance study of the ubiquitous Abbott BinaxNOW COVID-19 Ag2 CARD rapid antigen test, as well as an independent study by Shah et al. (2021). All studies for rapid tests have a mandated RT-PCR reference test, which is known to produce false negative results, and thus pertain to the setting analyzed in the paper. I construct the confidence sets and estimates of the identified sets for sensitivity and specificity, and consequently for the false negative and false positive rates. The bound estimates can be very informative: they are found to have width as small as 0.007 for sensitivity and 0.003 for specificity in the independent study under plausible assumptions. Based on the EUA study interim and final results, the widely cited estimated "apparent" false negative rates are 8.3% and 15.4%, respectively. Following the results found in the literature (Arevalo-Rodriguez et al. (2020), Kucirka et al. (2020), Kanji et al. (2021), Fitzpatrick et al. (2021)), and assuming that the reference has perfect specificity and 90% sensitivity, I find that the estimated bounds on the true false negative rate are [20%, 23.9%] in the same data set. Relaxing the assumption so that reference test sensitivity is only known to be in [80%, 90%], the estimated bounds are [20%, 32.3%]. These correspond to bounds on sensitivity of [76.1%, 80%] and [67.7%, 80%]. Both estimated "apparent" false negative rates understate even the estimated lower bound for the true false negative rate. The estimated average number of infected people missed by the antigen test is up to 2.1 and 3.89 times higher than test users may be led to believe by the final and interim study results, respectively. Data from Shah et al. (2021) show that the estimated true false negative rate can be up to 2.92 and 5.42 times higher for asymptomatic patients than the cited final and interim figures for symptomatic individuals, respectively. Depending on interpretation, the results from both studies suggest that the test may not satisfy the initial FDA requirement for EUA of at least 80% estimated sensitivity, despite fulfilling the criterion of high "apparent" sensitivity, implying the need for alternative testing protocols. The outlined approach may be viewed as an attractive alternative to imposing convenient but untenable assumptions, such as perfect performance of the reference test, or conditional independence of the reference and index tests together with exactly known reference test characteristics, at the expense of credibility. Therefore, I provide replication files that researchers may directly utilize to obtain estimates and confidence sets in their own work (available from: https://github.com/obradovicfilip/bounding_test_performance). Since the method requires no changes to the data-collection process of most current applied work, it can also be used to reinterpret existing published studies, as demonstrated by the application section of the paper. The medical profession refers to the difference between the "apparent" and true performance measures as gold standard bias. Gart and Buck (1966), Staquet et al. (1981), and Zhou, McClish, and Obuchowski (2009) show that when the reference and index tests are statistically independent conditional on the true health status, index test sensitivity and specificity are point identified. They offer a maximum likelihood estimator under the assumption of exactly known reference test performance measures.
This is an appealing result; however, Hui and Zhou (1998) elaborate that conditional independence is often untenable. Several authors have analyzed the impact that conditional dependence may have on measurement errors in sensitivity and specificity. Deneef (1987) shows that if the two tests are conditionally independent, "apparent" performance will be lower than true performance, and that when the tests are positively correlated, "apparent" accuracy may be higher than true accuracy. Boyko, Alderman, and Baron (1988) use a case study to examine the difference between "apparent" and true operating characteristics when the tests are conditionally independent and disease prevalence is varied. Valenstein (1990) concludes that when classification errors committed by an index test and a reference test are highly correlated, the "apparent" sensitivity and specificity will be higher than the true parameters. Additionally, the author reports that when the correlation is slight, the "apparent" operating characteristics may either over- or understate the true values. However, they do not demonstrate this analytically. More importantly, they do not precisely define highly correlated classification errors. This leaves the terms open to interpretation, and it has prompted the formalization of the assumption in this paper. A significant portion of the published work focuses on the direction of the effects of the conditional dependence, rather than on the magnitude. The purpose is to enable researchers to determine whether their estimates are biased upwards or downwards. Correlation between the results of the two tests conditional on the health status cannot be observed, as it conditions on an unobservable random variable, diminishing the practical relevance of some findings. Additionally, one could argue that the magnitude is perhaps even more important than the direction of the bias. A formal approach to the issue of unknown bias magnitude is found in Thibodeau (1981), who poses explicit assumptions on the magnitude of the deviation from conditional independence when the reference test is at least as accurate as the index test in order to bound the bias at the population level. The framework presented below does not require such assumptions. More recently, Emerson et al. (2018) sketch an argument for individual bounds on sensitivity and specificity when the conditional independence assumption is not imposed. This study goes beyond published work by deriving the sharp joint identified set, formalizing and incorporating existing dependence assumptions to further reduce its size, bounding derived parameters of interest, and providing an appropriate uniform inference procedure. Ziegler (2021) uses a setting similar to the one in this paper to characterize sufficient conditions for informativeness of the index test in terms of predictive values, but does not focus on measuring index test performance when the reference test is imperfect. This paper primarily contributes to the literature on gold standard bias in diagnostic test performance studies (Walter (1980), Thibodeau (1981), Staquet et al. (1981), Vacek (1985), Deneef (1987), Boyko, Alderman, and Baron (1988), Valenstein (1990), Zhou (1998), Feinstein (2002), Emerson et al.
(2018)), and to a growing body of literature concerning partial identification in medical and epidemiologic research, such as Bhattacharya, Shaikh, and Vytlacil (2012), Manski (2020), Toulis (2021), Manski and Molinari (2021), Ziegler (2021), and Stoye (2022). In doing so, it merges ideas from two branches of econometric research: partial identification (Manski (2003), Manski (2007)) and inference in moment inequality models (Andrews and Soares (2010), Andrews and Barwick (2012), Chernozhukov, Lee, and Rosen (2013), Romano, Shaikh, and Wolf (2014), Canay and Shaikh (2017), Bugni, Canay, and Shi (2017), Chernozhukov, Chetverikov, and Kato (2019), Kaido, Molinari, and Stoye (2019), Bai, Santos, and Shaikh (2021)). Finally, to the extent of my knowledge, these are the first empirical results aiming to recover the true sensitivity and specificity of COVID-19 antigen tests despite reference test imperfections. This is an addition to the corpus of COVID-19 test performance studies (Shah et al. (2021), Pollock et al. (2021), Siddiqui et al. (2021)). The remainder of the paper is organized as follows. Section 2 provides the identification argument. Section 3 demonstrates identification of prevalence and predictive values. Section 4 explains estimation and inference. Section 5 presents confidence and estimated identified sets for the operating characteristics of the COVID-19 antigen test. Section 6 concludes. All proofs are collected in Appendix B. Studies quantifying the performance of a test of interest, also known as an index test, require knowledge of the true health status. Health status is usually unobservable, so it is determined by an alternative test, called the reference test. Even though the reference test should be the best available test for the underlying condition, it is almost always imperfect in practice, giving rise to identification issues. In this section, I present the setting and assumptions, and derive the sharp joint identified sets for the index test performance measures: sensitivity and specificity, as defined below. Let t = 1 and r = 1 if the index and reference tests, respectively, yield positive results, and t = 0, r = 0 otherwise. Let y = 1 denote the existence of the underlying condition we are testing for and y = 0 its absence. (I interchangeably say that the person is ill when y = 1 and healthy when y = 0. The framework can be extended to encompass antibody tests with minor semantic changes, since they also measure whether a person has been ill.) We are interested in learning the sensitivity and specificity of the index test:

Sensitivity: θ_1 = P(t = 1|y = 1)
Specificity: θ_0 = P(t = 0|y = 0).

Equivalently, one can study the false negative and false positive rates, 1 − θ_1 and 1 − θ_0. Finally, define the reference test sensitivity s_1 = P(r = 1|y = 1) and specificity s_0 = P(r = 0|y = 0). Data collection in test performance studies is commonly done by testing all study participants with both the reference and index tests. The observed outcome for each participant is (t, r) ∈ {0, 1}². The data identify the joint probability distribution P(t, r). "Apparent" sensitivity and specificity are defined whenever P(r = 1) ∈ (0, 1):

"Apparent" sensitivity: θ̃_1 = P(t = 1|r = 1)
"Apparent" specificity: θ̃_0 = P(t = 0|r = 0).

A common approach is to assume that the reference test is perfect, so that r = y. Then, "apparent" measures are equal to the parameters of interest (θ_1, θ_0). This is rarely the case in practice. Generally, θ̃_j ≠ θ_j for j = 0, 1, which is referred to as gold standard bias. Interpreting (θ̃_1, θ̃_0) as true performance measures can lead to severely misleading conclusions due to the bias. Alternatively, researchers may explicitly study (θ̃_1, θ̃_0).
However, they only measure the performance of t with respect to r, and not y. If one wishes to learn about true performance (θ_1, θ_0), these parameters are not of interest. Focusing the analysis on binary tests and binary health statuses is standard practice. Sensitivity and specificity are defined only in such settings. Many tests that yield discrete or continuous results, such as RT-PCR tests, are reduced to binary tests by thresholding in practice. The FDA Statistical Guidance recognizes only binary reference tests and health statuses, explicitly stating: "A reference standard ... divides the intended use population into only two groups (condition present or absent)." The section begins by outlining the formal assumptions used. I then provide the set of parameter values (θ_1, θ_0) consistent with the observed data, also known as the identified set, without imposing any assumptions on the statistical dependence between t and r. The set is sharp, or the smallest possible under the maintained assumptions. For simplicity of exposition, this is first done when (s_1, s_0) are known. I then show how an additional assumption on the dependence structure between the two tests can be used to further reduce the size of the sharp identified set. Finally, I allow (s_1, s_0) to be approximately known by assuming (s_1, s_0) ∈ S, where S is some known set. The framework in this paper relies on common assumptions maintained in the literature.

Assumption 1. (Random Sampling) The study sample is a sequence of i.i.d. random vectors W_i = (t_i, r_i), where each W_i follows a categorical distribution P(t, r) for (t, r) ∈ {0, 1}² and i = 1, . . . , n.

The distribution P(t, r) is a marginal of the joint distribution P(t, r, y). Since y is not observable, P(t, r, y) is not identified by the data alone.

Assumption 2. (Reference Performance) Sensitivity and specificity of the reference test, s_1 = P(r = 1|y = 1) and s_0 = P(r = 0|y = 0), are known, and s_1 > 1 − s_0.

Knowledge of (s_1, s_0) is assumed in papers dealing with gold standard bias correction, such as Gart and Buck (1966), Thibodeau (1981), Staquet et al. (1981), and Emerson et al. (2018). The current norm of relying on the assumption that the reference test is perfect means that researchers regularly maintain (s_1, s_0) = (1, 1). The analysis is first done for the simple case when (s_1, s_0) are known exactly. The approach is generalized in Section 2.4 by assuming (s_1, s_0) ∈ S, where S is some known set. Hence, reference test performance needs to be known only approximately. The generalization can also be used to perform sensitivity analyses. Section 2.4.1 contains the discussion of the credibility of these assumptions. I further maintain that s_1 > 1 − s_0, or that the reference test is reasonable. (The assumption does not require that both s_1 and s_0 be high. Indeed, either s_1 or s_0 may be close to 0, as long as their sum exceeds 1.) DiCiccio et al. (2021) refer to such an r as a test that has diagnostic value. If s_1 = 1 − s_0, then P(r = 1|y = 1) = 1 − s_0 = P(r = 1|y = 0), so r ⊥⊥ y and the test provides no information on y. Tests are costly, and any use of such a test is not rational. If s_1 < 1 − s_0, then the probability of a true positive is less than the probability of a false positive. It would then be possible to redefine r* = 1 − r, so that s*_1 = 1 − s_1 and s*_0 = 1 − s_0 satisfy s*_1 > 1 − s*_0.

Assumption 3. (Bounded Prevalence) The population prevalence P(y = 1) satisfies 0 < P(y = 1) < 1.
In a study population in which all participants are either healthy or diseased, one of the measures θ_1 or θ_0 is undefined. The assumption is implicitly found in diagnostic test performance studies measuring sensitivity and specificity.

2.2 Identified Set for (θ_1, θ_0)

We would first like to learn (θ_1, θ_0) without imposing any assumptions on the statistical dependence structure between t and r conditional on y. This will yield the identified set for (θ_1, θ_0) when (s_1, s_0) is known. The data reveal P(t, r), while probability distributions involving y are not directly observable. Still, P(r, y) can be determined using (s_1, s_0) and P(t, r). I henceforth use P_{s_1,s_0} to denote probability distributions that are derived from observable distributions given (s_1, s_0). All directly observable distributions, such as P(t, r), do not have the subscript. By the law of total probability and s_1 > 1 − s_0 from Assumption 2:

P(r = 1) = s_1 P_{s_1,s_0}(y = 1) + (1 − s_0)(1 − P_{s_1,s_0}(y = 1)), so that P_{s_1,s_0}(y = 1) = (P(r = 1) + s_0 − 1)/(s_1 + s_0 − 1).  (5)

P_{s_1,s_0}(r, y) is then known from P_{s_1,s_0}(r, y) = P_{s_1,s_0}(r|y)P_{s_1,s_0}(y), since (s_1, s_0) fully characterize P_{s_1,s_0}(r|y). To outline the idea of finding the identified set, first note that θ_j = P_{s_1,s_0}(t = j|y = j) for j = 0, 1:

θ_j = [P_{s_1,s_0}(t = j, r = 0, y = j) + P_{s_1,s_0}(t = j, r = 1, y = j)] / P_{s_1,s_0}(y = j).

Probabilities P_{s_1,s_0}(t = j, r = k, y = j) for k = 0, 1 are unobservable. However, they can be bounded using the knowledge of P(t, r) and P_{s_1,s_0}(r, y). By the properties of probability measures, an upper bound on P_{s_1,s_0}(t = j, r = k, y = j) is min{P(t = j, r = k), P_{s_1,s_0}(r = k, y = j)}. To form a lower bound, one can similarly find that P_{s_1,s_0}(t = j, r = k, y = 1 − j) ≤ min{P(t = j, r = k), P_{s_1,s_0}(r = k, y = 1 − j)} and use: P_{s_1,s_0}(t = j, r = k, y = j) = P(t = j, r = k) − P_{s_1,s_0}(t = j, r = k, y = 1 − j) ≥ max{0, P(t = j, r = k) − P_{s_1,s_0}(r = k, y = 1 − j)}. Such bounds are shown to be sharp. By summing the upper and lower bounds on P_{s_1,s_0}(t = j, r = k, y = j) over k = 0, 1, one can bound θ_j. A sharp joint identified set for (θ_1, θ_0) then follows from (5). Proposition 1 expands on this intuition to provide sharp bounds on θ_j and the sharp joint identified set for (θ_1, θ_0).

Proposition 1. The sharp identified set H_{θ_j}(s_1, s_0) for parameter θ_j, j = 0, 1, given reference test sensitivity s_1 and specificity s_0, is an interval H_{θ_j}(s_1, s_0) = [θ^L_j, θ^U_j], where:

θ^L_j = [max{0, P(t = j, r = j) − P_{s_1,s_0}(r = j, y = 1 − j)} + max{0, P_{s_1,s_0}(r = 1 − j, y = j) − P(t = 1 − j, r = 1 − j)}] / P_{s_1,s_0}(y = j)
θ^U_j = [min{P(t = j, r = j), P_{s_1,s_0}(r = j, y = j)} + min{P(t = j, r = 1 − j), P_{s_1,s_0}(r = 1 − j, y = j)}] / P_{s_1,s_0}(y = j)  (9)

for P_{s_1,s_0}(y = 1) as in (5) and P_{s_1,s_0}(r, y) = P_{s_1,s_0}(r|y)P_{s_1,s_0}(y). The sharp identified set H_{(θ_1,θ_0)}(s_1, s_0) for (θ_1, θ_0) given reference test sensitivity s_1 and specificity s_0 is:

H_{(θ_1,θ_0)}(s_1, s_0) = {(θ_1, θ_0) ∈ [θ^L_1, θ^U_1] × [θ^L_0, θ^U_0] : θ_1 P_{s_1,s_0}(y = 1) + (1 − θ_0)(1 − P_{s_1,s_0}(y = 1)) = P(t = 1)}.  (10)

The set H_{(θ_1,θ_0)}(s_1, s_0) is a line segment on [0, 1]² for a given value of reference test operating characteristics s_1 and s_0. Emerson et al. (2018) sketch an argument for individual bounds on θ_j as in (9) and do not discuss the joint identified set. Proposition 1 goes further by proving that both the individual bounds and the joint identified set are the smallest possible under the assumptions. Section 3.1 shows that the linear structure of the set H_{(θ_1,θ_0)}(s_1, s_0) is crucial for sharpness of bounds on certain derived policy-relevant parameters, such as the population illness rate, otherwise known as prevalence, in screened populations.
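Before turning to the implications for prevalence, a minimal Python sketch may help fix ideas: it computes the plug-in bounds [θ^L_j, θ^U_j] of Proposition 1 from a known joint distribution P(t, r) and assumed reference characteristics (s_1, s_0). The function name, interface, and numerical inputs are illustrative assumptions, not the paper's replication code.

```python
def proposition1_bounds(p_tr, s1, s0):
    """Bounds [theta_L_j, theta_U_j] for theta_j, j in {0, 1} (Proposition 1).

    p_tr[t][r] holds P(t, r); (s1, s0) are the assumed reference test
    sensitivity and specificity, with s1 > 1 - s0 (Assumption 2).
    """
    p_r1 = p_tr[0][1] + p_tr[1][1]              # P(r = 1)
    p_y1 = (p_r1 + s0 - 1) / (s1 + s0 - 1)      # P_{s1,s0}(y = 1), eq. (5)
    # P_{s1,s0}(r = k, y = l) = P(r = k | y = l) * P_{s1,s0}(y = l)
    p_ry = {(1, 1): s1 * p_y1, (0, 1): (1 - s1) * p_y1,
            (0, 0): s0 * (1 - p_y1), (1, 0): (1 - s0) * (1 - p_y1)}
    bounds = {}
    for j in (0, 1):
        k = 1 - j                                # the discordant result
        p_yj = p_y1 if j == 1 else 1 - p_y1
        lo = (max(0.0, p_tr[j][j] - p_ry[(j, 1 - j)])
              + max(0.0, p_ry[(k, j)] - p_tr[k][k])) / p_yj
        hi = (min(p_tr[j][j], p_ry[(j, j)])
              + min(p_tr[j][k], p_ry[(k, j)])) / p_yj
        bounds[j] = (lo, hi)
    return bounds

# Hypothetical distribution: P(t=0,r=0)=0.70, P(t=0,r=1)=0.03,
#                            P(t=1,r=0)=0.05, P(t=1,r=1)=0.22.
p_tr = [[0.70, 0.03], [0.05, 0.22]]
print(proposition1_bounds(p_tr, s1=0.9, s0=1.0))
# approximately {0: (0.931, 0.969), 1: (0.792, 0.892)}
```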
The resulting bounds on prevalence are unnecessarily wide if one instead supposes that the joint identified set is the rectangle H_{θ_1}(s_1, s_0) × H_{θ_0}(s_1, s_0). Observe also that H_{(θ_1,θ_0)}(s_1, s_0) directly yields the sharp joint identified set for the false negative and false positive rates (1 − θ_1, 1 − θ_0). The same will hold for the other identified sets for (θ_1, θ_0) below.

Remark 1. It is possible that θ_j > s_j in the identified set. The bounds overcome an important limitation of conventional studies, in which the index test can never be shown to outperform the reference test. Since such studies assume (s_1, s_0) = (1, 1), by definition θ_j ≤ s_j. Intuitively, if one maintains that a reference test is perfect, so that r = y, all discordant results t ≠ r will always be treated as errors of the index test, even though they need not be. The researcher can never observe a strictly lower error rate of the index test if the reference is assumed to be infallible. (For an extreme example, assume a perfect index test (θ_1, θ_0) = (1, 1). Let the reference be imperfect, so s_1 < 1 or s_0 < 1, but the researcher maintains that it is perfect. The index test will have "apparent" sensitivity and specificity equal to (s_1, s_0), and these "apparent" operating characteristics will be treated as the true operating characteristics under the assumption.) That is not the case when using the method in this paper. For example, θ_j > s_j if P(t = j, r = j) > P_{s_1,s_0}(r = j, y = j) = s_j P_{s_1,s_0}(y = j) and min{P(t = j, r = 1 − j), P_{s_1,s_0}(r = 1 − j, y = j)} > 0. Then θ^U_j = min{P(t = j, r = 1 − j), P_{s_1,s_0}(r = 1 − j, y = j)}/P_{s_1,s_0}(y = j) + s_j > s_j.

Remark 2. "Apparent" measures (θ̃_1, θ̃_0) need not be contained in the identified set for (θ_1, θ_0). In that sense, (θ̃_1, θ̃_0) may be over- or understating (θ_1, θ_0). A relevant empirical example is found in Section 5. The identified set for (θ_1, θ_0) is sharp. Encountering wide bounds on sensitivity and specificity therefore implies that it is not possible to learn the operating characteristics more precisely without additional assumptions that may be untenable, or without changing the reference test. Since the reference test is supposed to be the best available test, researchers and practitioners may have to embrace the ambiguity regarding index test performance.

Remark 3. The FDA Statistical Guidance defines a reference standard for a condition as: "The best available method for establishing the presence or absence of the target condition. ... established by opinion and practice within the medical, laboratory, and regulatory community." The guidance does not require a reference standard to be perfect, as it rarely is. When an imperfect reference standard is used as the reference test, the estimates may be reported as pertaining to sensitivity and specificity, even though the estimands are "apparent" measures. This practice can be misleading. Tests other than the reference standard may also be used as reference tests. However, the estimates should then be reported as "apparent". If one wishes to learn true test performance, these are typically not of interest. The FDA does not require or suggest any corrections that would allow researchers to form adequate estimates of the true operating characteristics in either case. The method outlined in this paper proposes a solution by forming the smallest possible bounds on the true performance measures under standard assumptions.
Furthermore, the guidance emphasizes that the index test can never be shown to be superior to any reference test in conventional studies, even if it is. This issue is also addressed, since the identified set can contain values of sensitivity and specificity that are larger than the corresponding measures of the reference test. Points in the identified set H_{(θ_1,θ_0)}(s_1, s_0) derived in the previous section correspond to different non-observable probability distributions P_{s_1,s_0}(t, r, y) that are consistent with the identified distribution P(t, r) and (s_1, s_0). Until this point, no additional restrictions on the statistical dependence structure between t, r, and y were imposed. The literature on gold standard bias suggests that t and r may frequently be statistically dependent conditional on y in ways that would further restrict the set of distributions P_{s_1,s_0}(t, r, y) consistent with the data, resulting in more informative identified sets for (θ_1, θ_0). It is thus important to incorporate assumptions on the dependence structure into the framework. A particular kind of restriction that researchers may be willing to consider concerns the error probabilities of t conditional on r making a misclassification error for a specific value of y. The appeal of such assumptions stems from the ability to scrutinize their credibility based on shared physical properties of the two tests. Valenstein (1990) informally discusses one such restriction. The author analyzes the magnitude of the difference θ̃_j − θ_j for j = 0, 1 by means of a numerical example in which the two tests have classification errors that are referred to as "highly correlated". The meaning of highly correlated errors is not formally defined, and in the numerical example the assumption is imposed as P(t = y|r = y, y) = P(t = 1 − y|r = 1 − y, y) = 1 for all y. I formalize this assumption and derive the resulting sharp identified set for (θ_1, θ_0). Given that its plausibility may vary across health statuses, I also allow it to hold only for a particular value of y. Thibodeau (1981), Vacek (1985), and Deneef (1987) formally analyze the difference between the corresponding true and "apparent" operating characteristics under a different type of dependence. They consider an assumption that restricts the conditional covariance so that Cov(t, r|y) ≥ 0. The condition Cov(t, r|y) ≥ 0 is equivalent to:

P(t = 1, r = 1|y) ≥ [P(t = 1, r = 1|y) + P(t = 1, r = 0|y)] · [P(t = 1, r = 1|y) + P(t = 0, r = 1|y)].  (11)

However, expression (11) does not have a clear interpretation in terms of the individual error probabilities P(t, r|y) where t ≠ y or r ≠ y. Determining its plausibility based on the physical characteristics of the tests may thus be more difficult in practice than for assumptions that clearly restrict particular error probabilities.

Definition 1. (Tendency to wrongly agree) An index test has a tendency to wrongly agree with the reference test for disease status ȳ given (s_1, s_0) if P_{s_1,s_0}(t = 1 − ȳ|r = 1 − ȳ, y = ȳ) ≥ P_{s_1,s_0}(t = ȳ|r = 1 − ȳ, y = ȳ).

If an index test exhibits a tendency to wrongly agree with the reference test for ȳ, then conditional on the reference test making a classification error, the index test is more likely to misdiagnose the patient than to diagnose them correctly. (One can also define a tendency to correctly disagree for disease status ȳ as P_{s_1,s_0}(t = 1 − ȳ|r = 1 − ȳ, y = ȳ) ≤ P_{s_1,s_0}(t = ȳ|r = 1 − ȳ, y = ȳ); the identified sets that follow can be derived symmetrically. Thibodeau (1981) emphasizes that tests are generally not expected to exhibit such dependence. Note also that condition (11) is neither sufficient nor necessary for the assumption. To see this, consider counterexamples where, for y = 1, (P(t = 1, r = 1|y), P(t = 1, r = 0|y), P(t = 0, r = 1|y), P(t = 0, r = 0|y)) equals (0.1, 0.25, 0.25, 0.4) or (0.5, 0.3, 0.1, 0.1).) Valenstein (1990) explains that the tendency may arise if the two tests have common properties, such as the type of sample used, e.g. the same swab type.
Proposition 2. Suppose that the index and reference tests have a tendency to wrongly agree only for y = j. The sharp identified set H̃_{θ_j}(s_1, s_0) for parameter θ_j, j = 0, 1, given reference test sensitivity s_1 and specificity s_0, is an interval H̃_{θ_j}(s_1, s_0) = [θ̃^L_j, θ̃^U_j], where:

θ̃^L_j = [max{0, P(t = j, r = j) − P_{s_1,s_0}(r = j, y = 1 − j)} + max{0, P_{s_1,s_0}(r = 1 − j, y = j) − P(t = 1 − j, r = 1 − j)}] / P_{s_1,s_0}(y = j)
θ̃^U_j = [min{P(t = j, r = j), P_{s_1,s_0}(r = j, y = j)} + min{P(t = j, r = 1 − j), P_{s_1,s_0}(r = 1 − j, y = j)/2}] / P_{s_1,s_0}(y = j)

for P_{s_1,s_0}(y = 1) as in (5) and P_{s_1,s_0}(r, y) = P_{s_1,s_0}(r|y)P_{s_1,s_0}(y). The corresponding sharp joint identified set H̃_{(θ_1,θ_0)}(s_1, s_0) takes the form of (10), with [θ^L_j, θ^U_j] replaced by [θ̃^L_j, θ̃^U_j] for the status j for which the assumption is maintained. (13) If the index and reference tests have a tendency to wrongly agree for both y = 0 and y = 1, the sharp joint identification region for (θ_1, θ_0), given reference test sensitivity s_1 and specificity s_0, takes the form of (10) with both intervals replaced by [θ̃^L_1, θ̃^U_1] and [θ̃^L_0, θ̃^U_0]. (14)

Proposition 2 provides sharp identified sets for (θ_1, θ_0) when the researcher maintains that the tests have a tendency to wrongly agree for only one or for both health statuses. The identified set given (s_1, s_0) is again a line segment in [0, 1]². The bounds [θ̃^L_j, θ̃^U_j] imply that the sets are reduced in size only from above compared to H_{(θ_1,θ_0)}(s_1, s_0) in Proposition 1, in the sense that θ̃^U_j ≤ θ^U_j and θ̃^L_j = θ^L_j. The identified set H_{(θ_1,θ_0)}(s_1, s_0) was derived by finding all distributions P_{s_1,s_0}(t, r, y) that are consistent with the data given (s_1, s_0). It thus represents a domain of consensus for the value of (θ_1, θ_0) under additional assumptions restricting the set of P_{s_1,s_0}(t, r, y) that are considered feasible. In other words, any identified set obtained under further assumptions on the statistical dependence of t, r, and y will be a subset of H_{(θ_1,θ_0)}(s_1, s_0). Thus, the bounds in Thibodeau (1981) obtained for tests satisfying condition (11) are also subsumed under the general analysis in this paper. One case where it may be plausible to maintain the assumption that an index test has a tendency to wrongly agree with the reference for y = 1 is when using SARS-CoV-2 RT-PCR tests to evaluate the performance of rapid antigen swab tests. Weissleder et al. (2020) note that RT-PCR tests typically have exceptionally high analytical sensitivities and specificities. These measure performance in contrived samples produced by the researchers, rather than clinical samples. Thus, we know that if any viral specimens are present in a test sample, the test will return a positive result with very high probability. Arevalo-Rodriguez et al. (2020) explain that false negatives are still an issue in clinical settings due to the absence of viral specimens at the swab location. That is, it is possible that the virus simply is not present at the swabbed site of a diseased individual at the time of sampling, inducing a false negative result. Conversely, since the test is almost perfectly analytically sensitive, if it does produce a false negative result, it is highly likely that the sample did not contain any viral particles.
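In terms of the bounds, the tendency to wrongly agree for y = j halves the discordant-cell mass that can enter the upper bound: at most half of P_{s_1,s_0}(r = 1 − j, y = j) can be paired with a correct index result. A minimal sketch of the tightened upper bound of Proposition 2, extending the illustrative proposition1_bounds function above:

```python
def proposition2_upper(p_tr, s1, s0, j):
    """Tightened upper bound on theta_j under a tendency to wrongly agree
    for y = j (Proposition 2); the lower bound is as in Proposition 1."""
    p_r1 = p_tr[0][1] + p_tr[1][1]
    p_y1 = (p_r1 + s0 - 1) / (s1 + s0 - 1)       # eq. (5)
    p_yj = p_y1 if j == 1 else 1 - p_y1
    concordant = s1 * p_y1 if j == 1 else s0 * (1 - p_y1)    # P(r=j, y=j)
    discordant = (1 - s1) * p_y1 if j == 1 else (1 - s0) * (1 - p_y1)
    return (min(p_tr[j][j], concordant)
            + min(p_tr[j][1 - j], discordant / 2)) / p_yj

# With the hypothetical distribution above and (s1, s0) = (0.9, 1.0),
# the upper bound on theta_1 falls from about 0.892 to about 0.842.
```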
Returning to the RT-PCR example: all participants are tested with both tests, with swab samples typically taken from the same location, e.g., nasopharynx, nares, or oropharynx. Suppose that the RT-PCR produced a false negative result, i.e., that the swab did not contain any viral particles. Then the antigen test is more likely than not to make the same error using a swab from the same location. (We cannot maintain this with certainty, since the antigen test can still potentially falsely produce a positive result, even though there is no virus in the sample. That is, one cannot credibly claim that (r, y) = (0, 1) ⇒ (t, y) = (0, 1).) This would be equivalent to a claim that the two tests have a tendency to wrongly agree for y = 1. More examples can be found in the literature. Hadgu (1999) observes that the same assumption is credible for the ligase chain reaction (LCR) and culture tests for Chlamydia trachomatis, by the same reasoning. Valenstein (1990) indicates that when determining the performance of direct immunoassay swab tests for Group A streptococci using a culture as a reference, the tendency to wrongly agree may hold for y = 1 due to inadequately obtained samples leading to false negatives. Furthermore, the same is suggested for y = 0: patients who are ill with viral pharyngitis, but incidentally carry the bacteria elsewhere, may appear falsely positive on both tests. Vacek (1985) argues that Tine and Mantoux tuberculin tests may have a tendency to wrongly agree for any y, as both rely on the antibody reaction to tuberculin. For simplicity of exposition, the previously derived identified sets for (θ_1, θ_0) were presented under the premise that (s_1, s_0) are known exactly. That assumption might be implausible depending on the setting. Researchers may instead prefer to maintain that they possess not exact, but rather approximate, knowledge of (s_1, s_0). I thus relax Assumption 2 by supposing that we only have knowledge of a set S that contains the true sensitivity and specificity of the reference test.

Assumption 2A. Sensitivity and specificity of the reference test are contained in a known compact set S, i.e., (s_1, s_0) ∈ S, with s_1 > 1 − s_0 for every (s_1, s_0) ∈ S.

Assumption 2A is a generalization of the previously used Assumption 2. Compactness of S is not relevant for identification, but it is utilized in the inference procedure defined in Section 4.2. For a fixed arbitrary element (s_1, s_0) ∈ S, the identified set G_{(θ_1,θ_0)}(s_1, s_0) for (θ_1, θ_0) can be found using the expressions from Proposition 1 or Proposition 2, depending on which of the previously discussed assumptions the researcher is willing to maintain. Denote by G_{(θ_1,θ_0)}(S) the corresponding identified set for (θ_1, θ_0) when (s_1, s_0) is known to be in S. The collection {G_{(θ_1,θ_0)}(s_1, s_0) : (s_1, s_0) ∈ S} contains all values of (θ_1, θ_0) that are consistent with the observed data and at least one (s_1, s_0) ∈ S. We can formally define:

G_{(θ_1,θ_0)}(S) = ⋃_{(s_1,s_0)∈S} G_{(θ_1,θ_0)}(s_1, s_0),

where G_{(θ_1,θ_0)}(s_1, s_0) is the identified set given a value (s_1, s_0), as defined in Proposition 1 or Proposition 2. Any set G_{(θ_1,θ_0)}(s_1, s_0) contains only the values of (θ_1, θ_0) that are consistent with the observed data and (s_1, s_0). The union of the sets G_{(θ_1,θ_0)}(s_1, s_0) over all possible (s_1, s_0) ∈ S then contains only the values of (θ_1, θ_0) that are consistent with the observed data and at least one (s_1, s_0) ∈ S. Hence, the identified set G_{(θ_1,θ_0)}(S) is the smallest possible under the maintained assumptions. The set S may take different forms.
Expected ones would include sets of finitely many values, line segments, or rectangles. In general, within G_{(θ_1,θ_0)}(S) the test performance measures θ_1 and θ_0 will no longer necessarily be linearly dependent. The set G_{(θ_1,θ_0)}(S) may not be a line segment in [0, 1]², but rather a union of line segments. One might rightfully ask how it is possible to credibly come up with (s_1, s_0) or S for a reference test r. To identify the performance of r by means of a conventional test performance study, one would require a different reference test whose performance would have to be known, which again would have to be determined using yet another reference test, and so on. It may seem that researchers would be entering a vicious cycle, and hence Assumption 2, or even Assumption 2A, might appear untenable. Yet, knowledge of (s_1, s_0) is routinely maintained by researchers in practice. Performance of certain tests used as references can be learned via alternative methods, such as those in Hui and Walter (1980) and Kanji et al. (2021), that do not require a reference test with known (s_1, s_0). The latter is used to choose (s_1, s_0) in Section 5 of this paper. Such methods rely on stronger assumptions than standard performance studies and are applicable only in specific settings. Still, they may allow us to find (s_1, s_0) for certain reference tests. Furthermore, tests are generally expected to have precisely measured analytical performance. Woloshin, Patel, and Kesselheim (2020) explain that analytical performance may not always accurately represent clinical performance, denoted by (s_1, s_0). Nonetheless, it may provide some information on how the tests will perform in clinical settings. For example, Kucirka et al. (2020) consider COVID-19 RT-PCR tests to be perfectly specific due to their perfect analytical specificity (specificity on contrived laboratory samples containing other pathogens, but not SARS-CoV-2). Finally, it is sensible to assume that practitioners will accumulate at least some knowledge of test performance through use. Information on patient health statuses needed to do so can be obtained through means other than reference tests. Examples are autopsy reports, positive reactions to illness-specific treatment regimes, or invasive tests that are not well suited to be used as references, such as biopsies or pathology reports following prophylactic surgeries. The arguments above suggest that it is plausible that researchers may be able to come up with a value of (s_1, s_0) or a set S for their test r. If no knowledge of the performance of r can be acquired or credibly assumed, Emerson et al. (2018) explain that one cannot reasonably expect to use such a test as a reference in conventional studies. A standardized procedure for choosing the appropriate (s_1, s_0) or S is outside the scope of this paper, but it is an important question for future applications. Sensitivity and specificity are often used to derive other parameters of interest. Two notable examples are: 1) prevalence in a population being screened; 2) test predictive values. Both will be defined in detail in the following subsections. If the operating characteristics of the test are partially identified, the derived parameters will be too. I demonstrate how to find their corresponding identified sets and show how the particular structure of the identified set for (θ_1, θ_0) affects the sharpness of the bounds on prevalence. In practice, the population of interest may not be the same as the test performance study population.
Test performance measurements are done in separate studies, and their results are extrapolated to a different set of individuals. The researcher must find it credible that the test will perform similarly in both populations. This is an often maintained assumption both in the literature and in clinical settings, albeit implicitly. Mulherin and Miller (2002) emphasize that clinicians should consider study samples carefully to determine whether the results are generalizable to their specific patient population. For example, if test performance has been measured on a population of patients with severe respiratory symptoms, it may not be plausible to claim that the results will extrapolate readily to asymptomatic screening of some other population. However, if the operating characteristics were bounded using an asymptomatic population with similar traits, then the conclusions may be more plausible. External validity of the identified set for (θ_1, θ_0) will be assumed throughout this section. Suppose that a researcher is interested in learning the true prevalence P(y = 1) in a population that is undergoing screening using a test t. Assume that each individual is tested exactly once. This is a standard problem in epidemiology, where the prevalence can be found for known, identified operating characteristics θ_1 and θ_0, as explained by Gart and Buck (1966), Greenland (1996), and Diggle (2011). As before, in the population of interest it follows that:

P(t = 1) = θ_1 P(y = 1) + (1 − θ_0)(1 − P(y = 1)), so that P(y = 1) = (P(t = 1) + θ_0 − 1)/(θ_1 + θ_0 − 1) whenever θ_1 ≠ 1 − θ_0.

Proposition 3 extends the identity above to the case when (θ_1, θ_0) are partially identified.

Proposition 3. Let θ^L_j and θ^H_j be the smallest and largest values of θ_j in the identified set, so that (θ^L_1, θ^L_0) and (θ^H_1, θ^H_0) are the endpoints of the identified segment, attained jointly since θ_1 and θ_0 move together along the segment. Then the sharp bounds on prevalence are:

P(y = 1) ∈ Π = [min{p^L, p^H}, max{p^L, p^H}], where p^L = (P(t = 1) + θ^L_0 − 1)/(θ^L_1 + θ^L_0 − 1) and p^H = (P(t = 1) + θ^H_0 − 1)/(θ^H_1 + θ^H_0 − 1).  (18)

Let G_{θ_j}(s_1, s_0) = {θ_j : (θ_1, θ_0) ∈ G_{(θ_1,θ_0)}(s_1, s_0)} denote the individual bounds on θ_j for j = 0, 1. The sets G_{θ_1}(s_1, s_0) and G_{θ_0}(s_1, s_0) are also referred to as projection bounds on θ_1 and θ_0.

Remark 4. If we were to disregard the linear structure of the sharp identified set G_{(θ_1,θ_0)}(s_1, s_0) by supposing that it is the rectangle G_{θ_1}(s_1, s_0) × G_{θ_0}(s_1, s_0), then the bounds on prevalence would be obtained by minimizing and maximizing (P(t = 1) + θ_0 − 1)/(θ_1 + θ_0 − 1) over the four corners of the rectangle. (19) Disregarding the linear structure of the identified set for (θ_1, θ_0) will yield strictly wider bounds on prevalence.

Corollary 2. Let G_{(θ_1,θ_0)}(S) be the sharp identified set for (θ_1, θ_0) where (s_1, s_0) ∈ S. The sharp bounds for prevalence are:

Π_S = {(P(t = 1) + θ_0 − 1)/(θ_1 + θ_0 − 1) : (θ_1, θ_0) ∈ G_{(θ_1,θ_0)}(S)}  (20)

when ∀(θ_1, θ_0) ∈ G_{(θ_1,θ_0)}(S) : θ_1 ≠ 1 − θ_0, and P(y = 1) ∈ [0, 1] otherwise. If the shape of G_{(θ_1,θ_0)}(S) were disregarded by assuming that the identified set is a rectangle, bounds Π̃_S analogous to the ones in (19) can still be formed, and it would hold that Π_S ⊂ Π̃_S. As a result of the high communicability of the SARS-CoV-2 virus, identification of population prevalence through testing has become an important goal for various institutions, states, and countries. Daily positivity rates are used to decide upon further mitigation measures. The rates are treated as a measure of prevalence, though the two are not the same when tests are imperfect. The use of such a heuristic to make quick decisions is not surprising, given that selection into testing makes it difficult to precisely measure the true prevalence, as Manski and Molinari (2021) explain.
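For screening settings where (18) does apply (each individual tested exactly once, as discussed below), a minimal sketch of the computation follows; the positivity rate and the segment endpoints are hypothetical inputs:

```python
def prevalence_bounds(p_t1, seg_low, seg_high):
    """Sharp prevalence bounds in a screened population (Proposition 3).

    p_t1 is the positivity rate P(t = 1) in the screened population;
    seg_low = (theta1_L, theta0_L) and seg_high = (theta1_H, theta0_H)
    are the endpoints of the identified segment for (theta1, theta0).
    """
    def prev(theta1, theta0):
        return (p_t1 + theta0 - 1) / (theta1 + theta0 - 1)
    lo, hi = sorted((prev(*seg_low), prev(*seg_high)))
    # Clip to [0, 1]; plug-in values can fall slightly outside.
    return max(0.0, lo), min(1.0, hi)

# Hypothetical 8% positivity rate with the segment endpoints implied by
# the Proposition 1 example above.
print(prevalence_bounds(0.08, (0.792, 0.931), (0.892, 0.969)))
# approximately (0.015, 0.057)
```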
Under such selection into testing, the bounds in (18) and (20) do not hold. Stoye (2022) provides the appropriate bounds on prevalence relying on known bounds on sensitivity and specificity, where the identified sets for (θ_1, θ_0) derived here are natural inputs. Some institutions do mandate regular population-level or random screening, in which case (18) and (20) may hold. Many universities have mandatory COVID-19 antigen test screening that is conducted on a random subset of students or on all students. If each student is tested exactly once, the bounds on prevalence given above are valid. If such testing is mandated with regular frequency, then formulating a time series of prevalence bounds is also possible. Adaptation of the method to the case when each individual may be tested repeatedly is left for future research. Positive predictive value (PPV) is the probability that a patient is diseased conditional on receiving a positive test result. Negative predictive value (NPV) is the probability that a patient who has tested negative is truly healthy. Clinicians are usually more concerned with knowing the predictive values of a test t than its sensitivity and specificity. As Watson, Whiting, and Brush (2020) explain, the probability of the patient being diseased prior to observing a test result is referred to as the pre-test probability. For a known pre-test probability, sensitivity, and specificity, the predictive values can be found using Bayes' theorem. Clinicians settle on a pre-test probability using knowledge of local rates of infection and patients' symptoms and characteristics. I denote it π_X = P(y = 1|X), where X stands for a vector of covariates observed by the clinician. Manski (2020) provides bounds on predictive values for COVID-19 antibody tests using point-identified values of θ_1 and θ_0, when the pre-test probability π_X is bounded. The author notes that the analysis can be generalized to take bounds rather than exact values of θ_1 and θ_0 as inputs. Ziegler (2021) extends the analysis of predictive values to the case when θ_1 and θ_0 are partially identified due to an imperfect reference test, assuming that s_0 = 1. The bounds below do not require that s_0 = 1 in the performance study. The predictive values are defined as:

PPV = π_X θ_1 / (π_X θ_1 + (1 − π_X)(1 − θ_0)),  NPV = (1 − π_X) θ_0 / ((1 − π_X) θ_0 + π_X (1 − θ_1)).  (21)

Note that in (21), θ_1 and θ_0 appear to be independent of X. This is not generally true, according to Willis (2008). It is conceivable that for patients with severe symptoms tests may exhibit higher sensitivity than for patients with mild clinical manifestations. However, this question is primarily one of external validity of (θ_1, θ_0) rather than of independence. Mulherin and Miller (2002) clarify that clinicians should carefully consider the study samples used to find (θ_1, θ_0) to determine whether the results are generalizable to their specific patient population. The omission of X does not mean that sensitivity or specificity do not depend on it, but that their measurements in (21) have been made in study populations with similar relevant traits in X. I follow this practice and keep X implicit. Assume that the identification region G_{(θ_1,θ_0)}(s_1, s_0) for (θ_1, θ_0) and the pre-test probability π_X of the clinician are known. From (21), it can be seen that both PPV and NPV increase with θ_1 and θ_0. Thus, the bounds are:

PPV ∈ [PPV(θ^L_1, θ^L_0), PPV(θ^H_1, θ^H_0)],  NPV ∈ [NPV(θ^L_1, θ^L_0), NPV(θ^H_1, θ^H_0)],

where PPV(θ_1, θ_0) and NPV(θ_1, θ_0) denote the expressions in (21), and (θ^L_1, θ^L_0) and (θ^H_1, θ^H_0) are the endpoints of the identified segment.
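A minimal sketch of these predictive value bounds; the pre-test probability and segment endpoints are again hypothetical:

```python
def predictive_value_bounds(pi_x, seg_low, seg_high):
    """Bounds on PPV and NPV from (21), given pre-test probability pi_x and
    the endpoints of the identified segment for (theta1, theta0); both
    predictive values are increasing in theta1 and theta0."""
    def ppv(theta1, theta0):
        return pi_x * theta1 / (pi_x * theta1 + (1 - pi_x) * (1 - theta0))
    def npv(theta1, theta0):
        return ((1 - pi_x) * theta0
                / ((1 - pi_x) * theta0 + pi_x * (1 - theta1)))
    return {"PPV": (ppv(*seg_low), ppv(*seg_high)),
            "NPV": (npv(*seg_low), npv(*seg_high))}

# Hypothetical pre-test probability of 10% with the endpoints used above.
print(predictive_value_bounds(0.10, (0.792, 0.931), (0.892, 0.969)))
# PPV roughly (0.56, 0.76); NPV roughly (0.976, 0.988)
```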
If the clinician is not willing to settle on a single value of π_X, but rather on a range of values π_X ∈ [π_L, π_H], the bounds are simply:

PPV ∈ [PPV(θ^L_1, θ^L_0; π_L), PPV(θ^H_1, θ^H_0; π_H)],  NPV ∈ [NPV(θ^L_1, θ^L_0; π_H), NPV(θ^H_1, θ^H_0; π_L)],

since PPV is increasing and NPV is decreasing in the pre-test probability. The bounds are generalizable analogously to the previously outlined case of bounding prevalence when the identification region is G_{(θ_1,θ_0)}(S) rather than G_{(θ_1,θ_0)}(s_1, s_0). Identified sets for (θ_1, θ_0) in Section 2 can be found when P(t, r) is fully known. In practice, researchers must use sample data to estimate the identified set and do inference on the points in the set. I demonstrate consistent estimation of the identified set and construction of confidence sets for the points in the identified set that are uniformly consistent in level over a family of permissible distributions that is relevant in the application of this paper. Let W_i = (t_i, r_i), i = 1, . . . , n, constitute the observed data of n i.i.d. observations from the distribution P(t, r) ∈ P, where P is a family of categorical distributions with 4 categories. Let G_{(θ_1,θ_0)}(s_1, s_0) denote an arbitrary identified set for (θ_1, θ_0) given (s_1, s_0) from any of the propositions above, and G_{θ_j}(s_1, s_0) the corresponding identified set for θ_j with j = 0, 1. A natural way of estimating them is to replace the population parameters in the closed-form expressions for the bounds with consistent sample estimators. This is known as a plug-in estimator of the identified set. Let 1{·} denote the indicator function. Suppose first that (s_1, s_0) are known. Under the assumptions, P̂(t = j, r = k) = (1/n) Σ_{i=1}^{n} 1{t_i = j, r_i = k} is a consistent estimator of P(t = j, r = k). Combining P̂(t = j, r = k) with the knowledge of (s_1, s_0) yields P̂_{s_1,s_0}(r = k, y = l) for every (k, l) ∈ {0, 1}². Next, the plug-in estimator Ĝ_{θ_j}(s_1, s_0) for the identified set of a single parameter θ_j follows immediately by inputting P̂(t = j, r = k) and P̂_{s_1,s_0}(r = k, y = l) into the bounds in Proposition 1 or Proposition 2. Finally, (10), (13), or (14) give the consistent plug-in estimator Ĝ_{(θ_1,θ_0)}(s_1, s_0) of the joint identified set for (θ_1, θ_0).

Remark 5. Manski and Pepper (1998) note that consistency of plug-in estimators when the bounds consist of maxima and minima of population parameters is easy to establish, as long as the parameters can be consistently estimated. The plug-in estimator of the bounds is consistent for the true identified set in the sense that the Hausdorff distance d_H(Ĝ_{(θ_1,θ_0)}(s_1, s_0), G_{(θ_1,θ_0)}(s_1, s_0)) converges in probability to zero. (For sets A and B that are closed subsets of R², the Hausdorff distance is defined as d_H(A, B) = max{sup_{a∈A} inf_{b∈B} ρ(a, b), sup_{b∈B} inf_{a∈A} ρ(a, b)}, where ρ(·, ·) is some metric defined on R².) To see this, note that P̂(t = j, r = k) →p P(t = j, r = k) and P̂(r = 1) →p P(r = 1), and hence P̂_{s_1,s_0}(r = k, y = l) →p P_{s_1,s_0}(r = k, y = l) by the continuous mapping theorem as n → ∞, using the facts that P̂_{s_1,s_0}(y = 1) = (P̂(r = 1) + s_0 − 1)/(s_1 + s_0 − 1) and P̂_{s_1,s_0}(r, y) = P̂_{s_1,s_0}(r|y)P̂_{s_1,s_0}(y). The continuity of the maximum and minimum operators then implies that the estimated bounds converge in probability to θ^L_j and θ^U_j. In the case when (s_1, s_0) are only known to be bounded by some compact set S, one can obtain the consistent estimator Ĝ_{(θ_1,θ_0)}(S) = ⋃_{(s_1,s_0)∈S} Ĝ_{(θ_1,θ_0)}(s_1, s_0). This is done by finding the union of Ĝ_{(θ_1,θ_0)}(s_1, s_0) over a fine grid of (s_1, s_0) covering S. The procedure requires two nested grid-search algorithms, and the level of coarseness of the two grids can impact computation time.
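A minimal sketch of this plug-in estimation step, reusing the illustrative proposition1_bounds function from above; the simulated sample and the grid over S = [0.8, 0.9] × {1} are hypothetical:

```python
import numpy as np

def estimate_identified_set(t, r, s_grid):
    """Plug-in bound estimates for each grid point approximating S;
    t and r are 0/1 arrays of index and reference test results.
    Assumes proposition1_bounds from the earlier sketch is in scope."""
    p_tr = [[np.mean((t == a) & (r == b)) for b in (0, 1)] for a in (0, 1)]
    return [(s1, s0, proposition1_bounds(p_tr, s1, s0)) for s1, s0 in s_grid]

# Hypothetical sample of n = 500; the estimated set G-hat(S) is the union
# of the per-(s1, s0) segments returned below.
rng = np.random.default_rng(0)
t = rng.integers(0, 2, size=500)
r = rng.integers(0, 2, size=500)
s_grid = [(s1, 1.0) for s1 in np.linspace(0.8, 0.9, 11)]
for s1, s0, b in estimate_identified_set(t, r, s_grid)[:3]:
    print(round(s1, 2), s0, b)
```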
All diagnostic performance studies must report confidence intervals for θ_1 and θ_0 according to the FDA Statistical Guidance on Reporting Results Evaluating Diagnostic Tests. I show how one can use the method for inference based on moment inequalities from Romano, Shaikh, and Wolf (2014) to form confidence sets that cover the true parameters with at least some pre-specified probability 1 − α and that are uniformly consistent over a family of permissible distributions P that is relevant in the application. Let C_n be the confidence set of interest, and let Θ(P) = ⋃_{(s_1,s_0)∈S} G_{(θ_1,θ_0)}(s_1, s_0) × {(s_1, s_0)} be an identification region for θ = (θ_1, θ_0, s_1, s_0) that depends on P ∈ P, where S can be a singleton. (More precisely, we are interested in C_n for the points in G_{(θ_1,θ_0)}(S) = ⋃_{(s_1,s_0)∈S} G_{(θ_1,θ_0)}(s_1, s_0) = {(θ_1, θ_0) : θ ∈ Θ(P)}. When P(t, r) is known, whether one defines the identified set as G_{(θ_1,θ_0)}(S) or Θ(P) is inconsequential.) Note that θ includes the reference test performance measures (s_1, s_0). This is done to facilitate convenient definition of moment inequalities that represent the identified set of interest, regardless of whether (s_1, s_0) are known exactly or not. The confidence set C_n should satisfy:

lim inf_{n→∞} inf_{P∈P} inf_{θ∈Θ(P)} P(θ ∈ C_n) ≥ 1 − α.  (25)

Canay and Shaikh (2017) provide an overview of the recent advances in inference based on moment inequalities that are focused on finding C_n in partially identified models. They underline the importance of uniform consistency of C_n in level in these settings. If it fails, it is possible to construct a distribution of the data P(t, r) such that, for any sample size, the finite-sample coverage probability of the confidence set is arbitrarily low. In that sense, inference based on confidence intervals that are consistent only pointwise in level may be severely misleading in finite samples. To exploit existing inference methods based on moment inequalities to construct C_n, the identified set Θ(P) must be equivalent to some set Θ̄(P):

Θ̄(P) = {θ ∈ [0, 1]² × S : E_P m_j(W_i, θ) ≤ 0 for j ∈ J_1, and E_P m_j(W_i, θ) = 0 for j ∈ J_2},  (26)

where m_j(W_i, θ) for j ∈ J_1 ∪ J_2 are the components of a random function m : {0, 1}² × [0, 1]² × S → R^k such that |J_1| + |J_2| = k. Construction of the uniformly consistent confidence set for points in the identified set Θ̄(P) is done by imposing a fine grid over the parameter space [0, 1]² × S for θ and performing test inversion using inference methods such as those in Andrews and Soares (2010), Andrews and Barwick (2012), Romano, Shaikh, and Wolf (2014), and Chernozhukov, Chetverikov, and Kato (2019). (We wish to find the confidence set for (θ_1, θ_0). When S is a singleton, the distinction between the confidence sets for θ and (θ_1, θ_0) is immaterial. When S is not a singleton, the projection of the confidence set for θ may be conservative, in which case the subvector inference methods outlined in Bugni, Canay, and Shi (2017) and Kaido, Molinari, and Stoye (2019) may exhibit higher power. However, they also warrant additional assumptions on P. Since the full parameter space is low-dimensional, I limit myself to projections in this paper.) Identified sets derived in the previous section are representable by (26). Focus in particular on the bounds for θ_1 in Proposition 1 given (s_1, s_0) for intuition. The bounds were:

θ_1 ∈ [ (max{0, P(t = 1, r = 1) − P_{s_1,s_0}(r = 1, y = 0)} + max{0, P_{s_1,s_0}(r = 0, y = 1) − P(t = 0, r = 0)}) / P_{s_1,s_0}(y = 1), (min{P(t = 1, r = 0), P_{s_1,s_0}(r = 0, y = 1)} + min{P(t = 1, r = 1), P_{s_1,s_0}(r = 1, y = 1)}) / P_{s_1,s_0}(y = 1) ].

Note that the max and min operators decompose into finitely many moment inequalities, and the linear relationship between (θ_1, θ_0) in (10) supplies a moment equality; Proposition 4 states the resulting representation. (The bounds on θ_1 are sums of intersection bounds on P(t = 1, r = j, y = 1) over j = 0, 1. An alternative route may be to augment the approach of Chernozhukov, Lee, and Rosen (2013). In its current form, however, it is unable to capture the linear relationship between (θ_1, θ_0), which is shown to be important for sharp bounds on derived parameters in Section 3. Inference methods based on moment inequalities require no adaptation and are a natural choice in this setting.)
Proposition 4. Let the moment function m = (m_1, . . . , m_7) be defined so that the inequalities E_P m_j(W_i, θ) ≤ 0, j = 1, . . . , 6, encode the three non-trivial lower bounds and the three non-trivial cases of the upper bound on θ_1 P_{s_1,s_0}(y = 1) implied by (9), and the equality E_P m_7(W_i, θ) = 0 encodes the linear constraint in (10).  (27)

Moment inequalities defined by m represent the joint identification region Θ(P) = ⋃_{(s_1,s_0)∈S} H_{(θ_1,θ_0)}(s_1, s_0) × {(s_1, s_0)} for H_{(θ_1,θ_0)}(s_1, s_0) defined in Proposition 1. For each θ ∈ [0, 1]² × S such that E_P m_j(W_i, θ) ≤ 0 for j = 1, . . . , 6 and E_P m_7(W_i, θ) = 0, it must be that θ ∈ Θ(P). Conversely, if θ ∈ Θ(P), then E_P m_j(W_i, θ) ≤ 0 for j = 1, . . . , 6 and E_P m_7(W_i, θ) = 0. The system in (27) is simple to adapt to the cases considered in Section 2.3. Suppose that the index and reference tests have a tendency to wrongly agree only for y = 1, as in Proposition 2. Again, there are three non-trivial values that are lower bounds, identical to the ones in the previous case. For the upper bound, the assumption additionally implies:

θ_1 P_{s_1,s_0}(y = 1) ≤ P_{s_1,s_0}(r = 0, y = 1)/2 + P_{s_1,s_0}(r = 1, y = 1) = ((1 + s_1)/2) P_{s_1,s_0}(y = 1),  (28)

i.e., θ_1 ≤ (1 + s_1)/2. There are no parameters pertaining to the population distribution in (28); it is a restriction on the parameter space for θ_1. It then holds that there are three relevant cases for the upper bound on θ_1 when the parameter space is appropriately limited. More precisely, since θ_1 can now only take values in [0, (1 + s_1)/2], the relevant parameter space for θ when the two tests have a tendency to wrongly agree for y = 1 is ⋃_{(s_1,s_0)∈S} [0, (1 + s_1)/2] × [0, 1] × {(s_1, s_0)}.

Remark 6. The restriction on the parameter space when the two tests have a tendency to wrongly agree for y = 1 still allows θ_1 to be higher than s_1, but not by more than (1 − s_1)/2.

Proposition 5. Assume that the index and reference tests have a tendency to wrongly agree only for y = 1. Let the moment function m¹ be defined as m in (27), with P_{s_1,s_0}(r = 0, y = 1) replaced by P_{s_1,s_0}(r = 0, y = 1)/2 in the upper-bound components. Moment inequalities and equalities defined by m¹ for J_1 = {1, . . . , 6} and J_2 = {7} represent the joint identification region Θ(P) = ⋃_{(s_1,s_0)∈S} H̃_{(θ_1,θ_0)}(s_1, s_0) × {(s_1, s_0)} for H̃_{(θ_1,θ_0)}(s_1, s_0) defined in Proposition 2 for y = 1. For each θ ∈ ⋃_{(s_1,s_0)∈S} [0, (1 + s_1)/2] × [0, 1] × {(s_1, s_0)} such that E_P m¹_j(W_i, θ) ≤ 0 for j = 1, . . . , 6 and E_P m¹_7(W_i, θ) = 0, it must be that θ ∈ Θ(P). Conversely, if θ ∈ Θ(P), then E_P m¹_j(W_i, θ) ≤ 0 for j = 1, . . . , 6 and E_P m¹_7(W_i, θ) = 0.

Similarly, it is possible to define moment inequality functions that represent the remaining identified sets in Proposition 2. They are found in equations (31) and (32) in Appendix A. Romano, Shaikh, and Wolf (2014), Theorem 3.1, provides sufficient conditions for uniform consistency of confidence sets over a large family of distributions. Assumption 4 defines a family P to which the conclusions of Theorem 3.1 apply. This is demonstrated by Theorem 1 below.

Assumption 4. There exists a number ε > 0 such that P(t = j, r = k) ≥ ε for all (j, k) ∈ {0, 1}² and any P(t, r) ∈ P.

The assumption restricts P to distributions P(t, r) such that all outcomes (t, r) ∈ {0, 1}² have probability bounded away from zero. It serves a technical purpose, ensuring that the uniform integrability condition required by Romano, Shaikh, and Wolf (2014), Theorem 3.1, holds. The assumption appears reasonable in the analyzed data, as discussed in Section 5.
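Before stating the formal consistency result, a sketch illustrating the moment representation may help: it evaluates sample analogues of the moment conditions for a candidate θ = (θ_1, θ_0, s_1, s_0). The decomposition into three lower-bound and three upper-bound inequalities plus one equality follows the structure described in Proposition 4, though the component ordering and interface are illustrative:

```python
import numpy as np

def sample_moments(t, r, theta1, theta0, s1, s0):
    """Sample analogues of the moment conditions representing the identified
    set (cf. Proposition 4): six inequalities (<= 0) and one equality (= 0)."""
    p_r1 = np.mean(r)
    p = (p_r1 + s0 - 1) / (s1 + s0 - 1)       # P-hat_{s1,s0}(y = 1)
    p11 = np.mean((t == 1) & (r == 1))
    p10 = np.mean((t == 1) & (r == 0))
    p00 = np.mean((t == 0) & (r == 0))
    p_t1 = p11 + p10                           # P-hat(t = 1)
    low_a = p11 - (1 - s0) * (1 - p)           # lower-bound pieces for
    low_b = (1 - s1) * p - p00                 # theta1 * P(y = 1)
    ineq = [low_a - theta1 * p,                # three lower-bound cases
            low_b - theta1 * p,
            low_a + low_b - theta1 * p,
            theta1 * p - p_t1,                 # three upper-bound cases
            theta1 * p - p10 - s1 * p,
            theta1 * p - (1 - s1) * p - p11]
    eq = theta1 * p + (1 - theta0) * (1 - p) - p_t1   # constraint in (10)
    return ineq, eq
```

Test inversion then compares studentized violations of these moments with bootstrap critical values, as in Romano, Shaikh, and Wolf (2014); grid points that are not rejected form the confidence set C_n.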
Theorem 1. Suppose that Assumption 4 holds. Then, for every component $m_j(W_i,\theta)$ of the moment functions in (27), (29), (31), and (32):
1. $\mathrm{Var}_P(m_j(W_i,\theta)) > 0$ for all $P \in \mathcal{P}$ and $\theta \in [0,1]^2 \times S$;
2. $\lim_{\lambda\to\infty} \sup_{P\in\mathcal{P}} \sup_{\theta\in[0,1]^2\times S} E_P\Big[\Big(\frac{m_j(W_i,\theta)-\mu_j(\theta,P)}{\sigma_j(\theta,P)}\Big)^2 1\Big\{\Big|\frac{m_j(W_i,\theta)-\mu_j(\theta,P)}{\sigma_j(\theta,P)}\Big| > \lambda\Big\}\Big] = 0$;
where $\mu_j(\theta,P) = E_P(m_j(W_i,\theta))$ and $\sigma_j^2(\theta,P) = \mathrm{Var}_P(m_j(W_i,\theta))$.

Theorem 1 enables us to use the inference method from Romano, Shaikh, and Wolf (2014) to construct confidence sets $C_n$ for points $(\theta_1,\theta_0)$ in the identified sets defined by Proposition 1 and Proposition 2 that satisfy (25) when the relevant family of population distributions conforms to Assumption 4.

In this section, I apply the developed method to existing study data to provide confidence and estimated identified sets for $(\theta_1,\theta_0)$ of the rapid antigen COVID-19 test with the currently highest market share in the United States, the Abbott BinaxNOW COVID-19 Ag2 CARD test. Testing has been a crucial containment measure during the SARS-CoV-2 pandemic.

The method of Romano, Shaikh, and Wolf (2014) tests the null hypothesis that the moment inequalities and equalities hold for a value $\theta$ and components of the moment function $m_j(W_i,\theta)$, $j = 1,\dots,k$. The testing procedure consists of two steps: 1) construction of confidence regions for the moments; 2) formation of a critical value incorporating information on which moment inequalities are "negative". I perform test inversion over a fine grid of $10^5$ points on the relevant parameter space for $(\theta_1,\theta_0)$, and additionally over 10 points on $S$, where applicable. Following the original paper, I use 500 bootstrap samples to find the critical values and set $\beta = \alpha/10$. The results do not change significantly with the alternative values $\beta = \alpha/5$ and $\beta = \alpha/20$.

To apply the method, one must first determine a credible set of values $(s_1, s_0) \in S$ for the reference RT-PCR test. Following Kucirka et al. (2020), who cite perfect analytical specificity, I maintain that $s_0 = 1$. The same assumption has been used in existing work, such as Manski (2020). In the absence of a perfect gold standard, it is impossible to identify the sensitivity of the RT-PCR tests by means of a conventional diagnostic test performance study. Some studies use alternative approaches to estimate the parameter of interest. Kanji et al. (2021) provide a discordant result analysis of the RT-PCR test used for frontline testing of symptomatic individuals in Alberta, Canada. The authors define discordant results as initial negative RT-PCR findings followed by a positive test result within the incubation period. The initial negative samples were retested by three alternative RT-PCR assays targeting different genes. If at least one alternative test yielded a positive result, the initial result was treated as a false negative finding. Assuming perfect specificities of each of the three alternative tests, and perfect sensitivity of the combined testing procedure, they estimate the sensitivity of the used RT-PCR test at 90.3%. Arevalo-Rodriguez et al. (2020) use data from 34 observational studies to estimate false negative rates, defining false negatives to be patients who were symptomatic and negative, but subsequently positive on the same or a different RT-PCR test within the incubation period. They emphasize that the estimates obtained from some of the studies can be severely biased and state that the corresponding findings may have "very low certainty of evidence". There are two estimates based on data from the United States, only one of which they do not consider to be at high risk of bias. That estimate is 10%, yielding the corresponding sensitivity estimate of 90%. Following the two references, I assume that $s_1 = 0.9$.

17. Link: https://www.fda.gov/media/140615/download
18. Link: https://www.fda.gov/media/141570/download
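The grid-based test inversion described above can be sketched as follows. Because the full two-step construction is lengthy, the inner test below is a deliberately simplified least-favorable bootstrap test rather than the Romano-Shaikh-Wolf procedure itself: it performs no first-step moment selection, ignores $\beta$, and is therefore more conservative. All names and inputs are illustrative assumptions, not the study's code.

    import numpy as np

    def lf_boot_test(m_values, alpha, eq_idx=(6,), n_boot=500, seed=0):
        # Simplified stand-in for the two-step test: compares the largest
        # studentized moment violation with a bootstrap quantile of the
        # recentered moments (least-favorable, hence conservative).
        # Component 7 (index 6) is the moment equality, treated two-sided.
        rng = np.random.default_rng(seed)
        n, k = m_values.shape
        mbar, sd = m_values.mean(0), m_values.std(0, ddof=1) + 1e-12
        def stat(means, sds):
            t = np.sqrt(n) * means / sds
            t[list(eq_idx)] = np.abs(t[list(eq_idx)])
            return t.max()
        centered = m_values - mbar
        boot = np.empty(n_boot)
        for b in range(n_boot):
            draw = centered[rng.integers(0, n, n)]
            boot[b] = stat(draw.mean(0), draw.std(0, ddof=1) + 1e-12)
        return stat(mbar, sd) <= np.quantile(boot, 1 - alpha)

    def confidence_set(data, moment_fn, alpha=0.05, grid_pts=316):
        # Test inversion over roughly grid_pts**2 ~ 10^5 points for
        # (theta_1, theta_0); a further loop over a grid on S would be
        # added when (s1, s0) is not known exactly.
        kept = []
        for t1 in np.linspace(0, 1, grid_pts):
            for t0 in np.linspace(0, 1, grid_pts):
                m_values = moment_fn(data, (t1, t0))  # (n, k) array of m_j(W_i, theta)
                if lf_boot_test(m_values, alpha):
                    kept.append((t1, t0))
        return kept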
RT-PCR tests may differ in terms of sensitivity. Fitzpatrick et al. (2021) stress that the fact that studies often do not specify the used RT-PCR test may be a source of additional ambiguity. The majority of estimates obtained from the 34 data sets used by Arevalo-Rodriguez et al. (2020) indicate that the sensitivity may be lower than 90% for some tests. To accommodate that possibility in the studies analyzed here, I further assume that the corresponding false negative rate may be up to twice as high as the one implied by $s_1 = 0.9$. Therefore, I also provide results assuming $s_1 \in [0.8, 0.9]$. The set $S$ is thus either $S = \{(0.9, 1)\}$ or $S = [0.8, 0.9] \times \{1\}$. Finally, I assume that the antigen and RT-PCR tests have a tendency to wrongly agree only for $y = 1$ and all assumed $(s_1, s_0)$, following the reasoning outlined in Section 2.3. The plausibility of the tendency to wrongly agree for $y = 0$ is difficult to establish, so I do not maintain that assumption.

The study measuring the performance is outlined in the EUA documentation17 and the instructions for use.18 Among the symptomatic participants in the independent study data, 929 were tested within 7 days of symptom onset. I omit the symptomatic individuals tested more than 7 days after initial symptoms for comparability with the EUA study. I also separately analyze the performance on 877 asymptomatic participants to provide plausible estimates of performance in the absence of symptoms. The data are summarized in Table 1. In all three samples, the estimates of the joint probabilities $\hat{P}(t=j, r=k)$ for $(j,k) \in \{0,1\}^2$ are bounded away from zero. I find it reasonable to maintain that the population distributions which generated the observed samples are found in a family $\mathcal{P}$ for which Assumption 4 holds.

The original EUA was granted based on interim results of the study, in which the test exhibited estimated "apparent" sensitivity and specificity of (91.7%, 100%), implying an estimated "apparent" false negative rate of 8.3%.19 Observe that this is lower than the estimated false negative rate for every assumed $(s_1, s_0)$.20 In both panels, the red dot representing the estimate of the "apparent" measures is outside the confidence set for $(\theta_1,\theta_0)$. While this is not a formal test of equality of two random vectors, we can still see that at the 5% significance level the hypothesis $H_0 : (\theta_1,\theta_0) = (84.6\%, 98.5\%)$ would be rejected. In other words, under the assumptions, the true sensitivity and specificity are not jointly equal to the currently often-cited "apparent" values (84.6%, 98.5%) at the ubiquitous level of significance. The same argument holds for the interim "apparent" estimates (91.7%, 100%).

Remark 7. The results from the EUA study data show that, under the assumptions and depending on interpretation, the test may not satisfy the FDA's original requirement of at least 80% estimated sensitivity. In that case, the FDA requires further exploration of alternative methods of test use, such as serial testing.

Remark 8. The estimated false negative rate in the EUA study is between 20% and 23.9% for $(s_1, s_0) = (0.9, 1)$. These values are 1.3 and 1.55 times larger than the corresponding estimates of the "apparent" false negative rate in the final EUA study.

19. For example: https://www.bloomberg.com/press-releases/2020-12-16/abbott-s-binaxnow-covid-19-rapid-testreceives-fda-emergency-use-authorization-for-first-virtually-guided-at-home-rapid-test-u.
20. Assuming lower values of $s_1$ with $s_0 = 1$ further increases the differences.
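The kind of gap reported in Remark 8 can be reproduced mechanically. Reusing the theta1_bounds sketch from above on a hypothetical table of counts (not the EUA data), the "apparent" false negative rate understates the bounded true rate:

    # Hypothetical counts; rows index t, columns index r.
    # Assumes theta1_bounds from the earlier sketch is in scope.
    counts = [[826, 14], [2, 88]]
    n = sum(sum(row) for row in counts)
    p_tr = [[c / n for c in row] for row in counts]
    apparent_sens = p_tr[1][1] / (p_tr[1][1] + p_tr[0][1])   # P(t=1 | r=1)
    lo, hi = theta1_bounds(p_tr, s1=0.9, s0=1.0)
    print(f"apparent FNR: {1 - apparent_sens:.3f}")          # ~0.137
    print(f"true FNR bounds: [{1 - hi:.3f}, {1 - lo:.3f}]")  # ~[0.206, 0.224]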
Compared to the often-cited interim results, the estimate of the false negative rate is between 2.41 and 2.88 times larger than the "apparent" analog. Relaxing the assumption to imperfectly known $(s_1, s_0)$ further magnifies the difference. The estimated false negative rate is then up to 3.89 times higher than the analogous interim result. In other words, the average number of people who are infected and missed by the antigen test is up to 3.89 times higher than test users may be led to believe by the reported "apparent" estimates. Hadgu (1999) highlights in their critique of discrepant analysis that errors in measurement of 2.9 percentage points for sensitivity and 2.8 percentage points for specificity are significant. The differences I find in this paper between the estimates of "apparent" and true sensitivity are substantially larger under plausible assumptions.

Appendix A. This section defines moment functions for the identified sets in Proposition 2 when the tests have a tendency to wrongly agree only for $y = 0$, and for both $y = 1$ and $y = 0$. Assume first that the index and reference tests have a tendency to wrongly agree only for $y = 0$. Following the reasoning in Section 4.2, we decompose the bounds on $\theta_0$ to construct the appropriate moment inequalities. As in the case when the tests have a tendency to wrongly agree only for $y = 1$, the three non-trivial lower-bound values are identical to the ones when there is no tendency to wrongly agree for any $y$. There are four cases for the upper bound, one of which is:
$$\theta_0 P_{s_1,s_0}(y=0) \leq \frac{P_{s_1,s_0}(r=1,y=0)}{2} + P_{s_1,s_0}(r=0,y=0) = \frac{1+s_0}{2} P_{s_1,s_0}(y=0). \quad (30)$$
Again, this is a restriction on the parameter space, since it states only that $\theta_0 \in [0, \frac{1+s_0}{2}]$. The relevant parameter space for $\theta$ when the two tests have a tendency to wrongly agree for $y = 0$ is therefore $\bigcup_{(s_1,s_0)\in S} [0,1] \times [0, \frac{1+s_0}{2}] \times \{(s_1,s_0)\}$. The restriction allows $\theta_0 > s_0$, but not by more than $\frac{1-s_0}{2}$.

Remark 9. If the index and reference tests have a tendency to wrongly agree only for $y = 0$, then the function $\tilde{m}^0$ defining the moment inequalities that represent the corresponding identified set for $\theta \in \bigcup_{(s_1,s_0)\in S} [0,1] \times [0, \frac{1+s_0}{2}] \times \{(s_1,s_0)\}$ is the one given in (31). The proof is analogous to that of Proposition 5.

Finally, the same steps yield a moment function that defines the identified set when the tests have a tendency to wrongly agree for both $y = 1$ and $y = 0$. As in the case where the tendency exists only for $y = 1$, the appropriate parameter space is $\theta \in \bigcup_{(s_1,s_0)\in S} [0, \frac{1+s_1}{2}] \times [0,1] \times \{(s_1,s_0)\}$.

Proposition 6. Assume that the index and reference tests have a tendency to wrongly agree for $y = 1$ and $y = 0$. Let the moment function $\tilde{m}$ be equal to $\tilde{m}^1$ in (29) in all components except $\tilde{m}_4(W_i,\theta)$ and $\tilde{m}_6(W_i,\theta)$, which are given in (32). For each $\theta \in \bigcup_{(s_1,s_0)\in S} [0, \frac{1+s_1}{2}] \times [0,1] \times \{(s_1,s_0)\}$ such that $E_P\, \tilde{m}_j(W_i,\theta) \leq 0$ for $j = 1,\dots,6$ and $E_P\, \tilde{m}_7(W_i,\theta) = 0$, it must be that $\theta \in \Theta(P)$. Conversely, if $\theta \in \Theta(P)$, then $E_P\, \tilde{m}_j(W_i,\theta) \leq 0$ for $j = 1,\dots,6$ and $E_P\, \tilde{m}_7(W_i,\theta) = 0$.

Proposition 1. The sharp identified set $H_{\theta_j}(s_1,s_0)$ for parameter $\theta_j$, $j = 0,1$, given reference test sensitivity $s_1$ and specificity $s_0$, is an interval $H_{\theta_j}(s_1,s_0) = [\theta_j^L, \theta_j^U]$, where:
$$\theta_j^L = \Big(\max\{0,\, P(t=j,r=j) - P_{s_1,s_0}(r=j,y=1-j)\} + \max\{0,\, P_{s_1,s_0}(r=1-j,y=j) - P(t=1-j,r=1-j)\}\Big)\frac{1}{P_{s_1,s_0}(y=j)}$$
$$\theta_j^U = \Big(\min\{P(t=j,r=1-j),\, P_{s_1,s_0}(r=1-j,y=j)\} + \min\{P(t=j,r=j),\, P_{s_1,s_0}(r=j,y=j)\}\Big)\frac{1}{P_{s_1,s_0}(y=j)}$$
for $P_{s_1,s_0}(y=1)$ as in (5) and $P_{s_1,s_0}(r,y) = P_{s_1,s_0}(r|y)P_{s_1,s_0}(y)$. The sharp identified set $H_{(\theta_1,\theta_0)}(s_1,s_0)$ for $(\theta_1,\theta_0)$ given reference test sensitivity $s_1$ and specificity $s_0$ is:
$$H_{(\theta_1,\theta_0)}(s_1,s_0) = \Big\{(t_1,t_0) : t_0 = t_1\frac{P_{s_1,s_0}(y=1)}{P_{s_1,s_0}(y=0)} + 1 - \frac{P(t=1)}{P_{s_1,s_0}(y=0)},\; t_1 \in H_{\theta_1}(s_1,s_0)\Big\}.$$

Proof of Proposition 1.
The proof follows through a series of claims.

Claim 1. Bounds on $P_{s_1,s_0}(t=j,r=k,y=l)$ for any $(j,k,l) \in \{0,1\}^3$ are:
$$P_{s_1,s_0}(t=j,r=k,y=l) \in \Big[\max\{0,\, P_{s_1,s_0}(r=k,y=l) - P(t=1-j,r=k)\},\; \min\{P(t=j,r=k),\, P_{s_1,s_0}(r=k,y=l)\}\Big]. \quad (33)$$

Proof. The probability $P_{s_1,s_0}(t=j,r=k,y=l)$ for any $(j,k,l) \in \{0,1\}^3$ is the probability of the intersection of events, $P_{s_1,s_0}(\{t=j,r=k\} \cap \{r=k,y=l\})$. An upper bound on $P_{s_1,s_0}(t=j,r=k,y=l)$ is then:
$$P_{s_1,s_0}(\{t=j,r=k\} \cap \{r=k,y=l\}) \leq \min\{P(t=j,r=k),\, P_{s_1,s_0}(r=k,y=l)\}. \quad (34)$$
The upper bound (34) holds for any $(j,k,l) \in \{0,1\}^3$. The lower bound on $P_{s_1,s_0}(t=j,r=k,y=l)$ is then:
$$P_{s_1,s_0}(t=j,r=k,y=l) = P(t=j,r=k) - P_{s_1,s_0}(t=j,r=k,y=1-l) \geq P(t=j,r=k) - \min\{P(t=j,r=k),\, P_{s_1,s_0}(r=k,y=1-l)\} = \max\{0,\, P(t=j,r=k) - P_{s_1,s_0}(r=k,y=1-l)\}. \quad (35)$$
Suppressing the subscript in $P_{s_1,s_0}$ for clarity, the final line of (35) coincides with the lower bound in (33) because:
$$P(t=j,r=k) - P(r=k,y=1-l) = P(r=k,y=l) - P(t=1-j,r=k). \quad (36)$$

Claim 2. Bounds (33) on $P_{s_1,s_0}(t=j,r=j,y=j)$ and $P_{s_1,s_0}(t=j,r=1-j,y=j)$ are sharp. The bounds are independent in the sense that any pair of points within the two bounds is attainable.

Proof. Write all eight joint probabilities and the observable probabilities as a matrix equation $Ax = b$, where $x$ stacks the unobserved joint probabilities and $b$ the identified ones:
$$x = \big(P_{s_1,s_0}(t=1,r=1,y=1),\, P_{s_1,s_0}(t=1,r=1,y=0),\, P_{s_1,s_0}(t=0,r=1,y=1),\, P_{s_1,s_0}(t=0,r=1,y=0),\, P_{s_1,s_0}(t=1,r=0,y=1),\, P_{s_1,s_0}(t=1,r=0,y=0),\, P_{s_1,s_0}(t=0,r=0,y=1),\, P_{s_1,s_0}(t=0,r=0,y=0)\big)'$$
$$b = \big(P(t=1,r=1),\, P(t=0,r=1),\, P(t=1,r=0),\, P(t=0,r=0),\, P_{s_1,s_0}(r=1,y=1),\, P_{s_1,s_0}(r=1,y=0),\, P_{s_1,s_0}(r=0,y=1),\, P_{s_1,s_0}(r=0,y=0)\big)', \quad (37)$$
with the rows of $A$ encoding the adding-up restrictions $P_{s_1,s_0}(t=j,r=k,y=1) + P_{s_1,s_0}(t=j,r=k,y=0) = P(t=j,r=k)$ and $P_{s_1,s_0}(t=1,r=k,y=l) + P_{s_1,s_0}(t=0,r=k,y=l) = P_{s_1,s_0}(r=k,y=l)$. Matrix $A$ has rank 6. The bottom four rows cannot be represented as a linear combination using any of the top four rows; the bottom four rows are only mutually linearly dependent, and similarly the top four rows are only mutually linearly dependent. Including the marginal probabilities $P(t)$ and $P(r)$ would not change this structure. Therefore, the value of $P_{s_1,s_0}(t=j,r=1-k,y=l)$ does not affect the values of $P_{s_1,s_0}(t=j,r=k,y=l)$ for $(j,l) \in \{0,1\}^2$ within their respective bounds. There exist two separate systems of equations, one for each value of $r$. Focus on the system for an arbitrary $r = k$:
$$P_{s_1,s_0}(t=1,r=k,y=1) + P_{s_1,s_0}(t=1,r=k,y=0) = P(t=1,r=k)$$
$$P_{s_1,s_0}(t=0,r=k,y=1) + P_{s_1,s_0}(t=0,r=k,y=0) = P(t=0,r=k)$$
$$P_{s_1,s_0}(t=1,r=k,y=1) + P_{s_1,s_0}(t=0,r=k,y=1) = P_{s_1,s_0}(r=k,y=1)$$
$$P_{s_1,s_0}(t=1,r=k,y=0) + P_{s_1,s_0}(t=0,r=k,y=0) = P_{s_1,s_0}(r=k,y=0). \quad (38)$$
The coefficient matrix of this system has rank 3. I show that both the upper and lower bounds on any of the joint probabilities $P_{s_1,s_0}(t=j,r=k,y=l)$ in (38) are attainable for $(j,l) \in \{0,1\}^2$. Focus on $P_{s_1,s_0}(t=j,r=k,y=j)$ and assume that it is equal to its upper bound, $P_{s_1,s_0}(t=j,r=k,y=j) = \min\{P(t=j,r=k),\, P_{s_1,s_0}(r=k,y=j)\}$. Let first $P(t=j,r=k) < P_{s_1,s_0}(r=k,y=j)$. From (36), $P_{s_1,s_0}(r=k,y=1-j) < P(t=1-j,r=k)$. Then from (38):
$$P_{s_1,s_0}(t=j,r=k,y=j) = P(t=j,r=k)$$
$$P_{s_1,s_0}(t=j,r=k,y=1-j) = 0$$
$$P_{s_1,s_0}(t=1-j,r=k,y=j) = P_{s_1,s_0}(r=k,y=j) - P(t=j,r=k)$$
$$P_{s_1,s_0}(t=1-j,r=k,y=1-j) = P_{s_1,s_0}(r=k,y=1-j).$$
By assumption, $P_{s_1,s_0}(t=j,r=k,y=j)$ is equal to its upper bound. Consequently, $P_{s_1,s_0}(t=j,r=k,y=1-j)$ is equal to $0 = \max\{0,\, P_{s_1,s_0}(r=k,y=1-j) - P(t=1-j,r=k)\}$, which is its lower bound. Similarly, $P_{s_1,s_0}(t=1-j,r=k,y=j) = P_{s_1,s_0}(r=k,y=j) - P(t=j,r=k) = \max\{0,\, P_{s_1,s_0}(r=k,y=j) - P(t=j,r=k)\}$, which is its lower bound.
Finally, $P_{s_1,s_0}(t=1-j,r=k,y=1-j) = P_{s_1,s_0}(r=k,y=1-j) = \min\{P(t=1-j,r=k),\, P_{s_1,s_0}(r=k,y=1-j)\}$, which is its upper bound. All four probabilities thus achieve their corresponding upper or lower bounds. Let now $P(t=j,r=k) \geq P_{s_1,s_0}(r=k,y=j)$, or equivalently $P_{s_1,s_0}(r=k,y=1-j) \geq P(t=1-j,r=k)$. The system then is:
$$P_{s_1,s_0}(t=j,r=k,y=j) = P_{s_1,s_0}(r=k,y=j)$$
$$P_{s_1,s_0}(t=j,r=k,y=1-j) = P(t=j,r=k) - P_{s_1,s_0}(r=k,y=j)$$
$$P_{s_1,s_0}(t=1-j,r=k,y=j) = 0$$
$$P_{s_1,s_0}(t=1-j,r=k,y=1-j) = P(t=1-j,r=k).$$
As before, $P_{s_1,s_0}(t=j,r=k,y=j)$ and $P_{s_1,s_0}(t=1-j,r=k,y=1-j)$ are equal to their respective upper bounds, while $P_{s_1,s_0}(t=j,r=k,y=1-j)$ and $P_{s_1,s_0}(t=1-j,r=k,y=j)$ attain the lower bounds. That $P_{s_1,s_0}(t=j,r=k,y=j)$ and $P_{s_1,s_0}(t=1-j,r=k,y=1-j)$ attain lower bounds when $P_{s_1,s_0}(t=j,r=k,y=1-j)$ and $P_{s_1,s_0}(t=1-j,r=k,y=j)$ are equal to their upper bounds can be shown symmetrically. Thus, for an arbitrary $r = k$, all probabilities can be equal to their upper and lower bounds. From (38), reducing any probability that is on the upper bound will lead to an increase in the probabilities at lower bounds and a decrease in the remaining probability at the upper bound, so any value in the interior of the bounds must be feasible. Therefore, the bounds (33) must be sharp for $P_{s_1,s_0}(t=j,r=k,y=l)$ and any $(j,l) \in \{0,1\}^2$. This is true for an arbitrary $r = k$, hence the bounds are sharp for any $P_{s_1,s_0}(t=j,r=k,y=l)$ such that $(j,k,l) \in \{0,1\}^3$. Finally, from (37), the value which $P_{s_1,s_0}(t=j,r=j,y=j)$ takes does not influence the value of $P_{s_1,s_0}(t=j,r=1-j,y=j)$. Any pair of values coming from the Cartesian product of the bounds on the two probabilities is feasible.

By Claim 2, the sharp bounds on $P_{s_1,s_0}(t=j,y=j) = P_{s_1,s_0}(t=j,r=j,y=j) + P_{s_1,s_0}(t=j,r=1-j,y=j)$ are a sum of the sharp bounds on the individual probabilities. Hence, the sharp bounds on $\theta_j$ are $[\theta_j^L, \theta_j^U]$, as stated in Proposition 1.

Claim 3. The sharp joint identified set for $(\theta_1,\theta_0)$ is:
$$H_{(\theta_1,\theta_0)}(s_1,s_0) = \Big\{(t_1,t_0) : t_0 = t_1\frac{P_{s_1,s_0}(y=1)}{P_{s_1,s_0}(y=0)} + 1 - \frac{P(t=1)}{P_{s_1,s_0}(y=0)},\; t_1 \in H_{\theta_1}(s_1,s_0)\Big\}.$$
Proof. Observe that:
$$P(t=1) = P_{s_1,s_0}(t=1,y=1) + P_{s_1,s_0}(t=1,y=0) = \theta_1 P_{s_1,s_0}(y=1) + P_{s_1,s_0}(y=0) - \theta_0 P_{s_1,s_0}(y=0).$$
For any value $t_1 \in H_{\theta_1}(s_1,s_0)$, it must then be that $t_0 P_{s_1,s_0}(y=0) = t_1 P_{s_1,s_0}(y=1) + P_{s_1,s_0}(y=0) - P(t=1)$. Since $H_{\theta_1}(s_1,s_0)$ is sharp, $H_{(\theta_1,\theta_0)}(s_1,s_0)$ is a sharp joint identification region for $(\theta_1,\theta_0)$.

Proposition 2. Suppose that the index and reference tests have a tendency to wrongly agree only for $y = j$. The sharp identified set $\tilde{H}_{\theta_j}(s_1,s_0)$ for parameter $\theta_j$, $j = 0,1$, given reference test sensitivity $s_1$ and specificity $s_0$, is an interval $\tilde{H}_{\theta_j}(s_1,s_0) = [\tilde{\theta}_j^L, \tilde{\theta}_j^U]$, where:
$$\tilde{\theta}_j^L = \Big(\max\{0,\, P(t=j,r=j) - P_{s_1,s_0}(r=j,y=1-j)\} + \max\{0,\, P_{s_1,s_0}(r=1-j,y=j) - P(t=1-j,r=1-j)\}\Big)\frac{1}{P_{s_1,s_0}(y=j)}$$
$$\tilde{\theta}_j^U = \Big(\min\Big\{P(t=j,r=1-j),\, \frac{P_{s_1,s_0}(r=1-j,y=j)}{2}\Big\} + \min\{P(t=j,r=j),\, P_{s_1,s_0}(r=j,y=j)\}\Big)\frac{1}{P_{s_1,s_0}(y=j)} \quad (12)$$
for $P_{s_1,s_0}(y=1)$ as in (5) and $P_{s_1,s_0}(r,y) = P_{s_1,s_0}(r|y)P_{s_1,s_0}(y)$.
The corresponding sharp joint identification region $\tilde{H}_{(\theta_1,\theta_0)}(s_1,s_0)$ for $(\theta_1,\theta_0)$ is:
$$\tilde{H}_{(\theta_1,\theta_0)}(s_1,s_0) = \Big\{(t_1,t_0) : t_0 = t_1\frac{P_{s_1,s_0}(y=1)}{P_{s_1,s_0}(y=0)} + 1 - \frac{P(t=1)}{P_{s_1,s_0}(y=0)},\; t_1 \in \tilde{H}_{\theta_1}(s_1,s_0)\Big\}.$$
If the index and reference tests have a tendency to wrongly agree for $y = 0$ and $y = 1$, the sharp joint identification region $\tilde{\tilde{H}}_{(\theta_1,\theta_0)}(s_1,s_0)$ for the parameters $(\theta_1,\theta_0)$, given reference test sensitivity $s_1$ and specificity $s_0$, is:
$$\tilde{\tilde{H}}_{(\theta_1,\theta_0)}(s_1,s_0) = \Big\{(t_1,t_0) : t_0 = t_1\frac{P_{s_1,s_0}(y=1)}{P_{s_1,s_0}(y=0)} + 1 - \frac{P(t=1)}{P_{s_1,s_0}(y=0)},\; t_1 \in \tilde{\tilde{H}}_{\theta_1}(s_1,s_0)\Big\},$$
where $\tilde{\tilde{H}}_{\theta_1}(s_1,s_0) = [\tilde{\tilde{\theta}}_1^L, \tilde{\tilde{\theta}}_1^U]$, for:
$$\tilde{\tilde{\theta}}_1^L = \Big(\max\{0,\, P(t=1,r=1) - P_{s_1,s_0}(r=1,y=0)\} + \max\{0,\, P_{s_1,s_0}(r=0,y=1) - P(t=0,r=0)\}\Big)\frac{1}{P_{s_1,s_0}(y=1)}$$
$$\tilde{\tilde{\theta}}_1^U = \Big(\min\Big\{P(t=1,r=0),\, \frac{P_{s_1,s_0}(r=0,y=1)}{2}\Big\} + \min\Big\{P(t=1,r=1) - \frac{P_{s_1,s_0}(r=1,y=0)}{2},\, P_{s_1,s_0}(r=1,y=1)\Big\}\Big)\frac{1}{P_{s_1,s_0}(y=1)}.$$

Proof of Proposition 2. First, I prove a lemma used below. The proof then follows through a series of claims.

Lemma 1. The index test has a tendency to wrongly agree with the reference test for $y = j$ for a given $(s_1,s_0)$ if and only if $P_{s_1,s_0}(t=1-j,r=1-j,y=j) \geq \frac{P_{s_1,s_0}(r=1-j,y=j)}{2}$.

Proof. It holds that $P_{s_1,s_0}(t=1-j,r=1-j,y=j) + P_{s_1,s_0}(t=j,r=1-j,y=j) = P_{s_1,s_0}(r=1-j,y=j)$. For sufficiency, note that $2P_{s_1,s_0}(t=1-j,r=1-j,y=j) = P_{s_1,s_0}(r=1-j,y=j) - P_{s_1,s_0}(t=j,r=1-j,y=j) + P_{s_1,s_0}(t=1-j,r=1-j,y=j) \geq P_{s_1,s_0}(r=1-j,y=j)$, since by assumption $P_{s_1,s_0}(t=1-j,r=1-j,y=j) \geq P_{s_1,s_0}(t=j,r=1-j,y=j)$. Necessity is immediate.

Claim 4. Assume that the tests have a tendency to wrongly agree only for $y = j$. The sharp identified set for $(\theta_1,\theta_0)$ is $\tilde{H}_{(\theta_1,\theta_0)}(s_1,s_0)$.

Proof. From Lemma 1, $P_{s_1,s_0}(t=1-j,r=1-j,y=j) \geq \frac{P_{s_1,s_0}(r=1-j,y=j)}{2}$. Then $P_{s_1,s_0}(t=j,r=1-j,y=j) \leq \frac{P_{s_1,s_0}(r=1-j,y=j)}{2} \leq P_{s_1,s_0}(r=1-j,y=j)$. Using this and following the steps taken to obtain (34):
$$P_{s_1,s_0}(t=j,r=1-j,y=j) \leq \min\Big\{P(t=j,r=1-j),\, \frac{P_{s_1,s_0}(r=1-j,y=j)}{2}\Big\}. \quad (43)$$
The lower bound on $P_{s_1,s_0}(t=j,r=1-j,y=j)$ is derived from the upper bound on $P_{s_1,s_0}(t=1-j,r=1-j,y=j)$, which is unaffected by the assumption. Substituting the upper bound into the system (38) yields the lower bound $P_{s_1,s_0}(t=j,r=1-j,y=j) \geq \max\{0,\, P_{s_1,s_0}(r=1-j,y=j) - P(t=1-j,r=1-j)\}$, as in (35). For the bounds defined by (35) and (43) on $P_{s_1,s_0}(t=j,r=1-j,y=j)$ to be sharp, all values contained between them must be feasible for a given population distribution. The lower bound is identical to the one in Proposition 1. The upper bound in (43) is at most as large as the upper bound (34) in Proposition 1. Thus, all points within the bounds on $P_{s_1,s_0}(t=j,r=1-j,y=j)$ are attainable by the same argument as in Claim 2 in the proof of Proposition 1. Hence, the bounds defined by (35) and (43) are sharp. The sharp bounds (33) on the probabilities $P_{s_1,s_0}(t=k,r=j,y=l)$ are unaffected by the assumption for $(k,l) \in \{0,1\}^2$, as they form an independent system of equations within (37). Using the reasoning in Claims 2 and 3 of Proposition 1, $\tilde{H}_{(\theta_1,\theta_0)}(s_1,s_0)$ is a sharp identification region for $(\theta_1,\theta_0)$.

Claim 5. Assume that the tests have a tendency to wrongly agree for $y = 0$ and $y = 1$. The sharp identified set for $(\theta_1,\theta_0)$ is $\tilde{\tilde{H}}_{(\theta_1,\theta_0)}(s_1,s_0)$.

Proof. By Lemma 1, $P_{s_1,s_0}(t=1-j,r=1-j,y=j) \geq \frac{P_{s_1,s_0}(r=1-j,y=j)}{2}$ for $j \in \{0,1\}$. The sharp upper bound on $P_{s_1,s_0}(t=j,r=1-j,y=j)$ is again as in (43). The sharp upper bound on $P_{s_1,s_0}(t=j,r=j,y=j)$ is no longer equivalent to (34).
Analogously to the steps used to derive (43):
$$P_{s_1,s_0}(t=j,r=j,y=j) \leq \min\Big\{P(t=j,r=j) - \frac{P_{s_1,s_0}(r=j,y=1-j)}{2},\; P_{s_1,s_0}(r=j,y=j)\Big\}, \quad (44)$$
where the first value in the minimum is derived using Lemma 1 and:
$$P_{s_1,s_0}(t=j,r=j,y=j) = P(t=j,r=j) - P_{s_1,s_0}(t=j,r=j,y=1-j) \leq P(t=j,r=j) - \frac{P_{s_1,s_0}(r=j,y=1-j)}{2}. \quad (45)$$

Remark 10. Only the upper bounds on $P_{s_1,s_0}(t=j,r=1-j,y=j)$ and $P_{s_1,s_0}(t=j,r=j,y=j)$ are changed by the assumption that the tests have a tendency to wrongly agree for $y \in \{0,1\}$. The lower bounds remain as in (35). To see this, observe from (37) that the bounds on $P_{s_1,s_0}(t=j,r=1-j,y=j)$ and $P_{s_1,s_0}(t=j,r=j,y=j)$ belong to separate systems of equations and will not affect each other. The bounds on $P_{s_1,s_0}(t=j,r=1-j,y=j)$ hold as in Claim 4. The bounds on $P_{s_1,s_0}(t=j,r=j,y=j)$ are derived using $P_{s_1,s_0}(t=j,r=j,y=1-j)$, which is affected only from below by the assumption. From (38) it can be seen that substituting $P_{s_1,s_0}(t=j,r=j,y=1-j)$ with its upper bound $\min\{P(t=j,r=j),\, P_{s_1,s_0}(r=j,y=1-j)\}$ yields a lower bound for $P_{s_1,s_0}(t=j,r=j,y=j)$ identical to (35).

The bounds (35) and (43) on $P_{s_1,s_0}(t=j,r=1-j,y=j)$ were shown to be sharp in the previous claim. Using the same argument, the bounds (35) and (44) on $P_{s_1,s_0}(t=j,r=j,y=j)$ are also sharp. Any pair of points within the bounds for the two probabilities is feasible. Hence, $\tilde{\tilde{H}}_{(\theta_1,\theta_0)}(s_1,s_0)$ is the sharp identified set for $(\theta_1,\theta_0)$.

Proposition 3. Let $G_{(\theta_1,\theta_0)}(s_1,s_0)$ be the sharp identified set for $(\theta_1,\theta_0)$ for a known $(s_1,s_0)$. Let $\theta_j^L$ and $\theta_j^H$ be the smallest and largest values of $\theta_j$ in the identified set. Then the sharp bounds on prevalence are:
$$P(y=1) \in \Big[\min\Big\{\frac{P(t=1)+\theta_0^L-1}{\theta_1^L+\theta_0^L-1},\, \frac{P(t=1)+\theta_0^H-1}{\theta_1^H+\theta_0^H-1}\Big\},\; \max\Big\{\frac{P(t=1)+\theta_0^L-1}{\theta_1^L+\theta_0^L-1},\, \frac{P(t=1)+\theta_0^H-1}{\theta_1^H+\theta_0^H-1}\Big\}\Big] \quad (18)$$
when $\forall(\theta_1,\theta_0) \in G_{(\theta_1,\theta_0)}(s_1,s_0) : \theta_1 \neq 1-\theta_0$, and $P(y=1) \in [0,1]$ otherwise.

Proof of Proposition 3. The bounds on $P(y=1)$ are:
$$P(y=1) \in \Big[\min_{(\theta_1,\theta_0)\in G_{(\theta_1,\theta_0)}(s_1,s_0)} \frac{P(t=1)+\theta_0-1}{\theta_1+\theta_0-1},\; \max_{(\theta_1,\theta_0)\in G_{(\theta_1,\theta_0)}(s_1,s_0)} \frac{P(t=1)+\theta_0-1}{\theta_1+\theta_0-1}\Big].$$
The value $\frac{P(t=1)+\theta_0-1}{\theta_1+\theta_0-1}$ is increasing in $\theta_0$ and decreasing in $\theta_1$. The extreme values occur at boundary values of $(\theta_1,\theta_0) \in G_{(\theta_1,\theta_0)}(s_1,s_0)$ when $\forall(\theta_1,\theta_0) \in G_{(\theta_1,\theta_0)}(s_1,s_0) : \theta_1 \neq 1-\theta_0$. To show this, let the joint probability distributions used to find $G_{(\theta_1,\theta_0)}(s_1,s_0)$ in the test performance study be denoted by $P^*(t,r)$ and $P^*_{s_1,s_0}(r,y)$, with the marginal distributions $P^*(t)$, $P^*(r)$, and $P^*_{s_1,s_0}(y)$. $P(t)$ and $P(y)$ pertain to the screening study and as such are not the same as $P^*(t)$ and $P^*_{s_1,s_0}(y)$ from the performance study. Then:
$$P(y=1) = \frac{P(t=1)+\theta_0-1}{\theta_1+\theta_0-1} = \frac{P(t=1)P^*_{s_1,s_0}(y=0) + \theta_1 P^*_{s_1,s_0}(y=1) - P^*(t=1)}{\theta_1 - P^*(t=1)}. \quad (47)$$
The second equality follows from $\theta_0 = \theta_1\frac{P^*_{s_1,s_0}(y=1)}{P^*_{s_1,s_0}(y=0)} + 1 - \frac{P^*(t=1)}{P^*_{s_1,s_0}(y=0)}$, which is true for all $(\theta_1,\theta_0) \in G_{(\theta_1,\theta_0)}(s_1,s_0)$ by Propositions 1 and 2. The first derivative of (47) with respect to $\theta_1$ is $\frac{(P^*(t=1)-P(t=1))\,P^*_{s_1,s_0}(y=0)}{(P^*(t=1)-\theta_1)^2}$, which is either positive or negative for all $\theta_1$ in the identified set. Then the lower bound for $P(y=1)$ occurs either at $\theta_1^L$ or $\theta_1^H$; conversely, the upper bound will be at the opposite extreme value of $\theta_1$. Finally, $\theta_1^L$ and $\theta_1^H$ correspond to $\theta_0^L$ and $\theta_0^H$ in $G_{(\theta_1,\theta_0)}(s_1,s_0)$, respectively, giving (18). Since $G_{(\theta_1,\theta_0)}(s_1,s_0)$ is sharp, it is immediate that (18) is sharp. If $\theta_1 = 1-\theta_0$ is feasible, then $P(y=1) \in [0,1]$, since it is possible that $t \perp\!\!\!\perp y$, as in Section 2.1.
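As a numerical companion to Proposition 3, the sketch below evaluates the prevalence expression at the two endpoints of the identified segment and takes the minimum and maximum; all inputs are hypothetical.

    def prevalence_bounds(p_t1, endpoints):
        # endpoints: [(theta1_L, theta0_L), (theta1_H, theta0_H)] of the
        # identified segment G; p_t1 is P(t = 1) in the screened population.
        vals = []
        for t1, t0 in endpoints:
            if abs(t1 + t0 - 1) < 1e-12:   # theta_1 = 1 - theta_0 is feasible:
                return 0.0, 1.0            # prevalence is unidentified
            vals.append((p_t1 + t0 - 1) / (t1 + t0 - 1))
        return min(vals), max(vals)

    print(prevalence_bounds(0.12, [(0.776, 0.995), (0.794, 0.998)]))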
Proposition 4. Let the moment function $m$ be as defined in (27). Moment inequalities and equalities defined by $m$ represent the joint identification region $\Theta(P) = \bigcup_{(s_1,s_0)\in S} H_{(\theta_1,\theta_0)}(s_1,s_0) \times \{(s_1,s_0)\}$. For each $\theta \in [0,1]^2 \times S$ such that $E_P\, m_j(W_i,\theta) \leq 0$ for $j = 1,\dots,6$ and $E_P\, m_7(W_i,\theta) = 0$, it must be that $\theta \in \Theta(P)$. Conversely, if $\theta \in \Theta(P)$, then $E_P\, m_j(W_i,\theta) \leq 0$ for $j = 1,\dots,6$ and $E_P\, m_7(W_i,\theta) = 0$.

Proof of Proposition 4. I prove this by finding $E_P\, m_j(W_i,\theta)$ for $j = 1,\dots,7$ and demonstrating that the resulting system is equivalent to the bounds defined in Proposition 1, extended to $\Theta(P) = \bigcup_{(s_1,s_0)\in S} H_{(\theta_1,\theta_0)}(s_1,s_0) \times \{(s_1,s_0)\}$. Suppose that $E_P\, m_j(W_i,\theta) \leq 0$ for $j = 1,\dots,6$ and $E_P\, m_7(W_i,\theta) = 0$. From (27):
$$E_P\, m_1(W_i,\theta) = -\theta_1 P_{s_1,s_0}(y=1) + P_{s_1,s_0}(r=1,y=1) - P(r=1) + P(t=1,r=1) = P(t=1,r=1) - P_{s_1,s_0}(r=1,y=0) - \theta_1 P_{s_1,s_0}(y=1) \leq 0$$
$$E_P\, m_2(W_i,\theta) = (-\theta_1+1-s_1)P_{s_1,s_0}(y=1) - P(t=0,r=0) = P_{s_1,s_0}(r=0,y=1) - P(t=0,r=0) - \theta_1 P_{s_1,s_0}(y=1) \leq 0$$
$$E_P\, m_3(W_i,\theta) = (-\theta_1+1)P_{s_1,s_0}(y=1) + P(t=1) - 1 = P_{s_1,s_0}(y=1)s_1 - P(r=1) + P_{s_1,s_0}(y=1)(1-s_1) + P(t=1) - P(r=0) - \theta_1 P_{s_1,s_0}(y=1) = P(t=1,r=1) - P_{s_1,s_0}(r=1,y=0) + P_{s_1,s_0}(r=0,y=1) - P(t=0,r=0) - \theta_1 P_{s_1,s_0}(y=1) \leq 0.$$
Note further that if $\theta_1 \in [0,1]$, which is true by definition, the three inequalities above yield the lower bound from Proposition 1 for $\theta_1 \in H_{\theta_1}(s_1,s_0)$ given an arbitrary $(s_1,s_0) \in S$:
$$\theta_1 P_{s_1,s_0}(y=1) \geq \max\{0,\, P(t=1,r=1) - P_{s_1,s_0}(r=1,y=0)\} + \max\{0,\, P_{s_1,s_0}(r=0,y=1) - P(t=0,r=0)\}.$$
This is equivalent to the lower bound for the element $\theta_1$ of $(\theta_1,\theta_0,s_1,s_0) \in \Theta(P)$. Consider next:
$$E_P\, m_4(W_i,\theta) = \theta_1 P_{s_1,s_0}(y=1) - P(t=1) = \theta_1 P_{s_1,s_0}(y=1) - P(t=1,r=0) - P(t=1,r=1) \leq 0$$
$$E_P\, m_5(W_i,\theta) = (\theta_1-s_1)P_{s_1,s_0}(y=1) - P(t=1,r=0) = \theta_1 P_{s_1,s_0}(y=1) - P(t=1,r=0) - P_{s_1,s_0}(r=1,y=1) \leq 0$$
$$E_P\, m_6(W_i,\theta) = (\theta_1-1+s_1)P_{s_1,s_0}(y=1) - P(t=1,r=1) = \theta_1 P_{s_1,s_0}(y=1) - P_{s_1,s_0}(r=0,y=1) - P(t=1,r=1) \leq 0.$$
Similarly, the upper bound from Proposition 1 is obtained for the element $\theta_1$ of $(\theta_1,\theta_0,s_1,s_0) \in \Theta(P)$:
$$\theta_1 P_{s_1,s_0}(y=1) \leq \min\{P(t=1,r=0),\, P_{s_1,s_0}(r=0,y=1)\} + \min\{P(t=1,r=1),\, P_{s_1,s_0}(r=1,y=1)\}.$$
Taking the expected value of the final component of the moment function yields:
$$E_P\, m_7(W_i,\theta) = (\theta_0-1)(1-P_{s_1,s_0}(y=1)) - \theta_1 P_{s_1,s_0}(y=1) + P(t=1) = 0.$$
It is then true that $\theta_0 P_{s_1,s_0}(y=0) = P_{s_1,s_0}(y=0) + \theta_1 P_{s_1,s_0}(y=1) - P(t=1)$. This is the linear relationship between $(\theta_1,\theta_0)$ in the identified set from Proposition 1. Going in the other direction, it is immediate that if the two bounds and the linear relationship hold, so that $\theta \in \Theta(P)$, then $E_P\, m_j(W_i,\theta) \leq 0$ for $j = 1,\dots,6$ and $E_P\, m_7(W_i,\theta) = 0$, demonstrating that the expected values of the moment functions represent the joint identification region $\Theta(P)$.

Proposition 5. Assume that the index and reference tests have a tendency to wrongly agree only for $y = 1$. Let the moment function $\tilde{m}^1$ be as defined in (29). Moment inequalities and equalities defined by $\tilde{m}^1$ for $J_1 = \{1,\dots,6\}$ and $J_2 = \{7\}$ represent the joint identification region $\Theta(P) = \bigcup_{(s_1,s_0)\in S} \tilde{H}_{(\theta_1,\theta_0)}(s_1,s_0) \times \{(s_1,s_0)\}$.

Proof of Proposition 5. The proof is analogous to the proof of Proposition 4.
From the definition of $\tilde{H}_{\theta_1}(s_1,s_0)$ for $y = 1$ in Proposition 2:
$$\theta_1 P_{s_1,s_0}(y=1) \geq \max\{0,\, P(t=1,r=1) - P_{s_1,s_0}(r=1,y=0)\} + \max\{0,\, P_{s_1,s_0}(r=0,y=1) - P(t=0,r=0)\}$$
$$\theta_1 P_{s_1,s_0}(y=1) \leq \min\Big\{P(t=1,r=0),\, \frac{P_{s_1,s_0}(r=0,y=1)}{2}\Big\} + \min\{P(t=1,r=1),\, P_{s_1,s_0}(r=1,y=1)\}.$$
Suppose that $E_P\, \tilde{m}^1_j(W_i,\theta) \leq 0$ for $j = 1,\dots,6$ and $E_P\, \tilde{m}^1_7(W_i,\theta) = 0$. From (29):
$$E_P\, \tilde{m}^1_6(W_i,\theta) = \Big(\theta_1 + \frac{-1+s_1}{2}\Big)P_{s_1,s_0}(y=1) - P(t=1,r=1) = \theta_1 P_{s_1,s_0}(y=1) - \frac{P_{s_1,s_0}(r=0,y=1)}{2} - P(t=1,r=1) \leq 0. \quad (54)$$
The remaining components of $\tilde{m}^1$ coincide with those of $m$ in (27), so the argument in the proof of Proposition 4 delivers the bounds above.

Theorem 1. Suppose that Assumption 4 holds. Then, for every component $m_j(W_i,\theta)$ of the moment functions in (27), (29), (31), and (32):
1. $\mathrm{Var}_P(m_j(W_i,\theta)) > 0$ for all $P \in \mathcal{P}$ and $\theta \in [0,1]^2 \times S$;
2. $\lim_{\lambda\to\infty} \sup_{P\in\mathcal{P}} \sup_{\theta\in[0,1]^2\times S} E_P\Big[\Big(\frac{m_j(W_i,\theta)-\mu_j(\theta,P)}{\sigma_j(\theta,P)}\Big)^2 1\Big\{\Big|\frac{m_j(W_i,\theta)-\mu_j(\theta,P)}{\sigma_j(\theta,P)}\Big| > \lambda\Big\}\Big] = 0$;
where $\mu_j(\theta,P) = E_P(m_j(W_i,\theta))$ and $\sigma_j^2(\theta,P) = \mathrm{Var}_P(m_j(W_i,\theta))$.

Proof of Theorem 1. I first show that under the assumptions $\mathrm{Var}_P(m_j(W_i,\theta)) \geq \frac{1}{M_j^2} > 0$ for any $j \in \{1,\dots,7\}$ in (27), where the constants $M_j$ do not depend on $P$ and $\theta$. I then demonstrate the same for the components of (29), (31), and (32) that are not identical to those in (27). Finally, I show that the $m_j(W_i,\theta)$ are bounded irrespective of $P$ and $\theta$, and use that to prove that the second claim is true.

Throughout, $\rho_P(X,Y)$ denotes the correlation coefficient between the components of a binary random vector $(X,Y)$ with distribution $P \in \mathcal{P}$. The following lemma will be used to bound the variances from below.

Lemma 2. Suppose that Assumption 4 holds. Then for any $P \in \mathcal{P}$ the following are true:
1. $\rho_P(r_i, t_i)^2 \leq (1-4\varepsilon)^2$;
2. $\rho_P(r_i, 1-t_i)^2 \leq (1-4\varepsilon)^2$;
3. $\rho_P(r_i, r_i t_i)^2 \leq h(\varepsilon)$;
4. $\rho_P(r_i, (1-r_i)t_i)^2 \leq h(\varepsilon)$;
5. $\rho_P(r_i, r_i(1-t_i))^2 \leq h(\varepsilon)$;
6. $\rho_P(r_i, (1-r_i)(1-t_i))^2 \leq h(\varepsilon)$;
where $h(\varepsilon) = 1\{\varepsilon \in [0.2, 0.25]\}\frac{2-6\varepsilon}{3-6\varepsilon} + 1\{\varepsilon \in (0, 0.2)\}\frac{(1-\varepsilon)^2}{(1+\varepsilon)^2} \in (0,1)$.

Proof. Denote $P(t_i=j, r_i=k) = P_{jk}$. Assumption 4 states that $P_{jk} \geq \varepsilon > 0$ for $(j,k) \in \{0,1\}^2$, and implies that $\varepsilon \leq \frac{1}{4}$. The parameter $\rho_P(r_i,t_i)^2$ is the largest when either $P_{01} = P_{10} = \varepsilon$ or $P_{11} = P_{00} = \varepsilon$. I prove the statement for $P_{01} = P_{10} = \varepsilon$; the argument for $P_{11} = P_{00} = \varepsilon$ is symmetric. The maximal $\rho_P(r_i,t_i)^2$ must then be attained for $P_{11} + P_{00} = 1-2\varepsilon$ and $P(t_i=1) = P(r_i=1)$. Next, let $P_{11} = \alpha(1-2\varepsilon)$ and $P_{00} = (1-\alpha)(1-2\varepsilon)$ for some $\alpha \in [\frac{\varepsilon}{1-2\varepsilon}, \frac{1-3\varepsilon}{1-2\varepsilon}]$, and $P(t_i=1) = P(r_i=1) = \alpha(1-2\varepsilon)+\varepsilon$. By plugging in the relevant probabilities, $\rho_P(r_i,t_i)$ becomes a function of $\alpha$:
$$\rho_\alpha(r_i,t_i) = \frac{P_{11} - P(t_i=1)P(r_i=1)}{P(t_i=1)(1-P(t_i=1))} = \frac{\alpha(1-2\varepsilon) - (\alpha(1-2\varepsilon)+\varepsilon)^2}{(\alpha(1-2\varepsilon)+\varepsilon)(1-\alpha(1-2\varepsilon)-\varepsilon)}. \quad (55)$$
Maximizing (55) with respect to $\alpha$, we obtain the upper bound on $\rho_P(r_i,t_i)^2$. The second-order condition confirms that this is a concave optimization problem. The first-order condition yields the maximizing $\alpha^* = \frac{1}{2}$. For any $\varepsilon \leq \frac{1}{4}$, it is true that $\alpha^* \in [\frac{\varepsilon}{1-2\varepsilon}, \frac{1-3\varepsilon}{1-2\varepsilon}]$. To conclude the proof of Statement 1, plug $\alpha^*$ into (55) to find $\max_{P\in\mathcal{P}} \rho_P(r_i,t_i) = \rho_{\alpha^*}(r_i,t_i) = 1-4\varepsilon$. For Statement 2, by using Statement 1 and replacing $\tilde{t}_i = 1-t_i$, it follows directly that $\max_{P\in\mathcal{P}} \rho_P(r_i, 1-t_i) = \max_{P\in\mathcal{P}} \rho_P(r_i, \tilde{t}_i) = \rho_{\alpha^*}(r_i, \tilde{t}_i) = 1-4\varepsilon$.

For Statement 3, from the definition of $\rho_P(r_i, r_i t_i)$:
$$\rho_P(r_i, r_i t_i)^2 = \frac{P_{11}(1-P(r_i=1))}{P(r_i=1)(1-P_{11})} = \frac{P_{11}(1-P_{11}-P_{01})}{(P_{11}+P_{01})(1-P_{11})}. \quad (56)$$
Notice that $\rho_P(r_i, r_i t_i)^2$ decreases in $P_{01}$, so at the maximum $P_{01} = \varepsilon$. Therefore, we only need to maximize $\rho_P(r_i, r_i t_i)^2$ with respect to feasible $P_{11}$. The maximization problem is:
$$\max_{P\in\mathcal{P}} \rho_P(r_i, r_i t_i)^2 = \max_{P_{11}\in[\varepsilon,\, 1-3\varepsilon]} \frac{P_{11}(1-P_{11}-\varepsilon)}{(P_{11}+\varepsilon)(1-P_{11})}. \quad (57)$$
The objective function is concave. The first-order condition implies that, for an interior maximum, the maximizing $P_{11}$ is $\frac{1-\varepsilon}{2}$. If $\varepsilon \in [0.2, 0.25]$, the constraint $P_{11} \leq 1-3\varepsilon$ binds. Therefore, the value of the parameter at the maximum is $P^*_{11} = \min\{\frac{1-\varepsilon}{2},\, 1-3\varepsilon\}$.
The maximum of the objective function, obtained by plugging $P^*_{11}$ into (56), is:
$$\max_{P\in\mathcal{P}} \rho_P(r_i, r_i t_i)^2 = 1\{\varepsilon \in [0.2, 0.25]\}\frac{2-6\varepsilon}{3-6\varepsilon} + 1\{\varepsilon \in (0, 0.2)\}\frac{(1-\varepsilon)^2}{(1+\varepsilon)^2} = h(\varepsilon). \quad (58)$$

Statements 4, 5, and 6. Following the definition of $\rho_P(r_i, (1-r_i)t_i)$:
$$\rho_P(r_i, (1-r_i)t_i) = -\sqrt{\frac{P(r_i=1)P_{10}}{(1-P(r_i=1))(1-P_{10})}}.$$
The square of the correlation is increasing in both $P(r_i=1) = P_{11}+P_{01}$ and $P_{10}$. Consequently, at the maximum they will jointly be at the upper bound, meaning that $P_{11}+P_{01}+P_{10} = 1-\varepsilon$, or equivalently, that $P(r_i=1) = 1-\varepsilon-P_{10}$. We can then rewrite the problem as:
$$\max_{P\in\mathcal{P}} \rho_P(r_i, (1-r_i)t_i)^2 = \max_{P_{10}\in[\varepsilon,\, 1-3\varepsilon]} \frac{(1-\varepsilon-P_{10})P_{10}}{(\varepsilon+P_{10})(1-P_{10})}. \quad (59)$$
In this form, the problem is identical to the one in (57), and following the same steps yields $\max_{P\in\mathcal{P}} \rho_P(r_i, (1-r_i)t_i)^2 = h(\varepsilon)$. Analogously to the proof of Statement 3, for $\rho_P(r_i, r_i(1-t_i))^2$ in Statement 5 the optimization problem can be represented as:
$$\max_{P\in\mathcal{P}} \rho_P(r_i, r_i(1-t_i))^2 = \max_{P_{01}\in[\varepsilon,\, 1-3\varepsilon]} \frac{(1-\varepsilon-P_{01})P_{01}}{(\varepsilon+P_{01})(1-P_{01})}. \quad (62)$$
Following the steps in the proof of Statement 4, for $\rho_P(r_i, (1-r_i)(1-t_i))^2$ in Statement 6 the optimization problem is:
$$\max_{P\in\mathcal{P}} \rho_P(r_i, (1-r_i)(1-t_i))^2 = \max_{P_{00}\in[\varepsilon,\, 1-3\varepsilon]} \frac{(1-\varepsilon-P_{00})P_{00}}{(\varepsilon+P_{00})(1-P_{00})}. \quad (63)$$
Consequently, from the solutions to (57), (59), (62), and (63), the maximal squared correlations in Statements 3 through 6 are all equal to $h(\varepsilon) = 1\{\varepsilon \in [0.2, 0.25]\}\frac{2-6\varepsilon}{3-6\varepsilon} + 1\{\varepsilon \in (0, 0.2)\}\frac{(1-\varepsilon)^2}{(1+\varepsilon)^2}$. \quad (64)

Claim 6. For any $P \in \mathcal{P}$ and $\theta \in [0,1]^2 \times S$, it holds that $\mathrm{Var}_P(m_j(W_i,\theta)) > 0$ for all $m_j(W_i,\theta)$ in (27).

Proof. Consider first a component of $m$ that pertains to the upper bound on $\theta_1$. The variance $\mathrm{Var}_P(m_4(W_i,\theta))$ for some $\theta$ and $P$ is defined as:
$$\mathrm{Var}_P(m_4(W_i,\theta)) = \mathrm{Var}_P\Big(\theta_1\frac{r_i-1+s_0}{s_1-1+s_0} - t_i\Big) = \Big(\frac{\theta_1}{s_1-1+s_0}\Big)^2\mathrm{Var}_P(r_i) + \mathrm{Var}_P(t_i) - 2\frac{\theta_1}{s_1-1+s_0}\mathrm{Cov}_P(r_i,t_i). \quad (65)$$
Fix any $(s_1,s_0) \in S$. As shown in Section 2.1, $P(r=1) \in (1-s_0, s_1)$, so $\mathrm{Var}_P(r_i) > 0$. The value $\theta_1^*$ at which $\mathrm{Var}_P(m_4(W_i,\theta))$ is globally minimized given $s_1$ and $s_0$ follows from the first-order condition:
$$\frac{\partial\mathrm{Var}_P(m_4(W_i,\theta))}{\partial\theta_1} = 0\,:\quad \theta_1^* = (s_1-1+s_0)\frac{\mathrm{Cov}_P(r_i,t_i)}{\mathrm{Var}_P(r_i)}.$$
The second-order condition shows that this indeed is a minimization problem. Let $\theta^* = (\theta_1^*, \theta_0, s_1, s_0)$, where I suppress the dependence $\theta_1^*(s_1,s_0)$ for clarity. The minimum variance for any $(s_1,s_0) \in S$ is then:
$$\mathrm{Var}_P(m_4(W_i,\theta^*)) = \frac{(\mathrm{Cov}_P(r_i,t_i))^2}{\mathrm{Var}_P(r_i)} + \mathrm{Var}_P(t_i) - 2\frac{(\mathrm{Cov}_P(r_i,t_i))^2}{\mathrm{Var}_P(r_i)} = \mathrm{Var}_P(t_i)\big(1 - \rho_P(r_i,t_i)^2\big).$$
For any $\theta$ it follows that:
$$\mathrm{Var}_P(m_4(W_i,\theta)) \geq \mathrm{Var}_P(m_4(W_i,\theta^*)) = \mathrm{Var}_P(t_i)\big(1-\rho_P(r_i,t_i)^2\big) \geq 2\varepsilon(1-2\varepsilon)\big(1-(1-4\varepsilon)^2\big) = \frac{1}{M_4^2} > 0,$$
where the first inequality follows from the definition of $\theta^*$. Focus on the second inequality. We wish to find a lower bound $\frac{1}{M_4^2}$ on the variance over all possible $P \in \mathcal{P}$. One such bound is the expression evaluated at the smallest value of $\mathrm{Var}_P(t_i)$ and the largest value of $\rho_P(r_i,t_i)^2$. The second is given by Lemma 2, and the first follows directly from Assumption 4, which implies that $P(t_i=1) \in [2\varepsilon, 1-2\varepsilon]$, so $\mathrm{Var}_P(t_i) \geq 2\varepsilon(1-2\varepsilon)$.21 Therefore, $\mathrm{Var}_P(m_4(W_i,\theta)) \geq \frac{1}{M_4^2} > 0$ for all $P \in \mathcal{P}$ and $\theta \in [0,1]^2 \times S$.

21. As long as $\varepsilon < 0.25$, the inequality is strict, since the largest value of $\rho_P(r_i,t_i)^2$ warrants that $P(t_i=1, r_i=1) = \frac{1-2\varepsilon}{2}$, while the smallest $\mathrm{Var}_P(t_i)$ requires $P(t_i=1, r_i=1) = \varepsilon$ or $P(t_i=1, r_i=1) = 1-3\varepsilon$.

Following the same steps for the remaining components pertaining to the upper bound, the smallest variances for any $P \in \mathcal{P}$ and $\theta$ are:
$$\mathrm{Var}_P(m_5(W_i,\theta^*)) = \mathrm{Var}_P(t_i(1-r_i))\big(1-\rho_P(r_i, t_i(1-r_i))^2\big) \geq \varepsilon(1-\varepsilon)(1-h(\varepsilon)) = \frac{1}{M_5^2} > 0$$
$$\mathrm{Var}_P(m_6(W_i,\theta^*)) = \mathrm{Var}_P(t_i r_i)\big(1-\rho_P(r_i, t_i r_i)^2\big) \geq \varepsilon(1-\varepsilon)(1-h(\varepsilon)) = \frac{1}{M_6^2} > 0, \quad (69)$$
where the inequalities follow from the definition of $\theta^*$, the facts that $\mathrm{Var}_P(t_i(1-r_i)) \geq \varepsilon(1-\varepsilon)$ and $\mathrm{Var}_P(t_i r_i) \geq \varepsilon(1-\varepsilon)$, and Lemma 2.

Next, observe the components pertaining to the lower bound. First, for $\mathrm{Var}_P(m_1(W_i,\theta))$ for any $\theta$ and $P$:
$$\mathrm{Var}_P(m_1(W_i,\theta)) = \mathrm{Var}_P\Big((-\theta_1+s_1)\frac{r_i-1+s_0}{s_1-1+s_0} + (t_i-1)r_i\Big) = \Big(\frac{s_1-\theta_1}{s_1-1+s_0}\Big)^2\mathrm{Var}_P(r_i) + \mathrm{Var}_P((t_i-1)r_i) - 2\frac{s_1-\theta_1}{s_1-1+s_0}\mathrm{Cov}_P((1-t_i)r_i, r_i).$$
Fix an arbitrary $(s_1, s_0)$. The value $\theta_1^*$ at which $\mathrm{Var}_P(m_1(W_i,\theta))$ is globally minimized given $s_1$ and $s_0$ follows from the first-order condition:
$$\frac{\partial\mathrm{Var}_P(m_1(W_i,\theta))}{\partial\theta_1} = 0\,:\quad \theta_1^* = (s_1-1+s_0)\frac{\mathrm{Cov}_P((1-t_i)r_i, r_i)}{\mathrm{Var}_P(r_i)} + s_1.$$
The second-order condition shows that this indeed is a minimization problem. The minimum variance $\mathrm{Var}_P(m_1(W_i,\theta^*))$ for an arbitrary $(s_1,s_0) \in S$ satisfies:
$$\mathrm{Var}_P(m_1(W_i,\theta)) \geq \mathrm{Var}_P(m_1(W_i,\theta^*)) = \mathrm{Var}_P((1-t_i)r_i)\big(1-\rho_P(r_i, (1-t_i)r_i)^2\big) \geq \varepsilon(1-\varepsilon)(1-h(\varepsilon)) = \frac{1}{M_1^2} > 0,$$
where the first inequality follows from the definition of $\theta^*$, and the second from Lemma 2 and $\mathrm{Var}_P((1-t_i)r_i) \geq \varepsilon(1-\varepsilon)$. Therefore, $\mathrm{Var}_P(m_1(W_i,\theta)) \geq \frac{1}{M_1^2} > 0$ for all $P \in \mathcal{P}$ and $\theta \in [0,1]^2 \times S$. Again, following the same steps for the remaining components pertaining to the lower bound, $\mathrm{Var}_P(m_j(W_i,\theta)) \geq \frac{1}{M_j^2} > 0$ for $j = 2, 3$, any $P \in \mathcal{P}$, and any $\theta$.

Finally, consider the component pertaining to the moment equality, $\mathrm{Var}_P(m_7(W_i,\theta))$. It is defined as:
$$\mathrm{Var}_P(m_7(W_i,\theta)) = \mathrm{Var}_P\Big((1-\theta_0)\Big(1-\frac{r_i-1+s_0}{s_1-1+s_0}\Big) + \theta_1\frac{r_i-1+s_0}{s_1-1+s_0} - t_i\Big) = \mathrm{Var}_P\Big(\bar{\theta}\frac{r_i-1+s_0}{s_1-1+s_0} - t_i\Big) \quad (74)$$
for $\bar{\theta} = \theta_1+\theta_0-1$. Notice that the function in (74) has the same form as (65), with $\bar{\theta}$ in place of $\theta_1$. Following the steps used for $m_4$, we obtain that $\mathrm{Var}_P(m_7(W_i,\theta)) \geq 2\varepsilon(1-2\varepsilon)\big(1-(1-4\varepsilon)^2\big) = \frac{1}{M_7^2} > 0$ for all $P \in \mathcal{P}$ and $\theta \in [0,1]^2 \times S$.

Claim 7. For any $P \in \mathcal{P}$ and $\theta \in [0,1]^2 \times S$, it holds that $\mathrm{Var}_P(m_j(W_i,\theta)) > 0$ for all $m_j(W_i,\theta)$ in (29), (31), and (32).

Proof. The functions $\tilde{m}^1$ and $m$ are such that $\tilde{m}^1_j(W_i,\theta) = m_j(W_i,\theta)$ for all $j \neq 6$. Thus, for all components that are equal, the proof follows from Claim 6, so $\mathrm{Var}_P(\tilde{m}^1_j(W_i,\theta)) \geq \frac{1}{M_j^2} > 0$ for $j \neq 6$. The variance $\mathrm{Var}_P(\tilde{m}^1_6(W_i,\theta))$ for some $\theta$ and $P$ is:
$$\mathrm{Var}_P(\tilde{m}^1_6(W_i,\theta)) = \mathrm{Var}_P\Big(\Big(\theta_1+\frac{-1+s_1}{2}\Big)\frac{r_i-1+s_0}{s_1-1+s_0} - t_i r_i\Big) = \Big(\frac{\theta_1+\frac{-1+s_1}{2}}{s_1-1+s_0}\Big)^2\mathrm{Var}_P(r_i) + \mathrm{Var}_P(r_i t_i) - 2\frac{\theta_1+\frac{-1+s_1}{2}}{s_1-1+s_0}\mathrm{Cov}_P(r_i, r_i t_i).$$
Fix any $(s_1,s_0) \in S$. The value $\theta_1^*$ at which $\mathrm{Var}_P(\tilde{m}^1_6(W_i,\theta))$ is globally minimized given $s_1$ and $s_0$ follows from the first-order condition:
$$\frac{\partial\mathrm{Var}_P(\tilde{m}^1_6(W_i,\theta))}{\partial\theta_1} = 0\,:\quad \theta_1^* = (s_1-1+s_0)\frac{\mathrm{Cov}_P(r_i, r_i t_i)}{\mathrm{Var}_P(r_i)} + \frac{1-s_1}{2}.$$
The second-order condition shows that this indeed is a minimization problem. Following the same steps as before, for any $\theta \in [0,1]^2 \times S$:
$$\mathrm{Var}_P(\tilde{m}^1_6(W_i,\theta)) \geq \mathrm{Var}_P(\tilde{m}^1_6(W_i,\theta^*)) = \mathrm{Var}_P(r_i t_i)\big(1-\rho_P(r_i, r_i t_i)^2\big) \geq \varepsilon(1-\varepsilon)(1-h(\varepsilon)) = \frac{1}{M_6^2} > 0.$$
The case of $\tilde{m}^0_j(W_i,\theta)$ is symmetric, and using the same method of proof it follows that $\mathrm{Var}_P(\tilde{m}^0_j(W_i,\theta)) \geq \frac{1}{M_j^2} > 0$ for all $j = 1,\dots,7$, $P \in \mathcal{P}$, and $\theta \in [0,1]^2 \times S$. Likewise, for $\tilde{m}$, note that $\tilde{m}_j(W_i,\theta) = \tilde{m}^1_j(W_i,\theta)$ except for $j \in \{4, 6\}$.
From (32):
$$\mathrm{Var}_P(\tilde{m}_4(W_i,\theta)) = \mathrm{Var}_P\Big(\theta_1\frac{r_i-1+s_0}{s_1-1+s_0} - t_i + \frac{1}{2}\Big(r_i - s_1\frac{r_i-1+s_0}{s_1-1+s_0}\Big)\Big) = \mathrm{Var}_P\Big(\Big(\frac{\theta_1-\frac{s_1}{2}}{s_1-1+s_0} + \frac{1}{2}\Big)r_i - t_i\Big)$$
$$\mathrm{Var}_P(\tilde{m}_6(W_i,\theta)) = \mathrm{Var}_P\Big(\Big(\theta_1+\frac{-1+s_1}{2}\Big)\frac{r_i-1+s_0}{s_1-1+s_0} - t_i r_i + \frac{1}{2}\Big(r_i - s_1\frac{r_i-1+s_0}{s_1-1+s_0}\Big)\Big).$$
As above, for any $\theta \in [0,1]^2 \times S$, minimizing each variance over $\theta_1$ and applying Lemma 2 and Assumption 4 yields $\mathrm{Var}_P(\tilde{m}_j(W_i,\theta)) \geq \frac{1}{M_j^2} > 0$ for $j \in \{4, 6\}$. Finally, each component $m_j(W_i,\theta)$ is a linear function of $(t_i, r_i, t_i r_i)$ with coefficients bounded uniformly over $\theta \in [0,1]^2 \times S$, so the moment functions are bounded irrespective of $P$ and $\theta$. Together with the variance bounds $\frac{1}{M_j^2}$, this implies that the standardized moments are uniformly bounded, which yields the uniform integrability condition in the second claim of Theorem 1.

References

Inference for parameters defined by moment inequalities: A recommended moment selection procedure
Inference for parameters defined by moment inequalities using generalized moment selection
False-negative results of initial RT-PCR assays for COVID-19: a systematic review
A Two-Step Method for Testing Many Moment Inequalities
Treatment effect bounds: An application to Swan-Ganz catheterization
On the Origin of Sensitivity and Specificity
Reference test errors bias the evaluation of diagnostic tests for ischemic heart disease
Inference for subvectors and other functions of partially identified parameters in moment inequality models
Practical and theoretical advances in inference for partially identified models
Inference on causal and structural parameters using many moment inequalities
Intersection bounds: estimation and inference
Evaluating rapid tests for streptococcal pharyngitis: the apparent accuracy of a diagnostic test when there are errors in the standard of comparison
Confidence Intervals for Seroprevalence
Estimating prevalence using an imperfect test
Should RT-PCR be considered a gold standard in the diagnosis of Covid-19?
Biomarker validation with an imperfect reference: Issues and bounds
Misguided efforts and future challenges for research on "diagnostic tests"
Buyer beware: inflated claims of sensitivity for rapid COVID-19 tests
Comparison of a Screening Test and a Reference Test in Epidemiologic Studies: A Probabilistic Model for the Comparison of Diagnostic Tests
Basic methods for sensitivity analysis of biases
Discrepant analysis: a biased and an unscientific method for estimating test sensitivity and specificity
Evaluation of diagnostic tests without gold standards
Estimating the error rates of diagnostic tests
Confidence intervals for projections of partially identified parameters
False negative rate of COVID-19 PCR testing: a discordant testing analysis
Variation in false-negative rate of reverse transcriptase polymerase chain reaction-based SARS-CoV-2 tests by time since exposure
Bounding the accuracy of diagnostic tests, with application to COVID-19 antibody tests
Estimating the COVID-19 infection rate: Anatomy of an inference problem
Monotone instrumental variables with an application to the returns to schooling
Spectrum bias or spectrum effect? Subgroup variation in diagnostic test evaluation
Performance and implementation evaluation of the Abbott BinaxNOW rapid antigen test in a high-throughput drive-through community testing site in Massachusetts
A practical two-step method for testing moment inequalities