key: cord-0856078-yloz7321
authors: Dell'Omodarme, M.; Prati, M. C.
title: The probability of failing in detecting an infectious disease at entry points into a country
date: 2005-06-24
journal: Stat Med
DOI: 10.1002/sim.2131
sha: 5ce9190a0b02a01f3b56187f608604f55218c252
doc_id: 856078
cord_uid: yloz7321

In a group of N individuals carrying an infection with prevalence π, the exact probability P of failing to detect the infection is evaluated when a diagnostic test of sensitivity s and specificity s′ is carried out on a sample of n individuals extracted without replacement from the group. Furthermore, the minimal number of individuals that must be tested if the probability P has to be lower than a fixed value is determined as a function of π. If all n tests are negative, confidence intervals for π are given in both the frequentist and the Bayesian approach. These results are applied to recent data for severe acute respiratory syndrome (SARS). The conclusion is that entry screening with a diagnostic test is rarely an efficacious tool for preventing importation of a disease into a country. Copyright © 2005 John Wiley & Sons, Ltd.

The increased mobility of people causes frequent importation of diseases. Some of these had disappeared from Western countries many years ago and are now causing new outbreaks; others have been recognized only recently, such as acquired immune deficiency syndrome (AIDS), bovine spongiform encephalopathy (BSE) and severe acute respiratory syndrome (SARS). A possible method to prevent such importation is to check, at entry points into a country, passengers coming from zones where a dangerous disease is known to be present. This practice is not always useful, because of the way in which the disease is transmitted (e.g. in the case of AIDS), or it is feasible but very expensive. Sometimes, in the case of outbreaks of a new disease, diagnostic tests are at first not available.
Furthermore, even if a diagnostic test is available, there is a positive probability of not detecting the infection, because the diagnostic test is not perfect. Screening test results and the estimation of disease prevalence have aroused considerable interest in the literature; see, among other studies, References [1] [2] [3] [4] [5].

This paper considers a finite population of individuals coming from a risk zone where a disease is present with prevalence $\pi$, such as the N passengers of a train, ship or airplane. At the border a number $n \le N$ of them undergo a diagnostic test with sensitivity $s$ and specificity $s'$. The exact expression of the probability P of failing to detect the infection (as a function of $s$, $s'$, $n$, $N$ and $\pi$) is calculated. The limit case of an infinite population is treated. The interesting inverse problem of computing the minimum number of individuals who must be tested if the probability P has to be lower than a certain value is also studied. If the n results of the diagnostic tests are all negative, confidence intervals for the prevalence of the disease in the population are given, in both the frequentist and the Bayesian approach. The Bayesian credible interval is especially interesting, because it takes into account prior information of an epidemiological and medical character. The mathematical results of the present study are applied to recent data (October 2003) on a diagnostic test for SARS [6]. The efficacy of border screening for SARS has been evaluated from a clinical point of view in References [7] [8] [9].

Let N be the size of a population and R the number of subjects carrying a latent infection. We introduce the following notation:
$I$ = the individual is infected; $\bar{I}$ = the individual is not infected
$\pi = R/N$ = the prevalence of the infection
A diagnostic test is carried out on each individual of a sample of size n extracted without replacement from the population.
The possible results of the test are
$T^+$ = the test is positive; $T^-$ = the test is negative
A diagnostic test is perfect only in an ideal case; otherwise there are false negative and false positive results. The following probabilities are of interest:
$s = P(T^+ \mid I)$ is the sensitivity of the test, i.e. the probability of a positive test result when the individual is indeed infected (true positive). $1 - s$ is the probability of a false negative result.
$s' = P(T^- \mid \bar{I})$ is the specificity of the test, i.e. the probability of a true negative result. $1 - s'$ is the probability of a false positive result.
One also considers the predictive values of a positive test (PPV) and of a negative test (NPV):

$$\mathrm{PPV} = P(I \mid T^+) = \frac{\pi s}{\pi s + (1-\pi)(1-s')}, \qquad \mathrm{NPV} = P(\bar{I} \mid T^-) = \frac{(1-\pi)\,s'}{(1-\pi)\,s' + \pi(1-s)}$$

If the sample of size n contains k infected individuals ($k = 0, 1, \ldots, m$, where $m = \min(n, R)$), the probability of extracting such a sample without replacement is the hypergeometric probability:

$$p(k) = \frac{\binom{R}{k}\binom{N-R}{n-k}}{\binom{N}{n}}$$

Under the assumption that the results of the tests are independent, the probability that the n diagnostic tests are all negative, i.e. that the infection is not detected, is given by

$$P = \sum_{k=0}^{m} \frac{\binom{R}{k}\binom{N-R}{n-k}}{\binom{N}{n}}\,(1-s)^{k}\,s'^{\,n-k} \qquad (1)$$

This formula can be generalized in order to find the probability that some of the n tests are positive (see Reference [10]). In the special case of a perfect test ($s = s' = 1$), Equation (1) reduces to the probability that the sample does not contain any infected individual:

$$P = \frac{\binom{N-R}{n}}{\binom{N}{n}}$$

In general, Equation (1) can be written as

$$P = \frac{\binom{N-R}{n}}{\binom{N}{n}}\; s'^{\,n}\; {}_2F_1\!\left(-R,\,-n;\; N-R-n+1;\; \frac{1-s}{s'}\right) \qquad (2)$$

where ${}_2F_1(\alpha, \beta; \gamma; z)$ is the hypergeometric series, the properties of which are well known and tabulated [11]:

$${}_2F_1(\alpha, \beta; \gamma; z) = 1 + \frac{\alpha\beta}{\gamma \cdot 1}\,z + \frac{\alpha(\alpha+1)\,\beta(\beta+1)}{\gamma(\gamma+1)\cdot 1 \cdot 2}\,z^{2} + \cdots$$

The series terminates if either $\alpha$ or $\beta$ or both are equal to zero or to a negative integer, as in Equation (2), where the number of terms is $m + 1$. In the limit $N, R \to \infty$ with fixed finite prevalence $\pi$, Equation (2) reduces to the binomial limit:

$$P = \left[\pi(1-s) + (1-\pi)\,s'\right]^{n} \qquad (3)$$

This result can be obtained either from Equation (1) by some algebra or from the following argument.
The result of a test can be negative for two reasons: either the individual is infected but the test gives the wrong result (with probability $\pi(1-s)$), or the individual is not infected and the test gives the right result (with probability $(1-\pi)s'$). The independence of the n tests leads directly to Equation (3).

In Equation (3) the probability of missing an infection in a population is computed assuming that the sample is extracted with replacement (Bernoullian extraction). When diagnostic tests are carried out, extraction without replacement is more appropriate. In the limit of an infinite population, extraction without replacement is in practice equivalent to the Bernoullian one, but for a finite population important differences can arise. In Figure 1, the exact results obtained from Equation (2) are compared with the binomial limit (3). One can see that the binomial limit overestimates the probability P, and that the error is large when the size N of the population is small.

[Figure 1. The probability P for various sizes ...]

On the left side of the figure, one can see that P is higher for small values of $s$, because the probability of false negative results decreases as $s$ increases. On the right side of the figure one can see that P decreases as $s'$ decreases, since the probability of false positive results increases as $s'$ decreases. In the case of infectious diseases a high sensitivity is advisable even at the expense of a lower specificity. However, if the prevalence is low, the positive predictive value of the test decreases quickly when the specificity gets lower, so that testing is not effective for clinical diagnosis.

Equation (2) can also be used to solve the inverse problem of computing the size n of the sample needed to have a fixed probability P of not detecting the infection as a function of the unknown prevalence $\pi$, given the size N of the population, the sensitivity $s$ and the specificity $s'$ of the diagnostic test. Equation (2) cannot be solved analytically for n, so the solution has to be computed numerically.
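The numerical solution of the inverse problem can be illustrated with a short Python sketch. It evaluates the sum in Equation (1) term by term (a hypergeometric weight times $(1-s)^k s'^{\,n-k}$), compares it with the binomial limit (3), and scans n upward until P falls below a chosen threshold. The function names and the simple linear search are assumptions made here for illustration; the paper does not describe the authors' own implementation. The inputs are the values N = 200, s = 0.75, s′ = 1 used for the SARS example in this paper.

```python
from math import comb


def p_miss_exact(N, R, n, s, s_prime):
    """Probability that all n tests are negative (Equation (1)).

    Sums over k, the number of infected individuals in a sample of size n
    drawn without replacement from N individuals, R of whom are infected.
    """
    total = comb(N, n)
    return sum(
        comb(R, k) * comb(N - R, n - k) / total  # hypergeometric weight
        * (1 - s) ** k                           # k false negatives
        * s_prime ** (n - k)                     # n - k true negatives
        for k in range(min(n, R) + 1)
    )


def p_miss_binomial(prevalence, n, s, s_prime):
    """Binomial limit (Equation (3)): sampling with replacement."""
    return (prevalence * (1 - s) + (1 - prevalence) * s_prime) ** n


def min_sample_size(N, R, s, s_prime, p_max):
    """Smallest n with P < p_max, or None if even n = N is not enough."""
    for n in range(N + 1):
        if p_miss_exact(N, R, n, s, s_prime) < p_max:
            return n
    return None


if __name__ == "__main__":
    N, s, s_prime = 200, 0.75, 1.0
    R = 6  # prevalence of 3 per cent among 200 passengers
    print("exact P, n = 80:   ", p_miss_exact(N, R, 80, s, s_prime))
    print("binomial P, n = 80:", p_miss_binomial(R / N, 80, s, s_prime))
    print("minimum n, P < 0.01:", min_sample_size(N, R, s, s_prime, 0.01))
    # Prevalence of 1 per cent (R = 2) with every passenger tested:
    # both infected passengers must be missed, so P = 0.25**2 = 0.0625.
    print("exact P, R = 2, n = N:", p_miss_exact(N, 2, N, s, s_prime))
```

Because Python integers have arbitrary precision, the binomial coefficients are computed exactly and only the final ratio is converted to floating point, which keeps the sketch usable for populations of a few hundred individuals without special care.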
In Figure 2, for fixed population size N = 200 and required probability P = 0.01, the size n of the sample is determined as a function of the prevalence $\pi$. On the left side of the figure the curves are given for different values of the sensitivity with fixed specificity, and vice versa on the right side. As expected from the analysis of Figure 1, the number n of subjects who must be tested in order to reach the required probability P increases as the sensitivity $s$ decreases when $s'$ is fixed. The right side of Figure 2 shows that, for sufficiently low prevalences, n does not depend appreciably on $s'$, at least for values of $s'$ of practical use. In Figure 3, the difference between the values of n obtained with the approximate binomial expression and the exact values given on the left side of Figure 2 is shown as a function of $\pi$. In the binomial limit the value of n is always overestimated. The error is larger when the prevalence is low.

Recently, reverse transcription-PCR protocols of two WHO SARS network laboratories were evaluated for the rapid diagnosis of the SARS-associated CoV in Hong Kong [6]. The resulting specificity of these PCR assays was 100 per cent, while for sensitivity the best results for the two laboratories were 71 and 79 per cent, respectively. The low sensitivity is related to the high mutation rate of the coronavirus, which makes it difficult to identify its presence. This is a general problem which cannot be easily solved.

Let us suppose that an airplane, carrying N = 200 passengers, arrives from a region where SARS prevalence is estimated to be around 3 per cent. A diagnostic test with specificity $s' = 1$ and sensitivity $s = 0.75$ is available (the values are chosen in agreement with Reference [6]). From Figure 4 one can see that, if n = 80 passengers are tested, the probability P of missing SARS is between 10 and 20 per cent. Conversely, if P is required to be around 1 per cent, the number n of passengers that must be tested is about 140.
If P is required to be less than 0.001, the number of passengers tested must be greater than 180. If the airplane comes from a region where SARS prevalence is 1 per cent, even if all 200 passengers are tested the probability P of not detecting SARS is between 5 and 10 per cent. These high values come from the fact that the sensitivity of the diagnostic test is low, so that many dangerous false-negative individuals cannot be prevented from spreading the infection.

One could consider the possibility of testing pools of blood instead of performing single diagnostic tests. In this way costs could be saved. However, the pooling procedure is not so appealing when the specificity of the diagnostic test is $s' = 1$ [5], as in our case, because in this situation the results of pool testing are always worse than those of individual testing. Furthermore, pool testing is very useful in assessing the prevalence of a disease in a population (when this prevalence is supposed to be low), but not for individual diagnosis, which is the target in the case of SARS.

Once the n diagnostic tests have been carried out and found to be all negative, the estimated prevalence is $\hat{\pi} = 0$. The confidence interval gives an idea of how reliable this result is. For a generic value of the prevalence, the expression of the confidence interval is given for example in Reference [14, p. 117]. This calculation is not appropriate when the proportion $\hat{\pi}$ is obtained as the result of an imperfect diagnostic test. In this case the frequentist confidence interval must be constructed with the correction given in Reference [5]. An alternative approach is the Bayesian credible interval, which makes it possible to take into account prior information of an epidemiological and medical type. One should point out that frequentist and Bayesian intervals have different meanings.
A proportion actually lies in its 95 per cent Bayesian interval with probability 95 per cent, while from a frequentist perspective one assumes that, if the 95 per cent confidence intervals are constructed for many samples, approximately 95 per cent of them will contain the value of the proportion.

In the Bayesian approach, the information available at the start of the study leads to the specification of the prior distribution of the parameters. When data are collected and provide new information, Bayes' rule is used to compute the posterior distribution. Appropriate quantiles of the posterior probability distribution are used for inference. Direct calculation of posterior distributions can be difficult. The Gibbs sampler (see Appendix A), an iterative Markov chain Monte Carlo technique for approximating posterior densities, is widely used in the medical literature.

In this section the 95 and 99 per cent credible intervals obtained from the posterior distribution of the prevalence $\pi$ (given the informative prior distribution described below) are compared with the frequentist results for the confidence interval of $\pi$. Let us suppose that the prevalence of SARS in the zone from which the air passengers come is around 3 per cent, with a 95 per cent confidence interval from 0 to 10 per cent. The corresponding informative prior distribution is given by a beta density with $\alpha = 1$ and $\beta = 35$ (see Appendix A). At the airport n passengers are tested and all tests give a negative result. Among them there is an unknown number k of false negatives. Given these data, it is possible to obtain the posterior distribution of the prevalence using the Gibbs sampler algorithm (Appendix A). In Table I the 95 and 99 per cent credible intervals for $\pi$ are listed for some values of n, along with the corresponding frequentist confidence intervals, which coincide with the Bayesian credible intervals computed assuming a non-informative (uniform) prior distribution. As expected, informative credible intervals are narrower.
The difference decreases as the number of tests carried out increases, because the prior information is superseded by the likelihood of the data. From this table one can see that, if n = 80 passengers are tested and all turn out to be negative, the prevalence of SARS is in the credible interval [0, 0.038] with 95 per cent probability. The corresponding confidence interval is [0, 0.060]. The frequentist estimate of the prevalence is 0, while the Bayesian estimate is 0.007, the median of the posterior distribution. When all 200 passengers are tested and found negative, the 95 per cent credible interval is [0, 0.020]. The Bayesian estimate of the prevalence is 0.004.

Border screening against emerging infectious diseases would be a desirable disease control measure, with a privileged role in preserving public confidence and limiting adverse economic consequences. However, before organizing a screening protocol, one should evaluate the possible impact of such screening on international traffic and trade, the cost of the procedure in terms of the personnel and logistics needed, and its real effectiveness. In the case of the SARS outbreak in 2003, a screening programme was organized at the border entry of several countries, such as Canada, New Zealand, Hong Kong, Australia and Italy [7] [8] [9]. Symptomatic passengers coming from SARS-affected areas (i.e. Vietnam, Taiwan, Singapore, Hong Kong, China, Canada and the Philippines) were sent to a quarantine team and, after further investigations, assessed at hospitals if necessary. The efficacy of this programme turned out to be low, and the sensitivity and specificity of the testing procedure were not easily assessable. In this paper the probability of missing an infection was evaluated under the hypothesis that a diagnostic test with known sensitivity and specificity could be used directly at the airport on a random sample of passengers (including pre-symptomatic ones).
This probability was found to be high if the prevalence of the disease was low and the diagnostic test used had a sensitivity different from one. Therefore, before planning an expensive screening programme at entry points of a country, one should first of all have a highly sensitive screening test at one's disposal. This is not always the case when a disease breaks out. Furthermore, the testing procedure should be as non-invasive as possible, in order to prevent a dramatic decrease in tourism with heavy economic consequences that are not justified by the real impact of a low-prevalence disease. It appears that, in most cases, border entry screening is more effective in keeping public concern low than in preventing the infection from entering a country. A similar conclusion can be found in References [8, 9].

The Gibbs sampler is an important tool in the Bayesian approach to compute posterior distributions [15]. When the posterior distribution of a proportion $\pi$ is sought, a beta density is usually taken as the prior distribution:

$$f(\pi) = \frac{1}{B(\alpha, \beta)}\,\pi^{\alpha-1}(1-\pi)^{\beta-1}, \qquad 0 \le \pi \le 1$$

where $B(\alpha, \beta)$ is the beta function evaluated at $(\alpha, \beta)$. The resulting posterior distribution is again a beta density with different parameters. The non-informative uniform prior distribution is the particular case with $\alpha = \beta = 1$. If the prior distribution is supposed to be informative, and an expected value for $\pi$ is given with its 95 per cent confidence interval, the parameters $\alpha$ and $\beta$ are chosen in such a way that the mean value of the beta distribution equals the expected value of $\pi$ and the confidence intervals match. For example, the case considered in the text, where the expected value of $\pi$ was 0.03 (95 per cent confidence interval from 0 to 0.10), can be well reproduced with parameters $\alpha = 1$, $\beta = 35$. Let k be the number of false negatives among the n individuals in the sample.
The conditional distributions of k and $\pi$, given the values of all other parameters, can be specified as follows [16]:

$$k \mid n, \pi, s, s' \sim \mathrm{Binomial}\!\left(n,\; \frac{\pi(1-s)}{\pi(1-s) + (1-\pi)\,s'}\right)$$

$$\pi \mid n, k, \alpha, \beta \sim \mathrm{Beta}(k + \alpha,\; n - k + \beta)$$

An arbitrary starting value is chosen for $\pi$. Then a point is drawn from the conditional distribution of k. This value is used in the conditional distribution of $\pi$, from which another value is drawn. The cycle is repeated a large number of times (in our case 20 500), so that the random samples generated for each parameter can be regarded as random samples from their correct unknown marginal distribution [15, 16]. The first 500 points are discarded from the samples as a burn-in period that allows the chain to converge. The Monte Carlo code was run using R version 1.9.1 [17].

1. Comparison of a screening test and a reference test in epidemiologic studies
2. Estimating prevalence from the results of a screening test
3. Effects of misclassifications on statistical inferences in epidemiology
4. Estimation of test error rates, disease prevalence and relative risk from misclassified data: a review
5. On the informativeness and accuracy of pooled testing in estimating prevalence of a rare disease: application to HIV screening
6. Evaluation of reverse transcription-PCR assays for rapid diagnosis of severe acute respiratory syndrome associated with a novel coronavirus
7. Experience of severe acute respiratory syndrome in Singapore: importation of cases, and defense strategies at the airport
8. Border screening for SARS in Australia: what has been learnt?
9. Border screening for SARS
10. A new probability formula for surveys to substantiate freedom from disease
11. Table of Integrals, Series, and Products
12. Identification of a novel coronavirus in patients with severe acute respiratory syndrome
13. A novel coronavirus associated with severe acute respiratory syndrome
15. Sampling-based approaches to calculating marginal densities
16. Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard
17. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing

The authors thank the anonymous referees for their useful comments and suggestions, which have contributed to improving this paper.
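The Gibbs cycle described in Appendix A alternates between two conditional draws, and a minimal Python sketch of it is given here for illustration. It is not the authors' R code, which is not reproduced in the paper. The function names `gibbs_prevalence` and `quantile`, the starting value, the fixed seed, the Bernoulli-sum binomial draw and the nearest-rank quantile rule are all assumptions made for this sketch. Reading the upper limit of the 95 per cent credible interval as the 97.5 per cent quantile of the draws is likewise an assumption.

```python
import random


def gibbs_prevalence(n, s, s_prime, alpha, beta,
                     iterations=20500, burn_in=500, start=0.5, seed=1):
    """Sample the posterior of the prevalence when all n tests are negative."""
    rng = random.Random(seed)
    pi = start  # arbitrary starting value for the prevalence
    draws = []
    for i in range(iterations):
        # Probability that an individual with a negative test is a false negative.
        p_fn = pi * (1 - s) / (pi * (1 - s) + (1 - pi) * s_prime)
        # k | n, pi, s, s' ~ Binomial(n, p_fn), drawn as a sum of Bernoulli trials.
        k = sum(rng.random() < p_fn for _ in range(n))
        # pi | n, k, alpha, beta ~ Beta(k + alpha, n - k + beta).
        pi = rng.betavariate(k + alpha, n - k + beta)
        if i >= burn_in:  # discard the first burn_in points
            draws.append(pi)
    return draws


def quantile(sorted_values, q):
    """Empirical quantile by nearest rank (no interpolation)."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]


if __name__ == "__main__":
    # Values used in the text: n = 80 negative tests, s = 0.75, s' = 1,
    # informative Beta(1, 35) prior.
    draws = sorted(gibbs_prevalence(n=80, s=0.75, s_prime=1.0, alpha=1, beta=35))
    print("posterior median:      ", quantile(draws, 0.5))
    print("97.5 per cent quantile:", quantile(draws, 0.975))
```

With these inputs the sampled median and upper quantile can be compared with the values quoted in the text for n = 80; agreement is only approximate because of Monte Carlo noise and the quantile convention assumed here.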