key: cord-1044788-7w3mvhqa
authors: Alexander, N.; Carabali, M.; Lim, J. K.
title: Estimating Force of Infection from Serologic Surveys with Imperfect Tests
date: 2020-06-11
journal: nan
DOI: 10.1101/2020.06.09.20125724
sha: 8cc35789446cf0ef61a2f24d0eee3a85a53f9340
doc_id: 1044788
cord_uid: 7w3mvhqa

The force of infection, or the rate at which susceptible individuals become infected, is an important public health measure for assessing the extent of outbreaks and the impact of control programs. Here we present methods for estimating force of infection from serological surveys of infections which produce lasting immunity, taking into account imperfections in the test used, and uncertainty in such imperfections. The methods cover both single serological surveys, in which age is a proxy for time at risk, and repeat surveys in the same people, in which the force of infection is estimated more directly. Fixed values can be used for the sensitivity and specificity of the tests, or existing methods for belief elicitation can be used to include uncertainty in these values. The latter may be applicable, for example, when the specificity of a test depends on co-circulating pathogens, which may not have been well characterized in the setting of interest. We illustrate the methods using data from two published serological studies of dengue.

The force of infection, or the rate at which susceptible individuals become infected, is an important public health measure used to assess the speed and extent of an epidemic and the impact of disease control programs, and to identify and prioritize regions requiring further control or vaccine implementation (1-5). For infections inducing lasting immunity, the force of infection is usually estimated via serological surveys ('serosurveys') of immunological status.
Ideally, assays used in serosurveys should be highly sensitive and specific while also suitable for high throughput, in terms of the cost and personnel required (6-11). In practice, however, available assays may not completely meet all these criteria, as is currently evident with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for coronavirus disease (COVID-19) (12).

The force of infection may be estimated from single or repeated serosurveys. In the former case, the simplest analysis is to assume that the force of infection was constant over calendar time and age, and to consider age as the time at risk (13). More sophisticated models allow the force of infection to change over time or over age, or allow for maternal antibodies if the analysis includes newborns or infants under a year of age (4, 14). Repeated surveys in the same individuals provide more robust estimates of the force of infection during a given study period (4, 7). Using repeated surveys, rate ratios can be obtained from binomial regression with complementary log-log link and the logarithm of the time between surveys as an offset (13). While age is used as the time at risk in the analysis of a single survey, in repeated surveys it can be considered a risk factor like any other.

However, errors in test status are usually ignored, whether analysing one or more surveys. In particular, for repeat surveys, individuals testing positive at baseline are usually considered no longer at risk (1, 4, 7). The choice of assay may substantially affect the study's interpretation (15). Various methods have taken into account certain kinds of test imperfection, for either single or repeated surveys. In particular, Trotter & Gay (16) developed a compartmental model of multiple surveys, in which the force of infection and imperfect sensitivity were estimated for .
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license. This version posted June 11, 2020.

Here we provide methods to estimate force of infection, from a single serosurvey or two serosurveys in the same individuals, accounting for imperfect sensitivity and/or specificity, and uncertainty in these parameters. We start from methods for estimating prevalence based on an imperfect diagnostic test, as reviewed by Lewis & Torgerson (21), and use similar notation. Estimation is done in a Bayesian framework using Markov chain Monte Carlo (MCMC) (22). We assume that the immune response being measured is long-lasting so that, for example, apparent seroreversions, i.e. changes over time from positive to negative, are due to test errors rather than loss of immunity. We use "seroprevalence" to mean the proportion of individuals with the underlying immune response, which the diagnostic tests measure with error.

The probability of testing positive ($T^+$) is specified as a function of the unobserved true status ($\pi$), and the assumed values for sensitivity ($S_e$) and specificity ($S_p$):

$$P(T^+) = S_e \pi + (1 - S_p)(1 - \pi)$$

Then, representing a constant seroconversion rate, a binomial regression is specified with a complementary log-log link, and the logarithm of age as an offset. The only other term in the model is an intercept, which is the logarithm of the force of infection (13). Equivalently, $\pi = 1 - \exp(-\lambda a)$ for an individual of age $a$, where $\lambda$ is the force of infection.
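The single-survey model above can be sketched in a few lines of Python. This is a minimal illustration only: it uses a grid-search maximum likelihood fit rather than the Bayesian MCMC estimation used in the paper, and the function names (`prob_test_positive`, `log_likelihood`, `estimate_foi`), the grid limits and the data are all hypothetical. The data are expected counts generated under an assumed force of infection of 0.10 per year with 85% sensitivity and 100% specificity; they are not the Colombo data.

```python
import math

def prob_test_positive(age, foi, se, sp):
    """P(T+) at a given age under a constant force of infection."""
    true_prev = 1.0 - math.exp(-foi * age)  # true seroprevalence at this age
    return se * true_prev + (1.0 - sp) * (1.0 - true_prev)

def log_likelihood(foi, data, se, sp):
    """Binomial log-likelihood; data holds (age, n_tested, n_positive) tuples."""
    total = 0.0
    for age, n, k in data:
        p = prob_test_positive(age, foi, se, sp)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total

def estimate_foi(data, se, sp, lo=1e-4, hi=1.0, steps=5000):
    """Grid-search maximum likelihood estimate (assumed search range lo..hi)."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda foi: log_likelihood(foi, data, se, sp))

# Hypothetical data: expected positives out of 1000 tested at ages 1 to 12,
# generated with foi = 0.10 per year, sensitivity 0.85, specificity 1.0.
true_foi, se, sp = 0.10, 0.85, 1.0
data = [(age, 1000, round(1000 * prob_test_positive(age, true_foi, se, sp)))
        for age in range(1, 13)]

adjusted = estimate_foi(data, se, sp)   # accounts for imperfect sensitivity
naive = estimate_foi(data, 1.0, 1.0)    # assumes a perfect test
print(f"adjusted: {adjusted:.3f} per year, naive: {naive:.3f} per year")
```

In this sketch, ignoring the imperfect sensitivity gives a lower estimate than the adjusted fit, because false negatives are treated as individuals who were never infected.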
A vague prior (Gaussian with mean zero and standard deviation 1,000) is specified for the logarithm of the force of infection. Example data from a single serosurvey of dengue are from Colombo, Sri Lanka, which used a capture enzyme-linked immunosorbent assay (ELISA) to detect immunoglobulin G (IgG) (14, 23). Here we omit individuals aged less than six months to limit the influence of maternal antibodies.

This is shown schematically, as a Directed Acyclic Graph, in Figure 1. Example data are from a community-based study of dengue in Medellin, Colombia, using a commercially available IgG indirect ELISA test (23). Residents were randomly selected, followed over time, and tested up to five times. For the current purpose, we use only the first survey, done in 2011, and the last one, done in 2014, approximately 26 months later. In the standard binomial regression model for seroconversion across paired surveys, those individuals positive at baseline are assumed to be not at risk, i.e. there is no allowance for measurement error in the serostatus. By contrast, as well as seroconversion, the current model allows seroreversion, i.e. for individuals to change from seropositive to seronegative status.

Fixed values for sensitivity and specificity can be used for the repeat surveys, as for a single one. However, there may be reasonable doubt as to the exact values of sensitivity and specificity, e.g. because cross-reacting pathogens circulate to an unknown extent. This uncertainty may have been quantified by systematic reviews, although their generalizability to a given setting may be doubtful. Another way is to quantify uncertainty in terms of expert opinion, e.g.
via the Delphi technique (24). Here we follow the elicitation method of Johnson et al. (25). For each parameter, each expert is presented with a range of values. For the current purpose, the parameters are sensitivity and specificity, each with a range of 0 to 100%, in intervals ("bins") of 5%. Each expert is invited to i) make a point or ... stickers for the units of 5% weight of belief. We adapted this to a spreadsheet in Microsoft Excel (Appendix 1). This approach could also be applied to the analysis of a single survey.

For the current study, beliefs were elicited from one of the authors (MC), who was also an investigator of the serological study in Medellin (23). In the case of dengue, one important consideration is whether the test in question may cross-react with other flaviviruses (26), or have lower specificity in those who have been vaccinated against them (27). The elicited distributions for sensitivity and specificity are used here to illustrate the current method and are not conclusive in terms of the performance of the test in question. Also, the considerations for other diagnostic tests and other settings will vary.

A smooth distribution between 0 and 1 was fitted to these belief weights. Both beta and logistic-normal families were fitted. Each has two parameters, which were fitted by the method of moments, i.e. equating the mean and variance of the belief weights to those of the distribution. The beta distribution was used for the estimation of the force of infection. More broadly, some models for sensitivity and specificity are unidentifiable (28).

For all MCMC models, the point estimate is taken to be the median of the iterative values and the 95% credible interval is from the 2.5th to 97.5th percentiles.
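The method-of-moments fit of a beta distribution to binned belief weights, described above, can be sketched as follows. This is an illustration under stated assumptions, not the authors' R code: each 5% bin is represented by its midpoint, the function name `beta_from_belief_weights` is hypothetical, and the weights are invented for the example rather than those elicited for the Medellin study. The final lines draw values from the fitted distribution, as one might in a Monte Carlo step over uncertain sensitivity or specificity.

```python
import random

def beta_from_belief_weights(midpoints, weights):
    """Method-of-moments beta parameters from binned belief weights."""
    total = float(sum(weights))
    mean = sum(m * w for m, w in zip(midpoints, weights)) / total
    var = sum(w * (m - mean) ** 2 for m, w in zip(midpoints, weights)) / total
    if not 0.0 < var < mean * (1.0 - mean):
        raise ValueError("variance must lie in (0, mean * (1 - mean))")
    common = mean * (1.0 - mean) / var - 1.0  # equals alpha + beta
    return mean * common, (1.0 - mean) * common

# Hypothetical belief weights: 20 units of 5% placed on four 5%-wide bins
# (80-85%, 85-90%, 90-95%, 95-100%), each represented by its midpoint.
midpoints = [0.825, 0.875, 0.925, 0.975]
weights = [4, 8, 6, 2]
alpha, beta = beta_from_belief_weights(midpoints, weights)

# Draw values from the fitted distribution (seeded for reproducibility).
rng = random.Random(1)
draws = [rng.betavariate(alpha, beta) for _ in range(1000)]
print(f"alpha = {alpha:.2f}, beta = {beta:.2f}")
```

By construction, the fitted beta distribution has the same mean and variance as the weighted bin midpoints.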
Figure 2 shows the fitted proportions seropositive by age in the dengue study in Colombo (14). Values of 85% sensitivity or specificity have been chosen to illustrate the method rather than on the basis of expert opinion or of comparison against a gold standard. However, they are in the range found for other dengue IgG ELISAs (30, 31). As expected, imperfect sensitivity implies higher seroprevalence, and imperfect specificity the reverse. The consequences of other values of sensitivity and specificity are shown in Figure 3. The confidence bands reflect sampling variability in the data rather than uncertainty in the values of sensitivity and specificity. From a standard frequentist analysis with binomial regression, the estimated force of infection is 13.7% per year (95% confidence interval 12.4-15.2%). Figure 3 shows that the results from the current model approach the results from the standard analysis as sensitivity and specificity tend to 100%. For 100% sensitivity the results from the current model are the same, and for 100% specificity the point estimate is the same and the credible interval is 0.1% lower (12.3-15.1%).

In Medellin, 705 people had test results available for both surveys (23). Of these, 260 originally tested negative, of whom 31 (11.9%) were positive on the second survey,
... varying, the results are qualitatively similar to Figure 5b (Appendix 3), the point estimate is again 15.6% per year, and the credible interval is from 4.9% to 44.5%.

Not all assays are suitable for serological surveys. For example, the World Health Organization discourages the use of rapid tests in such studies of dengue (5), and the utility of serological assays for SARS-CoV-2 is currently being debated (12, 32). Statistical methods can help quantify the degree of uncertainty that would arise from the use of any given test. Previous studies have simultaneously estimated test sensitivity and force of infection for single or repeat surveys (16-18), and estimated the force of infection subject to fixed values for sensitivity and specificity in a single survey (20). Here we present methods for estimating the force of infection taking into account imperfect sensitivity and/or specificity, and uncertainty in these parameters, for either single or repeat surveys. Should well-established and generalizable values of sensitivity and specificity be available, they can be used in the methods described here. However, this is not always the case. For example, in the case of dengue, there may be cross-reaction with other flaviviruses (26), whose occurrence varies geographically.

The model for the single serosurvey, in which age is taken as the time at risk, applied to the dengue serosurvey in Colombo (14), showed how the force of infection depends on the assumed sensitivity and specificity. When perfect sensitivity and specificity are assumed, the results are effectively identical to those from the standard binomial regression. For the example of repeat serosurveys in Medellin (23), the elicited expert belief for the specificity was relatively precise, resulting in a fairly precise estimate of the force of infection (95% credible interval 4.0 to 5.8% per year).
The belief for sensitivity was less precise and resulted in an interval estimate that was so wide (5.5 to 44.4%) as to potentially lack utility. The results from these two studies illustrate the method, but the force of infection values should not be taken as authoritative for the study settings.

We have opted for estimation in a Bayesian framework by MCMC (22). The model for a single serosurvey is similar to that of Lewis & Torgerson (21) for prevalence, and may be soluble by direct application of maximum likelihood, hence avoiding the need for iterative sampling. The identifiability of some Bayesian models for the estimation of prevalence is affected by the choice of priors for sensitivity, specificity and other parameters: inaccurate priors can then give rise to inaccurate conclusions (28). Although it may be possible to 'learn' about both the assay parameters and the force of infection, here we have avoided identifiability concerns by including the elicited uncertainty in sensitivity and specificity via Monte Carlo simulation. In effect, the elicited distribution is both the prior and posterior distribution. This approach was shown for the model for repeat surveys but could equally be applied to the one for a single survey. It was illustrated by eliciting beliefs about sensitivity and specificity from a single expert. To reach substantive conclusions, multiple experts would be required (25). Estimates from systematic reviews could be used instead of expert opinion if they were generalizable to a given study area. Future work could seek models with Bayesian priors for sensitivity and specificity, while still correctly estimating the force of infection.
In the meantime, the use of Monte Carlo in an outer loop, with MCMC estimation each time, makes the analysis relatively time-consuming. Also, a reformulation would be required to allow the inclusion of covariates. The method is shown for two surveys; studies with more than two could be included, with each being constrained to have a seroprevalence no lower than the previous. Another limitation is the assumption that each individual has long-lasting immunity, so that apparent seroreversions are due to test errors rather than waning immunity. Depending on the infection in question, the validity of this assumption may depend on factors such as age and immunocompetence.

In conclusion, the methods presented here can make more realistic estimates of force of infection, and can help inform the choice of serological tests for future serosurveys.

Figure 2. Proportion seropositive for dengue by age in Colombo (14). The solid line is the fit from a standard analysis assuming a perfectly sensitive and specific test. The upper dashed line is from an analysis assuming 85% sensitivity and 100% specificity, and the lower dashed line with these values exchanged.

Figure 3. Relation between force of infection, sensitivity and specificity in the Colombo data. The force of infection is estimated for each value of sensitivity or specificity, considered fixed. In this figure, when sensitivity is less than 100% then specificity is assumed to be 100%, and conversely.
The grey zones are the 95% credible intervals. As sensitivity and specificity approach 100%, to the right side of the plot, the credible intervals approach the 95% confidence interval from standard binomial regression (vertical dashed line).

Uncertainty in a) specificity and b) sensitivity for Medellin study.

1. The use of simple epidemiological models in the evaluation of disease control programmes: a case study of trachoma
2. A force-of-infection model for onchocerciasis and its applications in the epidemiological evaluation of the Onchocerciasis Control Programme in the Volta River basin area
3. Pertussis in England and Wales: an investigation of transmission dynamics and control by mass vaccination
4. Seventy-five years of estimating the force of infection from current status data
5. Informing vaccination programs: a guide to the design and conduct of dengue serosurveys. Geneva: World Health Organization
6. Viral Infections of Humans: Epidemiology and Control
7. Ten years of serological surveillance in England and Wales: methods, results, implications and action
8. Approaches for the development of rapid serological assays for surveillance and diagnosis of infections caused by zoonotic flaviviruses of the Japanese encephalitis virus serocomplex
9. Influenza serological studies to inform public health action: best practices to optimise timing, quality and reporting
10. Research priorities for the development and implementation of serological tools for malaria surveillance
11. A review of dengue diagnostics and implications for surveillance and control
12. The important role of serology for COVID-19 control. The Lancet Infectious Diseases
13. Modelling Binary Data. London: Chapman and Hall
14. Estimates of dengue force of infection in children in Colombo, Sri Lanka
15. Estimating seroprevalence of vaccine-preventable infections: is it worth standardizing the serological outcomes to adjust for different assays and laboratories
16. Analysis of longitudinal bacterial carriage studies accounting for sensitivity of swabbing: an application to Neisseria meningitidis
17. Estimating the burden of rubella virus infection and congenital rubella syndrome through a rubella immunity assessment among pregnant women in the Democratic Republic of the Congo: Potential impact on vaccination policy
18. Evaluation of nationwide supplementary immunization in Lao People's Democratic Republic: Population-based seroprevalence survey of anti-measles and anti-rubella IgG in children and adults, mathematical modelling and a stability testing of the vaccine
19. Force-of-infection and true infection rate of dengue in Singapore - its implication on dengue control and management
20. Reconstruction of Rift Valley fever transmission dynamics in Madagascar: estimation of force of infection from seroprevalence surveys using Bayesian modelling
21. A tutorial in estimating the prevalence of disease in humans and animals in the absence of a gold standard diagnostic
22. Markov Chain Monte Carlo in Practice
23. Dengue virus serological prevalence and seroconversion rates in children and adults in Medellin, Colombia: implications for vaccine introduction
24. How to use the nominal group and Delphi techniques
25. A valid and reliable belief elicitation method for Bayesian priors
26. Cross-reactivity in flavivirus serology: new implications of an old finding?
27. Evaluation of ELISA-based serodiagnosis of dengue fever in travelers
28. Prior Precision, Prior Accuracy, and the Estimation of Disease Prevalence Using Imperfect Diagnostic Tests
29. A program for analysis of Bayesian graphical models using Gibbs sampling
30. Multicountry prospective clinical evaluation of two enzyme-linked immunosorbent assays and two rapid diagnostic tests for diagnosing dengue fever
31. Comparison of seven commercial antigen and antibody enzyme-linked immunosorbent assays for detection of acute dengue infection
32. The Role of Antibody Testing for SARS-CoV-2: Is There One?

Excel file for use in eliciting beliefs.
R code. One file for single survey, one for repeat surveys, and one with utility functions.