key: cord-0652122-oragu0bm
authors: Staerk-Ostergaard, Jacob; Kirkeby, Carsten; Christiansen, Lasse Engbo; Andersen, Michael Asger; Moller, Camilla Holten; Voldstedlund, Marianne; Denwood, Matthew J.
title: Evaluation of diagnostic test procedures for SARS-CoV-2 using latent class models: comparison of antigen test kits and sampling for PCR testing based on Danish national data registries
date: 2021-12-21
journal: nan
DOI: nan
sha: 86591956901dcbd949d69376ff039fb5607fa9d4
doc_id: 652122
cord_uid: oragu0bm

Antigen test kits have been used extensively as a screening tool during the worldwide pandemic of coronavirus (SARS-CoV-2). While it is generally expected that taking samples for analysis with PCR testing gives more reliable results than using antigen test kits, the overall sensitivity and specificity of the two protocols in the field have not yet been estimated without assuming that the PCR test constitutes a gold standard. We use latent class models to estimate the in situ performance of both PCR and antigen testing, using data from the Danish national registries. The results are based on 240,000 paired test results sub-selected from the 55 million test results that were obtained in Denmark during the period from February 2021 until June 2021. We found that the specificity of both tests is very high in our data sample (>99.7%), while the sensitivity of PCR sampling was estimated to be 95.7% (95% CI: 92.8-98.4%) and that of the antigen test kits used in Denmark over the study period was estimated at 53.8% (95% CI: 49.8-57.9%). Our findings can be used as supplementary information for consideration when implementing serial testing strategies that employ a confirmatory PCR sample following a positive result from an antigen test kit, such as the policy used in Denmark. We note that while this strategy reduces the number of false positives associated with antigen test screening, it also increases the false negatives. We demonstrate that the balance of trading false positives for false negatives only favours the use of serial testing when the expected true prevalence is low. Our results contain substantial uncertainty in the estimates for sensitivity due to the relatively small number of positive test results over this period: validation of our findings in a population with higher prevalence would therefore be highly relevant for future work.
Teaser description: Estimating the diagnostic performance of antigen and PCR sampling as used routinely during the COVID-19 pandemic in Denmark.

Diagnostic testing procedures play a crucial role in the control of an infectious disease in terms of identifying infectious individuals and estimating the burden of infection. This is particularly true for infections such as Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), which spread rapidly due partly to asymptomatic individuals who have no other way of knowing that they are infected (1). However, it is essential to consider the diagnostic performance of each testing procedure, including potential errors related to imperfect sampling and accuracy of reporting results, when considering their utility within a disease control programme. This requires information on the real-world sensitivity and specificity of each of the available diagnostic test procedures. There are several challenges involved with providing these estimates of sensitivity and specificity. The first is that laboratory sensitivity and specificity as estimated under tightly controlled conditions may not be representative of the performance of the test in the field (2), which also includes challenges not directly related to the laboratory procedure, such as contamination within the submitted sample. The second challenge is that the reference test against which we may wish to evaluate a new diagnostic test may itself be imperfect. This necessitates the use of latent class models (LCM) to analyse paired testing data in order to provide unbiased estimates of sensitivity and specificity for both tests being evaluated in the absence of a 'gold standard' test. LCM based on the Hui-Walter paradigm were originally proposed over 40 years ago (3) and have since become widely used for evaluating diagnostic tests within the veterinary literature (4-6). Such models also have great potential for further use within the human medical literature (7). LCM have been used to estimate the sensitivity and specificity of reverse transcriptase polymerase chain reaction (RT-PCR), computed tomography (CT) and a number of other clinical and laboratory parameters for diagnosing SARS-CoV-2 infection (8), and to estimate the sensitivity and specificity of three commonly used tests for diagnosing COVID-19 (7). However, the majority of studies that have evaluated antigen tests for SARS-CoV-2 to date have not used LCM methods to analyse the data, i.e. the studies have assumed that the reference test (typically RT-PCR) is perfect (9). Despite this common assumption, all diagnostic tests are in fact imperfect, especially when considering extraneous sources of error such as mislabelling and contamination of the submitted sample. Therefore, using another imperfect test as a gold standard will bias the estimates of sensitivity and specificity of the test being evaluated. There is one example of using LCM to assess the performance of antigen testing using laboratory samples (10).
However, laboratory diagnostic test performance does not account for extraneous sources of error that are important in real-world settings, so data collected in the field provide a more relevant estimate of the performance of diagnostic tests from the perspective of disease control programmes. A previous study used field data to evaluate the diagnostic performance of antigen tests, but did so assuming that PCR sampling constituted a gold standard (11). To our knowledge, there are currently no studies that use LCM methods to provide unbiased estimates of test performance based on field data. The scope of this paper is to apply LCM to estimate the in situ sensitivity and specificity of PCR test sampling and antigen test kits performed during the SARS-CoV-2 epidemic in the Danish population between February and June 2021. We use the term 'PCR sampling' to refer to the entire process spanning sample collection in the field, analysis of the sample using a Nucleic Acid Amplification Test (NAAT) procedure (which are predominantly RT-PCR in Denmark), and reporting of the result via the national database, to distinguish this concept from the RT-PCR test itself. Similarly, we use the term 'antigen test' to mean the entire process including sampling and potentially imperfect use of the various different test kits that have been used in Denmark. Our analysis follows the STARD-BLCM reporting guidelines (12).

Descriptive statistics

A total of 239,221 test pairs, where an antigen test was taken within 10 hours of a PCR sample, were available from 222,805 individuals. Among these, 77,439 pairs were from people living in high-prevalence parishes, 62,837 were from medium-prevalence parishes and 57,992 from low-prevalence parishes. The vaccinated group had 40,953 test pairs. We assume that each test pair was performed on separate samples for antigen and PCR testing from the same individual; however, a small number of sample pairs (n=360) were registered with exactly the same date/time stamp, so it is possible that some of these tests were performed on the same sample. The tallies for each combination of test results are presented in Table 1, while Figure 4 shows the age distributions of the high/medium/low-prevalence groups. The medium-prevalence group shows similar age characteristics to the full population, whereas the high/low-prevalence groups have an over/under-representation of 20-40-year-old individuals and an under/over-representation of >55-year-old individuals, respectively, which aligns well with 20-40-year-olds being the main source of positive tests during May-June 2021. The demographic makeup of individuals taking the test combination PCR→antigen generally follows that of the population taking antigen tests, with some additional sex bias (Figure 3). Overall, the frequency of antigen testing has shifted towards younger individuals for both males and females, possibly reflecting that antigen tests have been used in Denmark to screen for SARS-CoV-2 in primary schools, high schools and university campuses. The demography of the population undertaking PCR sampling is more evenly distributed across ages, with a higher rate for females than for males; this is partially due to PCR sampling being used to screen healthcare professionals, the majority of whom are female, and may explain the higher proportion of females in the PCR→antigen data.
The distribution of time intervals between a positive PCR sample and a subsequent antigen test does not depend on the result of the antigen test (Figure 5). There was a slightly longer interval on average between a negative PCR sample and a negative antigen test (negative → negative) compared to that between a negative PCR sample and a positive antigen test (negative → positive). However, a Kolmogorov-Smirnov (KS) test did not show a statistically significant difference (p-value = 0.370). There is, however, a substantial reduction in time intervals following a positive PCR sample compared to those following a negative PCR sample, regardless of the antigen test result (Figure 5). This difference is statistically significant as measured by the KS test (p-value < 0.001 for any combination). This indicates that individuals who are more likely to have a positive PCR sample (perhaps because they are a high-risk secondary contact and/or because they have clinical symptoms) are more likely to take an antigen test within a shorter period of time following their PCR sample.

Statistical modelling

The frequencies for the four combinations of paired test results for each of the three populations are shown in Table 1. The Hui-Walter model that was fit to these cross-tabulations converged and produced effective sample sizes above 1,000 for all parameters. Results are presented in Table 2. The sensitivity and specificity estimates are the sole focus of this study; however, the prevalence estimates corresponding to each of the groups are presented in the appendix (Table 9). The prevalence estimates for the three unvaccinated groups are highest for the high-prevalence group and lowest for the low-prevalence group, as would be expected given the way in which these groups were artificially constructed. The confidence interval limits of the low-prevalence unvaccinated group overlap with those of the vaccinated group, indicating a somewhat similar prevalence between these two groups. Results of the sensitivity analysis, varying the maximum allowable time between PCR sampling and antigen testing (time lag), are given in appendix A. The sensitivity of antigen tests was estimated consistently for all analyses, with 95% CIs substantially overlapping across the range of time lag values. For PCR sampling, a small decrease in the median estimate of sensitivity was observed, from a maximum of 97.65% at 2 hours to a minimum of 94.66% at 24 hours; however, the 95% CIs produced for each time lag value overlapped substantially. Similarly, the specificity of both procedures was estimated consistently across varying time lag values.

Implications for serial testing

For serial testing, the estimated overall specificity is 100%, while the sensitivity is 51.48% (95% CI: [47.37;55.96]); see Table 2. During the study period, there were 35,530 positive antigen samples in Denmark from a total of 32,789,084 tests, and 109,922 positive PCR samples from a total of 22,052,829 samples. Adjusting the observed proportion of positive results from each test using the Rogan-Gladen estimator (19), we found a corrected prevalence of 0.0916% based on antigen tests and 0.4121% based on PCR sampling. This discrepancy can be explained to some extent by the fact that while antigen tests are being used for screening in the general population, PCR sampling is used for diagnostic purposes, i.e. for confirmation, near-contacts and screening in hospitals. Therefore, PCR sampling might be more often applied in sub-groups where the prevalence would be expected to be higher.
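The Rogan-Gladen correction referred to above adjusts an apparent (test-based) prevalence for imperfect sensitivity and specificity. As a minimal illustration, the following R sketch applies the correction to the antigen screening counts quoted above; the sensitivity and specificity values are approximate point estimates chosen purely for illustration, so the result will not exactly reproduce the corrected prevalences reported here, which are based on the full model estimates.

    # Rogan-Gladen correction: true prevalence from apparent prevalence,
    # sensitivity (se) and specificity (sp)
    rogan_gladen <- function(apparent_prev, se, sp) {
      (apparent_prev + sp - 1) / (se + sp - 1)
    }

    # Apparent prevalence from antigen screening during the study period
    apparent_ag <- 35530 / 32789084

    # Approximate (illustrative) point estimates of antigen test performance
    rogan_gladen(apparent_ag, se = 0.538, sp = 0.9994)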
Of the 35,530 positive antigen tests, 28,366 were followed by a confirmatory PCR sample within 3 days, corresponding to around 80%. Of these follow-up PCR samples, 11,985 were negative, thus releasing these individuals from quarantine. We therefore adopted 80% as a reference for how many positive antigen tests would be followed (and potentially superseded) by a PCR sample in the scenarios of varying prevalence. Table 3 shows the estimated total number of false positive and false negative test results (and 95% confidence limits) for antigen testing alone compared with serial testing. Evidently, the number of false positive cases at national level is quite stable, at around 21,000-22,000, for prevalences between 0.01% and 4%. As Table 3 shows, the serial testing scheme effectively removes almost all false positive cases. However, as the prevalence increases, so does the number of false negatives. Since the sensitivity of serial testing is lower than that of antigen testing alone, the rate of increase in false negative cases for serial testing is higher than for antigen tests. Table 4 presents the estimated increase in false negative cases when changing from antigen testing alone to serial testing, as well as the decrease in false positives and the balance between these two. As shown, the balance is in favour of the serial testing scheme when the prevalence is low, since the number of false positives that are eliminated exceeds the expected increase in false negatives. However, a higher prevalence of ≈ 3% favours antigen testing alone, since the median number of false negatives outweighs the false positives in this scenario. At the lower limit of the confidence interval (2.5%), the balance tips at a prevalence of ≈ 1%. These results show that the implicit trade-off between sensitivity and specificity in serial testing should be taken into account if this strategy is used during a disease outbreak.
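The tipping point described above can also be illustrated directly from point estimates of test performance. The following R sketch solves for the prevalence at which the expected increase in false negatives equals the expected decrease in false positives when switching from antigen testing alone to serial testing; the sensitivity and specificity values are hypothetical round numbers chosen only to be broadly consistent with the estimates above, whereas Tables 3 and 4 are based on the full posterior distributions. Note that a constant follow-up rate (such as the 80% used above) scales both sides equally and therefore does not change the tipping point.

    # Illustrative point estimates only (not the published posterior values)
    se_ag <- 0.538;  sp_ag <- 0.9993   # antigen test (approximate)
    se_pcr <- 0.957; sp_pcr <- 0.998   # PCR sampling (approximate)

    # Per tested individual, for true prevalence p:
    #   FN added   = p * se_ag * (1 - se_pcr)        (antigen positives lost to a false negative PCR)
    #   FP removed = (1 - p) * (1 - sp_ag) * sp_pcr  (antigen false positives caught by a negative PCR)
    # Setting these equal and solving for the odds p / (1 - p):
    odds <- ((1 - sp_ag) * sp_pcr) / (se_ag * (1 - se_pcr))
    tipping_prevalence <- odds / (1 + odds)
    tipping_prevalence   # roughly 3% with these illustrative values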
To our knowledge, this study represents the first use of LCM to assess the overall diagnostic utility of sampling for PCR testing and antigen kit testing for COVID-19 in the field. The study period in this paper covers the 5 months from 1st February to 30th June 2021, with data subsampled from the complete database of Danish test results. The pandemic in Denmark peaked at the end of 2020, before the mass administration of vaccines began in early 2021. Antigen tests were rolled out during 2021, with daily tests beginning to increase rapidly by February. As such, the study period covers a period of increasing test numbers and an initially low incidence that increased during the study period and peaked around the end of May. During this period, vaccines were also administered, beginning with the oldest age groups and others with a high risk of hospitalisation, and continuing with younger and less vulnerable groups as the study period went on. From March 2021, there was a transition from national lockdown to a re-opening of schools and shops, with social activities permitted once again. This indicates that the study period covers a time during the pandemic when multiple factors influenced the incidence rates. A main factor in this study was the shift towards younger generations being the main driver of the continued infections. The Hui-Walter model paradigm requires the use of multiple populations with differing prevalence but identical test specificity and sensitivity. In order to maximise the ability of the model to extract information from the data, we used artificial stratification based on the expected prevalence in the parish of residence. We found that the specificity of both test procedures was estimated to be close to 100%, while the sensitivity was estimated at 95.7% (95% CI: [92.8;98.4]) for PCR sampling and 53.8% (95% CI: [49.83;57.93]) for antigen testing. These estimates are more uncertain than those of specificity due to the relatively low number of true positive individuals in Denmark over this time period. Our model allowed the sensitivity of the group of vaccinated individuals to differ from that of unvaccinated individuals. However, the 95% confidence intervals for these estimates overlapped substantially, and we therefore conclude that there is no evidence that the performance of these diagnostic procedures is dependent on the vaccination status of the individual being tested. However, these findings should be considered uncertain due to the relatively low number of vaccinated individuals in our study and the correspondingly wide 95% CI for the sensitivity estimates, particularly in vaccinated individuals. Despite these limitations, our results show that the sensitivity of PCR sampling in Denmark over our study period was relatively high (i.e. over 91.5% in the worst case, and potentially as high as 100% in vaccinated individuals). However, it is important to emphasise that PCR sampling should not be considered a gold standard when evaluating the performance of antigen testing, since this approach will lead to a downward bias in the estimated performance of the antigen test. The imperfect nature of PCR sampling also affects the use of confirmatory PCR testing following a positive antigen test. This serial testing scheme has been employed in Denmark to reduce the number of false positives generated by the routine use of antigen tests. However, our study highlights the cost of this strategy in terms of increased false negative cases. Indeed, our findings suggest that the expected number of false negatives has increased during the study period due to the sensitivity and specificity of the serial testing scheme. As we also demonstrate, this depends heavily on the true prevalence, with the reduction in false positives expected to be equal to the increase in false negatives at a prevalence of around 3%. As such, the serial testing strategy is justifiable when the prevalence is low, but as infection rates increase, decision makers must consider whether a trade-off of 1:1 is acceptable. Given a confirmatory PCR sample follow-up rate of 80% combined with the estimates in Table 4 and an assumed true prevalence between 0.1% and 0.4%, the increase in false negatives is expected to lie somewhere between 610 (= 0.8 · 763) and 2,443 (= 0.8 · 3,054) cases, while the corresponding reduction in false positives is expected to be between 17,645 (= 0.8 · 22,056) and 17,592 (= 0.8 · 21,990). This implies a trade-off of between 29:1 and 7:1 in favour of reducing false positives. Compared to the results of Jakobsen et al. (11), we found a similar, although marginally higher, specificity for antigen testing. The slight increase in our estimate of specificity is most likely because, under a gold-standard assumption, false negative PCR sample results are erroneously attributed to false positive antigen tests. However, our estimate for sensitivity (54.77%) is substantially lower than the value of 68.9% that was previously reported. There are multiple possible reasons for this discrepancy.
Firstly, data in Jakobsen et al. (11) were collected under a research protocol and therefore under more tightly controlled conditions than would be expected in the field, which would be expected to increase the diagnostic test performance. Furthermore, the previous study used data collected on 26th December 2020, and out of the 4,697 sampled individuals, 705 (15%) reported symptoms, while 3,008 (64%) reported no symptoms. For the symptomatic group, the sensitivity was 78.8%, while for the group without symptoms it was 49.2%. Based on a voluntary questionnaire completed when booking a time for PCR sampling in Denmark, 10.0% reported that they booked a PCR test "due to showing COVID-19 symptoms". From February 2021 to March 2021, the proportion self-reporting symptoms ranged from 4.7% to 7.2%. As such, the proportion of individuals with symptomatic disease in the real-world dataset is substantially lower than that for the previous study, which may be expected to negatively impact the overall sensitivity of antigen tests. As with all LCM, we must consider the implicit meaning of the latent class that we are estimating (12). The definition of this latent class is tied to the statistical concepts inherent to the LCM, and represents the underlying 'true state' conditional on which the test results can be considered to be independent (20). However, we note that the 'true state' in the LCM sense may not perfectly match the biological definition of 'infected' or even 'infectious'. This is because RT-PCR tests detect viral RNA, while antigen tests detect viral antigens. As such, the latent state implicitly defined by the LCM is 'presence of viral RNA and antigens in the samples' rather than 'individual is infected with virus'. It is therefore possible that part of the reason for the imperfect sensitivity of PCR sampling as estimated by the LCM is detection of either early-stage or late-stage infection, corresponding to detectable levels of viral RNA but absence of viral antigens, which may be considered by the LCM as a 'true negative'. In addition to this, it is also important to take into account the self-selection bias caused by non-random sampling of individuals for testing. It is relatively uncommon for an individual to take an antigen test within 10 hours following a PCR sample, and we cannot reasonably expect that these individuals are representative of the general population. The true interpretation of the prevalence estimates presented here is therefore: the average prevalence of virus shedding in each of the subgroups among the individuals who chose to take a PCR sample with a follow-up antigen test within 10 hours over the 5-month period. There is also strong temporal confounding in these estimates due to the gradual roll-out of vaccines in Denmark: the vaccinated group is predominantly represented by tests taken later in the time series, when the prevalence can be expected to be lower. It may also be tempting to compare the prevalence estimates from the unvaccinated groups to those of the vaccinated group. However, there is substantial temporal bias in terms of the proportion of individuals vaccinated over this time period (21), so vaccination status is confounded with the underlying temporal trends of disease burden in the general population. Furthermore, there was variation in the official policy towards routine testing between vaccinated and unvaccinated individuals.
We therefore note that the prevalence estimates for each of the four groups should not be interpreted as the prevalence of either clinical disease or SARS-CoV-2 infection in these groups. These are provided purely for the context of the LCM and should not be interpreted as being representative of any unbiased prevalence estimate that could be made over this time period. However, although we do not consider the prevalence estimates to be directly useful, they are necessary parameters within the Hui-Walter framework and we report them as advised by the STARD-BLCM reporting guidelines (12). There are a number of assumptions and limitations associated with this study. The data were collected in such a way that test pairs were used when an antigen test was taken within 10 hours of a PCR sample. Since the usual response time for the PCR sample is between 10 and 36 hours, with a mean of 14 hours, the PCR result would not have been known before taking the antigen test in almost all cases. The antigen test can therefore be assumed to be independent of the PCR test, conditional on the underlying latent disease state of the individual. It is possible that a small number of individuals may have known their PCR sample result before having an antigen test, which may have affected their decision to take an antigen test. However, we believe that this is unlikely and therefore does not have a strong impact on our conclusions. Our sensitivity analysis, in which we alter the time period between tests, produces qualitatively similar results, which supports this conclusion. The second important assumption made by our analysis is that the individuals included in the LCM analysis are representative in terms of the expected test sensitivity and specificity. In our case, this means that we should have no reason to suspect either a higher or lower sensitivity or specificity for individuals having both tests within 10 hours compared to individuals who have only a single test. In reality, it may be the case that our model data include a higher proportion of individuals with clinical disease than is true of the general population: it is therefore possible that we overestimate the sensitivity of both tests to some extent. However, we can think of no reason why the specificity estimates would be in any way biased by our data selection criteria. It is important to note that we expected the prevalence estimates to be heavily biased, because we expect individuals who take both tests to have a higher than average probability of testing positive. This bias is also borne out by our results, which show far higher prevalence estimates than are believed to be the case for Denmark. However, this bias in prevalence estimates does not impact our study because estimation of prevalence is not our aim: the only important assumption is that the estimates of sensitivity and specificity are unbiased. It is also important to recognise that our estimates are based on a data sample taken from Denmark over the period February to July 2021: findings may differ in future studies based on different datasets, particularly if fundamental properties of the test procedures differ over time. Finally, we emphasise that our results refer to overall sensitivity and specificity in the field, which includes potential sources of error that are extraneous to the tests themselves, such as sample contamination, mislabelling and misreporting of results.
These estimates of operational sensitivity and specificity are highly relevant when evaluating diagnostic testing in terms of the overall effectiveness within a disease control programme. Our results show that the overall sensitivity of antigen testing and PCR testing was around 54% and 96%, respectively, when used as part of the Danish national control programme for SARS-CoV-2 between February and July 2021. However, our estimates for sensitivity are relatively uncertain due to the low number of true positive individuals in our dataset: validation of these findings in a population with higher prevalence would therefore be valuable. We also found that the overall specificity was close to 100% for both procedures, and that the use of confirmatory testing based on PCR sampling following positive antigen tests increases the overall number of false negative results. When the prevalence is low (<1%), a small increase in false negatives may be tolerated due to the relatively large decrease in false positives, but when the prevalence is high (>3%) the increase in false negatives exceeds the decrease in false positives. The imperfect performance of PCR sampling in the field should therefore be accounted for when considering COVID-19 testing policies.

The centralised nature of record keeping within the Danish healthcare system provides a natural way in which to evaluate diagnostic test performance in the field. All diagnostic laboratory test results in Denmark, including all SARS-CoV-2 test results, are reported electronically to the Danish Microbiology Database (MiBa) (13); this includes tests taken for any reason as well as standard-procedure screening tests in hospitals. Since we do not know the reason that individual tests were taken, we must assume a mixed population among these test results. This group therefore includes both individuals with and without symptoms, and individuals in need of a test result in order to attend social gatherings, visits, work, etc. Although various antigen test kits have been used in Denmark (see Table 8), the MiBa database does not contain information on which antigen test kit was used for a given test and, as such, we cannot assess the performance of the kits individually. We therefore refer to antigen tests in general, and the estimated performance is consequently the overall average performance of the different kits used in Denmark. Similarly, the MiBa database contains results from a number of different Nucleic Acid Amplification Test (NAAT) procedures other than RT-PCR, but we were unable to distinguish these based on the data available. However, the vast majority of these samples were analysed using RT-PCR during the study period, so we refer to all NAAT tests as 'PCR' for the purposes of simplification.

Data subset used for modelling

A fundamental requirement of LCM is that two contemporaneous tests are available from the same individual, i.e. that paired test results are available. An equally important assumption is that the two test results are conditionally independent, i.e. that the decision to undertake one of the tests was not made conditional on the result of the other test, as may be the case for a confirmatory PCR test following a positive antigen test. We therefore restrict the case definition for paired observations used for the model to individuals where the PCR sampling preceded the antigen test by no more than 10 hours. We assume that this results in conditionally independent data based on the following reasoning:

• The usual time for obtaining a PCR test result in Denmark is more than 10 hours, so in almost all cases, the PCR result would not be known at the time of the antigen test.
• Under Danish regulations in force during the relevant time period, a positive antigen test result was considered to be 'overruled' by a negative PCR test, due to the high specificity of the latter and the relatively large number of antigen tests being performed. However, the converse was not true, i.e. a negative antigen test following a positive PCR result could not be used by the individual to avoid isolation. It is therefore highly unlikely that an individual would take an antigen test if already in possession of a recent PCR test result.

Data processing

We identified the valid pairs of PCR samples and antigen tests where the antigen test followed the PCR sample by no more than 10 hours. This was done using the timestamps identifying when each test sample was collected, as registered in the MiBa database. We allow the same individual to appear multiple times in the data if they possessed multiple pairs of PCR→antigen test results, except if the subsequent PCR sample was within 2 weeks of the previous PCR sample. We refer to this subset of test results as the 'model data'. In order to use LCM to analyse the data, it was necessary to stratify the data into multiple populations with varying prevalence. For the purposes of this study, these populations were generated artificially in order to maximise the statistical power of the LCM. We note that the artificial nature of these populations renders the estimates of prevalence effectively meaningless, but we can assume that the estimates of sensitivity and specificity are unbiased relative to those that would be obtained from a completely random sample. The procedure used for tests performed on unvaccinated individuals was as follows:

• Each of the 22.1M PCR samples taken during the study period was assigned to one of 2,157 parishes (Danish 'sogne') based on the registered home address of the individual being tested.
• PCR samples that were also included in the model data subset (see above) were removed.
• The proportion of the remaining PCR samples corresponding to a positive result was calculated per parish.
• The parishes were split into low-, medium- and high-prevalence groups based on this observed proportion.
• The two cutoff points were set so that the total population of each group of parishes (low, medium and high prevalence) was balanced to give approximately the same sample size.
• Each test pair within the model data was then linked to the parish of the individual being tested, and subsequently to the low-, medium- or high-prevalence group via the parish.

Alongside these three prevalence-based groups for tests performed on unvaccinated individuals, a fourth group was established consisting of tests carried out on all partially and fully vaccinated individuals. The vaccination group was added to investigate whether any differences in sensitivity and specificity could be detected between vaccinated and unvaccinated individuals. Vaccines in Denmark have been administered in risk-based groups that are heavily correlated with age.

Statistical Modelling

We fit a modified version of a standard two-test, four-population LCM to the paired test data obtained as described above. We follow the standard assumption of consistent specificity of both tests across all four populations. However, we allow the sensitivity of antigen testing and PCR sampling to vary between unvaccinated and vaccinated (including partly vaccinated) individuals, in order to allow for the possibility of reduced sensitivity due to suppression of viral excretion conditional on vaccination status. The sensitivity of both tests was assumed to be constant across the three unvaccinated populations.
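To make the model structure concrete, the following is a minimal sketch of a two-test, four-population Hui-Walter model written in JAGS syntax, with the sensitivity of each test indexed by vaccination status as described above. This is an illustrative reconstruction rather than the supplementary code itself: variable names are hypothetical, and the priors and MCMC settings shown match those described in the following paragraph.

    # Hypothetical JAGS model: 4 populations (3 unvaccinated parish groups + 1 vaccinated),
    # 2 tests (PCR sampling and antigen testing), assumed conditionally independent.
    # Tally[p, 1:4] holds the counts of (PCR-/Ag-, PCR+/Ag-, PCR-/Ag+, PCR+/Ag+) in population p.
    hw_model <- "
    model {
      for (p in 1:4) {
        Tally[p, 1:4] ~ dmulti(prob[p, 1:4], N[p])
        prob[p, 1] <- prev[p] * (1 - se_pcr[vacc[p]]) * (1 - se_ag[vacc[p]]) +
                      (1 - prev[p]) * sp_pcr * sp_ag
        prob[p, 2] <- prev[p] * se_pcr[vacc[p]] * (1 - se_ag[vacc[p]]) +
                      (1 - prev[p]) * (1 - sp_pcr) * sp_ag
        prob[p, 3] <- prev[p] * (1 - se_pcr[vacc[p]]) * se_ag[vacc[p]] +
                      (1 - prev[p]) * sp_pcr * (1 - sp_ag)
        prob[p, 4] <- prev[p] * se_pcr[vacc[p]] * se_ag[vacc[p]] +
                      (1 - prev[p]) * (1 - sp_pcr) * (1 - sp_ag)
        prev[p] ~ dbeta(1, 1)           # minimally informative prevalence priors
      }
      for (v in 1:2) {                  # v = 1: unvaccinated, v = 2: vaccinated
        se_pcr[v] ~ dbeta(2, 1)         # weakly informative sensitivity priors
        se_ag[v] ~ dbeta(2, 1)
      }
      sp_pcr ~ dbeta(2, 1)              # weakly informative specificity priors
      sp_ag ~ dbeta(2, 1)
    }"

    library(runjags)
    # 'tally' is a hypothetical 4 x 4 matrix of observed counts (rows = populations)
    results <- run.jags(hw_model,
                        data = list(Tally = tally, N = rowSums(tally), vacc = c(1, 1, 1, 2)),
                        monitor = c("se_pcr", "se_ag", "sp_pcr", "sp_ag", "prev"),
                        n.chains = 2, burnin = 10000, sample = 50000)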
We fit the LCM within a Bayesian framework, which requires prior distributions to be specified for all parameters. Minimally informative Beta(1, 1) priors were used for the prevalence parameters corresponding to each of the four groups, and weakly informative Beta(2, 1) priors were used for the sensitivity and specificity of both tests (two specificity and four sensitivity parameters). The model was fit using Markov chain Monte Carlo (MCMC) methods implemented using JAGS (14), interfaced from R (15) using the runjags package (16). A burn-in period of 10,000 iterations was used before sampling 50,000 iterations from the posterior of each of two parallel chains. Convergence was assessed using the Gelman-Rubin statistic and visual examination of trace plots (17, 18), and the effective sample size of all parameters was checked to ensure that it exceeded 1,000 independent samples. The R code needed to replicate the Hui-Walter model discussed is provided as supplementary material. A sensitivity analysis was also performed in order to assess the impact of the 10-hour cutoff between PCR sampling and the subsequent antigen test as described above. The data were re-tabulated for time lag values ranging between 1 and 24 hours, and the model was re-run using each of these datasets as input.

Serial testing scheme

The standard testing procedure in Denmark over the period February to July 2021 was to recommend the use of a confirmatory PCR test following a positive antigen result, in order to reduce false positive test results due to the presumed lower specificity of antigen tests. This serial scheme means that a negative antigen result does not require a confirmatory PCR, implying that the serial scheme will have a higher specificity but lower sensitivity than antigen testing alone. Using estimates of sensitivity and specificity of the two tests, we can derive the sensitivity and specificity of the serial scheme, where both tests must be positive for the final result to be labelled positive, as:

se_serial = se_antigen · se_PCR
sp_serial = sp_antigen + (1 - sp_antigen) · sp_PCR

This implies that the serial sensitivity will decrease, whereas the specificity will increase, compared to either the antigen or PCR test used alone. For a population of size N and true prevalence p, the expected number of false negatives is N · p · (1 - se) and the expected number of false positives is N · (1 - p) · (1 - sp), where se and sp are the sensitivity and specificity of the scheme in question. To illustrate the results, we calculated the total number of false positives and false negatives expected under serial testing compared to antigen testing alone, assuming a known true prevalence. This procedure used Monte Carlo integration based on the estimates from each iteration of the LCM in order to obtain a full posterior distribution for all parameter estimates. Results are presented as expected cases per 10,000, as well as the expected cases in a population of size 32,789,084, which corresponds to the number of antigen tests in Denmark during the study period. In order to assess the impact of prevalence, we calculated the false positives and false negatives for prevalence values ranging between 0.01% and 4%.
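As a sketch of the Monte Carlo integration described above, the following R code computes the expected reduction in false positives and increase in false negatives per 10,000 individuals for a grid of assumed prevalence values, propagating posterior uncertainty by evaluating the quantities at every MCMC draw. The object 'draws' and its column names are hypothetical placeholders for the posterior samples of the sensitivity and specificity parameters (for example, extracted from the runjags output); the published tables and figures are based on the actual model output.

    # 'draws' is assumed to be a data frame with one row per MCMC iteration and
    # columns se_ag, sp_ag, se_pcr and sp_pcr (hypothetical names).
    balance_per_10k <- function(p, draws, n = 10000) {
      se_serial <- draws$se_ag * draws$se_pcr
      sp_serial <- draws$sp_ag + (1 - draws$sp_ag) * draws$sp_pcr
      fp_removed <- n * (1 - p) * ((1 - draws$sp_ag) - (1 - sp_serial))
      fn_added   <- n * p * ((1 - se_serial) - (1 - draws$se_ag))
      # Posterior median and 95% interval for the balance (FP removed minus FN added)
      quantile(fp_removed - fn_added, probs = c(0.025, 0.5, 0.975))
    }

    # Evaluate across assumed true prevalence values between 0.01% and 4%
    prev_grid <- seq(0.0001, 0.04, length.out = 100)
    # balance <- t(sapply(prev_grid, balance_per_10k, draws = draws))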
This work was carried out exclusively using existing data contained within the Danish national data registries. Individual-level data were processed within the secure computing environment.

Figure 3: Representation of the study data relative to the Danish population. Data are at test level, which means that individuals can be represented several times. The y-axis ('Tests pr. person') reflects the average number of tests taken in the respective age group (males/females separately) during the study period from 1st February to 30th June. The higher rate of PCR sampling among females is partly due to screening for healthcare professionals, the majority of whom are female.

Figure 4: Age distribution across prevalence groups. The upper plot shows the age distribution among the full population in each group. The medium-prevalence group (yellow) follows the total population (black) to some extent. The lower plot shows the age distribution among the tests included in the study. It is skewed towards younger individuals. This is partially due to the heavy use of antigen testing in primary schools, high schools and universities.

Figure 5: The legend describes the test outcomes (PCR → antigen). Note that a positive result from a PCR sample is followed more rapidly by an antigen test than is the case for a negative result from a PCR sample.

Figure 6: The estimated balance between the decrease in false positive cases and the increase in false negative cases per 10,000 individuals, with 95% confidence limits, as a result of changing from an antigen test alone to a serial testing scheme. The prevalence varies from 0 to 4% and shows that as the prevalence increases, removing false positives by employing a serial testing scheme will cost more in terms of increasing false negative cases. The estimated median shows that the balance tips at a prevalence of 3%, such that more false negative cases can be expected than the number of removed false positive cases.

Table 4: Estimated increases in false negative (FN) and false positive (FP) cases for 32,789,084 tests and per 10,000 individuals for varying prevalence p. In addition, the table also shows the estimated balance between false positives and false negatives. When the prevalence p reaches ≈3%, the median increase in false negatives balances the false positives. At the lower limit of the confidence interval (2.5%), this balance occurs slightly above p = 1%.

Figure 7: The geographical location of parishes coloured according to their prevalence range: low, medium and high. Even though there are clusters of high-prevalence parishes, these are not exclusive to densely populated areas and larger cities.

References

1. Recommendations for national SARS-CoV-2 testing strategies and diagnostic capacities: interim guidance.
2. Field evaluation of diagnostic test sensitivity and specificity for salmonid alphavirus (SAV) infection and pancreas disease (PD) in farmed Atlantic salmon (Salmo salar L.) in Norway using Bayesian latent class analysis.
3. Estimating the error rates of diagnostic tests.
4. Evaluation of sensitivity and specificity of routine meat inspection of Danish slaughter pigs using latent class analysis.
5. Evaluating diagnostic tests with near-perfect specificity: use of a Hui-Walter approach when designing a trial of a DIVA test for bovine tuberculosis.
6. Accuracy of PCR, mycobacterial culture and interferon-γ assays for detection of Mycobacterium bovis in blood and milk samples from Egyptian dairy cows using Bayesian modelling.
7. Bayesian latent class models to estimate diagnostic test accuracies of COVID-19 tests.
8. A statistical framework to estimate diagnostic test performance for COVID-19.
9. EU health preparedness: a common list of COVID-19 rapid antigen tests and a common standardised set of data to be included in COVID-19 test result certificates.
10. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) seroprevalence: navigating the absence of a gold standard.
11. Detection of SARS-CoV-2 infection by rapid antigen test in comparison with RT-PCR in a public setting.
12. STARD-BLCM: standards for the reporting of diagnostic accuracy studies that use Bayesian latent class models.
13. Electronic reporting of diagnostic laboratory test results from all healthcare sectors is a cornerstone of national preparedness and control of COVID-19 in Denmark.
14. JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling.
15. R: a language and environment for statistical computing [Internet]. R Foundation for Statistical Computing.
16. runjags: an R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS.
17. Inference from iterative simulation using multiple sequences.
18. Assessing the convergence of Markov chain Monte Carlo methods: an example from evaluation of diagnostic tests in absence of a gold standard.
19. Estimating prevalence from the results of a screening test.
20. Diagnosing diagnostic tests: evaluating the assumptions underlying the estimation of sensitivity and specificity in the absence of a gold standard.
21. COVID-19 dashboard.

LCM estimates of specificity (in %) for the low-, medium- and high-prevalence parish groups for varying time lags between tests. Confidence intervals for both tests overlap across all time lags, thus indicating no trend in test performance up to a lag of 24 hours.

The authors declare that they have no competing interests.

All data as well as R code needed

Table 7: LCM estimates of prevalence (in %) for the four groups: low, medium and high prevalence as well as vaccinated, for varying time lags between tests.