key: cord-346859-r1v6ir8u authors: Mallett, Sue; Allen, A. Joy; Graziadio, Sara; Taylor, Stuart A.; Sakai, Naomi S.; Green, Kile; Suklan, Jana; Hyde, Chris; Shinkins, Bethany; Zhelev, Zhivko; Peters, Jaime; Turner, Philip J.; Roberts, Nia W.; di Ruffano, Lavinia Ferrante; Wolff, Robert; Whiting, Penny; Winter, Amanda; Bhatnagar, Gauraang; Nicholson, Brian D.; Halligan, Steve title: At what times during infection is SARS-CoV-2 detectable and no longer detectable using RT-PCR-based tests? A systematic review of individual participant data date: 2020-11-04 journal: BMC Med DOI: 10.1186/s12916-020-01810-8 sha: doc_id: 346859 cord_uid: r1v6ir8u BACKGROUND: Tests for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) viral ribonucleic acid (RNA) using reverse transcription polymerase chain reaction (RT-PCR) are pivotal to detecting current coronavirus disease (COVID-19) and duration of detectable virus indicating potential for infectivity. METHODS: We conducted an individual participant data (IPD) systematic review of longitudinal studies of RT-PCR test results in symptomatic SARS-CoV-2. We searched PubMed, LitCOVID, medRxiv, and COVID-19 Living Evidence databases. We assessed risk of bias using a QUADAS-2 adaptation. Outcomes were the percentage of positive test results by time and the duration of detectable virus, by anatomical sampling sites. RESULTS: Of 5078 studies screened, we included 32 studies with 1023 SARS-CoV-2 infected participants and 1619 test results, from − 6 to 66 days post-symptom onset and hospitalisation. The highest percentage virus detection was from nasopharyngeal sampling between 0 and 4 days post-symptom onset at 89% (95% confidence interval (CI) 83 to 93) dropping to 54% (95% CI 47 to 61) after 10 to 14 days. On average, duration of detectable virus was longer with lower respiratory tract (LRT) sampling than upper respiratory tract (URT). 
Duration of faecal and respiratory tract virus detection varied greatly within individual participants. In some participants, virus was still detectable at 46 days post-symptom onset. CONCLUSIONS: RT-PCR misses detection of people with SARS-CoV-2 infection; early sampling minimises false negative diagnoses. Beyond 10 days post-symptom onset, lower RT or faecal testing may be preferred sampling sites. The included studies are open to substantial risk of bias, so the positivity rates are probably overestimated.

Accurate testing is pivotal to controlling severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of coronavirus disease 2019 (COVID-19). Considerable political and medical emphasis has been placed on rapid access to testing, both to identify infected individuals, so as to direct appropriate therapy and appropriate return to work, and to implement containment measures to limit the spread of disease. However, success depends heavily on test accuracy. Understanding when in the disease course the virus is detectable is important for two purposes: first, to understand when and how to detect SARS-CoV-2, and second, to understand how long individuals are likely to remain infective, posing a risk to others. The success of COVID-19 testing depends heavily on the use of accurate tests at the appropriate time. Testing for active virus infection relies predominantly on reverse transcription polymerase chain reaction (RT-PCR), which detects viral ribonucleic acid (RNA) that is shed in varying amounts from different anatomical sites and at different times during the disease course. It is increasingly understood that differences in virus load impact directly on diagnostic accuracy, notably giving rise to negative tests in disease-positive individuals [1, 2]. Positivity is contingent upon sufficient virus being present to trigger a positive test, which may depend on test site, sampling methods, and timing [3].
For example, it is believed that positive nasopharyngeal RT-PCR declines within a week of symptoms so that a positive test later in the disease course is more likely from sputum, bronchoalveolar lavage fluid, or stool [4]. Nomenclature for anatomical site is also unclear, with a wide variety of overlapping terms used such as "oral", "throat", "nasal", "pharyngeal", and "nasopharyngeal". Because testing is pivotal to management and containment of COVID-19, we performed an individual participant data (IPD) systematic review of emerging evidence about test accuracy by anatomical sampling site to inform optimal sampling strategies for SARS-CoV-2. We aimed to examine at what time points during SARS-CoV-2 infection it is detectable at different anatomical sites using RT-PCR-based tests. This IPD systematic review followed the recommendations of the PRISMA-IPD checklist [5]. Eligible articles were any case series or longitudinal studies reporting participants with confirmed COVID-19 tested at multiple times during their infection and providing IPD for RT-PCR test results at these times. We stipulated that test timings were linked to index dates of time since symptom onset or time since hospital admission, as well as COVID-19 diagnosis by positive RT-PCR and/or suggestive clinical criteria, for example World Health Organization (WHO) guidelines [6]. Search strings were designed and subsequently run in PubMed, LitCOVID, and medRxiv by an experienced information specialist (NR). The search end date was 24 April 2020. We additionally included references identified by COVID-19: National Institute for Health Research (NIHR) living map of living evidence (http://eppi.ioe.ac.uk/COVID19_MAP/covid_map_v4.html) and COVID-19 Living Evidence (https://ispmbern.github.io/covid-19/livingreview/), with a volunteer citizen science team, "The Virus Bashers" (Additional file 1: Table S1). Data were extracted into pre-specified forms.
We did not contact authors for additional information. Study, participant characteristics, and ROB were extracted in Microsoft Excel (KG, JS, SG, JA, AW, SM). Data included country, setting, date, number of participants and IPD participants, inclusion criteria, IPD selection, participant age, sample types, RT-PCR test type and equipment, and primers. RT-PCR test results were extracted using Microsoft Access (SM, BS, JP, ZZ, CH). We could not identify an ideal risk of bias (ROB) tool for longitudinal studies of diagnostic tests, so we adapted the risk of bias tool for diagnostic accuracy studies QUADAS-2 [7] to include additional signalling questions to cover anticipated issues. ROB signalling questions, evaluation criteria, and domain assessment of potential bias are reported (Additional file 1: Table S2 ). Details of sampling sites and methods, including location of the sampling site(s) and any sample grouping (for example, if combined throat and nasal swabs), were extracted from full texts by a clinician (NS) with queries referred to a second clinician (ST). If stated, details of sampling methodology were recorded, including who collected samples, information regarding anatomical location (e.g. how the nasopharynx was identified), and sample storage (Additional file 1: Table S3) . IPD RT-PCR results were extracted from each article and converted to binary results ("positive" or "negative"). Data from Kaplan-Meier (KM) curves were extracted using Web digitizer [8] (Additional file 1: Table S3 ). Days since symptom onset and days since hospital admission were calculated from reported IPD. Data were presented collated across 5-day time intervals for each sample method, with longer times grouped within the longest time interval, and 95% CI was calculated for proportions. For comparison of duration of positive RT-PCR from respiratory tract (RT) and faecal samples, analysis and graphical presentation were restricted to participants sampled by both methods. 
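As an illustration of the interval calculation described above, the following minimal Python sketch computes a 95% confidence interval for a proportion of positive RT-PCR results. The Wilson score method is an assumption made here for illustration only; the text does not state which interval method was used. The function name `wilson_ci` is hypothetical. For the nasopharyngeal count reported in the Results (147 positive of 166 test results at 0 to 4 days), this sketch gives roughly 83% to 93%.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(positives, total, level=0.95):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = positives / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


# Count taken from the Results: 147 positive of 166 nasopharyngeal tests
low, high = wilson_ci(147, 166)
print(f"{147 / 166:.0%} (95% CI {low:.0%} to {high:.0%})")
```

Other interval methods (for example exact binomial intervals) would give slightly different limits, particularly for the small counts seen at later time intervals.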
Data analysis used STATA (14.2 StataCorp LP, Texas, USA) (Additional file 1: Table S3 ). A total of 5078 articles were identified, 116 full text articles were screened, and 32 articles were included (Fig. 1 ). Most articles were from China, in hospitalised adult participants (Table 1) . Articles reported on a total of 1023 participants and 1619 test results. Twenty-six (81%) articles reported data on test results since the start of symptoms, and 23 (72%) since hospital admission. Sixteen studies including 22% (229/1023) of the participants reported both these time points: The median time between symptom onset and hospitalisation was 5 days (interquartile range (IQR) 2 to 7 days). The median number of participants per study was 22 (IQR 9 to 56, range 5 to 232), and the median number of RT-PCR test results per participant was 4 (IQR 2 to 9) ( Table 2) . Articles variably specified sampling sites according to anatomical location, or grouped more than one site for analysis, for example as upper RT (Additional file 1: Table S4 ). The most frequent sample sites were faeces (n = 13), nasopharyngeal (n = 10), and throat (n = 9), although there was a range of other sites including blood, urine, semen, and conjunctival swabs ( Table 2) . Details of sampling method were generally absent. Two studies specified the person taking the samples. One study described how the nasopharynx was identified and the swab technique (length of contact time with the nasopharynx and twisting). Five studies specified sample storage and transport details. We present RT-PCR test results for 11 different sampling sites at different times during SARS-CoV-2 infection. Figures 2 and 3 show the number of positive and negative RT-PCR results for 5-day time intervals since symptom onset and time from hospital admission, respectively. The sampling sites yielding the greatest proportion of positive tests were nasopharyngeal, throat, sputum, or faeces. 
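The 5-day grouping behind Figs. 2 and 3 can be sketched as follows. This is a minimal illustration rather than the code used in the review: the function names, the record layout (site, day, result), the cut-off of 30 days for the final open-ended interval, and the example records are all hypothetical.

```python
from collections import Counter


def interval_label(day, width=5, last_start=30):
    """Map a day relative to the index date to a fixed-width interval label."""
    if day >= last_start:
        # Longer times are grouped within the longest interval
        return f"{last_start}+"
    # Floor division also places negative days (before onset) correctly
    start = (day // width) * width
    return f"{start} to {start + width - 1}"


def tally(records):
    """Count positive and negative results per (site, interval)."""
    counts = Counter()
    for site, day, positive in records:
        result = "positive" if positive else "negative"
        counts[(site, interval_label(day), result)] += 1
    return counts


# Hypothetical records: (sampling site, day since symptom onset, result)
records = [
    ("nasopharyngeal", 2, True),
    ("nasopharyngeal", 3, False),
    ("nasopharyngeal", 12, True),
    ("faeces", 41, True),
    ("throat", -6, True),
]
for key, n in sorted(tally(records).items()):
    print(key, n)
```

Each (site, interval) pair then yields a count of positive and negative results from which a percentage positive and its confidence interval can be derived.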
Insufficient data were available to evaluate saliva and semen. Only 33% of participants who were tested with blood samples had detectable virus (44/133; 6 articles [20, 26, 27, 31, 35, 38]), and almost no urine or conjunctival samples had detectable virus. Using nasopharyngeal sampling, 89% (147/166, 95% CI 83 to 93) of RT-PCR test results were positive from 0 to 4 days post-symptom onset.

We further grouped sites into upper (URT) and lower (LRT) respiratory tract. The rate of sample positivity reduced faster from URT sites compared to LRT sites (Fig. 4a). Given that analysis across all participants is likely to be influenced by preferential URT sampling of participants with less severe disease, we also analysed participants who underwent both URT and LRT sampling. Again, URT sites on average cleared faster (median 12 days, 95% CI 8 to 15 days) than LRT sites (median 28 days, 95% CI 20 to not estimable; Fig. 4b); the majority of participants cleared virus from URT sites before LRT sites (Fig. 4c). Data based on time since hospital admission are consistent with data for time since symptom onset. Across participants sampled by both RT and faecal sampling since hospital admission, 29% of participants were

Many articles reported intermittent false negative RT-PCR test results for participants within the monitoring time span. Where participant viral loads were reported, several different profiles were distinguished; two examples are shown in Fig. 6 [14, 15]. Intermittent false negative results were reported either where the level of virus was close to the limit of detection, or in participants with high viral load but for unclear reasons. The proportion of studies with high, low, or unclear ROB for each domain is shown in Fig. 7, and ROB for individual studies is shown in Additional file 1: Table S5. All studies were judged at high ROB.
All but one were judged at high ROB for the participant selection domain [17], mainly as they only included participants with confirmed SARS-CoV-2 infection based on at least one positive PCR test. Studies also frequently selected a subset of the participant cohort for longitudinal RT-PCR testing, and only results for these participants were included in the study. Ten studies were judged at unclear ROB for the index test domain as the schedule of testing was based on clinician choice rather than being pre-specified by the study or clinical guidelines, or because the samples used for PCR testing were not pre-specified. Eleven studies were judged at high ROB for the flow and timing domain, mainly because continued testing was influenced by easy access to participants, such as by continued hospitalisation. Negative RT-PCR test results were common in people with SARS-CoV-2 infection, confirming that RT-PCR testing misses identification of people with disease. Our IPD systematic review has established that sampling site and time of testing are key determinants of whether SARS-CoV-2 infected individuals are identified by RT-PCR. We found that nasopharyngeal sampling was positive in approximately 89% (95% CI 83 to 93) of tests within 4 days of symptom onset. Sampling 10 days after symptom onset greatly reduced the chance of a positive test result. There were limited data on new methods of sample collection like saliva in these longitudinal studies. Sputum samples have similar or higher levels of detection compared with nasopharyngeal sampling, although this may be influenced by preferential sputum sampling in severely ill participants. Although based on few participants tested at both sampling sites, URT sites have faster viral clearance than LRT sites in most of these participants; virus was undetectable in 50% of participants at URT sites 12 days after symptom onset, compared to 28 days for LRT sites.
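The median times to undetectable virus quoted above are consistent with a time-to-event analysis. A minimal Kaplan-Meier sketch is given below, assuming each participant contributes a follow-up time in days and a flag for whether virus became undetectable at that time (otherwise the participant is treated as censored at the last test). The function names and the example data are hypothetical and are not taken from the included studies.

```python
from fractions import Fraction


def kaplan_meier(times, cleared):
    """Return [(t, S(t))] at each time where at least one participant cleared virus.

    S(t) is the estimated proportion still with detectable virus after day t.
    Exact fractions avoid floating-point issues when comparing with 1/2.
    """
    data = sorted(zip(times, cleared))
    at_risk = len(data)
    surv = Fraction(1)
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        flags = [c for u, c in data if u == t]
        events = sum(flags)
        if events:
            surv *= Fraction(at_risk - events, at_risk)
            curve.append((t, surv))
        # Both events and censored participants leave the risk set
        at_risk -= len(flags)
        i += len(flags)
    return curve


def median_time(curve):
    """Smallest time at which S(t) <= 1/2, or None if not estimable."""
    for t, s in curve:
        if s <= Fraction(1, 2):
            return t
    return None


# Hypothetical data: days to last test, and whether virus became undetectable
times = [5, 8, 12, 12, 15, 20]
cleared = [1, 1, 1, 0, 1, 0]
print(median_time(kaplan_meier(times, cleared)))
```

When fewer than half of participants are observed to clear virus before follow-up ends, `median_time` returns `None`, which corresponds to a "not estimable" bound of the kind reported for LRT sites.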
We found that faecal sampling is not suitable for initial detection of disease, as up to 30% of participants detected using respiratory sampling are not detected using faecal sampling. Viral detection in faecal samples may be useful to establish virus clearance, although as noted, whether RT or faecal samples have longer duration of viral detection varies between participants. All included studies were judged at high ROB, so results of this review should be interpreted with caution. Table 3 provides an overview of the major methodological limitations and their potential impact on study results. A major source of bias is that all but one study [19] restricted inclusion to participants with confirmed SARS-CoV-2 infection based on at least one positive RT-PCR test, meaning that the percentage of positive RT-PCR testing is likely to be overestimated. Lack of technical details, for example of how samples are taken and RT-PCR tests performed, limits the applicability of findings to current testing. Compared to real life, studies were likely to use more invasive sampling methods, use experienced staff to obtain samples, and sample participants in hospital settings where sample handling could be standardised. Consequently, estimates of test performance are likely to be overestimated compared to real-world clinical use and in community population testing including self-test kits. These limitations have important implications for how testing strategies should be implemented and in The accuracy of RT-PCR testing is limited by sampling sites used, methods, and the need to test as soon as possible from symptom onset in order to detect the virus. Previous studies have established that in COVID-19 infection, viral loads typically peak just before symptoms and at symptom onset [4] and estimated false negative test results over time since exposure from upper respiratory tract samples [2] . 
To our knowledge, there has been no prior systematic review of RT-PCR using IPD to quantify the percentage of persons tested who are positive and how this varies by time and sampling site. Understanding the distribution of anatomical sites with detectable virus is clinically relevant, especially given independent viral replication sites in nose and throat using distinct and separate genetic colonies [17]. Understanding of different patterns of detection and duration of virus detection at different body sites is essential when designing strategies of testing to contain virus spread. Notably, it is unclear if detection of virus in faeces is important in disease transmission, although faecal infection was shown in SARS and MERS [41]. This review uses robust systematic review methods to synthesise published literature and identifies overall patterns not possible from individual articles. Using IPD, we examined data across studies and avoided study-level ecological biases present when using overall study estimates. IPD regarding sample site at different time points during infection is vital because it provides an overview of test performance impossible from individual studies alone. Synthesised IPD can also substantiate or reject patterns appearing within individual studies. Within-participant paired comparisons of sampling sites also become possible with sufficient data. The main limitation is the risk of bias in the included studies. Although constraints were understandable given the circumstances in which the studies were done, the consequences for validity need to be highlighted. The percentage of positive RT-PCR testing is likely to be overestimated, because inclusion was restricted to participants with confirmed SARS-CoV-2 infection based on at least one positive RT-PCR test in all but one study [19]. This means that people who had a COVID-19 infection but never tested positive on at least one RT-PCR test would not have been included.
This could arise if SARS-CoV-2 is not present at easily sampled sites or at the time participants were tested. This makes it impossible to determine the true false negative rate of the test, that is, the proportion of people who actually have SARS-CoV-2 but would receive a negative RT-PCR test result. It is possible that only half of persons infected by SARS-CoV-2 may test positive, as a community surveillance study in Italy found only 53% (80/152) of persons tested RT-PCR positive in households quarantined for 18 days with persons who tested PCR positive [39]. The same study also identified households where no one tested RT-PCR positive, but where there were clusters of persons with symptoms typical of COVID-19. Poor reporting of sampling methods and sites impaired our ability to distinguish between and report on variability between them. For some sampling methods such as saliva and throat swabs, more studies are needed. There were also sparse data on sampling methods that are becoming more widespread, such as participant self-sampling [42] and short nasal swab sampling (anterior nares/mid turbinate) [43]. Our index times may be subject to bias as symptom onset is somewhat subjective and hospital admission practices vary by country, pandemic stage, and hospital role (i.e. healthcare vs. isolation). The results presented do not correspond to following the same participants across time, but to testing at clinically relevant time snapshots reported by individual studies, so that participants tested at later time points are likely to have more severe disease; this does not limit the interpretation of results in understanding testing of participants in most clinical contexts. Comparisons of sampling sites should be restricted to participants tested at the relevant sites.
We have used analysis methods that do not include clustering within studies, to keep analyses simple to understand and present, and to avoid complications of fitting models where the number of participants in each cluster varies. Ultimately, many potentially eligible studies did not report IPD which led to their exclusion, or only reported IPD for a subset of participants in the study. We would welcome contact and data sharing with clinicians and authors to rectify this.

To avoid the consequences of missed infection, samples for RT-PCR testing need to be taken as soon as symptoms start for detection of SARS-CoV-2 infection in preventing ongoing transmission. Even within 4 days of symptom onset, some participants infected with SARS-CoV-2 will receive negative test results. Testing at later times will result in a higher percentage of false negative tests in people with SARS-CoV-2, particularly at upper RT sampling sites. After 10 days post-symptoms, it may be important to use lower RT or faecal sampling. Valid estimates are essential for clinicians interpreting RT-PCR results. However, ROB considerations suggest that the positive percentage rates we have estimated may be optimistic, possibly considerably so. Participants can have detectable virus in different body compartments, so virus may not be detected if samples are only taken from a single site. Some hospitals in the UK now routinely take RT-PCR samples from multiple sites, such as the nose and throat.

More studies are urgently needed on evolving sampling strategies such as self-collected samples which include saliva and short nasal swabs. Future studies should avoid the risks of bias we have identified by precisely reporting the anatomical sampling sites with a detailed methodology on sample collection. Table 4 details example studies helpful for future study design. Further sharing of IPD will be important, and we would welcome contact from groups with IPD data we can include in ongoing research.

RT-PCR misses detection of people with SARS-CoV-2 infection; early sampling minimises false negative diagnoses. Beyond 10 days post-symptom onset, lower RT or faecal testing may be preferred sampling sites. The included studies are open to substantial risk of bias, so the positivity rates are probably overestimated.

Fig. 6 Example participants with intermittent false negative results. a An example of a participant with high viral load, but where alternate RT-PCR test results report high viral load or undetectable virus. b A participant where virus levels have reduced over time to a level around the limit of viral detection, and at these low levels of virus, intermittent negative results will occur due to differences in the location or amount of sample

Fig. 7 ... Table S2). For each domain, the percentage of studies by concern for potential risk of bias is shown: low (green), unclear (yellow), and high (red)

Table 3 Biases and issues in interpretation
Details of bias and applicability issues; Impact on interpretation of study data

In these studies, the reference test usually incorporates RT-PCR (index test).
• RT-PCR testing is usually a key component of identifying people with SARS-CoV-2 infection.
• Participants will not be detected or included in these studies when SARS-CoV-2 is not present at easily sampled sites and at the time that participants were available for testing.
Unclear how many and what severity of participants with SARS-CoV-2 are not included in studies. People who do not have a positive RT-PCR test at some point are excluded. This could lead to overestimation of positivity. Rates of positivity will be inflated as only people with virus accessible for sampling for RT-PCR tests will be included in studies.

Most participants are identified or present based on respiratory tract symptoms such as cough or respiratory distress.
Unclear how many and what severity of participants with SARS-CoV-2 are not included in studies.
• Participants will not be detected or included in these studies when less common symptoms or asymptomatic.
• Participants included will be biased to over-represent people with detectable virus in respiratory tract sampling sites and at times frequently used for testing (post symptom onset or at admission to hospital).
Studies will inflate positivity for sampling sites that overlap with sampling sites used in RT-PCR reference testing.
• For example, we identified 30% of participants with RT positivity but with negative results from faecal sampling. However, if participants had only faecal virus, would they have been included in the studies?

Index test: RT-PCR (applicability)
• Studies included are likely to use more invasive sampling methods than acceptable in widespread population testing. For example, nasopharyngeal testing is likely in many current studies to be based on long swabs and self test kits.
Percentage of people with detectable virus may be overestimated when testing is applied in real-world clinical use and in population testing.
• Studies will use experienced staff to obtain samples, handle, process, and conduct tests.
• Studies are mostly sampling participants in hospital settings or in specialised research community testing research where sample handling, transport, and storage have been standardised.
• Variation in RT-PCR kits is minimised as studies are based in few hospitals or limited to a research setting

Reporting of sampling sites and methods is poor.
• Poor reporting may have led to less ideal grouping of sampling in analysis.
• Some studies are likely to use a variety of nasopharyngeal sampling methods depending on the individual participants, but the type of sampling is typically reported at a study level for a particular sampling site.
Percentage of people with detectable virus may be over- or underestimated.

Flow and timing
Uncertainty and inconsistencies in time of sampling
Percentage of people with detectable virus may be over- or underestimated at particular times.
• Time of symptom onset can be subjective unless based on fever, but some participants do not have fever.
• Time of symptom onset may be different if asked of participants in ICU setting.
• Time of hospitalisation and discharge may be affected by function hospitalisation serves in containment of disease spread. In some studies, the hospitals were also quarantine centres, so participants were hospitalised immediately at onset of mild symptoms rather than restricted to patients needing oxygen.

Flow and timing
Clinical cohort within studies changes across time points.
Percentage of people with detectable virus may be overestimated at particular later time points as these correspond to participants who were severely ill.
• Participants who have recovered from COVID-19 in most studies are typically not tested after 2 negative tests 24 h apart.
• Many studies only test inpatients at the hospital, so the participants sampled between 0 and 14 days typically have less severe disease than those tested longer

Flow and timing (selective outcome reporting)
Some studies only publish IPD data for a selection of people. Available IPD data may not represent a typical spectrum of participants in the different settings (community setting, hospital, ICU, nursing home, prison). Published data is likely to be biased towards publication of research active groups which may not represent typical real world.
Percentage of people with detectable virus may be overestimated.

Estimating false-negative detection rate of SARS-CoV-2 by RT-PCR
Variation in false-negative rate of reverse transcriptase polymerase chain reaction-based SARS-CoV-2 tests by time since exposure
Quantitative detection and viral load analysis of SARS-CoV-2 in infected patients
Interpreting diagnostic tests for SARS-CoV-2
Preferred Reporting Items for Systematic Review and Meta-Analyses of individual participant data: the PRISMA-IPD Statement
Clinical management of COVID-19. WHO Interim guidance
QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies
Indirect virus transmission in cluster of COVID-19 cases
SARS-CoV-2-positive sputum and feces after conversion of pharyngeal samples in patients with COVID-19
Evaluating the accuracy of different respiratory specimens in the laboratory diagnosis and monitoring the viral shedding of 2019-nCoV infections
Clinical progression of patients with COVID-19 in Shanghai
Consistent detection of 2019 novel coronavirus in saliva
Viral dynamics in mild and severe cases of COVID-19
SARS-CoV-2 viral load in upper respiratory specimens of infected patients
Time kinetics of viral clearance and resolution of symptoms in novel coronavirus infection
Virological assessment of hospitalized patients with COVID-2019
Evaluation of SARS-CoV-2 RNA shedding in clinical specimens and clinical characteristics of 10 patients with COVID-19 in Macau
Detection of 2019 novel coronavirus in semen and testicular biopsy specimen of COVID-19 patients
SARS-CoV-2 detection using digital PCR for COVID-19 diagnosis, treatment monitoring and criteria for discharge
Evaluation of coronavirus in tears and conjunctival secretions of patients with SARS-CoV-2 infection
Caution should be exercised for the detection of SARS-CoV-2, especially in the elderly
Profile of RT-PCR for SARS-CoV-2: a preliminary study from 56 COVID-19 patients
Factors associated with prolonged viral RNA shedding in patients with coronavirus disease 2019 (COVID-19)
Molecular and serological investigation of 2019-nCoV infected patients: implication of multiple shedding routes
Epidemiologic features and clinical course of patients infected with SARS-CoV-2 in Singapore
Clinical and virological data of the first cases of COVID-19 in Europe: a case series
A case series of children with 2019 novel coronavirus infection: clinical and epidemiological features
Temporal dynamics in viral shedding and transmissibility of COVID-19
Clinical characteristics of 24 asymptomatic infections with COVID-19 screened among close contacts in Nanjing
First 12 patients with coronavirus disease 2019 (COVID-19) in the United States
Patients of COVID-19 may benefit from sustained lopinavir-combined regimen and the increase of eosinophil may predict the outcome of COVID-19 progression
Clinical features and dynamics of viral load in imported and non-imported patients with COVID-19
Asymptomatic and human-to-human transmission of SARS-CoV-2 in a 2-family cluster
Positive rectal swabs in young patients recovered from coronavirus disease 2019 (COVID-19)
Characteristics of pediatric SARS-CoV-2 infection and potential evidence for persistent fecal viral shedding
Prolonged presence of SARS-CoV-2 viral RNA in faecal samples
Viral load dynamics and disease severity in patients infected with SARS-CoV-2 in Zhejiang province, China
Suppression of COVID-19 outbreak in the municipality of Vo
Saliva is more sensitive for SARS-CoV-2 detection in COVID-19 patients than nasopharyngeal swabs
Potential fecal transmission of SARS-CoV-2: current evidence and implications for public health
Self-collected oral fluid and nasal swabs demonstrate comparable sensitivity to clinician collected nasopharyngeal swabs for covid-19 detection

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations

No specific funding has been received for this research. SM, SAT, and SH receive funding from the National Institute for Health Research (NIHR). SAT is an NIHR senior investigator. SM, SAT, NS, and SH receive funding from the UCL/UCLH Biomedical Research Centre. AJA, SG, KG, JS, and AW are funded by the NIHR Newcastle In Vitro Diagnostics Co-operative. BS is part-funded by NIHR Leeds In Vitro Diagnostic Co-operative. PJT receives funding from the National Institute for Health Research (NIHR) Community Healthcare MedTech and In Vitro Diagnostics Co-operative at Oxford Health NHS Foundation Trust. BDN is an NIHR Academic Clinical Lecturer. Funders had no role in the study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health and Social Care.

Supplementary information accompanies this paper at https://doi.org/10.1186/s12916-020-01810-8. Additional file 1. Including additional tables and figures: search details, QUADAS-2 adaption, anatomical sample size details, risk of bias by article, percentage positive and negative RT-PCR results by sample for days since symptom onset and days since hospitalisation, time to undetectable virus in faecal and respiratory tract.

Availability of data and materials The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate Not applicable.

Competing interests SM, SH, AJA, SG, KG, SAT, BS, JP, PT, PW, JS, LFR, NR, NS, JS, AW, JA, GB, KG, and BN declare no conflicts of interest for the submitted work.
CH has advised Attomarker, a spin-out company of the University of Exeter about the conduct of evaluations of its tests for COVID antibodies, but received no payment for this advice and provided it as part of academic duties at the University of Exeter.