key: cord-0772621-94s289o0
authors: Struyf, Thomas; Deeks, Jonathan J; Dinnes, Jacqueline; Takwoingi, Yemisi; Davenport, Clare; Leeflang, Mariska MG; Spijker, René; Hooft, Lotty; Emperador, Devy; Domen, Julie; Horn, Sebastiaan R A; Van den Bruel, Ann
title: Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID‐19
date: 2021-02-23
journal: Cochrane Database Syst Rev
DOI: 10.1002/14651858.cd013665.pub2
sha: 3aabd7b75842e38945a42be41939cba2ee6dd5ea
doc_id: 772621
cord_uid: 94s289o0

BACKGROUND: The clinical implications of SARS‐CoV‐2 infection are highly variable. Some people with SARS‐CoV‐2 infection remain asymptomatic, whilst in others the infection can cause mild to moderate COVID‐19 and COVID‐19 pneumonia. This can lead to some people requiring intensive care support and, in some cases, to death, especially in older adults. Symptoms such as fever, cough, or loss of smell or taste, and signs such as oxygen saturation are the first and most readily available diagnostic information. Such information could be used either to rule out COVID‐19 or to select patients for further testing. This is an update of this review, the first version of which was published in July 2020.

OBJECTIVES: To assess the diagnostic accuracy of signs and symptoms to determine if a person presenting in primary care or to hospital outpatient settings, such as the emergency department or dedicated COVID‐19 clinics, has COVID‐19.

SEARCH METHODS: For this review iteration we undertook electronic searches up to 15 July 2020 in the Cochrane COVID‐19 Study Register and the University of Bern living search database. In addition, we checked repositories of COVID‐19 publications. We did not apply any language restrictions.

SELECTION CRITERIA: Studies were eligible if they included patients with clinically suspected COVID‐19, or if they recruited known cases with COVID‐19 and controls without COVID‐19.
Studies were eligible when they recruited patients presenting to primary care or hospital outpatient settings. Studies in hospitalised patients were only included if symptoms and signs were recorded on admission or at presentation. Studies including patients who contracted SARS‐CoV‐2 infection while admitted to hospital were not eligible. The minimum eligible sample size was 10 participants. All signs and symptoms were eligible for this review, including individual signs and symptoms or combinations. We accepted a range of reference standards.

DATA COLLECTION AND ANALYSIS: Pairs of review authors independently selected all studies, at both the title and abstract stage and the full‐text stage. They resolved any disagreements by discussion with a third review author. Two review authors independently extracted data and resolved disagreements by discussion with a third review author. Two review authors independently assessed risk of bias using the Quality Assessment tool for Diagnostic Accuracy Studies (QUADAS‐2) checklist. We presented sensitivity and specificity in paired forest plots, in receiver operating characteristic space and in dumbbell plots. We estimated summary parameters using a bivariate random‐effects meta‐analysis whenever five or more primary studies were available and heterogeneity across studies was deemed acceptable.

MAIN RESULTS: We identified 44 studies including 26,884 participants in total. Prevalence of COVID‐19 varied from 3% to 71% with a median of 21%. There were three studies from primary care settings (1824 participants), nine studies from outpatient testing centres (10,717 participants), 12 studies performed in hospital outpatient wards (5061 participants), seven studies in hospitalised patients (1048 participants), 10 studies in the emergency department (3173 participants), and three studies in which the setting was not specified (5061 participants).
The studies did not clearly distinguish mild from severe COVID‐19, so we present the results for all disease severities together. Fifteen studies had a high risk of bias for selection of participants because inclusion in the studies depended on the applicable testing and referral protocols, which included many of the signs and symptoms under study in this review. This may have especially influenced the sensitivity of those features used in referral protocols, such as fever and cough. Five studies included only participants with pneumonia on imaging, a highly selected population. In an additional 12 studies, we were unable to assess the risk of selection bias. This makes it very difficult to judge the validity of the diagnostic accuracy of the signs and symptoms from these included studies. The applicability of the results of this review update improved in comparison with the original review. A greater proportion of studies included participants who presented to outpatient settings, which is where the majority of clinical assessments for COVID‐19 take place. However, as in the original review, none of the studies presented any data on children separately, and only one focused specifically on older adults. We found data on 84 signs and symptoms. Results were highly variable across studies. Most had very low sensitivity and high specificity. Only cough (25 studies) and fever (7 studies) had a pooled sensitivity of at least 50%, but their specificities were moderate to low. Cough had a sensitivity of 67.4% (95% confidence interval (CI) 59.8% to 74.1%) and a specificity of 35.0% (95% CI 28.7% to 41.9%). Fever had a sensitivity of 53.8% (95% CI 35.0% to 71.7%) and a specificity of 67.4% (95% CI 53.3% to 78.9%). The pooled positive likelihood ratio of cough was only 1.04 (95% CI 0.97 to 1.11) and that of fever 1.65 (95% CI 1.41 to 1.93).
Anosmia alone (11 studies), ageusia alone (6 studies), and anosmia or ageusia (6 studies) had sensitivities below 50% but specificities over 90%. Anosmia had a pooled sensitivity of 28.0% (95% CI 17.7% to 41.3%) and a specificity of 93.4% (95% CI 88.3% to 96.4%). Ageusia had a pooled sensitivity of 24.8% (95% CI 12.4% to 43.5%) and a specificity of 91.4% (95% CI 81.3% to 96.3%). Anosmia or ageusia had a pooled sensitivity of 41.0% (95% CI 27.0% to 56.6%) and a specificity of 90.5% (95% CI 81.2% to 95.4%). The pooled positive likelihood ratios of anosmia alone and anosmia or ageusia were 4.25 (95% CI 3.17 to 5.71) and 4.31 (95% CI 3.00 to 6.18) respectively, which is just below our arbitrary definition of a 'red flag', that is, a positive likelihood ratio of at least 5. The pooled positive likelihood ratio of ageusia alone was only 2.88 (95% CI 2.02 to 4.09). Only two studies assessed combinations of different signs and symptoms, mostly combining fever and cough with other symptoms. These combinations had a specificity above 80%, but at the cost of very low sensitivity (< 30%).

AUTHORS' CONCLUSIONS: The majority of individual signs and symptoms included in this review appear to have very poor diagnostic accuracy, although this should be interpreted in the context of selection bias and heterogeneity between studies. Based on currently available data, neither the absence nor the presence of signs or symptoms is accurate enough to rule in or rule out COVID‐19. The presence of anosmia or ageusia may be useful as a red flag for COVID‐19. The presence of fever or cough, given their high sensitivities, may also be useful to identify people for further testing. Prospective studies in an unselected population presenting to primary care or hospital outpatient settings, examining combinations of signs and symptoms to evaluate the syndromic presentation of COVID‐19, are still urgently needed. Results from such studies could inform subsequent management decisions.
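The pooled positive likelihood ratios reported above are closely related to the pooled sensitivity and specificity through LR+ = sensitivity / (1 - specificity). The Python sketch below recomputes them as a simple plug-in check and compares each with the 'red flag' threshold of 5. The names `POOLED` and `positive_lr` are illustrative assumptions, not part of the review. Because the review derived its LR+ values from the bivariate model rather than from this ratio of point estimates, small differences in the second decimal place (for example 4.24 here versus the reported 4.25 for anosmia) are expected.

```python
# Plug-in positive likelihood ratio: LR+ = sensitivity / (1 - specificity).
# Point estimates are copied from the abstract; the review's reported LR+
# values come from the bivariate model, so small rounding differences occur.

POOLED = {
    # feature: (pooled sensitivity, pooled specificity, reported pooled LR+)
    "cough": (0.674, 0.350, 1.04),
    "fever": (0.538, 0.674, 1.65),
    "anosmia": (0.280, 0.934, 4.25),
    "ageusia": (0.248, 0.914, 2.88),
    "anosmia or ageusia": (0.410, 0.905, 4.31),
}

RED_FLAG_LR_POS = 5.0  # the review's (arbitrary) 'red flag' threshold


def positive_lr(sensitivity: float, specificity: float) -> float:
    """Return sensitivity / (1 - specificity) for point estimates."""
    if not 0.0 <= sensitivity <= 1.0 or not 0.0 <= specificity < 1.0:
        raise ValueError("need 0 <= sensitivity <= 1 and 0 <= specificity < 1")
    return sensitivity / (1.0 - specificity)


if __name__ == "__main__":
    for feature, (sens, spec, reported) in POOLED.items():
        lr = positive_lr(sens, spec)
        status = "red flag" if lr >= RED_FLAG_LR_POS else "below red-flag threshold"
        print(f"{feature:20s} plug-in LR+ {lr:.2f} (reported {reported:.2f}): {status}")
```

None of the features reaches the threshold of 5 on this calculation, which is consistent with the abstract's statement that anosmia and anosmia or ageusia fall just below it.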
Trusted evidence. Informed decisions. Better health.
Cochrane Database of Systematic Reviews

COVID-19 affects many organs of the body, so people with COVID-19 may have a wide spectrum of symptoms. Symptoms and signs of the illness may be important to help them, and the healthcare staff they come into contact with, know whether they have the disease. Symptoms: people with mild COVID-19 might experience cough, sore throat, high temperature, diarrhoea, headache, muscle or joint pain, fatigue, and loss or disturbance of sense of smell and taste. Signs are obtained by clinical examination. Signs of COVID-19 examined in this review include lung sounds, blood pressure, blood oxygen level and heart rate. Often, people with mild symptoms consult their doctor (general practitioner).
People with more severe symptoms might visit a hospital outpatient or emergency department. Depending on the results of a clinical examination, patients may be sent home to isolate, may receive further tests or may be hospitalised. Accurate diagnosis ensures that people take measures to avoid transmitting the disease and receive appropriate care. This is important for individuals because it reduces harm, and it saves time and resources. We wanted to know how accurately COVID-19 can be diagnosed in primary care or hospital settings based on symptoms and signs from a medical examination. We searched for studies that assessed the accuracy of symptoms and signs to diagnose COVID-19. Studies had to be conducted in primary care or hospital outpatient settings only. Studies of people in hospital were only included if symptoms and signs were recorded when they were admitted to the hospital. We found 44 relevant studies with 26,884 participants. The studies assessed 84 separate signs and symptoms, and some assessed combinations of signs and symptoms. Three studies were conducted in primary care (1824 participants), nine in specialist COVID-19 testing clinics (10,717 participants), 12 in hospital outpatient settings (5061 participants), seven in hospitalised patients (1048 participants), and 10 in the emergency department (3173 participants); in three studies the setting was not specified (5061 participants). No studies focused specifically on children, and only one focused on older adults. The studies did not clearly distinguish between mild and severe COVID-19, so we present the results for mild, moderate and severe disease together. The symptoms most frequently studied were cough and fever. In the included studies, a median of 21% of participants had COVID-19, which means that in a group of 1000 people, around 210 would have COVID-19.
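The 210-in-1000 framing can be extended to show what a single symptom would do in such a group. The sketch below is a hypothetical illustration that assumes the median prevalence of 21% and applies the pooled point estimates from the abstract. With the cough estimates, roughly 140 of the 210 people with COVID-19 would report a cough, but so would roughly 510 of the 790 people without it. The function name `natural_frequencies` is illustrative only.

```python
# Expected outcomes of a symptom "test" in a hypothetical group of n people.
# Prevalence (21%) is the review's median; accuracy values are the pooled
# point estimates from the abstract. Counts are expected values, not rounded.

def natural_frequencies(n: int, prevalence: float, sensitivity: float,
                        specificity: float) -> dict:
    """Return expected true/false positives and negatives for n people."""
    diseased = n * prevalence
    healthy = n - diseased
    return {
        "true_positive": diseased * sensitivity,           # COVID-19 and symptom present
        "false_negative": diseased * (1.0 - sensitivity),  # COVID-19, symptom absent
        "false_positive": healthy * (1.0 - specificity),   # no COVID-19, symptom present
        "true_negative": healthy * specificity,            # no COVID-19, symptom absent
    }


if __name__ == "__main__":
    for name, sens, spec in [("cough", 0.674, 0.350), ("anosmia", 0.280, 0.934)]:
        counts = natural_frequencies(1000, 0.21, sens, spec)
        summary = ", ".join(f"{k} {v:.1f}" for k, v in counts.items())
        print(f"{name}: {summary}")
```

Presenting accuracy as counts per 1000 people makes the trade-off visible: a common symptom such as cough flags most people with COVID-19 but many more people without it.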
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus and the resulting COVID-19 pandemic present important diagnostic evaluation challenges. These range from, on the one hand, understanding the value of signs and symptoms in predicting possible infection, and assessing whether existing biochemical and imaging tests can identify infection and recognise patients needing critical care, to, on the other hand, evaluating whether new diagnostic tests can allow accurate rapid and point-of-care testing. The diagnostic aims are also diverse, including identifying current infection, ruling out infection, identifying people in need of care escalation, or testing for past infection and immunity. This review is part of a suite of reviews on the diagnosis of SARS-CoV-2 infection and COVID-19 disease, and deals solely with the diagnostic accuracy of presenting clinical signs and symptoms. COVID-19 is the disease caused by infection with the SARS-CoV-2 virus. The key target conditions for this suite of reviews are current SARS-CoV-2 infection, current COVID-19, and past SARS-CoV-2 infection. For current infection, the severity of the disease is important. SARS-CoV-2 infection can be asymptomatic (no symptoms); mild or moderate (symptoms such as fever, cough, aches, lethargy but without difficulty breathing at rest); severe (symptoms with breathlessness and increased respiratory rate indicative of pneumonia and oxygen need); or critical (requiring intensive support due to severe acute respiratory syndrome (SARS) or acute respiratory distress syndrome (ARDS), shock or other organ dysfunction). People with severe or critical disease require different patient management, which makes it important to distinguish between them. Thus, there are three target conditions for current infection:
• SARS-CoV-2 infection (asymptomatic or symptomatic of any severity);
• mild or moderate COVID-19;
• severe or critical COVID-19.
In planning review updates, we will consider the potential addition of another grouping (which is a subset of the above):
• whether tests exist that identify people requiring respiratory support (SARS or ARDS) or intensive care.

Here we summarise the evidence on signs and symptoms; as a result, asymptomatic SARS-CoV-2 infection and past SARS-CoV-2 infection are out of scope for this review. Signs and symptoms are used in the initial diagnosis of suspected COVID-19, and to identify people with COVID-19 pneumonia. Symptoms are what patients experience, for example, cough or nausea. Signs are what can be evaluated by clinical assessment, for example, lung auscultation findings, blood pressure or heart rate. Key symptoms that have been associated with mild to moderate COVID-19 include: troublesome dry cough (for example, coughing more than usual over a one-hour period, or three or more coughing episodes in 24 hours), fever greater than 37.8 °C, diarrhoea, headache, breathlessness on light exertion, muscle pain, fatigue, and loss of sense of smell and taste. Red flags indicating possible severe disease or pneumonia include breathlessness at rest, loss of appetite, confusion, pain or pressure in the chest, and temperature above 38 °C.

In the context of COVID-19, the care pathway is multifaceted because it is designed both to care for the individual patient and to protect the community from further spread. Decisions about patient and isolation pathways for COVID-19 vary according to health services and settings, available resources, and stages of the epidemic. They will change over time, if and when effective treatments and vaccines are identified. The decision points between these pathways vary, but all include points at which knowledge of the accuracy of diagnostic information is needed to inform rational decision making.
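One way to see how accuracy informs these decisions is Bayes' theorem in odds form: post-test odds equal pre-test odds multiplied by the likelihood ratio. The sketch below assumes a pre-test probability of 21% (the median prevalence across the included studies, which ranged from 3% to 71%) and the pooled LR+ values reported in the abstract. Under those assumptions, a cough (LR+ 1.04) moves the probability of COVID-19 only from 21% to about 22%, whereas anosmia (LR+ 4.25) raises it to about 53%. `post_test_probability` is a hypothetical helper, not taken from the review.

```python
# Bayes' theorem in odds form: post-test odds = pre-test odds x likelihood ratio.
# Assumed inputs: 21% pre-test probability (median prevalence in the review)
# and pooled LR+ values from the abstract. Real pre-test probabilities vary
# by setting and stage of the epidemic.

def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability and a likelihood ratio to a post-test probability."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    for feature, lr_pos in [("cough", 1.04), ("fever", 1.65), ("anosmia", 4.25)]:
        p = post_test_probability(0.21, lr_pos)
        print(f"{feature}: 21% -> {p:.0%}")
```

The same function shows why the review's 'red flag' threshold matters: even an LR+ of 5 would lift a 21% pre-test probability only to about 57%, so a positive finding would typically prompt further testing rather than confirm the diagnosis.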
In this review on signs and symptoms, no prior tests are required because signs and symptoms are used in the initial diagnosis of suspected COVID-19. Patients can, however, self-assess based on their symptoms before presenting to healthcare services. This is in contrast to contact tracing, in which patients or participants are tested based on a documented contact with a SARS-CoV-2-positive person and may themselves be asymptomatic. Signs and symptoms are used as triage tests, that is, to rule out COVID-19, but also to identify patients with possible COVID-19 who may require further testing, care escalation or isolation. Other Cochrane diagnostic test accuracy (DTA) reviews in the suite are addressing the following tests:
• Chest imaging (computed tomography (CT), chest X-ray and ultrasound; Islam 2020)
• Routine laboratory testing, such as for C-reactive protein (CRP) and procalcitonin (PCT) (Stegeman 2020)
• Antibody tests (Deeks 2020a)
• Laboratory-independent point-of-care and near-patient molecular and antigen tests (Dinnes 2020)
• Molecular laboratory tests (in preparation)

It is essential to understand the accuracy of diagnostic tests, including signs and symptoms, to identify how they can best be used in different settings to develop effective diagnostic and management pathways. We are producing a suite of Cochrane 'living systematic reviews', which will summarise evidence on the clinical accuracy of different tests and diagnostic features, grouped according to present research questions and settings, in the diagnosis of SARS-CoV-2 infection and COVID-19 disease. Summary estimates of accuracy from these reviews will help inform diagnostic, screening, isolation, and patient management decisions. New tests are being developed and evidence is emerging at an unprecedented rate during the COVID-19 pandemic.
We will aim to update these reviews as often as is feasible to ensure that they provide the most up-to-date evidence about test accuracy. These reviews are being produced rapidly to provide a central resource of evidence during the COVID-19 pandemic, summarising available evidence on the accuracy of the tests and presenting characteristics.

To assess the diagnostic accuracy of signs and symptoms to determine if a person presenting in primary care or to hospital outpatient settings, such as the emergency department or dedicated COVID-19 clinics, has COVID-19.

Where data are available, we will investigate diagnostic accuracy (either by stratified analysis or meta-regression) according to:
• days since symptom onset;
• population (children; older adults);
• reference standard;
• study design; and
• setting.

In our initial review, we found 16 relevant studies with 7706 participants. The median number of participants was 134. Prevalence of the target disease varied from 5% to 38% with a median of 17%. The studies assessed 27 separate signs and symptoms, but none assessed combinations of signs and symptoms. Seven were set in hospital outpatient clinics (2172 participants), four in emergency departments (1401 participants), and none in primary care settings. No studies included children, and only one focused on older adults. All the studies confirmed COVID-19 diagnosis by the most accurate test available, which was reverse transcription polymerase chain reaction (RT-PCR). The studies did not clearly distinguish mild to moderate COVID-19 from severe to critical COVID-19, so we presented the results for all severities together. The results indicated that at least half of participants with COVID-19 had a cough, sore throat, high temperature, muscle or joint pain, fatigue, or headache. However, cough and sore throat were also common in people without COVID-19, so these symptoms alone are less helpful for diagnosing COVID-19.
High temperature, muscle or joint pain, fatigue, and headache substantially increase the likelihood of COVID-19 when they are present. Signs and symptoms for which sensitivity was reported above 50% in at least one study were the following:
• Cough: sensitivity between 43% and 71%, specificity between 14% and 54%
• Fever: sensitivity between 7% and 91%, specificity between 16% and 94%
• Sore throat: sensitivity between 5% and 71%, specificity between 55% and 80%
• Myalgia or arthralgia: sensitivity between 19% and 86%, specificity between 45% and 91%
• Fatigue: sensitivity between 10% and 57%, specificity between 60% and 94%
• Headache: sensitivity between 3% and 71%, specificity between 78% and 98%

All other signs and symptoms appeared to have very low sensitivities but high specificities, making them unsuitable for diagnosis individually. We concluded that the diagnostic accuracy, especially the sensitivity, of individual signs and symptoms is low. In addition, results were highly variable across studies, making it difficult to draw firm conclusions. In this update, we retrieved 28 more studies on signs and symptoms in suspected COVID-19 patients, allowing pooling of the data for some features and estimation of summary measures of diagnostic accuracy. Moreover, this update contains new studies on the diagnostic value of olfactory symptoms, and includes a limited number of studies on combinations of symptoms. The main weakness of the initial review was the high risk of selection bias; many studies included patients who had already been admitted to hospital or who presented to hospital settings to seek treatment. The lack of data on combinations of signs and symptoms was an important evidence gap. Consequently, there was no evidence on syndromic presentation or on the diagnostic accuracy of composite signs and symptoms. Our search did not find any articles providing data on children.
Children have been disproportionately underrepresented in the studies on diagnosing SARS-CoV-2 infection. Their absence seems related to the generally mild presentation of the disease in the paediatric population and to the even more frequent, completely asymptomatic course. However, the full scope of disease presentation in children is not known. Misclassification of children needs to be avoided, both at their presentation to the healthcare system and in the near future, when children will be asked to remain in quarantine if they present with predefined, but not yet evidence-based, symptoms, so as to limit possible harm to children's health. Another important patient group is older adults. They are most at risk of a negative outcome of SARS-CoV-2 infection, especially mortality but also the need for intensive care support. In the initial version of the review, only one study focused on adults aged 55 to 75 years. All other studies included adults of all ages and did not present results separately for the older age groups. The lack of a solid evidence base for the diagnosis of COVID-19 in older adults adds to the difficulty in diagnosing serious infections in this age group, as other serious infections such as bacterial pneumonia or urinary sepsis also tend to lead to non-specific presentations.

We included studies of all designs that produce estimates of test accuracy or provide data from which estimates can be computed. We included both single-gate (studies that recruit from a patient pathway before disease status has been ascertained; cross-sectional studies) and multi-gate (where people with and without the target condition are recruited separately) designs. When interpreting the results, we carefully considered the limitations of different study designs, using quality assessment and analysis. Studies had to have a minimum sample size of 10 participants.
Studies recruiting people presenting with a clinical suspicion of SARS-CoV-2 infection, based on a symptomatic presentation, were eligible. At least 50% of the study population had to present with COVID-19-compatible symptoms. We kept the eligibility criteria purposely broad to include all patient groups and all variations of a test at this initial stage of reviewing the evidence (that is, if the patient population was unclear, we included the study).
• All signs and symptoms, including:
  * signs, such as oxygen saturation measured by oximetry, and blood pressure;
  * symptoms, such as fever or cough.
• We included combinations of signs and symptoms, but not when they were combined with laboratory, imaging, or other types of index tests, as these will be covered in the other reviews.

To be eligible, studies had to identify at least one of:
• mild or moderate COVID-19;
• severe or critical COVID-19 (including COVID-19 pneumonia).

Asymptomatic infection with SARS-CoV-2 is out of scope for this review because, by definition, it cannot be detected based on signs and symptoms.

We anticipated that studies would use a range of reference standards. Although RT-PCR is considered the best available test, rapidly evolving knowledge about the target conditions has led to multiple reference standards, used on their own as well as in combination. We expected to encounter cases defined by:
• RT-PCR alone;
• RT-PCR, clinical expertise, and imaging (for example, CT thorax);
• repeated RT-PCR several days apart or from different samples;
• plaque reduction neutralisation test (PRNT) or enzyme-linked immunosorbent assay (ELISA) tests;
• information available at a subsequent time point;
• World Health Organization (WHO) and other case definitions (see Appendix 1).

This list is not exhaustive, and we recorded all reference standards encountered.
With a group of methodological and clinical experts, we are producing a ranking of reference standards according to their ability to correctly classify participants, using a consensus process.

The final search date for this version of the review is 15 July 2020. We conducted a single literature search to cover our suite of Cochrane COVID-19 DTA reviews (Deeks 2020b; McInnes 2020). We used three different sources for our electronic searches to 15 July 2020, which were devised with the help of an experienced Cochrane Information Specialist with DTA expertise (RS). These searches aimed to identify all articles related to COVID-19 and SARS-CoV-2 and were not restricted to those evaluating symptoms and signs. Thus, the searches used no terms that specifically focused on an index test, diagnostic accuracy or study methodology. Due to the increased volume of published and preprint articles, from 25 May 2020 onwards we used artificial intelligence text analysis to conduct an initial classification of documents as relevant or irrelevant, based on their title and abstract information. See Appendix 2. We also included searches undertaken by Cochrane to develop the Cochrane COVID-19 Study Register (covid-19.cochrane.org). These include searches of the trials registers at the US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov and the World Health Organization International Clinical Trials Registry Platform (apps.who.int/trialsearch), as well as PubMed. Search strategies were designed for maximum sensitivity, to retrieve all human studies on COVID-19, with no language limits. See Appendix 3. […] number of iterations since the end of March 2020, and we anticipate moving back to the Cochrane COVID-19 Study Register as the primary source of records for subsequent review updates.
We included Embase records within the CDC library on COVID-19 Research Articles Database (see Appendix 5 for details) and deduplicated these against the Cochrane COVID-19 Study Register. We also checked our search results against two additional repositories of COVID-19 publications. Both of these repositories allow their contents to be filtered for studies potentially relating to diagnosis, and both have agreed to provide us with updates of new diagnosis studies added. For this iteration of the review, we examined all diagnosis studies from both sources up to 15 July 2020. We did not apply any language restrictions.
Pairs of review authors independently screened studies. We resolved disagreements by discussion with a third, experienced review author for initial title and abstract screening, and through discussion between three review authors for eligibility assessments. Pairs of review authors independently performed data extraction, and we resolved disagreements by discussion between three review authors. We contacted study authors where we needed to clarify details or obtain missing information.
Pairs of review authors independently assessed risk of bias and applicability concerns using the QUADAS-2 (Quality Assessment tool for Diagnostic Accuracy Studies) checklist, which was common to the suite of reviews but tailored to each particular review (Whiting 2011; Table 1). For this review, we excluded the questions on the nature of the samples, as these were not relevant, and we added a question on who assessed the signs. We resolved disagreements by discussion between three review authors.
We presented estimated sensitivity and specificity using paired forest plots and summarised them in tables as appropriate.
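The paired forest plots display, for each study, a sensitivity and a specificity with their 95% confidence intervals, both derived from that study's 2x2 table. The Python sketch below shows this calculation under stated assumptions: the function names and the example counts are hypothetical, and it uses the Wilson score interval, which may differ from the interval method used by Review Manager 5.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


def paired_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity, each with a Wilson 95% CI, from one 2x2 table."""
    sensitivity = tp / (tp + fn)  # share of COVID-19 cases with the sign or symptom
    specificity = tn / (tn + fp)  # share of non-cases without the sign or symptom
    return {
        "sensitivity": (sensitivity, *wilson_ci(tp, tp + fn)),
        "specificity": (specificity, *wilson_ci(tn, tn + fp)),
    }


# Hypothetical 2x2 table for one study: 50 TP, 20 FP, 50 FN, 80 TN
row = paired_accuracy(tp=50, fp=20, fn=50, tn=80)
print(row["sensitivity"])  # (estimate, lower limit, upper limit)
print(row["specificity"])
```

Each returned tuple corresponds to one row of one panel in a paired forest plot.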
We estimated summary sensitivity and specificity using a bivariate random-effects meta-analysis (Macaskill 2013) whenever five or more primary studies were available and heterogeneity across studies was deemed acceptable on visual inspection of the forest plots and receiver operating characteristic (ROC) plots. We performed these analyses using data from studies with a cross-sectional design only. We created the paired forest plots in Review Manager 5 (Review Manager 2020). We considered tests to be useful for ruling out a serious infection in ambulatory care if their negative likelihood ratio (LR-) was lower than 0.20; conversely, we considered tests to be useful as 'red flags' for infection when their positive likelihood ratio (LR+) was 5.0 or higher (Jaeschke 1994; Van den Bruel 2010). We disaggregated data by study design, reporting results from cross-sectional studies separately from those of studies that used a multigate or other design assessed as prone to high risk of bias. We undertook meta-analyses in R version 3.5.1 (lme4 package; R 2020).
In the Secondary objectives, we listed the sources of heterogeneity that we planned to investigate if adequate data were available. In this version of the review, we used stratification to investigate heterogeneity, as we considered it inappropriate to combine studies. In future updates, if meta-analysis becomes possible, we will investigate heterogeneity through meta-regression. In this version of the review we stratified by study design only, as stratification by reference standard was not yet possible.
We aimed to undertake sensitivity analyses considering the impact of unpublished studies; however, this was not possible in this version of the review. We performed sensitivity analyses to investigate the impact of prospective versus retrospective data collection.
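The review fitted a bivariate random-effects model in R (lme4), which pools logit sensitivity and logit specificity jointly and models the correlation between them. As a much simpler and clearly different stand-in, the Python sketch below pools one proportion at a time (for example, sensitivity) with the univariate DerSimonian-Laird random-effects method on the logit scale. The function name, the 0.5 continuity correction and the example counts are assumptions for illustration; this is not the review's analysis and will not reproduce its estimates.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_logit_dersimonian_laird(events, totals):
    """Univariate DerSimonian-Laird random-effects pooling of proportions (logit scale).

    events[i] / totals[i] is one study's proportion, e.g. TP / (TP + FN) for sensitivity.
    Returns (pooled proportion, lower 95% limit, upper 95% limit, tau^2).
    """
    estimates, variances = [], []
    for x, n in zip(events, totals):
        if x == 0 or x == n:
            # Continuity correction keeps the logit and its variance finite
            x, n = x + 0.5, n + 1.0
        estimates.append(logit(x / n))
        variances.append(1 / x + 1 / (n - x))  # approximate variance of a logit proportion

    # Fixed-effect (inverse-variance) estimate and Cochran's Q
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))

    # Method-of-moments between-study variance, truncated at zero
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return (inv_logit(pooled), inv_logit(pooled - 1.96 * se),
            inv_logit(pooled + 1.96 * se), tau2)


# Hypothetical sensitivity data from five studies: TP and number of COVID-19 cases
print(pool_logit_dersimonian_laird([30, 45, 12, 60, 25], [50, 80, 40, 90, 60]))
```

Running the same function on TN and the number of non-cases would pool specificity separately; unlike the bivariate model, this ignores any correlation between the two.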
We aimed to publish lists of studies that we know exist but for which we have not managed to locate reports, and to request information to include in updates of these reviews. However, at the time of writing this version of the review, we are unaware of any unpublished studies. We have listed our key findings in a 'Summary of findings' table to indicate the strength of evidence for each test and finding, and to highlight important gaps in the evidence. We will undertake monthly searches of published literature and preprints and, depending on the number of new and important studies that we find, we will consider updating each review with each search if resources allow.
The first selection resulted in 7394 potentially eligible articles, including the 658 articles that we screened for our initial review. After screening on title and abstract, we excluded 7092 articles, leaving 302 full-text articles to be assessed. We included 44 articles in this version of the review, 16 of which were included in the initial review. The reasons for excluding 258 articles are listed in the flow chart (Figure 1; Moher 2009).
Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19
Two articles reported on the same cases (Chen 2020; Yang 2020) but used different control groups: Chen 2020 used a concurrent control group of pneumonia cases negative for SARS-CoV-2 on PCR testing, whereas Yang 2020 used a historic control group of patients with influenza pneumonia. For this reason, we included only the Chen 2020 results in the analyses. One study (Song 2020a) included a derivation part and a validation part for the development of a prediction rule. The two parts are identical in set-up and differ only with respect to the time of data collection: the derivation part recruited patients up to 5 February 2020 and the validation part recruited patients from 6 February 2020 onwards.
As a result, we consider this to be one study and have entered all data on signs and symptoms as such. A summary of the main study characteristics can be found in Table 2. The results of the quality assessment are summarised in Figure 2 and Figure 3. Of the 44 studies included in this review, six did not use a cross-sectional design. Four studies were case-control studies (Carignan 2020; Nobel 2020; Yang 2020; Zhao 2020); one study selected cases cross-sectionally in five hospitals but selected controls in only one hospital (Chen 2020); and one study emailed patients who had undergone testing for SARS-CoV-2 to ask about olfactory symptoms prior to the test, with a response rate of 58% in SARS-CoV-2-positive cases and 15% in negative cases (Yan 2020).
We rated patient selection as at high risk of bias in 15 out of 44 studies. In five studies (Ai 2020; Chen 2020; Cheng 2020; Liang 2020; Yang 2020) this was because a CT scan or other imaging was used to diagnose patients with pneumonia prior to inclusion in the study; RT-PCR results were then used to distinguish between COVID-19 pneumonia and pneumonia from other causes. For all studies, testing was highly dependent on the local case definition and testing criteria in effect at the time of the study, meaning that all patients included in studies had already gone through a referral or selection filter. The most extreme example of this is Liang 2020, in which patients with radiological evidence of pneumonia and a clinical presentation compatible with COVID-19 were tested for SARS-CoV-2 only after a panel discussion.
We rated all studies except four as at high risk of bias for the index tests, because there was little to no detail on how, by whom and when the signs and symptoms were measured. Table 3 describes how studies measured olfactory symptoms. Studies collected information about symptoms in different ways: interviews by telephone or in person using standardised questionnaires, online surveys, self-reporting at presentation, or systematic assessment by staff at enrolment without standardisation. Unfortunately, the standardised questionnaires themselves are rarely reported and are often newly developed by each research team. In addition, there was considerable uncertainty around the reference standard, with some studies providing little detail on the RT-PCR tests used or lacking clarity on blinding. Patient flow was unclear in 12 studies (Ahmed 2020; Mao 2020; Pisapia 2020; Tordjman 2020; Yan 2020; Yang 2020; Yombi 2020; Zayet 2020a; Zayet 2020b; Zhao 2020; Zhu 2020; Zimmerman 2020), either because the timing of recording signs and symptoms and of conducting the reference standard was unclear, because some patients received a second or third reference standard at unclear time points during hospital admission, or because participant records containing missing data were deleted.
Trusted evidence. Informed decisions. Better health.
The main characteristics of all included studies are listed in Table 2.
There were seven studies in hospital inpatients (Ai 2020; Chen 2020; Huang 2020; Xie 2020; Yang 2020; Zayet 2020a; Zhao 2020), 12 studies in hospital outpatients (Carignan 2020; Cheng 2020; Liang 2020; Mao 2020; Nobel 2020; Peng 2020; Song 2020a; Sun 2020; Wei 2020; Yan 2020; Zavascki 2020; Zayet 2020b), 10 studies in emergency departments (EDs) (Feng 2020; Chua 2020; O'Reilly 2020; Peyrony 2020; Pisapia 2020; Shah 2020; Tolia 2020; Tordjman 2020; Wee 2020; Zhu 2020), three studies in primary care settings (Brotons 2020; Just 2020; Tudrej 2020), and nine studies in other outpatient settings such as drive-through testing sites (Ahmed 2020; Challener 2020; Clemency 2020; Gilbert 2020; Haehner 2020; Haehner 2020; Lee 2020; Salmon 2020; Trubiano 2020). Three studies did not specify the setting (Rentsch 2020; Yombi 2020; Zimmerman 2020). Nine studies assessed the accuracy of signs and symptoms for the diagnosis of COVID-19 pneumonia (Ai 2020; Chen 2020; Cheng 2020; Feng 2020; Liang 2020; Tordjman 2020; Xie 2020; Yang 2020; Zhao 2020); the remaining studies had SARS-CoV-2 infection as the target condition. However, the distinction between these two target conditions was not always clear, and some degree of overlap should be assumed. All studies except one (Brotons 2020) used RT-PCR testing as the reference standard, with some variation in the samples used. Brotons 2020 used as its reference standard positive serology for SARS-CoV-2 (IgM and/or IgG) at the time of presentation together with the presence of symptoms and signs in the previous month. In total, 26,884 participants were included across all studies; the median number of participants per study was 345. Prevalence varied from 3% to 71%, with a median of 21% (cross-sectional studies). We found data on 84 signs and symptoms, which fall into six categories: upper respiratory, lower respiratory, systemic, gastrointestinal, cardiovascular, and olfactory signs and symptoms.
Results for the single-gate (cross-sectional) studies are presented in forest plots.
Only two studies (Gilbert 2020; Yombi 2020) assessed combinations of different signs and symptoms. Gilbert 2020 investigated six combinations of two to four symptoms and signs each, while Yombi 2020 investigated three combinations of two to three symptoms each. Most of the combinations included fever and cough, the symptoms on which both studies had preselected their participants. These combinations led to specificities above 80%, but at the cost of low sensitivities (< 30%).
Positivity rates of symptoms and signs (the proportion of the study population in whom the symptom or sign is present) depend on prevalence and population characteristics, especially pre-selection. As a result, positivity rates were highly variable. In studies with a prevalence below 5%, suggesting that little pre-selection had taken place, positivity rates were between 9% and 41% for fever (11.7% average), between 45% and 70% for cough (68% average), and between 2.5% and 2.6% for anosmia (2.5% average); they were 2.8% for ageusia (1 study) and 4.3% for anosmia or ageusia (1 study).
Signs and symptoms for which sensitivity was reported above 50% in at least one cross-sectional study are summarised below.
We were able to conduct meta-analyses for 14 signs or symptoms (cough, fever, anosmia, ageusia, anosmia or ageusia, sore throat, myalgia, fatigue, headache, dyspnoea, diarrhoea, sputum production, nausea or vomiting, chest tightness) for which at least five studies were available and heterogeneity was clinically acceptable, judged by the scatter of studies on visual inspection of the forest plots. The analyses were restricted to cross-sectional studies.
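The trade-off reported for composite tests follows from how they are scored: a combination is positive only when every component sign or symptom is present. The Python sketch below makes this concrete on a small, entirely hypothetical cohort of ten patients (the data and the function names are invented for illustration). It also computes the positivity rate, the proportion of the study population in whom the test is positive.

```python
def two_by_two(test, disease):
    """Count TP, FP, FN, TN for a binary index test against a binary reference standard."""
    tp = sum(1 for t, d in zip(test, disease) if t and d)
    fp = sum(1 for t, d in zip(test, disease) if t and not d)
    fn = sum(1 for t, d in zip(test, disease) if not t and d)
    tn = sum(1 for t, d in zip(test, disease) if not t and not d)
    return tp, fp, fn, tn


def summarise(test, disease):
    tp, fp, fn, tn = two_by_two(test, disease)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positivity_rate": (tp + fp) / (tp + fp + fn + tn),
    }


def all_present(*symptoms):
    """Composite test: positive only if every listed sign or symptom is present."""
    return [all(values) for values in zip(*symptoms)]


# Hypothetical cohort of 10 patients (1 = present / reference standard positive)
covid = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
fever = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]
cough = [1, 1, 0, 0, 1, 1, 0, 1, 0, 0]

print(summarise(fever, covid))                      # fever alone
print(summarise(all_present(fever, cough), covid))  # fever AND cough
```

In this toy example, requiring cough as well as fever raises specificity from 0.6 to 0.8 but halves sensitivity from 0.8 to 0.4, and the positivity rate drops from 0.6 to 0.3.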
The ranges and summary estimates of the sensitivity and specificity of the 14 index tests are listed below. Additional summary point statistics are listed in additional Table 4.
• Sensitivity ranged from 16% to 89%; specificity ranged from 11% to 79%.
Cough and fever (see sensitivity analyses) were the only index tests with a pooled sensitivity above 50%, but their pooled specificity was only 35.5% and 67.4%, respectively (Figure 20; Figure 15). Pooled specificity was above 90% for diarrhoea, nausea or vomiting, chest tightness, anosmia, ageusia, and anosmia or ageusia (Figure 16; Figure 19). However, their pooled sensitivity was very low (maximum 11.6% for diarrhoea), except for anosmia (28.0%) and anosmia or ageusia (41.0%). The only tests with a pooled diagnostic odds ratio (DOR) above 5 were anosmia, either as a single test or in combination with ageusia (anosmia or ageusia). Yet their pooled positive likelihood ratios (LR+) were below our predefined cut-off of 5 for a useful red flag (4.25 (95% CI 3.17 to 5.71) and 4.31 (95% CI 3.00 to 6.18), respectively). The pooled negative likelihood ratios (LR-) were too high for any of the reported tests to be useful for ruling out COVID-19. In other words, the absence of any of the above-mentioned signs or symptoms does not necessarily imply the absence of COVID-19.
• Rhinorrhoea (5 studies, 2252 participants): sensitivity from 4% to 62%, specificity from 37% to 93%
• Chills (6 studies, 4151 participants): sensitivity from 4% to 80%, specificity from 36% to 93%
• Myalgia or arthralgia (5 studies, 556 participants): sensitivity from 19% to 86%, specificity from 35% to 91%
• Anosmia or dysgeusia (2 studies, 457 participants): sensitivity from 9% to 74%, specificity from 78% to 97%
In sensitivity analyses, we excluded studies that did not use a prospective design (20 out of 32 cross-sectional studies excluded).
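The likelihood ratios and the diagnostic odds ratio (DOR) follow directly from sensitivity and specificity: LR+ = sensitivity / (1 - specificity), LR- = (1 - sensitivity) / specificity, and DOR = LR+ / LR-. The Python sketch below computes them from point estimates and applies the review's predefined thresholds (LR+ of 5.0 or higher for a red flag, LR- below 0.20 for ruling out). The function names are hypothetical, and the sketch works on point estimates only, whereas the review's pooled ratios and confidence intervals come from the bivariate model. As input it uses the pooled estimates for fever in prospective studies reported in this review (sensitivity 53.8%, specificity 67.4%).

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) for a sensitivity/specificity pair strictly between 0 and 1."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg, lr_pos / lr_neg


def usefulness(lr_pos: float, lr_neg: float,
               red_flag: float = 5.0, rule_out: float = 0.20) -> list[str]:
    """Apply the thresholds: LR+ >= 5.0 marks a red flag, LR- < 0.20 supports ruling out."""
    labels = []
    if lr_pos >= red_flag:
        labels.append("red flag")
    if lr_neg < rule_out:
        labels.append("rule out")
    return labels


# Pooled fever estimates from prospective studies
lr_pos, lr_neg, dor = likelihood_ratios(0.538, 0.674)
print(round(lr_pos, 2), round(lr_neg, 2), round(dor, 2), usefulness(lr_pos, lr_neg))
```

This prints `1.65 0.69 2.41 []`: the LR+ matches the value the review reports for fever, and neither threshold is met.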
The results show that the pooled diagnostic accuracy estimates were not substantially different from the overall results (Table 4). In these sensitivity analyses, the scatter of studies on visual inspection of the forest plots appeared to decrease for fever, and we therefore added a meta-analysis for fever using prospective studies only. The pooled sensitivity and specificity of fever in prospective studies were 53.8% and 67.4%, respectively (Figure 15). This is the highest observed combination of sensitivity and specificity for a symptom or sign, but the LR+ is still only 1.65 (95% CI 1.41 to 1.93).
To further illustrate a test's ability to rule in or rule out COVID-19, we constructed dumbbell plots showing pre- and post-test probabilities for each olfactory symptom, fever and cough in each cross-sectional study (Figure 28; Figure 29; Figure 30). For each test, we plotted the pre-test probability, which is the prevalence of COVID-19 in the study (blue dot). The probability of having COVID-19 after testing (post-test probability) changes in four studies (Brotons 2020; Leal 2020; Tudrej 2020; Zayet 2020b), whereas in the other seven studies there is little difference between pre- and post-test probability (Chua 2020; Haehner 2020; Just 2020; Peyrony 2020; Salmon 2020; Tordjman 2020; Trubiano 2020).
The majority of individual signs and symptoms included in this review appear to have very poor diagnostic accuracy, although this should be interpreted in the context of selection bias and heterogeneity between studies. Based on currently available data, neither the absence nor the presence of a single sign or symptom is accurate enough to rule in or rule out COVID-19. However, some combinations of signs and symptoms may be useful as a tool to triage patients for further testing.
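Each dumbbell connects a pre-test probability to a post-test probability. For a positive finding, the standard conversion goes through odds: post-test odds = pre-test odds × LR+. The Python sketch below is a minimal, hypothetical helper (not the code used to draw the review's figures); it applies the LR+ of 1.65 reported for fever in prospective studies to a pre-test probability of 21%, the median prevalence in the included studies.

```python
def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds (Bayes' theorem)."""
    if not 0 < pre_test < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Median prevalence of 21% combined with the LR+ of 1.65 reported for fever
print(round(post_test_probability(0.21, 1.65), 3))
```

This prints `0.305`: a positive fever finding moves the probability from 21% to roughly 30%, a modest shift consistent with a small LR+.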
For example, combining the tests with the highest positive likelihood ratios in a hypothetical cohort with a disease prevalence (pre-test probability) of 2%, the presence of either anosmia or ageusia would increase the post-test probability of COVID-19 to 8%. The presence of fever together with myalgia and anosmia would increase the post-test probability to 17.8%. We did not identify a useful combination of signs or symptoms that can safely rule out COVID-19. For example, in the same hypothetical cohort with 2% disease prevalence, the absence of fever and anosmia would lower the probability of COVID-19 only to 1%. These results should be interpreted with caution because in reality these tests are correlated, making it highly likely that they would result in smaller changes in probability if they were tested in actual studies.
The seemingly better sensitivity (and slightly lower specificity) of fever compared with other index tests is unsurprising, considering that fever was a key feature of COVID-19 used to select patients for further testing in the included studies. As a result, most participants in these studies, both cases and non-cases, would have had fever. The same applies to olfactory symptoms: only two studies did not select in any way for the presence of olfactory symptoms (Chua 2020; Peyrony 2020), whereas Leal 2020 selected its participants on the presence of fever, cough, sore throat, coryza or anosmia. In the studies with no prior selection, less than 10% of the study population presented with anosmia (2.5% in Chua 2020, 9.5% in Peyrony 2020), whereas the study with prior selection reported that 41% had anosmia. Without selection, sensitivity was low and specificity high (13% to 14% sensitivity and 98% specificity); with prior selection, sensitivity was higher and specificity lower (56% sensitivity and 70% specificity).
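The hypothetical-cohort figures in the paragraph above come from applying likelihood ratios to a pre-test probability. The Python sketch below multiplies the pre-test odds by one or more likelihood ratios; chaining several ratios assumes the findings are conditionally independent given disease status, which, as cautioned above, is unlikely to hold because the signs and symptoms are correlated, so chained results tend to overstate the change in probability. The function name is hypothetical. With a 2% pre-test probability and the pooled LR+ of 4.31 for 'anosmia or ageusia', it returns about 8%, in line with the figure quoted above.

```python
import math


def chained_post_test_probability(pre_test: float, likelihood_ratios: list[float]) -> float:
    """Apply several likelihood ratios in sequence, ASSUMING conditional independence."""
    if not 0 < pre_test < 1:
        raise ValueError("pre-test probability must lie strictly between 0 and 1")
    # Odds form of Bayes' theorem: multiply pre-test odds by every likelihood ratio
    odds = pre_test / (1 - pre_test) * math.prod(likelihood_ratios)
    return odds / (1 + odds)


# 2% pre-test probability; positive 'anosmia or ageusia' (pooled LR+ 4.31)
print(round(chained_post_test_probability(0.02, [4.31]), 3))
```

This prints `0.081`. For an absent finding, the corresponding LR- would be passed instead of the LR+.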
Selection bias is present when participants are included and excluded selectively and non-randomly, so that the resulting association between exposure and outcome (here, the accuracy of the test) differs between the selected study population and the eligible study population; it has been shown that this may decrease estimates of diagnostic accuracy (Rutjes 2006). For the diagnosis of COVID-19, rapidly and constantly changing, widely variable testing criteria have influenced who was referred for testing and who was not. Including only a fraction of eligible patients in a study can give a biased estimate of the real accuracy of the index test when measured against the reference standard and true disease status. Griffith 2020 reported on the problematic presence of collider stratification bias in the published studies on COVID-19. Appropriate sampling strategies need to be applied to avoid spurious associations, in our case biased accuracy estimates of signs and symptoms for the diagnosis of COVID-19. Selecting participants on the presence of specific pre-set symptoms, such as fever and cough, leads to biased associations between these symptoms and disease, and to sensitivity and specificity estimates that differ from their true values. The example of collider bias for cough is illustrated in Figure 31. Grouping studies by the diagnostic criteria used for selection might clarify this issue, but studies did not describe these criteria clearly, referring only in general terms to the guidelines applicable at the time. Another form of selection bias is spectrum bias, where the patients included in the studies do not reflect the patient spectrum to which the index test will be applied. The inclusion of hospitalised patients can lead to such bias when, in these patients, the distribution of signs and symptoms differs and assessment with the reference standard is differential.
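The direction of this bias can be illustrated with expected counts. Assume, hypothetically, that everyone with a given symptom enters the study but only a fraction f of people without it do, and that among people without the symptom this fraction does not depend on infection status. Prevalence then cancels out: with true sensitivity s and true specificity c, the observed sensitivity is s / (s + (1 - s)f) and the observed specificity is cf / ((1 - c) + cf). The Python sketch below implements these two expressions; it is a simplified illustration of symptom-based selection, not a model of Figure 31 or of any included study, and all input values are hypothetical.

```python
def observed_accuracy(true_sens: float, true_spec: float, f: float) -> tuple[float, float]:
    """Expected sensitivity and specificity among selected participants.

    Hypothetical selection rule: everyone with the symptom is selected; a fraction f
    (0 < f <= 1) of people without the symptom is selected, regardless of infection status.
    """
    if not 0 < f <= 1:
        raise ValueError("f must lie in (0, 1]")
    obs_sens = true_sens / (true_sens + (1 - true_sens) * f)
    obs_spec = true_spec * f / ((1 - true_spec) + true_spec * f)
    return obs_sens, obs_spec


# Hypothetical symptom with low sensitivity and high specificity in an unselected population
for f in (1.0, 0.5, 0.1):
    sens, spec = observed_accuracy(0.14, 0.98, f)
    print(f"f={f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

As f falls, apparent sensitivity rises (0.14, 0.25, 0.62) and apparent specificity falls (0.98, 0.96, 0.83), the same direction as the contrast between unselected and pre-selected studies of anosmia described in this review.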
In addition, the distribution and severity of alternative diagnoses may differ between hospitalised populations and patients presenting to ambulatory care settings.
Strengths of our review are the systematic and broad search performed to include all possible studies, including those not yet peer reviewed, to gather the largest number of studies available at this point. Excluding case-only studies, which make up the largest share of published cohorts of patients with COVID-19, limits the available data; however, it improves the quality of the evidence and makes it possible to present both sensitivity and specificity (case-only studies cannot provide both accuracy measures). Because this is a living systematic review, this update offered the possibility of pooling estimates of diagnostic accuracy, which was not yet possible in our first review. Future updates will further increase the possibilities for analysing the data in more detail and for focusing the analyses on cross-sectional data gathered prospectively. The largest weakness of the review is the high risk of selection bias, as discussed above, with many studies including patients who had already been admitted to hospital or who presented to hospital settings seeking treatment. The lack of data on combinations of signs and symptoms is an important evidence gap. Only two studies presented data on such combinations, and the few composite signs and symptoms presented in those studies had little added diagnostic value compared with single tests. Combinations of tests increased specificity, but at a large cost in sensitivity, because all signs and symptoms in the composite test had to be present for a positive result. At this point, it is hard to assess the diagnostic value of combinations of signs and symptoms, as the existing evidence is too scarce. We need to assess multiple variables for their possible confounding effect on the summary estimates.
Possible confounders include the presence of other respiratory pathogens (seasonality), the phase of the epidemic, exposure in high- versus low-prevalence settings, high or low exposure risk, comorbidity of the participants, and time since infection. Seasonality may influence specificity: alternative diagnoses such as influenza or other respiratory viruses are more prevalent in winter, so more non-COVID-19 patients display symptoms such as cough or fever, which decreases specificity. In this version of the review, all studies were conducted in winter or early spring, suggesting that this effect may still have been at play. However, social distancing policies have shortened this year's influenza season in several countries (who.int/influenza/surveillance_monitoring/updates), which may have led to higher specificity for signs and symptoms than we may expect in the next influenza season. In future updates of the review, we will explore seasonality effects if data allow. As for time since infection, because the moment of infection is usually unrecognised and cannot be measured, time since onset of symptoms can be used as a proxy. Studies reporting the 2x2 table stratified by time since onset of disease would be informative and might help increase the accuracy of signs and symptoms and their potential to distinguish COVID-19 from other diagnoses. The high risk of selection bias, with many studies including patients who had already been admitted to hospital or who presented to hospital settings seeking treatment, makes our findings less applicable to people presenting in primary care, who on average have a shorter illness duration, less severe symptoms and a lower probability of the target condition. Our search did not find any articles providing data on children. Children have been disproportionately underrepresented in studies on diagnosing SARS-CoV-2 infection.
Their absence seems related to the generally mild presentation of the disease in the paediatric population and, even more frequently, to a completely asymptomatic course. The full scope of disease presentation in children is, however, not known. It is important to identify signs and symptoms that can be used to assess children with suspected SARS-CoV-2 infection clinically, especially because non-specific presentations and fever without a source are already common in this age group. Children are a heterogeneous group, so separate data for neonates, young infants, toddlers, school-aged children and adolescents would be valuable. Misclassification of children needs to be avoided, both at their presentation to the healthcare system and in the short term, when children are asked to remain in quarantine if they present with predefined but not yet evidence-based symptoms, in order to limit possible harm to children's health. Another important patient group is older adults. They are at greatest risk of adverse outcomes of SARS-CoV-2 infection, especially death but also the need for intensive care support. In this version of the review, only one study focused on adults aged 55 to 75 years. All other studies included adults of all ages and did not present results separately for the older age groups. The lack of a solid evidence base for the diagnosis of COVID-19 in older adults adds to the difficulty of diagnosing serious infections in this age group, as other serious infections such as bacterial pneumonia or urinary sepsis also tend to lead to non-specific presentations. Studies that focus specifically on older adults or children may also enable us to estimate the diagnostic accuracy of signs and symptoms within these age groups. Given the distinct biological characteristics of children, younger adults and older adults, accuracy estimates are likely to differ between age groups.
The current presentation of overall pooled estimates may therefore prove too simplistic. Until results of further studies become available, broad investigation of people with suspected SARS-CoV-2 infection remains necessary. Neither the absence nor the presence of individual signs or symptoms is accurate enough to rule in or rule out disease. Within the context of the selection bias affecting all the studies in this review, the presence of fever, cough, or 'anosmia or ageusia' may be useful to identify people for further testing for COVID-19.
Our review update still reflects the need for improved study methodology and reporting in COVID-19 diagnostic accuracy research.
• Appropriate patient sampling strategies; a prospective cross-sectional design; investigating the presence or absence of clinical signs and symptoms in anyone with suspected COVID-19.
• Improved reporting, with studies describing the assessment of signs and symptoms (providing clearer definitions), and clear reporting of reference standards. Studies should report more clearly how signs and symptoms were defined, how they were measured, by whom and when. The measurement of key symptoms such as anosmia and ageusia could benefit from standardisation, including the severity and nature of the loss of smell or taste. Yet such standardisation should not be overly complicated, as signs and symptoms will typically be used by frontline clinicians, who will incorporate them into a more holistic assessment of the patient that covers more than just COVID-19.
• Inclusion of a broader spectrum of patients, with studies in the primary healthcare setting to properly evaluate the diagnostic accuracy of signs and symptoms in this setting; inclusion of studies on patients with the aim of screening for infection (loosening up quarantine measurements may lead to an increased need for this); data on specific patient groups with comorbidities at higher risk of complications or severe disease and higher impact of missing diagnosis of SARS-CoV-2 infection at an early stage; addition of the paediatric population. • Prospective studies in an unselected population presenting to primary care or hospital outpatient settings, examining combinations of signs and symptoms to evaluate the syndromic presentation of COVID-19, are needed. Results from such studies could inform subsequent management decisions such as selfisolation or selecting patients for further diagnostic testing. • We would like to recommend that authors adhere to the STARD guidelines when reporting new studies on this topic (Bossuyt 2015). Members of the Cochrane COVID-19 Diagnostic Test Accuracy Review Group include: Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Unclear risk Low concern Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Unclear Were all patients included in the analysis? Yes Ahmed 2020 (Continued) Are there concerns that the included patients and setting do not match the review question? 
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? No Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Ai 2020 (Continued) Purpose: diagnosis of SARS-CoV-2 infection (mild COVID-19 disease); to measure the seroprevalence of antibodies against SARS- Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19 ( CoV-2 infection in a community sample of asymptomatic and symptomatic patients. Recruitment: patients with mild or moderate COVID-19 symptoms who had a face-to-face or phone consultation with their GP between 2 March and 24 April 2020 Sample size: n = 634 (244 cases) Inclusion criteria: all patients aged ≥ 1 year consulting the primary care physician either face-to-face or by phone with mild or moderate symptoms (without a confirmed diagnosis) during the COV-ID-19 pandemic from 2 March-24 April 2020 Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? 
Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? High risk Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? No Are there concerns that the index test, its conduct, or interpretation differ from the review question? Library Trusted evidence. Informed decisions. Better health. Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern Was there an appropriate interval between index test and reference standard? No Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Carignan 2020 (Continued) Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Unclear High risk Low concern DOMAIN 3: Reference Standard Challener 2020 (Continued) Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19 ( Is the reference standards likely to correctly classify the target condition? 
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Challener 2020 (Continued)
If a threshold was used, was it pre-specified? No
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Chen 2020 (Continued)
Cochrane Database of Systematic Reviews
Are there concerns that the included patients and setting do not match the review question?
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? No
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? No
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? No
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Haehner 2020 (Continued)
Low concern
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Haehner 2020 (Continued)
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
Huang 2020 (Continued)
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Huang 2020 (Continued)
Did the study avoid inappropriate inclusions? Yes
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Just 2020 (Continued)
Recruitment: residents of the municipality aged ≥ 12 years with suspected COVID-19 symptoms were encouraged to contact the dedicated platform via the website or by phone. They were invited to complete an initial screening questionnaire.
Sample size: n = 1583 (444 cases; only the PCR-positive patients)
Inclusion criteria: patients meeting the suspected COVID-19 case definition (having at least 2 of the following symptoms: fever, cough, sore throat, coryza, or change in/loss of smell (anosmia); or 1 of these symptoms plus at least 2 other symptoms consistent with COVID-19)
Are there concerns that the included patients and setting do not match the review question?
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? No
Were all patients included in the analysis? Yes
Leal 2020 (Continued)
Are there concerns that the included patients and setting do not match the review question?
Were the index test results interpreted without knowledge of the results of the reference standard? No
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? No
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Are there concerns that the included patients and setting do not match the review question?
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
Liang 2020 (Continued)
If a threshold was used, was it pre-specified?
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard?
Were all patients included in the analysis?
Could the patient flow have introduced bias?
Liang 2020 (Continued)
Purpose: diagnosis of SARS-CoV-2 infection (mild COVID-19 disease); to ascertain the effectiveness of the screening strategy and provide insight for early diagnosis of COVID-19
Sample size: n = 1004 (cases = 188)
Inclusion criteria: all patients visiting the fever clinics within the study period. Patients with fever (body temperature > 37.5 °C), or patients with pulmonary symptoms and a history of epidemiological exposure, were requested to visit the fever clinics. All patients visiting the fever clinics during the study period were included.
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Purpose: assess GI symptoms in COVID-19 and their association with short-term outcomes
Design: diagnostic case-control, retrospective study
Recruitment: adults who underwent nasopharyngeal swab testing for SARS-CoV-2 in outpatient settings (clinics or the ED) of New York-Presbyterian-Columbia or the medical centre's affiliates in New York
Inclusion criteria: adults ≥ 18 years of age who underwent nasopharyngeal swab testing for SARS-CoV-2. Indications for testing during this period were respiratory symptoms (cough, fever, shortness of breath) with intent to hospitalise, or the same symptoms in essential personnel.
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? No
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Nobel 2020 (Continued)
Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? No
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Peng 2020 (Continued)
Study characteristics
Purpose: diagnosis of SARS-CoV-2 infection (mild COVID-19 disease); to assess the utility of clinical parameters, physician clinical judgment, and lung ultrasonography to accurately identify SARS-CoV-2-infected patients at ED presentation
Design: prospective cohort study
Recruitment: cohort of all adult (≥ 18 years) patients with suspected COVID-19 who were tested for SARS-CoV-2, prospectively enrolled at a university ED (not every patient was tested for SARS-CoV-2: testing was left to the clinician's discretion)
Sample size: n = 391 (225 cases)
Inclusion criteria: no predefined inclusion criteria.
Testing was mostly performed in patients who had severe symptoms such as dyspnoea, reported shortness of breath, presented with comorbidities, or were older than 70 years. Some patients without COVID-19 symptoms were also tested when they needed admission to hospital.
Exclusion criteria: patients who attended the ED more than once (only the last visit was included). There were no other exclusion criteria.
Patient characteristics and setting
Facility cases: all patients who tested positive for SARS-CoV-2 by RT-PCR
Are there concerns that the included patients and setting do not match the review question?
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? No
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard?
Were all patients included in the analysis? Yes
Peyrony 2020 (Continued)
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard?
Did all patients receive the same reference standard? No
Pisapia 2020 (Continued)
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Yes
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified?
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? No
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Yan 2020 (Continued)
Study characteristics
Purpose: to identify differences in CT imaging and clinical features between COVID-19 and influenza pneumonia in the early stage, and to identify the most valuable features in the differential diagnosis
Are there concerns that the included patients and setting do not match the review question? Low concern
DOMAIN 2: Index Test (All tests)
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Zayet 2020b (Continued)
Are there concerns that the included patients and setting do not match the review question?
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? No
Zhao 2020 (Continued)
Are there concerns that the index test, its conduct, or interpretation differ from the review question?
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Unclear
Did all patients receive the same reference standard? Yes
Were all patients included in the analysis? Yes
Zhao 2020 (Continued)
Are there concerns that the included patients and setting do not match the review question?
Zhu 2020 (Continued)
Were the index test results interpreted without knowledge of the results of the reference standard?
If a threshold was used, was it pre-specified? No
Could the conduct or interpretation of the index test have introduced bias?
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
Was there an appropriate interval between index test and reference standard? Yes
Did all patients receive the same reference standard?
Were all patients included in the analysis? Yes
Zhu 2020 (Continued)
Purpose: diagnosis of SARS-CoV-2 infection (mild COVID-19 disease); to develop a data-driven set of clinical indicators for COVID-19 that would help to identify outpatient symptoms and those who would most benefit from limited testing availability
Patient characteristics and setting
Facility cases: positive SARS-CoV-2 RT-PCR test
Facility controls: all SARS-CoV-2 RT-PCR results were negative (minimum 2 negative tests in high-risk patients, minimum 1 negative test in low-risk patients)
Country: Singapore
Dates
Symptoms and severity: 252 (33.2%) symptoms > 5 days at presentation
Cough
Symptoms and severity: cases: 88.1% mild, 11.5% severe, 0.5% critical; controls: 90.3% mild, 9
Demographics: mean age: cases: 56.7 years, controls: 61.3 years. Gender: % female cases: 58.6%
SARS-CoV-2 infection
• RS: PCR for SARS-CoV-2 (nasopharyngeal swabs, sputum, bronchial aspirates or bronchoalveolar lavage fluids)
Sensitivity analysis: cross-sectional studies
DOR: diagnostic odds ratio; LR+: positive likelihood ratio; LR-: negative likelihood ratio; NA: not applicable, number of studies too small to perform meta-analysis
Severe pneumonia
Adolescent or adult: fever or suspected respiratory infection, plus one of the following: respiratory rate higher than 30 breaths/minute; severe respiratory distress; or oxygen saturation (SpO2) 93% or less on room air.
Child with cough or difficulty in breathing, plus at least one of the following: central cyanosis or SpO2 less than 90%; severe respiratory distress (for example, grunting
Other signs of pneumonia may be present: chest indrawing, fast breathing (in breaths/minute): aged under 2 months: 60 or higher; aged 2 to 11 months: 50 or higher
X-ray, computed tomography (CT) scan, or lung ultrasound): bilateral opacities, not fully explained by volume overload, lobar or lung collapse
Origin of pulmonary infiltrates: respiratory failure not fully explained by cardiac failure or fluid overload.
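The footnote abbreviations DOR, LR+ and LR- are all derived from a single study's 2×2 table of a sign or symptom against the reference standard. The Python sketch below illustrates the standard single-study formulas, assuming the four cell counts are available. The class and function names are hypothetical, the counts in the usage lines are invented, and the 0.5 correction applied when any cell is zero is a common convention assumed here, not a method stated in this section. It does not reproduce the pooled estimates across studies.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container: one study's cross-tabulation of a sign or symptom
    (index test) against the reference standard (for example RT-PCR)."""
    tp: int  # symptom present, reference standard positive
    fp: int  # symptom present, reference standard negative
    fn: int  # symptom absent, reference standard positive
    tn: int  # symptom absent, reference standard negative


def accuracy_measures(t: TwoByTwo, correction: float = 0.5) -> dict:
    """Sensitivity, specificity, LR+, LR- and DOR (with a Wald 95% CI) for one study."""
    tp, fp, fn, tn = t.tp, t.fp, t.fn, t.tn
    if 0 in (tp, fp, fn, tn):
        # Assumption: add a continuity correction to every cell so that the
        # ratios and log(DOR) stay finite when a cell is empty.
        tp, fp, fn, tn = (x + correction for x in (tp, fp, fn, tn))

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    lr_pos = sensitivity / (1 - specificity)  # LR+
    lr_neg = (1 - sensitivity) / specificity  # LR-
    dor = (tp * tn) / (fp * fn)               # DOR, algebraically equal to LR+ / LR-

    # Approximate 95% confidence interval computed on the log scale
    se_log_dor = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    ci = (math.exp(math.log(dor) - 1.96 * se_log_dor),
          math.exp(math.log(dor) + 1.96 * se_log_dor))

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": dor,
        "dor_95ci": ci,
    }


if __name__ == "__main__":
    # Invented counts, for illustration only
    measures = accuracy_measures(TwoByTwo(tp=80, fp=100, fn=20, tn=300))
    for name, value in measures.items():
        print(name, value)
```

The DOR is computed directly from the cross-product of the cells rather than as LR+/LR-, which avoids compounding rounding error; the two agree algebraically.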
Need objective assessment (for example, echocardiography) to exclude hydrostatic cause of infiltrates/oedema if no risk factor present
Oxygenation impairment in adults:
• mild ARDS: 200 mmHg < PaO2/FiO2 ≤ 300 mmHg, where PaO2/FiO2 is the ratio of arterial oxygen partial pressure to fractional inspired oxygen (with positive end-expiratory pressure (PEEP) or continuous positive airway pressure (CPAP) of 5 cmH2O or more)
• moderate ARDS: 100 mmHg < PaO2/FiO2 ≤ 200 mmHg (with PEEP ≥ 5 cmH2O, or non-ventilated)
• severe ARDS: PaO2/FiO2 ≤ 100 mmHg (with PEEP ≥ 5 cmH2O, or non-ventilated)
• when PaO2 is not available, SpO2/FiO2 ≤ 315 suggests ARDS
Use a PaO2-based metric when available. If PaO2 is not available, wean FiO2 to maintain SpO2 ≤ 97% to calculate OSI or SpO2/FiO2 ratio:
• bilevel (non-invasive ventilation or CPAP) ≥ 5 cmH2O via full-face mask: PaO2/FiO2 ≤ 300 mmHg or SpO2/FiO2 ≤ 264
Embase: (nCoV or 2019-nCoV or ((new or novel or wuhan) adj3 coronavirus) or covid19 or covid-19 or SARS-CoV-2)
With the kind support of the Public Health & Primary Care Library PHC (www.unibe.ch/university/services/university_library/faculty_libraries/medicine/public_health_amp_primary_care_library_phc/index_eng.html), and following guidance of the Medical Library Association
Wuhan coronavirus
Embase: ncov OR (wuhan AND corona) OR COVID
none known
René Spijker: the Dutch Cochrane Centre (DCC) has received grants for performing commissioned systematic reviews
FIND is a global not-for-profit product development partnership and WHO Diagnostic Collaboration Centre. It is FIND's role to accelerate access to high-quality diagnostic tools for low-resource settings, and this is achieved by supporting both R&D and access activities for a wide range of diseases, including COVID-19. FIND has several clinical research projects to evaluate multiple new diagnostic tests against published Target Product Profiles that have been defined through consensus processes.
These studies are for diagnostic products developed by private sector companies who provide access to know-how
Julie Domen: none known
for International Development
Outpatient Clinics, Hospital [statistics & numerical data]; Pandemics; Physical Examination; *Primary Health Care

If a threshold was used, was it pre-specified? No
High risk
Are there concerns that the included patients and setting do not match the review question?
Were the index test results interpreted without knowledge of the results of the reference standard? Unclear
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests? Unclear

Were the index test results interpreted without knowledge of the results of the reference standard? Yes
If a threshold was used, was it pre-specified? Unclear
Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern
DOMAIN 3: Reference Standard
Is the reference standard likely to correctly classify the target condition?
Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias?
Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern
DOMAIN 4: Flow and Timing

Test TST-135. Hyposmia (non-cross-sectional study)

Patients (setting, intended use of index test, presentation, prior testing): primary care, hospital outpatient settings including emergency departments; inpatients presenting with suspected COVID-19; no prior testing; signs and symptoms often used for triage or referral

The focus will be on the diagnosis of COVID-19 disease and COVID-19 pneumonia. For this review, the focus will not be on prognosis.

Was a consecutive or random sample of patients enrolled?
This will be similar for all index tests, target conditions, and populations.
YES: if a study explicitly stated that all participants within a certain time frame were included; that this was done consecutively; or that a random selection was done.
NO: if it was clear that a different selection procedure was employed; for example, selection based on the clinician's preference, or based on institutions.
UNCLEAR: if the selection procedure was not clear or not reported.

This will be similar for all index tests, target conditions, and populations.
YES: if a study explicitly stated that all participants came from the same group of (suspected) patients.
NO: if it was clear that a different selection procedure was employed for the participants depending on their COVID-19 (pneumonia) status or SARS-CoV-2 infection status.
UNCLEAR: if the selection procedure was not clear or not reported.

Studies may have excluded participants, or selected participants in such a way that they avoided including those who were difficult to diagnose or likely to be borderline. Although the inclusion and exclusion criteria will differ between index tests, inappropriate exclusions and inclusions will be similar for all index tests: for example, excluding elderly patients only, or excluding children (as sampling may be more difficult).
This needs to be addressed on a case-by-case basis.
YES: if a high proportion of eligible patients was included without clear selection.
NO: if a high proportion of eligible patients was excluded without providing a reason; if, in a retrospective study, participants without index test or reference standard results were excluded; if exclusion was based on post hoc severity assessment or on comorbidities (cardiovascular disease, diabetes, immunosuppression).
UNCLEAR: if the exclusion criteria were not reported.

YES: if the included samples were likely to be representative of the spectrum of disease.
NO: if the study oversampled patients with particular characteristics likely to affect estimates of accuracy.
UNCLEAR: if the exclusion criteria were not reported.

HIGH: if one or more signalling questions were answered with NO, as any deviation from the selection process may lead to bias.
LOW: if all signalling questions were answered with YES.
UNCLEAR: all other instances.

Is there concern that the included patients do not match the review question?
HIGH: if the accuracy of signs and symptoms was assessed in a case-control design or in an already highly selected group of participants, or if the study could estimate only sensitivity or only specificity.
LOW: any situation where signs and symptoms were the first assessment or test done on the included participants.
UNCLEAR: if a description of the participants was lacking.

This will be similar for all index tests, target conditions, and populations.
YES: if blinding was explicitly stated or the index test was recorded before the results of the reference standard were available.
NO: if it was explicitly stated that the index test results were interpreted with knowledge of the results of the reference standard.
UNCLEAR: if blinding was not clearly reported.
This will be similar for all index tests, target conditions, and populations.
YES: if the test was dichotomous by nature, if the threshold was stated in the methods section, or if the authors stated that the threshold recommended by the manufacturer was used.
NO: if a receiver operating characteristic curve was drawn or multiple thresholds were reported in the results section, and the final result was based on one of these thresholds; if fever was not defined beforehand.
UNCLEAR: if threshold selection was not clearly reported.

HIGH: if one or more signalling questions were answered with NO, as even in a laboratory situation knowledge of the reference standard may lead to bias.
LOW: if all signalling questions were answered with YES.

Is there concern that the index test, its conduct, or interpretation differ from the review question?
This will probably be answered 'LOW' in all cases except when assessments were made in a different setting, or by personnel not available in practice.

Is the reference standard likely to correctly classify the target condition?
We will define acceptable reference standards using a consensus process once we have obtained, from the eligible studies, the list of reference standards that were used. For severe pneumonia, we will consider how well processes adhered to the WHO case definition in Appendix 1.

HIGH: if one or more signalling questions were answered with NO.
LOW: if all signalling questions were answered with YES.

Is there concern that the target condition as defined by the reference standard does not match the review question?
HIGH: if the target condition was COVID-19 pneumonia but only RT-PCR was used; if an alternative diagnosis was highly likely and not excluded (this will happen in paediatric cases, where other respiratory pathogens also need to be excluded); if tests were used to follow up viral load in known test-positives.
LOW: if none of the above situations was present.
UNCLEAR: if the intention for testing was not reported in the study.
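The domain-level rules above reduce to a small decision procedure. The Python sketch below assumes that each signalling question is recorded as YES, NO or UNCLEAR. The names Answer, domain_risk_of_bias and interval_answer are hypothetical; they are not part of QUADAS-2 or of any review tool. The UNCLEAR branch follows the patient-selection wording ("all other instances"). The function interval_answer encodes only the 24-hour window from the flow-and-timing guidance that follows. It does not capture that guidance's additional NO condition, which covers assessments made at moments of different severity.

```python
from enum import Enum
from typing import Iterable, Optional


class Answer(Enum):
    YES = "YES"
    NO = "NO"
    UNCLEAR = "UNCLEAR"


def domain_risk_of_bias(answers: Iterable[Answer]) -> str:
    """Risk-of-bias judgement for one QUADAS-2 domain from its signalling answers."""
    answers = list(answers)
    if not answers:
        raise ValueError("a domain needs at least one signalling question")
    if any(a is Answer.NO for a in answers):
        return "HIGH"  # any NO: a deviation that may lead to bias
    if all(a is Answer.YES for a in answers):
        return "LOW"
    return "UNCLEAR"  # all other instances


def interval_answer(hours_between: Optional[float]) -> Answer:
    """'Appropriate interval' signalling question under an assumed 24-hour window."""
    if hours_between is None:
        return Answer.UNCLEAR  # interval not reported
    return Answer.YES if hours_between <= 24 else Answer.NO


if __name__ == "__main__":
    # Hypothetical flow-and-timing answers for one study.
    flow_and_timing = [interval_answer(36), Answer.YES, Answer.YES]
    print(domain_risk_of_bias(flow_and_timing))  # HIGH: interval exceeded 24 hours
```

Keeping the per-question answers separate from the domain judgement mirrors how the guidance is laid out. Each signalling question is answered first, and the domain rating is derived from those answers afterwards.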
Was there an appropriate interval between index test(s) and reference standard?
YES: this will be similar for all index tests and populations for the current-infection target conditions. Because a patient's situation, including clinical presentation and disease progression, evolves rapidly, and new or ongoing exposure can change case status, an appropriate time interval will be within 24 hours.
NO: if there were more than 24 hours between the index test and the reference standard, or if participants were otherwise reported to be assessed with the index test and the reference standard at moments of different severity.
UNCLEAR: if the time interval was not reported.

YES: if all participants received a reference standard (clearly no partial verification).
NO: if only (part of) the index test-positives or index test-negatives received the complete reference standard.
UNCLEAR: if it was not reported.

YES: if all participants received the same reference standard (clearly no differential verification).
NO: if (part of) the index test-positives or index test-negatives received a different reference standard.

We needed a more efficient approach to keep up with the rapidly increasing volume of COVID-19 literature. We built a classification model for COVID-19 diagnostic studies with the model-building function within EPPI-Reviewer, which uses the standard SGDClassifier in scikit-learn on word trigrams. Each new document receives a percentage score (from the predict_proba function). Scores close to 100 indicate a high probability of belonging to the class 'relevant document', and scores close to 0 indicate a low probability. We used three iterations of manual screening (title and abstract screening, followed by full-text review) to build and test classifiers. The final included studies were used as relevant documents, and the remaining COVID-19 studies were used as irrelevant documents.
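The screening classifier described above can be illustrated in plain Python. This is not EPPI-Reviewer's code. It is a minimal, hypothetical stand-in for scikit-learn's SGDClassifier with a logistic loss, and it rests on several assumptions:

- features are binary, lower-cased word trigrams;
- the learning rate is fixed;
- there is no regularisation;
- training passes over the documents in a fixed order.

The resulting 0 to 100 score can be compared against a cut-off. The helper sensitivity_and_ppv is also hypothetical. It shows how the sensitivity and positive predictive value discussed in the text that follows would be computed at a given cut-off, assuming that documents scoring at or above the cut-off go on to manual screening.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple


def word_trigrams(text: str) -> List[str]:
    """Lower-cased word trigrams, e.g. 'loss of smell or' -> ['loss of smell', 'of smell or']."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TrigramSGDClassifier:
    """Hypothetical log-loss linear classifier on binary trigram features,
    fitted by plain stochastic gradient descent (no regularisation)."""

    def __init__(self, learning_rate: float = 0.5, epochs: int = 20) -> None:
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.weights: Dict[str, float] = {}
        self.bias = 0.0

    def _margin(self, features: set) -> float:
        return self.bias + sum(self.weights.get(f, 0.0) for f in features)

    def fit(self, docs: Sequence[str], labels: Sequence[int]) -> "TrigramSGDClassifier":
        # labels: 1 = 'relevant document', 0 = 'irrelevant document'
        data = [(set(word_trigrams(d)), y) for d, y in zip(docs, labels)]
        for _ in range(self.epochs):
            for features, y in data:  # fixed order keeps the sketch deterministic
                # gradient of the log loss with respect to the margin
                grad = sigmoid(self._margin(features)) - y
                for f in features:
                    self.weights[f] = self.weights.get(f, 0.0) - self.learning_rate * grad
                self.bias -= self.learning_rate * grad
        return self

    def score(self, doc: str) -> float:
        """Estimated probability of 'relevant document', on a 0 to 100 scale."""
        return 100.0 * sigmoid(self._margin(set(word_trigrams(doc))))


def sensitivity_and_ppv(
    scores: Sequence[float], labels: Sequence[int], cutoff: float
) -> Tuple[float, float]:
    """Sensitivity and PPV when documents scoring >= cutoff go to manual screening."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 0)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


if __name__ == "__main__":
    # Made-up toy titles; not data from the review.
    docs = [
        "diagnostic accuracy of fever and cough for covid",
        "sensitivity and specificity of loss of smell",
        "vaccine trial for influenza in mice",
        "economic impact of lockdown policy",
    ]
    labels = [1, 1, 0, 0]
    clf = TrigramSGDClassifier().fit(docs, labels)
    scores = [clf.score(d) for d in docs]
    for cutoff in (10, 5):
        print(cutoff, sensitivity_and_ppv(scores, labels, cutoff))
```

Lowering the cut-off can only add documents to the screened set. Sensitivity therefore cannot decrease, although positive predictive value typically falls. In the actual workflow, scikit-learn handles feature extraction and optimisation; the sketch only makes the scoring-and-cut-off logic explicit.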
The classifier was trained on the first round of selected articles, then tested and retrained on the second round. Testing on the second round revealed poor positive predictive value but 100% sensitivity at a cut-off of 10. The poor positive predictive value is mainly due to the broad scope of our topic (all diagnostic studies in COVID-19), poor reporting in abstracts, and a small set of included documents. The model was retrained using the articles selected in the second and third rounds of screening, which added a considerable number of documents. This led to a large increase in positive predictive value at the cost of lower sensitivity, which led us to reduce the cut-off to 5. The largest proportion of documents had a score between 0 and 5, and this set did not contain any of the relevant documents. This version of the classifier, with a cut-off of 5, was used in subsequent rounds and accounted for approximately 80% of the screening burden.

We took the following information from the University of Bern website (see: ispmbern.github.io/covid-19/living-review/collectingdata.html). The register is updated daily and CSV file downloads are made available. From 1 April 2020, we will retrieve the curated bioRxiv/medRxiv dataset (connect.medrxiv.org/relate/content/181).

JD, JDi, YT, CD, ML, RS, LH, AVdB, and DE contributed clinical, methodological and/or technical expertise to drafting the protocol. JD coordinated contributions from all co-authors and drafted the protocol. ML drafted the QUADAS-2 criteria. AVdB oversaw the overall progress of this review and participated in the selection process, data extraction and drafting of the manuscript. TS analyzed the data, drafted the manuscript and participated in the selection and data extraction. JD and BH participated in the data extraction and interpretation of the findings, and commented on the manuscript.
• Clarification regarding inclusion criteria: suspicion of infection was interpreted as clinical suspicion of SARS-CoV-2 infection based on a symptomatic presentation. At least 50% of the study population had to present with COVID-19-compatible symptoms.
• We performed sensitivity analyses to investigate the impact of prospective versus retrospective data collection in cross-sectional studies.