key: cord-0907771-fnmmodd3 authors: Stegeman, Inge; Ochodo, Eleanor A; Guleid, Fatuma; Holtman, Gea A.; Yang, Bada; Davenport, Clare; Deeks, Jonathan J; Dinnes, Jacqueline; Dittrich, Sabine; Emperador, Devy; Hooft, Lotty; Spijker, René; Takwoingi, Yemisi; Van den Bruel, Ann; Wang, Junfeng; Langendam, Miranda; Verbakel, Jan Y; Leeflang, Mariska MG title: Routine laboratory testing to determine if a patient has COVID‐19 date: 2020-11-19 journal: Cochrane Database Syst Rev DOI: 10.1002/14651858.cd013787 sha: b1a837c24c46be95e1904c79d9e7367e5e1bf08b doc_id: 907771 cord_uid: fnmmodd3 BACKGROUND: Specific diagnostic tests to detect severe acute respiratory syndrome coronavirus 2 (SARS‐CoV‐2) and resulting COVID‐19 disease are not always available and take time to obtain results. Routine laboratory markers such as white blood cell count, measures of anticoagulation, C‐reactive protein (CRP) and procalcitonin, are used to assess the clinical status of a patient. These laboratory tests may be useful for the triage of people with potential COVID‐19 to prioritize them for different levels of treatment, especially in situations where time and resources are limited. OBJECTIVES: To assess the diagnostic accuracy of routine laboratory testing as a triage test to determine if a person has COVID‐19. SEARCH METHODS: On 4 May 2020 we undertook electronic searches in the Cochrane COVID‐19 Study Register and the COVID‐19 Living Evidence Database from the University of Bern, which is updated daily with published articles from PubMed and Embase and with preprints from medRxiv and bioRxiv. In addition, we checked repositories of COVID‐19 publications. We did not apply any language restrictions. SELECTION CRITERIA: We included both case‐control designs and consecutive series of patients that assessed the diagnostic accuracy of routine laboratory testing as a triage test to determine if a person has COVID‐19. 
The reference standard could be reverse transcriptase polymerase chain reaction (RT‐PCR) alone; RT‐PCR plus clinical expertise and imaging; repeated RT‐PCR several days apart or from different samples; WHO and other case definitions; and any other reference standard used by the study authors. DATA COLLECTION AND ANALYSIS: Two review authors independently extracted data from each included study. They also assessed the methodological quality of the studies, using QUADAS‐2. We used the 'NLMIXED' procedure in SAS 9.4 for the hierarchical summary receiver operating characteristic (HSROC) meta‐analyses of tests for which we included four or more studies. To facilitate interpretation of results, for each meta‐analysis we estimated summary sensitivity at the points on the SROC curve that corresponded to the median and interquartile range boundaries of specificities in the included studies. MAIN RESULTS: We included 21 studies in this review, including 14,126 COVID‐19 patients and 56,585 non‐COVID‐19 patients in total. Studies evaluated a total of 67 different laboratory tests. Although we were interested in the diagnostic accuracy of routine tests for COVID‐19, the included studies used detection of SARS‐CoV‐2 infection through RT‐PCR as reference standard. There was considerable heterogeneity between tests, threshold values and the settings in which they were applied. For some tests a positive result was defined as a decrease compared to normal values, for other tests a positive result was defined as an increase, and for some tests both increase and decrease may have indicated test positivity. None of the studies had either low risk of bias on all domains or low concerns for applicability for all domains. Only three of the tests evaluated had a summary sensitivity and specificity over 50%. These were: increase in interleukin‐6, increase in C‐reactive protein and lymphocyte count decrease. 
Blood count Eleven studies evaluated a decrease in white blood cell count, with a median specificity of 93% and a summary sensitivity of 25% (95% CI 8.0% to 27%; very low‐certainty evidence). The 15 studies that evaluated an increase in white blood cell count had a lower median specificity and a lower corresponding sensitivity. Four studies evaluated a decrease in neutrophil count. Their median specificity was 93%, corresponding to a summary sensitivity of 10% (95% CI 1.0% to 56%; low‐certainty evidence). The 11 studies that evaluated an increase in neutrophil count had a lower median specificity and a lower corresponding sensitivity. The summary sensitivity of an increase in neutrophil percentage (4 studies) was 59% (95% CI 1.0% to 100%) at median specificity (38%; very low‐certainty evidence). The summary sensitivity of an increase in monocyte count (4 studies) was 13% (95% CI 6.0% to 26%) at median specificity (73%; very low‐certainty evidence). The summary sensitivity of a decrease in lymphocyte count (13 studies) was 64% (95% CI 28% to 89%) at median specificity (53%; low‐certainty evidence). Four studies that evaluated a decrease in lymphocyte percentage showed a lower median specificity and lower corresponding sensitivity. The summary sensitivity of a decrease in platelets (4 studies) was 19% (95% CI 10% to 32%) at median specificity (88%; low‐certainty evidence). Liver function tests The summary sensitivity of an increase in alanine aminotransferase (9 studies) was 12% (95% CI 3% to 34%) at median specificity (92%; low‐certainty evidence). The summary sensitivity of an increase in aspartate aminotransferase (7 studies) was 29% (95% CI 17% to 45%) at median specificity (81%) (low‐certainty evidence). The summary sensitivity of a decrease in albumin (4 studies) was 21% (95% CI 3% to 67%) at median specificity (66%; low‐certainty evidence). 
The summary sensitivity of an increase in total bilirubin (4 studies) was 12% (95% CI 3.0% to 34%) at median specificity (92%; very low‐certainty evidence). Markers of inflammation The summary sensitivity of an increase in CRP (14 studies) was 66% (95% CI 55% to 75%) at median specificity (44%; very low‐certainty evidence). The summary sensitivity of an increase in procalcitonin (6 studies) was 3% (95% CI 1% to 19%) at median specificity (86%; very low‐certainty evidence). The summary sensitivity of an increase in IL‐6 (four studies) was 73% (95% CI 36% to 93%) at median specificity (58%) (very low‐certainty evidence). Other biomarkers The summary sensitivity of an increase in creatine kinase (5 studies) was 11% (95% CI 6% to 19%) at median specificity (94%) (low‐certainty evidence). The summary sensitivity of an increase in serum creatinine (four studies) was 7% (95% CI 1% to 37%) at median specificity (91%; low‐certainty evidence). The summary sensitivity of an increase in lactate dehydrogenase (4 studies) was 25% (95% CI 15% to 38%) at median specificity (72%; very low‐certainty evidence). AUTHORS' CONCLUSIONS: Although these tests give an indication about the general health status of patients and some tests may be specific indicators for inflammatory processes, none of the tests we investigated are useful for accurately ruling in or ruling out COVID‐19 on their own. Studies were done in specific hospitalized populations, and future studies should consider non‐hospital settings to evaluate how these tests would perform in people with milder symptoms. 
How accurate are routine laboratory tests for diagnosis of COVID-19? What are routine laboratory tests? Routine laboratory tests are blood tests that assess the health status of a patient. Tests include counts of different types of white blood cells (these help the body fight infection), and detection of markers (proteins) that indicate organ damage and general inflammation. These tests are widely available and in some places they may be the only tests available for diagnosis of COVID-19. People with suspected COVID-19 need to know quickly whether they are infected so that they can self-isolate, receive treatment, and inform close contacts. Currently, the standard test for COVID-19 is usually the RT-PCR test. In the RT-PCR, samples from the nose and throat are sent away for testing, usually to a large, central laboratory with specialist equipment. Other tests include imaging tests, like X-rays, which also require specialist equipment. We wanted to know whether routine laboratory tests were sufficiently accurate to diagnose COVID-19 in people with suspected COVID-19. We also wanted to know whether they were accurate enough to prioritize patients for different levels of treatment. We searched for studies that assessed the accuracy of routine laboratory tests to diagnose COVID-19 compared with RT-PCR or other tests. Studies could be of any design and be set anywhere in the world. Studies could include participants of any age or sex, with suspected COVID-19, or use samples from people known to have (or not to have) COVID-19. We found 21 studies that looked at 67 different routine laboratory tests for COVID-19. Most of the studies looked at how accurately these tests diagnosed infection with the virus causing COVID-19. 
Four studies included both children and adults, 16 included only adults and one study only children. Seventeen studies were done in China, and one each in Iran, Italy, Taiwan and the USA. All studies took place in hospitals, except one that used samples from a database. Most studies used RT-PCR to confirm COVID-19 diagnosis. Accuracy of tests is most often reported using 'sensitivity' and 'specificity'. Sensitivity is the proportion of people with COVID-19 correctly detected by the test; specificity is the proportion of people without COVID-19 who are correctly identified by the test. The nearer sensitivity and specificity are to 100%, the better the test. A test to prioritize people for treatment would require a high sensitivity of more than 80%. Where four or more studies evaluated a particular test, we pooled their results and analyzed them together. Our analyses showed that only three of the tests had both sensitivity and specificity over 50%. Two of these were markers for general inflammation (increases in interleukin-6 and C-reactive protein). The third was for lymphocyte count decrease. Lymphocytes are a type of white blood cell where a low count might indicate infection. Our confidence in the evidence from this review is low because the studies were different from each other, which made them difficult to compare. For example, some included very sick people, while some included people with hardly any COVID-19 symptoms. Also, the diagnosis of COVID-19 was confirmed in different ways: RT-PCR was sometimes used in combination with other tests. Routine laboratory tests can be issued by most healthcare facilities. However, our results are probably not representative of most clinical situations in which these tests are being used. Most studies included very sick people with high rates of COVID-19 virus infection of between 27% and 76%. In most primary healthcare facilities, this percentage will be lower. 
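The definitions of sensitivity and specificity given above, and the remark that infection rates will be lower in primary care, can be made concrete with a small calculation. The following is a minimal sketch, not part of the review's analysis: the two-by-two counts are hypothetical (chosen only so that sensitivity and specificity come out at 64% and 53%), the names `TwoByTwo` and `positive_predictive_value` are illustrative, and the two prevalences (36% and 5%) are example values.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from cross-classifying a laboratory test against the reference standard."""
    tp: int  # test positive, reference standard positive
    fp: int  # test positive, reference standard negative
    fn: int  # test negative, reference standard positive
    tn: int  # test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of people with COVID-19 who test positive
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of people without COVID-19 who test negative
        return self.tn / (self.tn + self.fp)


def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Probability of disease given a positive test, via Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical counts, not taken from any included study
table = TwoByTwo(tp=64, fp=47, fn=36, tn=53)
sens, spec = table.sensitivity(), table.specificity()

# The same test looks very different when fewer of the people tested are infected
for prevalence in (0.36, 0.05):
    ppv = positive_predictive_value(sens, spec, prevalence)
    print(f"prevalence {prevalence:.0%}: positive predictive value {ppv:.1%}")
```

Sensitivity and specificity are properties of the test at a given threshold, whereas the predictive value also depends on how common infection is among the people tested, which is why hospital-based estimates may not carry over to primary care.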
Routine laboratory tests cannot distinguish between COVID-19 and other diseases as the cause of infection, inflammation or tissue damage. None of the tests performed well enough to be a standalone diagnostic test for COVID-19 nor to prioritize patients for treatment. They will mainly be used to provide an overall picture about the health status of the patient. The final COVID-19 diagnosis has to be made based on other tests.
Hypoalbuminaemia is the term used for low albumin levels and an indication of increased protein loss or decreased protein synthesis (e.g. due to kidney disease, sepsis or severe liver damage). Most patients with COVID-19 will be missed at any cut-off value. Bilirubin is a breakdown product of haemoglobin. An excess may be an indication that the liver is not capable of removing bilirubin from the blood stream; it is not a specific indication of COVID-19, as most patients with COVID-19 will be missed at any cut-off.
High certainty: we are very confident that the true effect lies close to that of the estimate of the effect. Moderate certainty: we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. Low certainty: our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect. Very low certainty: we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of effect.
ALT: alanine aminotransferase; AST: aspartate aminotransferase; CI: confidence interval; CRP: C-reactive protein; IL-6: interleukin-6; IQR: interquartile range; LDH: lactate dehydrogenase; WBC: white blood cell. Included studies defined a positive test result as an increase or a decrease compared to normal range values, or both. 
a The specificity marking the first quartile (Q1) of all specificities of the studies included, the median specificity, and the third quartile (Q3) specificity were used to estimate the corresponding sensitivity estimates from the HSROC model.
b A sensitivity and specificity both of 70% would lead to a diagnostic odds ratio of 5.0.
c Starting at high certainty of the evidence, the evidence was downgraded by one level when at least half of the studies had high risk of bias on one or more domains; downgraded for indirectness when at least half of the studies in the meta-analyses had high concerns regarding applicability on at least one domain; downgraded for imprecision when fewer people with the target condition were included than would have been needed to achieve the sensitivity estimates listed with a confidence interval width of at most 10 percentage points; and downgraded for inconsistency when study estimates differed by more than 20 percentage points from each other. Publication bias was not considered to be a problem.
Comparisons of routine laboratory tests for COVID-19 with sensitivity and specificity higher than 50%
a The median pre-test probability in the meta-analyses varied between 27% and 84%, meaning that the included studies are not representative for situations where the prevalence is 5% or lower. The median prevalence over all the single-gate studies was 36%.
b The direct comparison between lymphocyte count increase and C-reactive protein (CRP) increase (9 studies) showed that CRP was considerably more accurate than lymphocyte count increase: relative diagnostic odds ratio (DOR) was 2.02 (95% confidence interval 1.47 to 2.78). As the confidence intervals of all the DORs in the indirect comparisons included a non-informative value (i.e. DOR = 1), a relative DOR of 2 does not mean the alternative is much more informative. 
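The footnotes above refer to two calculations: the diagnostic odds ratio implied by a sensitivity and specificity pair, and the reading of summary sensitivity from the HSROC curve at the Q1, median and Q3 specificities. The sketch below illustrates both under stated assumptions. It assumes the commonly used Rutter and Gatsonis form of the summary curve, logit(sensitivity) = Λ·exp(−β/2) + exp(−β)·logit(1 − specificity); the review does not state its parameterisation. The accuracy parameter `lam`, the shape parameter `beta` and the list of study specificities are hypothetical values, not estimates from the review.

```python
import math
from statistics import quantiles


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Odds of a positive test in diseased people divided by the odds in non-diseased people."""
    return (sensitivity * specificity) / ((1 - sensitivity) * (1 - specificity))


def sroc_sensitivity(specificity: float, lam: float, beta: float) -> float:
    """Sensitivity on the summary ROC curve at a given specificity.

    Assumes the Rutter-Gatsonis HSROC curve:
    logit(sens) = lam * exp(-beta / 2) + exp(-beta) * logit(1 - spec)
    """
    false_positive_rate = 1 - specificity
    return expit(lam * math.exp(-beta / 2) + math.exp(-beta) * logit(false_positive_rate))


# Hypothetical study-level specificities and hypothetical curve parameters
study_specificities = [0.38, 0.44, 0.53, 0.58, 0.73]
lam, beta = 0.6, 0.1

# Q1, median and Q3 of the observed specificities
q1, median, q3 = quantiles(study_specificities, n=4, method="inclusive")

for label, spec in (("Q1", q1), ("median", median), ("Q3", q3)):
    sens = sroc_sensitivity(spec, lam, beta)
    print(f"{label}: specificity {spec:.2f} -> summary sensitivity {sens:.2f}")

print(f"DOR at 70%/70%: {diagnostic_odds_ratio(0.7, 0.7):.2f}")
```

With sensitivity and specificity both at 70%, the formula gives (0.7 × 0.7) / (0.3 × 0.3), or about 5.4, which is in the region of the value of about 5 quoted in footnote b. Reporting sensitivity at fixed points on the curve avoids averaging across studies that used different thresholds.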
On 30 December 2019, a cluster of patients with pneumonia of unknown origin in Wuhan, China, was publicly reported via ProMED (promedmail.org/promed-posts). In January 2020, it became clear that this was caused by a new coronavirus and that it was spreading to other countries as well. In March 2020, the World Health Organization (WHO) declared the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and resulting COVID-19 a worldwide pandemic. This pandemic, in combination with the novelty of the virus, presents important diagnostic challenges. These challenges range from understanding the value of signs and symptoms in predicting possible infection, to assessing whether existing biochemical and imaging tests can identify infection and patients who need critical care, to evaluating whether new diagnostic tests can provide accurate, rapid and point-of-care testing, either to identify current infection, rule out infection, identify people in need of care escalation, or to test for past infection and immunity. This review follows a generic protocol that covers the full series of Cochrane diagnostic test accuracy (DTA) reviews for the diagnosis of COVID-19 (Deeks 2020b). The Background and Methods sections of this review therefore use some text that was originally published in the protocol, and text that overlaps some of our other reviews (Deeks 2020a; Dinnes 2020; Struyf 2020). The present review concentrates on the diagnostic accuracy of routine laboratory testing as a triage test to determine if a person has COVID-19 pneumonia or SARS-CoV-2 infection, and to facilitate further testing. In clinical care, routine laboratory markers such as white blood cell count, measures of anticoagulation, C-reactive protein (CRP) and procalcitonin, are used to assess the health status of a patient. 
These laboratory markers are also used in patients with COVID-19 infection and may be useful for triage of people with potential COVID-19 infection for treatment or more intensive treatment, especially in situations where time and resources are limited. COVID-19 is the disease caused by infection with SARS-CoV-2. The key target condition for this review was current COVID-19. SARS-CoV-2 infection can be asymptomatic (no symptoms); mild or moderate (symptoms such as fever, cough, aches, lethargy but without difficulty breathing at rest); severe (symptoms include breathlessness and increased respiratory rate indicative of pneumonia); or critical (requiring respiratory support due to severe acute respiratory syndrome (SARS) or acute respiratory distress syndrome (ARDS)). People with COVID-19 pneumonia (severe or critical disease) require distinctive patient management, and it is important to be able to identify these patients. In this review, we focus on COVID-19, without making the distinction between mild to moderate and severe disease. We collated evidence on all routine biomarker tests reported in the identified studies. These can be classified into:
• full blood count, haemoglobin and red blood cells;
• coagulation markers;
• liver markers, cardiac markers and kidney function markers;
• general inflammatory markers; and
• metabolic markers.
Decisions about patient and isolation pathways for COVID-19 vary according to health services and settings, available resources, and stages of the epidemic. They will change over time if and when effective treatments and vaccines are identified. The decision points between these pathways vary, but all include points at which knowledge of the accuracy of diagnostic information is needed to be able to inform rational decisions. Standard workup for individuals suspected of COVID-19 infection consists of assessing signs and symptoms and a polymerase chain reaction (PCR) test. 
It is common practice for patients entering the hospital, whether as outpatients or on admission, to have routine laboratory tests done. Routinely available tests for infection and inflammation may be considered in the investigation of people with possible COVID-19 infection. For example, many healthcare facilities have access to standard laboratory tests for infection, such as CRP, procalcitonin, measures of anticoagulation, and white blood cell count with leukocyte differentiation. Routine laboratory markers may be used as a triage test, either on their own, or in combination with signs and symptoms. In low-resource settings, they may sometimes even be the only tests available. In order to function as a triage test or stand-alone test, a high sensitivity is needed, to prevent infected patients from being sent home or into a general ward with uninfected patients. For a triage test, specificity may be less important, as positive tests will be further investigated. Also, routine laboratory tests may be used to tip the decision to treat the patient as having COVID-19 or not in case of mixed results from other tests or where a definite diagnosis cannot be made. In that case, knowledge of the sensitivity and specificity in a particular (pre-tested) patient population may be useful. Routine laboratory tests may also be used in the further diagnostic workup, to predict mild versus severe outcomes, or to monitor treatment response. These aims of testing will not be the focus of this systematic review. The test that is believed to be most accurate in detecting SARS-CoV-2 is reverse transcriptase polymerase chain reaction (RT-PCR). In many settings, this test will be available, but the results take time before they become available. Although rapid antigen and molecular-based tests are also available, the value of these rapid tests is still not clear. 
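The reasoning above, that a triage test needs high sensitivity while specificity matters less, can be illustrated by counting what happens to a group of people tested. The sketch below is an illustration only: the function name `triage_outcomes` is hypothetical, and the example call combines figures that the review reports separately (a sensitivity of 66% and specificity of 44%, as for CRP increase, and a prevalence of 36%), so the output should not be read as a result of the review.

```python
def triage_outcomes(sensitivity: float, specificity: float, prevalence: float, n: int = 1000) -> dict:
    """Expected outcomes when n people with suspected COVID-19 receive a triage test."""
    infected = prevalence * n
    uninfected = n - infected
    detected = sensitivity * infected              # infected people correctly flagged
    missed = infected - detected                   # infected people sent home or to a general ward
    false_alarms = (1 - specificity) * uninfected  # uninfected people referred for further testing
    return {
        "infected": infected,
        "detected": detected,
        "missed": missed,
        "false_alarms": false_alarms,
        "referred_for_further_testing": detected + false_alarms,
    }


# Illustrative inputs; see the assumptions stated in the text above
outcomes = triage_outcomes(sensitivity=0.66, specificity=0.44, prevalence=0.36)
for name, value in outcomes.items():
    print(f"{name}: {value:.0f} per 1000 tested")
```

Missed cases depend only on sensitivity and prevalence, which is why sensitivity drives the safety of a triage test, whereas low specificity mainly increases the workload of follow-up testing.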
Antibody tests provide insights into the antibody response, but it may take a few days before the response is detectable and results therefore become available. Alternatives to routine laboratory tests may depend on the setting and situation where the tests are done. For example, in primary care, alternatives may consist of signs and symptoms and rapid and point-of-care tests. Similarly, point-of-care ultrasound may be used, if resources allow. The benefit of routine laboratory tests (and of signs and symptoms) may be as an indication of the severity of a disease: a value further from the reference values may indicate more severe infections. In emergency departments, chest X-ray, ultrasound, and computed tomography (CT) are widely used diagnostic imaging tests to identify COVID-19 pneumonia. Which imaging test is available may depend on the type of hospital and available resources: a tertiary care hospital in a high-income country may have a mobile CT scanner available, while in smaller hospitals only X-ray and ultrasound are accessible. These imaging tests have the advantage that the condition of the lungs can be assessed visually. These other tests are all addressed in the other Cochrane DTA reviews in this suite of reviews (Deeks 2020a; Dinnes 2020; McInnes 2020; Struyf 2020). It is essential to understand the accuracy of tests and diagnostic features to identify how they can be used optimally in different settings to develop effective diagnostic and management pathways. New evidence about routine laboratory testing is becoming available quickly. Therefore, we have produced a Cochrane 'living systematic review' (a systematic review that is continually updated, incorporating relevant new evidence as it becomes available) that will summarize new and existing evidence on the clinical accuracy of routine laboratory markers. 
Estimates of accuracy from this review will help inform diagnostic, screening, and patient management decisions. To assess the diagnostic accuracy of routine laboratory testing as a triage test to determine if a person has COVID-19. Where data were available, we investigated the accuracy (either by stratified analysis or meta-regression) according to a specific measurement or test, days of symptoms, severity of symptoms, reference standard, sample type, study design, and setting. We kept the eligibility criteria broad to include all patient groups and all variations of a test (that is, if patient population was unclear, we included the study). We included studies of all designs that produce estimates of test accuracy or provide data from which estimates can be computed: cross-sectional studies, case-control designs and consecutive series of patients assessing the diagnostic accuracy of routine laboratory testing as a triage test to determine if a person has COVID-19. We intended to include studies recruiting only COVID-19 cases, to estimate sensitivity, or those restricted to people without COVID-19, to estimate specificity (Deeks 2020a). We decided to deviate from this rule as the added value of such studies for our review is questionable. We included both single-gate designs, where a single group of participants, often suspected of having the target condition, is recruited, and multi-gate designs, where people with and without the target condition are recruited separately. We intended to include studies that based their results on individual patients as well as studies that based their results on samples. We carefully considered the limitations of different study designs, using quality assessment and analysis. We included studies recruiting people presenting with suspected SARS-CoV-2 infection, studies that recruited people to screen for disease, and studies based on serum banks created from known cases of COVID-19 and controls. 
Studies had to include a minimum of 10 samples or 10 participants. We collected evidence on all routine biomarker tests reported in the identified studies. We interpreted the term 'routine' broadly, considering that some markers will be more routine in some settings or countries than in others. Test positivity could have been defined as an increase in values compared to the normal ranges, or as a decrease compared to normal values. To be eligible, studies needed to identify at least one of: Reverse transcriptase polymerase chain reaction (RT-PCR) is considered the best available test, although due to rapidly evolving knowledge about the target conditions, multiple reference standards on their own as well as in combination have emerged. Therefore, we included the following reference standards:
• RT-PCR alone;
• RT-PCR, clinical expertise, and imaging (for example, CT thorax);
• repeated RT-PCR several days apart or from different samples;
• plaque reduction neutralization test (PRNT) or enzyme-linked immunosorbent assay (ELISA);
• information available at a subsequent time point;
• WHO (Appendix 1), and other case definitions;
• any other reference standard used by study authors.
We conducted a single literature search to cover our suite of Cochrane COVID-19 diagnostic test accuracy (DTA) reviews (Deeks 2020b; McInnes 2020). We conducted electronic searches using two primary sources. Both of these searches aimed to identify all published articles and preprints related to COVID-19, and were not restricted to those evaluating tests. Thus, there are no test terms, diagnosis terms, or methodological terms in the searches. Searches were limited to 2019 and 2020, and for this version of the review have been conducted to 4 May 2020. We used the Cochrane COVID-19 Study Register (covid-19.cochrane.org), for searches conducted to 28 March 2020. 
At that time, the register was populated by searches of PubMed, as well as trials registers at ClinicalTrials.gov and the WHO International Clinical Trials Registry Platform (ICTRP). Search strategies were designed for maximum sensitivity, to retrieve all human studies on COVID-19, with no language limits (Appendix 2). From 28 March 2020, we used the COVID-19 Living Evidence database from the Institute of Social and Preventive Medicine (ISPM) at the University of Bern (www.ispm.unibe.ch), as the primary source of records for the Cochrane COVID-19 DTA reviews. This search includes PubMed, Embase, and preprints indexed in bioRxiv and medRxiv databases. The strategies are described on the ISPM website (ispmbern.github.io/covid-19/; Appendix 3). The decision to focus primarily on the 'Bern' feed was due to the exceptionally large numbers of COVID-19 studies available only as preprints. The Cochrane COVID-19 Study Register has undergone a number of iterations since the end of March and we anticipate moving back to the Register as the primary source of records for subsequent review updates.

We identified Embase records obtained through Martha Knuth for the Centers for Disease Control and Prevention (CDC), Stephen B Thacker CDC Library, COVID-19 Research Articles Downloadable Database (cdc.gov/library/researchguides/2019novelcoronavirus/researcharticles.html), and de-duplicated them against the Cochrane COVID-19 Study Register up to 1 April 2020. We also checked our search results against two additional repositories of COVID-19 publications:
• the Evidence for Policy and Practice Information and Coordinating Centre (EPPI-Centre) 'COVID-19: Living map of the evidence' (eppi.ioe.ac.uk/COVID19_MAP/covid_map_v4.html);
• the Norwegian Institute of Public Health 'NIPH systematic and living map on COVID-19 evidence' (www.nornesk.no/forskningskart/NIPH_diagnosisMap.html).
Both of these repositories allow their contents to be filtered according to studies potentially relating to diagnosis, and both have agreed to provide us with updates of new diagnosis studies added. For this iteration of the review, we examined all diagnosis studies from either source up to 4 May 2020. We did not apply any language restrictions.

First, all retrieved articles were screened by an overall team of screeners who divided the articles over the different rapid DTA reviews. Then, the set of studies possibly involving routine laboratory markers was imported into Covidence. Two review authors independently screened each title and abstract for possible inclusion. In the next step, two review authors independently screened the full text of each possibly relevant article. For articles only available in languages other than English, we used Google Translate and review authors who could read and understand that language. We resolved disagreements by discussion. If discussion could not resolve the disagreement, we consulted a third review author.

Two review authors carried out data extraction for each study. We assigned multiple studies whose first authors shared a last name to one extractor, so that they could detect preprints of articles that had since been peer-reviewed and published. We contacted study authors when we needed to check details and obtain missing information. Data were extracted on the country and region, the setting, the time period of the study, funding, and information needed for the Characteristics of included studies tables. Studies may have defined a positive test result as a decrease compared to normal values, as an increase compared to normal values, or as both an increase and a decrease. Where possible, we adapted the two-by-two tables in such a way that all studies included in the analyses reported on the same test positivity definition. However, if studies reported both an increase and a decrease as a positive test result, we included both.
We resolved disagreements by discussion between the two review authors, and two other review authors checked the results when these were entered into Review Manager 5.4 (Review Manager 2020).

Two review authors independently assessed risk of bias and applicability concerns using the Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) tool (Table 1). We resolved disagreements by discussion between three review authors. QUADAS-2 facilitates assessment across four domains: patient selection, index test, reference standard, and flow and timing (Whiting 2011). Each domain is assessed in terms of risk of bias and the first three domains are also assessed in terms of concerns regarding applicability. Signalling questions are included to help judge bias. Table 1 shows the definitions used for assessing the methodological quality.

Most routine laboratory tests provide test results as continuous measurements. That means that an explicit threshold is needed to provide positive and negative results for estimation of sensitivity and specificity. Some tests indicate disease if the value is decreased relative to the normal ranges, for other tests disease is indicated when the value is increased, and for some tests, both increase and decrease may indicate the presence of disease. For each test in each study, we reported the threshold used in our analyses, and whether an increase or a decrease in value was regarded as a positive test result. From each study, we included one threshold for each test. If multiple thresholds were reported, we chose the threshold that was most often used in the other studies. We presented the resulting sensitivity and specificity in forest plots. We reported median and interquartile range (IQR) of pre-test probability of the target condition in 2x2 tables from single-gate studies. We considered a meta-analysis appropriate when four or more studies reported on a particular test.
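The dichotomization step described above, turning a continuous marker into a 2x2 table once a threshold and a direction of positivity are fixed, can be illustrated with a minimal Python sketch. The function name `dichotomize`, the `TwoByTwo` class, the strict inequalities and the six example values are all assumptions made for illustration; they are not taken from the included studies or from the review's own analysis code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of index test results cross-classified against the reference standard."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def dichotomize(values: Sequence[float], has_condition: Sequence[bool],
                threshold: float, direction: str) -> TwoByTwo:
    """Build a 2x2 table from continuous results.

    direction="increase": a value above the threshold counts as positive.
    direction="decrease": a value below the threshold counts as positive.
    Strict inequalities are an assumption; studies may define this differently.
    """
    if direction not in ("increase", "decrease"):
        raise ValueError("direction must be 'increase' or 'decrease'")
    if len(values) != len(has_condition):
        raise ValueError("values and has_condition must have the same length")
    tp = fp = fn = tn = 0
    for value, diseased in zip(values, has_condition):
        positive = value > threshold if direction == "increase" else value < threshold
        if positive and diseased:
            tp += 1
        elif positive:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical lymphocyte counts (x 10^9 cells/L); invented values, not study data.
lymphocytes = [0.8, 0.9, 1.4, 1.2, 0.7, 1.6]
reference_positive = [True, True, True, False, False, False]
table = dichotomize(lymphocytes, reference_positive, threshold=1.0, direction="decrease")
print(table, round(table.sensitivity, 2), round(table.specificity, 2))
```

For a marker where both an increase and a decrease may count as positive, the same function would simply be called twice, once per direction, which mirrors how the review kept those definitions as separate tests.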
As studies mostly reported different thresholds for the same test, we used the hierarchical summary receiver operating characteristic (HSROC) model for meta-analyses to estimate summary curves, as recommended by the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Macaskill 2010). Since summary sensitivities and specificities are only clinically interpretable when the studies included in a meta-analysis use a common cut-off, we estimated sensitivity at points on the SROC curves corresponding to the median specificity observed in the studies included in the meta-analysis. The 'Summary of findings' table also reported the estimates for the first and third quartile specificity. Meta-analyses were undertaken in SAS 9.4, using PROC NLMIXED (SAS 2015).

In resource-limited situations, or in case SARS-CoV-2-specific tests are not available, routine laboratory tests may be the only tests available. In order to identify the most discriminative test in such a situation, we compared the diagnostic accuracy of biomarkers that had a sensitivity of at least 50% at a specificity of at least 50% (either median or IQR). We performed these analyses on all studies that evaluated one of these tests (indirect comparison). We performed additional analyses restricted to studies that made head-to-head comparisons (i.e. assessed two of the biomarkers in the same participants) when at least four studies were included that enabled these direct comparisons. We made test comparisons by adding a covariate for test type to the HSROC model to assess the effect of test type on the accuracy, cut-off or shape parameters of the model. In addition, whenever the estimated SROC curves had the same shape, we calculated the relative diagnostic odds ratio (RDOR) as a summary of the relative accuracy of the two biomarkers at hand. To assess the statistical significance of differences in test accuracy, we used likelihood ratio tests for comparisons of models with and without covariate terms.
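As a rough illustration of the quantities named above, the diagnostic odds ratio (DOR) of a single sensitivity-specificity pair, and the ratio of two such DORs, can be computed as in the following Python sketch. This is only the textbook point calculation under the assumption of fixed sensitivity and specificity values; in the review the RDOR is estimated within the HSROC model, not from point pairs, and the function names and example values here are hypothetical.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = odds of a positive test in cases / odds of a positive test in non-cases.

    Equivalent to (sens / (1 - sens)) * (spec / (1 - spec)).
    Undefined when either value is exactly 0 or 1, so those are rejected.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity / (1.0 - sensitivity)) * (specificity / (1.0 - specificity))


def relative_dor(dor_a: float, dor_b: float) -> float:
    """Ratio of two DORs; values above 1 favour test A over test B."""
    return dor_a / dor_b


# Illustrative values only, not estimates from the review.
uninformative = diagnostic_odds_ratio(0.50, 0.50)  # a test on the diagonal
moderate = diagnostic_odds_ratio(0.70, 0.70)
print(round(uninformative, 2), round(moderate, 2), round(relative_dor(moderate, uninformative), 2))
```

A DOR of 1 corresponds to a curve on the non-informative diagonal, which is why DORs close to 1 indicate a test with little discriminative value.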
If too few primary studies (n < 10) were available for the head-to-head comparison, we assumed the shape parameter of the model to be equal for the biomarkers under evaluation. We investigated sources of heterogeneity if adequate data were available, as listed in the Secondary objectives, either using stratification (where we believed it was inappropriate to combine studies) or through meta-regression models.

We developed a list of key findings in 'Summary of findings' tables and determined the certainty in the summary estimates for each test and finding, using the GRADE approach (Schünemann 2020a; Schünemann 2020b). Starting at high certainty, we downgraded meta-analyses by one level when at least half of the studies had high risk of bias on one or more domains; we downgraded for indirectness when at least half of the studies in the meta-analyses had high concerns regarding applicability on at least one domain; we downgraded for imprecision when fewer people with the target condition were included than would have been needed to achieve the sensitivity estimates listed, with a width of the confidence interval of at most 10 percentage points; and we downgraded for inconsistency when study estimates differed by more than 20 percentage points from each other. We did not consider publication bias to be a problem.

We will undertake the searches of published literature, preprints, and new test approvals weekly, and, dependent on the number of new and important studies found, we will consider updating each review with each search if resources allow.

The overall search for all reviews in this suite was done on 4 May 2020 and resulted in 10,965 records. The first selection resulted in 651 records that were potentially eligible for this review of routine laboratory tests. After title and abstract screening, we excluded 239 records, leaving 412 to be assessed on full text (Figure 1).
Of these, we removed 17 duplicates and preprints, 31 studies that were not in the scope of the review, 66 studies that did not contain original data and 7 studies that were retracted or otherwise no longer available. Of the remaining 291 studies, 246 studies only considered proven cases of COVID-19. These reported percentages of proven patients that had an increased or decreased biomarker level. We decided not to extract these data, as only the sensitivity of these markers would be estimable. Furthermore, the aim of these excluded studies was not to assess the accuracy of routine markers for COVID-19, but just to describe the findings or to assess the accuracy of markers to distinguish between mild and severe disease. The Characteristics of excluded studies table lists the 24 studies that included both patients with and without the target condition, but provided insufficient data to construct 2x2 tables to estimate sensitivity and specificity. The remaining 21 studies are included in this review.

Of the 21 included studies, 14 were single-gate studies (studies including patients with suspected COVID-19) and six were multiple-gate studies (including proven COVID-19 patients and, separately, one or more groups of non-COVID-19 patients). In the remaining study the design was unclear (Characteristics of included studies). The included studies comprised in total 14,126 COVID-19 patients and 56,585 people without COVID-19. They included a total of 67 laboratory tests. Eight studies were prepublications and 13 were published in peer-reviewed journals.

Of the 21 studies, four studies had low or unclear risk of bias on all domains; all other studies had high risk of bias for at least one domain (Figure 2). Six studies had low concerns regarding applicability for all domains.
Eleven studies were judged to have a high risk of bias with respect to the patient selection domain, mainly because they included separate groups of cases and non-cases. Six studies did not describe the order of inclusion of their participants and two did not include a random or consecutive sample. Five studies were case-control designs and in two studies the design was unclear. We judged risk of bias for patient selection unclear in four studies. We judged three studies as having a high risk of bias regarding the index test. In these studies the index test was either interpreted with knowledge of the reference standard or there was no predefined cut-off value. Fourteen studies used RT-PCR as a reference standard with SARS-CoV-2 infection as the target condition, and three used RT-PCR as a reference standard with COVID-19 as the target condition. Only four studies reported multiple tests (e.g. RT-PCR and CT scans) or criteria (e.g. the criteria of the National Health Commission China) as a reference standard for COVID-19 as the target condition. Flow and timing was unclear in the majority of studies (n = 12), because the time between the reference standard and index test was unclear.

None of the studies had low concerns regarding applicability for all domains. As the index tests consisted of routine laboratory measurements, we judged concerns regarding their applicability to be low for most studies. In some cases, studies used different cut-off values, leading to high concerns regarding applicability. As the focus of our review was COVID-19, we assessed the 14 studies that only used RT-PCR as a reference standard as having high concerns regarding applicability of the reference standard.
Below we describe the findings for tests assessed in four or more studies: white blood cell count increase and decrease, neutrophil count increase and decrease, monocyte count increase, lymphocyte count decrease, platelets decrease, alanine aminotransferase increase, aspartate aminotransferase increase, albumin decrease, total bilirubin, CRP increase, procalcitonin increase, IL-6 increase, creatine kinase increase, serum creatinine and lactate dehydrogenase increase. See Table 2 for an overview of tests and cut-off values per study. Summary of findings 1 shows the summary of findings for the individual tests, including sensitivity, specificity and diagnostic odds ratios (DORs). All HSROC curves were close to the non-informative diagonal, with DORs varying between 0.23 (95% confidence interval (CI) 0.07 to 0.78) and 4.53 (95% CI 1.89 to 10.88). As an indication, a test with a sensitivity of 70% and a specificity of 70% has a DOR of approximately 5.4.

Fifteen studies (1262 cases/5318 non-cases) reported on white blood cell count increase (Figure 3). The cut-off values for an increase in white blood cell count varied from 9.5 × 10⁹ cells/L to 11.2 × 10⁹ cells/L, with the exception of one study that used a cut-off value of 6.4 × 10⁹ cells/L. The median prevalence of COVID-19 in the 12 single-gate studies that reported on white blood cell count increase was 36% (IQR 25% to 50%).

Cochrane Database of Systematic Reviews

Sensitivity in the 15 included studies ranged from 0% to 73%. Fourteen studies had a sensitivity between 0% and 13% and one study reported a sensitivity of 73%. This outlier was also the only study that used the lower cut-off of 6.4 × 10⁹ cells/L. Specificity ranged from 54% to 96%. The median specificity was 85%, with the interquartile range from 78% (Q1) to 92% (Q3). The summary estimate of sensitivity from the HSROC model, corresponding to a specificity of 78%, was 12% (95% CI 4% to 31%).
The summary estimate of sensitivity corresponding to the median specificity of 85% was 6% (95% CI 2% to 17%), and the summary estimate of sensitivity corresponding to a specificity of 92% was 2% (95% CI 0% to 8%).

Eleven studies (1211 cases/3900 non-cases) reported on white blood cell count decrease (Figure 3). The cut-off values for a decrease in white blood cell count varied from 3.5 × 10⁹ cells/L to 4.0 × 10⁹ cells/L. The median prevalence of COVID-19 in the nine single-gate studies was … Sensitivity ranged from 0% to 68%; in 10 studies the sensitivity ranged between 0% and 18%, and one study reported a sensitivity of 68% (this outlier is probably due to the cut-off value of 4.6 × 10⁹ cells/L). Specificity ranged from 42% to 94%, with a median of 80% (IQR 66% to 86%). Meta-analysis yielded a sensitivity of 13% (95% CI 4% to 38%), 4% (95% CI 1% to 17%) and 2% (95% CI 0% to 12%) at fixed specificity of 66% (Q1), 80% (median) and 86% (Q3), respectively.

Four studies (220 cases/514 non-cases) reported on the accuracy of decrease in neutrophil count (Figure 4). The cut-off values for a decrease in neutrophil count varied from 1.8 × 10⁹ cells/L to 2 × 10⁹ cells/L. The median prevalence of COVID-19 in the three single-gate studies was 27% (IQR 24% to 34%). The sensitivity of the four studies ranged from 10% to 14% and specificity ranged from 89% to 95%. Meta-analysis yielded a sensitivity of 12% (95% CI 1% to 54%), 10% (95% CI 1% to 56%) and 8% (95% CI 1% to 54%) at a fixed specificity of 92% (Q1), 93% (median) and 94% (Q3), respectively.

Four studies (176 cases/107 non-cases) reported on the accuracy of increase in neutrophil percentage (Figure 4). The cut-off values for an increase in neutrophil percentage varied from 65.78% to 75.0%. The median prevalence of COVID-19 in the three single-gate studies was 67% (IQR 39% to 74%). The sensitivity of the four studies ranged from 14% to 68% and specificity ranged from 36% to 65%.
Meta-analysis yielded a sensitivity of 62% (95% CI 1% to 100%), 59% (95% CI 1% to 100%) and 44% (95% CI 1% to 99%) at fixed specificity of 37% (Q1), 38% (median) and 45% (Q3), respectively.

Four studies (126 cases/332 non-cases) reported on monocyte increase (Figure 5). The cut-off values for an increase in monocyte count varied from 0.00 cells/L to 0.8 cells/L. The median prevalence of COVID-19 in the two single-gate studies was 73%. Sensitivity ranged from 10% to 14%; specificity ranged from 56% to 89%. Meta-analysis yielded a sensitivity of 14% (95% CI 6% to 30%), 13% (95% CI 6% to 26%) and 12% (95% CI 7% to 20%) at fixed specificity of 67% (Q1), 73% (median) and 80% (Q3), respectively.

Four studies (190 cases/177 non-cases) reported on decrease in lymphocyte percentage (Figure 6). The cut-off values for a decrease in lymphocyte percentage ranged from 20% to 23.65%. The median prevalence of COVID-19 in the 11 single-gate studies was 37% (27% to 65%), with sensitivity ranging from 0% to 79% and specificity from 27% to 65%. Meta-analysis yielded a sensitivity of 70% (95% CI 0% to 100%), 35% (95% CI 0% to 99%) and 14% (95% CI 0% to 99%) at fixed specificity of 34% (Q1), 50% (median) and 63% (Q3), respectively.

Four studies (939 cases/3232 non-cases) reported on decrease in platelets (Figure 7). The cut-off values for a decrease in platelets ranged from 0.00 to 300.0 per microlitre. The median prevalence of COVID-19 in the three single-gate studies was 76% (38% to 87%), with sensitivity ranging from 13% to 30% and specificity from 71% to 100%. Meta-analysis yielded a sensitivity of 23% (95% CI 13% to 38%), 19% (95% CI 10% to 32%) and 16% (95% CI 7% to 31%) at fixed specificity of 83% (Q1), 88% (median) and 92% (Q3), respectively.

Figure 8. Summary ROC plot of tests: alanine aminotransferase (ALT) increase, aspartate aminotransferase (AST) increase.

Seven studies (1260 cases/3631 non-cases) reported on AST increase (Figure 8). The cut-off values of AST increase varied from 35 U/L to 40 U/L. The median prevalence of COVID-19 in the six single-gate studies was 53% (IQR 29% to 68%). Sensitivity ranged from 15% to 38%, and specificity from 78% to 100%. Meta-analysis yielded a sensitivity of 32% (95% CI 17% to 52%), 29% (95% CI 17% to 45%) and 17% (95% CI 8% to 33%) at fixed specificity of 79% (Q1), 81% (median) and 88% (Q3), respectively.

Four studies (799 cases/3273 non-cases) reported on albumin decrease (Figure 9). The cut-off values of albumin decrease varied from 0 to 3.5 g/L. The median prevalence of COVID-19 in the three single-gate studies was 75% (IQR 51% to 87%). Sensitivity ranged from 4% to 55%, and specificity from 16% to 87%. Meta-analysis yielded a sensitivity of 36% (95% CI 7% to 82%), 21% (95% CI 3% to 67%) and 13% (95% CI 1% to 64%) at fixed specificity of 46% (Q1), 66% (median) and 79% (Q3), respectively.

Four studies (333 cases/438 non-cases) reported on total bilirubin increase (Figure 9). The cut-off varied from 0 to 21 µmol/L. The median prevalence of COVID-19 in the four single-gate studies was 51% (IQR 25% to 61%). Sensitivity ranged from 3% to 9% and specificity ranged from 77% to 97%. Meta-analysis yielded a sensitivity of 23% (95% CI 14% to 35%), 12% (95% CI 3% to 34%) and 4% (95% CI 0% to 41%) at fixed specificity of 85% (Q1), 92% (median) and 97% (Q3), respectively.

Fourteen studies (997 cases/1284 non-cases) reported on CRP increase (Figure 10). The cut-off values for an increase in CRP varied from 8 mg/L to 34.8 mg/L. The median prevalence of COVID-19 in the 11 single-gate studies was 51% (IQR 28% to 60%).
Sensitivity ranged from 0% to 95%, with one outlier of 0% (based on two COVID-19 cases), and the other 13 studies ranging from 31% to 95%. Specificity ranged from 20% to 81%. Meta-analysis yielded …

Six studies (607 cases/738 non-cases) reported on procalcitonin increase (Figure 10). The cut-off values for an increase in procalcitonin varied from 0.1 ng/mL to 0.5 ng/mL. The median prevalence of COVID-19 in the five studies was 38% (IQR 31% to 70%). Sensitivity ranged from 0% to 48%. Specificity ranged from 26% to 95%. Meta-analysis yielded a sensitivity of 14% (95% CI 3% to 48%), 3% (95% CI 1% to 19%) and 1% (95% CI 0% to 10%) at fixed specificity of 66% (Q1), 86% (median) and 95% (Q3), respectively.

Four studies (86 cases/130 non-cases) reported on IL-6 increase (Figure 11). The cut-off values for an increase in IL-6 varied from 0 to 7 pg/mL. The median prevalence of COVID-19 in the four studies was 84% (IQR 65% to 94%). Sensitivity ranged from 22% to 86%. Specificity ranged from 27% to 92%. Meta-analysis yielded a sensitivity of 83% (95% CI 47% to 96%), 73% (95% CI 36% to 93%) and 59% (95% CI 25% to 86%) at fixed specificity of 42% (Q1), 58% (median) and 74% (Q3), respectively.

Creatine kinase is a marker of muscle damage. It is sometimes used as an indicator of myocardial infarction. Five studies (575 cases/498 non-cases) reported on creatine kinase increase (Figure 12). The cut-off values for an increase in creatine kinase were between 174 µmol/L and 310 µmol/L. The median prevalence of COVID-19 in the five single-gate studies was 55% (IQR 37% to 70%). Meta-analysis yielded a sensitivity of 15% (95% CI 10% to 22%), 11% (95% CI 6% to 19%) and 7% (95% CI 2% to 20%) at fixed specificity of 88% (Q1), 94% (median) and 98% (Q3), respectively.

Serum creatinine is an indicator of kidney damage.
Four studies (1005 cases/3311 non-cases), all of single-gate design, reported on serum creatinine increase (Figure 12). The cut-off values for an increase in serum creatinine were between 73 µmol/L and 133 µmol/L. The prevalence in the four studies was 16%, 66%, 38% and 75%. Meta-analysis yielded a sensitivity of 15% (95% CI 2% to 63%), 7% (95% CI 1% to 37%) and 3% (95% CI 0% to 36%) at fixed specificity of 76% (Q1), 91% (median) and 97% (Q3), respectively.

LDH is a general marker for tissue damage. Five studies (382 cases/431 non-cases) reported on LDH increase (Figure 12) … 40% to 71%). Sensitivity ranged from 14% to 32% and specificity ranged from 61% to 100%. Meta-analysis yielded a sensitivity of 26% (95% CI 15% to 42%), 25% (95% CI 15% to 38%) and 22% (95% CI 11% to 40%) at fixed specificity of 69% (Q1), 72% (median) and 77% (Q3), respectively.

For three tests, we found a pair of sensitivity and specificity where both sensitivity and specificity exceeded 50%. These were IL-6 increase, CRP increase and lymphocyte count decrease. Using all available studies in an indirect comparison (i.e. unrestricted to head-to-head studies), we compared the test performance of IL-6 increase (4 studies), CRP increase (14 studies) and lymphocyte count decrease (13 studies) in one meta-regression analysis. The shapes of the SROC curves differed significantly (P < 0.001). Figure 13 shows the summary ROC curves for the three tests in one figure (Summary of findings 2). The median specificity in the 19 studies evaluating one or more of the three tests was 52% (IQR 34% to 67%). Within the specificity interquartile range, sensitivity varied between 6% (95% CI 0% to 49%) and 100% (95% CI 22% to 100%) for lymphocyte count decrease, between 51% (95% CI 34% to 68%) and 73% (95% CI 64% to 80%) for CRP increase, and between 67% (95% CI 51% to 79%) and 73% (95% CI 45% to 79%) for IL-6 increase.
Nine studies directly compared CRP increase with lymphocyte count decrease for the detection of COVID-19. Especially for lymphocyte count decrease, this direct comparison (Figure 14) shows a different picture from the indirect comparisons (Figure 13) or the separate analyses (Figure 6). Despite differences in cut-offs, the results from most studies were consistent, with CRP increase showing higher sensitivity than lymphocyte count decrease. Overall accuracy was higher for CRP increase than for lymphocyte count decrease. However, both tests are close to the diagonal line corresponding with an uninformative test.

We included 21 studies in this review and analyzed the results for 67 different routine laboratory tests, focusing on diagnosing COVID-19. For 16 tests, we have summarized the results in a meta-analysis. As the majority of the included studies only reported RT-PCR as a reference standard, the meta-analyses may be more applicable to detecting SARS-CoV-2 infection than COVID-19 disease. Only three tests performed at sensitivity-specificity combinations where both sensitivity and specificity were above 50%. There was low to very low certainty in the summary estimates of the tests.

The low accuracy of these tests does not render them useless. They are all indicators of the general health status of a patient. They may indicate infection, inflammation, or tissue damage and thus support diagnoses made based on other diseases. However, evidence to date suggests that in sick hospitalized patients, they cannot discriminate between COVID-19 and other diseases as the cause of infection, inflammation or tissue damage and should preferably not be used as stand-alone tests for COVID-19. As a triage test would require a high sensitivity (> 80%), these tests have limited use as triage tests. How these tests would perform in those with milder symptoms cannot be inferred from our data.
In some situations, where resources are very limited, these tests are the only ones at hand when making a diagnosis. In these situations, it may be worthwhile to consider the three tests with a slightly better performance than the others: lymphocyte count decrease, IL-6 increase and CRP increase. These tests are also available as point-of-care tests, although that is not how they were used in the included studies, so any inference should be made with caution. Of those three, IL-6 has the highest summary sensitivity at the highest median specificity. Both the median specificity and the boundary of the third quartile were above 50% (58% and 74% respectively). If we chose to use the test at a higher specificity of 74%, then the sensitivity would only be 59% (95% CI 25% to 86%). When testing 1000 people using this cut-off value, at 5% pre-test probability, 29 or 30 out of 50 cases would have a true positive result and be contained or put in quarantine, and 20 or 21 out of 50 cases would be sent home, possibly infectious. It would also mean that of the 950 non-cases, 247 would be considered to be positive, while they are not. Using the test at a lower cut-off value to increase sensitivity would decrease specificity even further.

The median pre-test probability of all included studies was 36% and most patients were hospitalized. In such a scenario, when testing 1000 people with IL-6 at a specificity of 74% and a sensitivity of 59%, 212 out of 360 cases would have a true positive result and be contained or put in quarantine, and 148 out of 360 cases would be sent home or to a non-COVID-19 ward, possibly infectious. It would also mean that of the 640 non-cases, 166 would be considered to be positive, while they are not.

Nine studies directly compared leukocyte count increase and CRP increase.
From the meta-analysis including these two tests, we found that CRP is more accurate than leukocyte count increase, but as explained above, the point estimates do require caution when using the tests as sole markers. Furthermore, we did not assess the quality of the comparisons made in the included studies.

We assessed the diagnostic accuracy of a broad spectrum of routine laboratory tests for COVID-19. Included studies demonstrated considerable heterogeneity in the accuracy of many biomarkers, and used cut-off values and reference standards that were, in many cases, suboptimally described. The current review included a range of different cut-off values for most index tests, which we took into account using HSROC analyses and pooling studies with similar cut-off values for a given laboratory marker. A limitation is suboptimal reporting that hampered assessment of the QUADAS-2 flow and timing domain in many studies. In many instances the timing of index test and reference standard was unclear, which could have led to unreliable results concerning the diagnostic abilities of the tests. While most studies used RT-PCR as reference standard, some used a combination of RT-PCR and signs and symptoms or other tests. This potentially introduced heterogeneity because patients were classified as cases or controls differently under the different reference standards. Some tests of interest, such as d-dimer or cardiac markers, were evaluated in too few studies to meta-analyse their results.

We retrieved information on multiple index tests. The availability of laboratory tests is dependent on the type of hospital, department and available resources of the place in which the test is to be performed. In order to make the findings suitable for different settings we have included a broad range of biomarkers and settings. We did not find studies that included participants in a primary care or general population setting.
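The natural-frequency scenarios given earlier in this discussion for IL-6 (1000 people tested at a sensitivity of 59% and a specificity of 74%, at pre-test probabilities of 5% and 36%) follow from simple arithmetic, which the Python sketch below reproduces. The function name `natural_frequencies` and its return format are assumptions made for illustration; the input values are the ones quoted in the text, and the outputs are unrounded expected counts rather than whole people.

```python
def natural_frequencies(n_tested: int, pre_test_probability: float,
                        sensitivity: float, specificity: float) -> dict:
    """Expected 2x2 counts when n_tested people are tested.

    Returns unrounded expected counts; rounding to whole people is left
    to the reader, as in the text ("29 or 30 out of 50 cases").
    """
    cases = n_tested * pre_test_probability
    non_cases = n_tested - cases
    true_positives = cases * sensitivity
    false_negatives = cases - true_positives
    false_positives = non_cases * (1.0 - specificity)
    true_negatives = non_cases - false_positives
    return {
        "cases": cases,
        "non_cases": non_cases,
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "true_negatives": true_negatives,
    }


# Values quoted in the text: IL-6 increase at the third-quartile specificity.
low_prevalence = natural_frequencies(1000, 0.05, sensitivity=0.59, specificity=0.74)
hospital = natural_frequencies(1000, 0.36, sensitivity=0.59, specificity=0.74)

for label, result in (("5% pre-test probability", low_prevalence),
                      ("36% pre-test probability", hospital)):
    print(label, {key: round(value, 1) for key, value in result.items()})
```

At 5% pre-test probability the expected false positives (about 247) far outnumber the expected true positives (about 29.5), which is the practical reason a moderate specificity is costly when the condition is uncommon among those tested.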
In clinical practice, not a single test, but the results of a combination of tests might be important for diagnosing COVID-19. These tests can be used for the first triage of patients in case of limited access to diagnostic tests, after which further testing can be done at a later stage. For triage tests, a high sensitivity is important to safely rule out the disease; however, all tests had a low sensitivity. Also, the cut-off values used may differ by clinic and location, which could lead to different treatment decisions if a single patient were tested in different settings. In this review we included all the different cut-off points available in the current literature. Lastly, the reference standard in most studies was RT-PCR only, which means that there are concerns regarding applicability of the results of this review to COVID-19 as a target condition. However, the reporting of the studies was unclear and sometimes confusing. It is therefore possible that, in practice, the studies also used other criteria to establish the diagnosis, but that this was not reported or was insufficiently reported.

None of these markers as stand-alone tests are useful for accurately ruling in or ruling out COVID-19. As a triage test would require a high sensitivity (> 80%), these tests have limited value as triage tests. Although there is low or very low certainty about the summary estimates in this review, we do not expect that studies with a low risk of bias will show a better performance than the tests included.

Future studies focusing on the usefulness of routine laboratory tests for COVID-19 may consider a more representative sample of the population, focus on markers with prespecified, clinically sound cut-offs, and evaluate not only single markers but also combinations of regular blood markers.
Furthermore, considering the test results as continuous values may be more informative, as larger deviations from the reference values will have greater impact on the health status of the tested people, and might enable more personalized treatment. (Table 2) Blood routine examination results were before hospitalization, first enzyme level test results after hospitalization of these 2 groups; person doing the testing not stated. Hospital lab technicians processed samples. Thresholds for positivity or negativity were not reported but we assumed that the same thresholds were used as in Ai 2020b, which was a study on the same 102 participants with COVID-19. Target condition and reference standard(s) Reference standard: RT-PCR was used to confirm cases. For some cases, RT-PCR was repeated 5 times before a positive test was confirmed. Sample not reported. Hence target condition was SARS-CoV-2 infection. Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? High risk Patient Sampling Patients suspected of having SARS-CoV-2 pneumonia and hospitalized at Chongqing Three Gorges Central Hospital from 26 to 31 January 2020 were included in our study. Are there concerns that the included patients and setting do not match the review question? 
Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Were all patients included in the analysis? Yes Exposure history: history of sojourn or residence: 57.1% for cases and 21.1% for controls. History of contact with confirmed patient: cases: 28.6% and controls 5.3%. History of contact with person who had fever or respiratory symptoms: cases 14.3% and controls 57.9% Time since onset of symptoms: not reported. Days from illness onset to first admission: median 5 days for cases and 1 day for controls Index tests Routine laboratory tests (Table 2) Lymphocyte count (LYMPH#), CRP and IL-6 were evaluated on admission. Lymphopenia (< 1.0 × 10 9 /L) was 1 of the 3 diagnostic criteria for S-COVID-19-P according to the 6th-Guidelines-CNHHC. Elevated CRP (> 0.8 mg/L) and elevated IL-6 (> 5.9 pg/mL) were both important infection-related biomarkers Cochrane Database of Systematic Reviews Was a consecutive or random sample of patients enrolled? Yes Was a case-control design avoided? Yes Did the study avoid inappropriate exclusions? Could the selection of patients have introduced bias? Are there concerns that the included patients and setting do not match the review question? 
Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? Yes Feng 2020 (Continued) Routine laboratory testing to determine if a patient has COVID-19 ( Were all patients included in the analysis? Yes Feng 2020 (Continued) (Table 2) Blood samples were collected on the same day of the rRT-PCR test. CRP, AST, ALT, GGT, ALP and LDH were measured on a Roche Cobas 8000 device (Roche Diagnostic, Basel, Switzerland) using either a spectrophotometric assay (AST, ALT and LDH), a colorimetric assay (ALP and GGT) or an immunoturbidimetric assay (CRP). WBC, platelets and the leukocyte formula were measured on Sysmex XE 2100 (Sysmex, Japan). Reference standard: rRT-PCR was performed on a Roche Cobas Z480 thermocycler (Roche Diagnostic, Basel, Switzerland) using the Roche-provided Tib-Molbiol's 2019-nCoV Real-Time Reverse Transcription PCR Kit. RNA purification was performed using the Roche Magna pure system. Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? 
Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Yes Did all patients receive the same reference standard? • WBC count increased (11.2 x 10 9 /L) • WBC count decreased (3.6 x 10 9 /L) • Lymphocyte count decreased (1.0 x 10 9 /L) • CRP increased (10 mg/L) For all tests • Sample: blood product, whole blood (not reported, but otherwise WBC impossible) • Test interpreter: not reported • Timing of testing: not reported Target condition and reference standard(s) RT-PCR (conducted multiple times in each participant; at least upon admission and 24h after admission, and for some participants even every few days). Target condition was SARS-CoV-2 infection. Was a consecutive or random sample of patients enrolled? Yes Was a case-control design avoided? Yes Did the study avoid inappropriate exclusions? Yes Are there concerns that the included patients and setting do not match the review question? Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? 
Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Library Trusted evidence. Informed decisions. Better health. Patient Sampling Children with confirmed 2019-nCoV pneumonia (cases) admitted between 24 January and 22 February 2020 and children with RSV pneumonia (controls) admitted between 10 December 2019 and 22 February 2020 in Wuhan Children's hospital and patients who underwent the detection of peripheral blood lymphocyte subsets were included in the study. Previously healthy children were included in the study, and children receiving chemotherapy, treatment of glucocorticoids or immunosuppressant before the diagnosis of the pneumonia were not included in the study as their immune response to viral infections might be different. Are there concerns that the included patients and setting do not match the review question? Were the index test results interpreted without knowledge of the results of the reference standard? No If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Unclear Were all patients included in the analysis? 
Yes Unclear risk Patient Sampling Pregnant women who were admitted into the Hubei Provincial Maternal and Child Health Center, during 24 January-29 February 2020. The study also included suspected patients with typical chest CT imaging but negative in RT-PCR tests. Eleven pregnant women who were tested positive for SARS-CoV-2 were classified as laboratory-confirmed case group, and eighteen with typical chest CT imaging but tested negative in RT-PCR tests as suspected case group. The control group of pregnant women without pneumonia during hospital stay were randomly selected from the medical records by an investigator (MP), who was not involved in statistical analysis. Only those aged 25-35 years were selected to match the age range of cases. 121 women admitted during 24 January-11 February 2019 (control 2019 group) Patient characteristics and setting Pregnant women (and therefore high concern regarding applicability) Are there concerns that the included patients and setting do not match the review question? Were the index test results interpreted without knowledge of the results of the reference standard? No If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Low concern DOMAIN 4: Flow and Timing Are there concerns that the included patients and setting do not match the review question? Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified?
Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 3: Reference Standard Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? For all index tests, see Table 2 Target condition and reference standard(s) RT-PCR. Laboratory testing of 2019-nCoV in throat swabs was performed by both Beijing Centers for Disease Control and Prevention Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Yes Time since onset of symptoms was 7.0 (3.5-9.0) days (confirmed group) and 6.0 (4.0-9.0) days (unconfirmed group). 
Compared with participants in the unconfirmed group, participants in the confirmed group had a significantly higher proportion of Wuhan residence history, having visited Wuhan, clustering diseases and dry cough. Index tests WBC count, PCT, ALT, LDH, creatine kinase, troponin I. Table 2 Target condition and reference standard(s) RT-PCR. Sample: nasopharyngeal swabs or sputum specimens; the confirmed group was defined as a positive result of at least 1 RT-PCR test for SARS-CoV-2. Unclear if study was a 2-gate design or a single-gate design, but the way the methods and results are described, we assumed a single-gate design. Patient characteristics and setting 19 COVID-19 patients and 15 non-COVID-19 patients from the Second Affiliated Hospital of Anhui Medical University and Suzhou Municipal Hospital in Anhui province, China were included in this study. The mean age was 48 (IQR 27~56) and 35 (IQR 27~46) in COVID-19 and non-COVID-19 patients, respectively. 8 (42.11%) were female in COVID-19 patients, and 9 (60%) in non-COVID-19 patients. The median duration from exposure to onset was 8 (IQR 6~11) and 5 (IQR 4~11) days in COVID-19 and non-COVID-19 patients, respectively. All participants had a history of exposure to a confirmed case of 2019-nCoV or travel to Hubei before illness. Index tests Index tests done: WBC and lymphocyte count, neutrophil count, AST; ALT; LDH; GGT; α-hydroxybutyric dehydrogenase; CK; CRP and IL-6. Tests were done on admission (4-5 days from onset); the person doing the testing is not stated. As WBC was assessed, sample must have been whole blood. Target condition and reference standard(s) COVID-19 cases were confirmed to be infected with or without 2019-nCoV by real-time RT-PCR. Non-COVID-19 was defined to be 2019-nCoV negative by PCR detection. For non-COVID-19 confirmation, we collected a throat swab or sputum sampling every other day.
The patient was confirmed as non-COVID-19 if 3 consecutive real-time PCR tests were negative during first 7 days of admission Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Yes Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standards likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Were all patients included in the analysis? Unclear Country: China Symptoms and severity: there were 6 (19%) smokers among diagnosed participants and 13 (15%) among negative cases. 7 (22%) diagnosed and 15 (18%) negative cases had hypertension. There were no other commonly found comorbidities in either group. Demographics: median age 40 (IQR 27-53); 46% male Exposure history: there was no specific exposure history common to all participants with suspected disease: 8 (25%) diagnosed participants had visited Wuhan in the previous 2 weeks and 12 (38%) had been exposed to participants with infection in the previous 2 weeks. In negative cases, these numbers were 7 (20%) and 8 (24%), respectively. None of the participants had a history of exposure to the seafood market in Wuhan. 
Time since onset of symptoms: median 5 days (IQR 2-7 days) Clinical and laboratory data on admission were obtained from detailed medical records, collected in a standardized case report form by 2 experienced emergency doctors. Laboratory tests included a complete blood count, serum biochemistry, IL-6 test, CK test, LDH test, and tests for the identification of other respiratory pathogens Timing of tests not reported; blinding not reported Target condition and reference standard(s) A nucleic acid amplification test was performed on swab specimens from participants with suspected disease at admission. Participants with a positive diagnosis were admitted to the hospital, while participants with a negative initial result were kept in quarantine and underwent a second nucleic acid test after 24 h; of these, participants with a second negative result on the nucleic acid test were considered to not have an infection and were discharged from the hospital once they tested negative for SARS-CoV-2 antigens on 2 consecutive tests. WBC increase 9.5 9.5 10 10 11.2 9.5 10 9.5 10 10 NR 10 6.44 Child with cough or difficulty in breathing, plus at least one of the following: central cyanosis or SpO2 < 90%; severe respiratory distress (for example, grunting, very severe chest indrawing); signs of pneumonia with a general danger sign: inability to breastfeed or drink Other signs of pneumonia may be present: chest indrawing, fast breathing (in breaths/minute): aged < 2 months: ≥ 60; aged 2 to 11 months: ≥ 50 X-ray, computer tomography scan, or lung ultrasound): bilateral opacities, not fully explained by volume overload, lobar or lung collapse Origin of pulmonary infiltrates: respiratory failure not fully explained by cardiac failure or fluid overload.
Need objective assessment (for example, echocardiography) to exclude hydrostatic cause of infiltrates/oedema if no risk factor present 200 mmHg < ratio of arterial oxygen partial pressure/fractional inspired oxygen (PaO2/FiO2) ≤ 300 mmHg • moderate ARDS: 100 mmHg < PaO2/FiO2 ≤ 200 mmHg (with PEEP ≥ 5 cmH2O, or non-ventilated) PaO2/FiO2 ≤ 100 mmHg (with PEEP ≥ 5 cmH2O, or non-ventilated) • when PaO2 is not available, SpO2/FiO2 ≤ 315 mmHg suggests ARDS Use PaO2-based metric when available. If PaO2 not available, wean FiO2 to maintain SpO2 ≤ 97% to calculate OSI or SpO2/FiO2 ratio: • bilevel (non-invasive ventilation or CPAP) ≥ 5 cmH2O via full-face mask: PaO2/FiO2 ≤ 300 mmHg or SpO2/FiO2 ≤ 264 Study selection, data-extraction and quality assessment, multiple revisions of the review Jane Cunningham contributed clinical, methodological and/or technical expertise to drafting the protocol Clare Davenport contributed clinical, methodological and/or technical expertise to drafting the protocol contributed clinical, methodological and/or technical expertise to drafting the protocol; contributed to multiple revisions of the review and co-ordinated all contributions to all Cochrane Rapid DTA reviews Jacqueline Dinnes contributed clinical, methodological and/or technical expertise to drafting the protocol; did the initial screening of titles and abstracts for all reviews Sabine Dittrich contributed clinical, methodological and/or technical expertise to drafting the protocol Devy Emperador contributed clinical, methodological and/or technical expertise to drafting the protocol Lotty Hooft contributed clinical, methodological and/or technical expertise to drafting the protocol René Spijker contributed clinical, methodological and/or technical expertise to drafting the protocol; co-ordinated and conducted the study retrieval and initial selection steps Yemisi Takwoingi contributed clinical, methodological and/or technical expertise to drafting the protocol;
supervised the meta-analyses Ann Van den Bruel contributed clinical, methodological and/or technical expertise to drafting the protocol Junfeng Wang translated articles from Chinese to English whenever necessary; retrieved articles in Chinese; extracted data from and assessed quality of Chinese language articles Study selection, data-extraction and quality assessment, multiple revisions of the review Verbakel: Study selection, data-extraction and quality assessment, meta-analyses; multiple revisions of the review Leeflang contributed clinical, methodological and/or technical expertise to drafting the protocol; drafted the QUADAS-2 criteria; co-ordinated the review process; overall supervision has provided freelance consultancy for approved professional organizations and learned societies (physiotherapists, optometrists, opticians), and has no known conflicts of interest in relation to this review none known Fatuma Guleid: none known Holtman: none known Bada Yang: none known Jane Cunningham: none known Clare Davenport: none known Jacqueline Dinnes: none known Sabine Dittrich: is employed by FIND. FIND has several clinical research projects to evaluate multiple new diagnostic tests against published Target Product Profiles that have been defined through consensus processes. These studies are for diagnostic products developed by Devy Emperador: is employed by FIND. FIND has several clinical research projects to evaluate multiple new diagnostic tests against published Target Product Profiles that have been defined through consensus processes.
These studies are for diagnostic products developed by private sector companies who provide access to know-how, equipment/reagents, and contribute through unrestricted donations as per FIND policy and external SAC review Lotty Hooft: none known René Spijker: the Dutch Cochrane Centre (DCC) has received grants for performing commissioned systematic reviews Yemisi Takwoingi: none known Ann Van den Bruel: none known Junfeng Wang: has received consultancy fee from Biomind, an Artificial Intelligence (AI) company providing machine intelligence solutions in medical imaging. The consultancy service was about design of clinical studies none known Jan Verbakel: none known Commonwealth and Development Office (FCDO) We intended to include studies that recruited only COVID-19 cases, to estimate sensitivity or those restricted to people without COVID-19, to estimate specificity (Deeks 2020a). We decided to deviate from this rule as the added value of such studies for our review is questionable We planned to investigate test accuracy, either by stratified analysis or meta-regression, according to a specific measurement or biomarker, days of symptoms, severity of symptoms, reference standard, sample type, study design, and setting We did not specify some details about the analyses in our protocol. We chose to present sensitivity and median interquartile range values for cut-offs of specificity Diagnostic Tests, Routine [*methods Leukocyte Count; Liver Function Tests Lymphocyte Count; Pandemics; Platelet Count Reverse Transcriptase Polymerase Chain Reaction ROC Curve; SARS-CoV-2 [*isolation & purification]; Sensitivity and Specificity Are there concerns that the included patients and setting do not match the review question? Were the index test results interpreted without knowledge of the results of the reference standard? No If a threshold was used, was it pre-specified?
Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Low concern DOMAIN 3: Reference Standard Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Were the reference standard results interpreted without knowledge of the results of the index tests? Unclear Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? No Did all patients receive the same reference standard? Yes Were all patients included in the analysis? Unclear Rentsch 2020 (Continued) Patient Sampling Inclusion criteria of the patients suspected of moderate type novel coronavirus pneumonia for this study are: • exposure history • presenting with fever or respiratory symptoms, or normal or decreased WBC count at the early stage, or decreased lymphocyte count • radiological features of novel coronavirus pneumonia Exclusion criteria are: • respiratory rate ≥ 30/min • peripheral oxygen saturation ≤ 93% when at rest • shock • need for mechanical ventilation or ICU care • organ failure. In this study, the participants suspected of moderate type novel coronavirus pneumonia confirmed with positive nucleic acid tests were designated as the study group and the ones with negative findings as the control group.
Duration 31 January-11 February 2020 Patient characteristics and setting Setting: triaged for admission to the Southeast Hospital of Xiaogan Central Hospital from the fever clinics of Xiaogan Central Hospital, Xiaogan First People's Hospital and Hubei Aerospace Hospital. From 31 January-11 February 2020 Country: China Severity: none of the participants were severely or critically ill Demographics: in cases, 51% were male and in controls 48% were male; mean age was 49.2 years +/- 13.7 (95% CI 48-50) Exposure status: more than half were exposed to travellers from Wuhan Time since onset of symptoms: mean 4.6 days from onset of symptoms (+/- 2.9); 0.22% died If a threshold was used, was it pre-specified? Could the conduct or interpretation of the index test have introduced bias? Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Were all patients included in the analysis? Could the patient flow have introduced bias? Yang 2020b (Continued) Were the index test results interpreted without knowledge of the results of the reference standard? Unclear If a threshold was used, was it pre-specified? Unclear Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests?
Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Are there concerns that the included patients and setting do not match the review question? Low concern DOMAIN 2: Index Test (All tests) Were the index test results interpreted without knowledge of the results of the reference standard? If a threshold was used, was it pre-specified? Are there concerns that the index test, its conduct, or interpretation differ from the review question? Is the reference standard likely to correctly classify the target condition? Were the reference standard results interpreted without knowledge of the results of the index tests? Could the reference standard, its conduct, or its interpretation have introduced bias? Are there concerns that the target condition as defined by the reference standard does not match the question? Zhu 2020 (Continued) Was there an appropriate interval between index test and reference standard? Unclear Did all patients receive the same reference standard? Were all patients included in the analysis? Could the patient flow have introduced bias? The focus will be on the diagnosis of COVID-19 pneumonia or infection with SARS-CoV-2. For this protocol, the focus will not be on prognosis. Was a consecutive or random sample of patients enrolled? This will be similar for all index tests, target conditions, and populations. YES: if a study explicitly stated that all participants within a certain time frame were included; that this was done consecutively; or that a random selection was done. NO: if it was clear that a different selection procedure was employed; for example, selection based on clinician's preference, or based on institutions. UNCLEAR: if the selection procedure was not clear or not reported.
This will be similar for all index tests, target conditions, and populations. YES: if a study explicitly stated that all participants came from the same group of (suspected) patients. NO: if it was clear that a different selection procedure was employed for the participants depending on their COVID-19 (pneumonia) status or SARS-CoV-2 infection status. UNCLEAR: if the selection procedure was not clear or not reported. Studies may have excluded patients, or selected patients in such a way that they avoided including those who were difficult to diagnose or likely to be borderline. Although the inclusion and exclusion criteria will be different for the different index tests, inappropriate exclusions and inclusions will be similar for all index tests: for example, only elderly patients excluded, or children (as sampling may be more difficult). This needs to be addressed on a case-to-case basis. YES: if a high proportion of eligible patients was included without clear selection. NO: if a high proportion of eligible patients was excluded without providing a reason; if, in a retrospective study, participants without index test or reference standard results were excluded; if exclusion was based on severity assessment post factum or comorbidities (cardiovascular disease, diabetes, immunosuppression). UNCLEAR: if the exclusion criteria were not reported. Some laboratory studies may have intentionally included groups of patients in whom the accuracy was likely to differ, such as those with particularly low or high viral loads, or who had other diseases, such that the sample overrepresented these groups. This needs to be addressed on a case-to-case basis. Artificial spiked samples are a clear example. YES: if samples included were likely to be representative of the spectrum of disease. NO: if the study oversampled patients with particular characteristics likely to affect estimates of accuracy. UNCLEAR: if the exclusion criteria were not reported.
HIGH: if one or more signalling questions were answered with NO, as any deviation from the selection process may lead to bias.

design, or in an already highly selected group of participants, or the study was only able to estimate sensitivity or specificity.
LOW: any situation where signs and symptoms were the first assessment/test to be done on the included participants.
UNCLEAR: if a description about the participants was lacking.

an already highly selected group of participants.
LOW: any situation where generic laboratory tests were among the first tests to be done on the included participants.
UNCLEAR: if a description about the participants was lacking.

Were the index test results interpreted without knowledge of the results of the reference standard?
This will be similar for all index tests, target conditions, and populations.
YES: if blinding was explicitly stated or the index test was recorded before the results from the reference standard were available.
NO: if it was explicitly stated that the index test results were interpreted with knowledge of the results of the reference standard.
UNCLEAR: if blinding was unclearly reported.

If a threshold was used, was it prespecified?
This will be similar for all index tests, target conditions, and populations.
YES: if the test was dichotomous by nature, or if the threshold was stated in the methods section, or if authors stated that the threshold as recommended by the manufacturer was used.
NO: if a receiver operating characteristic curve was drawn or multiple thresholds were reported in the results section, and the final result was based on one of these thresholds; if fever was not defined beforehand (in review # 4, Signs and symptoms).
UNCLEAR: if threshold selection was not clearly reported.

HIGH: if one or more signalling questions were answered with NO, as even in a laboratory situation knowledge of the reference standard may lead to bias.
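The HIGH rule stated above (one or more signalling questions answered NO) can be sketched in Python as a rough illustration. The names `Answer`, `Risk` and `domain_risk_of_bias` are hypothetical and do not come from the review. The text only states the HIGH rule; treating "all YES" as LOW and any remaining mix as UNCLEAR is an assumption made for this sketch, not a rule given by the review authors.

```python
from enum import Enum
from typing import Iterable


class Answer(Enum):
    """Possible answers to a QUADAS-2 signalling question."""
    YES = "yes"
    NO = "no"
    UNCLEAR = "unclear"


class Risk(Enum):
    """Risk-of-bias judgement for one domain."""
    HIGH = "high"
    LOW = "low"
    UNCLEAR = "unclear"


def domain_risk_of_bias(answers: Iterable[Answer]) -> Risk:
    """Derive a domain-level judgement from signalling question answers.

    HIGH if any answer is NO (the rule stated in the text).
    LOW if every answer is YES (assumption for this sketch).
    UNCLEAR otherwise, i.e. no NO but at least one UNCLEAR (assumption).
    """
    answers = list(answers)
    if not answers:
        raise ValueError("at least one signalling question is required")
    if any(a is Answer.NO for a in answers):
        return Risk.HIGH
    if all(a is Answer.YES for a in answers):
        return Risk.LOW
    return Risk.UNCLEAR


if __name__ == "__main__":
    # Index test domain: blinded interpretation, but threshold not prespecified.
    print(domain_risk_of_bias([Answer.YES, Answer.NO]).value)
```

In this sketch a single NO dominates any number of YES or UNCLEAR answers, which mirrors the reasoning in the text that any deviation may lead to bias.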
This will probably be answered 'LOW' in all cases, except when tests used a threshold that was much higher or lower than in practice, or undertaken in

Is the reference standard likely to correctly classify the target condition?
In this review, we focused on the target condition COVID-19 disease. Although we defined acceptable reference standards using a consensus process once the list of reference standards that had been used was obtained from the eligible studies, studies in which it is clear that only RT-PCR was used will be considered at high risk of bias.
HIGH: if only RT-PCR was used (as it measures a different target condition); if an alternative diagnosis was highly likely and not excluded (will happen in paediatric cases, where exclusion of other respiratory pathogens is also necessary); if tests were used to follow up viral load in known test positives.
LOW: if above situations were not present.
UNCLEAR: if intention for testing was not reported in the study.

Was there an appropriate
YES: this will be similar for all index tests, populations for the current infection target conditions: as the situation of a patient, including clinical presentation and disease progress, evolves rapidly and new/ongoing exposure can re-

The following information is taken from the University of Bern website (see: ispmbern.github.io/covid-19/living-review/collectingdata.html).
The register is updated daily and CSV file downloads are made available. From 1 April 2020, we will retrieve the curated bioRxiv/medRxiv dataset (connect.medrxiv.org/relate/content/181).
With the kind support of the Public Health & Primary Care Library PHC (www.unibe.ch/university/services/university_library/faculty_libraries/medicine/public_health_amp_primary_care_library_phc/index_eng.html), and following guidance of the Medical Library Association (www.mlanet.org/p/cm/ld/fid=1713).
Embase: ncov OR (wuhan AND corona) OR COVID
bioRxiv/medRxiv: ncov or corona or wuhan or COVID

Review first published: Issue 11, 2020

Inge Stegeman: Study selection, data extraction and quality assessment, first draft of the review and subsequent revisions;
Eleanor A Ochodo: Study selection, data extraction and quality assessment, multiple revisions of the review;