Real-world clinical performance of SARS-CoV-2 point-of-care diagnostic tests: a systematic review of available trials as per April, 4, 2021
Authors: Hawthorne, G. H.; Harvey, A.
Date: 2021-09-22
DOI: 10.1101/2021.09.20.21263509

Point-of-care assays offer a decentralised and fast solution to the diagnosis of SARS-CoV-2 and provide benefits for patients, healthcare workers, healthcare facilities and other environments. This technology has the potential to prevent outbreaks, enable faster adoption of life-changing measures and improve hospital workflow. While reviews of the performance of these assays exist, a review focused on the real-life clinical performance and point-of-care feasibility of these platforms was missing. Therefore, the objective of this study is to help end users (clinicians, healthcare providers and organisations) understand the real-life performance of point-of-care assays, aiding their implementation in decentralised, true point-of-care facilities or inside hospitals. A total of 871 studies were screened in 3 major databases and 51 studies were included, evaluating 20 antigen tests and 10 nucleic-acid amplification platforms. We excluded studies that used processed samples, pre-selected populations, archived samples or laboratory-only evaluations, and strongly favored prospective trial designs in our inclusion criteria. We also investigated package inserts, instructions for use, comments on published studies and manufacturers' websites to assess the feasibility of POC placement and to collect other information relevant to the end user. Apart from sensitivity and specificity, we present information on time to results, hands-on time, kit storage, machine operating conditions and regulatory status.
To the best of our knowledge, this is the first review to systematically evaluate POC test performance in real-life clinical practice. In the majority of studies, we found test performance in clinical practice to differ markedly from the manufacturers' reported performance and from laboratory-only evaluations. Our findings may help in decision-making related to SARS-CoV-2 testing in real-life clinical settings.

In December 2019, SARS-CoV-2 was first reported in Wuhan, China, and a pandemic was declared by the World Health Organization (WHO) in March 2020. Reverse transcription-quantitative polymerase chain reaction (RT-qPCR) is the gold standard for diagnosis of SARS-CoV-2 infection. However, this technique has disadvantages, including the need for centralised facilities with specific testing equipment, the need for highly trained personnel, and a long turnaround time from sample collection to result, which can reach 48 hours or more in some scenarios 1. Driven by the need for diagnostic solutions during the pandemic, many new assays and platforms were developed, including rapid molecular and antigen tests. The FIND SARS-COV-2 DIAGNOSTIC PIPELINE 2, which collates an overview of commercially available SARS-CoV-2 tests in real time, showed over 1120 entries for diagnostic solutions as of June 7th, 2021. Point-of-care (POC) diagnostic platforms are defined as diagnostic assays that can deliver results near patients, without the need for centralised laboratories or diagnostic facilities 3. These platforms tend to deliver fast results, often within minutes to a few hours, enabling quicker medical decisions and facilitating timely interventions. POC diagnostic platforms are already used in health systems for different purposes, from bedside glucose testing 4 to the analysis of blood gases and electrolytes 5.
Besides providing faster results, an important advantage of POC tests is that they facilitate diagnosis in locations that previously lacked access to centralised diagnostic techniques. In the context of transmissible infectious diseases, some of these assays enable quick decisions regarding treatment and isolation requirements. Before the pandemic, POC platforms were already used to diagnose conditions such as influenza-like illnesses in different settings, including hospital accident and emergency departments and outpatient clinics 6. Other assays focus on the diagnosis of sexually transmitted diseases such as HIV 7 and Chlamydia trachomatis 8. As a consequence of the SARS-CoV-2 pandemic and the need for more diagnostic solutions, both centralised platforms and POC diagnostic assays have expanded at an unprecedented rate, especially because the time from sample collection to result is key to preventing further infections and speeding up workflows in hospitals and healthcare organisations. However, POC tests can have limitations, such as reduced accuracy (lower sensitivity or specificity), higher costs and lower throughput compared with centralised laboratory facilities and techniques like PCR. Some tests also require several manual preparation steps or a computer to run, which can complicate true POC placement. Moreover, inaccurate tests can have multiple consequences. False-negative results can cause inappropriate placement of patients in hospitals (e.g., moving an infectious patient to a 'green ward'), causing new outbreaks in an already ill population, and can also lead to a community patient being deemed non-infectious, increasing the chances of transmission to contacts.
Conversely, false-positive results can place patients in high-risk hospital environments (e.g., a 'red ward') and, in outpatient settings, cause unnecessary isolation and economic impact. If a test is flagged as inaccurate and its results must be confirmed before they can be trusted, no clinical action can be taken until confirmation is obtained, which defeats the purpose of a fast test. Regulating novel tests during a pandemic is complex, and the scale of this process during the SARS-CoV-2 pandemic was unprecedented. Because of the urgent need for testing solutions, many abbreviated validation studies based on initial or partial evaluations were accepted by the scientific community and by regulatory agencies such as the FDA, and few platforms had their real-life performance assessed before being released to the market. Despite showing good accuracy in internal laboratory validations, multiple POC platforms for SARS-CoV-2 diagnosis had lower-than-expected performance once released for clinical use 9. This topic was the subject of political and legal debate 10 and resulted in some previously approved tests later being withdrawn from the market 11. There are many examples of disparities between manufacturer claims or laboratory-only evaluations and data from clinical trials. For instance, the package insert of the ID Now platform (Abbott) claims a sensitivity of 100% and a specificity of 100% 12, but clinical evaluations have shown sensitivities below 50% 13 or 75% 14. Another example is the study conducted by Jokela et al 15, in which the Mobidiag Novodiag had 93.4% (100/107) sensitivity in archived samples but detected only 60% (3/5) of positive samples in a clinical setting (the clinical study could not obtain more positive samples because of a decline in prevalence). It is well recognised that results from laboratory-only evaluations often differ from those of clinical studies.
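A sensitivity estimated from 5 positive samples is compatible with a very wide range of true values. The review does not report confidence intervals; as an illustration only, the sketch below uses the Wilson score interval (one common choice among several for binomial proportions) to put the two Novodiag figures from Jokela et al 15 side by side. The function names are hypothetical helpers, not part of any cited study.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_hat = successes / total
    denom = 1 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)
    )
    # Clamp to [0, 1] to guard against floating-point overshoot
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def sensitivity_summary(true_pos: int, ref_pos: int) -> str:
    """Format sensitivity as 'TP/positives = x% (95% CI lo to hi)'."""
    lo, hi = wilson_interval(true_pos, ref_pos)
    return f"{true_pos}/{ref_pos} = {true_pos / ref_pos:.1%} (95% CI {lo:.1%} to {hi:.1%})"


# Counts reported for the Novodiag by Jokela et al (ref 15)
print(sensitivity_summary(100, 107))  # archived samples
print(sensitivity_summary(3, 5))      # clinical phase
```

For the clinical phase the interval spans roughly 23% to 88%, against roughly 87% to 97% for the archived samples, which illustrates why a small number of positives limits what a clinical evaluation can conclude.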
Evaluating an intervention in patients in real-life scenarios is usually more complex. In the particular situation of SARS-CoV-2 testing, there are many possible reasons for these discrepancies. The first group of differences stems from factors related to the patients themselves, such as tolerance of swabbing and the presence of inhibitors in nasopharyngeal secretions. There has been debate regarding the reliability of viral load and Ct values 16, suggesting that measured viral load may vary greatly not only with the stage of disease but also with the quality of swab collection and with factors such as the dilution of swabs in buffer and the RNA extraction process. It therefore appears important for a platform to have a low limit of detection, regardless of the 'average' viral load in a group of patients, so as to miss as few cases as possible. The presence of inhibitors in samples is also relevant, as many platforms describe interactions between food, beverages or medications and their amplification chemistry in their instructions for use or package inserts 12. Once a sample is known to be positive (with a known Ct value or an estimated viral load) and has been selected for a panel to compare against another platform, the risks posed by poor swabbing technique and inhibitors are effectively eliminated; laboratory-only evaluations may therefore report higher sensitivity than would be expected in a clinical setting. Another group of differences relates to the feasibility of the proposed testing platform's workflow in a true POC scenario, including the swab collection technique and the expertise needed to conduct testing (such as sample preparation, machine operation and cleaning). Many tests claimed to be usable in a true POC setting would meet resistance to adoption because of technical complications, as is the case for most loop-mediated isothermal amplification (LAMP) assays (e.g., Yamazaki et al 17).
There are also factors related to the particularities of the pandemic and the precautions needed to prevent cross-infection, including the desirability of viral inactivation in samples and the impracticality, in a clinical setting, of amplification techniques that may release amplicons and thus possibly increase false-positive results. Furthermore, laboratory evaluations differ from clinical settings: pre-selecting samples and repeating experiments are possible in a laboratory but much harder within a clinical workflow. In addition, the risk of false-positive results in hospital environments is elevated, especially in areas of high movement and turnover such as emergency departments, outpatient settings and clinical wards. POC diagnosis plays an important role in SARS-CoV-2 management. Faster diagnosis speeds up isolation measures and therefore helps prevent new outbreaks. Likewise, faster confirmation that SARS-CoV-2 is absent helps avoid unnecessary isolation of individuals and their contacts, providing social and economic benefits. The workflow of patients inside a hospital can be greatly facilitated by tests that are reliable and fast, aiding the placement of patients in red or green wards and preventing nosocomial spread of SARS-CoV-2, while freeing up rooms and improving the capacity of emergency departments. For example, in the study conducted by Collier et al using the SAMBA platform 18, the mean length of stay on COVID-19 "holding" (or "amber") wards was reduced by nearly 30 h with the POC test. Additionally, timely interventions such as dexamethasone in patients requiring respiratory support 19 or interleukin-6 receptor antagonists in critically ill patients 20 benefit from a fast diagnostic modality, especially considering that the waiting time for a centralised test result can be up to 48 h in some centres.
In this review, we address POC platforms for SARS-CoV-2 diagnosis, which can be divided into molecular tests (which use some form of nucleic acid amplification) and antigen tests (which target SARS-CoV-2 antigens), focusing especially on clinical trials of their performance in real-life settings. Our criteria were strictly designed to exclude pre-selected samples and testing conducted under laboratory conditions with frozen samples. The objective of this study is to help clinicians, healthcare providers and organisations understand the real-life performance of POC assays, aiding their safe implementation in decentralised, true point-of-care facilities or in more complex healthcare environments. Other reviews of point-of-care assays targeting SARS-CoV-2 exist and use different inclusion criteria. Dinnes et al 21 published a review of 64 studies covering 16 antigen platforms and 5 molecular assays, with data up to September 2020. That review attempted to separate studies of symptomatic and asymptomatic individuals, and also included laboratory-only evaluations and retrospective studies, which fall outside our aim. Yoon et al 22 reviewed point-of-care tests holding an FDA Emergency Use Authorization (FDA-EUA) up to August 2020, analysing 26 studies. Hayer et al 23 reviewed antigen tests but only included assays not needing a separate reader. Considering the novelty of the topic, we encourage readers to consult these and other studies to obtain a clearer picture of the field. Other POC diagnostic technologies used in the management of COVID-19 patients, such as POC ultrasound for patient follow-up after diagnosis, fall outside the scope of this work. This systematic review was conducted following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.
After discussion among the authors, a broad search phrase was defined to capture a large number of studies, given the novelty of the field. Filtering results by date was not necessary, since the name SARS-CoV-2 was coined after the pathogen was identified in 2019. Studies were excluded if any of the following applied:
- the study was unrelated to point-of-care testing;
- the platform was not point-of-care (e.g., it required centralised equipment);
- the test's main focus was not acute diagnosis (e.g., antibody assays);
- the study was not a clinical trial (e.g., proofs of concept, validations, studies conducted with frozen samples, or tests conducted under laboratory conditions);
- samples extracted from patients were processed before testing;
- the population was fully pre-selected (e.g., only known positives were enrolled).
The definition of a point-of-care diagnostic assay is not straightforward, since there are no rigid reference criteria. The defining factor is the ability to provide a diagnostic solution near the patient, removing the need for a centralised processing facility. Some platforms are fully mobile and can travel to the patient's location, and are thus clearly POC; others require a power source, centralised computers or tablets. Other platforms require multiple preparation steps before testing (such as RNA extraction, centrifugation, heating blocks or multiple pipetting steps), even though the final testing step can theoretically be conducted near the patient; this is the case for most LAMP platforms. On top of that, requirements for isolation and measures to prevent infection spread further complicated this definition during the SARS-CoV-2 pandemic. We therefore recognize that the definition of a POC platform involves a degree of subjectivity. In line with our objectives, we used an outpatient community setting as our reference point in this review.
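The exclusion criteria listed above combine as a simple "any one criterion excludes" screen. The sketch below shows only that logic, assuming each criterion has already been judged as a yes/no answer for a study; as the text notes, that judgement involved a degree of subjectivity, and the `StudyRecord` fields and `exclusion_reasons` function are hypothetical names, not part of the authors' workflow.

```python
from dataclasses import dataclass


@dataclass
class StudyRecord:
    """Hypothetical screening record; field names are illustrative only."""
    name: str
    about_poc_testing: bool    # study concerns point-of-care testing
    platform_is_poc: bool      # no centralised equipment required
    acute_diagnosis: bool      # False for, e.g., antibody assays
    clinical_trial: bool       # False for proofs of concept, frozen panels, lab-only work
    samples_processed: bool    # samples processed before testing
    fully_preselected: bool    # e.g. only known positives enrolled


def exclusion_reasons(study: StudyRecord) -> list[str]:
    """Return every exclusion criterion the study meets; an empty list means it passes."""
    checks = [
        (not study.about_poc_testing, "unrelated to point-of-care testing"),
        (not study.platform_is_poc, "platform is not point-of-care"),
        (not study.acute_diagnosis, "not focused on acute diagnosis"),
        (not study.clinical_trial, "not a clinical trial"),
        (study.samples_processed, "samples processed before testing"),
        (study.fully_preselected, "population fully pre-selected"),
    ]
    return [reason for failed, reason in checks if failed]


# Example: a hypothetical cohort made up only of known positives
record = StudyRecord("known-positive cohort", True, True, True, True, False, True)
print(exclusion_reasons(record))
```

Returning all matching reasons, rather than stopping at the first, mirrors the way the review gives explicit reasons for each excluded trial.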
Therefore, assays using reagents that require processing in a central laboratory or facility were excluded from our analysis. The rationale is that these platforms could not operate without such facilities nearby and therefore could not be implemented in the community in a truly POC fashion. Platforms judged, from the information available in publications and manufacturer manuals, to be technically capable of community implementation were included, and their potential limitations (e.g., the requirement for cold-chain storage or a desktop computer) are described in the comments of our results table. Because of the disruption and urgency caused by the pandemic, many studies did not follow rigid clinical trial designs, and we noted a high level of heterogeneity in methods and designs. Issues such as the need for self-isolation, patient discomfort with repeated swabbing, multi-platform evaluations and an insufficient number of positive samples during low-incidence periods affect the feasibility of studies and must be taken into account when conducting a review. For instance, the study by Tu et al 24 evaluating the ID NOW platform (Abbott) was originally designed to enroll 200 positive patients, but a drop in prevalence made this target unattainable. Given the heterogeneity of the trials, defining rigid, unifying inclusion criteria was difficult and would have made this review impossible. As mentioned, our goal in this work is to review the performance of the platforms in real-life scenarios. Therefore, we included studies that tested platforms on real patients in a true point-of-care fashion and excluded laboratory-only evaluations. As a consequence, we excluded all proof-of-concept papers and studies that used spiked samples.
We also excluded studies that used solely pre-tested frozen samples, as these conditions differ substantially from those found in the field; as mentioned before, the selection of frozen samples may remove samples with inhibitors, invalid or borderline results and low viral loads, favoring samples with low Ct values (i.e., high viral loads). Naturally, many trials used cooled or frozen samples at some point, especially in multiplatform evaluations. We took the time from sample collection to testing into account in our selection; although it was impossible to set a clear cut-off, this time was often informative. Samples that were stored for a brief period to allow testing with a POC platform were accepted. While this likely does not have the same value as immediate point-of-care testing, it is often a necessary accommodation in validation studies where a comparator assay is used. As an example, Lephart et al collected samples from 88 patients (13 of whom were known positives), stored them at 4 °C and tested them within 24 h; this study was included in our table. On the other hand, studies using samples from frozen panels tested retrospectively, often weeks after collection, tended to be excluded. For example, we did not include the work by Corman et al 25, who compared seven SARS-CoV-2 antigen tests available in Europe, because processed samples were used and only negative swabs were collected from real patients. Overall, we strongly favored prospective studies. Given the heterogeneity of study designs and conditions, the authors discussed each study before inclusion whenever its methods were unclear. For instance, Cerutti et al 26 conducted a trial with 330 patients in which only a minority of samples were frozen (n = 13); this trial met our criteria and was included.
On the other hand, a prospective study by Courtellemont et al 27 was not included in the analysis because known positive patients were pre-selected for enrolment, so operators knew the patients' status beforehand. A similar situation occurred in the study by Ghofrani et al 28, which performed a comparison in known positive patients and selected eligible samples; the methodology of this study was unclear, and its reported sensitivity of 96.7% is much higher than is usually reported in the literature for antigen assays 29. The vast majority of studies used nasopharyngeal samples, although a few used nasal samples only. Studies evaluating point-of-care assays with saliva samples do exist 30, 31 and usually show decreased performance; for instance, Basso et al 32 found a sensitivity of only 13% when testing saliva with antigen tests, and performance on saliva samples was similarly inferior in the study by Agulló et al 33 evaluating the Panbio assay. We therefore reported the results for nasopharyngeal or nasal swabs (as per the manufacturer's advice) when these were part of an assessment involving multiple sample types. When collection methods were compared (e.g., the sensitivity of self-swabbing versus collection by healthcare workers 34, 35), we reported the results obtained with professional swabbing. We also excluded studies in which POC assays were compared only with other POC assays, as no proper gold standard was employed in those situations. An example is the study conducted by Basu et al 36, in which the Abbott ID Now COVID-19 was compared against the Cepheid Xpert Xpress SARS-CoV-2 without a laboratory PCR reference standard.
The reliability of the gold standard was questioned in some studies 37. Ideally, a reference standard would combine more than one assay 13 with cross-checked clinical information, radiological evidence and other laboratory information (e.g., antibody production, viral markers or inflammatory markers), but this is understandably complex and unfeasible in many scenarios. We also carefully considered the population type in each study. Naturally, testing known-positive patients introduces bias. Because of challenges such as lockdowns, the urgency of results, varying prevalence, differences in viral load across the days of disease and the size of the trials, some studies tested known positive populations to obtain enough positive samples to estimate sensitivity with reasonable precision. If a study was conducted solely in known positive patients, we tended to exclude it from our table. On the other hand, studies that complemented a prospective evaluation by testing positive populations in a randomized way were accepted, provided the proportion was reasonable. As an example, Basso et al tested antigen assays in 139 selected inpatients (60% positivity rate) and, prospectively, in 96 outpatients (3% positivity rate); we included this study in our table. Studies conducted exclusively in paediatric populations were also included 38.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. This version posted September 22, 2021.
Taking these factors into consideration, trials were judged on the overall level of heterogeneity in their methodology.
In the study conducted by Osterman et al 39 evaluating 2 antigen platforms, there was considerable variability between the 2 sites: samples were collected in different time frames (site 1 from March to October 2020 and site 2 from November to December 2020), and some samples at site 1 were stored differently, with some frozen for days before testing. After discussion, we decided to include this study in our table. Other studies, such as that by Marti et al 40, were excluded because of highly heterogeneous methods: both a POC assay and a centralised PCR were used as the reference standard, and different populations were tested, including known positive individuals. We attempted to give detailed reasons for the inclusion or exclusion of individual trials in the next section of this review. We investigated package inserts, instructions for use, comments on published studies and manufacturers' websites to assess the feasibility of POC placement and to collect other information that may be relevant to the end user. Apart from sensitivity and specificity, we included time to results, hands-on time, kit storage, machine operating conditions and regulatory status in the table. We also commented on testing requirements and other details deemed relevant. We used publicly available information, such as the instructions for use on the FDA website 12, whenever possible; when that information was unavailable, we attempted to obtain the package insert by contacting the manufacturers or by consulting public information published by hospitals, government entities and other third-party institutions using the platform.
Multiple studies were excluded at final screening because tests were conducted on spiked or synthetic samples. The study by Zheng et al 41 evaluating an unnamed lateral flow dipstick assay is one example. Another example is the work of Tanida et al 42 evaluating the ARIES SARS-CoV-2 Assay; this study was excluded because the comparison was made against another POC assay (Xpert Xpress SARS-CoV-2) and used patient samples with defined copy numbers and synthetically spiked samples. As is the case for most LAMP assays, the saliva LAMP assay proposed by Yamazaki et al 17 was not deemed feasible in a true point-of-care scenario, as it requires RNA extraction, a heat block and, apparently, multiple pipetting steps, and therefore a considerable level of expertise. The assay was also tested with preselected samples. The study by Yoshikawa et al 43 is another example of an excluded study using LAMP technology. The study conducted by Peto et al 44 on the LamPORE platform was not included in the analysis because the platform was not deemed to be POC: RNA must be extracted, and primers added and incubated in a thermocycler, followed by multiple manual steps. The trial was also conducted with preselected, frozen samples. For similar reasons, the study by Singha et al 45 using a glucose meter to detect SARS-CoV-2 was not included, as the test requires a centrifuge, a magnet and incubation in water baths; this evaluation was also conducted with known positive samples only. Agulló et al 33 evaluated the Panbio assay on nasopharyngeal, nasal-only and saliva samples against nasopharyngeal samples tested on the Cobas z 480 Analyzer (Roche). Because of this division, the sample sizes were small and heterogeneous. The concordance for positive results was 57.3% for nasopharyngeal samples and as low as 23.1% for saliva; we included the results of the nasopharyngeal testing in our table.
Studies of this assay conducted with frozen samples are available 46 but, as mentioned above, were not included in our review. Basso et al 32 tested antigen assays on both saliva and nasopharyngeal swab (NPS) samples in a mixed population (139 inpatients, 96 outpatients), providing separate sensitivity and specificity figures for the NPS samples. Since the comparator was also tested on both saliva and NPS, the ultimate reference standard was unclear. For the antigen tests, to the best of our knowledge, the numbers of individuals underlying the sensitivity and specificity figures were not provided in the study or its supplementary material. We included this study in our table with a comment noting that the numbers of individuals shown are our estimates, made for practical purposes and not provided in the original paper, and may therefore differ slightly from the actual patient numbers. The study by Hoehl et al 56 using the RIDA® QUICK SARS-CoV-2 (R-Biopharm) was not included, as only a minimal number of samples were tested with a confirmatory method, so false-negative results could not be determined. The study by Mlcochova et al 57 was not included in our table because (1) the methodology used frozen pre-selected samples, (2) antibodies were evaluated together with NAAT tests, making the selected timeframe for analysis questionable (NAAT was used to test samples up to 28 days after symptom onset), (3) the criteria used for the reference standard were not entirely clear and (4) the number of samples was small (n=45). For similar reasons, the study by Veyrenche et al 58 was not included in our list. The study by Micocci et al 59 on the POCKIT™ Central Nucleic Acid Analyzer was not included, as the low number of positives prevented it from being a proper diagnostic accuracy study; its objective was to evaluate the feasibility of POC testing in care homes in England.
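When a paper reports a percentage but not the underlying counts, an integer count can be back-calculated from the known denominator. The sketch below shows one way to do this; it is our illustration, not necessarily how the estimates in the table were produced, and `estimate_numerator` is a hypothetical helper. It returns the count whose percentage is closest to the reported value, plus a flag saying whether that count reproduces the reported value within its rounding precision.

```python
def estimate_numerator(reported_pct: float, denominator: int, decimals: int) -> tuple[int, bool]:
    """Back-calculate an integer count k (0..denominator) from a reported percentage.

    Returns (k, consistent): k minimises |100*k/denominator - reported_pct|, and
    consistent is True when k reproduces the reported value to `decimals` places.
    """
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    best = min(
        range(denominator + 1),
        key=lambda k: abs(100 * k / denominator - reported_pct),
    )
    # Half a unit in the last reported decimal place, plus a tiny float margin
    tolerance = 0.5 * 10 ** (-decimals) + 1e-9
    consistent = abs(100 * best / denominator - reported_pct) <= tolerance
    return best, consistent


print(estimate_numerator(60.0, 5, 1))      # clean case: exactly 3 of 5
print(estimate_numerator(99.61, 3158, 2))  # closest count, but not an exact match
```

When the flag is False, no integer count reproduces the reported percentage exactly (for example, because the original denominator or rounding differed), which is one reason such counts are best marked as estimates.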
After discussion among the authors, the study by Olearo et al 60 evaluating 4 antigen tests was not included in the table because it openly deviated from the manufacturers' recommended sample matrix and handling instructions and reported sensitivities of 49.4% to 54.9%, at the lower end of what is expected for antigen tests. The methodology of the study by Hogan et al 61 evaluating the Mesa Accula assay (now the Thermo Scientific™ Accula™ SARS-CoV-2 Test, following its acquisition by Thermo Fisher) was not entirely clear, as neither frozen samples nor the time from sample collection to testing is mentioned. However, it appears that samples were pre-selected (N=100) and tested in a laboratory after being tested with a centralised PCR assay. We therefore believe that this study, which showed a sensitivity of 68% and a specificity of 100%, is unlikely to reflect results in the field accurately. A different methodology was used by Rastawicki et al 62 evaluating the PCL COVID-19 Ag rapid fluorescent immunoassay (FIA): 4 swabs were collected over 2 days and antibodies were also evaluated. We considered the comparison between RT-PCR and the antigen test straightforward enough for the trial to be included in our table, despite the low number of patients enrolled. The study by Regev-Yochay et al 63 could not be included because multiple antigen platforms were analysed together and data for individual assays were not available. The study by Smithgall et al 64 on the Cepheid Xpert Xpress and Abbott ID Now was not included, as only remnant patient samples were tested; the platforms were therefore not evaluated in a proper clinical environment. The study by Loeffelholz et al 65 evaluating the Xpert Xpress SARS-CoV-2 was excluded because all but one site tested the platform with remnant frozen samples. The study by Wolters et al 66 on the same platform was also excluded, as it only evaluated diluted and processed sample panels in a laboratory setting.
Jokela et al 15 carried out two evaluations of the Novodiag (Mobidiag): by our understanding, the first phase was a laboratory evaluation and the second a prospective clinical trial. However, there was a major drop in prevalence during the second phase, which allowed only 5 positive samples to be collected from a population of 362 individuals. We included the second phase of this trial in our table despite the low number of positive samples. The study by Moeren et al 67 evaluating the BD Veritor antigen test had a mixed design: 352 symptomatic adults were evaluated prospectively, and known-positive individuals (n = 123), visited at home within 72 h of their positive RT-PCR result, were added to the pool. Because the assay was tested in a true point-of-care fashion and this addition was necessary to obtain an adequate number of positive samples, we decided to include this study in our table. We reported specificity based on the 2 false-positive results obtained with the analyser, not on visual readings. To evaluate their performance, POC tests have to be compared against a selected gold standard. In the vast majority of studies, this standard is PCR (as mentioned above, some studies used another POC platform as the comparator and were excluded from our analysis). It is important to understand that performance figures are relative to the reference standard, which is not necessarily better or more accurate than the test under study. Most studies used a third platform as a tiebreaker for discordant results, and whenever the authors reported such an analysis, its full results were considered.
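Tiebreaker (discrepant) resolution can be sketched as follows. This is a hypothetical illustration, not the procedure of any particular included study: it assumes the third assay is run only on samples where the index test and the comparator disagree, and that its result then decides, which is equivalent to a 2-of-3 majority vote. Because concordant results are never re-tested under this scheme, it is known to tend to favour the index test. `PairedResult` and the function names are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairedResult:
    """One patient sample (hypothetical structure for illustration only)."""
    index_pos: bool                        # POC assay under evaluation
    comparator_pos: bool                   # comparator, usually laboratory RT-PCR
    tiebreaker_pos: Optional[bool] = None  # third assay, run only on some samples


def resolved_reference(r: PairedResult) -> bool:
    """Reference status after tiebreaking.

    If index and comparator disagree and a tiebreaker was run, the tiebreaker
    decides (the same as a 2-of-3 majority); otherwise the comparator stands.
    """
    if r.index_pos != r.comparator_pos and r.tiebreaker_pos is not None:
        return r.tiebreaker_pos
    return r.comparator_pos


def sensitivity_specificity(results: list[PairedResult]) -> tuple[float, float]:
    """Sensitivity and specificity of the index test against the resolved reference."""
    tp = fn = tn = fp = 0
    for r in results:
        if resolved_reference(r):
            if r.index_pos:
                tp += 1
            else:
                fn += 1
        elif r.index_pos:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need reference-positive and reference-negative samples")
    return tp / (tp + fn), tn / (tn + fp)


samples = [
    PairedResult(True, True),
    PairedResult(False, True),         # discordant, no tiebreaker: false negative
    PairedResult(True, False, True),   # tiebreaker sides with the index test
    PairedResult(False, False),
    PairedResult(True, False, False),  # tiebreaker confirms a false positive
]
print(sensitivity_specificity(samples))
```

Reporting the figures both with and without tiebreaking, where a study allows it, makes the effect of the resolution step visible.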
As mentioned previously, if a clearly defined reference standard based on antibodies and/or clinical and radiological evidence was used, this was also taken into account. Positive predictive value (PPV) and negative predictive value (NPV) are useful for understanding how far a result can be trusted given the prevalence of a disease at a certain time. Assays are always trialed in a setting with a particular prevalence at a given time, and in diagnostic accuracy studies the results of the reference method themselves provide an estimate of the prevalence in the tested population. PPV and NPV change with the prevalence of the setting. For instance, the study of an antigen test by Peña et al 68 reported a sensitivity of 69.86% and a specificity of 99.61%, but a PPV of 94.44% and an NPV of 97.22%, given that the prevalence in that setting was 8.64%. In this work, we avoided PPV and NPV projections whenever possible and aimed to report the 'sensitivity' and 'specificity' values provided, even though we recognize these values are intertwined. We decided not to use PPV and NPV projections because (1) in highly transmissible diseases such as SARS-CoV-2, accurate real-time monitoring of prevalence is difficult, in contrast to diseases with a clearly defined and predictable epidemiology, so the prevalence of SARS-CoV-2 infection in a given setting may change rapidly with outbreaks, lockdowns and new variants; (2) accurate real-time monitoring of regional SARS-CoV-2 prevalence is challenging even for developed countries, so prevalence values are often retrospective; (3) the apparent performance of an assay can be distorted by different prevalence levels; and (4) an analysis including multiple PPV and NPV projections would make this study more speculative and less practical.
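The dependence of PPV and NPV on prevalence follows from Bayes' theorem. The short Python sketch below (the function name `predictive_values` is our own) reproduces the figures quoted for Peña et al 68 to within rounding, and shows how the same test would behave at a lower, purely hypothetical, prevalence.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) via Bayes' theorem; all arguments are proportions in [0, 1]."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv
```

`predictive_values(0.6986, 0.9961, 0.0864)` gives a PPV of about 0.944 and an NPV of about 0.972. Keeping the same sensitivity and specificity but assuming a hypothetical prevalence of 1%, the PPV falls to roughly 0.64, which is why projections made at one prevalence do not transfer to another setting.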
We strongly suggest that the referenced trials be read in full for further information and clarification of performance, as the terms 'sensitivity' and 'specificity', as reported in this review, are always relative to the prevalence in the particular setting of each study. We present the included studies in the tables below. NAAT tests are grouped separately from antigen tests and have an additional column with testing requirements. While most manufacturers used dilutions quantified in viral RNA copies/ml to assess the limit of detection of their platforms, some used plaque-forming units (PFU) instead. When both were available, we presented the limit of detection in copies/ml; otherwise, we followed the manufacturer's instructions for use. The limit of detection of each assay was converted to copies/ml when possible (for instance, when it was given in copies/µl). We did not include claimed limits of detection for antigen assays.
[Tables of included NAAT and antigen studies are not recoverable from the extracted text. Surviving fragments: the NAAT row label "Cue COVID-19 (Cue)" and a run of what appear to be specificity entries with reference numbers: 32 99.1% (3420/3450) 97 99.5% (184/185) 88 99.61% (3146/3158) 90 99.8% (519/520) 33 99.8% (1220/1222) 93 99.9% (1000/1001) 100 99.9% (3738/3741) 95.]
*Numbers were calculated based on information in the paper but not explicitly provided by the authors.
We identified 20 antigen platforms and 10 NAAT platforms with clinical trials that fit our defined criteria. A total of 30 platforms were covered by 51 studies, with some studies covering more than one platform.
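The unit harmonisation described above is simple arithmetic: 1 ml = 1000 µl, so a value in copies/µl multiplied by 1000 gives copies/ml. The Python sketch below is a hypothetical helper, `lod_to_copies_per_ml`, that assumes units arrive as free-text strings. It deliberately refuses PFU-based values, since the ratio of genome copies to plaque-forming units depends on the virus stock and has no fixed conversion factor.

```python
# Multiplicative factors converting each normalised unit string to copies/ml.
_TO_COPIES_PER_ML = {
    "copies/ml": 1.0,
    "copies/ul": 1000.0,  # 1 ml = 1000 µl
}


def lod_to_copies_per_ml(value: float, unit: str) -> float:
    """Convert a limit of detection to copies/ml (unit strings are assumed free text)."""
    # Normalise case, spaces and both micro characters (U+00B5 and U+03BC) to "u".
    key = unit.strip().lower().replace(" ", "").replace("\u00b5", "u").replace("\u03bc", "u")
    if key not in _TO_COPIES_PER_ML:
        # PFU-based values have no fixed copies/ml equivalent, so they are rejected.
        raise ValueError(f"cannot convert {unit!r} to copies/ml")
    return value * _TO_COPIES_PER_ML[key]
```

For example, a claimed limit of detection of 0.5 copies/µl becomes 500 copies/ml, while an input such as "PFU/mL" raises `ValueError` and would be reported as given in the instructions for use.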
To the best of our knowledge, this is the first review to systematically evaluate POC test performance in real-life clinical practice. Considering the high heterogeneity of methods and outcomes between studies, as well as the unbalanced number of studies per platform, we opted not to conduct a meta-analysis. We also decided against providing an 'average' performance for each platform, as this would likely be misleading and could downplay the methodological discrepancies between trials. NAAT platforms, on average, take longer to provide results and require more equipment than antigen tests. However, their results have been shown to be more reliable in clinical practice. By applying selection criteria specifically targeted at prospective studies, we observed important differences between the performance reported by manufacturers and in laboratory evaluations and the performance under real-life conditions. While this is true for both NAAT platforms and antigen assays, the discrepancies were more extreme in the antigen group. Healthcare facilities, individuals and test providers must be aware of the real-life performance of the platforms before deciding on their implementation. We hope this systematic review can help in making informed decisions regarding SARS-CoV-2 testing. The accuracy of diagnostic tests is affected by numerous factors, including days since symptom onset, individual viral load, quality of sample collection, sample collection site (nasopharyngeal, nasal only, pharyngeal only, saliva only) and test modality (nucleic acid amplification versus antigen). As previously mentioned, all included studies were compared against PCR assays, considered the gold standard for SARS-CoV-2 diagnosis. Our results suggest a strong tendency for antigen tests to be less accurate than NAAT tests in real-life clinical trials, which is consistent with the findings of other reviews 21.
We also noticed wide variation in performance for the same test across studies. For instance, the sensitivity of the STANDARD Q COVID-19 Ag Test (SD Biosensor) varied between 45% and 92%, and the sensitivity of the PANBIO antigen test (Abbott) varied between 38% and 90%. The explanations for this high variability, particularly in sensitivity, are likely multifactorial, including differences in SARS-CoV-2 prevalence, sample size and methodology between studies. One such factor may be test transport and storage, as most antigen tests need to be stored between 5 and 30 °C. Haage et al 129 assessed eleven antigen tests and found reductions in sensitivity of up to 10-fold for 46% of the assays after 10 minutes outside the ideal temperature range; this proportion grew to 73% when the exposure lasted three weeks. Pollock et al 87 had similar findings when evaluating the BinaxNOW test. This finding is significant and may partially explain false-negative results, considering that many regions experience temperatures outside the target range and that factors such as stock storage and transportation are beyond the end user's control. Point mutations that alter the structure of the SARS-CoV-2 nucleocapsid protein can also play a role; Bourassa et al uncovered a 1000-fold loss in sensitivity for the Sofia Antigen test (Quidel) associated with the D399N mutation 130. It is also important to point out that the wide range of results may reflect publication bias, as these two platforms were the only ones with a substantial number of published studies.
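Sample size alone can account for much of the spread in sensitivity described above. As an illustration, the Python sketch below computes a Wilson score 95% confidence interval for a proportion. The Wilson interval is our choice here (exact Clopper-Pearson intervals are also common), and the counts in the usage note are hypothetical, not taken from any included study.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95% coverage)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width
```

For 4 detected out of 5 reference-positive samples, the interval is roughly 0.38 to 0.96; for 80 out of 100 it narrows to roughly 0.71 to 0.87, although the point estimate is 0.80 in both cases. Studies with few positive samples can therefore report very different sensitivities for the same assay by chance alone.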
Additionally, we found some evidence that the sensitivity of antigen tests increases if they are used within the first days of symptoms, although it remains significantly lower than the average performance of NAAT tests. For instance, Bulilete et al reported that the sensitivity of the Panbio assay improved from 71.4% to 77.2% when the test was conducted within the first 5 days of symptoms 93. However, a division between asymptomatic and symptomatic individuals is of questionable value, because a significant proportion of asymptomatic individuals are in fact pre-symptomatic: they will develop symptoms later but may already be in the shedding phase. This becomes even more important if the individual has had a high-risk contact. Moreover, being symptomatic is a subjective definition that depends on factors such as the threshold of perception and the memory of the tested individual, which are not always reliable in settings such as care homes and acute hospital wards, or in populations such as children and cognitively impaired individuals. Additionally, timely interventions such as the use of dexamethasone in patients requiring respiratory support depend on confirmation of SARS-CoV-2 infection 19; this is usually a late clinical presentation, and it is reasonable to expect a portion of patients to present late to services. Some studies also illustrated the implications of using tests with suboptimal specificity in low-prevalence settings. Hoehl et al 56 used an antigen test for the self-testing of teachers at home, with the goal of preventing clusters of infections; out of a population of 602 individuals, 5 were confirmed positive but 16 false-positive results were recorded. Kriemler et al 131 voiced the same concern when using antigen tests to assess the point prevalence of acute SARS-CoV-2 infections in school children.
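The low-prevalence problem can be made concrete with expected counts. In this hypothetical Python sketch, the function name and all parameter values are assumptions chosen for illustration, not figures from Hoehl et al or Kriemler et al: it screens 10,000 people at 1% prevalence with a test of 70% sensitivity and 99% specificity.

```python
def expected_screening_counts(
    n: int, prevalence: float, sensitivity: float, specificity: float
) -> tuple[float, float]:
    """Expected (true positives, false positives) when screening n people once."""
    infected = n * prevalence
    uninfected = n - infected
    true_positives = sensitivity * infected
    false_positives = (1 - specificity) * uninfected
    return true_positives, false_positives
```

`expected_screening_counts(10_000, 0.01, 0.70, 0.99)` yields about 70 expected true positives against about 99 expected false positives, so false positives outnumber true positives even at 99% specificity, qualitatively mirroring the pattern reported in school and teacher screening.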
In a study by Colavita et al of 73,634 individuals in international airports, 1176 were reported antigen positive, but only 34.3% of them were confirmed positive by NAAT. Regarding kit storage, most platforms will require refrigerated facilities, given that the average upper storage limit was 30 °C. Some platforms deserve mention for requiring strict temperature control, particularly the Cobas Liat. Time to results was highly variable between NAAT platforms, ranging from ~13 minutes (ID Now) to ~90 minutes (SAMBA-II). The time to results of antigen platforms was usually below 30 minutes. Hands-on time, used in this context as the time needed to set up a test (preparing samples, loading the machine, configuring the run), was usually around 1 to 2 minutes and rarely over 5 minutes across all platforms. Some platforms had little information available, despite our best efforts to obtain package inserts or instructions for use. This was especially true for antigen tests. These platforms include the NADAL® COVID-19 (Nal von minden), PCL COVID-19 Ag rapid (PCL), QuickNavi™-COVID19 (Denka), 2019-CoV Ag Fluorescence Rapid Test Kit (Bioeasy), SGTI-flex COVID-19 Ag (Sugentech) and mariPOC SARS-CoV-2 (ArcDia). We have therefore included information to the best of our knowledge and marked information we could not obtain as "not available" in the table. We encourage readers to consult the original studies on which our table is based, as well as other reviews of point-of-care assays targeting SARS-CoV-2, for a clearer picture of the field. Assays other than NAAT and antigen tests have also been used.
We found a few studies using the FebriDx device (Lumos Diagnostics), which detects Myxovirus resistance protein A (MxA, a marker of the interferon-induced antiviral host response) and C-reactive protein (a well-known inflammatory marker widely used in medical practice). In one study, the platform had a sensitivity of 93% and a specificity of 86% 132 (with an estimated prevalence of 48% in the studied population). Other studies of this assay are available 133,134,135, but a comprehensive analysis of this platform is beyond the scope of our review. As these markers are commonly elevated by a range of pathogens, the test has low specificity and limited use in low-prevalence settings. Few platforms had a satisfactory number of clinical studies, and in many cases the number of individuals enrolled was suboptimal. Further research and reviews on this topic are encouraged. One of the main limitations of this review is the selection criteria. Given the high heterogeneity of methods and outcomes between studies, defining a single clear-cut set of exclusion criteria was not possible. The authors discussed doubtful cases, but a degree of subjectivity was inevitable. For the same reason, we did not conduct a meta-analysis or report an 'average' performance for each platform. This review was registered in the International Prospective Register of Systematic Reviews (PROSPERO) with registration number CRD42021260694. A protocol for this study can be accessed online. GHH and AH are employed by Diagnostics for the Real World, which has a molecular assay that was mentioned in this systematic review.
SARS-CoV-2 diagnostic pipeline
Point-of-Care Testing for Infectious Diseases: Past, Present, and Future
Accuracy of point-of-care glucose measurements
Point-of-Care Versus Central Laboratory Measurements of Hemoglobin, Hematocrit, Glucose, Bicarbonate and Electrolytes: A Prospective Observational Study in Critically Ill Patients
Clinical evaluation of ID NOW influenza A & B 2, a rapid influenza virus detection kit using isothermal nucleic acid amplification technology - A comparison with currently available tests
SAMBA HIV Semiquantitative Test, a New Point-of-Care Viral-Load-Monitoring Assay for Resource-Limited Settings
A 30-Min Nucleic Acid Amplification Point-of-Care Test for Genital Chlamydia trachomatis Infection in Women: A Prospective
Coronavirus (COVID-19) Update: FDA Informs Public About Possible Accuracy Concerns with Abbott ID NOW Point-of-Care Test
Trump administration bars FDA from regulating some laboratory tests, including for coronavirus
Removal Lists of Tests that Should No Longer Be Used and/or Distributed for COVID-19: FAQs on Testing for SARS-CoV-2
In Vitro Diagnostics EUAs - Molecular Diagnostic Tests for SARS-CoV-2. FDA
Comparative study of four SARS-CoV-2 Nucleic Acid Amplification Test (NAAT) platforms demonstrates that ID NOW performance is impaired substantially by patient and specimen type
Comparison of Abbott ID Now and Abbott m2000 Methods for the Detection of SARS-CoV-2 from Nasopharyngeal and Nasal Swabs from Symptomatic Patients
SARS-CoV-2 sample-to-answer nucleic acid testing in a tertiary care emergency department: evaluation and utility
Ct values from SARS-CoV-2 diagnostic PCR assays should not be used as direct estimates of viral load
Development of a point-of-care test to detect SARS-CoV-2 from saliva which combines a simple RNA extraction method with colorimetric reverse transcription loop-mediated isothermal amplification detection
Point of Care Nucleic Acid Testing for SARS-CoV-2 in Hospitalized Patients: A Clinical Validation Trial and Implementation Study
Dexamethasone in Hospitalized Patients with Covid-19
Interleukin-6 Receptor Antagonists in Critically Ill Patients with Covid-19
Point-of-care testing for the detection of SARS-CoV-2: a systematic review and meta-analysis
Real-world clinical performance of commercial SARS-CoV-2 rapid antigen tests in suspected COVID-19: A systematic meta-analysis of available data as per
Comparison of seven commercial SARS-CoV-2 rapid point-of-care antigen tests: a single-centre laboratory evaluation study
Urgent need of rapid tests for SARS CoV-2 antigen detection: Evaluation of the SD-Biosensor antigen test for SARS-CoV-2
High performance of a novel antigen detection test on nasopharyngeal specimens for SARS-CoV-2 infection diagnosis: a prospective study
High performance of a novel antigen detection test on nasopharyngeal specimens for SARS-CoV-2 infection diagnosis: a prospective study
Evaluating the use of posterior oropharyngeal saliva in a point-of-care assay for the detection of SARS-CoV-2
Saliva for use with a point of care assay for the rapid diagnosis of COVID-19
Salivary SARS-CoV-2 antigen rapid detection: A prospective cohort study
Evaluation of the rapid antigen test Panbio COVID-19 in saliva and nasal swabs in a population-based point-of-care study
SARS-CoV-2 patient self-testing with an antigen-detecting rapid test: a head-to-head comparison with professional testing
Head-to-head comparison of SARS-CoV-2 antigen-detecting rapid test with self-collected nasal swab versus professional-collected nasopharyngeal swab
Performance of Abbott ID Now COVID-19 Rapid Nucleic Acid Amplification Test Using Nasopharyngeal Swabs Transported in Viral Transport Media and Dry Nasal Swabs in a New York City Academic Institution
Contribution of VitaPCR SARS-CoV-2 to the emergency diagnosis of COVID
Think of the Children: Evaluation of SARS-CoV-2 Rapid Antigen Test in Pediatric Population
Evaluation of two rapid antigen tests to detect SARS-CoV-2 in a hospital setting
Differences in detected viral loads guide use of SARS-CoV-2 antigen-detection assays towards symptomatic college students and children
Reverse Transcription Recombinase-Aided Amplification Assay With Lateral Flow Dipstick Assay for Rapid Detection of
Evaluation of the automated cartridge-based ARIES SARS-CoV-2 Assay (RUO) against automated Cepheid Xpert Xpress SARS-CoV-2 PCR as gold standard
Development and evaluation of a rapid and simple diagnostic assay for COVID-19 based on loop-mediated isothermal amplification
Diagnosis of SARS-CoV-2 Infection with LamPORE, a High-Throughput Platform Combining Loop-Mediated Isothermal Amplification and Nanopore Sequencing
Hitting the diagnostic sweet spot: Point-of-care SARS-CoV-2 salivary antigen testing with an off-the-shelf glucometer
Analytical and clinical performance of the panbio COVID-19 antigen-detecting rapid diagnostic test
An optimized stepwise algorithm combining rapid antigen and RT-qPCR for screening of COVID-19 patients
Evaluation of three rapid lateral flow antigen detection tests for the diagnosis of SARS-CoV-2 infection
Evaluation of a SARS-CoV-2 rapid antigen test: Potential to help reduce community spread?
Analytical performances of the point-of-care SIENNA™ COVID-19 Antigen Rapid Test for the detection of SARS-CoV-2 nucleocapsid protein in nasopharyngeal swabs: A prospective evaluation during the COVID-19 second wave in France
Evaluation of the COVID19 ID NOW EUA assay
Performance Evaluation of the SAMBA II SARS-CoV-2 Test for Point-of-Care Detection of SARS-CoV-2
Comparison of a Point-of-Care Assay and a High-Complexity Assay for Detection of SARS-CoV-2 RNA
Clinical Evaluation of BD Veritor SARS-CoV-2 Point-of-Care Test Performance Compared to PCR-Based Testing and versus the Sofia 2 SARS Antigen Point-of-Care Test
Evaluation of the diagnostic accuracy of a new point-of-care rapid test for SARS-CoV-2 virus detection
At-home self-testing of teachers with a SARS-CoV-2 rapid antigen test to reduce potential transmissions in schools: Results of the SAFE School Hesse Study
Combined Point-of-Care Nucleic Acid and Antibody Testing for SARS-CoV-2 following Emergence of D614G Spike Variant
Diagnosis value of SARS-CoV-2 antigen/antibody combined testing using rapid diagnostic tests at hospital admission
Is Point-of-Care testing feasible and safe in care homes in England? An exploratory usability and accuracy evaluation of a point-of-care polymerase chain reaction test for SARS-COV-2. medRxiv 2020.11
Handling and accuracy of four rapid antigen tests for the diagnosis of SARS-CoV-2 compared to RT-qPCR
Comparison of the Accula SARS-CoV-2 Test with a Laboratory-Developed Assay for Detection of SARS-CoV-2 RNA in Clinical Nasopharyngeal Specimens
Real World Performance of SARS-CoV-2 Antigen Rapid Diagnostic Tests in Various Clinical Settings
Comparison of Cepheid Xpert Xpress and Abbott ID Now to Roche cobas for the Rapid Detection of SARS-CoV-2
Multicenter Evaluation of the Cepheid Xpert Xpress SARS-CoV-2 Test
Multi-center evaluation of cepheid xpert® xpress SARS-CoV-2 point-of-care test during the SARS-CoV-2 pandemic
PERFORMANCE EVALUATION OF A SARS-COV-2 RAPID ANTIGENTEST: TEST PERFORMANCE IN THE COMMUNITY IN THE NETHERLANDS
Performance of SARS-CoV-2 rapid antigen test compared with real-time RT-PCR in asymptomatic individuals
Clinical Performance of the Point-of-Care cobas Liat for Detection of SARS-CoV-2 in 20 Minutes: a Multicenter Study
Assessing a novel, lab-free, point-of-care test for SARS-CoV-2 (CovidNudge): a diagnostic accuracy study
Evaluation of the Cue Health point-of-care COVID-19 (SARS-CoV-2 nucleic acid amplification) test at a community drive through collection center
Clinical Evaluation and Utilization of Multiple Molecular In Vitro Diagnostic Assays for the Detection of SARS-CoV-2
Instructions for Use for the Novodiag® System
Mobidiag - Molecular diagnostics of coronavirus infection
Potential for False Results with Roche Molecular Systems, Inc. cobas SARS-CoV-2 & Influenza Test for use on cobas Liat System - Letter to Clinical Laboratory Staff, Point-of-Care Facility Staff, and Health Care Providers
Coronavirus COVID-19 serology and viral detection tests: technical validation reports
Evaluation of the QIAstat-Dx Respiratory SARS-CoV-2 Panel, the First Rapid Multiplex PCR Commercial Assay for SARS-CoV-2 Detection
Detecting SARS-CoV-2 at point of care: preliminary data comparing loop-mediated isothermal amplification (LAMP) to polymerase chain reaction (PCR)
Performances of the VitaPCR™ SARS-CoV
Diagnostics for the Real World. SAMBA II SARS-CoV-2 Test Instructions for use
Xpert® Xpress SARS-CoV-2 has received FDA Emergency Use Authorization
Antigen rapid tests, nasopharyngeal PCR and saliva PCR to detect SARS-CoV-2: a prospective comparative clinical trial
Performance and Implementation Evaluation of the Abbott BinaxNOW Rapid Antigen Test in a High-Throughput Drive-Through Community Testing Site in Massachusetts
Performance characteristics of five antigen-detecting rapid diagnostic test (Ag-RDT) for SARS-CoV-2 asymptomatic infection: a head-to-head benchmark comparison
Evaluation of a rapid antigen test (Panbio™ COVID-19 Ag rapid test device) for SARS-CoV-2 detection in asymptomatic close contacts of COVID-19 patients
Clinical performance evaluation of SARS-CoV-2 rapid antigen testing in point of care usage in comparison to RT-qPCR
Nasopharyngeal Panbio COVID-19 Antigen Performed at Point-of-Care Has a High Sensitivity in Symptomatic and Asymptomatic Patients With Higher Risk for Transmission and Older Age
The sensitivity of SARS-CoV-2 antigen tests in the view of large-scale testing
Evaluation of the Panbio™ rapid antigen test for SARS-CoV-2 in primary health care centers and test sites
Panbio antigen rapid test is reliable to diagnose SARS-CoV-2 infection in the first 7 days after the onset of symptoms
Diagnostic performance of a SARS-CoV-2 rapid antigen test in a large, Norwegian cohort
Field evaluation of a rapid antigen test (Panbio™ COVID-19 Ag Rapid Test Device) for COVID-19 diagnosis in primary healthcare centres
Comparison of SARS-COV-2 nasal antigen test to nasopharyngeal RT-PCR in mildly symptomatic patients
Evaluation of the Panbio™ COVID-19 Ag Rapid Test at an Emergency Room in a Hospital in São Paulo
Diagnostic accuracy of two commercial SARS-CoV-2 Antigen-detecting rapid tests at the point of care in community-based testing centers
Evaluation of the accuracy and ease-of-use of Abbott PanBio - A WHO emergency use listed, rapid, antigen-detecting point-of-care diagnostic test for SARS-CoV-2. medRxiv 2020.11
Clinical performance of the Abbott Panbio with nasopharyngeal, throat, and saliva swabs among symptomatic individuals with COVID-19
Multicenter evaluation of the Panbio™ COVID-19 rapid antigen-detection test for the diagnosis of SARS-CoV-2 infection
Clinical Validation of Automated and Rapid mariPOC SARS-CoV-2 Antigen Test
TM COVID-19 Ag Rapid Test Device
COVID-19 tests - mariPOC
Evaluation of the accuracy, ease of use and limit of detection of novel, rapid, antigen-detecting point-of-care diagnostics for SARS
On-site rapid molecular testing, mobile sampling teams and eHealth to support primary care physicians during the COVID-19 pandemic
Evaluation of a Rapid Diagnostic Assay for Detection of SARS-CoV-2 Antigen in Nasopharyngeal Swabs
The evaluation of a newly developed antigen test (QuickNavi™-COVID19 Ag) for SARS-CoV-2: A prospective observational study in Japan
Diagnostic accuracy and utility of SARS-CoV-2 antigen lateral flow assays in medical admissions with possible COVID-19
SARS-CoV-2 Antigen Rapid Qualitative Test Instructions for Use
Evaluation of accuracy, exclusivity, limit-of-detection and ease-of-use of LumiraDx™ - Antigen-detecting point-of-care device for SARS-CoV-2. medRxiv
A Rapid, High-Sensitivity SARS-CoV-2 Nucleocapsid Immunoassay to Aid Diagnosis of Acute COVID-19 at the Point of Care: A Clinical Performance Study
SARS-CoV-2 Antigen Rapid Test Kit (Colloidal Gold Immunochromatography)
MEDsan SARS-CoV-2 Antigen Rapid Test
PCL Covid19 Ag Gold Saliva
Medical Technology Recalls SARS-CoV-2 Antigen Rapid Test Kit and Leccurate SARS-CoV-2 Antibody Rapid Test Kit (Colloidal Gold Immunochromatography) due to Risk of False Results
Clinical Evaluation of Roche SD Biosensor Rapid Antigen Test for SARS-CoV-2 in Municipal Health Service Testing Site
Point-of-care evaluation of a rapid antigen test (CLINITEST® Rapid Antigen Test) for diagnosis of SARS-CoV-2 infection in symptomatic and asymptomatic individuals
The Value of Rapid Antigen Tests to Identify Carriers of Viable SARS-CoV-2. medRxiv
CoV 2 Rapid Antigen Test x 25 (For COVID-19)
Inc. STANDARD Q COVID-19 Ag Test
Sugentech
Impaired performance of SARS-CoV-2 antigen-detecting rapid tests at elevated temperatures
A SARS-CoV-2 Nucleocapsid Variant that Affects Antigen Test Performance
Surveillance of Acute SARS-CoV-2 Infections in School Children and Point-Prevalence During a Time of High Community Transmission in Switzerland
Diagnostic accuracy of a host response point-of-care test in patients with suspected
Use of the FebriDx point-of-care assay as part of a triage algorithm for medical admissions with possible
Horses for courses? Assessing the potential value of a surrogate, point-of-care test for SARS-CoV-2 epidemic control
Utility of the FebriDx point-of-care test for rapid triage and identification of possible coronavirus disease 2019 (COVID-19)