key: cord-0847588-r9juogyo authors: Aggarwal, Nishant; Garg, Mohil; Dwarakanathan, Vignesh; Gautam, Nitesh; Kumar, Swasthi S; Jadon, Ranveer Singh; Gupta, Mohak; Ray, Animesh title: Diagnostic accuracy of non-contact infrared thermometers and thermal scanners: A systematic review and meta-analysis date: 2020-10-10 journal: J Travel Med DOI: 10.1093/jtm/taaa193 sha: 262f8cb4c6371d81ae03e065cbab422801f83a4b doc_id: 847588 cord_uid: r9juogyo Infrared thermal screening, via the use of handheld non-contact infrared thermometers (NCITs) and thermal scanners, has been widely implemented all over the world. We performed a systematic review and meta-analysis to investigate its diagnostic accuracy for the detection of fever. We searched PubMed, Embase, the Cochrane Library, medRxiv, bioRxiv, ClinicalTrials.gov, COVID-19 Open Research Dataset, COVID-19 research database, Epistemonikos, EPPI-Centre, World Health Organization International Clinical Trials Registry Platform, Scopus and Web of Science databases for studies where a non-contact infrared device was used to detect fever against a reference standard of conventional thermometers. Forest plots and Hierarchical Summary Receiver Operating Characteristics curves were used to describe the pooled summary estimates of sensitivity, specificity and diagnostic odds ratio. From a total of 1063 results, 30 studies were included in the qualitative synthesis, of which 19 were included in the meta-analysis. The pooled sensitivity and specificity were 0.808 (95%CI 0.656-0.903) and 0.920 (95%CI 0.769-0.975), respectively, for the NCITs (using forehead as the site of measurement), and 0.818 (95%CI 0.758-0.866) and 0.923 (95%CI 0.823-0.969), respectively, for thermal scanners. The sensitivity of NCITs increased on use of rectal temperature as the reference. The sensitivity of thermal scanners decreased in a disease outbreak/pandemic setting. 
Changes approaching statistical significance were also observed on the exclusion of neonates from the analysis. Thermal screening had a low positive predictive value (PPV), especially at the initial stage of an outbreak, while the negative predictive value (NPV) continued to be high even at later stages. Thermal screening has reasonable diagnostic accuracy in the detection of fever, although it may vary with changes in subject characteristics, setting, index test, and the reference standard used. Thermal screening has a good NPV even during a pandemic. Policymakers must take into consideration the factors surrounding the screening strategy while forming ad-hoc guidelines.
Keywords: COVID-19, Fever, Infection control, Infrared rays, Mass screening, Pandemics
The emergence of the SARS virus in 2003 pushed several nations to adopt border control measures. Thermal screening, via thermal scanners (infrared thermal imaging systems) as well as handheld non-contact infrared thermometers (NCITs), is deemed the safest tool for temperature screening during infectious disease outbreaks such as SARS 1, H1N1 2,3 and, presently, COVID-19 4,5. It works on the principle that the human body emits infrared radiation which, like other electromagnetic radiation, can be focused onto a detector that converts heat into electrical signals and displays the temperature of the area as a graphic profile (thermal scanners) or a numerical reading (NCITs) 6. In the wake of COVID-19, thermal screening has been widely implemented at sites all over the world.
These sites include entry and/or exit screening at domestic and international airports 7 , defense establishments 8 , offices/workplaces, grocery stores, shopping malls, and hotels 9 . Screening for fever with non-contact infrared devices is operationally more favorable, especially in the setting of contagious diseases, over conventional methods of measuring temperature in which the instrument comes in contact with the human body. Potential advantages of using handheld NCITs include reduced discomfort to the subject as well as faster readings 4, 10 . Infrared tympanic thermometers, a popular method of contact thermometry, require ear pinna to be pulled manually which may increase the risk of cross-infection, and the use of disposable plastic covers which may increase the financial burden during a disease outbreak 11 . Thermal scanners do not require close proximity to the subject (in contrast to NCITs and contact thermometers) and hence, the operator may be in a remote area to minimize the risk of transmission 5 . The efficacy of thermal screening during a pandemic would depend on several factors including, but not limited to: (a) the diagnostic accuracy of the devices for the detection of fever, and (b) the prevalence of fever in the disease infected individuals. We aimed to conduct a systematic review and meta-analysis to estimate the diagnostic accuracy of NCITs and thermal scanners for the detection of fever. This systematic review was based on the methodological approaches recommended by the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy 12 . This review complies with the Preferred Reporting Items for Systematic Reviews and Meta-analyses of Diagnostic Test Accuracy Studies, the PRISMA-DTA statement 13 , and PRISMA-DTA checklist (eMethods 1 in the Supplement). We searched the relevant databases for eligible articles without time restriction until May 29, 2020. Our search strategy is provided in eMethods 2 in the supplement. 
The detailed inclusion and exclusion criteria used in the study are mentioned in eMethods 3 in the Supplement. Two reviewers (NA and MGa) independently screened the articles on the basis of title and abstract to assess for potential inclusion in our study. Following this, full-text versions of articles were accessed and further screened for inclusion. If a clear consensus for a particular study was not reached, the differences were resolved by a collective discussion that included a third reviewer (AR). From the included studies, data extraction was carried out by two independent reviewers (NA and MGa). Extracted fields included study characteristics (first author name, year of publication, study setting), subject characteristics, index test characteristics (manufacturer, anatomical site), reference test characteristics (method, temperature threshold), and the indices of diagnostic test accuracy. The quality of included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool (eMethods 4 in the Supplement). This was done independently by two reviewers (NA and MGa). All disagreements were resolved by consensus in consultation with a third reviewer (AR). Data for 2x2 A sensitivity analysis was also conducted to investigate the possible influence of neonates (excluding the studies which involved neonates or did not mention age distribution of the sample), the threshold of fever (analysis of studies with fever threshold of <38°C vs ≥38°C by the reference device), type of reference standard (comparison of studies according to different methods for core temperature measurement), disease outbreak (limiting the analysis to studies conducted during a disease outbreak or pandemic), and study setting (comparison of studies conducted in an 'inpatient' vs 'outpatient or airport' setting). 
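Each subgroup comparison in this sensitivity analysis asks whether two pooled sensitivities or specificities differ. As a rough illustration only, the sketch below runs a two-sided Z-test on the difference between two independent estimates. It assumes the combined standard error is the square root of the sum of the squared standard errors, which may not match the exact procedure used in the review. The function name and the input values are hypothetical and are not taken from the study.

```python
import math


def z_test_difference(est_a, se_a, est_b, se_b):
    """Two-sided Z-test for the difference between two independent estimates.

    Assumption: the combined standard error is sqrt(se_a^2 + se_b^2).
    """
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    z = (est_a - est_b) / combined_se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical pooled estimates and standard errors, for illustration only
z, p = z_test_difference(0.80, 0.05, 0.70, 0.05)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value below 0.05 from such a test would be read as a statistically significant difference between the two subgroups.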
To calculate the statistical significance of the difference between two pooled sensitivities or specificities, we calculated the combined standard error of pooled estimate, followed by the Z statistic 17 . Using this value, the p-value for the difference was calculated. A p-value of <0.05 was considered to denote statistical significance. We also recorded the positive and negative predictive values (PPV and NPV) in individual studies which were however found to be variable due, in part, to the varying prevalence of fever in different studies. Therefore, an analysis was performed to determine the PPV (the probability of test positives being true positives) and NPV (the probability of test negatives being true negatives) values from the pooled sensitivity and specificity data obtained from our quantitative synthesis. These values were calculated across a wide range of expected fever prevalence during a pandemic (from 0.00001% to 10%) and plotted in a graph using the GraphPad Prism 8 software. Using our search criteria, we identified a total of 1063 studies, of which 700 were found to be from PubMed, 321 from Embase, 29 from the Cochrane library, 1 from medRxiv, and 12 from screening the reference lists of included articles and relevant review articles. Our literature search flow diagram is summarized in the PRISMA format ( Figure 1 ). A total of 30 studies were included in the qualitative synthesis, of which 19 were included in the quantitative synthesis. when the patients admitted to the hospital wards or the emergency department (ED), were included; while the setting was considered 'outpatient/airport' (n=12) [1] [2] [3] 18, 20, 26, [29] [30] [31] 41 when the subjects presented to outpatient centers, clinics, emergency triage (but not admitted to the ED) or were healthy volunteers from a clinic or airport setting. 
The study by Hamilton et al., where >70% of subjects were clinic attendees and healthy volunteers, was also considered as an outpatient/airport setting 18. Two of the studies did not provide sufficient information on the study setting and were considered under the 'unclassified' setting (n=2) 6, 37. In the studies reporting on thermography, the prevalence of fever ranged from 0.5% to 51.9%. The sensitivity and specificity of the included NCITs ranged from 0.182 to 0.970 and 0.599 to 1, respectively. In the case of thermal scanners, the sensitivity and specificity ranged from 0.148 to 0.929 and 0.310 to 0.997, respectively. Of the 30 studies from the qualitative synthesis, 11 studies were not included in the meta-analysis, for various reasons (Figure 1): 2x2 data were unavailable 28, 33, 34 or inconsistent 11, 19, 23-25, 27, or the study characteristics for risk of bias were unavailable 22, 42. Results of the quality assessment of the included studies (n=19) are summarized in eTable 2 in the Supplement. Overall, 12 of the 19 included studies had a high risk of bias in at least one of the four domains of the QUADAS-2 tool, while 3 studies 18, 20, 30 had a high risk of bias in two domains. Overall, 10 NCIT devices were included in our analysis, which involved a total of 5562 readings. The pooled sensitivity and specificity for NCITs, regardless of the site of temperature overall accuracy of 0.92 (95%CI 0.90-0.94). No publication bias was seen on Deeks' funnel plot asymmetry test (p=0.67, Figure 2C). Amongst thermal scanners, 11 devices were included, which involved a total of 8312 readings. The pooled sensitivity and specificity of the devices was obtained to be 0. Figure 3C ). As disease spreads in a community, the proportion of infected individuals, and with it the prevalence of symptoms (fever in the present study), is expected to rise. PPV and NPV for the detection of fever will depend on the prevalence of fever in the community.
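The dependence of PPV and NPV on prevalence follows from Bayes' theorem. The sketch below is a minimal illustration and not the authors' GraphPad Prism workflow: it computes both values from the pooled sensitivity and specificity quoted in the abstract (for NCITs these are the forehead-site estimates). The function name and the three prevalence values are choices made for this example. Small differences from the figures reported in the text may arise from rounding or from which pooled estimate was used.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given fever prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Pooled (sensitivity, specificity) as quoted in the abstract
POOLED = {
    "NCIT (forehead)": (0.808, 0.920),
    "thermal scanner": (0.818, 0.923),
}

# Illustrative prevalence values chosen for this example
for device, (sens, spec) in POOLED.items():
    for prev in (0.0001, 0.01, 0.10):
        ppv, npv = predictive_values(sens, spec, prev)
        print(f"{device:16s} prevalence={prev:.2%}  PPV={ppv:.1%}  NPV={npv:.1%}")
```

Because false positives scale with the large afebrile fraction of the population, PPV collapses at low prevalence while NPV stays close to 100%.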
In our analysis, we observed that PPV rises with an increase in the prevalence of fever for both NCITs and thermal scanners, as shown in Figure 4. At an arbitrary prevalence of 1%, the PPV for detection of fever was 9.2% for NCITs and 9.7% for thermal scanners. This means that out of every 10 patients detected as febrile by thermal screening, only about one would actually be febrile. Interestingly, in contrast to PPV, there was only a comparatively small fall in NPV, of 2.3% (from ~100% to 97.7%) for NCITs and 2.1% (from ~100% to 97.9%) for thermal scanners, even as the prevalence of fever increased 10^5-fold (Figure 4). This would mean that, even at a fever prevalence of 10% during a pandemic, a patient who is detected to be afebrile by thermal screening has over a 97% probability of being truly afebrile by the reference method. Wide heterogeneity was observed, as demonstrated by visual inspection of the 95% prediction region of the HSROC curves (Figures 2B and 3B). The Spearman correlation coefficient was -0.56 (p=0.09) for NCITs and 0.25 (p=0.45) for thermal scanners, indicating no significant threshold effect. Further subgroup analysis was conducted to look for the likely sources of heterogeneity. The results of the sensitivity analysis are depicted in Figures ≥38°C. There were no changes in sensitivity or specificity with the exclusion of studies with a high risk of bias in ≥2 domains. The specificity of NCITs did not differ significantly between an outpatient/airport setting and an inpatient setting (0.81 vs 0.95, p=0.10). There were only two studies where NCITs were used during a pandemic 21, 29, due to which a subgroup analysis could not be performed. On the exclusion of studies with neonates (and where the age distribution was not (a) tympanic temperature only, (b) tympanic or oral, and (c) tympanic or axillary temperature. No differences were observed in the pooled summary estimates between these three groups.
There were no differences in the pooled sensitivity or specificity on comparison of devices with a fever threshold of <38°C vs ≥38°C or on the exclusion of studies with a higher risk of bias in ≥2 domains. The sensitivity of thermal scanners was found to fall with their use in a pandemic setting (0.74 vs 0.82; p=0.04). On limiting the analysis to studies from an outpatient or airport setting (i.e. exclusion of studies from the inpatient setting 32 and where the study setting was not reported), there were no changes observed in the pooled summary measures. The results of this review suggest that non-contact infrared thermometers (NCITs) and thermal scanners generally have reasonable sensitivity and specificity for the diagnosis of fever. An increase in the specificity of NCITs was noted when rectal temperature was used as the reference test. The sensitivity of thermal scanners decreased with the use of the devices during a disease outbreak/ pandemic setting. On the exclusion of neonates from the analysis, differences approaching statistical significance were observed in the sensitivity of NCITs and the specificity of both NCITs and thermal scanners. In the case of both thermal screening devices, there were no changes in the pooled sensitivity or specificity with the exclusion of studies at a high risk of bias or with the comparison of studies with different thresholds for fever. Thermal screening was found to have a low PPV, especially in the initial phase of a disease outbreak in a given community. In contrast, the NPV was seen to be reasonably high even in case of a relatively large proportion of the population being febrile. Wide heterogeneity was observed in the studies included in our review, in terms of the participant characteristics, the study design and setting, the index tests, and the reference standards used. The demographic details regarding the study participants were not available in some of our included studies. 
There was non-uniformity in the reference standard used for the confirmation of fever. In addition, differences in the type of index test used (NCITs/thermal scanners), the manufacturer specifications, the environmental conditions for optimal operation and the experience of the operator can lead to inaccuracies in the measurement of temperature and a further increase in heterogeneity. Several factors can influence the detection of fever by infrared thermal devices 6, 43 . Environmental factors such as absolute temperature, variation in the temperature, relative The target body site for the measurement may be subject to differential vascularity leading to variation in heat distribution. Forehead is a more feasible site for scanning but is thought to be more prone to physiological and environmental variations. On the other hand, sites such as external auricular area and inner eye canthi 6, 45 are less subject to variations but are not as accessible and the removal of eyewear, scarves, etc. may lengthen the preparation time for the subject 43 . Wrist temperature may be useful since rolling up the sleeves may not lengthen the preparation time significantly 29 . In our study, we found no significant changes in the pooled sensitivity or specificity when the analysis was restricted to the forehead as the site. Disease outbreaks, such as the COVID-19, necessitate the use of a screening device wherein the sensitivity of the device plays a vital role, as false negatives should be minimized at all costs. In a pandemic setting, the sensitivity of thermography decreased significantly in our analysis. This may be linked to the use of thermal scanners for mass screening 1,3 , contrary to the recommendations by the FDA, which state that only one person's temperature should be measured at a time 5 . 
Any face obstructions such as masks, glasses, headbands or scarves must also be removed prior to screening with a thermal scanner; this may be challenging to enforce in a pandemic situation. Incidentally, the FDA recommends confirmation of a positive result on thermal scanner with a secondary method of evaluation, such as an NCIT or a contact thermometer 46 . On the exclusion of neonates from the analysis, differences approaching statistical significance were observed in the sensitivity of NCITs and the specificity of both NCITs and thermal scanners. Several factors, unique to neonates, may hamper the detection of fever by infrared devices as well as reference tests. Neonates are more prone to temperature instability from ambient temperature changes due to a higher evaporative heat loss, higher metabolic rates and inability to make behavioral adaptations 47 . Discomfort to the baby during handling may affect the rectal, oral and axillary measurements, as well as make it challenging to achieve an optimal viewing angle for the use of infrared devices. Additionally, infants have brown fat located in their axillary pockets, which takes part in non-shivering thermogenesis, and hence, may affect the axillary temperature measurements 47, 48 . In our analysis, we found that thermal screening had a high NPV for fever but there was considerable variation in PPV with change in fever prevalence. On assuming a fever prevalence of 1%, the NPV obtained in our study (99.8%, both for NCITs and thermal scanners) agrees well with the results obtained by Bitar et al. (>99%) 43 . But, it is generally in the early stages of a pandemic (prevalence of fever<1%) that thermal screening is used as a means of delaying the introduction of infection in the given community 43 . 
At these initial stages, we found thermal screening to have a poor PPV, meaning that most of the subjects deemed to be febrile on screening would turn out to be afebrile (false positives), which may also evoke undue anxiety and anguish amongst these individuals 43 61 . Therefore, temperature screening alone does not appear to be an effective way to detect cases and to help curb the international spread of COVID-19. Despite the psychological reassurance provided by thermal screening, public health officials and policymakers must take into consideration the quality of scientific evidence that drives such measures and the guidelines must reflect a wholesome approach to the prevention of community transmission. A recent study suggested that the best strategy to reopen travel restrictions is the administration of COVID-19 test to all incoming travelers followed by isolation of test positives 62 . While it is important to rule out more common infections like COVID-19, other imported infections must also be taken into consideration in the workup of febrile travelers 63 . This study had a few limitations. First, there was high heterogeneity across the studies, which persisted even on subgroup analysis. Second, in our overall analysis including all NCITs and thermal scanners, only the single best sets of values (with the highest Youden's index) for each of the 21 devices were considered. Hence, our estimates of pooled sensitivity and specificity may reflect the best-case parameters of diagnostic accuracy for the included devices, which may be higher than in the case where the other sets of 2x2 data values are considered. Third, there were several included studies where the index test temperature threshold for fever was not pre-specified but obtained retrospectively from the study data, making them less reliable. Handheld non-contact infrared thermometers (NCITs) and thermal scanners have a reasonable sensitivity and specificity in detecting fever. 
However, variation in the diagnostic performance was observed in different study settings: notably, an increase in specificity of NCITs with the use of rectal temperature as reference, and differences in sensitivity of NCITs and specificity of both NCITs and thermal scanners with the exclusion of neonate subjects. Despite an observed fall in the sensitivity of thermal scanners in a pandemic setting, our study shows that the NPV continues to be high even when the disease affects a large proportion of the community. Thermal screening may be considered as a method of detection of fever in symptomatic individuals, but only as a part of a larger approach to pandemic response. The demographic, epidemiological, environmental, and psychosocial factors that surround the screening strategy must be taken into consideration, both by present public health policymakers as well as future researchers.
No financial support was obtained for this study. The authors have declared no conflict of interest.
Infrared thermography to mass-screen suspected SARS patients with fever
Evaluation of an infrared thermal detection system for fever recognition during the H1N1 influenza pandemic
Fever screening during the influenza (H1N1-2009) pandemic at Narita International Airport
generalhospital-devices-and-supplies/thermal-imaging-systems-infrared-thermographic-systemsthermal-imaging-cameras
Analysis of IR thermal imager for mass blind fever screening
LAWA Official Site | News Release
Department Uses Thermal Imaging to Detect COVID-19. US DEPARTMENT OF DEFENSE
Non-contact infrared thermometers for measuring temperature in children: primary care diagnostic technology update
Non-contact infrared versus axillary and tympanic thermometers in children attending primary care: a mixed-methods study of accuracy and acceptability
Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version 1.0: The Cochrane Collaboration
Preferred Reporting Items for a Systematic Review and Meta-analysis of Diagnostic Test Accuracy Studies: The PRISMA-DTA Statement
Index for rating diagnostic tests
Conducting systematic reviews of diagnostic studies: didactic guidelines
The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed
Clinical performance of infrared consumer-grade thermometers
Comparison of 3 infrared thermal detection systems and self-report for mass fever screening
Mass screening for fever in children: a comparison of 3 infrared thermal detection systems
Non-contact infrared thermometry temperature measurement for screening fever in children
Development and deployment of infrared fever screening systems
Comparison of Infrared Thermal Detection Systems for mass fever screening in a tropical healthcare setting
Clinical accuracy of tympanic thermometer and noncontact infrared skin thermometer in pediatric practice: an alternative for axillary digital thermometer
Clinical accuracy of non-contact infrared thermometer from umbilical region in children: A new side
Use of noncontact infrared thermography to measure temperature in children in a triage room
Mass screening of suspected febrile patients with remote-sensing infrared thermography: alarm temperature and optimal distance
Limitations of forehead infrared body temperature detection for fever screening for severe acute respiratory syndrome
Validity of Wrist and Forehead Temperature in Temperature Screening in the General Population During the Outbreak of 2019 Novel Coronavirus: a prospective real-world study
Screening for fever by remote-sensing infrared thermographic camera
Utility of infrared thermography for screening febrile subjects
Fever screening of seasonal influenza patients using a cost-effective thermopile array with small pixels for close-range thermometry
The validity of mass body temperature screening with ear thermometers in a warm thermal environment
Investigation of febrile passengers detected by infrared thermal scanning at an international airport
Tympanic, infrared skin, and temporal artery scan thermometers compared with rectal measurement in children: a real-life assessment
Clinical accuracy of a non-contact infrared skin thermometer in paediatric practice
Modern approach to infectious disease management using infrared thermal camera scanning for fever in healthcare settings
Cutaneous Infrared Thermometry for Detecting Febrile Patients
Performance of non-contact infrared thermometer for detecting febrile children in hospital and ambulatory settings
Accuracy of tympanic and infrared skin thermometers in children
Thermal image scanning for influenza border screening: results of an airport screening study
Field test studies of our infrared-based human temperature screening system embedded with a parallel measurement approach
International travels and fever screening during epidemics: a literature review on the effectiveness and potential use of non-contact infrared thermometers
Is thermal scanner losing its bite in mass screening of fever due to SARS?
New standards for fever screening with thermal imaging systems
Policy for Telethermographic Systems During the Coronavirus Disease 2019 (COVID-19) Public Health Emergency. US Food and Drug Administration 2020
Can there be a standard for temperature measurement in the pediatric intensive care unit?
Comparison of temporal artery, mid-forehead skin and axillary temperature recordings in preterm infants <1500 g of birthweight
Prevalence of Asymptomatic SARS-CoV-2 Infection: A Narrative Review
Temporal dynamics in viral shedding and transmissibility of COVID-19
Transmission onset distribution of COVID-19
Infectivity, susceptibility, and risk factors associated with SARS-CoV-2 transmission under intensive contact tracing in Hunan, China
Coronavirus Disease 2019 Case Surveillance - United States
Presenting Characteristics, Comorbidities, and Outcomes Among 5700 Patients Hospitalized With COVID-19 in the New York City Area
Clinical Characteristics of Coronavirus Disease 2019 in China
Comorbidities, clinical signs and symptoms, laboratory findings, imaging features, treatment strategies, and outcomes in adult and pediatric patients with COVID-19: A systematic review and meta-analysis
The prevalence of symptoms in 24 COVID-19): A systematic review and meta-analysis of 148 studies from 9 countries
Effectiveness of airport screening at detecting travellers infected with novel coronavirus (2019-nCoV)
Modelling strategies for controlling SARS outbreaks
World Health Organization Working Group on International and Community Transmission of SARS. Public health interventions and SARS spread
Effectiveness of interventions targeting air travellers for delaying local outbreaks of SARS-CoV-2
Strategies at points of entry to reduce importation risk of COVID-19 cases and re-open travel
Travel-related fever in the time of COVID-19 travel restrictions