F3S: Free Flow Fever Screening

Kunal Rao, Giuseppe Coviello, Min Feng, Biplob Debnath, Wang-Pin Hsiung, Murugan Sankaradas, Yi Yang, Oliver Po, Utsav Drolia, Srimat Chakradhar

7th IEEE International Conference on Smart Computing (SMARTCOMP 2021), September 3, 2021. DOI: 10.1109/smartcomp52413.2021.00060

Abstract: Identification of people with elevated body temperature can reduce or dramatically slow down the spread of infectious diseases like COVID-19. We present a novel fever-screening system, F3S, that uses edge machine learning techniques to accurately measure the core body temperatures of multiple individuals in a free-flow setting. F3S performs real-time sensor fusion of visual and thermal camera data streams to detect elevated body temperature, and it has several unique features: (a) visual and thermal streams represent very different modalities, and we dynamically associate semantically-equivalent regions across visual and thermal frames by using a new, dynamic alignment technique that analyzes content and context in real-time; (b) we track people through occlusions, identify the eye (inner canthus), forehead, face and head regions where possible, and provide an accurate temperature reading by using a prioritized refinement algorithm; and (c) we robustly detect elevated body temperature even in the presence of personal protective equipment like masks, sunglasses or hats, all of which can be affected by hot weather and lead to spurious temperature readings. F3S has been deployed at over a dozen large commercial establishments, providing contact-less, free-flow, real-time fever screening for thousands of employees and customers in indoor and outdoor settings.

I. INTRODUCTION

One of the common symptoms for a person infected with COVID-19 is fever [1]. Reliable and accurate detection of fever helps in isolating potentially infected people. In this paper, we present F3S, which screens for people with fever as they move in a free-flow setting, where individuals need not pause or stop at a kiosk for fever screening. Fig. 1 on the left shows a setup where people need to pause or stop, get their temperature measured and then proceed, while on the right it shows the free-flow movement of people, where temperature is measured as people walk through. By allowing free-flow movement, F3S increases the throughput (number of people screened per minute) and avoids over-crowding (which can increase virus transmission). To measure the temperature of individuals, F3S uses thermal and visual cameras, which are placed far away from people, so that temperature measurements can be taken from a distance in a contact-less manner. This does not require any human intervention, which is a key concern for COVID-19, since it can spread through close contact between individuals [1]. To perform simultaneous fever screening of multiple persons in a free-flow setting, and in real-time, F3S uses edge machine learning techniques to perform sensor fusion of thermal and visual data streams within a few hundred milliseconds. Such low-latency response for multiple persons is not possible when visual and thermal data streams are sent to the cloud for analytics processing. F3S instead uses resources near the visual and thermal sensors for edge machine learning.
We make the following key contributions:
1) We present F3S, a novel free-flow fever screening solution that runs at the edge and, through the use of deep learning techniques, enables real-time, simultaneous, high-throughput measurement of the core body temperature of multiple individuals from a distance, without any human intervention.
2) We present novel sensor fusion techniques to fuse thermal and visual frames, enabling accurate temperature measurement of multiple individuals simultaneously.
3) We present a neural network based distance compensation model, which enables correct temperature measurement at various distances from the thermal camera. Such compensation is necessary because the temperature reported for a person depends on the distance of the person from the thermal camera.
4) We present novel techniques to track the temperature of individuals across frames, and at different regions (eye, forehead, face and head), depending on the visibility and pose of the person; we prioritize, collate and filter alerts for the same individual to avoid false positives.
5) We present novel techniques to measure the temperature of individuals even when their face is partially covered (for example, when they are wearing masks, sunglasses or hats).
6) Finally, we present a methodology to determine ground truth using thermal and visual sensor data, and to verify the accuracy and correctness of temperature measurement.

Fever screening measures the core body temperature of individuals as they walk into a facility and triggers an alert when the temperature is above a certain pre-configured threshold (the Centers for Disease Control and Prevention considers a person to have fever when the measured temperature is at least 100.4°F). The temperature across a human body is not consistent: the head is the hottest, the feet are the coldest, and there are variations in between. There are two regions of the human body that can provide the most reliable core temperature measurements: 1) the inner canthus of the eye, the corner of the eye where the upper and lower eyelids meet, and 2) the ear canals. For non-invasive febrile temperature screening using thermographic devices, ISO recommends obtaining the temperature reading between the eyes of a person [2]. This is the main goal associated with measuring core body temperatures remotely, i.e. non-invasively and without human proximity, using thermographic sensors: detecting this region reliably, and measuring an accurate core body temperature. There are two main challenges that F3S addresses in achieving this goal.
1) Free Flow: How to reliably measure core body temperature while people are moving, possibly occluding each other, and are not asked to stop and look at the sensor?
2) Eye-region Occlusion: How to adapt to eye-region obstructions, such as glasses, caps and masks, measure core body temperature, and still allow uninterrupted flow of entry?

III. RELATED WORK

Infrared thermography has been deployed at quarantine stations for detecting elevated body temperature over the last two decades [3]-[5]. There are many research works that combine RGB and thermal images to extract multiple vital signs (i.e. body temperature, heart rate, respiration rate, blood volume pulse, etc.) to identify people with febrile conditions [3], [6]-[8]. However, these works have been designed for pause-and-go scenarios, where an individual pauses for a few seconds, stands still and looks straight at the camera. Even commercial solutions like [9]-[14] work well only in pause-and-go scenarios and not in a free-flow setting.
They assume that the face and eye-region are clearly visible with no obstruction, which may not be the case in a free-flow scenario, where people are screened as they walk through and the face and/or eye-region may not always be visible due to occlusion. Haghmohammadi et al. propose a method for measuring body temperature for an indoor moving crowd [15]. However, it also does not work when the face is not clearly visible. In addition, it has not been tested thoroughly for crowd scenarios. Somboonkaew et al. propose a mobile platform for massive human temperature screening in large public areas based on the infrared forehead temperature, using an IR camera and a mobile phone [16]. They align the RGB image to the thermal image to find the target area for the temperature measurement. For the image alignment, they use image cropping and image scaling based on the field of view and image resolution of the RGB and thermal cameras. This alignment procedure, however, is very simple and does not work well when faces are occluded. In contrast, F3S uses a sophisticated method to dynamically align visual and thermal image pairs (explained in Section V-B2), which enables accurate temperature measurement in a free-flow setting, even when there is occlusion.

F3S is a fever screening system designed to operate in a free-flow manner, i.e. individuals are not required to pause or stop in order for their temperature to be measured. Instead, their temperatures are measured automatically from a distance as they cross the field of view of the camera. A key aspect of our system is that it does not require any human intervention, which is a key concern for COVID-19, since it can spread through close contact between individuals [1]. Our solution works in real-time and is able to handle a high volume and rate of flow. Fig. 2 shows the setup for deployment of F3S. The arrow shows the direction of movement of people from the entrance into the building. Cameras are located further away, pointing towards the entrance, so that a large enough field of view is captured. As people walk into the building, their temperatures are measured and displayed on the screen for the operator to monitor. If the temperature of a person is above the configured threshold, then an alert is triggered and the operator can request the individual to step aside and proceed to secondary screening. F3S acts as an initial screening solution, and final screening is done using a medical-grade thermometer. Physical distance between the operator and target individuals is constantly maintained, as the operator does not need to come in close contact with the people. In fact, the operator need not even be physically present at the location; everything can be monitored from a central location in the control room, and if an alert is triggered, the individual can be notified over an audio speaker to step aside, thereby avoiding the need for any human to physically intervene.

The primary goal of any fever screening system is to measure a person's temperature in their eye region, particularly the inner canthus area, to produce the most reliable measurement [2]. Using a thermal sensor alone to detect this region and measure the temperature is possible, but the accuracy of detection is poor [17]. In order to overcome this limitation, we combine visual sensor data with thermal sensor data to get accurate readings of the temperature of every person.
Visual sensors produce higher dimensional data than thermal sensors (higher resolution and more channels in the image data). Therefore, visual camera streams allow for more accurate detection of persons, their faces and eye-regions. However, visual cameras lack information about temperature, which is available in the thermal stream data. We use deep learning models to perform sensor fusion of visual and thermal streams to associate temperatures in thermal data with corresponding objects in the visual stream. Fig. 3 shows the high-level design and workflow of F3S. There are two streams of data coming into our system: a thermal data stream and a visual data stream. Both streams relate to the same scene, one containing the visual RGB frame and the other containing the thermal frame of the scene.

Goal. The system's goal is to detect the core body temperature of people as they move in a free-flow manner, from their eye region when possible, or from other appropriate body regions. In order to achieve this goal, we split the design of F3S into three components, each of them handling specific challenges.
1) Person Tracking: The system needs to detect and track people, while also identifying various regions for temperature measurement as they walk through.
2) Frame Fusion: The system needs to correlate the identified regions in the visual frame with regions in the thermal frame, from where temperatures can be extracted.
3) Fever Screening: The system needs to decide if a person actually has fever or not, based on the multiple detections it possibly makes across frames.
Next, we discuss each of these components in detail.

The Person Tracking component of F3S detects people and tracks them as they move in a free-flow setting. We use a proprietary neural network based model [18], which detects the face, head and body of a person. Depending on whether the person is wearing glasses, a mask, a hat, or a combination of these, and on the pose and angle, none, one or more of the above detections are possible for a person in each frame. After detection, tracking of the person is a key component of our system for achieving high accuracy when people are not pausing or stopping. Having a unique tracking ID for the detected regions of a person allows our system to correlate and refine temperature measurements across frames as the person walks through. For tracking purposes, the person tracking component maintains a cache, called the person cache, to store details of the most recently detected individuals. Specifically, for these individuals, the latest bounding box locations of the body, head and face, along with their snapshots, are maintained. A person remains in the cache as long as body, face or head detections are available for the person. If no detections are available for some configurable period of time, e.g. 10 seconds, then the person is removed from the cache (since the person may have left). In order to assign a unique ID to an individual, each of the available detections, i.e. body, face or head, is checked one by one. Our tracking algorithm is described in Algorithm 1 and sketched in the code below. In the first loop, the detected body is compared with all previously stored bodies in the cache to see if there is any match. For a body match, we consider the bounding box location and image similarity. If the overlap in bounding boxes is high, i.e. above a pre-configured threshold, and the image similarity is also high (above a pre-configured threshold), then it is considered a match. If a body match is found, then the ID of the body in the cache is assigned to the detected body, and the details of the body, i.e. bounding box and snapshot, are updated in the cache. If no match is found, then a new ID is assigned to the body and the detected body is added to the cache. The second loop uses a proprietary neural network based face recognition model [18] to compare and match the face with all faces in the cache. If the match score is above a pre-configured threshold, then it is considered a face match: the ID of the face in the cache is assigned to the detected face, and the face details in the cache are updated. Since the accuracy of the face recognition model is higher, we use it to overwrite the ID of the body and assign the same ID as the face to the body. If no face match is found, then the ID of the body, if available, is assigned to the face; otherwise a new ID is assigned to the face and the detected face is added to the cache. The third loop assigns an ID to the head detection. We do not use image similarity and bounding box location matching for the head, because head detection has a low detection rate compared to body detection, and lower accuracy for similarity matching compared to face matching. We assign the same ID as the face or body, if available; otherwise we assign a new ID to the head. If both face and body IDs are available, then we give priority to the face ID. By going through this procedure, where we check the available body, face or head detections one by one, we are able to track and assign a unique ID to individuals, which enables F3S to correlate and refine the temperature measurement for an individual as the person walks through in a free-flow manner.
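To make the control flow of Algorithm 1 concrete, the following is a minimal sketch of the ID-assignment logic. It is illustrative only: match_body and match_face stand in for the proprietary similarity and face-recognition models [18], next_id is a hypothetical ID generator, and the person cache is simplified to plain dictionaries.

```python
def assign_track_ids(dets, cache, match_body, match_face, next_id):
    """Sketch of Algorithm 1 (illustrative, not the production code).

    dets:  optional 'body', 'face' and 'head' detections for one person.
    cache: dict mapping 'body'/'face'/'head' -> {track_id: detection}.
    match_body / match_face: return a cached track ID or None.
    """
    body_id = None
    face_id = None
    # Loop 1: body match via bounding-box overlap + image similarity.
    if "body" in dets:
        body_id = match_body(dets["body"], cache["body"])
        if body_id is None:
            body_id = next_id()        # unseen body: assign a new ID
        cache["body"][body_id] = dets["body"]
    # Loop 2: face recognition is more accurate, so a face match
    # overwrites the body's ID.
    if "face" in dets:
        face_id = match_face(dets["face"], cache["face"])
        if face_id is not None:
            if body_id is not None and body_id != face_id:
                cache["body"][face_id] = cache["body"].pop(body_id)
                body_id = face_id      # face ID takes priority
        else:
            face_id = body_id if body_id is not None else next_id()
        cache["face"][face_id] = dets["face"]
    # Loop 3: the head inherits the face ID, else the body ID, else a
    # new one (no appearance matching is attempted for heads).
    if "head" in dets:
        if face_id is not None:
            head_id = face_id
        elif body_id is not None:
            head_id = body_id
        else:
            head_id = next_id()
        cache["head"][head_id] = dets["head"]
    return face_id if face_id is not None else body_id
```

A real implementation would also timestamp each cache entry so that stale tracks can be evicted after the configurable timeout described above.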
As mentioned above, F3S uses both visual (RGB) and thermal sensors to achieve high-accuracy and high-throughput fever screening. The system detects persons in the visual frames and then finds the temperature of each person from the thermal frame. To do this, a mapping from the visual frame to the thermal frame is required, i.e. for a given point (x_v, y_v) in the visual frame, the system must find the corresponding point (x_t, y_t) in the thermal frame. This mapping would be the identity if the sensors had the exact same viewpoint. However, even though these sensors are assembled in a single unit, they are placed side-by-side, with a discernible distance between them. Such a setup introduces a difference in the sensor viewpoints, which implies that (x_v, y_v) ≠ (x_t, y_t). Instead, the points are related through a function: (x_v, y_v) = f_align(x_t, y_t). The system needs to estimate f_align so that a point from the visual frame can be mapped to the correct point in the thermal frame to read the associated temperature.

1) Manual Offset Approach: A straightforward way to get approximate alignment is to perform static scaling and translation. Let I_V be the visual sensor data, I_T the thermal sensor data and I_AT the thermal sensor data aligned to the visual image. Then the alignment function f_align can be defined as follows, where t_x is the horizontal offset, t_y the vertical offset, S_x the horizontal scale factor and S_y the vertical scale factor:

    f_align(x_t, y_t) = (S_x * x_t + t_x, S_y * y_t + t_y)

With the above transformation function, it is possible to get a temperature measurement in a known area of the visual ROI using data from I_AT = f_align(I_T). However, this approach produces correct alignment only for a shallow depth plane, and approximate alignment for neighbouring pixels in that area of that plane. It does not align the entire frame. Pixels representing depth planes much further from or much closer than the correctly aligned image plane have larger misalignment.
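As an illustration, the static transform amounts to a single affine warp of the thermal frame into visual coordinates. This is a minimal sketch assuming OpenCV; the scale factors are simply the ratios of the visual (1280x960) and thermal (336x252) resolutions used in our deployment, and the offset values are illustrative manual-overlay settings.

```python
import cv2
import numpy as np

# Static scale-and-translate alignment (the manual offset approach).
# S_x, S_y, t_x, t_y are calibrated once by overlaying the two images
# at a fixed distance; the values below are illustrative.
S_x, S_y = 1280 / 336, 960 / 252   # horizontal / vertical scale factors
t_x, t_y = 75.0, -25.0             # pixel offsets from a manual overlay

def align_thermal_to_visual(thermal_frame, visual_shape):
    """Warp the thermal frame into visual-frame coordinates using the
    fixed transform (x_v, y_v) = (S_x * x_t + t_x, S_y * y_t + t_y)."""
    M = np.float32([[S_x, 0.0, t_x],
                    [0.0, S_y, t_y]])
    h, w = visual_shape[:2]
    return cv2.warpAffine(thermal_frame, M, (w, h))
```

Because M is fixed, the warp is exact only for the depth plane at which it was calibrated, which is precisely the limitation discussed above.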
Alignment errors can exceed 100 pixels (in both the x and y directions) when the person is closer to or further from the camera. This leads to inaccurate temperatures being assigned to visual regions.

2) Dynamic Frame Alignment: The relative distortion between sensor images varies depending on the depth plane. As an individual walks towards the camera, multiple temperature readings have to be taken to account for head and face pose, and for occlusion by other individuals and objects. Since the typical width of free-flow traffic is about 6 to 8 feet, and the horizontal field of view spans about 90 degrees or more (due to measurement zone constraints), this leads to non-uniform misalignment in the horizontal plane as individuals move off the camera's optical axis. The alignment issues are shown in Fig. 4, and the alignment error at various distances along the horizontal and vertical planes is shown in Fig. 5. The corresponding measurement inaccuracy and required correction are depicted in Fig. 6. F3S dynamically aligns visual and thermal image pairs for all people in the field of view using the flow shown in Fig. 7. Since the thermal image has limited features to extract and limited correlation to the visual image, features on the boundary of a person's head are matched across the visual and thermal spectra. To do this, the background is first estimated in the images and subtracted, and a foreground mask is obtained. ORB features [19] of the foreground mask are extracted and matched using a brute-force Hamming matcher to identify matching feature points. Using the matching feature points with high confidence scores across both pairs, a homography matrix is obtained for each pair. Object pairs are rectified with respect to each other using the homography matrix in the Spatial Fusion module. A by-product of this alignment is distance measurement. The thermal and visual cameras are placed in a stereo arrangement with a small baseline. This allows the application of epipolar geometry and stereo vision principles to calculate depth from disparity [20]. Typically, the critical issue in assessing depth from cross-spectral stereo is finding corresponding points across the two spectra. We already solve this problem during the alignment process; using those correspondences, we obtain the depth and use it in a subsequent module for temperature correction.
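A minimal sketch of this dynamic alignment flow, assuming OpenCV: background subtraction yields foreground masks, ORB features on the masks are matched with a brute-force Hamming matcher, and a RANSAC homography maps thermal points to visual points. The choice of subtractor (MOG2), the feature count and the match cutoff are illustrative assumptions.

```python
import cv2
import numpy as np

bg_visual = cv2.createBackgroundSubtractorMOG2()   # background estimators
bg_thermal = cv2.createBackgroundSubtractorMOG2()
orb = cv2.ORB_create(nfeatures=500)                # ORB feature extractor [19]
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)

def dynamic_align(visual_gray, thermal_gray):
    """Estimate a per-frame homography mapping thermal -> visual."""
    fg_v = bg_visual.apply(visual_gray)            # foreground masks
    fg_t = bg_thermal.apply(thermal_gray)
    kp_v, des_v = orb.detectAndCompute(fg_v, None)
    kp_t, des_t = orb.detectAndCompute(fg_t, None)
    if des_v is None or des_t is None:
        return None
    # Brute-force Hamming matching; keep the highest-confidence matches.
    matches = sorted(matcher.match(des_t, des_v), key=lambda m: m.distance)
    good = matches[:50]
    if len(good) < 4:                              # homography needs >= 4 points
        return None
    src = np.float32([kp_t[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_v[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    # Apply with cv2.warpPerspective; the point disparities also yield
    # depth, as described above.
    return H
```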
Once visual and thermal frame fusion is completed, the Fever Screening module processes the fused frame to detect people with fever. The four key challenges that this module solves are:
1) Detecting and measuring temperature within the sensor's recommended measurement zone
2) Prioritizing the temperature measurement region depending on the available detections for an individual within a frame
3) Temperature correction based on distance from the camera
4) Prioritized refinement of measured temperatures across frames, and alerting for those with fever

1) Detection and Measurement in Capture Zone: The measured temperature from the thermal sensor varies depending on the distance from the camera. This is because of the atmospheric composition between the person and the camera. Gases and particles present in the atmosphere absorb some of the emitted infrared radiation from the person. The farther the distance, the more the absorption, and therefore the lower the measured temperature. This variation of temperature with distance is shown in Fig. 8 for an individual along the optical axis and 2 feet off the optical axis. Due to these atmospheric factors, thermal imaging sensors have a recommended measurement zone in terms of distance from the sensor. To allow measurement within this recommended zone, a capture zone is set up, i.e. a zone in which temperature measurement for individuals starts as they enter and stops when they leave. Along with the capture zone, a Region of Interest (ROI) within the camera view is also configured. This can be used to tightly control the region within which temperature is measured for individuals, thus keeping it clutter-free for the operator. Note that within the ROI, people may be too far or too near, i.e. within or outside the capture zone.

Algorithm 2 shows the procedure followed in determining the temperature of individuals as they walk through the capture zone. The first step is to discard out-of-order frames. Next, all detected persons who lie outside the configured ROI are removed. After removal of any person outside the ROI, the remaining individuals are processed one by one as they enter and leave the capture zone. The Fever Screening module maintains a cache of recently seen individuals. The tracking ID of the individual is used as an identifier to determine if the person is new or previously seen. For a previously seen person, the new temperature reading is measured for the current frame and updated in the cache against the tracking ID of the individual. For a new person, a check is performed to see if the person has entered the capture zone. If it is determined that the person has entered the capture zone, then the temperature for the person is measured and a new person with a new tracking ID is added to the cache. Note that the temperature for a person is added or updated in the cache only if it is within the acceptable human temperature range, so as to avoid any spurious temperature readings. Also note that we start temperature measurement for an individual when he/she enters the capture zone and continue to monitor and measure until the individual leaves the capture zone. After all individuals are processed, the frame is annotated with the temperatures of individuals and rendered, so that the operator can see a live view of the feed with temperatures annotated on the live stream. After frame rendering is done, all individuals in the cache are processed one by one to (a) determine the temperature for the individual across multiple readings and send an alert if the temperature is above a certain configured threshold, and (b) remove any expired entries from the cache, i.e. remove any individuals from the cache after a configurable period of time once they have left.

2) Prioritized Temperature Measurement: Algorithm 3 shows the procedure followed to prioritize and measure the temperature for a person within a frame. The highest priority is given to temperature measurement at the eye and forehead region, followed by the face, and finally the head region. Within each of these regions, a configurable area is chosen and the temperatures of the pixels within that area are obtained from the thermal sensor data. Among all the temperature values in the area, the maximum is chosen as the raw measured temperature. We chose the maximum value so as to avoid false negatives, i.e. missing an individual who might have fever.
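A minimal sketch of the prioritized per-frame measurement in Algorithm 3, assuming the thermal frame has already been aligned and converted to per-pixel temperatures; the region names and the (x1, y1, x2, y2) box format are illustrative.

```python
import numpy as np

# Region priority: eye/forehead first, then face, then head.
REGION_PRIORITY = ["eye_forehead", "face", "head"]

def measure_temperature(regions, thermal_temps):
    """regions: dict mapping region name -> bounding box or None.
    thermal_temps: 2-D array of per-pixel temperatures in the
    aligned thermal frame. Returns (region used, raw temperature)."""
    for name in REGION_PRIORITY:
        box = regions.get(name)
        if box is None:
            continue                     # region not detected this frame
        x1, y1, x2, y2 = box
        area = thermal_temps[y1:y2, x1:x2]
        if area.size:
            # Take the maximum reading in the configurable area, so a
            # febrile person is never missed (avoids false negatives).
            return name, float(area.max())
    return None, None
```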
3) Temperature Correction: As mentioned in Section V-C1, the measured temperature of an individual varies depending on the distance from the camera, and this variation is nonlinear. To correct for this variation of measured temperature with distance, F3S employs a neural network based distance compensation model, which can be used when a black body is present as part of the deployment. The black body provides a known reference temperature to ensure accurate temperature readings. We frame this as a regression problem and use a Multi-Layer Perceptron (MLP) to solve it. The model is a feed-forward neural network whose input parameters are the distance and the corresponding measured temperature at that distance. The output of the model is the predicted true temperature of the individual. The model architecture is shown in Fig. 9, with an input layer, a hidden layer and an output layer. At model training time, as people walk through, the temperature measured at the black body is considered the ground truth, i.e. the true temperature of the person. Using this ground truth, the loss function is the mean squared error given by Equation 1, where Y_i is the true temperature and Ŷ_i is the predicted temperature:

    MSE = (1/n) * Σ_{i=1..n} (Y_i - Ŷ_i)²    (1)

We use the Adam [21] optimizer with the following hyperparameter settings for training: α of 0.001, β_1 of 0.9, β_2 of 0.999 and ε of 1e-7. The Rectified Linear Unit (ReLU) is used as the activation function, and the model is trained for 100 epochs with a batch size of 1. The split between training and test data is 70:30. After training, evaluation resulted in a mean squared error of 0.018 on the test data, indicating a very high accuracy for the trained distance compensation model.
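A minimal training sketch of the distance compensation model with the hyperparameters reported above. The hidden-layer width is an assumption (the paper does not state it), the training data here is synthetic, and TensorFlow/Keras is our framework choice for illustration, not necessarily the authors'.

```python
import numpy as np
import tensorflow as tf

# Feed-forward MLP: (distance, measured temperature) -> true temperature.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(16, activation="relu"),  # hidden layer (assumed width)
    tf.keras.layers.Dense(1),                      # predicted true temperature
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(
        learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7),
    loss="mse")  # mean squared error, Equation 1

# Synthetic stand-in for (distance, measured temp) -> blackbody ground truth.
X = np.random.rand(100, 2).astype("float32")
y = np.random.rand(100, 1).astype("float32")
# 100 epochs, batch size 1; validation_split approximates the 70:30 split.
model.fit(X, y, epochs=100, batch_size=1, validation_split=0.3, verbose=0)
```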
4) Prioritized Refinement and Alerting: As an individual walks through, multiple temperature readings of the person are recorded across frames. Algorithm 4 shows the procedure followed to prioritize and refine the temperature measurement based on these readings, and to send alerts. To prioritize the temperature measurement, the highest priority is given to the eyes and forehead, followed by the face and then the head. The maximum temperature of the highest-priority available region is recorded as the temperature for the person (to avoid missing a person with fever). To refine the measured temperature as the person is seen across frames, if a temperature reading from a higher-priority region is measured, then the previously recorded temperature from a lower-priority region is refined and updated with the new one. With respect to alerting, F3S uses the tracking ID to avoid sending repeated alerts for the same individual. An alert is sent the first time the measured temperature for a person is greater than the configured threshold; after that, an alert is sent only if a higher-priority region temperature is measured or if the delta increase in temperature is above a configured threshold.

Algorithm 4: Prioritized refinement and alerting
  Data: Cache of recently seen individuals
  Result: Prioritized, refined temperature and alert
  foreach cache entry do
    if minimum number of readings present then
      max_head_temperature ← getMaxHeadTemperatureReading();
      max_face_temperature ← getMaxFaceTemperatureReading();
      max_eye_forehead_temperature ← getMaxEyeForeheadTemperatureReading();
      if higher priority region temperature is available then
        updatePersonTemperatureWithHigherPriorityReading();
      else
        updatePersonTemperatureReading();
      end
    end
    delta_temperature_increase ← getDeltaTemperatureIncrease();
    if person temperature greater than configured threshold AND delta temperature increase greater than configured delta change OR higher priority region temperature is available then
      sendAlert();
    end
  end

As environmental conditions change, the thermal readings from the camera start to drift from the actual temperature, producing incorrect temperature readings, which ultimately results in incorrect temperature measurements for individuals. Camera vendors provide several parameters to calibrate and correct for this drift and maintain true temperatures. This correction, however, must be done manually, which is not practical in a real deployment where environmental conditions may change frequently. To do this automatically, F3S employs a separate module that detects the change and quickly adjusts the camera parameters to maintain accurate temperature readings from the camera. In order to detect the change automatically, a black body set to a known reference temperature is used as a reference object in the field of view of the camera. The temperature readings coming from the region of the black body are continuously monitored; any change in the reading beyond an acceptable threshold is detected in real-time, and immediate action is taken to adjust the camera parameters until the temperature of the black body is back at the reference temperature. Fig. 10 shows the feedback control system used by F3S for auto-calibration. The drift, i.e. the error in the temperature of the black body, is measured in real-time, and if the error is beyond an acceptable threshold, then the dynamic proportional controller corrects various parameters of the camera to iteratively reduce the error and bring the measured temperature of the black body back to the reference temperature. Using this auto-calibration technique, even if the environmental conditions change, the camera always produces correct thermal readings, which ultimately results in correct temperature measurements for individuals. Fig. 11 shows the settling time for the proportional controller starting from an error of 5 degrees Celsius, i.e. the measured temperature of the reference object is off by 5 degrees. A configurable time, called the settling period, is set up between signals from the proportional controller, to allow the changed camera parameters to take effect. By default this is set to 5 seconds. For low errors, convergence is achieved within a few signals.
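A minimal sketch of the auto-calibration loop as a proportional controller. The gain, the acceptable-error threshold and the single set_camera_offset knob are illustrative assumptions; the real system adjusts several vendor-specific camera parameters.

```python
import time

KP = 0.5               # proportional gain (assumed)
ERROR_THRESHOLD = 0.2  # acceptable drift in degrees (assumed)
SETTLING_PERIOD = 5.0  # seconds between controller signals (default above)

def auto_calibrate(read_blackbody_temp, reference_temp, set_camera_offset):
    """Drive the blackbody reading back to its reference temperature."""
    offset = 0.0
    while True:
        error = read_blackbody_temp() - reference_temp  # measured drift
        if abs(error) > ERROR_THRESHOLD:
            offset -= KP * error        # proportional correction
            set_camera_offset(offset)
        time.sleep(SETTLING_PERIOD)     # let new parameters take effect
```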
VII. EXPERIMENTAL SETUP AND METHODOLOGY

In our system, we use an edge device with a quad-core i7-7700 processor and 16GB of memory. We use Ubuntu 16.04 as the OS and Docker for software deployment. The hardware is able to process 8 frames per second (real-time performance); the video frame resolution is 1280 by 960, while the resolution of the thermal data is 336 by 252. We use a Mobotix M16 TR [22] for thermal and visual imaging in our experiments. Note that the techniques used in F3S are general and work with other cameras as well, e.g. integrated thermal and visual cameras [23], or separate thermal [24] and visual [25] cameras placed next to each other. We collect datasets containing both video and thermal data to measure the accuracy of our system. We split the task of assessing the accuracy of our system into two phases. First, we determine the accuracy of the thermal sensor by comparing the values it reports with manual measurements using a traditional hand-held contact thermometer. Then, we assess the accuracy of F3S, assuming that the thermal sensor is 100% accurate. This two-step procedure has a big advantage: if we change the thermal sensor, then we only have to measure the accuracy of the new thermal sensor, and we can infer the accuracy of the system without further experiments.

For our experiments, we created three datasets, called "LAB", "PRODUCTION 1" and "PRODUCTION 2", as shown in Table I. "LAB" was created in a controlled lab environment, while the other two are from production environments. In each dataset, we considered different operating conditions, such as standing at different distances and locations, walking at different speeds, wearing a mask/glasses and having different temperatures, and we collected multiple video clips for these various conditions. The total number of people, those wearing masks, those with fever, and the total length of video clips for each dataset are shown in Table I. Next, we discuss how we create the ground truth for these datasets. For the "LAB" dataset, we verify the thermal sensor's output against the data from a real thermometer and perform the necessary tuning on the thermal sensor; in the other datasets, we consider the data from the thermal sensor to be accurate. In other words, we use the raw data from the thermal sensor as the ground truth. To be more specific, for a person standing before the thermal sensor, we manually annotate the person's forehead and eye region and use the temperature of this area from the thermal sensor as the temperature of the person. The maximum temperature across frames is considered the ground-truth temperature for the person.

As explained in Section V-B, due to the displacement between the thermal and visual sensors, the corresponding frames are not aligned, which can cause errors in mapping temperatures to visual objects. Here, we show how dynamic frame alignment achieves superior alignment compared to the manual offset approach.

1) Manual correction: As seen in Fig. 12, the x-coordinate alignment error varies from 120 pixels at 3 feet to 50 pixels at 12 feet (see "X Error Before"). This is about 70 pixels of change as a person transits through the measurement area. Unfortunately, this is about the width of the bounding box for a face, and it causes erroneous temperature readings for visual objects like faces. The manual x-offset and y-offset are set to 75 pixels and -25 pixels respectively in both Figs. 12 and 13, based on a manual overlay of the two images at a particular distance. These offsets are set once, independent of the position of the person in the field of view of the camera. This setting, however, does not provide good alignment at all distances from the camera as the person walks through (see "X Error Manual" and "Y Error Manual" in Figs. 12 and 13). For example, at 3 feet from the camera, and on the optical axis (Fig. 12), the x-coordinates of the visual and thermal regions for a person are off by 120 pixels ("X Error Before"). Therefore, a manual x-coordinate correction of 75 pixels is not adequate to align the two frames. Similarly, a manual y-coordinate correction of -25 pixels is also not adequate for a person at 3 feet from the camera, where the error is -40 pixels ("Y Error Before").

2) Dynamic correction: By using the dynamic frame alignment technique, the visual and thermal frames are tightly aligned at different distances, both on and off the optical axis (see "X Error Dynamic" and "Y Error Dynamic" in Figs. 12 and 13).
For example, as shown in Fig. 12, after dynamic alignment, the x-coordinate alignment error (shown as "X Error Dynamic") for a person at 3 feet from the camera, and on the optical axis, is less than 5 pixels, which means that the visual and thermal frames are well aligned. This close alignment (low x and y alignment error) is observed for all usable distances (3 feet to 12 feet) from the camera along the optical axis. A similar near-perfect alignment is observed in Fig. 13, where the dynamic alignment approach ensures that the x and y coordinate alignment errors for all usable distances from the camera, and up to 2.5 feet off the optical axis, are less than 5 pixels, indicating that the visual and thermal frames are well aligned.

Fig. 14 shows our experimental results where we measure the temperature using detections at various regions, i.e. eyes and forehead, face and head, for a person wearing glasses, wearing a mask, and wearing neither. The rectangles show the detections, and the dot inside each rectangle is the location of the temperature measurement. We observe that when the person is not wearing glasses, across all detections, the final location of the temperature measurement is the same. When the person is wearing glasses, the eye region is occluded and the temperature is measured at different locations for different detections, but the variation in temperature is quite low. This shows that we do not need to ask people to pause and pose for a good shot; instead, we can use any of the available detections as they walk through. It typically takes about 250 to 300 milliseconds to detect and measure the temperature of an individual.

In Table II, we show the true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) from F3S for each dataset. For a person with fever, if F3S also detects fever for the person, it is counted as a TP; if F3S considers the person normal, it is an FN. For a person without fever, if F3S detects fever for the person, it is an FP; otherwise it is counted as a TN.

Table II: F3S Accuracy
  DATASET         TP   FP   TN   FN
  LAB              2    0   15    0
  PRODUCTION 1     4    1    7    0
  PRODUCTION 2     1    2   73    0
  OVERALL          7    3   95    0

Across all three datasets, with 105 people, F3S achieves 100% sensitivity and 96.9% specificity. The standard deviation of the temperature values is 1.41 °F. Across these datasets, we have no FNs; in other words, we detect all people with fever, which is critical to guarantee safety. However, our system has 3 false positives in the PRODUCTION 1 and PRODUCTION 2 datasets, due to occlusions and bad lighting conditions.

We have presented a rapid, contact-less and hygienic fever screening system that runs at the edge and uses deep learning techniques for accurate temperature measurement. Our easy-to-use solution improves customer experience and works well even when individuals are using personal protective equipment like masks, spectacles and hats. Although not reported here, we have augmented the proposed system with face recognition. Locations with well-defined entry/exit points, like building entrances, airport boarding gates, theme parks, stadiums and hospitals, all benefit from the high speed and precision of the proposed thermal screening and face recognition for access control and screening of employees and guests.
REFERENCES
[1] Coronavirus (COVID-19).
[2] Medical electrical equipment: deployment, implementation and operational guidelines for identifying febrile humans using a screening thermograph.
[3] Contactless vital signs measurement system using RGB-thermal image sensors and its clinical screening test on patients with seasonal influenza.
[4] Applications of infrared thermography for noncontact and noninvasive mass screening of febrile international travelers at airport quarantine stations.
[5] Comparison of infrared thermal detection systems for mass fever screening in a tropical healthcare setting.
[6] Stable contactless sensing of vital signs using RGB-thermal image fusion system with facial tracking for infection screening.
[7] Remote sensing of multiple vital signs using a CMOS camera-equipped infrared thermography system and its clinical application in rapidly screening patients with suspected infectious diseases.
[8] Multiple vital-sign-based infection screening outperforms thermography independent of the classification algorithm.
[9] Elevated body temperature screening solutions.
[10] Temperature screening kiosks with visitor and employee sign-in.
[11] Make workplace safer with our access and temperature screening kiosks.
[12] VitalSign.
[13] Body temperature check kiosk.
[15] Remote measurement of body temperature for an indoor moving crowd.
[16] Mobile-platform for automatic fever screening system based on infrared forehead temperature.
[17] Object detection in infrared images using deep convolutional neural networks.
[18] Face recognition.
[19] ORB: an efficient alternative to SIFT or SURF.
[20] Computer Vision: Algorithms and Applications.
[21] Adam: a method for stochastic optimization.
[22] Mobotix M16 Thermal TR.
[23] FM 640 Plus.
[24] P Series IR camera.
[25] Intel RealSense cameras.