COVID-Robot: Monitoring Social Distancing Constraints in Crowded Scenarios

Adarsh Jagan Sathyamoorthy, Utsav Patel, Yash Ajay Savle, Moumita Paul, Dinesh Manocha*
August 14, 2020
* Authors contributed equally.

Abstract: Maintaining social distancing norms between humans has become an indispensable precaution to slow down the transmission of COVID-19. We present a novel method to automatically detect pairs of humans in a crowded scenario who are not adhering to the social distance constraint, i.e., about 6 feet of space between them. Our approach makes no assumption about the crowd density or pedestrian walking directions. We use a mobile robot with commodity sensors, namely an RGB-D camera and a 2-D lidar, to perform collision-free navigation in a crowd and estimate the distance between all detected individuals in the camera's field of view. In addition, we equip the robot with a thermal camera that wirelessly transmits thermal images to security/healthcare personnel who monitor whether any individual exhibits a higher than normal temperature. In indoor scenarios, our mobile robot can also be combined with statically mounted CCTV cameras to further improve performance in terms of the number of social distancing breaches detected, accuracy in pursuing walking pedestrians, etc. We highlight the performance benefits of our approach in different static and dynamic indoor scenarios.

The COVID-19 pandemic has caused significant disruption to daily life around the world. As of August 10, 2020, there have been 19.8 million confirmed cases worldwide with more than 730 thousand fatalities. Furthermore, this pandemic has caused significant economic and social impacts. At the moment, one of the best ways to prevent contracting COVID-19 is to avoid being exposed to the coronavirus. Organizations such as the Centers for Disease Control and Prevention (CDC) have recommended many guidelines, including maintaining social distancing, wearing masks or other facial coverings, and frequent hand washing, to reduce the chances of contracting or spreading the virus.

Broadly, social distancing refers to the measures taken to reduce the frequency of people coming into contact with others and to maintain at least 6 feet of distance between individuals who are not from the same household. Several groups have simulated the spread of the virus and shown that social distancing can significantly reduce the total number of infected cases [1], [2], [3], [4], [5]. A key issue is developing guidelines and methods to enforce these social distance constraints in public or private gatherings at indoor or outdoor locations. This gives rise to many challenges, including framing reasonable rules that people can follow when they use public places such as supermarkets, pharmacies, railway and bus stations, and spaces for recreation and essential work, and determining how people can be encouraged to follow the new rules.

Figure 1: Our robot detecting non-compliance with social distancing norms, classifying non-compliant pedestrians into groups, and autonomously navigating to the group that currently has the most people in it (a group with 3 people in this scenario). The robot encourages the non-compliant pedestrians to move apart and maintain at least 6 feet of social distance by displaying a message on the mounted screen. Our COVID-robot also captures thermal images of the scene and transmits them to appropriate security/healthcare personnel.
In addition, it is also crucial to detect when such rules are breached so that appropriate counter-measures can be employed. Detecting social distancing breaches could also help in contact tracing [6]. Many technologies have been proposed for detecting excessive crowding or conducting contact tracing, and most of them use some form of communication, including WiFi, Bluetooth, tracking based on cellular connectivity, RFID, and Ultra-Wide Band (UWB). Most of these technologies work well only in indoor scenes, though cellular-based tracking has been used outdoors to track pedestrians. In addition, many of these technologies, such as RFID and UWB, require additional infrastructure or devices to track people indoors. In other cases, technologies such as WiFi and Bluetooth can track only those people who are connected to them through wearable devices or smartphones. This limits their usage for tracking crowds and social distancing norms in general environments or public places, and may hinder the use of any kind of counter-measures.

Main Results: We present a vision-guided mobile robot (COVID-robot) to monitor scenarios with low- or high-density crowds and prolonged contact between individuals. We use a state-of-the-art algorithm for autonomous collision-free navigation of the robot in arbitrary scenarios that uses a hybrid combination of a Deep Reinforcement Learning (DRL) method and a traditional model-based method. We use pedestrian detection and tracking algorithms to detect groups of people in the camera's Field Of View (FOV) who are closer than 6 feet to each other. Once social distance breaches are detected, the robot prioritizes groups based on their size, navigates to the largest group, and encourages compliance with social distancing norms by displaying an alert message on a mounted screen. For mobile pedestrians who are non-compliant, the robot tracks and pursues them with warnings. Our COVID-robot uses inexpensive commodity sensors, namely an RGB-D camera and a 2-D lidar, to navigate and to classify pedestrians that violate social distance constraints as non-compliant pedestrians. In indoor scenarios, our COVID-robot uses the CCTV camera setup (if available) to further improve the detection accuracy and check a larger group of pedestrians for social distance constraint violations. We also use a thermal camera, mounted on the robot, to wirelessly transmit thermal images. This could help detect persons who may have a high temperature, without revealing their identities or their private health information.

Our main contributions in this work are:
1. A mobile robot system that detects breaches of social distancing norms, autonomously navigates towards groups of non-compliant people, and encourages them to maintain at least 6 feet of distance. We demonstrate that our mobile robot monitoring system is effective in terms of detecting social distancing breaches in static indoor scenes and can enforce social distancing in all of the detected breaches. Furthermore, our method does not require humans to wear any tracking or wearable devices.
2. We integrate a CCTV setup in indoor scenes (if available) with the COVID-robot to further increase the area being monitored and to improve the accuracy of tracking and pursuing dynamic non-compliant pedestrians. This hybrid combination of statically mounted cameras and a mobile robot can improve the number of breaches detected and enforced by up to 100%.
3. We present a novel real-time method to estimate distances between people in images captured using the RGB-D camera on the robot and a CCTV camera, based on a homography transformation. The distance estimate has an average error of 0.3 feet in indoor environments.
4. We present a novel algorithm for classifying non-compliant people into different groups and selecting a goal that makes the robot move to the vicinity of the largest group and enforce social distancing norms.
5. We integrate a thermal camera with the robot and wirelessly transmit the thermal images to appropriate security/healthcare personnel. The robot does not record temperatures or perform any form of person recognition, to protect people's privacy.

We have evaluated our method quantitatively in terms of the accuracy of localizing a pedestrian, the number of social distancing breaches detected for static and mobile pedestrians, and the performance of our CCTV-robot hybrid system. We also measure the time duration for which the robot can track a dynamic pedestrian. Qualitatively, we highlight the trajectories of the robot pursuing dynamic pedestrians when using only its RGB-D sensor, as compared to when both the CCTV and RGB-D cameras are used.

The rest of the paper is organized as follows. In Section 2, we present a brief review of related work on the importance of social distancing, emerging technologies for it, and robot navigation. In Section 3, we provide background on the robot navigation, collision avoidance, and pedestrian tracking methods used in our system. We describe the new algorithms used in our robot system, related to grouping and goal selection, the CCTV setup, thermal camera integration, etc., in Section 4. In Section 5, we evaluate our COVID-robot in different scenarios, demonstrate the effectiveness of our hybrid system (robot + CCTV), and compare it with cases where only the robot or a standard static CCTV camera system is used.

In this section, we review relevant works that discuss the effectiveness of social distancing and the different technologies used to detect breaches of social distancing norms. We also give a brief overview of prior work on collision avoidance and pedestrian tracking.

Works that have simulated the spread of a virus [1], [2], [3], [4], [5] demonstrate different levels of effectiveness of different kinds of social distancing measures. The effectiveness of a social distancing measure is evaluated based on two factors: (1) the basic reproduction number R_o and (2) the attack rate. R_o is the average number of people to whom an infected person could spread the virus during the course of an outbreak. The attack rate is the ratio of the total number of infected cases over the entire course of the outbreak to the size of the population [6]. For instance, in a workplace setting, the attack rate can be reduced by up to 82% if three consecutive days are removed from the workdays for R_o = 1.4 [1]. Similarly, in an R_o = 1.4 setting, maintaining 6 feet or more between persons in the workplace could reduce the attack rate by up to 39.22% [2], or reduce the rate by 11% to 20% depending on the frequency of contact with other employees [3]. Other works that have studied the effects of self-isolation [4], [7] show that it could reduce the peak attack rate by up to 89% when R_o < 1.9.
Recently, many techniques have been proposed to monitor whether people are maintaining the 6-foot social distance. For instance, workers in Amazon warehouses are monitored for social distancing breaches using CCTV cameras. Other methods include wearable alert devices, which work using Bluetooth or UWB technologies. Companies such as Apple and Google are developing contact tracing applications that can alert users if they have come in contact with a person who could be infected. A comprehensive survey of the technologies that can be used to track people and detect whether social distancing norms are being followed is given in [6], including a discussion of the pros and cons of technologies such as WiFi, Zigbee, RFID, cellular, Bluetooth, computer vision, AI, etc. However, almost all of these technologies require new static, indoor infrastructure such as WiFi routers, Bluetooth modules, or central RFID hubs. Technologies such as RFID and Zigbee also require pedestrians to carry wearable tags in order to localize them. Most of these technologies are also limited to indoor scenes, with the exception of cellular-based tracking, and they do not help in reacting to cases where people do not follow social distancing guidelines. In [8], a quadruped robot with multiple on-board cameras and a 3-D lidar is used to enforce social distancing in outdoor crowds using voice instructions. Our work is complementary to these methods and also helps react to social distancing violations. Although we evaluate our system indoors, it can easily be extended to outdoor scenes in the future.

The problem of collision-free navigation has been extensively studied in robotics and related areas. Recently, some promising methods for navigation with noisy sensor data have been based on Deep Reinforcement Learning (DRL) [9], [10]. These methods work well in the presence of sensor uncertainty and produce better empirical results than traditional methods such as Velocity Obstacle-based methods [11], [12]. They include training a decentralized collision avoidance policy using only raw data from a 2-D lidar, the robot's odometry, and the relative goal location [13]; this policy has been extended by combining it with control strategies [14]. Other works have developed learning-based policies that implicitly fuse data from multiple perception sensors to handle occluded spaces [15] and to better handle the Freezing Robot Problem (FRP) [16]. Other hybrid learning and model-based methods include [17], which predicts pedestrian movement through optical flow estimation, and [18], which constructs a potential freezing zone that the robot avoids to prevent freezing and improve the pedestrian-friendliness of its navigation. Our navigation approach is also based on DRL and can be combined with any of these methods.

In this section, we provide a brief overview of the collision avoidance scheme used in our system, our pedestrian detection and tracking method, and our criteria for social distancing. We use an end-to-end Deep Reinforcement Learning-based (DRL) policy [13] to generate collision-free velocities for the robot. We chose a DRL-based method because it performs well in the presence of sensor uncertainty and has better empirical results than traditional collision avoidance methods.
The collision avoidance policy is trained in a 2.5-D simulator with a reward function that (i) minimizes the robot's time to reach its goal, (ii) reduces oscillatory motions in the robot, (iii) encourages heading towards the robot's goal, and, most importantly, (iv) avoids collisions. At each time instant, the trained DRL policy π_θ takes 2-D lidar observations (o^t_lidar), the relative goal location (o^t_goal), and the robot's current velocity (o^t_vel) as inputs to generate a collision-free velocity v_DRL. Formally,

v_DRL = π_θ(o^t_lidar, o^t_goal, o^t_vel).   (1)

This velocity is then post-processed using Frozone [18] to eliminate velocities that lead to the Freezing Robot Problem (FRP).

Frozone [18] is a state-of-the-art collision avoidance method for navigation in moderate to dense crowds (≥ 1 person/m²) that uses an RGB-D camera to track and predict the future positions and orientations of pedestrians relative to the robot. Its primary focus is to simultaneously minimize the occurrence of FRP [19] and any obtrusion caused by the robot's navigation to nearby pedestrians. FRP is defined as any scenario where the robot's planner is unable to compute velocities that move the robot towards its goal. When navigating among humans, the robot must ensure that it does not freeze, as freezing severely affects its navigation and causes inconvenience to the humans around it.

Frozone's two core ideas are as follows. The robot first classifies pedestrians into potentially freezing (more likely to cause freezing) and non-freezing pedestrians based on their walking speeds and directions, by predicting their future positions over a time horizon. The robot then constructs and avoids a spatial region called the Potential Freezing Zone (PFZ), the set of locations where the robot has the maximum probability of freezing and being obstructive to the pedestrians around it. Formally, the PFZ is constructed as

PFZ = ConvexHull(p̂^ped_i), i ∈ {1, 2, ..., K},   (2)

where p̂^ped_i is the predicted future position of the i-th pedestrian and K is the total number of potentially freezing pedestrians. p̂^ped_i is calculated as

p̂^ped_i = p^ped_i + v^ped_i · Δt,

where p^ped_i and v^ped_i denote the i-th pedestrian's current position and velocity vectors relative to the robot, and Δt is the time horizon over which the prediction is made. If the distance between the robot and the closest potentially freezing pedestrian is less than a threshold distance, the robot deviates its current velocity direction (computed by the DRL method) away from the PFZ.
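To make the construction above concrete, the following is a minimal Python sketch of the PFZ computation of Equation 2, assuming pedestrian positions and velocities are already estimated in the robot's frame (X forward, Y left, robot at the origin). The potentially-freezing test, the prediction horizon, and the deviation rule below are simplified illustrative assumptions on our part, not the authors' implementation of [18].

```python
# Minimal sketch of the PFZ construction (Eq. 2 above), under assumptions:
# positions/velocities are in meters (robot at origin, X forward, Y left),
# and the potentially-freezing test is a simplified stand-in for [18].
import numpy as np
from dataclasses import dataclass
from scipy.spatial import ConvexHull

@dataclass
class Ped:
    p: np.ndarray  # current position [x, y] relative to the robot (m)
    v: np.ndarray  # current velocity [vx, vy] relative to the robot (m/s)

def is_potentially_freezing(ped, speed_thresh=0.1, heading_thresh=0.5):
    """Treat a pedestrian as potentially freezing if they are moving and
    heading roughly toward the robot (cosine test); an assumed rule."""
    speed = np.linalg.norm(ped.v)
    if speed < speed_thresh:
        return False
    to_robot = -ped.p / (np.linalg.norm(ped.p) + 1e-9)
    return float(np.dot(ped.v / speed, to_robot)) > heading_thresh

def compute_pfz(peds, dt=3.0):
    """Predict p_hat_i = p_i + v_i * dt for the K potentially freezing
    pedestrians and return the PFZ as the vertices of their convex hull."""
    p_hat = np.array([pd.p + pd.v * dt for pd in peds
                      if is_potentially_freezing(pd)])
    if len(p_hat) < 3:
        return p_hat.reshape(-1, 2)      # degenerate zone: point or segment
    return p_hat[ConvexHull(p_hat).vertices]

def deviate_velocity(v_drl, pfz, dist_thresh=1.5):
    """If the robot is within dist_thresh of the PFZ, blend the DRL
    velocity with a direction pointing away from the zone's centroid."""
    if pfz.shape[0] == 0:
        return v_drl
    c = pfz.mean(axis=0)
    if np.linalg.norm(c) > dist_thresh:  # robot sits at the origin
        return v_drl
    away = -c / (np.linalg.norm(c) + 1e-9)
    return 0.5 * v_drl + 0.5 * np.linalg.norm(v_drl) * away
```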
A lot of work has been done on object detection and tracking in recent years, especially with methods based on deep learning. For detecting and tracking pedestrians, we use the work in [20] based on Yolov3 [21], a scheme that achieves a good balance between speed and tracking accuracy. The input to the tracking scheme is an RGB image, and the output is a set of bounding box coordinates for all the pedestrians detected in the image. The bounding boxes are denoted as

B_k = [x^{B_k}_tl, y^{B_k}_tl, m_{B_k}, n_{B_k}], ∀ k ∈ H,

where H is the set of all pedestrian detections, and [x^{B_k}_tl, y^{B_k}_tl], m_{B_k}, and n_{B_k} denote the top-left corner coordinates, width, and height of the k-th bounding box B_k, respectively. Apart from these values, Yolov3 also outputs a unique ID for every person in the RGB image, which remains constant as long as the person remains in the camera's FOV. Since Yolov3 requires RGB images, the images from both the RGB-D and the CCTV cameras can be used for detecting pedestrians.
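The per-frame tracker output just described can be summarized by a small data structure; the sketch below fixes the interface assumed in the remaining examples. The field names are ours, and the Yolov3-based detector/tracker itself is treated as a black box.

```python
# Sketch of the per-frame tracker output described above: each detection
# carries the box B_k = [x_tl, y_tl, m_B, n_B] plus a persistent ID.
# Field names are illustrative; the Yolov3-based tracker is a black box.
from dataclasses import dataclass

@dataclass
class Detection:
    ped_id: int   # unique tracker ID; constant while the person is in the FOV
    x_tl: int     # top-left corner x of the bounding box (pixels)
    y_tl: int     # top-left corner y (pixels)
    m_b: int      # box width (pixels)
    n_b: int      # box height (pixels)

    @property
    def centroid(self):
        """Box centroid [x_cen, y_cen], used later for localization."""
        return (self.x_tl + self.m_b / 2.0, self.y_tl + self.n_b / 2.0)

# Example set H of detections for one frame (made-up values):
H = [Detection(ped_id=1, x_tl=120, y_tl=80, m_b=60, n_b=180),
     Detection(ped_id=2, x_tl=340, y_tl=95, m_b=55, n_b=170)]
```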
Figure 2: a. The criteria used to detect whether two pedestrians violate the social distance constraint. This figure shows two pedestrians, represented as circles, in two different scenarios. The increasing size of the circles denotes the passage of time. The green circles represent time instances where the pedestrians maintained more than 6 feet of distance, and the red circles represent instances where they were closer than 6 feet. Top: Two pedestrians passing each other. This scenario is not reported as a breach, since the duration of the breach is short. Bottom: Two pedestrians meeting and walking together. This scenario is reported as a breach of social distancing norms. b. A top-down view of how non-compliant pedestrians (denoted as red circles) are classified into groups. The numbers beside the circles represent the IDs of the pedestrians output by Yolov3. The compliant pedestrians (green circles) are not classified into groups, as the robot does not have to encourage them to maintain the appropriate social distance. In the scenario shown, the robot would first attend to Group 1.

We mainly focus on detecting scenarios where individuals do not maintain a distance of at least 6 feet from others for a given period of time (we choose a 5-second threshold). We choose to detect this scenario because it is a fundamental social distancing norm during all stages of a pandemic, even as people begin to use public spaces and restrictions are lifted. An important challenge in detecting when individuals are not maintaining appropriate distances amongst themselves is avoiding false positives: for example, two or more people passing each other should not be considered a breach, even if the distance between them was less than 6 feet for a few moments (see Figure 2a). Another challenge is detecting pedestrians and estimating the distances between them in the presence of occlusions. This can be addressed in indoor scenarios by using available statically mounted CCTV cameras.

In this section, we first discuss how our method detects a breach of social distancing norms. We refer to people who violate social distancing norms as non-compliant pedestrians. We then describe how we classify non-compliant pedestrians into groups and compute the goal for the robot's navigation based on the size of each group. Our overall system architecture is shown in Figure 3.

As mentioned in Section III-D, if certain individuals do not maintain a distance of at least 6 feet from each other, the system must report a breach. The robot's on-board RGB-D camera and the CCTV camera setup (whenever available) continuously monitor the states of individuals within their sensing range. At any instant, breaches could be detected by the robot's RGB-D camera and/or the CCTV camera.

1) Social Distance Estimation Using the RGB-D Camera: We first describe how we localize a person detected in the RGB image (Section III-C) with respect to the robot by using the corresponding depth image from the RGB-D camera. The depth and RGB images from the RGB-D camera have the same width and height and are aligned by default to view the same subjects (see Figure 4). We denote the depth image at time instant t as I_t; the value contained in a pixel at coordinates (i, j) is the proximity of the object at that part of the image. Formally,

I_t(i, j) ∈ [f, R], ∀ i ∈ {1, ..., w}, j ∈ {1, ..., h},

where f is an offset distance from the RGB-D camera beyond which depth can be accurately measured, R is the maximum range at which depth can be measured, and w, h, i, and j represent the image's width, height, and the indices along the width and height, respectively.

Using this data, we localize a detected pedestrian P as follows. First, the detection bounding boxes from the RGB image are superimposed over the depth image. Next, the minimum 10% of the pixel values inside the bounding box B_P are averaged to obtain the mean distance (d_avg) of pedestrian P from the camera. Denoting the centroid of the bounding box B_P as [x^{B_P}_cen, y^{B_P}_cen], the angular displacement ψ_P of the pedestrian relative to the robot is computed as

ψ_P = (FOV_cam / 2) · (w/2 − x^{B_P}_cen) / (w/2),   (3)

where FOV_cam is the field-of-view angle of the camera. This calculates the angle in a coordinate system attached to the robot such that its X-axis is along the robot's forward direction and its Y-axis points towards the robot's left. ψ_P can range between [−FOV_cam/2, FOV_cam/2]. The pedestrian's position with respect to the robot is then calculated as [p^P_x, p^P_y] = d_avg · [cos ψ_P, sin ψ_P].

To estimate the distance between a pair of pedestrians, say P_a and P_b, we use the Euclidean distance

dist(P_a, P_b) = sqrt( (p^{P_a}_x − p^{P_b}_x)² + (p^{P_a}_y − p^{P_b}_y)² ).   (4)

If dist(P_a, P_b) < 6 feet for a period of time T (we choose 5 seconds), the robot reports a breach for that pair of individuals. This process is repeated pairwise for all detected pedestrians, and a list of pairs of non-compliant pedestrian IDs is obtained from the sensor data.

2) Social Distance Estimation Using a CCTV Camera: While the robot's RGB-D camera has the advantage of being mobile and able to detect breaches anywhere, it is limited by a small FOV and sensing range. If a breach of social distancing occurs outside this sensing range, it will not be reported. To mitigate this limitation, we utilize an existing CCTV camera setup in indoor settings to widen the scope for detecting breaches. Pedestrian detection and tracking are done as described in Section III-C, and we estimate the distances between individuals as follows.

Homography: All CCTV cameras are mounted such that they provide an angled view of the ground plane. However, to accurately calculate the distance between any two pedestrians on the ground, a top view of the ground plane is preferable. To obtain the top view, we transform the CCTV camera's angled view of the ground plane by applying a homography transformation to four points on the ground plane in the angled view. The four points are selected such that they form the corners of the maximum-area rectangle that can fit within the FOV of the CCTV camera (see Figures 5a and b). In the resulting top-view image, each detected pedestrian P_a is localized by the pixel coordinates of his/her feet, [x^{P_a}_feet,top, y^{P_a}_feet,top], and the pixel distance between the feet points of a pair of pedestrians is computed using Equation 4. This distance is then scaled by an appropriate factor S to obtain the real-world distance between the pedestrians. The scaling factor is found by measuring the number of pixels in the image that constitute 1 meter in the real world. If the real-world distance between a pair of pedestrians is less than 6 feet for a period of time T, a breach is reported for that pair, and a list of all the pairs of non-compliant pedestrian IDs is obtained.
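As a worked example, here is a minimal sketch of the RGB-D estimator of Section IV-A1 (Equations 3 and 4) together with the 5-second persistence test used by both estimators. It assumes aligned RGB and depth frames with depth in meters and the Detection interface sketched earlier; parameter names and the helper structure are illustrative, not the authors' code.

```python
# Sketch of the RGB-D social-distance estimator (Eqs. 3-4) and the 5-second
# persistence test. Assumes aligned RGB/depth frames, depth in meters, and
# the Detection interface sketched earlier; names are illustrative.
import itertools
import time
import numpy as np

FOV_CAM = np.deg2rad(70.0)   # horizontal FOV of the robot's RGB-D camera
SIX_FEET_M = 1.83            # 6 feet expressed in meters
BREACH_PERIOD_S = 5.0        # duration threshold T

def localize(det, depth, img_w):
    """Return [p_x, p_y] of pedestrian `det` in the robot frame
    (X forward, Y left), following Eq. 3 and the text above."""
    box = depth[det.y_tl: det.y_tl + det.n_b, det.x_tl: det.x_tl + det.m_b]
    vals = np.sort(box[box > 0].ravel())        # drop invalid zero readings
    d_avg = vals[: max(1, int(0.1 * vals.size))].mean()   # lowest 10%
    x_cen = det.centroid[0]
    psi = (FOV_CAM / 2.0) * (img_w / 2.0 - x_cen) / (img_w / 2.0)
    return d_avg * np.array([np.cos(psi), np.sin(psi)])

class BreachDetector:
    """Report a pair only after their distance stays below 6 ft for
    BREACH_PERIOD_S, so that people merely passing by are ignored."""
    def __init__(self):
        self.first_below = {}    # (id_a, id_b) -> time distance first < 6 ft

    def update(self, positions, now=None):
        """positions: dict ped_id -> np.array([p_x, p_y]) for one frame."""
        now = time.time() if now is None else now
        breaches = []
        for (a, pa), (b, pb) in itertools.combinations(positions.items(), 2):
            pair = (min(a, b), max(a, b))
            if np.linalg.norm(pa - pb) < SIX_FEET_M:   # Eq. 4
                self.first_below.setdefault(pair, now)
                if now - self.first_below[pair] >= BREACH_PERIOD_S:
                    breaches.append(pair)              # non-compliant pair
            else:
                self.first_below.pop(pair, None)
        return breaches
```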
Once a breach is detected through the robot's RGB-D camera and/or the CCTV camera, the robot must navigate towards the location of the breach and encourage the non-compliant pedestrians to move away from each other through an alert message. If the non-compliant pedestrians are walking, the robot pursues them until they observe social distancing. Prior to this, the robot must compute the location of the breach relative to itself. We detail this process in the following sections.

1) Classifying People into Groups: In social scenarios, people naturally tend to walk or stand in groups. We define a group as a set of people who are closer than 6 feet to each other (see Figure 2b). Therefore, if the robot attends to a group, it can convey the alert message to observe social distancing to all the individuals in that group. In addition, when there are multiple groups of people breaching the social distancing norms, the robot can prioritize attending to each group based on the number of people in it. We classify non-compliant people into groups using Algorithm 1.

2) Locked Pedestrian: Consider a dynamic group of non-compliant pedestrians. The robot's RGB-D camera or the CCTV camera must be able to track at least one member of that group to efficiently guide the robot towards it. Our method chooses the person who has the least probability of moving out of the FOV of either the robot's RGB-D camera or the CCTV camera (depending on which camera detected the group) and locks on to him/her. This person is called the locked pedestrian, and his/her identity is updated as people's positions change. To find the locked pedestrian, we consider the centroid of the bounding box of each person in the largest group; the person whose centroid has the least lateral distance from the center of the image is chosen. That is, the condition for locking a pedestrian is

P_lp = argmin_{i ∈ I_G} | x^{B_i}_cen − w/2 |,

where I_G is the set of IDs of the detected pedestrians in the current largest group and P_lp denotes the locked pedestrian.

3) Computing the Goal Position Using the RGB-D Camera: Once a pedestrian is locked, the robot localizes him/her relative to itself using d_avg and Equation 3 in Section IV-A1. That is,

o^t_goal = d^lp_avg · [cos ψ_P_lp, sin ψ_P_lp],

where o^t_goal is the location of the goal relative to the robot, and d^lp_avg and ψ_P_lp are the average distance and the angular displacement of the locked pedestrian from the robot, respectively. The DRL method and Frozone use o^t_goal to navigate the robot towards the locked pedestrian in a pedestrian-friendly way, without freezing.

4) Computing the Goal Position Using a CCTV Camera: If the CCTV camera detects a breach and a locked pedestrian, the goal computation for the robot requires homogeneous transformations between three coordinate frames: 1. the top-view image obtained after the homography, 2. the ground plane, and 3. the map frame in which the robot localizes itself. We consider corner point 1 of the homography rectangle in the real world to be the origin of the coordinate system fixed to the ground plane (o_gnd), with its X and Y axes aligned with the X and Y axes of the top-view image (see Figures 5a and c). Therefore, the angle θ_lp−corn,top between the locked pedestrian's feet point and corner point 1 in the top-view image also corresponds to the angle between the two points on the ground plane, θ_lp−corn,gnd. The Euclidean distance r_lp−corn,top between the two points in the top-view image is calculated using Equation 4, and the real-world distance between the points, denoted r_lp−corn,gnd, is then obtained by scaling r_lp−corn,top by the factor S. The location of the locked pedestrian in the ground coordinate frame is calculated as

[x_lp,gnd, y_lp,gnd] = r_lp−corn,gnd · [cos θ_lp−corn,gnd, sin θ_lp−corn,gnd].

This location is then transformed into the map coordinate frame to obtain [x_lp,map, y_lp,map], and the goal relative to the robot is

o^t_goal = [x_lp,map − x_robot,map, y_lp,map − y_robot,map],

where x_robot,map and y_robot,map are the X and Y coordinates of the robot in the map coordinate frame. Using this o^t_goal, the trained DRL policy computes the collision-free velocity towards the locked pedestrian.
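Since Algorithm 1 itself is not reproduced here, the sketch below shows one way to realize the grouping of Section IV-B1 (connected components over the non-compliant pairs) and the locking rule of Section IV-B2. It reuses the Detection interface from earlier; this is our illustration, not the authors' algorithm.

```python
# A stand-in for Algorithm 1 plus the locked-pedestrian rule: breach pairs
# are merged into groups via union-find (two people share a group if a
# chain of <6 ft links connects them); the member of the largest group
# closest to the image centre-line is locked.
def group_non_compliant(pairs):
    """pairs: iterable of (id_a, id_b) breaches -> groups (largest first)."""
    parent = {}
    def find(i):
        parent.setdefault(i, i)
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i
    for a, b in pairs:                        # union the two endpoints
        parent[find(a)] = find(b)
    groups = {}
    for i in list(parent):
        groups.setdefault(find(i), set()).add(i)
    return sorted(groups.values(), key=len, reverse=True)

def lock_pedestrian(group, detections, img_w):
    """Pick P_lp: the group member whose box centroid has the least
    lateral offset from the image centre (the argmin rule above)."""
    members = [d for d in detections if d.ped_id in group]
    return min(members, key=lambda d: abs(d.centroid[0] - img_w / 2.0))

# Usage: groups = group_non_compliant(breaches); the robot first attends
# to groups[0], navigating toward lock_pedestrian(groups[0], H, img_w).
```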
5) Multiple Groups and Lawnmower Inspection: So far, we have discussed how a breach of social distancing norms can be detected using either the robot's RGB-D camera or an existing, independent CCTV camera setup. When the cameras detect several groups of non-compliant pedestrians, the robot attends to the group with the largest number of individuals. The robot attends to a group until everyone in the group observes the appropriate distancing measures; once the robot is done attending to a group, the next largest group is selected and attended to. If the same group is detected by both cameras, the goal data computed using the CCTV camera is used to guide the robot.

To improve the effectiveness of the integrated robot and CCTV system in detecting new non-compliant groups of pedestrians, the robot continuously inspects the blind spots of the CCTV camera by following the well-known lawnmower strategy (see the sketch at the end of this section). This expands the total area that the system monitors at any time instant. In addition, the lawnmower strategy guarantees that 100% of an environment can be covered by navigating to a few fixed waypoints, although it does not guarantee an increase in the number of breaches detected.

Once the robot reaches the vicinity of the locked pedestrian, it first displays the reason why the group was approached on its mounted screen: the estimated distance between the people in the group. The robot then displays a message encouraging the people to stay apart from each other. While this is a simplistic approach, the setup can easily be improved with a number of extensions in the future. For instance, the robot could talk to the people in a group by playing either a recorded message or a message from the security authorities. It could also be extended to include virtual AI applications that assist people by understanding the context of the scenario.

As mentioned previously, the robot is also equipped with a thermal camera that generates images based on the differences in temperature of the regions it observes (see Figure 6). Our pedestrian detection scheme detects and tracks people in these images, and the results are sent to appropriate security or healthcare personnel, who check whether an individual's temperature signature is higher than normal. Measures can then be initiated to trace the person for contact tracing. We intentionally choose to have a human in the loop instead of performing any form of facial recognition, to protect people's privacy. Such a system would be useful in places where people's temperatures are already measured by security/healthcare personnel, such as airports and hospitals. Monitoring people's temperatures remotely reduces exposure for security/healthcare personnel, thus reducing their risk of contracting the coronavirus.
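For completeness, the lawnmower inspection referenced above can be realized with a simple serpentine waypoint generator; the region bounds and row spacing below are illustrative assumptions, not values from the paper.

```python
# Minimal boustrophedon ("lawnmower") waypoint generator for inspecting a
# rectangular CCTV blind spot; bounds and spacing are assumed values.
import numpy as np

def lawnmower_waypoints(x_min, x_max, y_min, y_max, spacing):
    """Sweep the rectangle row by row, reversing direction on alternate
    rows so the robot never retraces a row."""
    waypoints = []
    for k, y in enumerate(np.arange(y_min, y_max + 1e-9, spacing)):
        row = [(x_min, y), (x_max, y)]
        waypoints.extend(row if k % 2 == 0 else row[::-1])
    return waypoints

# e.g. a 6 m x 4 m blind spot with rows 1.5 m apart:
# wps = lawnmower_waypoints(0.0, 6.0, 0.0, 4.0, 1.5)
```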
In this section, we elaborate on how our system is implemented on a robot, explain the metrics we use to evaluate it, and analyze the effectiveness and limitations of our method.

We implement our method on a Turtlebot 2 robot customized with additional aluminium rods to attach a 15-inch screen that displays messages to the non-compliant pedestrians. We specifically chose the Turtlebot 2 for its ease of customization and its lightweight, tall structure. The pedestrian detection and tracking algorithm is executed on a laptop with an Intel i9 8th-generation CPU and an Nvidia RTX 2080 GPU mounted on the robot. We use an Intel Realsense RGB-D camera (70° FOV) to sense pedestrians and a Hokuyo 2-D lidar (240° FOV) to sense other environmental obstacles. To emulate a CCTV camera setup, we use a simple RGB webcam with 1080p resolution mounted at an elevation; to process the images from the CCTV camera, we use a laptop with an Intel i7 7th-generation CPU and an Nvidia GTX 1060 GPU. We use a FLIR C3 thermal camera to generate the temperature signatures of the robot's surroundings. The ROS package for adaptive Monte-Carlo localization is used to locate the robot relative to the map coordinate frame.

We use the following metrics to evaluate our method.
• Accuracy of pedestrian localization: We compare the ground-truth location of a pedestrian with the location estimated using our method, as detailed in Sections IV-A1 and IV-A2. Higher localization accuracy translates to more accurate distance estimation and goal selection for the robot's navigation.
• Number of breaches detected: The total number of locations in an environment at which a social distancing breach can be detected, given a total number of locations uniformly sampled from the environment. We measure this metric both in the presence and in the absence of occlusions. It provides a sense of the area in the environment that can be monitored by our system at any time instant; higher values are better.
• Number of enforcements: The number of times the robot attended to a breach once it was detected, again measured in the presence and absence of occlusions. Ideally, this should equal the number of breaches detected.
• Tracking duration for a mobile pedestrian: The time for which the robot is able to track a walking pedestrian. Since the robot's RGB-D camera has a limited FOV, the robot must rotate itself to track a pedestrian for a longer time. This metric measures the robot's effectiveness in pursuing a mobile locked pedestrian (in a group of people who are walking together).

1) Accuracy of Pedestrian Localization: We perform two sets of comparisons of the ground-truth locations versus the estimated pedestrian locations, using 1. the robot's RGB-D camera and 2. the CCTV setup. The plots are shown in Figure 7, with the ground-truth locations plotted as green circles and the estimated locations plotted as blue circles. The plot in Figure 7a shows the pedestrian being localized with respect to a coordinate frame fixed to the robot, with its positive X-axis pointing in the robot's forward direction and its positive Y-axis pointing towards the robot's left. Figure 7b shows a pedestrian being localized in the ground coordinate frame.

We observe in Figure 7a that when a pedestrian is closer to the robot and closer to the robot's X-axis, the localization estimates closely match the ground truth. If a pedestrian is farther away from the robot or near the exterior limits of the RGB-D camera's FOV, the errors between the estimates and the ground-truth values increase. This is mainly because the robot localizes a pedestrian based on the centroid of the pedestrian's bounding box, which lies on the person's torso, whereas the ground truth is measured as a point on the ground. In addition, the orientation of the pedestrian relative to the RGB-D camera affects the centroid of the bounding box and thus the localization estimate. However, since the maximum error between the ground-truth and estimated values is within 0.3 meters, its effect on the social distance calculation and goal selection for the robot is within an acceptable limit. The accuracy can also be improved with higher-FOV depth cameras in the future. From Figure 7b, we see a trend similar to that in Figure 7a: the farther a person is from the origin (o_gnd), the greater the error between the ground truth and the pedestrian's estimated location.
This is due to the approximations made in the homography when obtaining the top view from the angled CCTV view, which carry forward to the computation of [x^{P_a}_feet,top, y^{P_a}_feet,top] (Section IV-A2). However, the maximum error between the estimates and the ground truths is again within 0.25 meters. Also, since a pedestrian's location is estimated from the point corresponding to his/her feet, errors due to the pedestrian's orientation are less frequent. The average error in the distance estimation between pedestrians is ~0.3 feet.

2) Breach Detection and Enforcement: In this experiment, we compare the performance in detecting a social distancing breach and enforcing social distancing guidelines for three configurations: 1. CCTV only, 2. robot only, and 3. the robot-CCTV hybrid system. The detection and enforcement capabilities of these systems in dynamic scenes vary extensively depending on the initial orientation of the robot and the walking speeds and directions of the pedestrians. Therefore, we standardize the experiment by comparing the best performances of the three configurations in terms of their ability to detect crowding and social distancing breaches in static scenes, and the number of times the robot attended to those breaches, in a laboratory setting. We demonstrate the robot's ability to track mobile pedestrians in the next section.

Figure 7: Plots of ground truth (blue dots) versus pedestrian localization (red dots) when using the robot's Realsense camera and the static CCTV camera with a larger FOV. a. The estimates from the Realsense camera tend to have slightly higher errors because we localize pedestrians using averaged proximity values within their detection bounding boxes, which is affected by the sizes of the bounding boxes. b. Localization using the data from the CCTV camera is more accurate, as it tracks a person's feet; this method is not affected by a person's orientation. We observe that in both cases the localization errors are within the acceptable range of 0.3 meters.

Table I: Comparison of the three configurations in terms of detecting breaches of social distancing norms when two pedestrians are static at any one of 40 points in a laboratory setting. We observe that the CCTV + robot configuration detects the most breaches, even when the robot is static and outside the CCTV's sensing range. When the robot is mobile, following lawnmower waypoints outside of the CCTV's FOV, it can detect a breach at any of the 20 locations that could not be detected by the CCTV camera.

For this experiment, we uniformly sample 40 points in our lab, with 20 points within the FOV of the CCTV camera and 20 points outside it. Each of these points could be the location of a social distancing breach at any time instant. We evaluate how many of these points are visible to both cameras, as well as the effect of the robot's mobility. For the static case, the robot is placed at a fixed location outside the sensing region of the CCTV camera; in the mobile case, the robot moves along a lawnmower trajectory outside the CCTV's FOV. The social distancing breaches can also be partially occluded: by a 50%-occluded breach, we mean a scenario where one person blocks another such that only the half of the blocked person's body on one side of the sagittal plane is visible to the camera. The results are shown in Table I. As can be seen, the CCTV-only configuration is capable of detecting the standard 20 breaches within its sensing region.
It can also handle occlusions between pedestrians better and detect such breaches, owing to the CCTV camera's global view of the environment. It should be noted that this configuration is already an improvement over current CCTV systems, in which a human manually detects excessive crowding and initiates counter-measures. However, it offers no means of enforcing social distancing at the locations of the breaches.

The robot-only configuration detects fewer breaches (10 breaches) than the CCTV setup within the RGB-D camera's sensing region when the robot is static, due to the camera's limited FOV. Objects occluding the social distancing breaches also adversely affect the number of detections made by the robot. However, when the robot moves along a lawnmower trajectory outside the CCTV's FOV, it detects the social distancing breaches that could occur at any of the 20 locations, regardless of whether they are occluded.

The robot-CCTV hybrid configuration provides the best performance of the three configurations in terms of detecting breaches at the most locations when the robot is static. This is because, when the robot is outside the sensing region of the CCTV camera, the hybrid configuration monitors the largest area in the environment. This configuration also provides better tracking capabilities when a pedestrian is walking (see Section V-C4). We also note that, in static scenarios, the robot attends to 100% of the breaches that are detected.

3) RGB-D Pedestrian Tracking Duration: In this experiment, we measure the duration for which the robot-only configuration can track walking pedestrians using only its on-board RGB-D sensor. Since the robot is limited by the camera's FOV, continuously tracking a pedestrian who is walking out of that FOV is challenging. To counteract this limitation and track a pedestrian for a longer time, the robot has to rotate/move towards the pedestrian along the pedestrian's walking direction. We vary the walking speed of a pedestrian moving in a direction perpendicular to the orientation of the robot, and we also vary the maximum angular velocity of the robot to measure the differences in tracking performance (Table II). We observe that the greater the maximum angular velocity of the robot, the better it can track a fast-moving pedestrian. However, since the robot navigates among humans, we limit its maximum linear and angular velocities to 0.75 m/sec and 0.75 rad/sec, respectively, to minimize the disturbance it causes them.

Figure 8: Trajectories of the robot pursuing two walking pedestrians in the environment shown in Figure 5c. The pink and blue colors denote the static obstacles in the environment. a. The robot only uses its RGB-D sensor to track the pedestrians; it pursues them successfully when they move along a smooth trajectory. b. The robot's RGB-D camera is unable to track the pedestrians when they make a sudden sharp turn. c. When the CCTV camera is used to track the pedestrians, the robot follows their trajectories more closely. d. Pedestrians making sharp and sudden turns can also be tracked. The black line denotes the point where the pedestrians leave the CCTV camera's FOV, from where the RGB-D camera tracks the pedestrians. Sharp turns in d again become a challenge.

Figure 9: Two mobile non-compliant pedestrians detected by the CCTV camera and pursued by our COVID-robot in a laboratory setting. The locked pedestrian is marked with a green dot at his feet. Note that the locked pedestrian changes based on the positions of the two pedestrians in the CCTV footage. The robot pursues them until they maintain the appropriate distance.
Table II: Time duration for which the robot can track a walking pedestrian, for different pedestrian walking speeds and maximum angular velocities of the robot. The pedestrian walks 5 meters in a direction perpendicular to the robot's orientation, and the robot has to rotate to track the walking pedestrian. The ideal time for which a pedestrian should be tracked is given in brackets beside the actual time. The robot can effectively track a pedestrian walking at up to 0.75 m/sec when its angular velocity is 1 rad/sec.

We observe that capping the angular velocity makes it challenging for the robot to track pedestrians walking faster than 0.75 m/sec. Even when the robot uses its maximum angular velocity of 1 rad/sec, pedestrians walking at 1 m/sec are difficult to track. This can only be alleviated in the future, when depth cameras improve their range and FOV.

4) CCTV-Guided Pursuit of a Walking Locked Pedestrian: We qualitatively demonstrate how the robot pursues two walking non-compliant pedestrians by plotting their trajectories in the cases where the RGB-D camera (see Figures 8a and b) or the CCTV camera (see Figures 8c and d) detects them. Figure 8a shows that when the pedestrians walk along a smooth trajectory without sharp turns, the robot is able to track them successfully throughout their walk. In Figure 8b, we observe that when the pedestrians make a sharp turn and manage to move outside the limited FOV of the RGB-D camera, the robot is unable to pursue them. The pedestrians were walking at speeds of ~0.75 m/sec. This issue is alleviated when the CCTV camera tracks the pedestrians instead of the RGB-D camera: Figures 8c and d show that the robot is able to track the pedestrians more closely and accurately with the goal data computed using the CCTV camera's localization. In addition, sudden and sharp turns by the pedestrians are handled with ease, and pedestrians moving at speeds of ~0.75 m/sec can be tracked and pursued, which was not possible with the robot-only configuration. When the pedestrians move out of the CCTV camera's FOV (black line in Figures 8c and d), the data from the robot's RGB-D camera helps pursue the two pedestrians immediately, although the pedestrians' sharp turns again become a challenge to track. The robot pursuing two non-compliant pedestrians in our lab setting is shown in Figure 9.

We have presented a novel method to detect breaches of social distancing norms in indoor scenes using visual sensors such as RGB-D and CCTV cameras. We use a mobile robot to attend to the individuals who are non-compliant with the social distancing norm and to encourage them to move apart by displaying a message on a screen mounted on the robot. We demonstrate our method's effectiveness in localizing pedestrians, detecting breaches, and pursuing walking pedestrians, and we conclude that the CCTV + robot hybrid configuration outperforms configurations in which only one of the two components is used for tracking and pursuing non-compliant pedestrians.

Our method has a few limitations. For instance, it does not distinguish between strangers and people from the same household; therefore, all individuals in an indoor environment are encouraged to maintain a 6-foot distance from each other. Our current approach of issuing warnings to violating pedestrians through a monitor also has limitations, and we need to develop better human-robot interaction approaches. In addition, as more such monitoring robots are used to check social distances or to collect related data, they could affect the behavior of pedestrians in different settings.
We need to perform more studies on the social impact of such robots. Due to COVID restrictions, we have only been able to evaluate the performance of the COVID-robot in our low- to medium-density laboratory settings. Eventually, we want to evaluate the robot's performance in crowded public settings and outdoor scenarios. We also need to design better techniques to improve the enforcement of social distancing by using better human-robot interaction methods.

ACKNOWLEDGMENT
This work is supported in part by ARO grant W911NF1910315 and NSF grant 2031901.

REFERENCES
[1] Agent-based simulation for weekend-extension strategies to mitigate influenza outbreaks.
[2] Policies to reduce influenza in the workplace: impact assessments using an agent-based model.
[3] Relevance of workplace social mixing during influenza pandemics: an experimental modelling study of workplace cultures.
[4] A small community model for the transmission of infectious diseases: comparison of school closure as an intervention in individual-based models of an influenza pandemic.
[5] Estimating the impact of school closure on influenza transmission from Sentinel data.
[6] Enabling and Emerging Technologies for Social Distancing: A Comprehensive Survey.
[7] Strategies for mitigating an influenza pandemic.
[8] Autonomous Social Distancing in Urban Environments using a Quadruped Robot.
[9] Decentralized non-communicating multiagent collision avoidance with deep reinforcement learning.
[10] Motion planning among dynamic, decision-making agents with deep reinforcement learning.
[11] Reciprocal velocity obstacles for real-time multi-agent navigation.
[12] Reciprocal n-body collision avoidance.
[13] Towards Optimally Decentralized Multi-Robot Collision Avoidance via Deep Reinforcement Learning.
[14] Fully Distributed Multi-Robot Collision Avoidance via Deep Reinforcement Learning for Safe and Efficient Navigation in Complex Scenarios.
[15] Realtime Collision Avoidance for Mobile Robots in Dense Crowds using Implicit Multi-sensor Fusion and Deep Reinforcement Learning.
[16] DenseCAvoid: Real-time Navigation in Dense Crowds using Anticipatory Behaviors.
[17] OF-VO: Reliable Navigation among Pedestrians Using Commodity Sensors.
[18] Frozone: Freezing-free, pedestrian-friendly navigation in human crowds.
[19] Unfreezing the robot: Navigation in dense, interacting crowds.
[20] Simple online and realtime tracking with a deep association metric.
[21] Yolov3: An incremental improvement.