key: cord-0764922-o16ud1mm
authors: Hu, Da; Zhong, Hai; Li, Shuai; Tan, Jindong; He, Qiang
title: Segmenting areas of potential contamination for adaptive robotic disinfection in built environments
date: 2020-08-26
journal: Build Environ
DOI: 10.1016/j.buildenv.2020.107226
sha: ff60b5fc659bec1d004f6ac4eefbb256e0127c1c
doc_id: 764922
cord_uid: o16ud1mm

Mass-gathering built environments such as hospitals, schools, and airports can become hot spots for pathogen transmission and exposure. Disinfection is critical for reducing infection risks and preventing outbreaks of infectious diseases. However, cleaning and disinfection are labor-intensive, time-consuming, and health-undermining, particularly during the coronavirus disease 2019 (COVID-19) pandemic. To address this challenge, a novel framework is proposed in this study to enable robotic disinfection in built environments to reduce pathogen transmission and exposure. First, a simultaneous localization and mapping technique is exploited for robot navigation in built environments. Second, a deep-learning method is developed to segment and map areas of potential contamination in three dimensions based on the object affordance concept. Third, with short-wavelength ultraviolet light, trajectories of robotic disinfection are generated that adapt to the geometries of the areas of potential contamination to ensure complete and safe disinfection. Both simulations and physical experiments were conducted to validate the proposed methods; the results demonstrate the feasibility of intelligent robotic disinfection and highlight its applicability in mass-gathering built environments.

…workers after a second confirmed COVID-19 case in New York [6]. Deep cleanings are also conducted for school buildings during closures [7]. Disinfection is now routine and necessary for all mass-gathering facilities, including schools, airports, transit systems, and hospitals.
However, the manual process is labor-intensive, time-consuming, and health-undermining, limiting the effectiveness and efficiency of disinfection. First, pathogens can survive on a variety of surfaces for long periods of time. For example, norovirus and influenza A virus were found on frequently touched objects in elementary school classrooms [8]. The coronavirus that causes severe acute respiratory syndrome (SARS) can persist on nonporous surfaces such as plastics for up to 9 days [9]. Second, pathogens spread quickly within built environments. It was found that contamination of a single doorknob or tabletop can further contaminate commonly touched objects and infect 40-60% of people in a facility [10]. Hence, cleaning and disinfection workers are burdened by heavy workloads and subject to high infection risks. Third, workers can be harmed by the chemicals and devices used for disinfection. For instance, nurses who regularly clean surfaces with disinfectants were found to be at higher risk of chronic obstructive pulmonary disease [11]. Exposure to disinfectants has also been found to cause asthma [12]. Therefore, there is a critical need for an automated indoor disinfection process that relieves human workers of such labor-intensive and high-risk work.

To address this critical need, the objective of this study is to create and test a novel framework and new algorithms for a robotic manipulator to conduct automatic disinfection in indoor environments, reducing pathogen transmission and exposure and thus potentially preventing outbreaks of infectious diseases. The contribution of this study is twofold. First, a deep-learning method is developed to detect and segment the areas of potential contamination.
Using the visual simultaneous localization and mapping (SLAM) technique, the segmented areas of potential contamination are mapped in three-dimensional (3D) space to guide the robotic disinfection process. Second, a new method is proposed to control a robot to move to the areas needing disinfection and to generate trajectories based on the geometries of the areas of potential contamination and their surrounding contexts. The adaptive motion ensures disinfection quality and safety. The rest of the paper is organized as follows. Related studies are reviewed in Section 2 to reveal the knowledge gaps and technical barriers addressed in this study. The framework and methods are elaborated in Section 3, followed by the experimentation and evaluation in Section 4. Section 5 concludes the study by discussing the applicability of robotic disinfection in built environments, its limitations, and future research directions. Table 1 presents a list of abbreviations used in this paper.

…framework was proposed to allow construction robots to perceive and model the geometry of their workpieces using sensors and building information model data. In our application, after identifying the areas of potential contamination, the robot needs to move to an appropriate position and adapt its trajectory to the geometry of those areas for complete disinfection while avoiding collision with adjacent objects.

The areas of potential contamination need to be automatically detected and mapped in 3D space to guide robotic disinfection. In particular, object surfaces with frequent human contact are the areas of potential contamination requiring disinfection. Therefore, those areas need to be automatically detected and segmented from the RGB images, and thereafter projected to a 3D semantic map for robot navigation and actions.
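The detect-segment-project loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `segment_affordances` is a hypothetical stand-in for the deep network described later, and the intrinsics follow the classical pinhole model cited in the paper.

```python
# Minimal sketch of the per-frame perception pipeline described above.
# All helper names here are hypothetical placeholders, not the authors' code.

def segment_affordances(rgb):
    """Stub for the deep network: per-pixel (affordance label, confidence)."""
    h, w = len(rgb), len(rgb[0])
    return [[("walk", 0.9) for _ in range(w)] for _ in range(h)]

def back_project(x, y, d, fx, fy, cx, cy):
    """Pinhole back-projection of pixel (x, y) with depth d into 3D."""
    return ((x - cx) * d / fx, (y - cy) * d / fy, d)

def process_frame(rgb, depth, intrinsics):
    """Segment one RGB frame and lift labeled pixels into a semantic point cloud."""
    fx, fy, cx, cy = intrinsics
    labels = segment_affordances(rgb)
    cloud = []
    for y, row in enumerate(depth):
        for x, d in enumerate(row):
            if d <= 0:  # skip invalid depth readings
                continue
            label, prob = labels[y][x]
            cloud.append((back_project(x, y, d, fx, fy, cx, cy), label, prob))
    return cloud  # each point: ((X, Y, Z), affordance label, confidence)
```

Each resulting point carries its world coordinates, label, and confidence, matching the per-point information the paper stores before occupancy mapping.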
…map 116 object labels to the corresponding five object affordance labels. Table 2 presents several examples. Each object or object part is associated with a five-dimensional vector representing the five object affordance labels. A value of 1 indicates that a specific affordance is associated with the object or part, and a value of 0 indicates that it is not. For example, "floor" is associated with the "walk" affordance, and "*/door/knob" is associated with the "pull" affordance. If the correspondence between an object and the five affordance labels cannot be established, the association is not performed, to ensure the reliability of affordance prediction by the trained network. Fig. 4(a) presents an example of the label transformation. Using Table 2, annotated data from ADE20K can be transferred to affordance ground-truth data. For instance, "seat base" is transferred to the "sit" affordance. Fig. 4 …

…of convolution, batch normalization, ReLU, and max-pooling layers. An initial 7×7 convolution with a stride of 2 is first applied, followed by batch normalization and a ReLU activation layer. Thereafter, a max-pooling operation is conducted with a kernel size of 3 and a stride of 2. These two steps reduce the spatial size, and thus the computation cost and the number of parameters in the deep layers. In the bottleneck, the network has four connected blocks.

After segmenting the object affordances from the 2D RGB images as the areas of potential contamination, it is necessary to project the 2D labels to a 3D grid map for guiding robot navigation and disinfection. As depth images are registered to the reference frame of the RGB images, the first step is to use the classical pinhole camera model [61] to obtain the point cloud of the environment. Given a pixel (x, y) and its depth d, its world coordinate (X, Y, Z) is computed by Eq.
(1): X = (x − c_x)·d/f_x, Y = (y − c_y)·d/f_y, Z = d, where f_x and f_y are the camera focal lengths in pixel units and (c_x, c_y) is the principal point, usually at the image center. Fig. 6 presents an example of the obtained point cloud. Each point stores its world coordinates, label information, and the highest probability predicted by the network.

Second, the OctoMap library [62] is applied to generate a 3D occupancy grid map, using the obtained point cloud as input. A voxel filter is used to reduce the size of the point cloud and accelerate the mapping process. In each voxel only one point is stored, as a single point is adequate to update an octree node. The voxel filter resolution is set to the same resolution as the occupancy map. The occupancy map resolution is set to 4 cm, which provides adequate detail in indoor environments while maintaining processing efficiency. Fig. 7 presents an example of 3D point cloud filtering. The image size is 960×540, so 518,400 points are generated for each frame. After applying the voxel filter, the number of points is reduced to 23,009 for the frame shown in Fig. 7. The number of filtered points may vary across frames due to noise in the sensory data.

Fig. 7. Using a voxel filter for 3D mapping.

Since the camera is constantly moving, semantic information may be continuously updated. For instance, a small object may not be accurately segmented when the camera's view angle is unfavorable. Hence, semantic information at the pixel level from different frames is fused to deal with this situation (see Fig. 8). If two affordances are the same, the affordance is kept and the probability becomes the average of the two. Otherwise, the affordance with the higher confidence is kept and its probability is decreased to 0.9 of its original value.
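The per-voxel fusion rule just described can be written compactly. The sketch below implements the rule exactly as stated (average the probabilities of matching labels, otherwise keep the higher-confidence label at 0.9 of its probability); the function name is a hypothetical placeholder, not the authors' API.

```python
def fuse_semantics(label_a, prob_a, label_b, prob_b):
    """Fuse two per-voxel affordance predictions from different frames.

    Rule from the paper: if the labels agree, keep the label and average
    the probabilities; otherwise keep the higher-confidence label and
    decay its probability to 0.9 of the original value.
    """
    if label_a == label_b:
        return label_a, (prob_a + prob_b) / 2.0
    if prob_a >= prob_b:
        return label_a, 0.9 * prob_a
    return label_b, 0.9 * prob_b
```

For example, fuse_semantics("sit", 0.8, "sit", 0.6) keeps "sit" with probability ≈ 0.7, while fuse_semantics("sit", 0.8, "place", 0.5) keeps "sit" with probability ≈ 0.72, leaving room for a later, more confident prediction to overwrite it.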
This process allows the occupancy map to update the semantic information with a new prediction of higher confidence. After the above steps, the areas of potential contamination are predicted and projected onto the 3D occupancy map, which then guides the robotic disinfection.

After mapping the areas of potential contamination, the next step is to generate robot motions to scan the areas with UV light for disinfection. The robot has a 3-degree-of-freedom base and a 6-degree-of-freedom manipulator. First, the robot needs to move to the objects needing disinfection. A hierarchical planning approach is adopted, consisting of global and local path planning. Global path planning provides an optimal path from the start to the goal, and local path planning outputs a series of velocity commands for the robot. The A* algorithm [63] is used to find a globally optimal path for the robot; its heuristic function h(n) guides the trajectory search toward the goal position, and A* can find the shortest path very efficiently. In this study, the Manhattan distance is used as the heuristic function, defined in Eq. (2) as h(n) = |x_n − x_g| + |y_n − y_g|, the Manhattan distance from any node n(x_n, y_n) to the goal g(x_g, y_g) in the graph.

…samples that intersect with obstacles are recognized and eliminated. An optimal pair (v, w) for the robot is determined by maximizing the objective function defined in Eq. (4), which depends on (1) proximity to the global path, (2) proximity to the goal, and (3) proximity to obstacles, where f_a(v, w) represents the distance between the global path and the endpoint of the trajectory, …

To make the disinfection process more efficient, the robotic arm is preprogrammed to adapt to objects with various geometries. As shown in Fig.
10, … The DSC is similar to the IoU, another measure of overlap between prediction and ground truth. This measure ranges from 0 to 1, where 1 indicates perfect overlap between prediction and ground truth. The AP metric summarizes a precision-recall curve as the weighted mean of precisions achieved at each threshold; AP does not depend on a single threshold value since it averages over multiple levels. AP is defined in Eq. (7), where P_n and R_n are the precision and recall at the nth threshold.

…testing sets. The training set achieved the highest mAP, mIoU, and mDSC, since the model was optimized on this set. Testing set #1 achieved the second-highest scores, and the difference in all three metrics between the training set and testing set #1 is less than 0.1. However, testing set #2 achieved the lowest scores among the four datasets. This is because the training set contains both real and simulated images, while testing set #2 contains only real images. Synthetic images cannot reproduce the richness and noise of real ones, which may lead a network trained on synthetic images to perform poorly on real images. Therefore, the network trained on both simulated and real images performs better on a testing set that combines both types of samples.

Fig. 11. The performance of the network on the training set, validation set, and two testing sets.

…The walk affordance achieves the highest IoU and AP scores, which is attributed to its relatively large sample size compared with other affordances such as grasp and pull; in addition, walking surfaces often cover large areas of the scene. Pull has the lowest prediction accuracy among the five affordances. The pull affordance represents objects that can be pulled, such as doorknobs and cabinet handles. These objects are relatively small and have a small sample size in the dataset.
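The two overlap measures used in this evaluation can be computed from binary masks as follows. This is a generic sketch of the standard IoU and DSC definitions, not the authors' evaluation code.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks (flat 0/1 sequences)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    union = sum(p | t for p, t in zip(pred, truth))
    return inter / union if union else 1.0

def dsc(pred, truth):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    inter = sum(p & t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    return 2.0 * inter / total if total else 1.0
```

For the same pair of masks the two scores are linked by DSC = 2·IoU / (1 + IoU), so DSC is always at least as large as IoU; this is why DSC-based thresholds such as 0.5 are easier to clear than the same IoU threshold.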
The walk, grasp, place, and sit affordances achieved DSC and AP scores higher than 0.5, indicating the usability of the proposed method in built environments.

…map using the network. The frame size provided by the Kinect is 960×540 pixels. Fig. 12 shows the predicted affordances in images captured in the building. The walk, grasp, pull, place, and sit affordances are color-coded, and the color intensity represents their corresponding probabilities.

Fig. 12. Results of affordance segmentation.

Fig. 13 presents the results of 3D semantic occupancy mapping. Images were collected to perform RTAB-Map SLAM and obtain camera poses. Thereafter, semantic reconstruction was conducted using the recorded video and camera trajectory. At a resolution of 4 cm, the indoor scene can be properly reconstructed. The results indicate that the proposed method can successfully segment affordances; the walk, place, sit, and grasp affordances are reasonably segmented. In Fig. 13(a), the small tablet arm of the sofa on the left side is correctly segmented as the place affordance. However, small objects such as doorknobs are not correctly recognized in the semantic map. In addition, a part of the table surface is wrongly identified as the walk affordance. This is possibly due to the small size of the training data. The occupancy map can be continuously updated during the robot's disinfection actions to address incorrect segmentation.

Fig. 13. Results of 3D semantic reconstruction.

The processing time for each step was assessed in this study. Table 4 presents the average time spent on each processing stage, with the occupancy map resolution set to 4 cm. As shown in the table, the processing frequency of the entire system is about 3.2 Hz and 4.0 Hz for image sizes 960×540 and 512×424, respectively. The OctoMap update is the most time-consuming step in the system, since it requires raycasting to update the occupancy map.
Raycasting is used to clear all the voxels along the line between the origin and the end point. The SLAM method achieves a high frame rate and tracks the camera in real time. Semantic segmentation and semantic point cloud generation also run at very high frame rates. Our system runs at 3.2 Hz for high-resolution image streaming, which is sufficient for most indoor online applications.

Table 4. Average processing time for each step (processes marked * and ** execute at the same time). Step | Image resolution (512×424) | …

Fig. 15. Implementation of robot navigation. The red arrow is the pose of a goal point.

After navigating to the areas of potential contamination, a trajectory is generated to perform disinfection. Fig. 16 …

In addition, a physical experiment was conducted using an AUBO-i5 robotic arm with a UV light attached as its end effector. The UV light automatically turns on when it is close to an object surface requiring disinfection and shuts off when moving away. As shown in Fig. 17 … …and collision-free path to clean the area. The developed robots present a promising and safe solution to reduce the transmission and spread of microbial pathogens such as influenza viruses and coronaviruses.

Using the proposed method has at least two benefits in cleaning and disinfection practice. First, the robot platform can reduce the infection risk of cleaning workers by keeping them away from contaminated areas. Second, affordance information can guide the robot to focus on hot spots and thoroughly disinfect potentially contaminated areas. Thus, the developed methods will help reduce seasonal epidemics, as well as pandemics of new virulent pathogens. The developed method achieved high accuracy in segmenting floors and high-touch surfaces as areas of potential contamination.
Empirical evidence [75] suggests that floors can harbor a variety of pathogens, including SARS-CoV-2, for long periods. Human movements can resuspend pathogens deposited on the floor, further contaminating other surfaces. Hence, to avoid reciprocal contamination and enhance disinfection efficiency, both floors and high-touch areas need to be disinfected. The performance in 3D segmentation and mapping achieved by this method demonstrates its applicability. With the developed perception capability, functionalities such as vacuuming, spraying, and mopping can be incorporated into the robot system for floor cleaning. In addition, compared with overhead UV lights, the robotic disinfection proposed in this study can reach places that conventional overhead lights cannot. Conventional germicidal UV lights can cause skin cancer and cataracts, posing a health threat to humans. The precision UV scanning achieved by the adaptive robot motion ensures continuous and complete disinfection of high-touch areas, which is safer, more efficient, and more effective than overhead UV lights.

There are some limitations that need to be addressed in future studies. First, the network reported low accuracy in segmenting the areas of potential contamination on small objects such as doorknobs and cabinet handles in unfavorable circumstances. The low accuracy stems from the scarcity of available data. Future studies are needed to augment the dataset and develop more robust deep-learning algorithms for 3D segmentation. Second, the developed robot system only employs UV lights to disinfect high-touch areas. However, other operation modes such as vacuuming, spraying, and wiping are needed to clean and disinfect a variety of surfaces, including floors.
Advanced control techniques should be developed, and parameters such as scanning time, spray dose, and wiping force should be calibrated for these modes to achieve optimal disinfection performance. Third, this study considers a single robot for disinfection at room scale. A fleet of robots might be needed to disinfect a large facility such as a hospital or an airport. The planning and scheduling of multiple robots for coordinated disinfection will be an interesting and useful future study. Fourth, human presence and social context have not been considered in this research. The robot should never point the UV light at humans and should not spray disinfectant in the vicinity of humans. Moreover, the robot should not interrupt ongoing human activities for disinfection. Future studies are needed to enable the robots to learn the rules and understand the contexts, which are important for actual deployment in human-centric built environments.

References
- Modelling microbial infection to address global health challenges
- An interactive web-based dashboard to track COVID-19 in real time
- Economic burden of seasonal influenza in the United States
- …-2020 U.S.
Flu Season: Preliminary Burden Estimates
- Prevention of device-related healthcare-associated infections
- Coronavirus forces New York City subways, trains to clean up their act, NBC News
- Every public and private school in Illinois is closed because of the coronavirus. Here's what you need to know
- Occurrence of bacteria and viruses on elementary classroom surfaces and the potential role of classroom hygiene in the spread of infectious diseases
- Persistence of coronaviruses on inanimate surfaces and its inactivation with biocidal agents
- How quickly viruses can contaminate buildings -- from just a single doorknob
- Association of Occupational Exposure to Disinfectants With Incidence of Chronic Obstructive Pulmonary Disease Among US Female Nurses
- Association of household cleaning agents and disinfectants with asthma in young German adults
- Application of Service Robots for Disinfection in Medical Institutions
- Implementation of Xenon Ultraviolet-C Disinfection Robot to Reduce Hospital Acquired Infections in Hematopoietic Stem Cell Transplant Population
- …'s subway is sending robots to disinfect trains of coronavirus
- hTetro: A tetris inspired shape shifting floor cleaning robot
- A tiling-theoretic approach to efficient area coverage in a tetris-inspired floor cleaning robot
- Floor cleaning robot with reconfigurable mechanism
- Expressing attention requirement of a floor cleaning robot through interactive lights
- Improved techniques for grid mapping with rao-blackwellized particle filters
- A SLAM algorithm in less than 200 lines C-language program
- Hector Open Source Modules for Autonomous Mapping and Navigation with Rescue …
- Comparing ICP variants on real-world data sets
- Real-time loop closure in 2D LIDAR SLAM
- maplab: An open framework for research in visual-inertial mapping and localization
- Autonomous aerial navigation using monocular visual-inertial fusion
- RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for large-scale and long-term online operation
- ORB-SLAM2: An open-source SLAM system for monocular, stereo, and RGB-D cameras
- Dense visual SLAM for RGB-D cameras
- Object detection with discriminatively trained part-based models
- Recurrent convolutional neural network for object recognition
- ImageNet large scale visual recognition challenge
- SSD: Single shot multibox detector
- You only look once: Unified, real-time object detection
- Learning object-to-class kernels for scene classification
- Scene classification with semantic Fisher vectors
- Indoor scene understanding with geometric and semantic contexts
- Indoor scene understanding with RGB-D images: Bottom-up segmentation, object detection and semantic segmentation
- Functional object class detection based on learned affordance cues
- Reasoning about object affordances in a knowledge base representation
- What can I do around here? Deep functional scene understanding for cognitive robots
- Weakly supervised affordance detection
- Visual affordance and function understanding: A survey, arXiv preprint arXiv:1807.06775
- A Variable-Structure Robot Hand That Uses the Environment to Achieve General Purpose Grasps
- Jamming-Free Immobilizing Grasps Using Dual-Friction …
- Dynamic regrasping by in-hand orienting of grasped objects using non-dexterous robotic grippers
- OCOG: A common grasp computation algorithm for a set of planar objects
- Robot grasp planning based on demonstrated grasp strategies
- Randomized physics-based motion planning for grasping in cluttered and uncertain environments
- Autonomous motion planning and task execution in geometrically adaptive robotized construction work
- High performance loop closure detection using bag of word pairs
- ORB: An efficient alternative to SIFT or SURF
- Learning to Segment Affordances
- Scene parsing through ADE20K dataset
- Learning to Label Affordances from Simulated and Real Data
- Context-based affordance segmentation from 2D images for robot actions
- Convolutional networks for biomedical image segmentation
- Deep residual learning for image recognition
- Learning to refine object segments
- Laplacian pyramid reconstruction and refinement for semantic segmentation
- OctoMap: An efficient probabilistic 3D mapping framework based on octrees
- A Formal Basis for the Heuristic Determination of Minimum Cost Paths
- The dynamic window approach to collision avoidance
- An open-source library for improved solving of generic inverse kinematics
- Jointly optimize data augmentation and network training: Adversarial data augmentation in human pose estimation
- Learning deconvolution network for semantic segmentation
- Fully convolutional instance-aware semantic segmentation
- Semantic segmentation of small objects and modeling of uncertainty in urban remote sensing images using deep convolutional neural networks
- Automatic differentiation in PyTorch
- Lecture 6.5 - RMSProp: Divide the gradient by a running average of its recent magnitude
- ImageNet: A large-scale hierarchical image database
- Understanding deep learning requires rethinking generalization
- A multi-scale CNN for affordance segmentation in RGB images
- Aerosol and Surface Distribution of Severe Acute Respiratory Syndrome Coronavirus 2 in Hospital Wards

Acknowledgments: The authors gratefully acknowledge NSF's support.