You Can Wash Better: Daily Handwashing Assessment with Smartwatches
Fei Wang, Xilei Wu, Xin Wang, Jianlei Chi, Jingang Shi, Dong Huang
December 9, 2021

We propose UWash, an intelligent solution upon smartwatches, to assess handwashing for the purpose of raising users' awareness and cultivating habits of high-quality handwashing. UWash can identify the onset/offset of handwashing, measure the duration of each gesture, and score each gesture as well as the entire procedure in accordance with the WHO guidelines. Technically, we address the task of handwashing assessment as a semantic segmentation problem as in computer vision, and propose a lightweight UNet-like network, of only 496 Kbits, to achieve it effectively. Experiments over 51 subjects show that UWash achieves an accuracy of 92.27% on sample-wise handwashing gesture recognition, <0.5 seconds of error in onset/offset detection, and <5 out of 100 points of error in scoring in the user-dependent setting, while remaining promising in the cross-user evaluation and in the cross-user-cross-location evaluation.

In the current COVID-19 pandemic, washing hands professionally is more critical than ever for people to prevent the transmission of viruses in daily life. We conducted an online questionnaire on handwashing knowledge and practices over 505 subjects across 26 provinces in China. The survey shows that 96.04% of subjects have heard of standard handwashing guidelines but only 34.65% of subjects follow them, consistent with the situations reported in two other surveys conducted in Germany [Mieth et al., 2021] and in Nigeria [Wada and Oloruntoba, 2021], which reveals that many ordinary people do not wash their hands professionally, regularly, and sufficiently.

For daily-life usage, automatic handwashing assessment systems should be able to work along with individual users, since handwashing may happen at home, in restaurants, at workplaces, etc. Unfortunately, most existing work can only monitor users' handwashing at the sinks where it is deployed: sensors such as pressure sensors [Kinsella et al., 2007], alcohol sensors [Edmond et al., 2010], ultrasonic hotspots [Rabeek et al., 2014], RFID tags [Pineles et al., 2014], and Bluetooth [Mondol and Stankovic, 2015; Cao et al., 2021] are embedded into dispensers simply to count handwashing events or to estimate handwashing procedures; cameras [Llorca et al., 2011], depth cameras [Zhong et al., 2016], and mmWave devices [Khamis et al., 2020] are designed to be placed on walls. Besides, from the functionality aspect, (1) the systems should not require additional user operations. The official app of the Apple Watch and iWash [Samyoun et al., 2021] require users to awaken the monitoring systems manually before washing hands, which harms users' willingness to use them. (2) People's handwashing procedures vary in their gesture sequences and gesture durations, so the systems have to handle these diversities. (3) The systems should be able to assess individual handwashing gestures and the entire procedure to help users check and improve their handwashing techniques. To the best of our survey, no existing work meets all the above requirements simultaneously; thus we propose UWash.
As shown in Fig. 1, UWash leverages the accelerometers and gyroscopes of smartwatches to assess users' handwashing techniques in 4 aspects, i.e., detecting the onset/offset of handwashing, estimating the duration of each handwashing gesture, scoring the estimated gestures in accordance with the WHO guidelines, and finally scoring the entire procedure. Technically, we cast the task of handwashing assessment as the semantic segmentation task in computer vision [Long et al., 2015], and develop a simple yet effective U-Net [Ronneberger et al., 2015] variant network to achieve it.

[Table 1: Capability comparison of handwashing monitoring systems. Bluetooth [Mondol and Stankovic, 2015], Depth camera [Zhong et al., 2016], WristWash [Li et al., 2018], RFWash [Khamis et al., 2020], HAWAD [Mondol and Stankovic, 2020], Armband [Wang et al., 2020], iWash [Samyoun et al., 2021], and AWash [Cao et al., 2021] each lack one or more of the seven compared capabilities; UWash (Ours) provides all seven.]

Recall that semantic segmentation in computer vision aligns every pixel in an image to a category of things or stuff; analogously, given the time-series sensory recordings of smartwatches, UWash aligns every sample in a sliding window to the background state or a specific handwashing gesture. As shown in Fig. 1, with this alignment, UWash achieves all the functions mentioned above without bells and whistles.

Our contributions are threefold. (1) We summarize 4 requirements that an automatic handwashing assessment system should meet for people's daily use, and propose UWash, which meets them all. (2) We cast the handwashing assessment task as a semantic segmentation problem. Under this setting, we achieve the task with a simple yet effective U-Net variant. (3) UWash is the first work that can score the handwashing procedure following the WHO guidelines to help users improve their handwashing techniques.

Hand hygiene is crucial to preventing healthcare-associated infections in hospitals. Mandatory human audits are applied to improve healthcare workers' compliance with the WHO guidelines. However, this approach is labor-intensive, time-consuming, and costly. Thus automatic handwashing monitoring systems have been proposed to facilitate healthcare workers' adherence. Among them, sensors such as alcohol sensors [Edmond et al., 2010] and pressure sensors [Kinsella et al., 2007] are embedded into dispensers, while devices such as mmWave radars [Khamis et al., 2020] are placed on walls to estimate fine-grained handwashing procedures. Since the sensors or devices in these works are required to be deployed close to handwashing sinks in hospitals, they are not suitable for people's daily use.

Wearable devices such as wristbands [Li et al., 2018; Mondol and Stankovic, 2020], armbands [Wang et al., 2020], and smartwatches [Mondol and Stankovic, 2015; Samyoun et al., 2021; Cao et al., 2021] have also been proposed to monitor handwashing in recent years. However, wearing wristbands and armbands in daily life demands extra effort from users, limiting widespread use. Considering this, smartwatches are ideal platforms for monitoring handwashing procedures. Unfortunately, current smartwatch-based work cannot detect the onset/offset of handwashing, requiring either to work along with Bluetooth sensors on dispensers [Mondol and Stankovic, 2015; Cao et al., 2021] or to be awakened manually [Samyoun et al., 2021], where the former limits the places of use and the latter harms the frequency of use.
Besides these requirements on user experience, we believe that if a handwashing monitoring system can score every individual handwashing gesture and the entire procedure following the WHO guidelines, it will help people improve their handwashing techniques according to the reported scores. Thus we propose UWash to achieve this.

UWash conducts handwashing-gesture semantic segmentation on the time-series sensory recordings of smartwatches, sliding window by sliding window. We use L to represent the length of the sliding windows and N to represent the size of the training dataset. We use A and G to represent data of the accelerometers and gyroscopes, respectively, and Y for the sample-wise gesture annotations. With these symbols, the training dataset is D = \{(A_i^j, G_i^j; Y_i^j) \mid i \in \{1, 2, ..., L\}; j \in \{1, 2, ..., N\}\}. Note that the accelerometer and gyroscope data are 3-dimensional, which is not explicitly shown for brevity. We further simplify the notation of the training dataset as D = \{A, G; Y\}. Our goal is to propose a machine learning model M that takes A and G as inputs and outputs sample-wise gesture recognition results Y^*, expressed below:

Y^* = M(A, G), \quad \min_M \|Y^* - Y\|,

where \|\cdot\| is the operator computing the distance between the model's outputs (Y^*) and the annotations (Y).

[Figure 3: Architecture of UWash. The dual-branch U-Nets take data from two modality sensors, i.e., accelerometers and gyroscopes, as inputs, respectively. Feature maps from the two branches are further concatenated in high-level layers for sample-wise gesture recognition.]

U-Net [Ronneberger et al., 2015] is a pixel-wise classification architecture widely used in the visual semantic segmentation task. We replace the 2D convolutions in U-Net with 1D convolutions to conduct sample-wise gesture alignment on the 1D time-series sensory recordings of smartwatches. The network architecture is shown in Fig. 3; the reasons for several proposed modifications are explained next.

(1) Two Input Branches. The accelerometers and gyroscopes of smartwatches measure linear accelerations and angular velocities respectively, describing two physical quantities on different scales. To properly leverage the dual-modality data, we have to normalize them before merging them for the later task. To conduct automatic modality normalization, we feed raw accelerometer data and raw gyroscope data into the two branches of U-Nets, respectively. We also apply Batch Normalization [Ioffe and Szegedy, 2015] and the Leaky Rectified Linear Unit [Maas et al., 2013] to facilitate the normalization between the two modalities.

(2) Squeeze-and-Excitation Module [Hu et al., 2018]. Though the features learned from the dual-branch U-Nets are considered normalized, we believe their contributions to the final gesture recognition are not always equal. For example, as shown in Fig. 1, accelerometers are more sensitive than gyroscopes for G1, while gyroscopes are more sensitive for G6. To re-weight the contributions of the dual-modal sensors to handwashing gesture recognition, we apply Squeeze-and-Excitation Modules to the learned features. After that, we concatenate the re-weighted features along the channel dimension for gesture recognition.

(3) Pyramid Pooling Module (PPM) [Zhao et al., 2017]. The PPM has proven efficient at harvesting features across different fields of view. Therefore, we apply it in the middle of the U-Nets. Before being fed into the PPM, feature maps have a size of 8 × 64, where 8 and 64 represent the temporal dimension and the channel dimension, respectively. We use three average pooling operations with window/stride sizes of 8, 4, and 2 on the feature maps, generating three outputs with sizes of 1 × 64, 2 × 64, and 4 × 64, respectively. We then use convolutions with a kernel size of 1 × 1 to reduce the channels to 16, balancing the channels between the 3 pooling outputs and the input feature maps. Further, we upsample the pooling outputs and concatenate them with the input feature maps.

(4) Input Length of 64. We find that a sliding window size of 64 (1.28 seconds at the smartwatch's 50 Hz sampling rate) is sufficient to discriminate gestures. Besides, since larger inputs lead to larger models, we set the input length to 64 instead of 128 or larger for a smaller model. Indeed, UWash models are quite lightweight, only 496 Kbits without any model compression.
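To make the architecture concrete, the following is a minimal PyTorch sketch of a dual-branch 1D U-Net with SE re-weighting and a 1D PPM. The paper fixes the 8 × 64 mid-level feature size, the three PPM pooling scales, the 16-channel reduction, and the overall dual-branch/SE/PPM layout; the channel widths, layer counts, class names, and exact fusion point below are our illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlock(nn.Module):
    """Conv1d -> BatchNorm -> LeakyReLU, per the paper's normalization choices."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm1d(c_out),
            nn.LeakyReLU(0.1),
        )
    def forward(self, x):
        return self.net(x)

class SE(nn.Module):
    """Squeeze-and-Excitation over channels (Hu et al., 2018)."""
    def __init__(self, c, r=4):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(),
                                nn.Linear(c // r, c), nn.Sigmoid())
    def forward(self, x):                          # x: (B, C, T)
        w = self.fc(x.mean(dim=2))                 # squeeze over time
        return x * w.unsqueeze(2)                  # re-weight channels

class PPM(nn.Module):
    """1D pyramid pooling: pool at 3 scales, 1x1-conv to 16 ch, upsample, concat."""
    def __init__(self, c_in=64, c_branch=16, scales=(8, 4, 2)):
        super().__init__()
        self.scales = scales
        self.convs = nn.ModuleList(nn.Conv1d(c_in, c_branch, 1) for _ in scales)
    def forward(self, x):                          # x: (B, 64, 8)
        outs = [x]
        for s, conv in zip(self.scales, self.convs):
            y = conv(F.avg_pool1d(x, kernel_size=s, stride=s))
            outs.append(F.interpolate(y, size=x.shape[2], mode='linear',
                                      align_corners=False))
        return torch.cat(outs, dim=1)              # (B, 64 + 3*16, 8)

class Branch(nn.Module):
    """One U-Net-style encoder/decoder branch for a single 3-axis modality."""
    def __init__(self, c_in=3):
        super().__init__()
        self.enc1, self.enc2, self.enc3 = ConvBlock(c_in, 16), ConvBlock(16, 32), ConvBlock(32, 64)
        self.mid = ConvBlock(64, 64)               # mid-level features: 8 x 64
        self.ppm, self.fuse = PPM(64), ConvBlock(64 + 3 * 16, 64)
        self.dec3, self.dec2, self.dec1 = ConvBlock(128, 32), ConvBlock(64, 16), ConvBlock(32, 16)
    def forward(self, x):                          # x: (B, 3, 64)
        e1 = self.enc1(x)                          # (B, 16, 64)
        e2 = self.enc2(F.max_pool1d(e1, 2))        # (B, 32, 32)
        e3 = self.enc3(F.max_pool1d(e2, 2))        # (B, 64, 16)
        m = self.fuse(self.ppm(self.mid(F.max_pool1d(e3, 2))))   # (B, 64, 8)
        up = lambda t: F.interpolate(t, scale_factor=2, mode='linear',
                                     align_corners=False)
        d3 = self.dec3(torch.cat([up(m), e3], 1))  # (B, 32, 16), skip connection
        d2 = self.dec2(torch.cat([up(d3), e2], 1)) # (B, 16, 32)
        return self.dec1(torch.cat([up(d2), e1], 1))  # (B, 16, 64)

class UWashNet(nn.Module):
    """Dual branches; SE re-weights each modality's features before fusion."""
    def __init__(self, n_classes=10):
        super().__init__()
        self.acc, self.gyro = Branch(3), Branch(3)
        self.se_a, self.se_g = SE(16), SE(16)
        self.head = nn.Conv1d(32, n_classes, kernel_size=1)
    def forward(self, acc, gyro):                  # each (B, 3, 64)
        fused = torch.cat([self.se_a(self.acc(acc)),
                           self.se_g(self.gyro(gyro))], dim=1)
        return self.head(fused)                    # (B, 10, 64) per-sample logits

# logits = UWashNet()(torch.randn(2, 3, 64), torch.randn(2, 3, 64))  # -> (2, 10, 64)
```

With per-sample logits of shape (batch, 10, 64), the Cross-Entropy loss mentioned below applies directly to sample-wise targets of shape (batch, 64).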
We use PyTorch 1.9.0 to implement the network. The initial learning rate is 0.001 and the batch size is 16k. We use the Cross-Entropy loss and Adam [Kingma and Ba, 2014] to optimize the network, and train it for 500 epochs.

In testing, given a test time-series, we conduct gesture recognition with a sliding window size of 64 and a stride of 64. We show an initial recognition output in Fig. 4, which exhibits some misclassifications. Thus, we further apply two post-smoothing methods to the initial recognition outputs. (1) Multiple Test Voting (MTV). We conduct gesture recognition over the test time-series with a sliding window size of 64 and a stride of 1, resulting in multiple outputs for each sample. For each sample, we take the mode of all its outputs as its final recognition result. (2) The Mode Filter (TMF). For each sample, we use the mode of the outputs of its nearest 128 samples as its final recognition result, i.e., a mode filter with a window size of 128 and a stride of 1. A post-smoothed example is visualized in Fig. 4, which shows that MTV and TMF effectively reduce errors and improve the sample-wise handwashing gesture semantic segmentation; numerical results are reported in Section 4.
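The two smoothing steps reduce to mode computations over per-sample votes; a minimal NumPy sketch follows. Function names are ours, and tie-breaking (argmax returns the lowest class id) and boundary handling at the ends of the series are assumptions the paper leaves unspecified.

```python
import numpy as np

N_CLASSES = 10  # 9 WHO gestures + 1 background

def multiple_test_voting(window_preds, series_len, win=64):
    """MTV: window_preds[j] holds the per-sample predictions (length `win`)
    of the stride-1 window starting at sample j. Each sample's final label
    is the mode of all predictions that cover it."""
    votes = np.zeros((series_len, N_CLASSES), dtype=np.int64)
    for j, pred in enumerate(window_preds):
        for k, c in enumerate(pred):
            votes[j + k, c] += 1
    return votes.argmax(axis=1)

def the_mode_filter(labels, win=128):
    """TMF: replace each sample's label with the mode of its nearest
    `win` samples (window size 128, stride 1)."""
    labels = np.asarray(labels, dtype=np.int64)
    half, out = win // 2, np.empty_like(labels)
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half)
        out[i] = np.bincount(labels[lo:hi], minlength=N_CLASSES).argmax()
    return out
```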
Scoring the quality of handwashing procedures is complex, requiring expertise on the types of handwashing gestures, the completion of gestures, the duration of gestures, etc. In this paper, we simplify it to considering the duration of each gesture with respect to the WHO guidelines. To obtain the guideline-recommended durations, we collected 60+ online videos that describe the WHO guidelines and carefully selected 12 of them as references, excluding those with slowed playback, accelerated playback, over-detailed explanations, etc. Table 2 shows the recommended duration of each handwashing gesture in these videos. For each gesture, we remove the maximum and the minimum and compute the average of the rest as the professional handwashing duration, denoted as D_i^p, i \in \{1, 2, ..., 9\}.

We make two empirical assumptions. (1) Since each gesture emphasizes cleaning one part of the hands, we assume each gesture is equally important in handwashing. (2) The quality of cleaning under each gesture increases linearly with its duration, and perfect quality is reached and saturated when the duration is equal to or greater than the professional duration. Given these assumptions, we score handwashing via

Score = \sum_{i=1}^{9} \frac{100}{9} \cdot \min\left(1, \frac{D_i^e}{D_i^p}\right),

where 100/9 is the peak score of each gesture, matching the first assumption; D_i^e represents the estimated duration of the i-th gesture; and \min(1, D_i^e / D_i^p) matches the second assumption.

Table 2: Recommended duration (seconds) of each gesture in the 12 reference videos; the last column is the average after removing the maximum and minimum.
G1: [per-video values not recovered] | 4.9
G2: 1.5, 3.5, 3, 4.5, 3, 3.5, 4.5, 3.5, 3.5, 6.5, 4.5, 3 | 3.65
G3: 1.5, 3.5, 3, 4.5, 3, 3.5, 4.5, 3.5, 3.5, 6.5, 4.5, 3 | 3.65
G4: 4, 4, 3, 5, 6, 5, 6, 6, 11, 10, 5, 2 | 5.4
G5: 2, 4, 3, 5, 5.5, 3.5, 5, 3.5, 3.5, 8.5, 4, 3 | 4
G6: 2, 3, 2, 5.5, 4.5, 2.5, 4, 3.5, 3.5, 4, 5, 2.5 | 3.45
G7: 2, 3, 2, 5.5, 4.5, 2.5, 4, 3.5, 3.5, 4, 5, 2.5 | 3.45
G8: 3, 3, 2.5, 4.5, 5.5, 3.5, 6.5, 3, 4, 6, 5, 3.5 | 4.1
G9: 3, 3, 2.5, 4.5, 5.5, 3.5, 6.5, 3, 4, 6, 5, 3.5 | 4.1
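The scoring rule is a direct sum over the nine gestures; a short sketch under the paper's two assumptions is given below. The durations D_i^e can be read off the per-sample labels at 50 Hz; the names are ours, and G1's professional duration uses the 4.9-second average that survives in Table 2.

```python
import numpy as np

# Professional durations D_p (seconds): trimmed means from Table 2.
D_P = {1: 4.9, 2: 3.65, 3: 3.65, 4: 5.4, 5: 4.0,
       6: 3.45, 7: 3.45, 8: 4.1, 9: 4.1}

def durations_from_labels(labels, fs=50):
    """Estimated duration D_e of each gesture: sample count / sampling rate."""
    labels = np.asarray(labels)
    return {g: float((labels == g).sum()) / fs for g in D_P}

def handwashing_score(d_e):
    """Score = sum_i (100/9) * min(1, D_e_i / D_p_i); 100 means every gesture
    lasted at least as long as its professional duration."""
    return sum((100.0 / 9.0) * min(1.0, d_e.get(g, 0.0) / D_P[g]) for g in D_P)
```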
We use Samsung Gear Sport smartwatches to record motion-sensor data and the corresponding timestamps [Fomichev et al., 2019]. To increase the diversity of external conditions such as the types of hydrants, sinks, dispensers, etc., with IRB approval, we collected data at 5 buildings on a campus, i.e., a teaching hall, a laboratory hall, a cafeteria, a dormitory, and a library. At each building, we randomly recruited 10 passersby (11 at the laboratory hall) as participants and trained them to wash their hands following the WHO guidelines. To mimic daily handwashing procedures, participants were asked, with smartwatches normally worn, to conduct activities including walking to the sink, washing hands, and walking out of the restroom, while other activities such as wetting hands with water, applying soap, and drying hands with a towel were not mandatory, depending on their behaviors. We denote the gestures in the WHO guidelines as categories 1 to 9, and all other activities as category 0. Every participant repeated the procedure 5 times, a tolerable number that would not cause any hand discomfort. Along with the sensor data, we also used code from [Wang et al., 2019] to record videos of the participants' hands and the corresponding timestamps. Five labeling workers watched the synchronized video streams to label gestures on the motion sensory data collected at each building, respectively. Further, we segment the motion sensory records with a sliding window size of 64 and a stride of 1, a simple 63-fold data augmentation. In all, the data acquisition process involves 51 participants and 5 locations, resulting in a dataset with 804,991 instances.

We first evaluate UWash on all participants under the user-dependent setting. For each participant, we use the instances corresponding to the first 4 handwashing procedures as the training set and the last one as the test set, leading to training and test sets with 643,971 and 161,020 instances, respectively. In this paper, the training set and the test set have no overlap in any evaluation.

(1) Overall Results. We report overall results including accuracy, precision, recall, and F1 score in Table 3. The accuracy is computed via Equation 3:

Accuracy = \frac{\sum_{p} \sum_{i=1}^{N_p} \mathbb{1}(S_{p,i} = S^*_{p,i})}{\sum_{p} N_p}, \quad (3)

where N_p represents the length of the test time-series of the p-th participant, and S_{p,i} and S^*_{p,i} represent the ground truth and the UWash output on the i-th sample of the p-th participant. With the simple yet efficient post-smoothing methods, i.e., Multiple Test Voting and The Mode Filter (see Section 3.2), UWash eventually achieves an accuracy of 92.27%. Because this is a 10-class classification task (9 handwashing gestures + 1 background), we have 10 precisions, 10 recalls, and 10 F1 scores; we report their means as mPrecision, mRecall, and mF1 in Table 3. Consistent with the accuracy, these three metrics show that UWash achieves good performance that is further enhanced by MTV and TMF.

(2) Performance on Gestures. We show the confusion matrix of UWash+MTV+TMF over the 10 categories in Fig. 6. Though the data of these 10 categories are not quite balanced (29.2% background), UWash works well for all gestures, especially the 2nd, 3rd, 6th, and 7th. Besides, we find two dominant types of false recognition. The first happens between the background and gestures 1 and 9. We think this is because the background contains diverse activities such as wetting hands with water, applying soap, and drying hands with a towel, which may largely increase the difficulty of correctly classifying it against its post-activity (G1) or pre-activity (G9). The other happens between two successive gestures. As the onset/offset of every gesture is annotated manually, discordance across labeling workers, participants, and locations may cause errors between successive gestures.

(3) Performance on Participants. For the p-th participant, we compute the accuracy via Equation 4:

Accuracy_p = \frac{1}{N_p} \sum_{i=1}^{N_p} \mathbb{1}(S_{p,i} = S^*_{p,i}), \quad (4)

where the symbols share the same meanings as in Equation 3. Fig. 5 shows that 33 of 51 participants achieve an accuracy of over 95%, and only 6 of them have an accuracy of less than 85%.

[Figure 5: Results on participants. 33 of 51 participants achieve over 95% accuracy; 6 of 51 fall below 85%.]

(4) Onset/Offset Detection. We compute the onset detection error via Equation 5:

e_s = |t^*_s - t_s|, \quad (5)

where t^*_s and t_s represent the detected and the ground-truth onset of a test handwashing procedure, respectively, and |\cdot| computes the absolute value. Similarly, we use |t^*_e - t_e| to compute the offset detection error. We compute the onset/offset detection errors over the test set and report the mean and standard deviation (SD) in Table 4. The table shows that the means and SDs are all within 1 second, indicating that UWash detects handwashing events correctly and stably.

(5) Scoring. We use the method described in Section 3.3 to compute handwashing scores on the test set, and then compute the mean and SD of the scoring errors against the ground truth. As shown in Table 4, the mean and SD are less than 5 points, indicating that UWash scores users' handwashing procedures well.

In Fig. 7, we visualize how UWash works on an example from the 26th participant. UWash outputs sample-wise gesture classifications, with which we can detect the onset/offset of handwashing, estimate the duration of gestures, and score the gestures and the whole handwashing procedure. Thus UWash can help users check and improve their handwashing practice via the estimated scores in daily life.
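For reference, a short sketch of how these metrics can be computed from per-sample labels, with our own names; deriving the onset/offset as the first/last non-background samples of a procedure is our assumption, as the paper does not state how t_s and t_e are extracted.

```python
import numpy as np

def participant_accuracy(y_true, y_pred):
    """Equation 4: fraction of correctly labeled samples for one participant."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float((y_true == y_pred).mean())

def onset_offset_errors(y_true, y_pred, fs=50):
    """Equation 5 and its offset counterpart: |t*_s - t_s| and |t*_e - t_e|
    in seconds, taking the first/last non-background (label > 0) samples
    as the onset/offset of the handwashing procedure."""
    def bounds(y):
        idx = np.flatnonzero(np.asarray(y) > 0)
        return idx[0] / fs, idx[-1] / fs
    t_s, t_e = bounds(y_true)
    t_s_hat, t_e_hat = bounds(y_pred)
    return abs(t_s_hat - t_s), abs(t_e_hat - t_e)
```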
We evaluate the cross-domain performance of UWash, a critical criterion for a handwashing scoring system intended for people's daily use.

(1) Cross-Participant. We train UWash with the data of 50 out of 51 participants and test the trained model on the data of the remaining participant. We conduct this leave-one-participant-out process over all 51 participants to evaluate the cross-participant performance. As shown in Fig. 8, the accuracy among the left-out participants is more dispersed than in the user-dependent setting shown in Fig. 5. For example, the 13th and the 38th participants have relatively idiosyncratic personal handwashing styles, resulting in accuracies of <60%. This indicates that they may not follow the WHO guidelines; in such cases, UWash could remind users to improve their hand hygiene techniques. As expected, the mean accuracy in the cross-participant setting decreases to 83.34%, since the personalized gestures of these individuals are not included in the training set. Table 5 shows that UWash detects the onset/offset of handwashing well even in the cross-participant setting, with errors of <0.5 seconds, which means UWash can effectively distinguish handwashing gestures from their pre/post-activities. However, the handwashing scoring performance drops significantly, again indicating that the performance of gesture classification on unseen users highly depends on how well they wash their hands following the WHO guidelines.

(2) Cross-Participant-Cross-Location. We use the data of 4 out of 5 locations to train UWash and test the trained model on the remaining location. We conduct this leave-one-location-out process over the 5 locations where data were collected. Since the recruited participants do not overlap between locations, the leave-one-location-out process also yields an evaluation in the cross-participant-cross-location setting. The experimental results are shown in Fig. 9 and Table 5. In this setting, the performance has characteristics similar to those in the cross-participant setting: the accuracy is more dispersed; the 13th and the 38th participants have the lowest accuracy; the mean accuracy drops to 81.45%; and onset/offset detection remains accurate.

Since the participants were trained to wash their hands following the WHO guidelines, UWash can only recognize and score gestures recommended in the WHO guidelines. In real life, handwashing gestures vary from person to person, so the scores estimated by the current version of UWash may not always accord with the real quality of handwashing procedures. However, this is precisely our purpose in proposing UWash: to promote people's adherence to the WHO guidelines.

We introduced UWash, a smartwatch-based handwashing assessment system, to raise people's awareness of handwashing in daily use and adherence to the WHO handwashing guidelines. UWash can detect the onset/offset of handwashing, estimate the duration of every handwashing gesture, and score gestures as well as the entire procedure following the WHO guidelines. Experiments show that UWash is promising.

References
[Cao et al., 2021] AWash: Handwashing assistance for the elderly with dementia via wearables.
[Edmond et al., 2010] Successful use of alcohol sensor technology to monitor and report hand hygiene compliance.
[Ioffe and Szegedy, 2015] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift.
[Khamis et al., 2020] RFWash: A weakly supervised tracking of hand hygiene technique.
[Kinsella et al., 2007] Electronic surveillance of wall-mounted soap and alcohol gel dispensers in an intensive care unit.
[Mieth et al., 2021] Do they really wash their hands? Prevalence estimates for personal hygiene behaviour during the COVID-19 pandemic based on indirect questions.
[Wang et al., 2020] Accurate measurement of handwash quality using sensor armbands: Instrument validation study.