key: cord-0221863-ll6hj6q9 authors: Chen, Xiang 'Anthony' title: FaceOff: Detecting Face Touching with a Wrist-Worn Accelerometer date: 2020-08-04 journal: nan DOI: nan sha: d6155ef15ba9f418742071d9c8aeee6d1958ac5a doc_id: 221863 cord_uid: ll6hj6q9

According to the CDC, one key step in preventing oneself from contracting coronavirus (COVID-19) is to avoid touching the eyes, nose, and mouth with unwashed hands. However, touching one's face is a frequent and spontaneous behavior---one study observed subjects touching their faces on average 23 times per hour. Creative solutions have emerged amongst some recent commercial and hobbyists' projects, yet most are either closed-source or lack performance validation. We develop FaceOff---a sensing technique that uses a commodity wrist-worn accelerometer to detect face-touching behavior based on the specific motion pattern of raising one's hand towards the face. We report a survey (N=20) that elicits the different ways people touch their faces, an algorithm that temporally ensembles data-driven models to recognize when a face-touching behavior occurs, and results from preliminary user testing (N=3, for a total of about 90 minutes).

Figure 1: We report evidence that demonstrates the potentials and limitations of using a commodity wrist-worn accelerometer to detect face-touching behavior based on the specific motion pattern of raising one's hand towards the face, detecting 82 out of 89 face touches with a false positive rate of 0.59% in a preliminary study.

According to the Centers for Disease Control and Prevention (CDC), one key step in preventing oneself from contracting coronavirus is to avoid touching the eyes, nose, and mouth with unwashed hands. Pathogens picked up by our hands can enter the throat and lungs through mucous membranes on the face. However, touching one's face is a frequent and spontaneous behavior-Kwok et al. observed subjects touching their faces on average 23 times per hour [11], where 44% of the contacts were made with a mucous membrane. To reduce face touching, creative solutions have emerged amongst commercial and hobbyists' projects since the outbreak of COVID-19. However, most are closed-source and/or lack validation of performance.

We develop FaceOff-a sensing technique using a commodity wrist-worn accelerometer to detect face-touching behavior based on the specific motion pattern of raising one's hand towards the face (Figure 1). We consider the touching of both mucosal and non-mucosal facial areas. Although it is touching the mucosal areas that might cause infection, detecting the touching of both types of areas can more strongly raise people's awareness of avoiding touching their face at all. In a survey, we asked 20 participants (demographic information reported in Appendix A) to describe where they would naturally touch their face; the results show a wide range of facial areas (Figure 2).

Figure 2: Our preliminary survey (N=20) shows that people touch a wide range of facial parts. Our goal is to detect the touching of both mucosal and non-mucosal areas to strongly promote the awareness of avoiding touching one's face at all.

To detect face touching, we chose the accelerometer-a common and low-cost sensor available in most wrist-worn devices (e.g., smart watches, fitness trackers). The use of accelerations has shown promise in detecting body-tapping behavior without the need to instrument the body [1], although detecting face touching has not been explored in prior work.
We hypothesize that an accelerometer can detect face touching by recognizing the hand's motion pattern: the unique course of accelerations and orientations as the hand moves towards the face. However, one limitation is that an accelerometer cannot detect the actual contact with the face. For example, adjusting one's eyeglasses would appear highly similar to touching the eyes.

Based on the reported face-touching behaviors, we develop a data collection protocol. Due to the COVID-19 pandemic, we had limited access to participants for data collection; thus we gathered training data only from the first author. We then develop an algorithm that temporally ensembles data-driven Random Forest classifiers into a binary detection of whether a person touches their face. We analyze and validate this approach in a preliminary study with three other participants, each of whom wore the device for 30 minutes with intermittent prompts to touch their face (each participant thoroughly cleaned their hands and the physical objects they touched prior to the study) while conducting their own daily activities. As a result, 82 out of 89 (92%) face-touching actions were detected with a false positive rate of 0.59%.

Contributions of this work are as follows.
• To the best of our knowledge, the first reported evidence of the potentials and limitations of detecting face touching using a commodity wrist-worn accelerometer;
• A systematic protocol of data collection, training and testing, which can be adopted by future research to explore other sensing solutions for detecting face touching to combat the COVID-19 pandemic.

There is a large body of work on wearable health sensing, for which we refer the readers to Pantelopoulos et al.'s survey [14]. Our review focuses on the three prior research topics most related to our work.

Body-centric interaction leverages (intentional) tapping on different body parts, which can be used to trigger specific digital information or functions [2, 6]. Similar examples include tapping on the body during running or cycling to control one's smart devices [5, 17], or moving a smartphone to the mouth to activate speech input [19]. Chen et al. demonstrate using an inertial measurement unit (IMU) and a phone's front camera to detect spatial interaction around a user's body [3], and later a single IMU alone to detect tapping the phone on a number of on-body locations [1]. However, detecting face touching has not been investigated.

Detecting eating/drinking, which also involves a hand's motion relative to one's face, has been explored in prior work using a wrist-worn inertial sensor (e.g., [4, 15, 16]). However, detecting face touching presents new challenges: (i) unlike eating, the person might no longer be constrained to a seated position; and (ii) the motion pattern of one's hand is no longer limited to delivering food to the mouth but can vary across touching a range of facial parts.

Sensing personal hygienic behaviors is related to the purpose of detecting face touching and in the literature is dominated by two specific activities: hand washing and tooth brushing. For hand washing, various locations of wearable sensors have been explored: Li et al. use an IMU on a wrist-worn device to detect whether a user's hand washing follows WHO's 13-step recommendation [13]. Zhang et al. developed a ring device with fluid sensors for real-time monitoring of hand-hygiene compliance [20]. Kutafina et al.
enabled hand hygiene training in medical education using a forearm electromyography (EMG) sensor [10]. For tooth brushing, Hong et al. used an accelerometer + RFID sensor combination mounted on the back of a user's hand to recognize activities including tooth brushing [7]. Huang and Lin used a magnet attached to a manual toothbrush and an off-the-shelf smartwatch to detect fine-grained brushing behaviors on individual teeth [8]. Wijayasingha and Lo proposed a wearable sensing framework using an IMU for monitoring both hand washing and tooth brushing [18]. However, no prior work has addressed sensing the hygienic behavior of face touching.

We collected accelerometer data from the first author (male, 32, right-handed) wearing an Apple Watch 2 (100 Hz sampling rate) on the left wrist. The independent variable is Behavior (Touch vs. No Touch). For Touch, we consider where and how. For where to touch, we include various parts of the face based on the CDC's guidelines (eyes, nose, and mouth) as well as other frequently touched areas indicated in the aforementioned survey: hair, forehead, temple, ear, cheek, and chin. Since the face is symmetric, we also consider the left vs. right side for certain facial parts (e.g., left vs. right ear). In terms of how to touch, we cover both transient touch (e.g., a quick scratch on the nose) and lingering touch (e.g., holding and rubbing the chin). We also consider hand placement (before face touching), following and extending the experiment design in [1]. Specifically, we consider high, middle and low hand placement: for high placement the user sits and places the forearms on a desk in front of them; for middle placement the user sits and places the hands on the laps; for low placement the user stands and lowers their arms naturally around the waist. Different initial hand placements result in different trajectories the hand travels towards the face. For No Touch, we adopt the set of general daily activities used in [9, 12]: flipping magazines, jumping jacks, typing on a keyboard, speaking while gesturing, and washing hands.

The data collection was split into three sessions corresponding to the three hand placement conditions for Touch. In between sessions the user took off and then put the watch back on, which accounted for the possible variance in how the watch was worn (e.g., position, orientation, tightness). At the beginning of a session, the user thoroughly cleaned the watch and washed and dried their hands. We then started with collecting data for Touch, where the user was prompted with instructions on the watch to touch a facial part either transiently or lingeringly. The user would tap anywhere on the watch screen, which started the data collection of a face-touching trial. The user then proceeded to touch the designated facial part. The data collection ended automatically after our empirically pre-defined 1.5s window (which covers the time taken to raise one's hand and engage it in the touching of a facial part). Each facial part was touched eight times. For symmetric parts (e.g., eyes, ears) these trials were evenly split between the left/right sides. The order of facial parts was randomized to avoid temporal dependence of behavior. Further, across all the trials, how to touch was evenly and randomly split between transient vs. lingering. For example, the user would be prompted to "touch left cheek lingeringly".

Next, we collected data for No Touch, where the user simply performed each aforementioned activity for 30s while the system randomly collected 10 samples. In total, we collected

1 user × 3 sessions × (8 trials × 9 facial parts + 10 trials × 5 general activities) = 366 data points.

To visualize the data, we first downsample each trial into 15 bins, each containing a 0.1s interval of accelerometer data. Figure 1 shows an aggregated view of the collected data: each column in each chart corresponds to one bin that contains a distribution of accelerometer readings within the bin's time interval. We can see how face touching and general activities exhibit distinct motion patterns across all the axes (note that the axis alignment is configured for wearing the watch on the left wrist). For example, despite touching different facial areas, the X axis almost always points up, as shown in the latter half of the time window. For the Y axis, the peak at the beginning of the time window is likely caused by the motion of rotating one's forearm around the elbow and towards the face, where Y would first point up and then 'flatten' when the hand is raised to the face level.
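A minimal sketch of this binning step follows. This is our own construction, assuming each 1.5s trial is stored as a 150 × 3 array of 100 Hz accelerometer samples; the function name is ours.

```python
import numpy as np

def bin_trial(trial, n_bins=15):
    """Split one 1.5s trial (150 samples x 3 axes at 100 Hz) into
    n_bins bins of 0.1s each; returns shape (n_bins, 10, 3)."""
    samples_per_bin = trial.shape[0] // n_bins  # 10 samples per 0.1s bin
    return trial[: n_bins * samples_per_bin].reshape(n_bins, samples_per_bin, -1)

# Each bin then yields the per-column distribution of readings
# visualized in the aggregated charts.
trial = np.random.randn(150, 3)  # placeholder data for illustration
bins = bin_trial(trial)          # shape (15, 10, 3)
```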
Rather than individual bins, our featurization focuses on characterizing the entire 1.5s window of data as a distribution. Specifically, our features consist of statistical summaries (sum, mean, median, standard deviation, coefficient of variation, zero crossings, mean/median absolute deviation) and shape-related measurements (skewness and kurtosis). In total, we use these 10 features per axis × 3 axes = 30 features.
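A minimal sketch of this 30-dimensional featurization is shown below. This is our own construction: the exact definition of zero crossings and the handling of a zero mean in the coefficient of variation are assumptions.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def featurize(window):
    """window: (n_samples, 3) accelerometer data; returns a 30-dim vector
    (10 statistical/shape features per axis x 3 axes)."""
    feats = []
    for axis in range(window.shape[1]):
        x = window[:, axis]
        mean, median, std = x.mean(), np.median(x), x.std()
        feats += [
            x.sum(), mean, median, std,
            std / mean if mean != 0 else 0.0,        # coefficient of variation
            int(np.sum(np.diff(np.sign(x)) != 0)),   # zero crossings
            np.mean(np.abs(x - mean)),               # mean absolute deviation
            np.median(np.abs(x - median)),           # median absolute deviation
            skew(x), kurtosis(x),                    # shape of the distribution
        ]
    return np.array(feats)
```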
Since the goal of detecting face touching is prevention, it is beneficial to detect a face-touching action before its completion. In other words, the question is whether we can use a smaller time window than 1.5s. To investigate this possibility, we train a series of Random Forest models based on the data collected from the single user. Each model uses only a partial window of a trial's data for inference, e.g., at t = 1.0s, a model f_{1.0s} only uses data up to that point, i.e., between 0 and 1.0s. We compute the F1 scores from a 10-fold cross validation.

Figure 3: F1 scores of using a partial time window of data (e.g., for t = 1.0s, a model only uses data from 0 to 1.0s) to detect face touching.

As shown in Figure 3, as we 'wait for' more data, the F1 score expectedly increases, and beyond 1.0s the F1 score seems to flatten. Thus we select 1.0s as the starting point from which a model would detect face touching. Specifically, we train different models to process incoming data at T = {1.0s, 1.1s, 1.2s, 1.3s, 1.4s, 1.5s}. At t ∈ T, the current data is x_t and the corresponding model is f_t: x → {−1, 1} (1 for face touching and −1 for no face touching). A final result is obtained via voting, where each model's vote is its result weighted by its F1 score: sgn(Σ_{t∈T} w_t f_t(x_t)), where sgn is the sign function, w_t = exp(λ(F1_t − min_T F1)), and λ is a constant that scales the difference amongst the F1 scores (we use λ = 10).
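A minimal sketch of this weighted vote follows; it is our own construction, assuming models[t] is the trained classifier f_t returning −1 or 1, f1[t] is that model's cross-validation F1 score, and windows[t] is the featurized data observed up to time t.

```python
import numpy as np

T = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]  # seconds of data each model consumes
LAMBDA = 10.0                        # lambda: scales differences in F1 scores

def ensemble_vote(models, f1, windows):
    """windows[t]: featurized data up to time t, shape (1, 30).
    Returns 1 (face touching) or -1 (no face touching)."""
    f1_min = min(f1[t] for t in T)
    total = 0.0
    for t in T:
        w = np.exp(LAMBDA * (f1[t] - f1_min))     # w_t = exp(lambda * (F1_t - min F1))
        total += w * models[t].predict(windows[t])[0]
    return 1 if total >= 0 else -1                # sign of the weighted sum
```

Note that, as reported in the results below, a majority can sometimes already be determined before all models in T have voted, allowing an earlier decision.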
We tested this approach of temporally ensembling data-driven models in a preliminary setting, as described below.

We recruited another three participants (male, 18; male, 22; female, 28; all right-handed). Two participants reported touching their face at least once an hour, whereas the other was unaware of the frequency. Each participant identified the five most common ways they touched their face: all three reported ear and nose; eye, cheek, chin and forehead were each reported twice; only one participant reported hair.

Table 1: Preliminary testing results of the temporal ensemble approach.
False positive rate: P1 0.66%, P2 0.57%, P3 0.55%; overall 0.59%

Table 2: Preliminary testing results of using only the entire 1.5s of data at the end of the time window.
False positive rate: P1 0.62%, P2 0.56%, P3 0.50%; overall 0.56%

Participants were asked to wear the watch on the left wrist (i.e., the non-dominant hand) for a period of about 30 minutes while performing daily routine activities of their choice. P1 sat on a reclined chair and browsed her phone, P2 stood and walked around the house while listening to lectures on the phone via headphones, and P3 sat down and worked on programming tasks on a laptop computer. Admittedly, this protocol only captured a brief snapshot of how well the system might work in the context of a specific user-chosen daily activity; we consider a fully in-the-wild study with more participants as future work (after the pandemic).

Before the period started, each participant thoroughly washed/dried their hands and cleaned the watch and the objects they expected to touch in the next 30 minutes. During the period, each participant was randomly prompted 30 times (on average once per minute) to use their left hand to touch their face on the five facial parts they identified earlier. Each prompt consisted of two steps. First, the watch interrupted the participant's activity with a vibration that prompted them to raise the watch and read the instruction to touch a specific facial part. If a participant were to touch their face immediately, the motion would be unnatural, as it would artificially start from a wrist-raising posture. Instead, we asked the participant to press the 'confirm' button and resume their activity; then, in four seconds, the watch vibrated again, following which the participant would touch their face at the specified part.

We used the same 1.5s as the sliding window length at a rate of four FPS (i.e., an 83.33% overlap between subsequent frames). Face touching was labeled on the three seconds of data after the second vibration that prompted the participant to actually touch their face. The rest of the data was labeled as no face touching. We analyzed the logged data offline using the models developed earlier from a different user than the three participants. The last three minutes of P1's data were lost due to technical issues, which cost one face-touching trial and some no-touching data.

As shown in Table 1, our ensemble approach detected 82 out of 89 face-touching actions. We found that sometimes we did not have to wait until the end of the window to finalize the vote result-a majority vote could be formed at t < 1.5s. Specifically, amongst the 82 face touches detected, 18 were detected at t = 1.3s, 28 at t = 1.4s, and the rest at t = 1.5s. In comparison, if we were to use only the entire 1.5s of data at the end of the time window, the recall rate would be slightly lower (80/89), as shown in Table 2. We also computed the false positive rate, i.e., the percentage of no-face-touching instances labeled as face touching. As shown in Tables 1 and 2, the two approaches achieved similar false positive rates (with only a 0.03% difference). Overall, the results show that it is feasible to detect face touching using a wrist-worn accelerometer: our ensemble approach achieved a slightly higher recall rate without compromising the prevention of false positives. Meanwhile, for over half of the detected touches (18+28 out of 82), an earlier majority vote was able to preemptively determine a face-touching action before the entire window of accelerometer data had been collected.
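Putting these pieces together, the run-time detection loop described above (1.5s windows sliding at four FPS, each scored by the weighted vote) can be sketched as follows. This is our own construction, reusing the featurize and ensemble_vote sketches from earlier and assuming a 100 Hz stream stored as an (n_samples, 3) array.

```python
WINDOW, HOP = 150, 25  # samples at 100 Hz: a 1.5s window sliding every 0.25s (4 FPS)

def detect(stream, models, f1):
    """stream: (n_samples, 3) accelerometer data; yields the times (in s)
    at which a face-touching action is flagged."""
    for start in range(0, len(stream) - WINDOW + 1, HOP):
        window = stream[start : start + WINDOW]
        # Each model f_t only sees the first t seconds of the current window.
        windows = {t: featurize(window[: int(t * 100)])[None, :] for t in T}
        if ensemble_vote(models, f1, windows) == 1:
            yield start / 100.0
```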
The inability to detect actual hand-face contact is one limitation. Based on our observations, a number of false positives were caused by behaviors that resemble touching one's face, e.g., scratching the back of one's neck, raising eyeglasses, picking up a phone call, or drinking water. However, it is a reasonable design choice to alert a user in such false-positive cases, as their hand is still brought into close proximity to the face even without touching. It is appropriately aggressive to promote an awareness of keeping one's hands off the face as much as possible, regardless of whether actual contact is made.

The current set of no-touch activities needs to be expanded. As future work explores other types of sensors, e.g., proximity detection between the hand and the face, it is also possible to use cameras instrumented in the environment, whose rich visual information can distinguish face touching from certain near-miss activities (e.g., eating, drinking). Using cameras, it is also possible to annotate naturally occurring face-touching behaviors, which would address the currently controlled experiment task setting. However, privacy issues and line-of-sight constraints are two long-standing challenges for camera-based solutions.

One-handed detection only is another apparent limitation. While it is uncommon to ask users to wear two watches, future work can explore an alternate form factor, e.g., a wrist-band accelerometer (as in Fitbit-like devices), which would be more socially and economically appropriate to wear on both hands.

Feedback mechanisms to effect behavior change would be an important next step now that we have demonstrated a proof-of-concept mechanism for detecting face touching. Future work should investigate both visual and vibrotactile feedback that alerts a person at the onset of face touching for prevention, and after they touch the face as feedback that encourages behavior change. A longitudinal study should be conducted to track people's behavior change given the detection of, and alerts about, face touching. Larger-scale data collection and user testing should be conducted (when possible) to account for the possibly different ways people touch their faces, to improve the data-driven models, and to obtain statistically significant performance evaluation results.

[1] Bootstrapping User-Defined Body Tapping Recognition with Offline-Learned Probabilistic Representation
[2] Extending a Mobile Device's Interaction Space through Body-Centric Interaction
[3] Around-Body Interaction: Sensing and Interaction Techniques for Proprioception-Enhanced Input with Mobile Devices
[4] Detecting Periods of Eating During Free-Living by Tracking Wrist Motion
[5] Run&Tap: Investigation of On-Body Tapping for Runners
[6] Skinput: Appropriating the Body as an Input Surface
[7] Activity Recognition Using Wearable Sensors for Elder Care
[8] Toothbrushing Monitoring Using Wrist Watch
[9] Minuet: Multimodal Interaction with an Internet of Things
[10] Wearable Sensors in Medical Education: Supporting Hand Hygiene Training with a Forearm EMG
[11] Face Touching: A Frequent Habit That Has Implications for Hand Hygiene
[12] ViBand: High-Fidelity Bioacoustic Sensing Using Commodity Smartwatch Accelerometers
[13] Proceedings of the 2018 ACM International Symposium on Wearable Computers (ISWC '18)
[14] A Survey on Wearable Sensor-Based Systems for Health Monitoring and Prognosis
[15] Exploring Symmetric and Asymmetric Bimanual Eating Detection with Inertial Sensors on the Wrist
[16] A Practical Approach for Recognizing Eating Moments with Wrist-Mounted Inertial Sensing
[17] MoveSpace: On-Body Athletic Interaction for Running and Cycling
[18] A Wearable Sensing Framework for Improving Personal and Oral Hygiene for People with Developmental Disabilities
[19] ProxiTalk: Activate Speech Input by Bringing Smartphone to the Mouth
[20] Smart Ring: A Wearable Device for Hand Hygiene Compliance Monitoring at the Point-of-Need

Appendix A. We disseminated a face-touching survey via a business communication platform amongst members of our research group and received 20 responses. The participants were between 19 and 28 years old; five identified as female, 14 as male, and one as non-binary. Eleven participants reported touching their face at least once per hour, six at least once per minute, one at least once per day, and two were unaware of how often they touched their face. Participants were also asked to describe five or more different ways that they would normally touch their face in their everyday lives, the results of which we show in Figure 2. As in any other survey, asking participants to estimate the frequency of an activity is subject to the inaccuracy of memory. However, our goal is not to compute an exact frequency but to elicit a list of commonly occurring face-touching behaviors.

Appendix B. We implemented the Random Forest models using the scikit-learn Python library. To further improve the models, we used the training data set to perform hyperparameter tuning: we first performed a randomized search to narrow down ranges of parameters, within which we then employed a grid search to pinpoint specific optimal parameters. We repeated this process for each of the models corresponding to T = {1.0s, 1.1s, 1.2s, 1.3s, 1.4s, 1.5s}, obtaining one set of parameters per model, as shown below. We use scikit-learn's default values for the rest of the parameters.
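A minimal sketch of this two-stage tuning follows. It is our own construction: the parameter names and ranges searched, and the placeholder training data X_train/y_train, are illustrative assumptions, not the values used in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV

# Placeholder featurized training data (366 windows x 30 features);
# in the paper these come from the single-user data collection.
X_train = np.random.randn(366, 30)
y_train = np.random.choice([-1, 1], size=366)

# Stage 1: randomized search over broad (illustrative) parameter ranges.
random_search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions={
        "n_estimators": range(50, 500, 10),
        "max_depth": [None] + list(range(5, 50, 5)),
        "min_samples_leaf": range(1, 10),
    },
    n_iter=50, scoring="f1", cv=10,
)
random_search.fit(X_train, y_train)

# Stage 2: grid search over a narrow grid around the best randomized result.
best = random_search.best_params_
grid_search = GridSearchCV(
    RandomForestClassifier(),
    param_grid={
        "n_estimators": [best["n_estimators"] + d for d in (-20, 0, 20)],
        "max_depth": [best["max_depth"]],
        "min_samples_leaf": sorted({max(1, best["min_samples_leaf"] + d) for d in (-1, 0, 1)}),
    },
    scoring="f1", cv=10,
)
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)  # one tuned parameter set per model in T
```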