key: cord-0131558-14v20oer authors: Zhan, Yuting; Haddadi, Hamed title: MoSen: Activity Modelling in Multiple-Occupancy Smart Homes date: 2021-01-01 journal: nan DOI: nan sha: 6d388124a0872fc6f406813a6dde879afdefe209 doc_id: 131558 cord_uid: 14v20oer Smart home solutions increasingly rely on a variety of sensors for behavioral analytics and activity recognition to provide context-aware applications and personalized care. Optimizing the sensor network is one of the most important approaches to ensure classification accuracy and the system's efficiency. However, the trade-off between cost and performance is often a challenge in real deployments, particularly for multiple-occupancy smart homes or care homes. In this paper, using real indoor activity and mobility traces, floor plans, and synthetic multi-occupancy behavior models, we evaluate several multi-occupancy household scenarios with 2-5 residents. We explore and quantify the trade-offs between the cost of sensor deployments and the expected labeling accuracy in different scenarios. Our evaluation across different scenarios shows that the performance of the desired context-aware task is affected by different localization resolutions, the number of residents, the number of sensors, and varying sensor deployments. To aid in accelerating the adoption of practical sensor-based activity recognition technology, we design MoSen, a framework to simulate the interaction dynamics between sensor-based environments and multiple residents. By evaluating the factors that affect the performance of the desired sensor network, we provide a sensor selection strategy and design metrics for sensor layout in real environments. Using our selection strategy in a 5-person scenario case study, we demonstrate that MoSen can significantly improve overall system performance without increasing the deployment costs.
Human activity recognition (HAR) is the central task in many intelligent systems such as smart homes [12], long-term healthcare [72], personal robotics [67], assisted living [64], and human-computer interaction [32]. Current works illustrate that human activity can be recognized using two main approaches, namely video-based [35] and sensor-based [22]. Video-based activity recognition utilizes cameras to capture or record individuals' motions [35], while sensor-based systems leverage wearable or ambient sensors to understand the movements of the subjects and the interactions between people and the environment [74]. While video-based approaches are often privacy-invasive, sensor-based systems, which we focus on in this paper, are often more privacy-friendly and take advantage of their pervasiveness [78]. Increasingly, sensors are embedded into our ambient environment, wearable electronic products, and intelligent appliances to aid sensor-based activity recognition systems. The multi-modal sensor data enables the system to receive rich context information, process personalized behavioral analytics, and provide context-aware applications [12]. While a multitude of sensors extends the variety of information that can be received, the heterogeneity of the devices [71] and the increasing number of residents [11, 28] complicate the data collection system in real settings. Even in single-occupancy scenarios, where only a single individual occupies the space, the diversity of sensor settings or floorplans can affect the overall performance of sensor networks. Importantly, sensor networks designed for single-occupancy houses are never deployed in identical settings, and the sensor selection in each system is diverse, varying from commercial products to self-built devices [6, 18, 23, 77].
The price, stability, precision, and coverage range of different sensors affect the implementation and performance of sensor-based systems [18]. It is difficult to find a uniform sensor integration system flexible enough for distinct homes, especially when the homes might have more than one resident, which we refer to as multi-occupancy scenarios in this paper. Prior research has already highlighted the significance of multi-occupancy scenarios, but the complexity of current sensor networks and unknown uncertainties impede real-world implementation of the sensor network and further analysis [2, 11, 13, 53, 78]. Hence, when designing a sensor network for a target home, especially in multi-occupancy scenarios, designers need to take the real floorplan, the number of residents, the sensor density, and device resolution into consideration. The data association problem (i.e., identification annotation) is one of the central problems for sensor-based activity recognition in multi-occupancy smart homes [11, 21, 43, 65]. It refers to labeling time-series sensor events with identities by mapping each sensor event to the resident who generated it. A suitable data association model is a prerequisite for extending single-occupancy techniques to multi-occupancy households. Current solutions for identification annotation mainly rely on self-reporting [74] and camera recording [2], where the former is biased and the latter invades privacy. The capability to label identities automatically is hence significant for a practical smart home system. Researchers tend to use wearable sensors to reduce the complexity of the problem, because wearable sensors can be utilized as identification tags for different residents [53]. Another promising way is to leverage a real-time locating system (RTLS) to locate different residents when they are interacting with the environment.
RTLS is a rising technology for detecting both the location and identification of a target, where the target could be an item, a person, or a vehicle [14, 84]. By integrating the residents' locations and the sensor layout, we annotate sensor events with identities using the RTLS-based approach. In this paper, taking the automatic annotation problem as the desired context-aware task to solve, we explore the interaction dynamics between the sensor-based environment and multiple residents by proposing the MoSen emulation environment. MoSen is designed to evaluate how different localization resolutions, numbers of residents, numbers of sensors, and varying sensor deployments affect the performance of a pre-designed sensor network before real deployment. By investigating the dynamics of annotation accuracy, given any floorplan, MoSen is able to provide a sensor selection strategy and metric-based design suggestions for the pre-designed sensor layout. MoSen can help practitioners with cost-effective and accurate sensor integration for any given home deployment. Using real behavior models and synthetic data, we emulate multi-occupancy scenarios in households with 2 to 5 residents. Given the scarcity of multi-occupancy datasets and the difficulty of realistic data collection with existing technologies, especially during the COVID-19 pandemic with social distancing, we provide a novel framework to generate synthetic multi-occupancy behavior models by modeling real single-occupancy datasets collected in real homes. The quality of the multi-occupancy behavior model is validated by comparing the performance of synthetic and real double-occupancy datasets. The main objective of this paper is to offer an effective evaluation structure and a feasible sensor selection strategy for any smart home.
By comparing real and synthetic datasets, we discuss potential challenges when sensor-based activity recognition is adopted in different multi-occupancy scenarios. The main contributions of this paper are as follows: • We propose MoSen to investigate the interaction dynamics between sensor-based environments and multiple residents; • We provide an algorithm to generate synthetic multi-occupancy behavior models and compare their performance with a real dataset; • We explore how labeling accuracy is affected by different localization resolutions, the number of residents, sensor density, and varying deployments in multi-occupancy scenarios; • We design a sensor selection strategy to balance the trade-off between deployment costs and expected labeling accuracy in different homes, which accelerates the practical adoption of sensor-based activity recognition. The rest of this paper is organized as follows. In Section 2 we present the related work, Section 3 presents an overview of the MoSen system, and Section 4 describes the design methodologies applied in the proposed system. In Section 5, we evaluate the effects of localization device resolution, the number of residents, and sensor density, respectively. With the analytical results, we provide a case study in Section 6, then present discussions in Section 7, and conclude in Section 8. Human indoor activities are complex, diverse, and stochastic, making them challenging to define and quantify. A variety of advanced ubiquitous sensing technologies (e.g., wireless sensing [38], wearable sensors [12], or ambient sensors [22]) have been adopted to collect human indoor activity data [1, 3, 6, 23]. Human activity recognition is a central task for accelerating automation integration in smart environments [67]. Prior works have illustrated that modeling human activity patterns is valuable for providing personalized services [73] or context-aware interactions with the resident [17, 44].
Currently, human activity can be recognized using two main approaches, namely video-based [35] and sensor-based [22]. In this paper, we focus on sensor-based activity recognition, which utilizes sensor readings to understand human activities. These data emanate from multifarious sensors embedded in the living environment and are then processed by a series of machine-learning or deep-learning algorithms [78]. In this paper, we leverage human activity datasets from real homes deployed with different ambient sensors (i.e., motion, temperature, and light sensors) [6, 23]. 2.1.2 Multi-person Activity Datasets. The majority of research in human activity recognition has investigated the single-occupancy scenario [65, 77], where only one resident lives in a single space. However, a real environment is usually inhabited by more than one resident, and even by pets, which is referred to as the multi-occupancy scenario [6] in this paper. Multi-person activity recognition has received less investigation, as many practical challenges are yet to be overcome even in the single-occupancy scenario [11]. Recent pilot deployments demonstrate the applicability and adaptability of multi-occupancy scenarios using different machine learning algorithms [2, 11, 53]. There are two public and widely used multi-person datasets in the current literature, the CASAS datasets [23] and the ARAS datasets [6]. We compare our synthetic multi-person behavior models with these two real datasets to validate the quality of the synthetic model. A real-time locating system (RTLS) is a rising technology for detecting both the location and identification of a target, where the target could be an item, a person, or a vehicle [14, 84]. Different positioning technologies have been investigated over the last several decades; these technologies perform a similar task with varying accuracy.
We summarize 12 main indoor positioning technologies in Table 1 and Table 2, comparing their positioning accuracy, coverage range, cost, infrastructure complexity, network, localization method, and commonly used measurement conventions. Applications of RTLS, also called location-based services (LBSs), have already been broadly adopted in a variety of indoor location-aware scenarios [46, 76], from mapping and navigation services [10, 55] to human-robot interaction [14]. In the transition from single-occupancy scenarios to multi-occupancy environments, it becomes significantly important to track each resident [24]. In Table 1 and Table 2, we also summarize how the 12 main indoor positioning technologies perform in multi-occupancy environments, listing their basic experimental settings and locating accuracy. By tracking each resident efficiently and accurately, sensor events can be separated into different streams, as each resident then has an independent data-driven profile that serves further personalized interaction provided by the smart environment. Recent works have shown the capability of tracking residents' trajectories together with their identities using non-camera-based systems [5, 15, 60, 63, 68, 81]. These technologies can be categorized into device-based and device-free systems. Device-based systems use smartphones, smartwatches, or other wireless tags carried on the body; these extra devices are leveraged to identify different individuals [1, 13, 40, 42, 81]. Device-free systems depend on wireless signals, analyzing signal patterns from breathing or heartbeat to perform identification [2, 3, 57]. 2.2.1 Location-based Activity Recognition. The real-world experiment conducted by Nguyen et al. [61] emphasized the applicability of modeling complex activities from human indoor trajectories.
Their work also demonstrates the feasibility of recognizing activities from new trajectories [61] by applying the hierarchical hidden Markov model (HHMM). Wilson and Atkeson [79] have demonstrated that localization accuracy and activity recognition can be beneficial to each other, especially in multi-occupancy environments. Lu and Fu [48] provide more fine-grained outcomes in a single-occupancy scenario to illustrate the possibility of location-aware activity recognition. These works on location-based activity recognition validate the feasibility of leveraging residents' locations to annotate time-series sensor events. Recent advances in machine learning and deep learning accelerate sensor-based activity recognition, but most approaches require annotated datasets [78]. The quality of labels has a significant impact on the performance of machine learning models. However, collecting sensor data with ground-truth labels (i.e., identification, activity) is still challenging, especially in longitudinal monitoring scenarios. Currently, ground-truth labels are obtained either from a resident's diary [34] or video-based recording techniques [2]. In order to simplify the annotation process, some researchers have also designed a simple graphical user interface (GUI) to help residents complete the diary report [6]. However, this can be tedious, time-consuming, and inaccurate. Unlike the diary-based technique, video-based recording is unobtrusive and precise, but raises major privacy concerns and needs extra manual input. The capability of automatic annotation, hence, is the central primitive when building each resident's individual activity profile. Hamm et al. [30] have presented a flexible framework for combining heterogeneous sensory modalities with classifiers for automatic sequence labeling. In this paper, we are interested in identification labeling for time-series sensor events and leave activity labeling to future research.
Previous works (as listed in Section 2.2.1) have already demonstrated the feasibility of obtaining residents' identified trajectories. Hence, in our proposed MoSen system, we leverage these identified trajectories to annotate sensor events automatically in real time, by integrating residents' respective locations and the sensor layout. For instance, in a 4-person scenario, where there are 4 residents in the home, the location information (i.e., four respective location points) and the sensor layout are known; the proximity between each location point and the triggered sensor is compared, and the nearest location point to that sensor is selected. Such a solution mainly depends on the accuracy of the localization techniques (detailed in Table 1 and Table 2) and the ambient-sensor density. We discuss the effects of different localization resolutions in different sensor layouts in Section 5. In order to protect users' privacy and increase data sharing, synthetic data generation has been developed as an alternative tool among data scientists [27, 36]. The generated data preserves the same required statistical features as the real data in a non-adversarial setting, and is hardly distinguishable from the real data when the generation structure is mandated by an adversarial network [9]. Effectively generating synthetic data can augment labeled data and compensate for data scarcity when the availability of labeled data is constrained [50]. Especially for sensor-based activity data, where data collection is hard even in the single-occupancy scenario, multi-occupancy scenarios pose further challenges. This gap demonstrates the importance of synthetic sensor-data generation in our multi-person setting. Recent works on Generative Adversarial Networks (GANs) have demonstrated their capability in generating different types of data, from image generation [16, 29] and text generation [82, 87] to music composition [41] and time-series sensory data generation [7].
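To make the nearest-location annotation step above concrete, the following minimal sketch (our own illustration, not the paper's actual implementation; the coordinates and helper name are hypothetical) assigns a triggered sensor event to the resident whose detected location is closest to the sensor:

```python
import math

def annotate_event(sensor_xy, resident_locations):
    """Assign a triggered sensor event to the nearest resident.

    sensor_xy: (x, y) position of the triggered sensor.
    resident_locations: dict mapping resident ID -> detected (x, y) location.
    Returns the ID of the closest resident.
    """
    def dist(p, q):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    return min(resident_locations, key=lambda r: dist(sensor_xy, resident_locations[r]))

# 4-person scenario: sensor at (2.0, 3.0) fires; P2 is the closest resident
locations = {"P1": (8.0, 1.0), "P2": (2.5, 3.2), "P3": (6.0, 7.0), "P4": (0.0, 9.0)}
print(annotate_event((2.0, 3.0), locations))  # -> P2
```

In practice the detected locations already contain localization noise, so the chance of selecting the wrong resident grows with the device's resolution, which is exactly the effect evaluated in Section 5.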
The research published in [25] employed hidden Markov models (HMMs) to generate realistic synthetic smart home sensor data. The authors used data similarity measures to validate the realism of the generated data, which are not random but preserve the underlying patterns or structures of the real data. In this paper, we compare the performance of our synthetic multi-person behavior models with the real datasets to validate the effectiveness of our synthetic models. The design of the MoSen system is motivated by a need to accelerate the practical implementation of sensor-based activity recognition technologies in multi-occupancy settings. MoSen is adaptable to any customized indoor environment. Smart home designers or practitioners can leverage the analytical results of their pre-designed sensor network on a specific context-aware task to better balance the trade-off between deployment cost and system performance. Figure 2 shows an overview of the MoSen system in solving the identification annotation problem, which is one of the central problems for sensor-based activity recognition in multi-occupancy smart homes [11]. It refers to labeling time-series sensor events with identities by mapping each sensor event to the resident who generated it. In this paper, we solve the identification annotation problem by leveraging the Graph and Rule-Based Algorithm (GR/ED) proposed in [24], which was designed to track individuals in an ambient sensor setting. The core idea is that individuals trip sensors when they move from one location to another. Sensor events are then separated into different streams by leveraging human trajectory or location information with the nearest neighbor standard filter (NNSF) [?], a classical data association method.
To achieve the required labeling accuracy of identities for time-series sensor events in the multi-occupancy environment, MoSen can additionally provide a sensor selection strategy that fits the user's requirements while optimizing the number of sensors and their placement (hence the installation cost) to achieve the highest labeling accuracy. The MoSen platform can emulate this annotation process with different sensor settings for any pre-designed smart home. The platform assumes that only the sensor events are provided by the sensors, without needing to consider how heterogeneous or multi-modal sensing environments are meshed and combined. In a practical setting, a sensor event is recorded when a sensor is triggered. In MoSen, we emulate triggering sensors by building realistic single-person activity patterns. In this way, we can add different representative activity patterns into MoSen to simulate multi-resident scenarios, noted as the multi-occupancy behavior model in Figure 2. Due to the stochastic nature of our choices and the heterogeneity of the chosen single-person datasets, the residents who contribute to each activity pattern might have different backgrounds, different habits, and different daily routines. We then leverage Dijkstra's algorithm [26] to emulate and generate each resident's daily trajectory. These trajectories are utilized as the ground-truth trajectories of residents. Normally distributed noise, depending on the different resolutions of positioning technologies, is added to the ground-truth values to generate a new trajectory that emulates how localization devices work. This noise-added trajectory is referred to as the detected trajectory. Labeling accuracy, in this paper, is defined by how the detected trajectories from different sensor networks affect the identification annotation process.
With MoSen, by combining real residents' activity patterns and the floorplan, every multi-occupancy house can be emulated and evaluated before deploying a real sensor system, which we believe can accelerate the practical utility of sensor-based activity recognition systems in smart homes. In sensor-based activity recognition, multi-modal sensor readings are collected and then represented as time-series data to describe human indoor activities. The dataset contains a series of sensor events ordered by time. Each sensor event is recorded when the respective sensor is triggered or activated, i.e., when a resident touches or walks past the sensor. In this paper, we choose two widely used published datasets for our analysis, the CASAS datasets [23] and the ARAS datasets [6]. The CASAS group collected human activity datasets from the WSU smart apartment testbed [23]. Activity labels are annotated in the CASAS datasets with their respective start and end times, via a handwritten diary. The majority of the datasets represent indoor activities as a series of sensor events, each containing the event timestamp, the sensor name, the sensor state, and the activity label. Each sensor event should at least contain the following details: [Timestamp, Sensor ID, Sensor Status, Activity Label] In this paper, five single-occupancy testbeds chosen from the CASAS datasets are leveraged to generate the synthetic multi-person behavior model. These five testbeds are annotated as hh120, hh122, hh123, hh125, and hh126 [22]. Their details and properties are shown in Table 3, and the relations between these testbeds and the synthetic multi-occupancy datasets are demonstrated in Table 4. Activity labels contained in the single-occupancy testbeds include: bed-toilet transition, cook, eat, enter home, leave home, personal hygiene, phone, relax, sleep, work.
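As an illustration of this event representation, the sketch below parses one CASAS-style sensor event line into the four fields above. The exact field layout and timestamp format vary across CASAS testbeds, so the line format and helper name here are assumptions for illustration only:

```python
from datetime import datetime

def parse_casas_event(line):
    """Parse a CASAS-style sensor event line of the assumed form
    '<date> <time> <sensor id> <status> [<activity label>]'."""
    parts = line.split()
    timestamp = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S.%f")
    return {
        "timestamp": timestamp,
        "sensor": parts[2],
        "status": parts[3],
        "activity": " ".join(parts[4:]) or None,  # activity label may be absent
    }

event = parse_casas_event("2012-07-20 07:05:12.000123 M012 ON Sleep")
print(event["sensor"], event["status"], event["activity"])  # M012 ON Sleep
```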
Unlike the CASAS group, which collected activities in a lab setting, the ARAS group collected two pairs of residents' daily activities in their real houses, recording the ground-truth labels with a designed graphical user interface (GUI) [6]. Each house's data consists of 30 days of sensor readings, one 22x86400 matrix per day, where the first 20 columns (S1-S20) hold the binary sensor readings and columns 21 (P1) and 22 (P2) hold the activity labels for residents A and B. The activity labels, ranging from 1 to 27, represent 27 different activities. In order, they are: going out, preparing breakfast, having breakfast, preparing lunch, having lunch, preparing dinner, having dinner, washing dishes, having snack, sleeping, watching TV, studying, having shower, toileting, napping, using internet, reading book, laundry, shaving, brushing teeth, talking on the phone, listening to music, cleaning, having conversation, having guest, changing clothes, others. The data example below presents how the ARAS dataset represents the sensor status at every second and the activity labels of the two residents. Timestamp S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15 S16 S17 S18 S19 S20 P1 P2 The two aforementioned datasets [6, 22] illustrate two main variations of data representation in the sensor-based activity recognition literature. In this paper, further analysis of both datasets requires us to integrate them in a uniform way. Motivated by this, we define the format of our synthetic dataset to include the critical information we need, as shown in the following sample: Timestamp refers to critical timestamps in the multi-person scenario, where critical indicates that at least one sensor is triggered in that second. Every critical sensor activation is also noted as one sensor event in this paper. P1 to P5 represent the five residents, respectively. S denotes the sensor ID triggered at timestamp t by resident Pn.
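For illustration, one row of the ARAS per-second matrix could be split into the 20 binary sensor readings and the two activity labels as follows. This is a minimal sketch assuming a whitespace-separated row; the helper name and input formatting are our own choices, and the row's index within the day gives the implicit timestamp (second):

```python
def parse_aras_second(row):
    """Split one ARAS-style row (22 integers for one second) into the 20
    binary sensor readings (S1-S20) and the two residents' activity
    labels (P1, P2), each in the range 1-27."""
    values = [int(v) for v in row.split()]
    if len(values) != 22:
        raise ValueError("expected 22 columns: S1-S20, P1, P2")
    return values[:20], values[20], values[21]

# a second where only S5 fires; resident A sleeps (10), resident B toilets (14)
sensors, p1, p2 = parse_aras_second("0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10 14")
print(sensors.index(1) + 1, p1, p2)  # 5 10 14
```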
We leverage sensor activations annotated with valid activity labels from the CASAS and ARAS datasets to model resident activity patterns, which are action-based behavioral models. Data pre-processing is responsible for unifying the format of the data from different sources. We pre-process the public single-occupancy datasets by developing an algorithm to capture the critical timestamps of sensor status transitions or activity transitions, then format the data as discussed in Section 4.1.3. Timestamps. In the CASAS datasets [22], sensor events are recorded in order, with the timestamps at which sensors are triggered. However, for the ARAS dataset [6], data is recorded every second; that is, each day's dataset has 86400 lines, one per second of the day. Capturing critical timestamps for both datasets is the first step to unify the data. In identifying the start or end points of sensor events, we leverage the last-sensor-fired representation [77], which means the last triggered sensor retains its value of 1 and changes to 0 when the next sensor is triggered. In our data format, the transition between the sensor IDs at consecutive critical timestamps represents this information, that is, the sensor triggered by resident Pn is changing, and the corresponding switching timestamp is a critical timestamp. To obtain critical timestamps, we select the switching time points at which transitions occur. However, this processing step captures both real activity switching times and noise values. The sensor data is sampled every second, and this high-density data includes several kinds of unexpected noise values, e.g., breaks in continuous activities, loss of data, and so on. These noisy values should be smoothed before the final mapping stage. By understanding and learning the relation between activities and sensors, we investigate how system performance is affected by the sensor layout.
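The critical-timestamp extraction under the last-sensor-fired representation can be sketched as follows. This is a simplified illustration of the idea, not the paper's exact pre-processing algorithm; in particular, the smoothing of noisy values described above is omitted:

```python
def critical_timestamps(per_second_sensor):
    """Given a per-second list of active sensor IDs (last-sensor-fired
    representation: the most recent sensor keeps value 1 until the next
    fires), return (second, sensor) pairs where the active sensor changes."""
    events, last = [], None
    for t, sensor in enumerate(per_second_sensor):
        if sensor is not None and sensor != last:
            events.append((t, sensor))
            last = sensor
    return events

# seven seconds of one resident's stream: S3 fires, then S7, then S3 again
stream = [None, "S3", "S3", "S3", "S7", "S7", "S3"]
print(critical_timestamps(stream))  # [(1, 'S3'), (4, 'S7'), (6, 'S3')]
```

Only the switching points survive, which compresses the 86400 per-second samples of a day down to the handful of sensor events that actually carry information.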
We leverage five different single-occupancy published datasets [22] to build the synthetic multi-person datasets in this paper. The occupants from each dataset act as residents in our emulated environment, with their realistic activity data and learned patterns, in different multi-person scenarios. Table 5 illustrates the properties of several multi-occupancy scenarios and the infrastructure details of the respective emulated environments. The synthetic multi-person datasets we used for further modeling contain the information described in Section 4.1.3, consisting of critical timestamps and the sensor IDs triggered by residents. Figure 3 shows the sensor activation list in our four-person scenario, where four colored lines represent the four residents in the space. In order to evaluate the quality of the synthetic multiple-occupancy datasets and validate whether the proposed synthetic methodology can represent characteristics of real-environment datasets, we compare the similarity between the ARAS dataset [6], a real two-person dataset, and the synthetic two-person dataset. The ARAS dataset in the ARAS (real) floorplan is defined as the baseline, referred to as Configuration I in Figure 4. We then compare the synthetic dataset in the ARAS floorplan (Configuration II), the ARAS dataset in the emulated floorplan (Configuration III), and the synthetic dataset in the emulated floorplan (Configuration IV) to Configuration I. To quantify the variation of localization resolution in different configurations, both the labeling accuracy (LA) and the corresponding decreasing rate (DR) of the same annotation task are analyzed, as shown in Figure 4. We first investigate the similarity when using the ARAS (real) and synthetic datasets in the ARAS (real) floorplan, as shown in Figures 4a and 4b. Even though the synthetic dataset is stochastic and has no relation to the real dataset, it performs as well as the real dataset under an identical layout for the annotation task.
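A minimal sketch of how per-resident single-occupancy event streams could be combined into one time-ordered synthetic multi-person event list (the data layout follows the format of Section 4.1.3, but the function and variable names are our own illustration):

```python
def merge_resident_streams(streams):
    """Merge per-resident (timestamp, sensor_id) event lists into one
    time-ordered multi-person event list tagged with resident IDs.

    streams: dict mapping resident ID -> list of (t, sensor) events.
    Returns a list of (t, resident, sensor) tuples sorted by time.
    """
    tagged = ((t, rid, s) for rid, events in streams.items() for t, s in events)
    return sorted(tagged)

# two residents' independent streams interleaved on one timeline
streams = {
    "P1": [(10, "S2"), (95, "S7")],
    "P2": [(12, "S5"), (40, "S2")],
}
print(merge_resident_streams(streams))
# [(10, 'P1', 'S2'), (12, 'P2', 'S5'), (40, 'P2', 'S2'), (95, 'P1', 'S7')]
```

Each resident keeps an independent behavior pattern; only the shared timeline and floorplan couple them, which is what makes the annotation problem hard.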
We then compare the performance of the ARAS (real) and synthetic datasets in the same emulated floorplan; the experimental results shown in Figures 4c and 4d also exhibit similar performance on the same task. This similarity demonstrates that the synthetic datasets we utilize for further analysis are capable of providing convincing explanations for the annotation task in this paper. Based on the sensor activations versus time, we model the real trajectory by choosing optimal paths between adjacent sensor activities. The detected trajectory is synthesized from the real trajectory and the given localization resolution. Given the sensor activation list (Figure 3), where the target resident's locations can be inferred from the sensor layout, a series of location nodes is considered to generate the best route when a resident moves from one sensor to another. The generated best route is utilized as the resident's real trajectory in this paper, as people tend to choose the shortest route when they move from one node to another. The nodes include sensor locations and junction locations that avoid obstacles, as shown in Figure 5. Sensors are attached to furniture or appliances that residents most frequently interact with. Some important nodes are added in the hallway and living room as transition points from one sensor to another, referred to as junction locations. Bridges represent how people move from one sensor to nearby sensors. If the number of nodes is not large enough, the path will be intercepted by walls; more nodes result in a smoother moving path but increase the computational cost. Given the node layout and possible path connections (edges), the bridge map can be graphed (Figure 5). The bridge map includes all possible optimal ways in the room to move from one activity (sensor) to another. Finding the shortest way between two sensors can be treated as a classic optimization problem.
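Treated as such an optimization problem, the shortest route over a bridge map can be found with a standard priority-queue search. The sketch below is a generic Dijkstra implementation over a hypothetical toy bridge map, not the modified variant used in MoSen; distance is the only edge weight, as in the paper's setup:

```python
import heapq

def shortest_path(edges, start, goal):
    """Dijkstra's algorithm over a bridge map.

    edges: dict mapping each node to a list of (neighbor, distance) pairs.
    Returns (total_distance, node_path); (inf, []) if goal is unreachable.
    """
    queue, seen = [(0.0, start, [start])], set()
    while queue:
        d, node, path = heapq.heappop(queue)
        if node == goal:
            return d, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in edges.get(node, []):
            if nxt not in seen:
                heapq.heappush(queue, (d + w, nxt, path + [nxt]))
    return float("inf"), []

# toy bridge map: sensor nodes A, D and junction nodes B, C in a hallway
bridge = {"A": [("B", 1.0), ("C", 4.0)],
          "B": [("C", 1.0), ("D", 5.0)],
          "C": [("D", 1.0)]}
print(shortest_path(bridge, "A", "D"))  # (3.0, ['A', 'B', 'C', 'D'])
```

Edge weights could be extended with the priority factors mentioned below (e.g., preferred corridors) by adding them into the pushed cost.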
To find an optimal path in 2D environments, there are several widely used navigation algorithms, e.g., Dijkstra's algorithm [26], A* [31], and depth-first search (DFS) [75]. Dijkstra's algorithm is a technique to find the optimal and shortest path between two different nodes (i.e., a starting point and a destination point). To accelerate the calculation, a modified Dijkstra's algorithm is adopted to synthesize the resident's real trajectory in this study. Given the nodes V = {v1, · · · , vn} and the possible connections (edges) E = {e1, · · · , em}, the selection of an optimal path can be done in O(m + n log n) steps for each selection. During the selection, each edge should have one or multiple weights, which are used to evaluate the capacity and priority of edges. In this work, the only parameter considered is the distance, without additional priority factors; each path can be endowed with priorities depending on the specific case for further improvement. Figure 6 shows the emulated real trajectories based on the four residents' sensor activation lists. The RTLS devices can provide the detected trajectory within their respective resolution [5, 15, 60, 63, 81]. The localization range and resolution of the facilities depend on the different indoor positioning technologies [68]. After the real trajectory is generated, the possible resolution is incorporated into the trajectory given the device's (sensor's) performance. Assuming the localization resolution is L and the probability of each point in the error range is identical, the error can be added as in Equation 1: (x̂, ŷ) = (x + εL cos θ, y + εL sin θ), (1) where θ is the possible angle from the real location (x, y), (x̂, ŷ) is the detected location, and ε ∈ [0, 1] is the error factor. In this study, the distribution of the added error is assumed uniform over all locations of the sensing range. The error factor ε can take any value from zero to one with identical probability. If the resolution of a particular device (sensor) has a specific pattern (e.g.
semi-normal distribution), the error factor (ε) should be adjusted so that its probability varies with the distance from the sensor. If the sensor is attached to a wall, producing a sector-shaped detection range, the detected location is constrained to the corresponding direction. Figure 7 shows a representative trajectory (L = 0.5 meters), illustrating how MoSen emulates a localization device collecting a resident's data over a day; it gives a straightforward view of resident movement. Typical studies of multi-occupancy scenarios consider at most two residents in the same space, limited by costly devices, the annotation problem, localization accuracy, or other constraints. With our MoSen platform, we investigate different multi-person scenarios by emulating independent residents in any target sensor network. We then provide a sensor selection strategy for the sensor network by balancing the trade-off between deployment cost and system performance. The results of our experiments can inform real sensor deployments in multi-resident smart homes. Given the scarcity of multi-occupancy activity datasets in the sensor-based setting, few studies consider more than two residents in a single space. Leveraging two real two-person datasets, denoted ARAS [6] and CASAS [22], we validate the efficiency of our two-person data generator in Section 4.3.2. We then extend the generator to emulate multi-occupant scenarios in households with 2 to 5 residents. The MoSen platform is adaptable to different or customized indoor environments; the critical inputs to the platform are the sensor locations in the target floorplan. In this evaluation, one design strategy for the growing multi-person scenarios is to maintain similar layout complexity and sensor density, so that only the number of bedrooms changes with the number of residents. Figure 1 shows the representative floorplan for the four-person scenario, and Table 4 lists the space size of each scenario.
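The best-route computation described above can be sketched as Dijkstra's algorithm over the node/bridge map, with Euclidean distance as the only edge weight. The paper's modified variant is not specified, so this is a minimal illustrative sketch; the node names are hypothetical.

```python
import heapq
import math

def shortest_route(nodes, edges, start, goal):
    """Dijkstra over a node/bridge map.

    nodes: dict node id -> (x, y) coordinates (sensors and junctions)
    edges: iterable of (u, v) pairs, the bidirectional bridges
    Returns (total distance, list of node ids along the best route).
    """
    graph = {u: [] for u in nodes}
    for u, v in edges:
        w = math.dist(nodes[u], nodes[v])  # edge weight = Euclidean distance
        graph[u].append((v, w))
        graph[v].append((u, w))

    dist = {u: math.inf for u in nodes}
    prev = {}
    dist[start] = 0.0
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist[u]:
            continue  # stale priority-queue entry
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(pq, (d + w, v))

    path, u = [goal], goal
    while u != start:
        u = prev[u]
        path.append(u)
    return dist[goal], path[::-1]

# Hypothetical layout: a junction node lets the route bend around a wall.
nodes = {"fridge": (0.0, 0.0), "junction": (2.0, 0.0), "sofa": (2.0, 1.5)}
edges = [("fridge", "junction"), ("junction", "sofa")]
d, path = shortest_route(nodes, edges, "fridge", "sofa")
```

Endowing edges with extra priority factors, as the paper suggests, would amount to scaling `w` by a per-edge multiplier before pushing it into the queue.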
Generally, every multi-person environment in this evaluation consists of a living room, kitchen, laundry room, bathroom, hallway, bedrooms (the quantity depending on the number of residents), and other spaces. Synthetic multi-person datasets and multi-occupancy environments are integrated to emulate residents' indoor trajectories. The emulation rests on the assumption that when people move between two triggered sensors, they most likely choose the shortest way from the current sensor location to the next. This shortest way is referred to as the best route in this paper, as discussed in Section 4.4.1. We use these best routes as residents' real trajectories, while the detected trajectories emulate how sensor networks with different positioning resolutions track human activities. Different positioning technologies, with their respective resolutions, exert varying accuracy when locating residents in a real environment, and also incur diverse deployment costs [51]. When detecting human locations and trajectories, the different resolutions cause deviations from the ground-truth values. For example, Estimote [59] reports that their location beacons achieve 1.5-meter accuracy, meaning the detected location and the real location may be at most 1.5 meters apart. In this paper, for the varying resolutions of different technologies (ranging from 0 meters to 10 meters), we add the corresponding deviations to the best route and refer to the new trajectories, as obtained by the respective localization devices, as the detected trajectories. Providing personalized services to different residents is one of the most important applications in multi-person smart homes. Profiling each resident's activity pattern and recording which sensors they interact with in their daily routines are preliminary steps in multi-person activity recognition.
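The deviation-adding step can be illustrated with a small helper, following Equation 1's uniform error factor and uniform angle; the function name is ours.

```python
import math
import random

def add_localization_error(x, y, resolution, rng=random):
    """Perturb a ground-truth point (x, y) by at most `resolution` metres.

    Implements (x', y') = (x + e*L*cos(theta), y + e*L*sin(theta)) with
    error factor e ~ U[0, 1] and angle theta ~ U[0, 2*pi), per Equation 1.
    Note: a uniform e concentrates points near the centre; sampling
    sqrt(e) instead would spread them uniformly over the error disk.
    """
    e = rng.random()                       # error factor in [0, 1]
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (x + e * resolution * math.cos(theta),
            y + e * resolution * math.sin(theta))

# Every detected point stays within the device's resolution radius.
random.seed(0)
samples = [add_localization_error(0.0, 0.0, 1.5) for _ in range(200)]
```

A device with a sector-shaped wall-mounted range, as described above, would simply restrict `theta` to the admissible angular interval.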
Automatic identification labeling is the central problem when modelling multi-occupancy activity recognition with ambient-sensor networks. One feasible solution is to use residents' trajectories to label the triggered sensors by matching the locations of the sensor and the resident. The Graph and Rule Based Algorithm (GR/ED) [24] and the nearest neighbor standard filter (NNSF) [?] are leveraged in this paper to solve the annotation problem, as detailed in Section 3. We compare the detected locations of all residents with the triggered sensor at every critical timestamp; the triggered sensor is then assigned to the resident with the shortest straight-line distance. As illustrated in Section 4.4, in the MoSen emulation the best route represents the ground-truth locations of every resident, while the detected trajectories are used in the realistic annotation process. In this evaluation, we choose the automatic identification annotation problem as the central task to illustrate how the MoSen platform evaluates the impact of the number of residents, the number of sensors, positioning technologies, and different sensor layouts. MoSen can additionally provide a sensor selection strategy that fits the user's requirements while optimizing the number of sensors and their placement (hence the installation cost) to achieve the highest labeling accuracy. While the real datasets used to model the synthetic multi-occupancy datasets span more than one month (Table 3), only one day of data from the synthetic datasets is used in further analysis; we leave an extensive evaluation on weekly or monthly data to future work. As shown in Table 6, we first compare the performance of automatic identification labeling in several multi-occupancy scenarios, using similar floorplans and sensor layouts to emulate multiple residents living in a single space.
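The shortest-straight-line-distance assignment rule can be sketched in a few lines. This shows only the distance-matching step at one timestamp, not the full GR/ED or NNSF machinery; the names are hypothetical.

```python
import math

def label_triggered_sensor(sensor_xy, residents_xy):
    """Assign a triggered sensor to the nearest resident.

    sensor_xy: (x, y) of the triggered sensor
    residents_xy: dict resident id -> detected (x, y) at the trigger timestamp
    Returns the id of the resident with the shortest straight-line distance.
    """
    return min(residents_xy,
               key=lambda r: math.dist(residents_xy[r], sensor_xy))

# Hypothetical timestamp: resident B's detected location is closest.
detected = {"A": (0.0, 0.0), "B": (3.0, 3.0)}
who = label_triggered_sensor((2.5, 2.5), detected)
```

Because the rule operates on detected rather than ground-truth locations, coarser localization resolutions directly increase the chance of mislabeling, which is the effect quantified in the following evaluation.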
The table generally informs the selection of localization devices when an application demands a particular labeling accuracy. Labeling accuracy can be improved with a finer resolution, but the device cost will also increase. Insights from this table can provide valuable information to designers or practitioners planning actual sensor deployments. For example, in a 2-person scenario, a localization device of at least 4-meter resolution is needed when a user requires labeling accuracy higher than 90%, but in a 5-person scenario, a 2-meter resolution is required to achieve the same level of accuracy. Resolution. Different techniques have varying performance in localizing individuals, and localizing multiple residents simultaneously is challenging. In this experiment, we evaluate how varying localization resolutions affect the final labeling accuracy in multi-occupancy scenarios. In each scenario, we implement the same activity sequence in the proposed floorplan but increase the localization resolution, increasing the deviations of the residents' trajectories and adding more noise to the best route.

Figure 10: Distributions of the sensor connection length in the four multi-occupancy scenarios. The x-axis represents the distance between nodes (sensors), and the y-axis the frequency of the corresponding distance.

As shown in Figure 8, declining sigmoid curves indicate how the labeling accuracy decreases with increasing resolution. To understand this decreasing trend across occupancy scenarios, we increase the number of residents in the same setting and examine the change in labeling accuracy. Figure 9a gives an overview of the labeling performance as the number of residents increases; Figures 9b and 9c show the decline rate of the labeling accuracy for the four multi-occupancy scenarios, respectively.
Labeling accuracy decreases as the number of residents increases. It is worth noting that the steepest decline in all these scenarios occurs between 3 and 3.5 meters. The similarity between the four scenarios arises because we use a similar sensor density for each, keeping the complexity of the sensor deployments the same; in other words, the point of steepest decline depends on the sensor deployment. In our experiments, the four multi-occupancy scenarios use four different floorplans, as described in Table 5, with the number of bedrooms changing with the number of residents. However, we keep the sensor density of these floorplans the same, also shown in Table 5: the sensor densities of the four scenarios range from 0.43 to 0.46 sensors per m². We also use the mean distance to compare the sensor density of the four floorplans. The Delaunay triangulation method [8] is adopted to connect each sensor node with its neighboring sensor nodes, and the lengths of these connections are used to calculate the mean distance. The distributions of the connection lengths are shown in Figure 10. After removing outlier connections near the border, we calculate the average length of the remaining connections; this average is the mean distance. In this setting, the same decreasing trends in all scenarios (shown in Figures 9b and 9c) reflect the effect of the shared sensor density (0.43-0.46 sensors/m²) of our floorplans. For instance, each sensor covers 2.3 m² on average in the 2-person scenario, with a mean distance between nodes of 1.6865 meters, and there the labeling accuracy decreases most dramatically between 3.0 and 3.1 meters. Since the other scenarios have similar sensor densities (shown as mean distances in Figure 10), their transition intervals are also similar. This effect is significant when a smart home designer considers a potential sensor layout to better fit the user's requirements.
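The mean-distance computation just described (Delaunay edges between sensor nodes, long border outliers removed, remaining lengths averaged) might look like the following sketch, assuming SciPy is available; the `max_edge` cutoff stands in for the paper's border-outlier removal.

```python
import itertools

import numpy as np
from scipy.spatial import Delaunay

def mean_sensor_distance(points, max_edge=None):
    """Mean length of Delaunay edges between sensor nodes.

    points: sequence of (x, y) sensor coordinates
    max_edge: optional cutoff that drops overly long outlier edges
    Returns the mean edge length in the same units as the input.
    """
    tri = Delaunay(np.asarray(points, dtype=float))
    edges = set()
    for simplex in tri.simplices:
        # each triangle contributes its three edges; dedupe shared ones
        for i, j in itertools.combinations(simplex, 2):
            edges.add((min(i, j), max(i, j)))
    lengths = [float(np.linalg.norm(tri.points[i] - tri.points[j]))
               for i, j in edges]
    if max_edge is not None:
        lengths = [l for l in lengths if l <= max_edge]
    return float(np.mean(lengths))

# Three hypothetical sensors forming one triangle: edges 2.0, sqrt(3.25), sqrt(3.25).
md = mean_sensor_distance([(0.0, 0.0), (2.0, 0.0), (1.0, 1.5)])
```

Mean distance complements the sensors-per-m² figure because it reflects how the nodes are spread, not just how many there are per unit area.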
In sensor-based activity recognition, different sensing systems have been proposed to monitor indoor activities with various ambient sensors, which in turn have diverse performance. Previous works [39, 52] have emphasized the significance of sensor selection in multi-device environments (MDE). Dynamically selecting the best sensor for a specific activity recognition task is a typical approach developed for context-aware MDE [39, 45, 85]; these works mostly concern body sensor networks (BSN) and propose dynamic design strategies in the BSN environment. In this paper, providing recommendations on sensor selection for different smart homes is the key objective of our analysis, to achieve a better trade-off between the user's requirements, labeling accuracy, and system cost. Four requirements (namely cost, acceptance, accuracy, and privacy [20]) are considered in this work for a practical activity recognition system:

1. Cost. Low cost and low battery consumption of sensors contribute to better competence.
2. Acceptance. Non-obtrusive wearable or ambient sensors are preferred.
3. Accuracy. Accuracy is one of the most important factors to consider.
4. Privacy. A non-visual system is preferred.

Figure 11a: Example of how each sensor affects the final annotation accuracy, in a 5-person scenario with 5-meter resolution. The red line represents the overall accuracy, compared with individual sensors' performance (grey bars).

The sensor-based activity recognition system is chosen by many researchers for its non-obtrusiveness and privacy protection [78]. The trade-off between the two remaining factors leaves an interesting but tricky balance to attain, as the sensor configuration often depends on the installation cost and sensor prices.
We define sensor sensitivity to quantify how a sensor's location and interaction frequency with residents affect the labeling accuracy. This information is valuable for choosing the best and most cost-effective sensor network for a smart home environment. In this section, we focus on identifying sensor sensitivity and on recommendations for the final sensor selection in a specific layout. We use a five-person scenario as a case study to illustrate the proposed sensor selection strategy; the insights from the case study extend to other scenarios and to any new floorplan and sensor layout. Identification annotation accuracy. The initial analysis of the specific layout (a five-person scenario in this case study) is emulated in the MoSen system, where annotation accuracy is treated as one of the most important factors. The related analysis was described in Section 5, and the results show how localization resolution and sensor density affect the accuracy. Figure 12 shows the localization resolution required for labeling accuracies ranging from 80% to 95%. For the five-person scenario, a labeling accuracy of 80% requires a localization resolution of 3.72 m or finer; at 90%, the resolution must reach at least 1.83 m, which delivers higher accuracy but is also more expensive. Sensor individual effect. Each sensor is evaluated to identify its individual effect on the integrated performance, as shown in Figure 11, where the x-axis represents the sensor ID. Under different localization resolutions, sensors may perform differently, and the cumulative effect is shown in this figure: each rectangle represents a sensor's individual effect under a given localization resolution. For example, Sensor 6 can have opposite effects as the resolution varies, whereas for Sensor 14 all effects are positive.
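The paper does not spell out how a sensor's individual effect is computed. One plausible reading, sketched here purely as an assumption, is the per-sensor labeling accuracy minus the overall accuracy, so that positive values correspond to bars above the red line in Figure 11.

```python
def sensor_effects(events):
    """Per-sensor effect on annotation accuracy, relative to the overall mean.

    events: list of (sensor_id, correct) pairs, where `correct` is True when
    the automatic label matched the ground-truth resident for that trigger.
    Returns dict sensor_id -> (per-sensor accuracy - overall accuracy).
    """
    overall = sum(c for _, c in events) / len(events)
    per_sensor = {}
    for sid, correct in events:
        per_sensor.setdefault(sid, []).append(correct)
    return {sid: sum(cs) / len(cs) - overall
            for sid, cs in per_sensor.items()}

# Hypothetical trigger log: sensor 1 always labeled correctly, sensor 2 half the time.
effects = sensor_effects([(1, True), (1, True), (2, False), (2, True)])
```

Sensors that are never triggered produce no events and therefore simply do not appear in the result, matching the absent sensor IDs noted below.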
This cumulative result leads to a more intuitive notion of sensor sensitivity. Some sensor IDs are absent here because those sensors were not triggered in the experiment. Sensor sensitivity. A sensor's effect is represented as its sensor sensitivity, and recommendations are provided based on sensor sensitivity and sensor cost, as well as the localization resolution and labeling accuracy. Sensor sensitivity integrates the resident's activity patterns and how frequently residents interact with a specific sensor. As shown in Figure 13, the different radii of the circles represent different sensor sensitivities, where a larger radius corresponds to a larger detection area (and lower sensitivity).

Figure 12: Requirements for localization resolution at expected labeling accuracies of 80%, 85%, 90%, and 95%, respectively.

Data scarcity in multi-occupancy scenarios. Designing activity recognition systems for multi-occupancy scenarios has long challenged researchers. First, the complexity of human activity increases dramatically when there is more than one person in the same environment; unlike the single-occupancy scene, interactions between residents introduce uncertainty when defining indoor activities. Second, annotating a triggered sensor with the corresponding identity and activity is challenging in the multi-resident scenario, and the lack of ground-truth labels hampers further analysis with advanced machine learning or deep learning techniques. Third, data privacy is a major concern when collecting real human data; even for sensor-based activity recognition systems, which do not invade privacy as severely as video-based systems do, a trade-off between data utility and privacy must still be attained. Fourth, numerous challenges remain to be overcome even in the single-occupancy environment [11].
These gaps, hence, impede practical data collection in the multi-occupancy scenario. We have built the MoSen system to investigate multi-occupancy scenarios by generating synthetic multi-occupancy behavior models based on real human activity patterns and emulating these models in a virtual environment. We use datasets collected from real installations to represent individuals' activity patterns; however, the available datasets often lack direct interaction between residents. Given the data scarcity of the multi-occupancy scene, the synthetic method bridges the gap, and the analysis is valuable for designing real multi-occupancy deployments in the future. The strategies proposed in the MoSen system extend to any floorplan, and the initial analysis of each specific scenario guides designers in balancing cost and accuracy in a sensor-based system. We choose identification labeling as the key question in this paper and leave other parameters for future research. Towards practical utility of sensor-based systems. Human activities are hard to model in a uniform way, especially when people have different backgrounds, diverse habits, and varied activity performances [86]; the uncertainty from spatial and temporal differences further increases this difficulty [23].

Figure 13: Sensor sensitivity in a five-person scenario. A larger circle radius means less sensitivity to distance and allows a bigger detection area, and vice versa.

In the multi-occupancy scenario, activity recognition becomes more sophisticated and challenging as the number of residents increases. There is also a trade-off between the RTLS localization resolution and sensor costs: researchers in a lab setting often prefer the best technology with the highest accuracy, but the accumulated cost is hard to afford in real home designs.
Finally, floorplans and furniture differ between homes, resulting in highly diverse sensor layouts in real environments. Our proposed system enables reasonable evaluations and design recommendations for each particular home. In this paper, we presented MoSen, a framework for accelerating the real-world implementation of sensor-based activity recognition systems by analyzing the trade-off between overall system performance and cost. We investigated multi-occupancy scenarios by emulating, in a virtual environment, synthetic multi-occupancy behavior models generated from real single-person activity patterns. The MoSen platform extends to any floorplan or sensor configuration, and the initial analysis of a specific sensor configuration provides designers or practitioners with an effective sensor selection strategy; more quantified results will be shown in future work. We evaluated the efficacy of the MoSen platform on an automatic identification annotation task using experiments on synthetic datasets, and showed how the annotation accuracy is affected by the number of residents, different localization resolutions, and sensor density. Through our trace-driven simulations, the effect of each sensor was also analyzed, and a sensor selection strategy for the system was provided. Other context-aware tasks will be emulated in our future work.
References

- WiDeep: WiFi-based accurate and robust indoor localization system using deep learning
- Multi-person localization via RF body reflections
- Smart homes that monitor breathing and heart rate
- Indoor positioning based on visible light communication: A performance-based survey of real-world prototypes
- Broadband acoustic local positioning system for mobile devices with multiple access interference cancellation
- ARAS human activity datasets in multiple homes with multiple residents
- SenseGen: A deep learning architecture for synthetic sensor data generation
- Unravelling effects of the pore-size correlation length on the two-phase flow and solute transport properties; GPU-based pore-network modelling
- Privacy and synthetic datasets
- RFID-enabled Real-Time Location System (RTLS) to improve hospital's operations management: An up-to-date typology
- Multioccupant activity recognition in pervasive smart home environments
- IoT Wearable Sensor and Deep Learning: An Integrated Approach for Personalized Human Activity Recognition in a Smart Home Environment
- Multiple target tracking with RF sensor networks
- Real-time locating systems (RTLS) in healthcare: a condensed primer
- Evolution of indoor positioning technologies: A survey
- Large scale GAN training for high fidelity natural image synthesis
- GCHAR: An efficient Group-based Context-Aware human activity recognition on smartphone
- Habitat monitoring: Application driver for wireless communications technology
- UWB Indoor Positioning Algorithm Based on TDOA Technology
- Elderly activities recognition and classification for applications in assisted living
- Interaction models for multiple-resident activity recognition in a smart home
- Transfer learning for activity recognition: A survey
- CASAS: A smart home in a box
- Tracking systems for multiple smart home residents. In Human behavior recognition technologies: Intelligent applications for monitoring and security
- SynSys: A synthetic data generation system for healthcare applications
- A note on two problems in connexion with graphs
- Differential privacy: A survey of results
- Occupancy modeling and prediction for building energy management
- GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification
- Automatic annotation of daily activity from smartphone-based multisensory streams
- A formal basis for the heuristic determination of minimum cost paths
- A robust human activity recognition system using smartphone sensors and deep learning
- Geomagnetism for smartphone-based indoor localization: Challenges, advances, and comparisons
- A context-aware experience sampling tool
- Robust human activity recognition from depth video using spatiotemporal multi-fused features
- PATE-GAN: Generating synthetic data with differential privacy guarantees
- SmartPDR: Smartphone-based pedestrian dead reckoning for indoor localization
- Tracking from one side: multi-person passive tracking with WiFi magnitude measurements
- PBN: towards practical activity recognition using smartphone-based body sensor networks
- Beacon-based multi-person activity monitoring system for day care center
- Survey on deep learning in music using GAN
- A survey of mobile phone sensing
- Learning by tracking: Siamese CNN for robust target association
- CoMon+: A cooperative context monitoring system for multi-device personal sensing environments
- An active resource orchestration framework for pan-scale, sensor-rich environments
- Location-based trust for mobile user-generated content: applications, challenges and implementations
- Plugo: a VLC systematic perspective of large-scale indoor localization
- Robust location-aware activity recognition using wireless sensor network in an attentive home
- BasMag: An optimized HMM-based localization system using backward sequences matching algorithm exploiting geomagnetic information
- Facing the reality of data stream classification: coping with scarcity of labeled data
- A meta-review of indoor positioning systems
- A closer look at quality-aware runtime assessment of sensing models in multi-device environments
- Multi-residential activity labelling in smart homes with wearable tags using BLE technology
- Ion Emilian Radoi, Victor Asavei, Alexandru Gradinaru, and Alex Butean. 2020. A Comprehensive Survey of Indoor Localization Methods Based on Computer Vision
- Wi-Fi fingerprinting in the real world: RTLS@UM at the EvAAL competition
- Indoor localization with audible sound: Towards practical implementation
- A survey on health monitoring systems for health smart homes
- Indoor positioning and navigation with camera phones
- Systems and methods for object tracking with wireless beacons
- Highly accurate 3D wireless indoor positioning system using white LED lights
- Learning and detecting activities from movement trajectories using the hierarchical hidden Markov model
- LANDMARC: indoor location sensing using active RFID
- Indoor person identification through footstep induced structural vibration
- Sensor-based activity recognition in the context of ambient assisted living systems: A review
- Multi-resident activity recognition using incremental decision trees
- Indoor localization using controlled ambient sounds
- Angel Manuel Guerrero Higueras, and Vicente Matellán Olivera. 2020. A context-awareness model for activity recognition in robot-assisted scenarios
- Comparing Ubisense, BeSpoon, and DecaWave UWB location systems: Indoor performance analysis
- LOSNUS: An ultrasonic system enabling high accuracy and secure TDoA locating of numerous devices
- A mobile robot localization using external surveillance cameras at indoor
- Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition
- IoT based mobile healthcare system for human activity recognition
- Online personalization of cross-subjects based activity recognition models on wearable devices
- Activity recognition in the home using simple and ubiquitous sensors
- Depth-first search and linear graph algorithms
- LocED: Location-aware energy disaggregation framework
- Accurate activity recognition in a home setting
- Deep learning for sensor-based activity recognition: A survey
- Simultaneous tracking and activity recognition (STAR) using many anonymous, binary sensors
- ArrayTrack: A fine-grained indoor location system
- An RFID indoor positioning algorithm based on Bayesian probability and K-nearest neighbor
- AttnGAN: Fine-grained text to image generation with attentional generative adversarial networks
- A Deep-learning-based Method for PIR-based Multi-person Localization
- A survey of indoor localization systems and technologies
- Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection
- Activity prediction for improving well-being of both the elderly and caregivers
- Adversarial feature matching for text generation