Citation: Nguyen, K.A., Stewart, R.A. and Zhang, H. (2013) An autonomous and intelligent expert system for residential water end-use classification, Expert Systems with Applications, http://dx.doi.org/10.1016/j.eswa.2013.07.049.

AN AUTONOMOUS AND INTELLIGENT EXPERT SYSTEM FOR RESIDENTIAL WATER END-USE CLASSIFICATION

Abstract

Intelligent metering technology combined with advanced numerical techniques enables a paradigm shift in the level of water consumption information that can be provided to customers and water businesses. The aim of this study was to develop an autonomous and intelligent system for residential water end-use classification that could interface with customers and water business managers via a user-friendly web-based application. Water flow data collected directly from smart water meters include both single events (e.g., a shower occurring alone) and combined events (i.e., events comprising several overlapping single events). The authors recently developed intelligent algorithms that solve the complex problem of autonomously categorising residential water consumption data into a registry of single and combined events, using a hybrid combination of techniques including Hidden Markov Models (HMM), the Dynamic Time Warping (DTW) algorithm, time-of-day probability functions, threshold values and various physical features. However, one issue remained, and it is the focus of this paper: how to integrate self-learning functionality into the envisioned expert system so that it can learn from newly collected datasets from cities, regions and countries other than those represented in the training data. Such versatility and adaptive capacity are essential if the expert system is to be widely applicable.
By applying alternative forms of HMM and DTW in association with a frequency analysis technique, a suitable self-learning methodology was formulated and tested on three independent households located in Melbourne, Australia, achieving a prediction accuracy of between 80% and 90% for the major end-use categories. The three principal flow data processing modules (i.e., single event recognition, combined event recognition and the self-learning function) were integrated into a prototype software application for performing autonomous water end-use analysis, whose functionality is presented in the latter sections of this paper. The developed expert system has profound implications for governments, water businesses and consumers seeking to better manage precious urban water resources.

Key words: water end-use event, water micro-component, residential water flow trace disaggregation, hidden Markov model, dynamic time warping algorithm, gradient vector filtering, adaptive analysis, adaptive function, water demand management

1. Introduction

Following a long-standing drought across most of Australia in the second half of the last decade, most capital cities introduced a portfolio of water demand management strategies and constructed capital-intensive, rain-independent bulk supply sources to ensure a secure water supply (Willis et al., 2009a). Residential water consumption often depends on the water-using fixtures and appliances within a dwelling, the household makeup, the regional location and a plethora of socio-demographic influences. A study of end-use water consumption helps water planners and consumers identify where and when water is used in a household and hence assists in driving proactive reductions in consumption (Loh and Coghlan, 2003; Stewart et al., 2010; Makki et al., 2011).
However, existing water end-use classification techniques require extensive human resources: a combination of water use behaviour and appliance/fixture stock inventory data must be collected through a household audit, followed by 2-3 hours of analyst time for each home (Stewart et al., 2011; Beal and Stewart, 2011). Presently, water end-use or micro-component studies are restricted to the research domain, since the resource intensity of the flow data classification process makes citywide studies economically unviable. Intelligent and autonomous end-use classification firmware is required, along with bold large-scale roll-outs of commercially available high-resolution smart water meters, in order to bring this level of water consumption information to the masses.

An increasing number of smart water metering technologies have been introduced to the market. Such metering systems embrace two distinct elements: meters that use new technology to capture water use information, and communication systems that can capture and transmit real-time water use information (Stewart et al., 2010). These forms of smart metering technology can provide total consumption data to the customer and utility at high levels of resolution; however, they fail to disaggregate these data into end-use categories. The present study therefore attempts to automate the domestic water end-use classification process, and thus to enhance current practice in the urban water industry, by developing a robust hybrid model that employs HMM, DTW and event probability techniques. The proposed system will allow individual consumers to log into a user-defined water consumption web page to view daily, weekly and monthly consumption tables, as well as charts of their water demand across major end-use categories (e.g., leaks, clothes washer, shower, irrigation).
This system can rapidly alert customers to leak events so that they can be addressed immediately, rather than waiting for the slow feedback provided by traditional metering (e.g., the quarterly bill). The system will also benefit water businesses by rapidly providing water end-use reports for any desired property or suburb, thereby enabling more targeted conservation programs in periods of water scarcity (e.g., Willis et al., 2011a; 2011b), improved water demand forecasting (e.g., Makki et al., 2011) and optimised pipe network modelling (e.g., Carragher et al., 2012; Beal and Stewart, 2013). Figure 1 summarises the three key stages in the development of this system:

• Stage 1: Develop a non-adaptive intelligent model that autonomously disaggregates the water flow trace signatures collected from intelligent water meters into a categorised registry of water end-use events (Nguyen et al., 2013a, 2013b).
• Stage 2: Equip the model with adaptive capabilities that enable it to interpret untrained water end-use signature traces, allowing it to adapt to new contexts (e.g., a city different from that of the training dataset).
• Stage 3: Develop an intelligent and user-friendly expert system and prototype firmware for use by consumers and businesses.

[INSERT FIGURE 1]

2. Background

2.1. Existing water metering process and new paradigm

Water consumption readings are usually recorded manually on a quarterly or half-yearly basis. In most cases, a whole year's worth of water consumption data is described by only two to four data points in the water business's billing system. Conventional water meters count each kilolitre of water as it passes through the meter and cannot record when (i.e., the time of day) or where (e.g., washing machine, leaks) consumption takes place (Stewart et al., 2011). These systems therefore produce limited and delayed water consumption information.
The current water metering system does not typically provide real-time or continuous/frequent water consumption data, and where it does, the data resolution is insufficient for water end-use event categorisation. While real-time or near real-time provision of water consumption data is now commercially viable with current smart metering technology, there is presently no firmware that can autonomously disaggregate these flow data into the 'richer' water end-use categories of consumption. Until such firmware is developed, powerful water end-use information will be confined to expensive research studies (e.g., Beal and Stewart, 2011).

2.2. Intelligent system development using various pattern recognition techniques

To overcome these limitations, intelligent metering technology is united with advanced pattern recognition techniques to enable a paradigm shift in the level of water information available to customers and water businesses. The aim of this project is to develop an autonomous and intelligent system for residential water end-use classification through the employment of various mathematical techniques, namely HMM, DTW and frequency analysis, as presented below.

A hidden Markov model (HMM) is a statistical Markov model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered the simplest dynamic Bayesian network, and HMMs are among the most popular techniques in handwriting and speech recognition (Ephraim and Merhav, 2002). The principal theory and typical applications of this technique are presented in Baum and Petrie (1966), Starner and Pentland (1995), Baum et al. (1970), Cho et al. (1995), Ghahramani and Jordan (1997), Chien and Wang (1997), Satish and Gururaj (2003) and Tapia (2004). In this study, HMM was utilised as the main classifier for water end-use classification decisions.
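The HMM likelihood scores used throughout this paper come from evaluating an observation sequence under a trained model. The paper does not state its HMM topology or emission model, so the following is only a minimal sketch of the standard scaled forward algorithm for a discrete HMM, assuming each flow trace has been quantised into a small alphabet of flow-rate symbols. The function name `forward_log_likelihood` and all parameter values are hypothetical.

```python
import math
from typing import Sequence


def forward_log_likelihood(obs: Sequence[int],
                           start: Sequence[float],
                           trans: Sequence[Sequence[float]],
                           emit: Sequence[Sequence[float]]) -> float:
    """Return log P(obs | model) for a discrete HMM via the scaled forward algorithm.

    start[s]    : probability of starting in hidden state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability of emitting symbol o while in state s
    """
    n_states = len(start)
    # Initialisation: alpha_1(s) = start[s] * emit[s][o_1]
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Induction: alpha_t(s) = sum_r alpha_{t-1}(r) * trans[r][s] * emit[s][o_t]
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                     for s in range(n_states)]
        # Normalise to avoid underflow on long traces; the logs of the
        # normalising constants sum to log P(obs | model)
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


if __name__ == "__main__":
    # Hypothetical 2-state model ("filling", "steady") over symbols 0=low, 1=medium, 2=high flow
    start = [0.9, 0.1]
    trans = [[0.6, 0.4], [0.1, 0.9]]
    emit = [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]
    print(forward_log_likelihood([0, 1, 2, 2, 2, 1, 0], start, trans, emit))
```

In a classifier of the kind described, one such model would be trained per end-use category and an event scored against each; the training step (e.g., Baum-Welch re-estimation, Baum et al., 1970) is not shown.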
Another important mathematical tool is the dynamic time warping (DTW) algorithm, a popular method for measuring the similarity between two time series of different lengths. In general, this is done by finding an optimal alignment between the two series subject to certain restrictions. The sequences are stretched or compressed in the time dimension so that their similarity can be measured independently of certain non-linear variations in timing (Myers and Rabiner, 1981). This technique has been widely applied in prototype selection (e.g., Nguyen et al., 2011), pattern recognition (e.g., Myers and Rabiner, 1981; Muller, 2007; Rabiner and Juang, 1993; Sakoe and Chiba, 1978; Manmatha and Srimal, 1999; Marquez, 2001) and word image searching (Manmatha and Rath, 2002). DTW plays an important role in this study because it is used to group similar unclassified events together in preparation for adaptive analysis.

Probability analysis was also applied in this study. Frequency histograms were formulated from the training data to examine the likelihood of particular event characteristics occurring. A histogram represents tabulated frequencies as adjacent rectangles erected over discrete intervals (bins, or clusters), with the area of each rectangle equal to the frequency of observations in its interval. The height of a rectangle is thus equal to the frequency density of the interval, i.e., the frequency divided by the width of the interval (Pearson, 1895). For example, suppose that we are given a vector containing the volumes, in litres, of 10 events:

$\mathbf{v} = [4.63, 4.57, 4.59, 4.63, 4.69, 4.75, 4.71, 4.79, 4.92, 4.76]$.

The volume distributions obtained using histograms with different numbers of clusters are presented in Figure 2.
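To make the alignment idea concrete, the sketch below computes the classic DTW distance between two flow traces by dynamic programming, using the absolute difference of flow rates as the local cost and no warping-window constraint. The cost function, the function name and the example traces are assumptions for illustration; the paper does not specify its exact DTW variant.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW distance with absolute-difference local cost.

    Fills the (len(a)+1) x (len(b)+1) cumulative-cost matrix; each cell adds the
    local cost to the cheapest of its three admissible predecessors, so traces of
    different lengths can be aligned by repeating samples of either one.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j - 1],  # both traces advance
                                     cost[i - 1][j],      # a advances, b's sample repeats
                                     cost[i][j - 1])      # b advances, a's sample repeats
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical flow traces with the same shape but different plateau lengths
    short_event = [0, 6, 9, 9, 9, 6, 0]
    long_event = [0, 6, 9, 9, 9, 9, 9, 6, 0]
    print(dtw_distance(short_event, long_event))
```

Because the longer plateau can be matched by repeating samples, the two traces above are at zero DTW distance even though a point-by-point comparison is impossible. On long traces, a Sakoe-Chiba band (Sakoe and Chiba, 1978) is a common way to restrict the warping path and reduce the O(nm) cost.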
[INSERT FIGURE 2]

In the present study, the distribution of each physical characteristic of a group of events (i.e., the volume, duration and flow rate of each water end-use category) is determined using the histogram method with 5 clusters, and the representative value of the group is taken as the entry with the highest frequency. The choice of 5 clusters rests on the fact that, after the grouping process, all events in each unclassified group have approximately the same volume, flow rate and duration (i.e., the values of each feature do not spread over a wide range), regardless of the end-use category to which they actually belong; 5 clusters are therefore sufficient to determine the representative values. For example, given a group from the tested home containing the 10 events whose volumes are listed above, the most typical volume representing this group is 4.6 L, because that bin attains the highest frequency (4 events) in a histogram with 5 clusters.

3. Classification model development

3.1. Collected data for the study

The data used to develop the model were sourced from 252 residential households, located in the urban south-east corner of the State of Queensland (SEQ), Australia, and fitted with a smart meter and data logger; data were collected in both summer and winter over two years, 2010 and 2011. These households were consenting participants in the recently completed South-east Queensland Residential End Use Study (SEQREUS), funded by the Queensland State Government (Beal and Stewart, 2011). The sample of properties used as the database for this study was taken from the four key cities in this interconnected SEQ region, namely the Sunshine Coast Regional Council, Brisbane City Council, Ipswich City Council and Gold Coast City Council areas.
The smart meters provided household water flow data at a resolution (0.014 L/pulse, logged every five seconds) sufficient to complete a water end-use or micro-component disaggregation process (i.e., each tap, shower, etc.). Participating households were also asked to take part in an appliance/fixture stock inventory audit and to complete a questionnaire survey designed to help determine the socio-demographic characteristics and socio-economic status of the households. The SEQREUS team required all such data to complete the water end-use disaggregation process manually, as well as for statistical analyses related to a number of that study's objectives (e.g., Carragher et al., 2012; Beal et al., 2011a; Beal and Stewart, 2013). That study's budget was in excess of US$1,000,000, with a reasonable proportion assigned to the water end-use analysis of a sample of 250 households; this is acceptable for a detailed research investigation, but the disaggregation process needs to be automated for widespread application. Nonetheless, this extensive dataset of high-resolution flow data and the associated water end-use event registry provided the training set for this study.

3.2. Stage 1: Non-adaptive classification model

With the data collected from SEQREUS available, the building of an autonomous flow trace analysis system commenced (Figure 3). In Stage 1 of the study, a single event analysis module was developed to categorise all unclassified single events that occur in isolation. In this module, HMM, DTW and an event time-of-day probability function were applied to autonomously assign all single events to appropriate categories, with an average accuracy of 84.1% (Nguyen et al., 2013a); this is slightly lower than the accuracy achieved for combined events, owing to the low recognition accuracy for the bathtub and irrigation categories.
A combined event analysis module (i.e., for a group of concurrent single events) was then developed; combined event disaggregation remains one of the most complicated problems in the field of pattern matching. Several techniques were employed to split apart the various events in a combined water use event, including HMM, the Gradient Vector Filtering method and different probability functions extracted from the various physical features of the existing database (Nguyen et al., 2013b). The classification outcomes showed that approximately 88% of the combined events were accurately disaggregated into their end-use components and then recognised.

[INSERT FIGURE 3]

3.3. Stage 2: Adaptive classification model

To examine its versatility, the classification model developed in Stage 1 was initially trialled in a region (Melbourne) different from that in which it was developed (SEQ, the source of the training data). Model accuracy dropped for some fundamental reasons, including the presence of new end-use categories that had not been identified in SEQ (e.g., evaporative air conditioners), as well as differences in water consumption behaviour for some end uses, which may be due to a range of macro factors (e.g., different climatic conditions, government policy). To overcome this challenge, we needed to build self-learning functionality into the model to make it adaptive to different regions. The objective of the research, and the focus of this paper, was therefore to integrate adaptive features into the model using appropriate techniques. The establishment of this critical analysis module is articulated in the next section.

4. Adaptive classification model development

4.1. Overview of model architecture

The adaptive module was developed to analyse all events exhibiting patterns that cannot be confidently recognised by the non-adaptive single and combined event modules.
Figure 4 provides an overview of the analysis procedure for the adaptive model.

[INSERT FIGURE 4]

As the first step of adaptive learning, when the model is operated in a new region, an HMM threshold value (explained in the next section) is applied to determine whether each event is classifiable by the existing non-adaptive analysis modules. Classifiable events are analysed by these modules (Stage 2a), while all unclassifiable events are processed by a newly developed adaptive analysis unit (Stage 2b). At the end of this process, all of the events processed in Stage 2b are incorporated into the existing database to improve the HMM classifier. Compared with other adaptive learning recognisers, the advantages of the proposed technique are its simple algorithms, fast analysis time and lower dependency on the existing database, as demonstrated in a later section of this paper. The detailed technical development of this analysis module is presented in Figure 5.

[INSERT FIGURE 5]

The main objective of this analysis module is to address unclassified events that cannot be analysed by the existing non-adaptive model. The first step is to group together all events that are likely to belong to the same category, using the DTW technique together with various physical features extracted from each event. This step produces several groups of similar unclassified events, plus a set of events that cannot be assembled into any group. Grouped and ungrouped events are then analysed using HMM, DTW, the event time-of-day probability function and a further set of physical parameters. Eventually all single events are classified, sometimes leaving an additional set of unclassified events that belong to a new end-use category.

4.2. New pattern identification using threshold values

The threshold values applied in this analysis were obtained by training HMMs on the existing database collected in SEQ. The threshold value for each end-use category is determined as follows. Let $S_i$ be the set of all single events belonging to category $i$, where $i = 1, 2, \dots, 7$ represents the shower, faucet, clotheswasher, dishwasher, toilet, bathtub and irrigation categories, respectively. $S_i$, collected in SEQ, is used as a database to establish an HMM representing this category, denoted $HMM_i$. The threshold value of category $i$, $TV_i$, is defined as the minimum likelihood score achieved when $HMM_i$ is used to recognise the events in $S_i$. Consequently, when the model is operated in a different area, if an event is assigned to category $i$ by the existing single event analysis module but its likelihood is less than $TV_i$, it is considered to have an unclassifiable pattern and is set aside for further analysis. Comparison against the threshold values thus yields a set containing all unclassified events that require the new analysis process.

4.3. Event grouping using DTW

Let $\mathbf{A} = (A_1, A_2, \dots, A_m)$ be a matrix containing $m$ unclassified events, where $A_k$ is the $k$th event of $\mathbf{A}$. The DTW distance between $A_k$ and $A_j$, denoted $D_{k,j}$, is computed using the DTW algorithm. Repeating this for every event yields a vector $\mathbf{D} = (D_{1,j}, \dots, D_{k,j}, \dots, D_{m,j})$ that measures the similarity of all events to Event $j$, an arbitrary event that can be selected randomly from the group under analysis. Nguyen et al. (2011) found that, for events $A_k$ and $A_l$, if

$$\left| \frac{D_{k,j} - D_{l,j}}{D_{k,j}} \right| < e,$$

where $e$ is the similarity threshold and $j$ is an arbitrary event, then events $A_k$ and $A_l$ are likely to have similar patterns.
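The threshold rule of Section 4.2 reduces to two small operations: take the minimum score that each category model assigns to its own training events, then set aside any new-region event whose score for its assigned category falls below that minimum. The sketch below assumes the scores have already been computed (for example by an HMM forward pass); the function names and all numbers are hypothetical.

```python
from typing import Dict, List, Mapping, Sequence, Tuple


def thresholds_from_training(train_scores: Mapping[str, Sequence[float]]) -> Dict[str, float]:
    """TV_i: the minimum likelihood that HMM_i assigns to its own training events S_i."""
    return {cat: min(scores) for cat, scores in train_scores.items()}


def split_by_threshold(events: Sequence[Tuple[str, float]],
                       thresholds: Mapping[str, float]) -> Tuple[List[int], List[int]]:
    """Split (assigned_category, likelihood) pairs into classifiable / unclassifiable indices.

    An event assigned to category i with likelihood below TV_i is set aside for the
    adaptive unit (Stage 2b); the others stay with the non-adaptive modules (Stage 2a).
    """
    classifiable: List[int] = []
    unclassifiable: List[int] = []
    for idx, (cat, score) in enumerate(events):
        (unclassifiable if score < thresholds[cat] else classifiable).append(idx)
    return classifiable, unclassifiable


if __name__ == "__main__":
    # Hypothetical training scores and new-region events
    tv = thresholds_from_training({"shower": [0.82, 0.61, 0.95], "toilet": [0.74, 0.90]})
    print(split_by_threshold([("shower", 0.70), ("shower", 0.40), ("toilet", 0.74)], tv))
    # -> ([0, 2], [1])
```

Using the strict inequality means an event scoring exactly $TV_i$ is still treated as classifiable, matching the "less than $TV_i$" wording in the text.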
However, because two completely different events could also have similar DTW distances to a reference event under this method, additional parameters are required for the technique to be applicable to this study. Each end-use event is described by four basic physical features: the volume ($v$), duration ($t$), maximum flow rate ($q_{max}$) and most frequent flow rate ($q_f$), which are computed for all events in $\mathbf{A}$. An aggregate DTW distance of Event $k$ to Event $j$, denoted $DA_{k,j}$, is determined as follows:

$$DA_{k,j} = D_{k,j} \, v_k \, q_{max,k} \, t_k \, q_{f,k} \tag{1}$$

Thus, if

$$RD = \left| \frac{DA_{k,j} - DA_{l,j}}{DA_{k,j}} \right| < e \tag{2}$$

then events $A_k$ and $A_l$ are assigned to the same group. This process yields a first group, denoted $g_1$, containing $p_1$ similar events whose relative differences in aggregate DTW distance (i.e., $RD$ in Equation 2) are less than the threshold value $e$. The same process is then applied to the remaining $(m - p_1)$ events to determine a second group, $g_2$, containing $p_2$ similar events. The grouping process is repeated until no more groups can be formed. The overall process results in $n$ groups of events, denoted $G = \{g_1, g_2, \dots, g_n\}$, each containing events with similar patterns, plus a set of dissimilar events that cannot be gathered together. Equation (2) shows that the choice of the threshold value $e$ affects the number of groups obtained, because all events with a relative difference of less than $e$ are assembled together. Balancing analysis time against convergence of the final classification accuracy, $e = 0.1$ was adopted for this study. The next analysis step is to classify these grouped and ungrouped events.

4.4. Analysis of unclassified grouped events

Grouped events are classified using the HMM technique in combination with other physical characteristics.
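The grouping loop of Equations (1) and (2) can be sketched as follows. Several details are not fixed by the text and are our assumptions: the reference event $j$ is the first event, each pass seeds a group with the first remaining event, a group needs at least two members, and events whose aggregate distance to the reference is zero (for which $RD$ is undefined) are grouped only with each other. A classic absolute-difference DTW is included so the block is self-contained; all names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Classic O(n*m) DTW with absolute-difference local cost and no window
    n, m = len(a), len(b)
    cost = [[float("inf")] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = abs(a[i - 1] - b[j - 1]) + min(
                cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def group_events(traces: Sequence[Sequence[float]],
                 features: Sequence[Tuple[float, float, float, float]],
                 e: float = 0.1,
                 ref: int = 0) -> Tuple[List[List[int]], List[int]]:
    """Group unclassified events by the relative difference of aggregate DTW distances.

    traces   : flow-rate sequence of each event
    features : (v, t, q_max, q_f) of each event
    e        : similarity threshold (0.1 in the paper); must be > 0
    ref      : index of the arbitrary reference event j (hypothetical choice)
    Returns (groups, ungrouped) as lists of event indices.
    """
    # Equation (1): DA_kj = D_kj * v_k * q_max_k * t_k * q_f_k
    da = [dtw_distance(tr, traces[ref]) * v * t * qmax * qf
          for tr, (v, t, qmax, qf) in zip(traces, features)]
    remaining = list(range(len(traces)))
    groups: List[List[int]] = []
    ungrouped: List[int] = []
    while remaining:
        seed = remaining[0]
        if da[seed] == 0.0:
            # RD is undefined for DA_kj = 0; only exact matches join (assumption)
            members = [l for l in remaining if da[l] == 0.0]
        else:
            # Equation (2): RD = |DA_kj - DA_lj| / DA_kj < e
            members = [l for l in remaining if abs(da[seed] - da[l]) / da[seed] < e]
        if len(members) > 1:
            groups.append(members)
        else:
            ungrouped.append(seed)  # here members == [seed]
        remaining = [l for l in remaining if l not in members]
    return groups, ungrouped
```

In this greedy form the seed order influences which events end up together; the text does not say how seeds are ordered or how borderline cases are resolved.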
The aggregate likelihood of each group being classified into category $i$ is determined using the following proposed equation:

$$LL_i = VS_i \, TS_i \, QfS_i \, HMMS_i \tag{3}$$

where:
• $i = 1, 2, \dots, 7$ represents the shower, faucet, clotheswasher, dishwasher, toilet, bathtub and irrigation categories, respectively;
• $VS$, $TS$ and $QfS$ are, respectively, the volume score, the duration score and the most frequent flow rate score, which are derived from the volume, duration and most frequent flow rate of the group under analysis; and
• $HMMS$ is the representative HMM likelihood of the group under analysis.

The group is then assigned to category $k$ if $LL_k$ is the maximum.

4.4.1. Determination of the required parameters

The volume score, duration score, most frequent flow rate score and representative HMM score (i.e., $VS$, $TS$, $QfS$ and $HMMS$ in Equation 3) are the four primary parameters employed to aid the classification process. Consider an unclassified group of 11 events (shown in Figure 6) that was obtained from the grouping process while verifying the proposed technique against an independent home in Melbourne. These parameters are obtained in the following key steps:

[INSERT FIGURE 6]

Step 1: Determine the volume, duration and most frequent flow rate of the events classified into each end-use category by the non-adaptive analysis, from which the distribution range and distribution probability of each of these features for category $i$ can be obtained using the event probability histogram method. At this step, a histogram with 10 clusters is applied to the shower, faucet, clotheswasher, bathtub and irrigation categories because their values are widely spread, and 5 clusters to the other end-use categories (dishwasher and toilet) because their values are concentrated. This step is undertaken only once for the recognition of all unclassified groups.
The following example illustrates the process of obtaining these parameters for the dishwasher category, based on all of the dishwasher events classified by the existing non-adaptive single event analysis module (Table 1). The distribution probability for each feature is determined using Equation 4:

$$DP_i = \frac{F_i}{N_i} \tag{4}$$

where:
• $DP_i$ is the distribution probability for category $i$;
• $F_i$ is the frequency vector of category $i$; and
• $N_i$ is the number of events already classified into category $i$.

[INSERT TABLE 1]

Step 2: Determine the volume ($v$), the most frequent flow rate ($q_f$) and the duration ($t$) of each event in the group under analysis. From these, the representative volume, most frequent flow rate and duration of the group, denoted $V_{rep}$, $F_{rep}$ and $T_{rep}$, are obtained using the event probability histogram method with 5 clusters. As mentioned previously, the representative values are those with the highest frequency. Steps 2 to 5 are illustrated in Table 2 with an example that determines the aggregate likelihood of the above-mentioned group being assigned to the dishwasher category.

Step 3: Compare $V_{rep}$, $F_{rep}$ and $T_{rep}$ with the distribution of each end-use category to obtain the corresponding distribution probabilities, which are the scores $VS$, $QfS$ and $TS$ in Equation 3. As presented in Table 2, the value of $V_{rep}$ (4.64) falls into the range 4.25-4.8 for the dishwasher (Table 1); therefore, $VS_4$ is 42.8, the distribution probability of this range (9/21, expressed as a percentage). In the same way, $TS_4$ and $QfS_4$ are determined to be 26.5 and 66.7, respectively.
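Steps 2 and 3 and Equation (4) all rest on the same equal-width histogram. The sketch below (all names hypothetical) finds a group's representative value as the centre of its most populated bin, normalises a category's bin frequencies by its number of classified events, and looks up the bin that contains a representative value. Reporting the bin centre, returning fractions rather than the percentages quoted in the text, and scoring out-of-range values as zero are our assumptions.

```python
from typing import List, Sequence, Tuple


def bin_counts(values: Sequence[float], n_bins: int) -> Tuple[float, float, List[int]]:
    """Equal-width histogram over [min, max]; the last bin also holds the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in values:
        b = int((x - lo) / width) if width > 0 else 0  # all-equal data -> bin 0
        counts[min(b, n_bins - 1)] += 1
    return lo, width, counts


def representative_value(values: Sequence[float], n_bins: int = 5) -> float:
    """Step 2: centre of the most populated bin (the first such bin wins ties)."""
    lo, width, counts = bin_counts(values, n_bins)
    best = max(range(n_bins), key=lambda b: counts[b])
    return lo + (best + 0.5) * width


def distribution(values: Sequence[float], n_bins: int) -> Tuple[float, float, List[float]]:
    """Equation (4): DP = F / N for one feature of one category."""
    lo, width, counts = bin_counts(values, n_bins)
    return lo, width, [f / len(values) for f in counts]


def feature_score(x: float, lo: float, width: float, dp: Sequence[float]) -> float:
    """Step 3: distribution probability of the bin containing x (e.g. VS_i for x = V_rep).

    Assumption (not covered by the text): values outside the observed range score 0.
    """
    if x < lo or x > lo + width * len(dp):
        return 0.0
    if width == 0:
        return dp[0]
    return dp[min(int((x - lo) / width), len(dp) - 1)]


if __name__ == "__main__":
    # The 10-event volume vector from Section 2.2 (litres)
    v = [4.63, 4.57, 4.59, 4.63, 4.69, 4.75, 4.71, 4.79, 4.92, 4.76]
    print(representative_value(v, 5))
```

For the 10-event vector of Section 2.2, the first of the 5 bins holds 4 events and its centre is about 4.605 L, which rounds to the 4.6 L quoted in the text. For the category distributions, `n_bins` would be 10 for shower, faucet, clotheswasher, bathtub and irrigation, and 5 for dishwasher and toilet.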
However, if any category $i$ has no single events classified by the non-adaptive analysis process, which is very unlikely, the distribution range and corresponding distribution probability required in the previous step cannot be determined for that category. In this case, $VS$, $TS$ and $QfS$ are set to 1, meaning that the HMM likelihood of the group being categorised as $i$ is not magnified by any factor. The final recognition accuracy is only slightly affected, because the verification process presented in Section 5 showed that the HMM method alone explains most of the unclassified events correctly.

Step 4: Determine the representative HMM likelihood of the group under analysis, which is the HMM score with the highest frequency when evaluated using the histogram method.

Step 5: Determine the aggregate likelihood of the unclassified group. In this example, with the values obtained for $VS_4$, $TS_4$, $QfS_4$ and $HMMS_4$, the overall likelihood of the group being classified as dishwasher is obtained using Equation 3.

[INSERT TABLE 2]

Following the same process from Step 1 to Step 5, the likelihoods of this group being assigned to the other end uses (shower, faucet, clotheswasher, toilet, bathtub and irrigation) can be obtained. In this example, the group was eventually assigned to the dishwasher category because the aggregate likelihood for this end use attained the maximum value.

4.4.2. Identification of a new end-use category

In the context of this study, if the representative likelihood of a group is less than one-twentieth of the HMM threshold value of the category to which the group is assigned (i.e., $HMMS_i < 0.05\, TV_i$), the group is considered to belong to a new end-use category.
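Once the four parameters are available for every category, Equation (3), the maximum-likelihood assignment and the new-category test of Section 4.4.2 combine into a few lines. The sketch assumes the scores are plain (non-logarithmic) positive likelihoods, as the one-twentieth rule implies, and expresses $VS$, $TS$ and $QfS$ as fractions rather than the percentages quoted in the worked example; `None` marks a category with no classified events, whose scores default to 1. The function and argument names are hypothetical.

```python
from typing import Dict, Mapping, Optional, Tuple

Scores = Tuple[Optional[float], Optional[float], Optional[float], float]


def classify_group(scores: Mapping[str, Scores],
                   thresholds: Mapping[str, float],
                   ratio: float = 0.05) -> str:
    """Equation (3) plus the new-category rule of Section 4.4.2.

    scores[i]     = (VS_i, TS_i, QfS_i, HMMS_i); None means category i had no
                    classified events, so that score defaults to 1 (no magnification)
    thresholds[i] = TV_i derived from the training data
    """
    ll: Dict[str, float] = {}
    for cat, (vs, ts, qfs, hmms) in scores.items():
        vs, ts, qfs = (1.0 if s is None else s for s in (vs, ts, qfs))
        ll[cat] = vs * ts * qfs * hmms  # Equation (3)
    best = max(ll, key=ll.get)
    # Section 4.4.2: HMMS_i < 0.05 * TV_i for the winning category -> new end use
    if scores[best][3] < ratio * thresholds[best]:
        return "new end-use category"
    return best


if __name__ == "__main__":
    # VS_4, TS_4, QfS_4 from the dishwasher example as fractions; HMMS values and
    # all toilet numbers are hypothetical
    group = {"dishwasher": (0.428, 0.265, 0.667, 0.8), "toilet": (0.1, 0.2, 0.3, 0.9)}
    print(classify_group(group, {"dishwasher": 0.5, "toilet": 0.6}))
    # -> dishwasher
```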
This ratio was based on the analysis of several houses in Melbourne; however, further research across a number of new regions is needed to establish more rigorously founded criteria for identifying new end-use categories accurately. If expanding the prototype database for a new end-use category proves difficult because examples of its use in households are scarce, established prototypes can be supplemented with manually entered ones verified through customer diaries or fixture-level sensors. The newly identified end-use categories are then incorporated into the existing prototype database so that future system performance can be improved.

4.5. Analysis of ungrouped events

Ungrouped events are analysed in a similar way to grouped events; however, the formula for the aggregate likelihood of a single event is modified, as presented in Equation 5:

$$LLu_i = VSu_i \, TSu_i \, QfSu_i \, \exp(P_t) \, HMMSu_i \tag{5}$$

where:
• $VSu$, $TSu$, $QfSu$ and $HMMSu$ are the volume score, duration score, most frequent flow rate score and representative HMM score of the event under analysis; and
• $P_t$ is the probability index of the event's time of occurrence during the day, derived from the already classified events of the household under analysis, as explained in Step 4 below.

An in-depth assessment of the training database found that most ungrouped events belong to the shower, faucet, abnormal toilet, bathtub and irrigation end-use categories, because these usually have highly variable patterns that cannot be gathered together. Therefore, the aggregate likelihood of an ungrouped event being categorised as clotheswasher or dishwasher (i.e., $LLu_3$ and $LLu_4$) is set to zero.
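Equation (5) differs from Equation (3) only by the bounded time-of-day magnifier $\exp(P_t)$ and by fixing the clotheswasher and dishwasher likelihoods at zero. A minimal sketch under the same assumptions as for grouped events (plain positive likelihoods, fractional scores); the names and scores are hypothetical, apart from $P_t = 0.0951$, the 6-7 am shower value read from Figure 7.

```python
import math
from typing import Dict, Mapping, Tuple

# Per Section 4.5, ungrouped events are never assigned to these categories (LLu_3 = LLu_4 = 0)
ZERO_FOR_UNGROUPED = frozenset({"clotheswasher", "dishwasher"})

EventScores = Tuple[float, float, float, float, float]


def ungrouped_likelihoods(scores: Mapping[str, EventScores]) -> Dict[str, float]:
    """Equation (5): LLu_i = VSu_i * TSu_i * QfSu_i * exp(P_t) * HMMSu_i.

    scores[i] = (VSu_i, TSu_i, QfSu_i, P_t, HMMSu_i), with P_t in [0, 1] so the
    time-of-day magnifier exp(P_t) stays between 1 and exp(1) ~ 2.72.
    """
    out: Dict[str, float] = {}
    for cat, (vs, ts, qfs, pt, hmms) in scores.items():
        out[cat] = 0.0 if cat in ZERO_FOR_UNGROUPED else vs * ts * qfs * math.exp(pt) * hmms
    return out


def classify_ungrouped(scores: Mapping[str, EventScores]) -> str:
    """Assign the event to the category with the largest LLu_i."""
    ll = ungrouped_likelihoods(scores)
    return max(ll, key=ll.get)


if __name__ == "__main__":
    event = {
        "shower": (0.50, 0.40, 0.30, 0.0951, 0.70),
        "faucet": (0.20, 0.20, 0.20, 0.30, 0.90),
        "dishwasher": (0.90, 0.90, 0.90, 0.50, 0.90),
    }
    print(classify_ungrouped(event))
    # -> shower
```

With $P_t = 0.0951$ the magnifier is only about 1.10, which illustrates how the exponential keeps the time-of-day contribution deliberately modest.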
$VSu$, $TSu$ and $QfSu$ are obtained in the same way as $VS$, $TS$ and $QfS$; however, $V_{rep}$, $F_{rep}$ and $T_{rep}$ are in this case the volume, most frequent flow rate and duration of the unclassified event itself. To obtain $LLu_i$ for each ungrouped event, the following process is conducted:

Step 1: Determine the distribution probabilities of all end-use categories based on all classified events, including those obtained from the grouped event analysis (e.g., if 34 shower events were obtained by the existing non-adaptive module and 20 by the grouped event analysis, then the range and distribution probability of the volume, duration and most frequent flow rate for this category are based on a total of 54 classified shower events). This step is also required only once for the recognition of all ungrouped events. Again, a histogram with 5 clusters is applied to the dishwasher and toilet categories, and 10 clusters to the other end-use categories.

Step 2: Determine the volume, the most frequent flow rate and the duration of the event under analysis (i.e., $V_{rep}$, $F_{rep}$ and $T_{rep}$).

Step 3: Compare the volume, duration and most frequent flow rate of the event with the distribution probability of each end-use category to obtain the corresponding values of $VSu$, $TSu$ and $QfSu$.

Step 4: Using the classified single events from all previous steps, determine a time-of-day probability index ($P_t$), shown in Figure 7 for the tested home. The time-of-day probability index is intended as an additional criterion for the classification process and is less critical than the other classification inputs; its weighting contribution towards the overall likelihood score has therefore been purposely kept limited.
In this study, an exponential function is used to limit the range of this magnifying factor to between 1 and $\exp(1) \approx 2.72$, corresponding to time probabilities of 0% and 100%, respectively. For example, if an event occurred between 6 and 7 am, its time-of-day probability ($P_t$) of being a shower is 0.0951, as presented in Figure 7.

Step 5: Determine the HMM likelihood of the event being assigned to each end-use category (i.e., $HMMSu$). Once all of the required parameters have been obtained, the aggregate likelihood of the unclassified event is determined using Equation 5.

[INSERT FIGURE 7]

4.6. Adaptive learning procedure

With the discovery of additional single water end-use events through the adaptive analysis processes described above, the classification model can be enhanced by incorporating these new prototype event variants into the existing prototype database. This procedure ensures that the prototype registry continuously grows and evolves to understand new variants of water end-use flow signatures. The model updating process used to determine a new HMM model ($\lambda$) is described in a series of steps in Appendix 1.

5. Model calibration and verification

The adaptive classification model was verified using the proposed technique against three randomly selected homes in Melbourne, a region new to the model. In this section, the existing model developed for SEQ (i.e., the non-adaptive model) is also compared with the new model by applying the SEQ-formulated model to the same three independent Melbourne homes. Homes 2 and 3 in this verification process contain an evaporative air conditioner, a new end-use category that was not included in the existing prototype database because the humid climate of SEQ is poorly suited to evaporative cooling.
Detailed test results for these homes are shown in Table 3, in terms of both the number and the volume of the events (denoted as ‘N’ and ‘V’, respectively).
[INSERT TABLE 3]
Applying the developed adaptive model to the three Melbourne homes produced very promising results, with an average volume-based recognition accuracy of at least 90% for the faucet, clothes washer, dishwasher, toilet and irrigation categories (i.e., 90.1%, 91.6%, 90.9%, 92.8% and 100%, respectively) and 81.5% for the shower. The corresponding event-based recognition accuracies for the first five categories listed in Table 3 (shower, faucet, clothes washer, dishwasher and toilet) were 81.5%, 89.7%, 89.1%, 91.4% and 86.2%, respectively. The classification of the bathtub end-use category remains a challenging problem, with low accuracy in terms of both the volume (20.6%) and the number of events (40.5%) recognised. However, when considering all end-use events occurring within these three homes over the two-week period, 85.7% were correctly classified using this autonomous recognition process. This is commendable given that these households were in a different region from the SEQ training dataset and that new end-use categories had to be created autonomously by the adaptive model.
Further testing also showed a considerable improvement of the new adaptive model over the existing model built using SEQ training data (Figure 8). Accuracy increased in most of the end-use categories, including shower, faucet, clothes washer, dishwasher and toilet. This improvement can be explained by the fact that the original model for the SEQ region included fixed boundary conditions for some physical characteristics derived from the SEQ database (e.g., a minimum shower volume of 7 litres) and applied time-of-day probability information obtained from the SEQ training dataset.
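Table 3 reports accuracy both by volume (V) and by number of events (N). The formulas are not spelled out here, so the sketch below assumes a natural reading: for each true end-use category, N is the percentage of its events that received the correct label and V is the percentage of its volume carried by those correctly labelled events, while the overall figure counts correctly classified events across all categories. The event tuples and function names are hypothetical.

```python
from collections import defaultdict


def category_accuracy(events):
    """Per-category (V %, N %) accuracy.

    events: iterable of (true_category, predicted_category, volume_litres).
    Assumed definitions: N = correctly labelled events / true events;
    V = volume of correctly labelled events / true volume.
    """
    n_total, n_ok = defaultdict(int), defaultdict(int)
    v_total, v_ok = defaultdict(float), defaultdict(float)
    for true_cat, pred_cat, volume in events:
        n_total[true_cat] += 1
        v_total[true_cat] += volume
        if pred_cat == true_cat:
            n_ok[true_cat] += 1
            v_ok[true_cat] += volume
    return {cat: (100.0 * v_ok[cat] / v_total[cat], 100.0 * n_ok[cat] / n_total[cat])
            for cat in n_total}


def overall_event_accuracy(events):
    """Percentage of all events, regardless of category, labelled correctly."""
    events = list(events)
    correct = sum(1 for true_cat, pred_cat, _ in events if pred_cat == true_cat)
    return 100.0 * correct / len(events)
```

For example, a 60 L shower classified correctly and a 40 L shower mislabelled as a faucet give the shower category V = 60% but N = 50%, which illustrates why the two measures can diverge.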
Therefore, applying these features to Melbourne caused a minor reduction in categorisation accuracy. Moreover, when Homes 2 and 3 were tested using the SEQ-based model, most of the evaporative air conditioner events were misclassified as faucet or toilet events, which lowered the accuracy for these end uses. In contrast, when threshold values are utilised, the proposed model assigns evaporative air conditioner events to a “new end-use category”, so they do not affect the recognition accuracy of the other categories.
[INSERT FIGURE 8]
With a gradual expansion of the database through the adaptive learning process, the newly developed model can perform effectively in any new region without manual calibration, provided that no new end-use category is present. The new threshold values derived from the expanded database will identify the large majority of events that can be classified using the non-adaptive model, whose effectiveness has been verified in Nguyen et al. (2013a, 2013b); the remaining variant events are left to the adaptive model. Where new end-use categories are present, new boundary values that determine whether an event belongs to an existing category or a new category should be re-identified and applied, as described in Section 4.4.2.
6. Residential water end use categorisation software application
6.1. Application outputs
By integrating and codifying the analytical processes contained in the single, combined and adaptive learning modules, a software application was developed that autonomously categorises remotely collected residential water consumption data received from smart meters into a repository of end-use events. The application offers several types of results presentation.
Figure 9a shows the main interface, which provides key information to the customer, such as a summary of the classified volume of each end-use category during a specified period, supported by a detailed description of the start time, end time, volume, duration, maximum flow rate and most frequent flow rate of each classified event. These results are obtained directly from the non-adaptive and adaptive analysis processes presented in previous sections, which employ techniques such as HMM, DTW, Gradient Vector Filtering, time-of-day probability functions and threshold values. Each classified event is then plotted on a time-series scale, with different colours corresponding to different end-use categories. The application also allows all analysed results to be exported to an Excel file so that various statistical calculations and studies can be performed on the raw flow rate series of each individual water end-use event for a particular household. Apart from the main interface, the analysis outcomes are also presented as a pie chart showing the percentage contribution of each category (i.e. the volume consumed in each category divided by the total volume consumed during the analysis period), which gives the user an instant overview of the end-use breakdown, and as a bar chart presenting the average water consumption of each category in litres per household per day (Figure 9b). It should be noted that the current prototype software application and the outputs presented here are intended to illustrate its functionality. Ultimately, the researchers aim to embed the software in a water business's consumption data repository, where it would autonomously process the water use data collected from the business's entire fleet of customer meters and deliver the processed end-use information in a user-friendly form back to each customer via the web on their computer or phone.
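The two summary outputs in Figure 9b can be derived directly from the repository of classified events. The sketch below assumes that each event is a (category, volume in litres) pair and that the length of the analysis period in days is known; the function name and inputs are hypothetical. It computes each category's percentage share of the total volume, as plotted in the pie chart, and its average consumption in litres per household per day, as plotted in the bar chart.

```python
from collections import defaultdict


def end_use_summary(events, n_days, n_households=1):
    """Return {category: (share_percent, litres_per_household_per_day)}.

    events: iterable of (category, volume_litres) for classified events.
    """
    volume_by_category = defaultdict(float)
    for category, volume in events:
        volume_by_category[category] += volume
    total = sum(volume_by_category.values())
    return {
        category: (100.0 * vol / total, vol / (n_days * n_households))
        for category, vol in volume_by_category.items()
    }
```

For example, 80 L of shower use and 20 L of toilet use recorded over two days in one household give shares of 80% and 20% and averages of 40 and 10 L per household per day.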
[INSERT FIGURE 9]
6.2. Optional manual adjustment functionality
The present software prototype does not yet achieve the level of autonomous end-use categorisation accuracy (>95%) considered necessary for commercial application. Therefore, it allows users to manually modify analysis decisions, such as changing, splitting or merging classified events. Whenever the user clicks on a classified event in the graphical display, all of the physical characteristics of the event are presented, with an option for manual modification (Figure 10a). The editing process is demonstrated in Figure 10b; once it has been completed, all of the edited events can optionally be added to the existing database to improve future classification accuracy. This function was deemed a necessary inclusion in the present prototype software application, but it will ultimately become redundant as more training data from a number of different regions enable the software to function with almost faultless accuracy. This is a key area of focus for the authors.
6.3. Daily end use diurnal demand functionality
Another useful output of a water end-use study is the daily end-use diurnal demand graph (Figure 10c). This graph can be created automatically from the repository of classified end-use events and is highly beneficial to both consumers and water businesses seeking to better understand how residential water is being used, at an end-use level, across various significant days of the year (i.e. average weekday, average weekend day, peak day and average day of the peak month). These data are particularly useful for water infrastructure planning (e.g. water pipe network augmentation planning), as they inform network modelling engineers of the peak demand flow rates as well as the key end uses contributing to that peak demand (e.g. morning shower use combined with clothes washer use contributes to the morning peak).
[INSERT FIGURE 10]
6.4. Benefits of the software application
Future research aims to further improve the current software by creating a user-friendly presentation of the produced information that can be accessed by both customers and water business professionals through a web portal available on a computer or smart phone (Figure 11). For the customer, informative reports and diagrams on the following, as a minimum, will be designed: (a) daily water usage broken down to end-use level for the past week, and the average for the past month; (b) water end-use comparisons against a customer-set budget, other households and best-practice benchmarks; and (c) leak alerts and descriptions of likely leak types, with guidance on corrective actions. For the water business professional, automatically generated reports on the following will be created as a minimum: (d) water end-use averages for single or multiple properties from different suburbs (e.g., comparing lower and higher socio-economic suburbs); (e) aggregated daily diurnal demand patterns and contributing end uses for specified days (e.g., peak day); and (f) water demand forecasting reports for selected regions based on the just-in-time water end-use data provided.
[INSERT FIGURE 11]
7. Conclusions, limitations and future directions
The development of an autonomous and intelligent system for residential water end-use classification will be of significant benefit to both water consumers and utilities. It allows individual consumers to log into their user-defined water consumption program to view daily, weekly and monthly consumption tables, as well as charts of their water demand across major end-use categories (e.g. leaks, clothes washer, shower, irrigation). It can also rapidly alert customers to leak events so that these can be addressed immediately, rather than waiting for the slow feedback provided by traditional metering technology (e.g. a quarterly bill).
This system will also help water businesses by rapidly providing water end-use reports for any desired property or suburb, thereby empowering them to develop more targeted conservation programs during periods of water scarcity, improve water demand forecasting and optimise pipe network modelling. All of these opportunities can be realised by the proposed prototype expert system and the associated software application, which integrates the single (Nguyen et al. 2013a), combined (Nguyen et al. 2013b) and adaptive (current paper) analysis modules for categorising residential water flow data into end-use event categories. The single event disaggregation model was comprehensively described in Nguyen et al. (2013a); it employed HMM, DTW, an event time-of-day probability function and other physical characteristics to assign an unclassified event to an appropriate water end-use category. The formulation of a combined event analysis module was the logical second stage of the research, since a reasonable proportion of residential water consumption occurs simultaneously (Nguyen et al., 2013b). This module utilises a hybrid combination of HMM, the gradient vector filtering method, threshold values and various physical features to disaggregate combined events into several classified single events. The present analytical stage of this overall research project, which is the focus of this paper, aimed to ensure that the model could adapt to and self-learn variant water flow signature characteristics in different cities and regions without re-training or calibrating with each region's dataset. Through the application of HMM, DTW, threshold values and other physical features, the adaptive function has been successfully developed, allowing the system to effectively analyse data from any new residential house in different regions.
A verification process undertaken to assess the model's capability produced very promising outcomes, with most of the recognition accuracies for the individual end-use categories being approximately 90%. Following completion of this function, a user-friendly automatic flow trace analysis application was developed that integrates all available analysis modules into one comprehensive residential water end-use event pattern recognition system. The only notable limitation of the adaptive module was its low recognition accuracy for bathtub events (20.6% by volume and 40.5% by number of events; Table 3). While the present prototype software application is sufficient for conducting automated residential water end-use analysis with an average recognition accuracy of 80-90%, the system still requires human input to achieve very high levels of recognition accuracy. Ultimately, accuracy in the order of 95-100% is required for commercially released software. Therefore, the researchers have proposed a future research program that includes the following key tasks:
1) Apply genetic algorithms to explore the optimum states for the existing HMM classifier, which may enhance accuracy and efficiency.
2) In addition to HMM and DTW, incorporate an Artificial Neural Network (ANN) with a back-propagation algorithm to analyse the physical characteristics of each collected event (i.e. volume, duration and flow rate), which is likely to have a significant impact on recognition accuracy.
3) Apart from the water end-use event time-of-day likelihood functions that have been previously applied, examine other decision support parameters (e.g. social and demographic information) to improve accuracy.
4) Further train the analysis system using new water end-use databases from other regions (e.g. Melbourne, Adelaide, United Kingdom) in order to improve its accuracy and sufficiently cater for different end-use categories (e.g. evaporative air conditioners).
References
Baum, L.E., Petrie, T. (1966). Statistical inference for probabilistic functions of finite state Markov chains. The Annals of Mathematical Statistics, 37(6), 1554-1563. DOI: 10.1214/aoms/1177699147.
Baum, L.E., Petrie, T., Soules, G., Weiss, N. (1970). A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1), 164-171. DOI: 10.1214/aoms/1177697196.
Beal, C., Stewart, R.A. (2011). South East Queensland residential end use study: final report. Technical Report No. 47 for the Urban Water Security Research Alliance. Griffith University and Smart Water Research Centre, January 2012.
Beal, C., Stewart, R.A., Huang, T.T., Rey, E. (2011a). SEQ residential end use study. Journal of the Australian Water Association, 38(1), 80-84.
Beal, C.D., Stewart, R.A. (2013). Identifying residential water end uses underpinning peak day and peak hour demand. ASCE Journal of Water Resources Planning and Management.
Carragher, B.J., Stewart, R.A., Beal, C.D. (2012). Quantifying the influence of residential water appliance efficiency on average day diurnal demand patterns at an end use level: a precursor to optimised water service infrastructure planning. Resources, Conservation and Recycling, 62, 81-90.
Chien, J.-T., Wang, H.-C. (1997). Telephone speech recognition based on Bayesian adaptation of hidden Markov models. Speech Communication, 22, 369-384.
Cho, W., Lee, S.W., Kim, J.H. (1995). Modelling and recognition of cursive words with HMM. Pattern Recognition, 28(12), 1941-1953.
Ephraim, Y., Merhav, N. (2002). Hidden Markov processes. IEEE Transactions on Information Theory, 48, 1518-1569.
Ghahramani, Z., Jordan, M.I. (1997). Factorial hidden Markov models. Machine Learning, 29(2/3), 245-273. DOI: 10.1023/A:1007425814087.
Loh, M., Coghlan, P. (2003). Domestic water use study in Perth, Western Australia 1998 to 2000. Water Corporation of Western Australia.
Manmatha, R., Srimal, N. (1999). Scale space technique for word segmentation in handwritten manuscripts. In: Proc. 2nd Int’l Conf. on Scale-Space Theories in Computer Vision, Corfu, Greece, September 26-27, 1999, pp. 22-33.
Manmatha, R., Rath, T.M. (2002). Word image matching using dynamic time warping. Technical Report, Multi-Media Indexing and Retrieval Group, Center for Intelligent Information Retrieval, University of Massachusetts.
Makki, A., Stewart, R.A., Panuwatwanich, K., Beal, C. (2011). Revealing the determinants of shower water end use consumption: enabling better targeted urban water conservation strategies. Journal of Cleaner Production. DOI: 10.1016/j.jclepro.2011.08.007.
Marquez, J.P. (2001). Pattern recognition: concepts, methods and applications. Springer, ISBN: 3-540-422978.
Muller, M. (2007). Information Retrieval for Music and Motion, Chapter 4. Springer, ISBN 978-3-540-74047-6.
Myers, C.S., Rabiner, L.R. (1981). A comparative study of several dynamic time-warping algorithms for connected word recognition. The Bell System Technical Journal, 60, 1389-1409.
Nguyen, K.A., Zhang, H., Stewart, R.A. (2011). Application of Dynamic Time Warping algorithm in prototype selection for the disaggregation of domestic water flow data into end use events. Proceedings of the 34th World Congress of the International Association for Hydro-Environment Engineering and Research, pp. 2137-2144, Brisbane, Australia, 26 June-1 July 2011.
Nguyen, K.A., Zhang, H., Stewart, R.A. (2013a). Development of an intelligent model to categorise residential water end use events. Journal of Hydro-Environment Research. DOI: 10.1016/j.jher.2013.02.004.
Nguyen, K.A., Zhang, H., Stewart, R.A. (2013b). Intelligent pattern recognition model to automate the categorisation of residential water end-use events. Environmental Modelling and Software [under review].
Pearson, K. (1895). Contributions to the mathematical theory of evolution. II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 186, 343-414.
Rabiner, L., Juang, B. (1993). Fundamentals of Speech Recognition. Prentice-Hall, Inc., Chapter 4.
Rabiner, L.R. (1990). A tutorial on hidden Markov models and selected applications in speech recognition. In: Readings in Speech Recognition. Morgan Kaufmann Publishers Inc.
Sakoe, H., Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing, 26(1), 43-49.
Satish, L., Gururaj, B.I. (2003). Use of hidden Markov models for partial discharge pattern classification. IEEE Transactions on Dielectrics and Electrical Insulation.
Starner, T., Pentland, A. (1995). Real-time American Sign Language visual recognition from video using hidden Markov models. Master's Thesis, MIT, Program in Media Arts.
Stewart, R.A., Willis, R.M., Giurco, D., Panuwatwanich, K., Capati, B. (2010). Web-based knowledge management system: linking smart metering to the future of urban water planning. Australian Planner, 47(2), 66-74.
Stewart, R.A., Willis, R.M., Panuwatwanich, K., Sahin, O. (2011). Showering behavioural response to alarming visual display monitors: longitudinal mixed method study. Behaviour & Information Technology, pp. 1-17. DOI: 10.1080/0144929X.2011.577195.
Tapia, E., Intille, S.S., Larson, K. (2004). Activity recognition in the home using simple and ubiquitous sensors. In: Ferscha, A., Mattern, F. (eds.) PERVASIVE 2004. LNCS, vol. 3001, pp. 158-175. Springer, Heidelberg.
Willis, R.M., Stewart, R.A., Panuwatwanich, K., Capati, B., Giurco, D. (2009a). Gold Coast domestic water end-use study. Water, Journal of the Australian Water Association, 36(6), 9-85.
Willis, R.M., Stewart, R.A., Panuwatwanich, K., Williams, P.R., Hollingsworth, A.L. (2011a). Quantifying the influence of environmental and water conservation attitudes on household end-use water consumption. Journal of Environmental Management. DOI: 10.1016/j.jenvman.2011.03.023.
Willis, R.M., Stewart, R.A., Giurco, D., Talebpour, M.R., Mousavinejad, A. (2011b). End use water consumption in households: impact of socio-demographic factors and efficient devices. Journal of Cleaner Production, in press. DOI: 10.1016/j.jclepro.2011.08.006.
Appendices
Appendix 1
Step 1: Retrieve the initial state probability $\pi_i$, the state transition probability $a_{ij}$ and the observation probability $b_j(o_k)$ of the current model to use as starting probabilities. It should be noted that the subscript $i$ here indicates a state in the HMM training algorithm, not an end-use category as in previous sections.
Step 2: Based on the above values, determine the following parameters:
• $\alpha_t(i)$: the probability of the flow rates $o_1$ through $o_t$ and of being in state $i$ at time $t$ ($q_t = i$), given the HMM ($\lambda$):
$$\alpha_t(i) = P(o_1 o_2 \ldots o_t,\ q_t = i \mid \lambda) \qquad (6)$$
• $\beta_t(i)$: the probability of the flow rates $o_{t+1}$ through $o_T$, given the HMM ($\lambda$) and given that the model is in state $i$ at time $t$ ($q_t = i$):
$$\beta_t(i) = P(o_{t+1} o_{t+2} \ldots o_T \mid q_t = i,\ \lambda) \qquad (7)$$
• $\gamma_t(i)$: the probability of being in state $i$ at time $t$, given a water flow sequence ($O$) and the HMM ($\lambda$):
$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)} \qquad (8)$$
• $\xi_t(i,j)$: the probability of being in state $i$ at time $t$ and in state $j$ at time $t+1$, given a water flow sequence ($O$) and the HMM ($\lambda$):
$$\xi_t(i,j) = \frac{P(q_t = i,\ q_{t+1} = j,\ O \mid \lambda)}{P(O \mid \lambda)} \qquad (9)$$
where
$$P(O \mid \lambda) = \sum_{k=1}^{N} \sum_{p=1}^{N} \alpha_t(k)\, a_{kp}\, b_p(o_{t+1})\, \beta_{t+1}(p)$$
and
$$P(q_t = i,\ q_{t+1} = j,\ O \mid \lambda) = \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$
Step 3: Calculate the following parameters for each water flow sequence ($O$):
• $\sum_{t=1}^{T} \gamma_t(i)$: expected number of times in state $i$ for the water flow sequence ($O$)
• $\sum_{t=1}^{T-1} \gamma_t(i)$: expected number of transitions from state $i$ for the water flow sequence ($O$)
• $\sum_{t=1}^{T-1} \xi_t(i,j)$: expected number of transitions from state $i$ to state $j$ for the water flow sequence ($O$)
Step 4: With the values calculated in Step 3, the probabilities $\pi_i$, $a_{ij}$ and $b_j(o_k)$ can be updated using Equations 10 to 12:
$$\bar{\pi}_i = \gamma_1(i) \qquad (10)$$
$$\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \qquad (11)$$
$$\bar{b}_j(o_k) = \frac{\sum_{t=1,\ \text{such that } o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \qquad (12)$$
It should be noted that $\pi_i$, $a_{ij}$ and $b_j(o_k)$ are updated every time a new event is introduced to the existing HMM model for training. At the end of this process, a new HMM model ($\lambda$) is obtained that covers both the existing and the new database.
Figure captions
Figure 1 Overview of proposed autonomous and intelligent water management system
Figure 2 Example of frequency histogram with different number of clusters
Figure 3 Flowchart of the water end-use classification process
Figure 4 Flowchart of adaptive model sequence
Figure 5 Adaptive model development
Figure 6 Example of an unclassified group of events
Figure 7 Example of time of day probability for one particular home
Figure 8a Adaptive and non-adaptive model comparison in terms of number of events
Figure 8b Adaptive and non-adaptive model comparison in terms of volume
Figure 9a Software application main interface
Figure 9b Software application water end use pie and bar chart outputs
Figure 10a Optional manual override reclassification of system classified event
Figure 10b Optional manual splitting of combined event into single event categories
Figure 10c Software application output of water end use daily diurnal demand pattern
Figure 11 Proposed web interface application to customer and water utility
Table 1 Example of probability distribution for the dishwasher end use category
Feature | Row | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5
Volume | Range (L) | 2.62-3.17 | 3.17-3.71 | 3.71-4.25 | 4.25-4.8 | 4.8-5.3
Volume | Frequency (No.) | 4 | 1 | 6 | 9 | 1
Volume | Distribution probability (%) | 19.1 | 4.8 | 28.5 | 42.8 | 4.8
Duration | Range (s) | 80-93 | 93-106 | 106-119 | 119-132 | 132-145
Duration | Frequency (No.) | 2 | 4 | 1 | 6 | 9
Duration | Distribution probability (%) | 9.5 | 19.1 | 4.8 | 28.5 | 42.8
Mode flow | Range (L/min) | 2.3-2.7 | 2.7-3.1 | 3.1-3.5 | 3.5-3.9 | 3.9-4.3
Mode flow | Frequency (No.) | 14 | 1 | 1 | 1 | 4
Mode flow | Distribution probability (%) | 66.7 | 4.7 | 4.7 | 4.7 | 19.2
Table 2 Determination of the aggregate likelihood of the grouped events to be classified to dishwasher category
Step | Parameter | Event 1 | Event 2 | Event 3 | Event 4 | Event 5 | Event 6 | Event 7 | Event 8 | Event 9 | Event 10 | Event 11 | Representative value
Step 2 | 𝑣 (L) | 4.63 | 4.57 | 4.59 | 4.63 | 4.69 | 4.75 | 4.71 | 4.79 | 4.92 | 4.76 | 5.18 | 4.64
Step 2 | 𝑡 (s) | 125 | 125 | 125 | 125 | 130 | 130 | 130 | 135 | 135 | 135 | 140 | 126.5
Step 2 | 𝑞𝑓 (L/min) | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3 | 2.3
Step 3 | 𝑉𝑠,4 (%) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 42.8
Step 3 | 𝑇𝑠,4 (%) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 26.5
Step 3 | 𝑄𝑓,4 (%) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 66.7
Step 4 | 𝐻𝑀𝑀4 (×10^-8) | 7.4 | 54.5 | 48.9 | 16.2 | 75.8 | 19.9 | 6.5 | 2.1 | 4.0 | 3.3 | 13.2 | 9.5
Step 5 | 𝐿𝐿4 (×10^-4) | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | 71.86
Table 3 End use event categorisation accuracy (%) using adaptive and non-adaptive models
Adaptive model accuracy (%)
End use category | Home 1 V | Home 1 N | Home 2 V | Home 2 N | Home 3 V | Home 3 N | Average V | Average N
Shower | 88.9 | 78.5 | 76.4 | 80.6 | 79.2 | 85.4 | 81.5 | 81.5
Faucet | 97.0 | 93.4 | 77.3 | 79.3 | 95.2 | 96.3 | 90.1 | 89.7
Clotheswasher | 96.7 | 90.1 | 86.3 | 81.8 | 91.8 | 95.5 | 91.6 | 89.1
Dishwasher | 96.7 | 94.3 | 85.1 | 88.4 | 0 | 0 | 90.9 | 91.4
Toilet | 91.4 | 87.8 | 89.8 | 86.5 | 97.1 | 84.4 | 92.8 | 86.2
Irrigation | N/A | N/A | N/A | N/A | 100 | 100 | 100 | 100
Bathtub | 20.6 | 40.5 | N/A | N/A | N/A | N/A | 20.6 | 40.5
Non-adaptive model accuracy (%)
End use category | Home 1 V | Home 1 N | Home 2 V | Home 2 N | Home 3 V | Home 3 N | Average V | Average N
Shower | 76.9 | 73.6 | 81.5 | 75.8 | 79.2 | 82.3 | 79.2 | 77.2
Faucet | 93.6 | 90.4 | 68.9 | 78.3 | 86.9 | 84.3 | 83.1 | 84.3
Clotheswasher | 85.2 | 83.1 | 82.6 | 81.8 | 84.2 | 85.5 | 84.0 | 83.4
Dishwasher | 85.6 | 83.3 | 78.5 | 80.4 | 0 | 0 | 82.1 | 81.8
Toilet | 78.6 | 75.3 | 70.2 | 74.6 | 72.1 | 75.4 | 73.6 | 75.1
Irrigation | N/A | N/A | N/A | N/A | 100 | 100 | 100 | 100
Bathtub | 20.6 | 40.5 | N/A | N/A | N/A | N/A | 20.6 | 40.5
Note: End use event categorisation accuracy is given by V (volume of end use correctly classified) and N (number of end use events correctly classified).
3.1. Collected data for the study