Citation: Nguyen, K.A., Stewart, R.A., Zhang, H. (2013). An autonomous and intelligent expert 
system for residential water end-use classification, Expert Systems with Applications, 
http://dx.doi.org/10.1016/j.eswa.2013.07.049. 

 
AN AUTONOMOUS AND INTELLIGENT EXPERT SYSTEM FOR RESIDENTIAL WATER END-USE CLASSIFICATION 

Abstract 

Intelligent metering technology combined with advanced numerical techniques enables a 

paradigm shift in the current level of water consumption information provision that is available 

to the customer and the water business. The aim of this study was to develop an autonomous 

and intelligent system for residential water end-use classification that could interface with 

customers and water business managers via a user-friendly web-based application. Water flow 

data collected directly from smart water meters includes both single (e.g., a shower event 

occurring alone) and combined (i.e., an event that comprises several overlapping single events) 

water end use events. The authors recently developed intelligent algorithms to solve the 

complex problem of autonomously categorising residential water consumption data into a 

registry of single and combined events using a hybrid combination of techniques including 

Hidden Markov Model (HMM), Dynamic Time Warping (DTW) algorithm, time-of-day 

probability functions, threshold values and various physical features. However, one issue 

remained, and it is the focus of this paper: how to integrate self-learning functionality 

into the envisioned expert system so that it can learn from datasets newly collected in 

cities, regions and countries other than those that supplied the training data. Such 

versatility and adaptive capacity are essential to make the expert system widely applicable. 

Through applying alternate forms of HMM and DTW in association with a frequency analysis 

technique, a suitable self-learning methodology was formulated and tested on three 

independent households located in Melbourne, Australia, with a prediction accuracy of 

80-90% for the major end-use categories. The three principal flow data processing modules 

(i.e. single and combined event recognition and self-learning function) were integrated into a 

prototype software application for performing autonomous water end-use analysis and its 

functionality is presented in the latter sections of this paper. The developed expert system has 

profound implications for government, water businesses and consumers seeking to better 

manage precious urban water resources. 


Key words: water end-use event, water micro-component, residential water flow trace 

disaggregation, hidden Markov model, dynamic time warping algorithm, gradient vector 

filtering, adaptive analysis, adaptive function, water demand management 

1. Introduction 

Following a long-standing drought for the second half of the last decade across most of 

Australia, most capital cities introduced a portfolio of water demand management strategies 

and constructed capital intensive rain-independent bulk supply sources to ensure the provision 

of a secure water supply (Willis et al., 2009a). Residential water consumption is often 

dependent on the water using fixtures or appliances within a dwelling, the household makeup, 

the regional location and a plethora of socio-demographic influences. A study of end-use water 

consumption helps water planners and consumers identify where and when water is used in a 

household and, hence, assists in driving proactive reductions in consumption (Loh and 

Coghlan, 2003; Stewart et al., 2010; Makki et al., 2011). However, the existing water end-use 

classification techniques require an extensive use of human resources to collect a combination 

of water use behaviours and appliance/fixture stock inventory data through a household audit 

followed by 2-3 hours of analyst time for each home (Stewart et al., 2011; Beal and Stewart, 

2011). Presently, water end use or micro-component studies are restricted to the research 

domain, because citywide studies are not economically viable given the resource 

intensity of the flow data classification process. Intelligent and autonomous end use 

classification firmware is required, along with bold large-scale roll-outs of commercially 

available high resolution smart water meters, in order to bring this level of water consumption 

information to the masses. Currently, an increasing number of smart water metering 

technologies have been introduced to the market. Such metering devices embrace two distinct 

elements: meters that use new technology to capture water use information and communication 

systems that can capture and transmit real-time water use information (Stewart et al., 2010). 

These forms of smart metering technology can provide total consumption data to the customer 

and utility at high levels of resolution; however, they fail to disaggregate this data into its 

end-use categories.  

 
The present study attempts to automate the domestic water end-use classification process 

and, thus, to enhance current practices in the urban water industry. To this end, a robust 

hybrid model that employs HMM, DTW and event probability techniques is developed. The 

proposed system will allow individual consumers to log into their user-defined water 


consumption web page to view their daily, weekly, and monthly consumption tables as well as 

charts on their water demand across major end-use categories (e.g., leaks, clothes washer, 

shower, irrigation). This system can rapidly alert customers of leak events so that they can 

immediately be addressed rather than waiting for the present slow feedback process from the 

traditional metering technology (e.g., the quarterly bill). The system will also benefit water 

businesses by rapidly providing water end-use reports of any desired property or suburb, 

thereby empowering them to develop more targeted conservation programs in water scarcity 

periods (e.g. Willis et al., 2011a; 2011b), improved water demand forecasting (e.g. Makki et 

al., 2011) and optimised pipe network modelling (e.g. Carragher et al., 2012; Beal and Stewart, 

2013). Figure 1 summarises the three key stages in the development of this system: 

• Stage 1: Develop a non-adaptive intelligent model that autonomously disaggregates 

water flow trace signatures collected from the intelligent water 

meters into a categorised registry of water end-use events (Nguyen et al., 2013a, 

2013b). 

• Stage 2: Equip the model with adaptive capabilities that enable it to interpret untrained 

water end-use signature traces, thereby allowing it to adapt to a new context 

(e.g. a different city from that of the training dataset). 

• Stage 3: Develop an intelligent and user-friendly expert system and prototype firmware 

for use by consumers and businesses. 

 
[INSERT FIGURE 1] 

 
2. Background 
2.1. Existing water metering process and new paradigm 

Water consumption readings are usually recorded manually on a quarterly or half yearly basis. 

Under most situations, a whole year’s worth of water consumption data is described by only 

two to four data points in the water business's billing system. Conventional water meters count 

each kilolitre of water as it passes through the meter and do not have the ability to record when 

(i.e., the time of day) and where the consumption takes place (e.g., washing machine, leaks) 

(Stewart et al., 2011). These systems produce limited and delayed water consumption 

information. The current water metering system does not typically provide real-time or 

continuous/frequent water consumption data, and in cases where it does, it does not provide a 

sufficient level of data resolution to allow water end-use event categorisation. While real-time 


or near real-time water consumption data provisioning is now commercially viable with 

current smart metering technology, there is presently no firmware that can autonomously 

disaggregate this flow data into the ‘richer’ water end use categories of consumption. Until 

such firmware is developed, powerful water end use information will be confined to 

expensive research studies (e.g. Beal and Stewart, 2011).  

 
2.2.  Intelligent system development using various pattern recognition techniques 
To overcome these limitations, intelligent metering technology is united with advanced pattern 

recognition techniques to enable a paradigm shift in the current level of water information 

provision available to the customer and water business. The aim of this project is to develop an 

autonomous and intelligent system for residential water end-use classification through the 

employment of various mathematical techniques, namely HMM, DTW and frequency analysis 

as presented below. 

 
A hidden Markov model (HMM) is a statistical Markov model in which the system being 

modelled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be 

considered to be the simplest dynamic Bayesian network, which is one of the most popular 

techniques in the field of handwriting and speech recognition (Ephraim and Merhav, 2002). 

Principal theories and typical applications of this technique have been presented in Baum and 

Petrie (1966), Starner and Pentland (1995), Baum et al. (1970), Cho et al. (1995), Ghahramani 

and Jordan (1997), Chien and Wang (1997), Satish and Gururaj (2003) and Tapia (2004). In this 

study, HMM was utilised as the main classifier for water end use classification decision 

making. 
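
As an illustration of how an HMM assigns a likelihood score to an event, the sketch below implements the scaled forward algorithm for a discrete-observation HMM and picks the category whose model scores highest. The discretisation of flow rates into symbols (0 = low flow, 1 = high flow), the toy model parameters and the names `forward_log_likelihood`, `classify` and `MODELS` are assumptions made for illustration only; the HMM configuration actually used in the study is described in Nguyen et al. (2013a) and is not reproduced here.

```python
import math

def forward_log_likelihood(obs, start, trans, emit):
    """Log-likelihood of a discrete observation sequence (scaled forward algorithm)."""
    n_states = len(start)
    # Initialise with the start distribution and the first observation
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then weight by the emission
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
    return log_lik

def classify(obs, models):
    """Return the category whose HMM gives the highest log-likelihood, plus all scores."""
    scores = {name: forward_log_likelihood(obs, *params) for name, params in models.items()}
    return max(scores, key=scores.get), scores

# Hypothetical two-state models: (start, transition, emission) per category
MODELS = {
    "sustained_high_flow": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.2, 0.8], [0.1, 0.9]]),
    "sustained_low_flow": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.8, 0.2], [0.9, 0.1]]),
}

best, scores = classify([1, 1, 1, 0, 1], MODELS)
print(best)  # sustained_high_flow
```

Working in log-likelihoods is a design choice of this sketch: long flow traces would otherwise drive raw probabilities towards zero.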

Another important mathematical tool is the Dynamic time warping (DTW) algorithm, which is 

a popular method for measuring the similarity between two time series of different lengths. In 

general, this task is performed by finding an optimal alignment between two series with certain 

restrictions. The sequences are extended or shortened in the time dimension to determine a 

measure of their similarity independent of certain non-linear variations in the time dimension 

(Myers and Rabiner, 1981). This technique has been widely applied in prototype selection (e.g. 

Nguyen et al., 2011), pattern recognition (e.g. Myers and Rabiner, 1981; Muller, 2007; Rabiner 

and Juang, 1993; Sakoe and Chiba, 1978; Manmatha and Srimal, 1999; and Marquez, 2001) or 

word image searching (Manmatha and Rath, 2002). DTW played an important role in this 


study because it was used to group similar unclassified events together in 

preparation for adaptive analysis.  
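
To make the alignment idea concrete, the following minimal sketch computes the classic dynamic-programming DTW distance between two flow-rate series of different lengths. It assumes an absolute-difference local cost and no warping-window restriction; the study does not state which local cost or restrictions were applied, and the function name `dtw_distance` and the sample traces are illustrative.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Two hypothetical traces with the same shape but different durations
short_trace = [0.0, 2.0, 2.0, 0.0]
long_trace = [0.0, 2.0, 2.0, 2.0, 2.0, 0.0]
print(dtw_distance(short_trace, long_trace))  # 0.0
```

The zero distance shows why DTW suits end-use events: the same fixture run for a longer time is still recognised as the same pattern.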

Probability analysis was also applied. For the purpose of this study, frequency 

histogram data distributions were formulated from the training data to examine the likelihood 

of event characteristics occurring. A histogram comprises tabular frequencies, shown as 

adjacent rectangles, which are erected over discrete clusters (bins), with an area equal to the 

frequency of the observations in the interval. The height of a rectangle is also equal to the 

frequency density of the interval, i.e., the frequency divided by the width of the interval 

(Pearson, 1895). For example, suppose that we are given a vector that contains the volumes of 

10 events in litres, as follows:  

𝒗 = [4.63, 4.57, 4.59, 4.63, 4.69, 4.75, 4.71, 4.79, 4.92, 4.76]. The volume distributions 

obtained using histograms with different numbers of clusters are presented in Figure 2. 
 

[INSERT FIGURE 2] 

In the present study, a distribution of all of the physical characteristics of a group of events (i.e., 

the volume, duration and flow rate of each water end use category) will be determined using 

the histogram method with 5 clusters, from which the representative value of each group can be 

obtained by selecting the entry that has the highest frequency. It should be noted that the 

selection of 5 clusters is based on the fact that, after the grouping process, all events in 

each unclassified group will have approximately the same volume, flow rate and duration (i.e. 

the values of each feature do not spread over a wide range) regardless of the end use category 

they actually belong to; therefore, 5 clusters are sufficient to determine the representative values. 

Given a group extracted from the tested home that contains 10 events whose volumes are 

presented above, the most typical volume representing this group is 4.6 L because it attains the 

highest frequency of 4 when using a histogram of 5 clusters. 
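
The worked example above can be reproduced with a short sketch. It assumes equal-width clusters spanning the minimum to the maximum value and takes the centre of the most populated cluster as the representative value; the binning convention and the name `histogram_mode` are assumptions, since the text does not spell them out.

```python
def histogram_mode(values, bins=5):
    """Return (centre of the most frequent bin, its frequency) for equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo, len(values)  # all values identical
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum value is placed in the last bin rather than a sixth one
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    best = max(range(bins), key=counts.__getitem__)
    return lo + (best + 0.5) * width, counts[best]

volumes = [4.63, 4.57, 4.59, 4.63, 4.69, 4.75, 4.71, 4.79, 4.92, 4.76]
rep, freq = histogram_mode(volumes, bins=5)
print(round(rep, 1), freq)  # 4.6 4
```

Under these assumptions the first cluster (roughly 4.57 to 4.64 L) holds four events, which gives the representative volume of about 4.6 L quoted in the text.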

3. Classification model development  
3.1.  Collected data for the study 

Data utilised for the development of the model is sourced from 252 residential households 

fitted with a smart meter and data logger and located in the urban south-east corner of the State 

of Queensland (SEQ), Australia, in both summer and winter for 2 years, 2010 and 2011. These 

households are consenting participants in the recently completed South-east Queensland 


Residential End Use Study (SEQREUS) that was funded by the Queensland State Government 

(Beal and Stewart, 2011). A sample of properties is taken from the four key cities in this 

interconnected SEQ region, namely, Sunshine Coast Regional Council, Brisbane City Council, 

Ipswich City Council and Gold Coast City Council to use as a database for this study.  The 

smart meters recorded household water flow data at a resolution (0.014 L/pulse, logged every five 

seconds) sufficient to complete a water end use or micro-component disaggregation process 

(i.e. each tap, shower, etc.). Participating households were also asked to take part in an 

appliance/fixture stock inventory audit and complete a questionnaire survey that was 

developed to assist in determining the socio-demographic characteristics and socioeconomic 

status of the households. All such data was required by the SEQREUS team in order to 

manually complete the water end use disaggregation process and to conduct statistical analyses 

for a number of that study's objectives (e.g. Carragher et al., 2012; Beal et al., 

2011a; Beal and Stewart, 2013). That study's budget was in excess of $US1,000,000, with a 

reasonable proportion of that budget assigned to the water end use analysis process for a sample of 

250 households. This is acceptable for a detailed research investigation, but the 

disaggregation process needs to be automated for widespread application. Nonetheless, this 

extensive dataset of high resolution flow data and associated water end use event registry 

provided the training set for this study. 

 
3.2.  Stage 1: Non-adaptive classification model 

With the availability of data collected from SEQREUS, the building of an autonomous flow 

trace analysis system commenced (Figure 3). In Stage 1 of the study, a single event analysis 

module was developed to categorise all of the unclassified single events that occur in isolation. 

In this module, HMM, DTW and an event time-of-day probability function were applied to 

autonomously assign all of the single events to appropriate categories, with an average 

accuracy of 84.1% (Nguyen et al., 2013a), which is slightly lower than that of the combined event 

analysis owing to the low recognition accuracy for bathtub and irrigation events. Then, a module for 

analysing combined events (i.e., groups of concurrent single events), which remains one of the most complicated 

problems in the field of pattern matching, was developed. Several techniques were employed 

for splitting apart the various events in a combined water use event, including HMM, the 

Gradient Vector Filtering method and different probability functions that were extracted from 

the various physical features of the existing database (Nguyen et al., 2013b). The classification 


outcomes have shown that approximately 88% of the combined events were accurately 

disaggregated into their end use components and then recognised.  

[INSERT FIGURE 3] 

 
3.3.  Stage 2: Adaptive classification model  

The classification model that was developed in Stage 1 was initially trialled in a different 

region (i.e. Melbourne) to that where the model was developed (i.e. different to the SEQ 

training data) to examine its versatility. Model accuracy dropped for two fundamental 

reasons: the presence of new end-use categories that had not been identified in SEQ 

(e.g., evaporative air conditioner), and differences in water consumption behaviours 

for some of the end uses, which may be due to a range of macro factors (e.g. different climatic 

conditions or government policy). To overcome this challenge, we needed to build some 

self-learning functionality into the model to make it more adaptive to different regions. 

Therefore, the objective of the research and focus of this paper was to integrate adaptive 

features into the model employing appropriate techniques. The establishment of this critical 

analysis module is articulated in the next section. 

 
4.  Adaptive classification model development 
4.1. Overview of model architecture 

The adaptive function was developed to analyse all of the events that exhibit patterns which cannot be 

confidently recognised by the non-adaptive single and combined event modules. Figure 4 

provides an overview of the overall analysis procedure for the adaptive model.  

[INSERT FIGURE 4] 

 
At the very first step for adaptive learning, the HMM threshold value, which is explained in the 

next section and is used to determine whether an event is classifiable by the existing 

non-adaptive analysis modules, will be applied when the model is operated in a new region. 

Classifiable events are initially analysed by these modules (Stage 2a), while all of the 

unclassifiable events are processed by a newly developed adaptive analysis unit (Stage 2b). At 

the end of this process, all of the unclassifiable events in stage 2b will be incorporated into the 

existing database to improve the HMM classifier. The advantages of the proposed technique in 


comparison with other adaptive learning recognisers are the simple algorithms, the fast 

analysis time and the lower dependency on the existing database, as demonstrated in a 

later section of this paper. A detailed technical development of this analysis module is 

presented in Figure 5. 

 
[INSERT FIGURE 5] 

 
The main objective of this analysis module is to address unclassified events that cannot be 

analysed by the existing non-adaptive model. The first required step is to group all of the events 

that are likely to belong to the same category together, using the DTW technique with various 

physical features that are extracted from each subjected event. The outcomes of this analysis 

step are several groups that contain similar unclassified events and a set of all of the events that 

cannot be assembled together. Grouped and ungrouped events are then analysed by HMM, 

DTW, the event time-of-day probability function and another set of physical parameters, which 

eventually results in all of the single events being classified and sometimes an additional set of 

unclassified events, which belong to a new end-use category. 

4.2. New pattern identification using threshold values 

The threshold values that are applied in this analysis section were obtained by training 

HMMs on the existing database that was collected in SEQ. The 

determination of the threshold values for each end-use category can be explained as follows. 

Given that 𝑆𝑖 is a set of all single events that belong to category i, where 𝑖 = [1,2, … ,7] 

represents the shower, faucet, clotheswasher, dishwasher, toilet, bathtub and irrigation, 

respectively; 𝑆𝑖 is collected in SEQ and is used as a database to establish an HMM model to 

represent this category, which is denoted as 𝐻𝑀𝑀𝑖. The threshold value of category 𝑖, 𝑇𝑉𝑖, is 

defined as the minimum likelihood score that is achieved when using 𝐻𝑀𝑀𝑖 to 

recognise the events in 𝑆𝑖. As a result, when the model is operated in a different area, if one event is 

assigned to category 𝑖 by the existing single event analysis module but its likelihood is less 

than 𝑇𝑉𝑖, then it is considered to be an event with an unclassifiable pattern and will be set aside 

for further analysis. Following the comparison process against threshold values, a set that 

contains all of the unclassified events that require the application of a new analysis process is 

obtained. 
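
The threshold rule can be sketched as follows. The likelihood scores are hypothetical log-likelihoods invented for illustration, and the names `category_threshold` and `split_by_threshold` are not taken from the study; the sketch only assumes that a higher score means a better match.

```python
def category_threshold(training_scores):
    """TV_i: the minimum score HMM_i gives to its own training events S_i."""
    return min(training_scores)

def split_by_threshold(events, thresholds):
    """Split events into classifiable and unclassified ids.

    events: iterable of (event_id, assigned_category, likelihood_score)
    thresholds: dict mapping category -> TV_i
    """
    classifiable, unclassified = [], []
    for event_id, category, score in events:
        # An event scoring below TV_i has a pattern the trained model has not seen
        if score < thresholds[category]:
            unclassified.append(event_id)
        else:
            classifiable.append(event_id)
    return classifiable, unclassified

# Hypothetical training scores (log-likelihoods) for two categories
thresholds = {
    "shower": category_threshold([-120.5, -98.2, -143.0]),
    "toilet": category_threshold([-35.1, -41.7, -38.9]),
}
new_events = [("e1", "shower", -110.0), ("e2", "shower", -160.3), ("e3", "toilet", -40.0)]
ok, pending = split_by_threshold(new_events, thresholds)
print(ok, pending)  # ['e1', 'e3'] ['e2']
```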


4.3. Event grouping using DTW 

Let 𝐀 = (𝐴1, 𝐴2, …, 𝐴𝑚) be a matrix that contains m unclassified events, where 𝐴𝑘 is the 

kth event of 𝐀. The DTW distance between 𝐴𝑘 and 𝐴𝑗, namely 𝐷𝑘,𝑗, is determined by using the 

DTW algorithm. By following the same process, a vector 𝐃 = (𝐷1,𝑗, …, 𝐷𝑖,𝑗, …, 𝐷𝑚,𝑗) can be 

obtained to measure the similarity of all of the events to Event j, which is an arbitrary reference event 

selected randomly from the subjected group. Nguyen et al. (2011) found that, for 

𝐴𝑘 and 𝐴𝑙, if $\left| \frac{D_{k,j} - D_{l,j}}{D_{k,j}} \right| < e$, where 𝑒 is defined as the threshold value in terms of similarity 

and j is an arbitrary event, then it is likely that events 𝐴𝑘 and 𝐴𝑙 have similar patterns. 

However, because of the fact that two completely different events could also have similar 

DTW distances to a reference event using the proposed method, additional parameters are 

required for the technique to be applicable to this specific study. 

In practice, each end-use event in 𝑨 is described by four basic physical features, namely the volume 

(v), duration (t), maximum flow rate (qmax) and most frequent flow rate (qf). An aggregate 

DTW distance of Event 𝑘 to Event 𝑗, denoted as 𝐷𝐴𝑘,𝑗, is 

determined as follows: 

$$DA_{k,j} = D_{k,j} \, v_k \, q_{max,k} \, t_k \, q_{f,k} \qquad (1)$$

Thus, if 

$$RD = \left| \frac{DA_{k,j} - DA_{l,j}}{DA_{k,j}} \right| < e \qquad (2)$$

then events 𝐴𝑘 and 𝐴𝑙 will be assigned to the same group.  

At the end of this process, the first group (denoted as 𝑔1) that contains 𝑝1 similar events, 

whose relative differences in terms of DTW distances (i.e. 𝑅𝐷 in Equation 2) are less than the 

threshold value 𝑒, will be obtained. In the remaining (𝑚 − 𝑝1) events, the same process is 

then conducted to determine the second group, namely 𝑔2, which contains 𝑝2 similar events. 

This grouping process is repeated until no more groups can be achieved. The overall process 

will result in 𝑛 groups of events, denoted by 𝐺 = {𝑔1, 𝑔2 … 𝑔𝑛 }, which contain events that 

have similar patterns, and a set of all dissimilar events that cannot be gathered together. 

It can be seen from Equation (2) that the selection of the threshold value 𝑒 will affect the 

number of groups that are achieved after the grouping process, because all of the events that 


have a relative difference of less than 𝑒 will be assembled. After considering the trade-off between 

analysis time and convergence of the final classification accuracy, 𝑒 = 0.1 was adopted for this study. 

The next analysis step is to classify these grouped and ungrouped events. 
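
The grouping procedure of this section can be sketched as below. This is one possible reading of the text rather than the study's implementation: it assumes the DTW distance of every event to the reference event has already been computed, that the first remaining event acts as the seed of each group, and that a seed with no similar partner is treated as ungrouped. The function names and the sample distances are hypothetical.

```python
def aggregate_distance(d_kj, v, q_max, t, q_f):
    """Equation 1: DTW distance to the reference event scaled by four physical features."""
    return d_kj * v * q_max * t * q_f

def relative_difference(da_k, da_l):
    """Equation 2: |DA_k - DA_l| / DA_k (treated as infinite when DA_k is zero)."""
    if da_k == 0:
        return 0.0 if da_l == 0 else float("inf")
    return abs(da_k - da_l) / da_k

def group_events(da, e=0.1):
    """Greedy grouping of events; da maps event id -> aggregate distance DA."""
    remaining = list(da)
    groups, ungrouped = [], []
    while remaining:
        seed = remaining[0]
        # Collect every remaining event whose relative difference to the seed is below e
        members = [k for k in remaining if relative_difference(da[seed], da[k]) < e]
        if len(members) > 1:
            groups.append(members)
        else:
            ungrouped.append(seed)
        remaining = [k for k in remaining if k not in members]
    return groups, ungrouped

# Hypothetical aggregate distances to a common reference event
da = {"a": 100.0, "b": 104.0, "c": 95.0, "d": 300.0, "e": 310.0, "f": 900.0}
groups, ungrouped = group_events(da, e=0.1)
print(groups, ungrouped)  # [['a', 'b', 'c'], ['d', 'e']] ['f']
```

Raising `e` merges more events into fewer, looser groups, while lowering it leaves more events ungrouped, which is the sensitivity to the threshold noted above.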

4.4. Analysis of unclassified grouped events 

The classification of grouped events will be undertaken utilising the HMM technique in 

combination with other physical characteristics. An aggregate likelihood for each group to be 

classified into category 𝑖 can be determined by using the following proposed equation: 

$$LL_i = VS_i \, TS_i \, QfS_i \, HMMS_i \qquad (3)$$

Where: 

• 𝑖 = [1, 2, … , 7]  represents the shower, faucet, clotheswasher, dishwasher, toilet, 

bathtub and irrigation, respectively. 

• 𝑉𝑆, 𝑇𝑆 and 𝑄𝑓𝑆 are, respectively, the volume score, the duration score and the most 

frequent flow rate score, which are arbitrary numbers derived from the volume, 

duration and most frequent flow rate of the subjected group, and 

• 𝐻𝑀𝑀𝑆 is the representative HMM likelihood of the subjected group. 

The subjected group is then classified into Category k if 𝐿𝐿𝑘 is the maximum aggregate likelihood.  
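
Equation 3 and the maximum-likelihood decision can be sketched in a few lines. All score values below are hypothetical, and the sketch assumes that the HMM score is a non-negative likelihood so that larger feature scores magnify it; the names `aggregate_likelihood` and `classify_group` are illustrative.

```python
def aggregate_likelihood(vs, ts, qfs, hmms):
    """Equation 3: product of the three feature scores and the HMM score."""
    return vs * ts * qfs * hmms

def classify_group(scores):
    """scores maps category -> (VS, TS, QfS, HMMS); returns (best category, all LL values)."""
    ll = {category: aggregate_likelihood(*values) for category, values in scores.items()}
    return max(ll, key=ll.get), ll

# Hypothetical scores for one unclassified group
group_scores = {
    "dishwasher": (0.43, 0.27, 0.67, 1e-3),
    "toilet": (0.05, 0.10, 0.20, 5e-4),
    "faucet": (0.02, 0.30, 0.10, 2e-3),
}
category, ll = classify_group(group_scores)
print(category)  # dishwasher
```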

4.4.1. Determination of the required parameters  

The volume score, duration score, most frequent flow rate score and representative HMM score 

(i.e. 𝑉𝑆, 𝑇𝑆, 𝑄𝑓𝑆 and 𝐻𝑀𝑀𝑆 presented in Equation 3) are the four primary parameters that 

will be employed to aid the classification process. Consider an unclassified group that contains 11 

events (shown in Figure 6), obtained after a grouping process while verifying the 

proposed technique against an independent home in Melbourne. The derivation of these 

parameters for this group is described in the following key steps: 

 
[INSERT FIGURE 6] 

 
Step 1: Determine the volume, duration and most frequent flow rate of the classified events in 

all of the end-use categories that were achieved from the non-adaptive analysis, from which the 


distribution range and distribution probability of the above-mentioned features of category 𝑖 

can be obtained using the event probability histogram method. It should be noted at this step 

that a histogram with 10 clusters will be applied to shower, faucet, clotheswasher, bathtub and 

irrigation due to their widespread values, and 5 clusters for the other end use categories such as 

dishwasher and toilet because of their concentrated values. This step is undertaken only once 

for the recognition of all of the unclassified groups. The following example illustrates the 

process of obtaining these parameters for the dishwasher category based on all of the classified 

dishwasher events achieved through the existing non-adaptive single event analysis module 

(Table 1). It should be noted that the distribution probability for each feature is determined by 

using Equation 4.  

$$DP_i = \frac{F_i}{N_i} \qquad (4)$$

Where: 

• 𝐷𝑃𝑖 is the distribution probability for category 𝑖, 

• 𝐹𝑖 is the frequency vector of category 𝑖 , and 

• 𝑁𝑖 is the number of events already classified into category 𝑖.  
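
A minimal sketch of Equation 4, together with the lookup of the score for a representative value, is given below. Only the 4.25-4.8 L range and its frequency of 9 out of 21 events come from the text; the remaining bin edges and frequencies are hypothetical stand-ins for Table 1, and returning 0 for a value outside every bin is an assumption. The function names are illustrative.

```python
from bisect import bisect_right

def distribution_probability(freq):
    """Equation 4: DP_i = F_i / N_i, element-wise over the histogram bins."""
    n = sum(freq)  # N_i, the number of events already classified into the category
    return [f / n for f in freq]

def score_for(value, edges, dp, missing=0.0):
    """Return the distribution probability of the bin that contains value."""
    if value < edges[0] or value > edges[-1]:
        return missing  # assumption: values outside the observed range score zero
    # bisect_right finds the first edge greater than value; clamp the top edge into the last bin
    idx = min(bisect_right(edges, value) - 1, len(dp) - 1)
    return dp[idx]

# Hypothetical dishwasher volume histogram with 5 clusters (21 classified events)
edges = [3.70, 4.25, 4.80, 5.35, 5.90, 6.45]
freq = [5, 9, 4, 2, 1]
dp = distribution_probability(freq)
vs_4 = score_for(4.64, edges, dp)
print(round(vs_4, 3))  # 0.429
```

Here the score is kept as a fraction; multiplying by 100 gives the percentage form used in the worked example.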

 
 [INSERT TABLE 1] 

 
Step 2: Determine the volume (𝑣), the most frequent flow rate (𝑞𝑓) and the duration (𝑡) of each 

event in the subjected group. From these, the representative volume, most frequent flow rate 

and duration for this group, denoted as 𝑉𝑟𝑒𝑝, 𝐹𝑟𝑒𝑝 and 𝑇𝑟𝑒𝑝, can be obtained using 

the event probability histogram method with 5 clusters. As mentioned previously, the representative 

values are those that have the highest frequency. Steps 2 to 5 are illustrated in Table 2 through an 

example that is used to determine the aggregate likelihood of the above-mentioned group being 

assigned to the dishwasher category. 

Step 3: Compare 𝑉𝑟𝑒𝑝, 𝐹𝑟𝑒𝑝 and 𝑇𝑟𝑒𝑝 with the distribution of each end-use category to 

obtain the corresponding distribution probabilities, which are the scores 𝑉𝑆, 𝑄𝑓𝑆 and 𝑇𝑆 

presented in Equation 3.   

As presented in Table 2, the value of 𝑉𝑟𝑒𝑝 (4.64) falls into the range of 4.25-4.8 for the 

dishwasher (Table 1); therefore, the value for 𝑉𝑆4 is determined to be 42.8, which is the 


corresponding distribution probability of this range (9/21). In the same way, the values of 𝑇𝑆4 

and 𝑄𝑓𝑆4 are determined to be 26.5 and 66.7, respectively.  

However, if there is any category 𝑖 that has no classified single event achieved from the 

non-adaptive analysis process, which is very unlikely to happen, then the determination of the 

flow rate range and the corresponding distribution probability, as required in the previous step, 

cannot be undertaken for this category. As a result, the values of 𝑉𝑆, 𝑇𝑆 and 𝑄𝑓𝑆 will be 

considered to be 1, which means that the HMM likelihood of this group to be categorised as 𝑖 

will not be magnified by any factor. In this case, the final recognition accuracy is only slightly 

affected because the verification process presented in section 5 has shown that the HMM 

method alone can explain most of the unclassified events correctly.  

Step 4: Determine the representative HMM likelihood for the subjected group, which is the 

HMM score that has the highest frequency when evaluated using a histogram method. 

Step 5: Determine the aggregate likelihood of the unclassified group.  

In this example, with the achieved values for 𝑉𝑆4, 𝑇𝑆4, 𝑄𝑓𝑆4 and 𝐻𝑀𝑀𝑆4, the overall 

likelihood of the subjected group being classified as dishwasher is obtained through the 

utilisation of Equation 3.  

 
[INSERT TABLE 2] 

 
Following the same process from Step 1 to Step 5, the likelihoods of this group being assigned to 

the other end uses (shower, faucet, clotheswasher, toilet, bathtub and irrigation) 

can be obtained. In this example, the subjected group was eventually assigned to the 

dishwasher category because the aggregate likelihood of this end use attains the maximum 

value. 

4.4.2. Identification of a new end-use category 

In the context of this study, if the representative likelihood of the subjected group is 20 times 

less than the HMM threshold value of the category to which the group is assigned (i.e. 

𝐻𝑀𝑀𝑆𝑖 < 0.05 𝑇𝑉𝑖), then this group is considered to be a new end-use category. This ratio of 

the threshold value was based on the analysis of several houses in Melbourne; however, for 


identifying an appropriate value for the most accurate identification of a new end-use category, 

further research needs to be undertaken across a number of new regions to establish more 

rigorously founded criteria. If the prototype database expansion for a new end use category 

proves difficult due to scarce examples of its use in households, then there is an opportunity to 

also supplement established prototypes with manually inputted ones that have been verified 

through the use of customer diaries or through fixture level sensors. The newly identified end 

use categories will then be incorporated into the existing resource so that the future system 

performance can be improved. 
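
The new-category rule reduces to a single comparison, sketched below. It assumes that the representative HMM score and the threshold are both expressed as non-negative likelihoods; if log-likelihoods were used instead, the factor of 0.05 would need to be reinterpreted. The function name `is_new_category` and the numeric values are hypothetical.

```python
def is_new_category(hmms, threshold, ratio=0.05):
    """True when the group's representative HMM score is below ratio * TV_i."""
    return hmms < ratio * threshold

# Hypothetical threshold for the category the group was assigned to
tv_i = 0.01
print(is_new_category(0.0004, tv_i))  # True: more than 20 times below the threshold
print(is_new_category(0.002, tv_i))   # False: still within the known category
```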

4.5. Analysis of ungrouped events 

The analysis of ungrouped events is conducted in a similar way to that for grouped events; 

however, modifications have been made to the formula for the establishment of the aggregate 

likelihood of one event, which is presented in Equation 5 below: 

$$LLu_i = VSu_i \, TSu_i \, QfSu_i \, \exp(P_t) \, HMMSu_i \qquad (5)$$

Where: 

• 𝑉𝑆𝑢, 𝑇𝑆𝑢, 𝑄𝑓𝑆𝑢 and 𝐻𝑀𝑀𝑆𝑢 are the volume score, the duration score, the most 

frequent flow rate score and the representative HMM score of the subjected event. 

• 𝑃𝑡 is the time-of-day probability index of the event occurrence, derived from the 

already classified events of the subjected household as explained in Step 4 below. 

An in-depth assessment on the training database has found that most of the ungrouped events 

belong to the shower, faucet, abnormal toilet, bathtub and irrigation end-use categories, 

because they usually have highly variable patterns that cannot be gathered together. Therefore, 

the aggregate likelihood of an ungrouped event to be categorised as clotheswasher or 

dishwasher (i.e. 𝐿𝐿𝑢3 and 𝐿𝐿𝑢4) is zero. The determination of 𝑉𝑆𝑢, 𝑇𝑆𝑢 and 𝑄𝑓𝑆𝑢 is 

similar to that for 𝑉𝑆, 𝑇𝑆 and 𝑄𝑓𝑆; however, 𝑉𝑟𝑒𝑝, 𝐹𝑟𝑒𝑝 and 𝑇𝑟𝑒𝑝 in this case are the 

volume, most frequent flow rate and duration of the subjected unclassified event. To obtain 𝐿𝐿𝑢𝑖 

for each ungrouped event, the following process is conducted: 

Step 1: Determine the distribution probability of all of the end-use categories based on all of 

the classified events, which also include the ones achieved from the analysis of the grouped 

events section (e.g., if there are 34 shower events obtained using the existing non-adaptive 

module and 20 events obtained from the grouped event analysis process, then the 


determination of the range and distribution probability of the volume, duration and most 

frequent flow rate for this category is based on a total of 54 classified shower events). This 

process is also required only once for the recognition of all of the ungrouped events. Again, a 

histogram with 5 clusters will be applied to dishwasher and toilet, and 10 clusters to the other 

end use categories. 

Step 2: Determine the volume, the most frequent flow rate and the duration of the subjected 

event (i.e. 𝑉𝑟𝑒𝑝, 𝐹𝑟𝑒𝑝 and 𝑇𝑟𝑒𝑝).

Step 3: Compare the volume, duration and mode flow rate of the subjected event with the 

distribution probability of each end-use category, to obtain the corresponding values for 𝑉𝑆𝑢, 

𝑇𝑆𝑢 and 𝑄𝑓𝑆𝑢.

Step 4: Using the single events classified in all of the previous steps, a time of

day probability index (𝑃𝑡) will be determined, which is shown in Figure 7 as an example for 

the tested home. It should be noted that the time of day probability index is suggested as an 

additional criterion for the classification process and is less critical than the other 

‘classification’ inputs; therefore, its weighting contribution towards the overall likelihood

score has been purposely kept limited. In this study, an exponential function is selected to limit 

the range of this magnifier factor to between 1 and approximately 2.72 (i.e. exp(0) and exp(1)),
which correspond to time-of-day probabilities of 0% and 100%, respectively. For example, if an
event occurred between 6 and 7 am, then its time-of-day probability (𝑃𝑡) of being classified as a
shower is 0.0951, as presented in Figure 7

below. 

Step 5: Determine the HMM likelihood of the subjected event to be assigned to all of the 

end-use categories (i.e. 𝐻𝑀𝑀𝑆𝑢). Once all of the required parameters have been obtained, the 

aggregate likelihood of the unclassified data can be determined by using Equation 5.  
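Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the cluster boundaries and frequencies are borrowed from the dishwasher volume histogram in Table 1 purely as sample input, the scores are assumed to be expressed as percentages, and the HMM score is treated as a value supplied by a separately trained model.

```python
import math
from bisect import bisect_right


def distribution_probability(frequencies):
    """Convert per-cluster event counts into percentages (Step 1)."""
    total = sum(frequencies)
    return [100.0 * f / total for f in frequencies]


def feature_score(value, edges, probabilities):
    """Return the percentage of the cluster that contains `value` (Step 3).

    `edges` holds len(probabilities) + 1 cluster boundaries. A value outside
    the observed range scores zero (an assumption made for this sketch).
    """
    if value < edges[0] or value > edges[-1]:
        return 0.0
    index = min(bisect_right(edges, value) - 1, len(probabilities) - 1)
    return probabilities[index]


def aggregate_likelihood(vs, ts, qfs, pt, hmms):
    """Equation 5: product of the three scores, exp(Pt) and the HMM score."""
    return vs * ts * qfs * math.exp(pt) * hmms


# Sample input: dishwasher volume clusters from Table 1 (illustration only).
volume_edges = [2.62, 3.17, 3.71, 4.25, 4.8, 5.3]
volume_probs = distribution_probability([4, 1, 6, 9, 1])
vs = feature_score(4.64, volume_edges, volume_probs)

# Step 4 magnifier for the 6-7 am shower example (Pt = 0.0951).
magnifier = math.exp(0.0951)
```

Because exp(𝑃𝑡) only ranges from 1 to e, the time-of-day term can raise a likelihood by a factor of at most about 2.72, which is consistent with its intended role as a secondary criterion. For the 6-7 am shower example the magnifier is roughly 1.10.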

[INSERT FIGURE 7] 

 
4.6. Adaptive learning procedure 

With the discovery of additional single water end use events from the above described adaptive 

analysis processes, the classification model can be enhanced through incorporating these new 

prototype event variants into the existing prototype database. This procedure will ensure that 

the prototype registry continuously grows and evolves to understand the new variants of water 


 

end use flow signatures. The model updating process is described in a series of steps presented 

in Appendix 1 to determine a new HMM model (λ).

5. Model calibration and verification 

The adaptive classification model has been verified against three randomly selected homes in a
new region (Melbourne, Australia) using the technique presented above. In this section, the
existing model developed for the SEQ application (i.e., the non-adaptive model) is also compared
with the new model by applying the SEQ-formulated model to the same three independent
Melbourne homes. It should be noted

that homes 2 and 3 in this verification process have the presence of an evaporative air 

conditioner, which is a new end-use category that was not included in the existing prototype 

database since the region of SEQ has sufficient air humidity for running air conditioners. 

Detailed testing on these homes is displayed in Table 3, both in terms of the number and

respective volume of the events (denoted as ‘N’ and ‘V’, respectively). 
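The two accuracy measures can be computed as sketched below. The sketch assumes that each event carries a manually verified category, the category assigned by the model and its volume, and that accuracy for a category is the share of its verified events (N) or verified volume (V) that the model labelled correctly. The `Event` record, the function name and the sample events are hypothetical and are not taken from the study data.

```python
from collections import namedtuple

# Hypothetical record: verified category, model-assigned category, volume (L).
Event = namedtuple("Event", ["actual", "predicted", "volume_l"])


def category_accuracy(events, category):
    """Return (N accuracy %, V accuracy %) for one end-use category."""
    relevant = [e for e in events if e.actual == category]
    if not relevant:
        return None  # category absent from this home (reported as N/A)
    correct = [e for e in relevant if e.predicted == category]
    n_acc = 100.0 * len(correct) / len(relevant)
    v_acc = (100.0 * sum(e.volume_l for e in correct)
             / sum(e.volume_l for e in relevant))
    return n_acc, v_acc


# Illustrative events only.
events = [
    Event("shower", "shower", 60.0),
    Event("shower", "shower", 45.0),
    Event("shower", "bathtub", 15.0),
    Event("toilet", "toilet", 6.0),
]
shower_n, shower_v = category_accuracy(events, "shower")
```

In this illustration the misclassified shower is a small one, so the volume-based accuracy is higher than the event-based accuracy, which is one way the two measures can diverge.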

 
[INSERT TABLE 3] 

 
When the developed adaptive model was applied to the three Melbourne homes, the results
were very promising, with an average recognition accuracy in terms of volume of at least 90%
for the faucet, clotheswasher, dishwasher, toilet and irrigation categories (i.e., 90.1%, 91.6%,
90.9%, 92.8% and 100%, respectively) and 81.5% for the shower. In terms of individual event
recognition, accuracy was generally lower (i.e. 81.5%, 89.7%, 89.1%, 91.4% and 86.2%) for the first

five categories listed in Table 3. The classification of the bathtub end use category still remains 

a challenging problem, for which the accuracy indices are low both in terms of the volume 

(20.6%) and the specific events (40.5%) recognised. However, when considering the overall 

recognition accuracy for all end use events occurring within these three homes over the two 

week period, 85.7% of them were correctly classified using this autonomous recognition 

process. This is commendable given that these households were in a different region to the 

SEQ originated training dataset and new end use categories needed to be autonomously created 

by the adaptive model.  

Further testing also shows a considerable improvement of the new adaptive model compared to 

the existing model, which was built using SEQ training data (Figure 8). An increase in 


 

accuracy has been experienced in most of the end-use categories, including shower, faucet, 

clothes washer, dishwasher and toilet. This enhancement can be explained by the fact that the 

original model for the SEQ region included the fixed boundary conditions for some of the 

physical characteristics that were derived from the SEQ database (e.g., the minimum shower 

volume is 7 litres) or applied the SEQ time-of-day probability information obtained from the 

SEQ training dataset. Therefore, the application of these features in Melbourne city has caused 

a minor reduction in categorisation accuracy. Moreover, when Homes 2 and 3 were tested using the
SEQ-based model, most of the evaporative air conditioner events were misclassified as faucet
or toilet events, which resulted in a low accuracy for these end uses. By contrast, when threshold
values are utilised, the proposed model assigns evaporative air conditioner events to a “new end-use
category” that does not affect the recognition accuracy of the other categories.

[INSERT FIGURE 8] 

 
With a gradual expansion of the database through the adaptive learning process, the newly 

developed model can perform effectively in any new region without the requirement of a
manual calibration, provided that no new end-use category exists. The new threshold values that
are derived from the expanded database will identify the large majority of events that can be
classified using the non-adaptive model, for which the effectiveness has been verified in Nguyen
et al. (2013a, 2013b); the remaining variant events are left to the adaptive model. Where new
end-use categories are present, the boundary values that determine whether an event
belongs to an existing category or a new category should be re-identified and applied, as

mentioned in section 4.4.2. 

6. Residential water end use categorisation software application 
6.1. Application outputs 

Through integrating and codifying the analytical processes contained in the single, combined 

and adaptive learning modules, a software application could be formulated that would be able 

to autonomously categorise remotely collected residential water consumption data received 

from smart meters into a repository of end use events. This software application offers different 

types of results presentations. Figure 9a shows the main interface, which provides important 

information to the customer, such as a summary of the classified volume of each end-use 

category during a specific period of time, which is supported by a detailed description of the 

start time, end time, volume, duration, maximum flow rate and most frequent flow rate of each 


 

classified event. These results are obtained directly from the non-adaptive and adaptive
analysis processes presented in previous sections, which employ different mathematical
techniques such as HMM, DTW, gradient vector filtering, time-of-day probability functions and
threshold values. Each classified event is then plotted on a time series scale with different colours

corresponding to different end-use categories. The application also allows all analysed results 

to be exported to an Excel file so that various statistical calculations and studies can be 

performed on the raw flow rate series of each individual water end use event for a particular 

household.  

 
Apart from the main interface, the analysis outcomes are also presented in terms of a pie chart 

showing the percentage of each contributing category, to give the user an instant overview of 

the end-use breakdown, and a bar chart that presents the average household water consumption 

in terms of litres per household per day (Figure 9b), which is achieved by taking the water 

consumed in each category divided by the total volume of water consumption in the analysis 

period. It should be noted that the current prototype software application and the outputs 

presented here are for the purposes of illustrating its functionality. Ultimately, the researchers 

seek to embed the software in the water business's water consumption data

collection repository, processing collected water use data from its entire customer fleet of 

meters autonomously and delivering that processed end-use information in a user-friendly 

form back to the customer via the web to their computer or phone.    
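The quantities behind the pie and bar charts can be sketched as follows. The sketch assumes that the classified volumes have already been summed per end-use category for one household over the analysis period; the function name and the sample volumes are hypothetical and serve only as an illustration.

```python
def end_use_breakdown(volumes_l, days):
    """Return {category: (share of total in %, litres per household per day)}.

    `volumes_l` maps each end-use category to its classified volume in litres
    over an analysis period of `days` days for a single household.
    """
    total = sum(volumes_l.values())
    return {
        category: (100.0 * volume / total, volume / days)
        for category, volume in volumes_l.items()
    }


# Illustrative volumes for a two-week period (not study data).
sample = {"shower": 700.0, "toilet": 350.0, "faucet": 280.0, "clotheswasher": 70.0}
breakdown = end_use_breakdown(sample, days=14)
```

The first value of each pair corresponds to a pie chart slice and the second to a bar in the litres per household per day chart.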

 
[INSERT FIGURE 9] 

 
6.2. Optional manual adjustment functionality 

The present software prototype does not have the level of autonomous end use categorisation 

accuracy (>95%) considered necessary for commercial application. Therefore, the present

software prototype allows users to manually modify analysis decisions, such as changing, 

splitting or merging classified events. Any time that the user clicks on a classified event in the 

graphical figure, all of the physical characteristics of the event will be presented, with an option 

for manual modification (Figure 10a). The editing process is clearly demonstrated in Figure 

10b, and once it has been finished, all of the edited events can be optionally updated into the 

existing database to improve classification accuracy in the future. This function was deemed a 

necessary inclusion in the present prototype software application, but ultimately this function 


 

will be made redundant as more training data from a number of different regions enables the 

software to function with almost faultless accuracy. This is a key area of focus of the authors. 

6.3. Daily end use diurnal demand functionality

Another useful output of a water end use study is the daily end use diurnal demand graph 

(Figure 10c). This graph can be automatically created from the repository of classified end use 

events, and is highly beneficial to both consumers and water businesses seeking to better 

understand how residential water consumption is being used, at an end use level, across various 

significant days of the year (i.e. average weekday, average weekend day, peak day, average day 

peak month). This data is particularly useful for water infrastructure planning (e.g. water pipe 

network augmentation planning) as it informs network modelling engineers of the peak 

demand flow rates as well as the key end uses contributing to that peak demand (i.e. evening 

shower use combined with clothes washer contributes to morning peak).  
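A diurnal demand profile of this kind can be derived from the repository of classified events as sketched below. The sketch assumes that each event is stored as a start time, an end-use category and a volume, and it assigns the whole volume of an event to the hour in which the event starts, which is a simplification. The function names and sample events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def diurnal_demand(events):
    """Sum event volumes (L) by (hour of day, end-use category)."""
    profile = defaultdict(float)
    for start, category, volume_l in events:
        profile[(start.hour, category)] += volume_l
    return dict(profile)


def peak_hour(profile):
    """Return the hour with the largest total volume across all end uses."""
    totals = defaultdict(float)
    for (hour, _category), volume_l in profile.items():
        totals[hour] += volume_l
    return max(totals, key=totals.get)


# Illustrative events only (not study data).
events = [
    (datetime(2012, 6, 4, 6, 40), "shower", 55.0),
    (datetime(2012, 6, 4, 6, 55), "toilet", 6.0),
    (datetime(2012, 6, 4, 19, 10), "shower", 48.0),
    (datetime(2012, 6, 4, 19, 30), "clotheswasher", 70.0),
]
profile = diurnal_demand(events)
busiest = peak_hour(profile)
```

Keeping the category in the key makes it possible to report not only the peak hour but also which end uses contribute to it.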

 
[INSERT FIGURE 10] 

 
6.4. Benefits of the software application 

Future research aims to further improve the current software through creating a user-friendly 

presentation of the produced information, which can be interfaced by both the customer and 

water business professionals through a computer or smart phone accessible web-portal (Figure 

11). For the customer, clever reports and diagrams on the following, as a minimum, will be 

designed: (a) daily water usage broken down on an end-use level for the past week and the 

average for the past month; (b) water end-use comparisons against a customer set budget, other 

households and best practice benchmarks; and (c) leak alerts and descriptions on likely leak 

types, with guidance on corrective actions. For the water business professional, automatically 

generated reports on the following will be created as a minimum: (d) water end-use averages 

for single or multiple properties from different suburbs (e.g., compare lower and higher 

socio-economic suburbs); (e) aggregated daily diurnal demand patterns and contributing end 

uses for specified days (i.e., peak day); and (f) water demand forecasting reports for selected 

regions based on just-in-time water end-use data provided.  

 
[INSERT FIGURE 11] 

 
 

7. Conclusions, limitations and future directions  

The development of an autonomous and intelligent system for residential water end-use 

classification will be of significant benefit to both water consumers and utilities. It allows 

individual consumers to log into their user-defined water consumption program to view their

daily, weekly, and monthly consumption tables, as well as charts on their water demand across 

major end use categories (e.g. leaks, clothes washer, shower, irrigation). It can also rapidly 

alert customers of leak events so that they can immediately be addressed rather than waiting for 

the present slow feedback process from the traditional metering technology (e.g. quarterly bill). 

This system will also help water businesses by rapidly providing water end-use reports of any 

desired property or suburb, thereby empowering them to develop more targeted conservation 

programs in water scarcity periods, improved water demand forecasting and optimised pipe 

network modelling.  

 
All of these opportunities can be realised by the proposed prototype expert system and 

associated software application integrating the single (Nguyen et al. 2013a), combined 

(Nguyen et al. 2013b) and adaptive (current paper) analysis modules for categorising residential

water flow data into end use event categories. The single event disaggregation model was 

comprehensively described in Nguyen et al. (2013a), which employed HMM, DTW, event

time-of-day probability function and other physical characteristics to assign an unclassified 

event into an appropriate water end use category. The formulation of a combined event analysis 

module was the logical second stage of research since a reasonable proportion of residential 

water consumption occurs simultaneously (Nguyen et al., 2013b). This module utilises a

hybrid combination of HMM, gradient vector filtering method, threshold values and various 

physical features to disaggregate combined events into several classified single events. The 

present analytical stage of this overall research project, which is the focus of this paper, had the 

goal to ensure that the model could adapt and self-learn variant water flow signature 

characteristics in different cities and regions without re-training or calibrating with that region's

dataset. Through the application of HMM, DTW, threshold values and other physical features, 

the adaptive function has been successfully developed which allows the system to effectively 

analyse data from any new residential house in different regions. A verification process 

undertaken to assess the model capability displayed very promising outcomes with most of the 


 

achieved recognition accuracies for all end use categories being approximately 90%. After this 

function was completed, a user-friendly automatic flow trace analysis application has been 

developed which integrates all available analysis modules into one comprehensive residential 

water end use event pattern recognition system. The only limitation with the adaptive module 

was its lower recognition accuracy for bathtub and irrigation events.  

 
While the present prototype software application is sufficient for conducting automated 

residential water end use analysis with an average of 80-90% recognition accuracy, the system 

would still require human input to achieve very high levels of recognition accuracy. 

Ultimately, accuracy in the order of 95-100% is required for commercially released software. 

Therefore, a future research program has been proposed by the researchers, which includes the 

following key tasks: 

1) Apply genetic algorithms to explore the optimum states for the existing HMM classifier 

which may enhance accuracy and efficiency. 

2) In addition to HMM and DTW, an Artificial Neural Network (ANN) with a 

back-propagation algorithm will be incorporated to analyse physical characteristics of 

each collected event (i.e. volume, duration and flow rate), which will likely have a 

significant impact on the recognition process accuracy. 

3) Apart from the water end use event time-of-day likelihood functions that have been 

previously applied, other decision support parameters (i.e. social and demographic 

information) will be examined to improve accuracy. 

4) Further train the analysis system using new water end use databases from other regions 

(i.e. Melbourne, Adelaide, United Kingdom) in order to improve its accuracy and 

sufficiently cater for different end use categories (e.g. evaporative air conditioners). 

 
References 

Baum, L. E. Petrie, T. 1966. Statistical inference for probabilistic functions of finite state 
Markov chains. The Annals of Mathematical Statistics 37 (6): 1554–1563. 
DOI:10.1214/aoms/1177699147.  

 
Baum, L. E. Petrie, T. Soules, G. Weiss, N. (1970). A maximization technique occurring in the 

statistical analysis of probabilistic functions of Markov chains. The Annals of 
Mathematical Statistics 41: 164. DOI:10.1214/aoms/1177697196. 

 
 

Beal, C. and Stewart, R.A. (2011). South East Queensland residential end use study: final 
report. Technical Report No. 47 for Urban Water Security Research Alliance. Griffith 
University and Smart Water Research Centre, January 2012. 

 
Beal, C., Stewart, R.A., Huang, T.T., Rey, E. (2011a). SEQ residential end use study. Journal 

of   the Australian Water Association 38 (1), 80-84. 
 
Beal, C.D. and Stewart, R.A. (2013) Identifying Residential Water End Uses Underpinning 

Peak Day and Peak Hour Demand. ASCE Journal of Water Resources Planning and 
Management 

 
Carragher, B.J., Stewart, R.A., Beal, C.D. (2012) Quantifying the influence of residential water 

appliance efficiency on average day diurnal demand patterns at an end use level: A 
precursor to optimised water service infrastructure planning. Resources Conservation 
and Recycling 62, 81-90.  

 
Chien, J.-T., Wang, H.-C. (1997). Telephone speech recognition based on Bayesian adaptation 

of hidden Markov models. Speech Communication, 22, 369-384. 
 
Cho, W., Lee, S.W., and Kim, J.H. (1995). Modelling and recognition of cursive words with 

HMM. Pattern Recognition, 28(12), 1941-1953. 
 
Ephraim, Y., Merhav, N. (2002). Hidden Markov processes. Information Theory, IEEE 

Transactions on, 48, 1518-1569. 
 
Ghahramani,  Z., Jordan, M. I. (1997). Factorial Hidden Markov Models. Machine Learning 

29 (2/3): 245–273. DOI:10.1023/A:1007425814087. 
 
Loh, M. and Coghlan, P. (2003). Domestic water use study in Perth, Western Australia 1998 to 

2000. Water Corporation of Western Australia. 
 
Manmatha, R. and Srimal, N. (1999) Scale Space Technique for Word Segmentation in 

Handwritten Manuscripts. In: Proc. 2nd Int’l Conf. on Scale-Space Theories in 
Computer Vision, Corfu, Greece, September 26-27, 1999, pp. 22-33. 

 
Manmatha, R. and Rath, T. M. (2002): Word Image Matching Using Dynamic Time Warping. 

Multi-Media Indexing and Retrieval Group, Center for Intelligent Information Retrieval, 
University of Massachusetts. Technical Report 

 
Makki, A. Stewart, R.A. Panuwatwanich, K. and Beal, C. (2011) Revealing the determinants of 

shower water end use consumption: enabling better targeted urban water conservation 
strategies. Journal of Cleaner Production, DOI: 10.1016/j.jclepro.2011.08.007 

 
 

Marquez, J.P. (2001) Pattern recognition: concepts, methods and applications. Springer, ISBN: 
3-540-422978. 

 
Muller, M. (2007).  Information Retrieval for Music and Motion, Chapter 4 . Springer, ISBN 

978-3-540-74047-6. 
 
Myers, C. S.,  Rabiner, L. R. (1981). A comparative study of several dynamic time-warping 

algorithms for connected word recognition. The Bell System Technical Journal, 60, 
1389-1409. 

 
Nguyen, K.A., Zhang, H., Stewart, R.A. (2011). Application of Dynamic Time Warping 

algorithm in prototype selection for the disaggregation of domestic water flow data into 
end use events.  Proceeding of the 34th World Congress of the International Association 
for Hydro-Environment Engineering and Research, pp2137-2144, Brisbane, Australia,  
26 June-1 July, 2011.  

 
Nguyen, K.A., Zhang, H., and Stewart, R.A. (2013a). Development of an intelligent model to 

categorise residential water end use events. Journal of Hydro-Environment Research, 
10.1016/j.jher.2013.02.004 

 
Nguyen, K.A., Zhang, H., and Stewart, R.A. (2013b). Intelligent pattern recognition model to 

automate the categorisation of residential water end-use events. Journal of Environment 
Modelling and Software, [under review]. 

 
Pearson, K. (1895). Contributions to the mathematical theory of evolution II: Skew variation

in homogeneous material. Philosophical Transactions of the Royal Society A:
Mathematical, Physical and Engineering Sciences 186: 343–414.

 
Rabiner, L., Juang, B. (1993). Fundamentals of speech recognition. Prentice-Hall, Inc.,
Chapter 4.

Rabiner, L. R. (1990). A tutorial on hidden Markov models and selected
applications in speech recognition. Readings in speech recognition. Morgan Kaufmann
Publishers Inc.

 
Sakoe, H., Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word
recognition. Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 26,
no. 1, pp. 43–49. [Online]. Available:
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1163055

 
Satish, L., Gururaj, B. I. (2003). Use of hidden Markov models for partial discharge pattern 

classification. IEEE Transactions on Dielectrics and Electrical Insulation. 
 
Starner, T., Pentland, A. (1995). Real-Time American Sign Language Visual Recognition 

From Video Using Hidden Markov Models. Master's Thesis, MIT, Program in Media 
Arts. 

 

Stewart, R.A., Willis, R.M., Giurco, D., Panuwatwanich, K., and Capati, B. (2010). Web-based 

knowledge management system: linking smart metering to the future of urban water 
planning.  Australian Planner, 47(2), 66-74. 

 
Stewart, R.A., Willis, R.M., Panuwatwanich, K. and Sahin, O. (2011). Showering behavioural 

response to alarming visual display monitors: longitudinal mixed method study. 
Behaviour & Information Technology, DOI: 10.1080/0144929X.2011.577195, pp. 1-17. 

 
Tapia, E., Intille, S.S., Larson, K. (2004). Activity Recognition in the Home Using Simple and 

Ubiquitous Sensors. In: Ferscha, A., Mattern, F. (eds.) PERVASIVE 2004. LNCS,vol. 
3001, pp. 158–175. Springer, Heidelberg  

 
Willis, R. M., Stewart, R.A., Panuwatwanich, K., Capati, B., Giurco, D. (2009a). Gold Coast 

domestic water end-use study. Water. Journal of Australian Water Association. 36(6), 9-85. 

Willis, R.M., Stewart, R.A., Panuwatwanich, K., Williams, P.R., Hollingsworth, A.L., 
(2011a). Quantifying the influence of environmental and water conservation attitudes on 
household end-use water consumption. Journal of Environmental Management. 
doi:10.1016/j.jenvman.2011.03.023 

 
Willis, R.M. Stewart, R.A., Giurco, D., Talebpour, M.R., Mousavinejad, A.  (2011b) End use 

water consumption in households: impact of socio-demographic factors and efficient 
devices. Journal of Cleaner Production, in-press, doi:10.1016/j.jclepro.2011.08.006 

 
Appendices 

Appendix 1 

Step 1: Retrieve the initial state probability $\pi_i$, state transition probability $a_{ij}$ and
observation probability $b_j(o_k)$ of the current model to use as starting probabilities. It should
be noted that the subscript $i$ here indicates a state in the HMM training algorithm, not an
end use category as in previous sections.

Step 2: Based on the above values, determine the following parameters: 

• $\alpha_t(i)$: the probability of flow rate $o_1$ through to $o_t$ and being in state $i$ at time $t$
($q_t = i$), given the HMM ($\lambda$)

$$\alpha_t(i) = P(o_1 o_2 \ldots o_t,\; q_t = i \mid \lambda) \qquad (6)$$

• $\beta_t(i)$: the probability of flow rate $o_{t+1}$ through to $o_T$, given the HMM ($\lambda$) and given
that the model is currently in state $i$ at time $t$ ($q_t = i$)

$$\beta_t(i) = P(o_{t+1} o_{t+2} \ldots o_T \mid q_t = i,\; \lambda) \qquad (7)$$
 

• $\gamma_t(i)$: the probability of being in state $i$ at time $t$, given a water flow sequence ($O$) and
the HMM ($\lambda$)

$$\gamma_t(i) = \frac{\alpha_t(i)\,\beta_t(i)}{\sum_{j=1}^{N} \alpha_t(j)\,\beta_t(j)} \qquad (8)$$

 
• $\xi_t(i,j)$: the probability of being in state $i$ at time $t$, and in state $j$ at time $t+1$, given a
water flow sequence ($O$) and the HMM ($\lambda$)

$$\xi_t(i,j) = \frac{P(q_t = i,\; q_{t+1} = j,\; O \mid \lambda)}{P(O \mid \lambda)} \qquad (9)$$

where

$$P(O \mid \lambda) = \sum_{k=1}^{N} \sum_{p=1}^{N} \alpha_t(k)\, a_{kp}\, b_p(o_{t+1})\, \beta_{t+1}(p)$$

and

$$P(q_t = i,\; q_{t+1} = j,\; O \mid \lambda) = \alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)$$
 

Step 3: Calculate the following parameters for each water flow sequence ($O$):

$\sum_{t=1}^{T} \gamma_t(i)$: expected number of times in state $i$ for the water flow sequence ($O$)

$\sum_{t=1}^{T-1} \gamma_t(i)$: expected number of transitions from state $i$ for the water flow sequence ($O$)

$\sum_{t=1}^{T-1} \xi_t(i,j)$: expected number of transitions from state $i$ to state $j$ for the flow rate sequence ($O$)


 

Step 4: With the calculated values in step 3, the probability values of $\pi_i$, $a_{ij}$ and $b_j(o_k)$ can be
updated by performing Equations 10 to 12:

$$\pi_i = \gamma_1(i) \qquad (10)$$

$$a_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \qquad (11)$$

$$b_j(o_k) = \frac{\sum_{t=1,\ \text{such that } o_t = v_k}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)} \qquad (12)$$

In Equation 12, the numerator sums only over the time steps at which the observed flow rate $o_t$
equals the symbol $v_k$.

It should be noted that the above calculations of $\pi_i$, $a_{ij}$ and $b_j(o_k)$ will be updated every time a
new event is introduced to the existing HMM model for training. At the end of this process, a
new HMM model ($\lambda$) will be achieved to cover both existing and new database.

 

Figure captions

Figure 1      Overview of proposed autonomous and intelligent water management system

Figure 2      Example of frequency histogram with different number of clusters

Figure 3      Flowchart of the water end-use classification process

Figure 4      Flowchart of adaptive model sequence

Figure 5      Adaptive model development

Figure 6      Example of an unclassified group of events

Figure 7      Example of time of day probability for one particular home

Figure 8a     Adaptive and non-adaptive model comparison in terms of number of events

Figure 8b     Adaptive and non-adaptive model comparison in terms of volume

Figure 9a     Software application main interface

Figure 9b     Software application water end use pie and bar chart outputs

Figure 10a    Optional manual override reclassification of system classified event

Figure 10b    Optional manual splitting of combined event into single event categories

Figure 10c    Software application output of water end use daily diurnal demand pattern

Figure 11     Proposed web interface application to customer and water utility


Table 1 Example of probability distribution for the dishwasher end use category

Features                       Cluster 1   Cluster 2   Cluster 3   Cluster 4   Cluster 5

Volume range (L)               2.62-3.17   3.17-3.71   3.71-4.25   4.25-4.8    4.8-5.3
Frequency (No.)                4           1           6           9           1
Distribution probability (%)   19.1        4.8         28.5        42.8        4.8

Duration range (s)             80-93       93-106      106-119     119-132     132-145
Frequency (No.)                2           4           1           6           9
Distribution probability (%)   9.5         19.1        4.8         28.5        42.8

Mode flow range (L/min)        2.3-2.7     2.7-3.1     3.1-3.5     3.5-3.9     3.9-4.3
Frequency (No.)                14          1           1           1           4
Distribution probability (%)   66.7        4.7         4.7         4.7         19.2
 

 

Table 2 Determination of the aggregate likelihood of the grouped events to be classified to dishwasher category

Step     Event             1      2      3      4      5      6      7      8      9      10     11     Representative values

Step 2   𝑣 (L)             4.63   4.57   4.59   4.63   4.69   4.75   4.71   4.79   4.92   4.76   5.18   4.64
Step 2   𝑡 (s)             125    125    125    125    130    130    130    135    135    135    140    126.5
Step 2   𝑞𝑓 (L/min)        2.3    2.3    2.3    2.3    2.3    2.3    2.3    2.3    2.3    2.3    2.3    2.3

Step 3   𝑉𝑠,4 (L)          N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    42.8
Step 3   𝑇𝑠,4 (s)          N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    26.5
Step 3   𝑄𝑓,4 (L/min)      N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    66.7

Step 4   𝐻𝑀𝑀4 (x10^-8)     7.4    54.5   48.9   16.2   75.8   19.9   6.5    2.1    4.0    3.3    13.2   9.5

Step 5   𝐿𝐿4 (x10^-4)      N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    N/A    71.86

 
 

Table 3 End use event categorisation accuracy (%) using adaptive and non-adaptive models

                 Adaptive model accuracy (%)                                Non-adaptive model accuracy (%)
End use          Home 1        Home 2        Home 3        Average          Home 1        Home 2        Home 3        Average
category         V      N      V      N      V      N      V      N         V      N      V      N      V      N      V      N

Shower           88.9   78.5   76.4   80.6   79.2   85.4   81.5   81.5      76.9   73.6   81.5   75.8   79.2   82.3   79.2   77.2
Faucet           97.0   93.4   77.3   79.3   95.2   96.3   90.1   89.7      93.6   90.4   68.9   78.3   86.9   84.3   83.1   84.3
Clotheswasher    96.7   90.1   86.3   81.8   91.8   95.5   91.6   89.1      85.2   83.1   82.6   81.8   84.2   85.5   84.0   83.4
Dishwasher       96.7   94.3   85.1   88.4   0      0      90.9   91.4      85.6   83.3   78.5   80.4   0      0      82.1   81.8
Toilet           91.4   87.8   89.8   86.5   97.1   84.4   92.8   86.2      78.6   75.3   70.2   74.6   72.1   75.4   73.6   75.1
Irrigation       N/A    N/A    N/A    N/A    100    100    100    100       N/A    N/A    N/A    N/A    100    100    100    100
Bathtub          20.6   40.5   N/A    N/A    N/A    N/A    20.6   40.5      20.6   40.5   N/A    N/A    N/A    N/A    20.6   40.5

Note: Testing end use event categorisation accuracy by V (volume of end use) and N (number of end use events) correctly classified.

 