This article appeared in a journal published by Elsevier. The attached
copy is furnished to the author for internal non-commercial research
and education use, including for instruction at the author's institution
and sharing with colleagues.

Other uses, including reproduction and distribution, or selling or
licensing copies, or posting to personal, institutional or third party
websites are prohibited.

In most cases authors are permitted to post their version of the
article (e.g. in Word or TeX form) to their personal website or
institutional repository. Authors requiring further information
regarding Elsevier’s archiving and manuscript policies are
encouraged to visit:

http://www.elsevier.com/copyright


Author's personal copy

Integrated expert system applied to the analysis of non-technical losses in
power utilities

Carlos León a, Félix Biscarri a, Iñigo Monedero a, Juan I. Guerrero a,⇑, Jesús Biscarri b, Rocío Millán b
a School of Computer Science and Engineering, Electronic Technology Department, Av. Reina Mercedes S/N, 41012 Seville, Spain
b Endesa, Non Technical Losses Department, Borbolla Building, Av. Borbolla S/N, 41092 Seville, Spain

Article info

Keywords:
Expert system
Data mining
Text mining
Utilities
Power

Abstract

Most papers on the detection of non-technical losses (NTLs) rely on the registered consumption of
each customer; some researchers also use the economic activity, the active/reactive ratio and the
contracted power. Utility company databases currently store enormous amounts of information on both
installations and customers: consumption, technical information on the measuring equipment,
documentation, inspection results, inspectors’ commentaries, etc. This paper presents an integrated
expert system (IES) for the analysis and classification of all the available useful information on
the customer. Customer classification identifies the presence of an NTL and the problem type. The IES
includes several modules: a text mining module that analyzes the inspectors’ commentaries and extracts
additional information on the customer, a data mining module that draws up the rules that determine
the customer’s estimated consumption, and a Rule Based Expert System module that analyzes each
customer using the results of the text and data mining modules. The IES is used with real data
extracted from Endesa’s databases. Endesa is the most important power distribution company in Spain,
and one of the most significant companies in Europe. The IES is currently in the test phase, in which
human experts at Endesa use it as a Decision Support System (DSS), as it contains another module
which provides a report with additional information about the customer and a summarized result that
the inspectors can use to reach a decision.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

Today, cost reduction is one of the main concerns of utility
distribution companies. Identifying and reducing non-technical
losses (NTLs) is therefore a main objective. An NTL is defined as
non-billed energy due to irregularities or deviations in the
customer facilities. These anomalies or frauds cause an imbalance
between the consumption registered by the company and the real
consumption, leading to serious economic losses due to the
unbilled consumption.

Utility companies like Endesa address the problem by conducting
massive inspections on sets of customers that satisfy generic
conditions. The time and economic costs involved in such
inspections are very high. Normally, such large-scale inspections
achieve a success rate of 12% at best.

This paper proposes an integrated expert system (IES) which
globally analyzes all the available information in Endesa’s
databases, and differentiates between unnecessary and useful
information. The objective of this analysis is the classification of
customers under one or more categories, to help Endesa
staff identify and categorize NTLs.

This system uses real samples extracted from Endesa’s databases.
It is currently in the test phase, where it is used as a Decision
Support System (DSS), as it contains a report-generating module
that summarizes the results of the analysis and the classification.

The IES is composed of several modules: a text mining module
that collects information from the inspectors’ commentaries, a
data mining module that sets up rules to check whether the customer
consumption is normal, and the Rule Based Expert System (RBES),
which uses the information generated by the other modules. In
addition, the RBES uses a set of rules and parameters provided by
the domain experts (the inspectors and the Endesa staff) to classify
the customers according to the problems they present.

Most earlier papers use only a few features for NTL detection,
such as consumption data, economic activity, the active/reactive
ratio, and the contracted power. The proposed solution uses all the
customer information available to the company: consumption
reports, information on the measuring equipment, inspectors’
commentaries, documentation, etc. Thus, the classification process
reduces both time and economic costs, and efficiency increases
because fewer false NTL inspections are carried out.

0957-4174/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.02.062

⇑ Corresponding author. Tel.: +34 679231193.
E-mail address: juaguealo@us.es (J.I. Guerrero).

Expert Systems with Applications 38 (2011) 10274–10285

journal homepage: www.elsevier.com/locate/eswa



The proposed IES is part of a development project carried out in
collaboration with Endesa, one of Spain’s largest companies, with
more than 10 million customers.

In this paper, the proposed IES is explained in detail in the
following sections:

– The Bibliographic Review section reviews the papers related to
the techniques and technologies used in the IES.

– The MIDAS Project section explains the project within which the
IES is developed. Several concepts and procedures related to
the management and control of the customers’ consumption
are also discussed.

– The System Architecture section presents the architecture of the
proposed IES, and the sections following it describe its modules:
data mining, text mining and RBES.

– Section 9 describes the current results of applying the IES, and
Section 10 briefly presents current research.

2. Bibliographic review

Frauds and anomalies represent a serious problem for utility
companies. Advances in the techniques and methodologies of
Artificial Intelligence have enabled the detection, classification,
reduction, and prediction of frauds and anomalies.

The proposed IES employs several techniques to detect and
classify customer abnormalities related to frauds and other
anomalies (NTLs). This system combines an RBES, data mining and
text mining. Several research areas are related to this work, all
of them based on pattern detection. The main ones are fields with
high economic risk: finances and economics, telecommunications
and others.

2.1. Finances and economics

Sánchez, Vila, Cerda, and Serrano (2009) use fuzzy association
rules for fraud detection in credit card transactions; Quah and
Sriganesh (2008) use self-organizing maps to decipher, filter
and analyze customer behavior in fraud detection; Kirkos,
Spathis, and Manolopoulos (2007) use several data mining
techniques (decision trees, neural networks and Bayesian belief
networks) to analyze the factors related to Fraudulent Financial
Statements (FFS), comparing the efficiency of the methods.
In addition, Richardson (1997) compares statistical and neural
network techniques, and Wheeler and Aitken (2000) use case-based
reasoning to detect fraud in the credit approval process. All these
papers propose different techniques, largely related to data
mining, and demonstrate their efficiency in fraud detection.
Hand and Blunt (2001) also make a significant contribution with
an in-depth treatment of the information analysis processes
required before data mining techniques are applied. They classify
customers by the economic sector in which the transaction is made.
Other subjects within this research field use very similar
methods, although based on other features, for instance
anti-money laundering (Gao & Xu, 2009), risk evaluation
(Yu, Yue, Wang, & Lai, 2010), credit scoring (Huang, Chen, & Wang,
2007), and financial distress (Chen & Du, 2009).

2.2. Telecommunications

In this field of research, several methods and tools are applied:
data mining (Daskalaki, Kopanas, Goudara, & Avouris, 2003;
Wang, Wang, Zhan, Li, & Wang, 2004), Support Vector Machines
(SVM) (Wang et al., 2009), neural networks and fuzzy rules
(Estévez, Held, & Pérez, 2006) and expert systems (Hilas, 2009).
These techniques solve various types of problems, though they
need the domain expert knowledge at a certain level.

2.3. Others

There are several other research fields in which these
techniques are applied for pattern detection: medicine (Cierniakoski,
De, & May, 1991; He, Wang, Graco, & Haukins, 1997; Yang &
Hwang, 2006), Intrusion Detection or IDS (Depren, Topallar,
Anarim, & Ciliz, 2005; Hernández-Pereira, Suárez-Romero,
Fontenla-Romero, & Alonso-Betanzos, 2009; Kim, Im, & Park,
2010), transport (Chen, Qu, & van Zuylen, 2010), etc. All these papers
use different search methods to detect anomalous or fraudulent
patterns and rule out the rest, except Richardson (1997), who
establishes a classification into several groups based on the
problems involved in each case.

Unlike the cases treated in these papers, an electrical energy
customer usually displays more than one type of pattern indicating
the same NTL. Therefore, the IES classifies each customer into
several categories.

2.4. NTL detection

There are several papers focusing on NTL detection.

Galván, Elices, Muñoz, Czernichow, and Sanz-Bobi (1998)
proposed a general methodology based on Radial Basis Function
Networks (RBFN), with the following steps: (1) variable selection,
(2) data filtering, (3) model fitting, (4) model analysis, and (5)
model evaluation. In the third step, the RBFN inputs are taken from
the monthly values of each annual consumption pattern and the
active/reactive consumption. The methodology is applied in two
economic sectors: the low-voltage lodging sector and the
high-voltage farm watering sector.

Cabral, Pinto, Onofre, Gontijo, and Filho (2004) proposed an
application that used rough sets to classify categorical attribute
values in order to detect fraud in customer electrical energy use.
The continuous attributes were converted to discrete ones. The
system achieved a fraud detection success rate of around 20%. The
authors stated that the main difficulty in detecting fraudulent
electrical energy profiles was the low ‘fraudulent customer’/‘normal
customer’ ratio, around 5%, in Brazilian electrical energy
distribution companies. They also noted that many fraudulent
customers displayed apparently normal behavior. Cabral, Pinto,
Linares, and Pinto (2006) added Knowledge Discovery in Databases
(KDD) to improve the success of fraud pattern detection. In the
previous version, they performed a KDD process by selecting 12
attributes (2 string and 10 numerical).

Unlike these papers, the proposed IES does not include a
variable selection process, as this selection is performed in a
previous data mining step, according to the domain expert
knowledge. In addition, it uses several modules to handle
unstructured and unusual information. This enables the creation of
a set of rules to improve and fine-tune the NTL detection results.

In summary, the reviewed papers share several ideas:

– They use different techniques related to data mining and
pattern detection.

– The papers directly related to NTL detection in energy
consumption use only a few key indicators: the energy
consumption, the economic activity, the contracted power and
the active/reactive ratio. Much interesting information,
for instance the reports of the NTL inspectors, is discarded.

The present paper proposes a new integrated expert system,
which includes all the available information in Endesa’s databases,
using real inspection results to test the system. The domain expert
knowledge is obtained from the Endesa staff. Three techniques are
used to incorporate this knowledge:

– The static knowledge is included in the RBES.
– The data mining technique involves different statistical
techniques to create rules related to the estimated consumption
that a customer without an NTL would have.

– The text mining technique captures the knowledge of the
customers’ facilities provided by Endesa’s inspectors. Text mining
targets unstructured information written in natural language,
and it can be used to generate new information (Yang & Lee,
2004), to extract information (Sung & Chang, 2004), to
summarize information (Aliguliyev, 2009), etc.

On the other hand, Schutzer (1990) suggested applying a business
expert system to fraud detection, Liao (2005) provided references
for fraud management, and Rahman and Lauby (1993) proposed the
use of expert systems in Power System Planning.

These papers anticipated the usefulness of an expert system for
implementing the domain expert knowledge in the utility
distribution field. No recently published paper on electrical NTL
detection includes an RBES. However, there are papers on fraud
detection in other research fields: for example, Cierniakoski et
al. (1991) for processing medical insurance claims; Bowen (1994)
for police investigators of economic crimes; and Hilas (2009) for
fraud detection in private telecommunications networks.

3. The MIDAS Project

The aim of the MIDAS Project is NTL detection using data mining
and other Artificial Intelligence (AI) techniques over Endesa’s
databases. The project began with NTL pattern detection. These
advances are published in several papers (Biscarri et al., 2008;
Biscarri et al., 2009), which propose different data mining and
neural network techniques to detect the consumption patterns that
identify NTLs. Both supervised and unsupervised techniques are
applied.

The information flow diagram is shown in Fig. 1 depicting the
research methodology and the IES role in the project. The steps
of this cycle are:

– Sample selection. A set of customers is selected using the
contracted power, economic sector and geographic location
features.

– Sample extraction from corporate databases. The main difficulty
in this step is the large number of customers. Two extraction
methods may be used: designing an extraction system for batch
processing, or designing an intermediate database. The first
method is run at night or during periods of database inactivity.
The second method, used in the IES, works with off-line
information that is updated periodically by a batch process.

– Application of studies based on data mining and AI using customer
consumption information. The output is a list of customers
‘suspected’ of NTL. These studies are described in detail by
Biscarri et al. (2008) and Biscarri et al. (2009).

– Analysis of customers. In this phase, the remaining available
information on the customers is analyzed to determine whether
those with a ‘suspected’ anomaly are wrongly classified. Normally,
this process demands much time and effort from the inspectors
or domain experts, as it requires a manual review of all the
information of each customer, one at a time. This step will be
replaced by the proposed IES, which offers an automatic and
accurate analysis.

– Customer review. The final conclusions, obtained through the
application of studies and the customer analysis, are verified
with ‘in-situ’ inspections.

– Review of the results. The results obtained by the inspectors are
checked. This information improves future studies and the
IES. The reported rate of frauds identified per ‘in-situ’
inspection is 15–20%, or higher depending on customer
characteristics.

After analyzing the ‘in-situ’ inspections conducted by the
Endesa staff, the authors conclude that additional techniques are
necessary to detect false positives, i.e., customers who present
an NTL pattern but who are neither fraudulent nor anomalous.
Earlier experience showed that the number of false positives was
high. This is a serious problem in all utility companies, and there
is no easy solution, as customer consumption depends on several
factors.

In addition, the high cost of ‘in-situ’ inspections is a great
limitation. Inspections conducted to identify NTLs are more
expensive than a standard review or equipment maintenance, as
more qualified inspectors are needed.

The proposed IES attempts to meet this need by adding new
verifications for the customers selected by any of the data
mining methods, in order to minimize the false-positive cases.

The project is developed in the SPSS Clementine environment,
which is commonly used in the commercial development of data
mining processes. Liao and Wen (2007) and Liao, Hsieh, and Huang
(2008) presented an interesting review of some features of this
software, and proposed examples of its use.

4. The energy distribution management and the expert system

Endesa, like other power utility companies, implements
several procedures to manage energy distribution. For example,
the company establishes contract procedures which lay the
groundwork for creating a new customer contract. Likewise,
the company inspectors routinely check customer installations.
These inspections follow established guidelines, depending on
the requests made by the company.

Normally, the inspectors add new information on customer
installations to the corporate databases, or modify the existing
information. This information includes inspector comments on
anything or any event which the inspector has observed. In the
case of an NTL search, these comments summarize the inspector’s
observations and the procedural steps taken to rectify the problem.
When the company inspectors identify any abnormality in the
customer measurement system, they notify the company. All the
information related to an identified NTL is reported and stored in
Endesa’s databases. Thus, the automatic processing of this
information by an adequate expert system is clearly very useful in
determining NTL patterns, and consequently in reducing false NTL
‘in-situ’ inspections.

Fig. 1. The information flow diagram of the MIDAS Project.

Endesa staff measure customer consumption through installed
measurement registers. Readings are taken either once per month
(monthly) or once every 2 months (bimonthly). These data are
stored in corporate databases for billing purposes. Sometimes a
reading cannot be reported due to a reading error or causes beyond
the reader’s control; for example, the worker cannot access the
customer register. In such a scenario, the Endesa system
automatically calculates an estimate of the reading, which the
company terms ‘estimated consumption’; these estimates are usually
lower than the real consumption. For the NTL detection process,
the estimated consumptions need to be carefully studied, as they
could hide an NTL. The authors noted several cases with zero
real consumption, resulting from an abnormality or a fraud, and a
low billed estimated consumption. Therefore, the number of
estimated readings of each customer is highly significant data for
the expert system.

All measurement actions are carried out at the customer’s instal-
lation and the data are stored in Endesa’s databases. As Endesa has
more than 10 million customers, such large numbers of customers
need large databases and complex hardware architecture.

When future customers request the electric energy service, they
need to provide the company with personal information and
further details about their economic activity, their electric supply
requirements and their selected contractual power rate. The
contractual rate establishes the power range and voltage to be
installed on the customer’s property. It also determines the
features of the measuring and control equipment, from which the
price of the billed energy is established, depending on the time
band of the consumption and the contracted power. The RBES checks
customer consumption against the contractual information. For
example, for some customers the consumption is concentrated in
particular time bands.

5. Expert system architecture

The proposed IES is the result of research into all the
information available in the company databases. Each type of
information calls for its own technique, depending on the
information type and the search objective. The techniques and
technologies are used in the following manner:

– RBES module. This technology is used to include the domain
expert knowledge. This is static knowledge, and normally
excludes learning processes. It is added to the system as the
main module, and uses the information generated by the other
modules to include the dynamic knowledge obtained through
mining.

– Data mining module. This module applies several statistical
techniques, such as basic statistical indicators and regression.
It is used to determine whether the consumption is correct
based on the geographic location, contracted power, billing
frequency, time band discrimination, postal code, and economic
activity. This module includes a learning process (described in
Section 6).

– Text mining module. This technique allows the inspectors’
knowledge of particular customers to be included. This
information is chiefly used to determine whether false positives
exist, which enhances the efficiency of the classification
process. This module also requires a learning process (described
in Section 7).

The IES includes several modules which interact with the
knowledge base. Fig. 2 shows the basic architecture of the system.
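
How the main module might combine static rules with the facts produced by the other modules can be illustrated with a minimal sketch. It is not the Clementine implementation used in the project: the rule names, the fact names (`avg_kwh`, `expected_low_kwh`, `estimated_readings`, `concepts`) and the thresholds are all hypothetical. The only property taken from the text is that a customer may fall into more than one category.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    """A static expert rule: a category and the condition that triggers it."""
    category: str
    condition: Callable[[Dict], bool]


# Hypothetical rules and fact names, for illustration only. The real rules
# come from Endesa's inspectors and staff and are not given in the text.
RULES = [
    Rule("consumption_below_expected_range",
         lambda f: f["avg_kwh"] < f["expected_low_kwh"]),
    Rule("many_estimated_readings",
         lambda f: f["estimated_readings"] >= 3),
    Rule("explained_by_inspector_comment",
         lambda f: "no_access" in f["concepts"]),
]


def classify(facts: Dict, rules: List[Rule] = RULES) -> List[str]:
    """Return every category whose condition holds for this customer."""
    return [rule.category for rule in rules if rule.condition(facts)]


facts = {
    "avg_kwh": 40.0,            # observed average consumption
    "expected_low_kwh": 120.0,  # lower bound, as a data mining module might supply
    "estimated_readings": 1,
    "concepts": {"no_access"},  # concepts, as a text mining module might supply
}
print(classify(facts))
```

Keeping each rule as data rather than as nested conditionals makes it possible to add or retire expert rules without touching the evaluation loop.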

The Auxiliary Tools perform several tasks: checking the
customers ‘suspected’ of NTL and summarizing the accumulated
results. This module is not used in the classification process, but
it is very useful for the company when creating summary reports.

The core of the expert system and the text mining and data
mining modules are implemented in SPSS Clementine. The knowledge
base and sample storage are implemented in MySQL. The connection
is made through ODBC drivers.

The data mining and text mining modules update a certain part
of the knowledge base, so they must be applied before the
customers are studied. These modules require only periodic
updates. The data mining module needs to update its knowledge
base either monthly or bimonthly. The duration of the update
depends on the number of customers. For example, with all
customers in the low voltage sector (this sample is termed
‘Low Voltage Sample’ or ‘LV Sample’) the process could take 3 h
(the features of the test computer are specified in Section 9). The
text mining module needs to update its knowledge base annually,
or when there is an important change in the company. For the same
sample, this process took 2 h.

6. The data mining module

The data mining module conducts a monthly or bimonthly
learning process, described in this section. The information
generated by this module is then used by the IES in the analysis
process. The data mining model establishes relations among
customers’ consumption, and categorizes customers using
statistical techniques, by estimating the consumption curve of
sets of customers that fulfil several conditions.

The process begins by filtering out customers who present a
specific abnormality in consumption, so that the module can
establish the normal consumption range for the various sets of
customers. The filters applied are:

– Elimination of customers possessing a high number of
estimated measurements.

– Elimination of customers with three or more consecutive zero
consumption readings.

– Elimination of customers showing highly dispersed
consumption, for example, a customer with three or more
consumption peaks greater than twice the standard deviation of
consumption.
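
The three filters can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the text gives no numeric threshold for "a high number of estimated measurements", so `MAX_ESTIMATED_FRACTION` is hypothetical, and a "peak" is read here as a reading that exceeds the customer's mean by more than twice the standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical threshold: the text does not quantify "a high number".
MAX_ESTIMATED_FRACTION = 0.5


def longest_zero_run(readings):
    """Length of the longest run of consecutive zero readings."""
    longest = current = 0
    for value in readings:
        current = current + 1 if value == 0 else 0
        longest = max(longest, current)
    return longest


def count_peaks(readings):
    """Readings above mean + 2 * standard deviation (assumed peak definition)."""
    if len(readings) < 2:
        return 0
    threshold = mean(readings) + 2 * pstdev(readings)
    return sum(1 for value in readings if value > threshold)


def keep_for_learning(readings, estimated_flags):
    """True if the customer passes all three filters."""
    if not readings:
        return False
    if sum(estimated_flags) / len(estimated_flags) > MAX_ESTIMATED_FRACTION:
        return False  # too many estimated measurements
    if longest_zero_run(readings) >= 3:
        return False  # three or more consecutive zero readings
    if count_peaks(readings) >= 3:
        return False  # highly dispersed consumption
    return True


steady = [100, 110, 95, 105, 98, 102]
print(keep_for_learning(steady, [False] * 6))
```

With this peak definition, three peaks can only occur in series of roughly fifteen readings or more, so the dispersion filter only takes effect on customers with a sufficiently long history.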

Fig. 2. System architecture.

Normally, electric consumption is highly dependent on a
few contract features. The system must select the relevant
customer consumption information available in the corporate
databases. The selected parameters are:

– Contracted power. First, the customers are divided into two main
groups, the low- and high-contracted power types, based on
several practical studies. These studies show that the majority
of customers have contracted electric power of less than
300 kW, and that customers with higher contracted power
reveal different consumption behaviour.
Next, each main group is divided into 20 subgroups using
vingtiles, each containing an identical number of customers,
giving 40 subgroups in total. The vingtiles are numbered 1–20
for the 0–300 kW range, and 21–40 for customers with contracted
power higher than 300 kW.
Usually, customers with similar contracted power display a
similar range of consumption. Domestic customers, the most
numerous cluster, contract low power, normally
between 3 and 13 kW. Small businesses, like pubs, restaurants
and shops, contract a power greater than 13 kW. Normally,
the interval between 0 and 3 kW is assigned to stores,
warehouses or auxiliary support. Contracted power higher than
300 kW is associated with industrial or distribution activities.

– Geographic location and postal code. These data indicate the cus-
tomer location enabling the formation of geographic zones
including customers with the same climatic and administrative
conditions.

– Economic sector. This feature is also strongly related to customer
consumption. Similar to what Hand and Blunt (2001) proposed,
a clear relationship does exist between economic activity and
consumption.

– Time band discrimination. This parameter enables the formation
of customer sets based on the time band consumption. The aim
is to determine a pattern of consumption for each time band.

– Billing frequency. This is the regular interval at which
readings are taken. Using it allows the statistical studies to be
tuned without the need for interpolations, which are only
necessary when the reading cycles differ between customers.
Normally, the time spans are either monthly or bimonthly.
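
The vingtile split of the contracted power described above amounts to equal-frequency binning within each of the two main groups. The sketch below assumes that customers at exactly 300 kW belong to the low group and that ties are broken by sort order; the function name and its parameters are illustrative, not taken from the project.

```python
from collections import Counter


def assign_vingtiles(contracted_kw, threshold_kw=300.0, bins=20):
    """Map each customer index to a vingtile number.

    Customers up to threshold_kw get numbers 1..bins, the rest get
    bins+1..2*bins. Within each main group the bins hold (as nearly as
    possible) the same number of customers.
    """
    result = {}
    low = [i for i, p in enumerate(contracted_kw) if p <= threshold_kw]
    high = [i for i, p in enumerate(contracted_kw) if p > threshold_kw]
    for offset, group in ((0, low), (bins, high)):
        ranked = sorted(group, key=lambda i: contracted_kw[i])
        for rank, idx in enumerate(ranked):
            # rank * bins // len(ranked) runs from 0 to bins - 1
            result[idx] = offset + 1 + (rank * bins) // len(ranked)
    return result


# 40 low-power customers (1..40 kW) and 20 high-power customers (301..491 kW)
powers = [float(k) for k in range(1, 41)] + [301.0 + 10 * k for k in range(20)]
vingtiles = assign_vingtiles(powers)
sizes = Counter(vingtiles.values())
print(sizes[1], sizes[20], sizes[21], sizes[40])
```

Because the bins are defined by rank rather than by fixed kW boundaries, they adapt to whatever distribution of contracted power the sample has, at the cost of boundaries that move between learning runs.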

The parameters presented allow sets of customers with similar
behavior to be established by applying statistical indicators. Four
sets are established to fix the normal levels of the indicators,
against which possible statistical abnormalities of customers would
show. These sets are formed by aggregating the following parameters:

– The geographic location, the contracted power and the billing
frequency (Group A). These parameters establish the usual
consumption in a particular geographic location. Through this
group, the system extracts the ‘normal’ consumption range of
the customers that share these features. Fig. 3 shows the
average monthly consumption versus the contracted power for a
region in the north of Spain, clearly indicating the relationship
between customer consumption and contracted power.

– The geographic location, contracted power, billing frequency and
postal code (Group B). In the analysis conducted by the IES, this
set allows the detection of seasonal behaviours, for example
customers whose consumption increases in summer because of
tourism. Fig. 4 shows the common contracted power
in a particular postal code and the range of consumption within
this zone. This postal code corresponds to a particular place in
northern Spain.

– The geographic location, contracted power, billing frequency, and
economic sector (Group C). This group establishes the
dependence of the customers’ consumption on their economic
activity. In the analysis performed by the proposed IES, this
information is used to establish the estimated consumption curve
for a particular economic sector. The system assumes that the
customers’ economic activity data in Endesa’s databases are true,
but it also implements processes to check their veracity. Fig. 5
shows the common contracted power in a particular economic
sector and the range of consumption within this sector. This
sector corresponds to restaurants and pubs in northern Spain.

– The geographic location, contracted power, billing frequency and
time band discrimination (Group D). This group helps to
establish the consumption range within a time band, according
to the features mentioned above. In the analysis performed by
the proposed IES, it is combined with the results from the other
sets. Figs. 6(a) and (b) show the difference between the
consumption of customers without and with time band
discrimination. The consumption curves show that when
several discrimination bands are present, a higher consumption
range is seen.

Fig. 3. Graph showing the average monthly consumption (kWh) of a geographic location (in northern Spain).
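
The four aggregations can be sketched as grouping customers by a tuple of features and computing the average and standard deviation of consumption per group. The field names (`location`, `vingtile`, `billing`, `postal_code`, `sector`, `time_band`, `avg_kwh`) and the width `k` of the ‘normal’ range are assumptions made for this example; the text does not state how far from the group average a customer must be to count as abnormal.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Features that define each set (field names are hypothetical).
GROUP_KEYS = {
    "A": ("location", "vingtile", "billing"),
    "B": ("location", "vingtile", "billing", "postal_code"),
    "C": ("location", "vingtile", "billing", "sector"),
    "D": ("location", "vingtile", "billing", "time_band"),
}


def group_statistics(customers, group):
    """Average and standard deviation of avg_kwh for each feature combination."""
    buckets = defaultdict(list)
    for customer in customers:
        key = tuple(customer[field] for field in GROUP_KEYS[group])
        buckets[key].append(customer["avg_kwh"])
    return {key: (mean(v), pstdev(v)) for key, v in buckets.items()}


def outside_normal_range(customer, stats, group, k=2.0):
    """True if the customer lies more than k standard deviations from the
    average of its own group (k is an assumed parameter)."""
    key = tuple(customer[field] for field in GROUP_KEYS[group])
    avg, std = stats[key]
    return abs(customer["avg_kwh"] - avg) > k * std


base = {"location": "north", "vingtile": 7, "billing": "monthly",
        "postal_code": "XXX03", "sector": "restaurants", "time_band": 1}
customers = [dict(base, avg_kwh=300.0) for _ in range(9)]
customers.append(dict(base, avg_kwh=5.0))

stats_c = group_statistics(customers, "C")
print([outside_normal_range(c, stats_c, "C") for c in customers])
```

A customer can be evaluated against all four sets in turn, which matches the statement that the Group D results are combined with those of the other sets.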

The statistical indicators used in these studies are the average
and the standard deviation of customer consumption, calculated
for each group mentioned. A time division is also made for each
group, using temporal parameters that depend on the measurement
period. The divisions are absolute (all the electricity consumption
during the study period), yearly, seasonal, and monthly
consumption. A consumption regression study is made to identify
consumption trends. Trends can explain a reduction in the
consumption of the customers analyzed if the energy demand in a
particular geographic location and economic sector decreases.
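
As an illustration of how a regression trend could explain a drop in consumption, the sketch below fits an ordinary least-squares line to a consumption series and compares the customer's relative slope with that of its group. The text does not specify the regression technique used in the project; the helper names and the `margin` parameter are assumptions made for this example.

```python
def linear_trend(values):
    """Least-squares slope of values against their index (kWh per period)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    return sxy / sxx


def relative_trend(values):
    """Slope divided by the average, so series of different size compare."""
    average = sum(values) / len(values)
    if average == 0:
        return 0.0
    return linear_trend(values) / average


def decline_explained_by_group(customer_values, group_values, margin=0.1):
    """True if the customer does not fall faster than its group by more than
    the (assumed) margin, expressed as a fraction of average per period."""
    return relative_trend(customer_values) >= relative_trend(group_values) - margin


group = [1000, 900, 800, 700]   # demand in the sector is decreasing
print(decline_explained_by_group([100, 90, 80, 70], group))
print(decline_explained_by_group([100, 50, 10, 0], group))
```

Normalizing the slope by the average lets a small shop be compared with the aggregate of its sector, which would not be possible with raw kWh slopes.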

Fig. 4. Graph showing the average monthly consumption in the postal code XXX03.

Fig. 5. Graph showing the average consumption of kWh in the economic sector of restaurants and pubs.

7. Text mining module

The learning process of the text mining module is discussed in
this section. Following the learning process, the IES uses the
information generated in the analysis process. The main objective
of the text mining module is to extract information from the
inspectors’ commentaries and from the company documents on
customer facilities. This is unstructured information, expressed in
natural language. Usually, the power company inspectors check the
customer installations from time to time, and enter their results in
the corporate databases. The proposed IES uses this valuable
information to improve the NTL analysis. A dictionary of common
terms has been compiled, and it is used by a set of rules
that execute the following actions:

– Economic sector checking. It is necessary to determine whether
the activity sector declared by the customer agrees with the real
one. For instance, the economic sector specified by the customer
may have changed, or may not have been recorded correctly,
possibly due to a contract change. Such a scenario would make the
analysis of the consumers difficult; therefore, it is useful to
detect this abnormality and identify the customer’s economic
sector.

– Characterization of the customers’ consumption. Inspectors’
reports or technical repair reports can justify certain
circumstances that occur in the customer’s installation: for
example, estimated measurements, low or even zero consumption,
or inspectors’ work orders that are repeatedly cancelled. We have
detected cases, ranging from rare to frequent, of installations
with inaccessible or dangerous access, which explains the
infrequent meter readings. Inspectors report these situations.

– Other information checking. This includes concepts extracted
from the commentaries compiled by the technical staff, which
provide additional information on the customer’s installation,
such as the existence of a capacitor bank, which explains low
reactive energy consumption, or information about when the
measurement installation is to occur.

Fig. 6. Graph showing the average consumption of kWh for customers (a) without time band discrimination and (b) with three time discrimination bands.

The text mining module is applied to every database field that
contains information in natural language. This process is
performed prior to the IES analysis, as it is necessary to update the
knowledge base with the information collected by the text mining
module. The following steps are necessary to carry out this
process correctly:

1. Data Preprocessing. Inconsistent, erroneous or missing
information must be corrected. In addition, both data integration
and data transformation are performed.

2. Concept Extraction. This step attempts to elicit the concepts
in the text fields, structured or otherwise. A concept can comprise
one or more words. It may be a syntagm or a word which represents
an entity (action, event, etc.). Natural Language Processing (NLP)
methods are used to extract linguistic (words, phrases, etc.) and
non-linguistic (dates, numbers, etc.) concepts. A review of this
technique and its use in information system management is given
by Métais (2002). The following set of functionalities is included:
a. Recognition of punctuation errors. These mistakes include the
incorrect use of the accent mark (tilde), the period, the comma,
the semicolon, the slash, etc. The text fields frequently contain
commentaries in very colloquial language, written with little
attention to the correct placement of punctuation marks.

b. Recognition of spelling errors. A fuzzy grouping technique is
applied. When the concepts of the text are extracted, words with
similar spelling (in terms of the letters that compose them) or
that are closely related are classified together. This algorithm
corrects mistakes of omission, duplication or permutation of
letters.

c. Dictionaries. A dictionary of technical words is compiled, as
well as a dictionary of synonyms and abbreviations, to help the
system recognize concepts in a more sophisticated way.
Dictionaries of undesirable concepts are also established, to
determine the words or concepts that are rejected in the
recognition process. Several dialects exist in Spain, which
complicates the extraction of concepts.

3. Categorization. In this step, each concept is qualified and
classified based on the inspectors’ knowledge. This process
classifies the key concepts or words according to the functions
previously specified:
a. Identification of a possible change of economic sector. This
classification allows the detection of 90 different economic
sectors.

b. Characterization of the customer consumption. Customer
consumption characterization is mainly focused on detecting the
justification of anomalous measurements or of consumption
anomalies. Another use is to set aside minor concepts by
classifying them into groups designated as sets of little
interest, which prevents them from overlapping or being confused
with others.

c. Miscellaneous information on the customer installation.
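The extraction and categorization steps can be sketched as follows. This
is only an illustration under stated assumptions: the dictionaries, the
category names and the similarity threshold are hypothetical, and the
fuzzy grouping is approximated with the similarity ratio of Python’s
standard difflib module, which is not necessarily the technique used in
the IES.

```python
import re
import unicodedata
from difflib import get_close_matches

# Hypothetical dictionaries; the real ones are built from inspectors' knowledge.
TECHNICAL_TERMS = {
    "closed": "consumption_justification",
    "padlocked": "inaccessible_equipment",
    "capacitor": "installation_info",
    "restaurant": "economic_sector",
}
SYNONYMS = {"cap": "capacitor", "rest": "restaurant"}  # abbreviations
UNDESIRABLE = {"the", "is", "and", "a", "of"}          # rejected concepts


def normalize(text):
    """Lowercase, strip accent marks and treat punctuation as separators."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).split()


def extract_concepts(text, cutoff=0.8):
    """Return (concept, category) pairs found in a free-text commentary."""
    concepts = []
    for token in normalize(text):
        token = SYNONYMS.get(token, token)
        if token in UNDESIRABLE:
            continue
        # Group misspelled words with the closest dictionary term.
        match = get_close_matches(token, list(TECHNICAL_TERMS), n=1, cutoff=cutoff)
        if match:
            concepts.append((match[0], TECHNICAL_TERMS[match[0]]))
    return concepts


print(extract_concepts("Place closd; equipment padlokced."))
```

In this sketch, an omitted letter ("closd") and a permutation of letters
("padlokced") are both mapped to their dictionary terms, and each term
carries the category assigned in the categorization step.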

8. Rule based expert system module

The RBES controls and monitors the analysis process. It employs a
set of rules to determine the customers’ problems. In the proposed
IES, this knowledge is represented by rules of the IF-THEN-ELSE
type. The objects and results are represented in a dynamic table,
in which each row represents a contract of a different customer.
The result of applying a rule is entered in a column that
identifies its origin. More than 500 rules (including the rules
generated by the text mining and data mining modules) help in
classifying customers; the rules fall into seven groups, according
to their purpose:

– Generation rules. These rules deal with the generation of new
information from the existing customer data. They are used to
update erroneous information and to preprocess the database
information. They include the rules generated by the text mining
process. For example, the cycle consumption calculation (a cycle
is the period of time between one measurement and the next) is
performed by applying the rule shown in Table 1. This rule
calculates the consumption between a measure (measure) and the
previous one (previous_measure), both of which are registered in
the company databases. The information generated is applied to
the other groups of rules.

– Classification rules for contract mistakes. These rules deal with
the detection of mistakes in the customer’s contract information.
An example is shown in Table 1. This rule allows the
classification of customers with very low contracted power. These
customers cannot be analyzed in the same manner as customers with
normal or high contracted power; they are analyzed using the text
mining rules.

– Classification rules for facility problem. The objective is to detect
information problems related to the measuring equipment. Table 1
shows an example of a rule that selects customers using obsolete
measuring equipment.

– Classification rules for consumption problem. This rule set is
applied to the results of the previous rules, along with the
information generated in data mining; the customer is classified
according to his electrical consumption. It also establishes the
limits of low consumption, excessive consumption or null
consumption. The initial values for these parameters are
determined during the knowledge acquisition process. Low
consumption in particular usually indicates the possibility of an
NTL. The detection of low consumption customers employs the rules
generated by data mining as well as by the text mining process.
There are several rules to determine a low consumption case,
based on the customer’s consumption history or on the customer’s
cluster. An example is shown in Table 1. This rule compares
customer consumption (customer_cycle_consumption) with the
results of statistical studies in which the consumption range is
established (range_con_month_group_a, range_con_month_group_b,
range_con_month_group_c and range_con_month_group_d). In this
case, the customer’s consumption is compared with the low
consumption range of each group described in Section 6.

Table 1
Some RBES rules.

Group of rules            Rules

Generation rules          CYCLE_CONSUMPTION = IF measure > previous_measure
                          THEN measure - previous_measure
                          ELSE (10^(maximum-number-of-digits)
                          - previous_measure - 1 + measure)

Consumption problem       LOW_CONSUMPTION = IF
classification rules      range_con_month_group_a > customer_cycle_consumption and
                          range_con_month_group_b > customer_cycle_consumption and
                          range_con_month_group_c > customer_cycle_consumption and
                          range_con_month_group_d > customer_cycle_consumption
                          THEN TRUE ELSE FALSE

Contract power            POWER_VERY_LOW = IF
classification rules      customer_contracted_power < 1.5
                          THEN TRUE ELSE FALSE

Facility problem          LOW_NUMBER_WHEELS = IF number_wheels <= 4
classification rules      THEN TRUE ELSE FALSE

– Rules of customer selection for inspection. These sets of rules
propose a list of customers to be inspected, identified from the
conclusions of the previous rules. This selection classifies
customers under several categories, which indicate the risk of
NTL. In some cases the NTL can be solved without an ‘in-situ’
inspection, when the problem can be solved by updating customer
information in the company databases.

– Verifying rules. Due to the irregularities in the information
available in the corporate databases, rules that check the
veracity of the system’s results are necessary. These rules
determine whether the customer has been correctly analyzed, and
also whether the customer cannot be reported because too much
incoherent information was collected.

– Explanation rules. The system strengthens the reports using
these IF-THEN-ELSE rules, adding relevant information on the
customer or his installation. The reported information includes:
o Contractual information, such as billing frequency, time band
discrimination, contracted power, postal code, economic activity,
etc.

o Problems related to the contract, such as a wrongly applied rate,
incorrectly contracted power, incoherent information, etc.

o Problems related to the measuring equipment: old register,
worn register, lack of the obligatory power control switch, etc.

o Information and problems related to consumption:
characterization of the customer’s consumption, unexpected low
consumption, etc.

o Other consumption operations, such as the average consumption
in any time band.
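The example rules of Table 1 can be sketched in Python as follows. The
dictionary-based row layout, the function apply_rules and the sample
values are assumptions made for illustration; the exponent in the cycle
consumption rule is read as 10 raised to the maximum number of digits of
the register, which is how the printed table appears to be intended.

```python
def cycle_consumption(measure, previous_measure, maximum_number_of_digits):
    """Generation rule: consumption between two consecutive readings.

    The ELSE branch follows Table 1 as printed and covers a register
    that has wrapped around past its maximum value.
    """
    if measure > previous_measure:
        return measure - previous_measure
    return 10 ** maximum_number_of_digits - previous_measure - 1 + measure


def low_consumption(customer_cycle_consumption, group_ranges):
    """TRUE when consumption is below the low range of every group."""
    return all(r > customer_cycle_consumption for r in group_ranges)


def power_very_low(customer_contracted_power):
    return customer_contracted_power < 1.5


def low_number_wheels(number_wheels):
    return number_wheels <= 4


def apply_rules(contract):
    """Return a copy of the contract row with one new column per rule."""
    row = dict(contract)
    row["CYCLE_CONSUMPTION"] = cycle_consumption(
        row["measure"], row["previous_measure"], row["maximum_number_of_digits"]
    )
    row["LOW_CONSUMPTION"] = low_consumption(
        row["CYCLE_CONSUMPTION"], row["group_ranges"]
    )
    row["POWER_VERY_LOW"] = power_very_low(row["customer_contracted_power"])
    row["LOW_NUMBER_WHEELS"] = low_number_wheels(row["number_wheels"])
    return row


# Hypothetical contract whose five-digit register has wrapped around.
contract = {
    "measure": 120,
    "previous_measure": 99950,
    "maximum_number_of_digits": 5,
    "group_ranges": [400, 350, 500, 300],  # groups a, b, c and d
    "customer_contracted_power": 1.1,
    "number_wheels": 5,
}
print(apply_rules(contract))
```

Each result is stored in its own column, mirroring the dynamic table in
which every row is a contract. Taken literally, the generation rule sends
two equal readings to the ELSE branch; a production rule set would
presumably handle a cycle with no consumption separately.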

9. Experimental results

The tests were run on a computer with two dual-core AMD Opteron
processors (1.7 GHz), 3 GB of RAM and 100 GB of hard disk space.

This IES is part of a project named MIDAS, as shown in Fig. 1.
The IES has been designed to analyze the samples obtained by
applying other detection studies, based on Artificial
Intelligence, statistical studies, and neural networks.

The IES was applied to a set of contracts ‘suspected’ of fraud
or abnormalities, obtained from the studies described in
Biscarri et al. (2008) and Biscarri et al. (2009). This ‘suspected
customer set’ consists of 134 contracts.

These contracts were selected because they show an anomalous
consumption pattern. The IES analyzes this set and is configured
in the most restrictive manner: all the selected contracts will
be inspected, and the number of inspections is highly restricted
because they are very expensive. In this sense, the IES selected:

– Thirty-two contracts with a possible NTL.
– Sixteen contracts with incoherent information or including
some problem which could be solved without inspection.
– Eighty-six contracts without a clear NTL, as they had been
solved previously or those where the anomalous consumption
can be explained. These contracts were classified as false NTLs.

The set of 32 contracts with a possible NTL was inspected
‘in situ’. The results of these inspections were:

– Nine contracts have an NTL.
– Two contracts have measurement problems.
– Fourteen contracts have no NTL.
– Two contracts are not in force, and this is not yet included
in the Endesa databases.
– Five contracts could not be inspected because the customers’
businesses had closed down, which had not yet been entered in
the Endesa databases.
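The figures reported above can be checked with a short calculation. The
sketch below only restates the counts given in the text; the dictionary
keys and the two derived ratios are naming choices made for illustration
and are not reported in the paper.

```python
SUSPECTED = 134  # contracts in the 'suspected customer set'

ies_selection = {
    "possible_ntl": 32,
    "solvable_without_inspection": 16,
    "false_ntl": 86,
}
inspection_results = {
    "ntl": 9,
    "measurement_problem": 2,
    "no_ntl": 14,
    "not_in_force": 2,
    "not_inspectable": 5,
}


def inspection_summary(suspected, selected, confirmed):
    """Share of suspected contracts sent to inspection, and share of
    the inspected list on which an NTL was confirmed."""
    return {
        "share_selected": selected / suspected,
        "share_confirmed": confirmed / selected,
    }


# The three IES classes cover the whole suspected set, and the five
# inspection outcomes cover the 32 contracts selected for inspection.
assert sum(ies_selection.values()) == SUSPECTED
assert sum(inspection_results.values()) == ies_selection["possible_ntl"]

summary = inspection_summary(
    SUSPECTED, ies_selection["possible_ntl"], inspection_results["ntl"]
)
print(f"{summary['share_selected']:.1%}")   # 23.9%
print(f"{summary['share_confirmed']:.1%}")  # 28.1%
```

Under these counts, the IES reduces the inspection list to about a
quarter of the suspected set, and an NTL is confirmed on 9 of the 32
listed contracts.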

The IES thus filtered out 86 cases of false NTLs. A review of four
of the filtered cases is given below. These cases show an NTL
pattern, but the data contain several items that explain the
anomalous consumption.

The first case has an anomalous consumption for several reasons.
Between 2005 and 2007 there was a period of decreasing
consumption. In 2007, the company started a proceeding, which
solved the problem. Later, there were some cycles of low
consumption, which were explained in the inspectors’
commentaries: they specified that the premises had closed and
that the measurement was estimated. Fig. 7 shows the consumption
of the customer in the different time discrimination bands.

Fig. 7. Consumption graph of a customer with previous proceeding.

The second case is a customer with a contract for a fountain
and irrigation engine. The consumption of this type of activity is
very irregular, and usually involves a high rate of reactive energy.
The IES classified the customer as a non-NTL case. The inspectors’
commentaries specified that it was not possible to access and
read the measuring equipment because it was padlocked and
damaged. Besides, this type of customer shows very irregular
consumption because it depends on the weather conditions, on
which the IES has no information. Fig. 8 shows the consumption of
this customer in the different time discrimination bands. The
reactive energy is high due to the latency period of the engine.

The third case is a hotel customer. Such clients usually have
five time discrimination bands, which makes the analysis more
difficult. Fig. 9 shows this customer’s consumption in the
different time discrimination bands. The rules generated by the
Data and Text Mining modules are very important to classify this
case correctly. Using these rules, several explanations were
detected: the customer’s consumption is determined by the tourist
industry (normally, it is seasonal); and the period of zero
consumption was explained by the inspectors’ commentaries as the
result of a problem with the measuring equipment, which was
successfully solved. The IES classified this as a non-NTL case.

Fig. 8. Consumption graph of a fountain and irrigation engine.

Fig. 9. Consumption graph of a seasonal case.

The fourth case is a customer with zero consumption, as he had
several contracts in the same place. The inspectors’ commentaries
specified that the supply was used as a supplement and for occa-
sional use only. The IES classified this case as a non-NTL case.

10. Conclusions

The present paper investigates the use of an expert system for
the analysis of power utility consumption, and its combination
with other technologies to further enhance efficiency.

In particular, this paper contributes to an as-yet little
exploited subject: the automatic analysis of the available
customer information of the utilities for NTL classification. The
main complexity of this research field lies in the enormous
quantity of information required and the great variety of
casuistry present in customer analysis. As shown in the
bibliographic review section, customer analysis is usually based
on customer consumption. When the inspectors’ knowledge is
included, new techniques need to be added to treat it.

The main contributions of the IES presented in this paper are:

– Identification and classification of the casuistry of utility
distribution in customer analysis.

– An increase in the quantity of useful information available on
the customer.

– Increased efficiency with respect to massive inspections. The
use of additional available information about the customer (in
addition to customer consumption) helps to greatly increase the
success ratio.

– Classification of the normal and NTL cases. The usual detection
methods are dedicated only to selecting the anomalous cases; the
present IES classifies both the normal and the NTL cases. The
proposed IES is a real framework, currently in the test phase at
Endesa.

– The proposed IES can be used as an additional method to detect
NTLs, increasing the efficiency of the studies.

We are currently researching new techniques and technologies to
improve the efficiency of this IES. In particular, we are
researching the use of the following techniques:

– Text mining. In the proposed IES, the categorization process
uses the inspectors’ knowledge. The customer’s consumption will
be added to this process so that categorization is performed
automatically. Thus, each rule created by text mining will have
several associated concepts, one associated action and several
features related to consumption.

– Data mining. An additional improvement in the consumption curve
calculation will be researched. This new feature will include
time series and Bayesian networks, with events or intervention
fields used as special independent fields to model the effects of
external occurrences.

– Data warehousing. The data mining and text mining techniques
will be complemented by data warehousing technology, which will
improve the efficiency of the mining process.

– Real-time analysis. A new module will be added to reduce the
analysis time. This module will determine when new information
on a customer becomes available for analysis.

References

Aliguliyev, R. M. (2009). A new sentence similarity measure and sentence based
extractive technique for automatic text summarization. Expert Systems with
Applications, 36, 7764–7772.

Biscarri, F., Monedero, I., León, C., Guerrero, J., Biscarri, J., & Millán, R. (2008). A data
mining method based on the variability of the customers consumption. In 10th
international conference on enterprise information systems, ICEIS2008, June 12–16,
Barcelona, Spain.

Biscarri, F., Monedero, I., León, C., Guerrero, J. I., Biscarri, J., & Millán, R. (2009). A
mining framework to detect non-technical losses in power utilities. In 11th
international conference on enterprise information systems, ICEIS2009, May 6–10,
Milano, Italy.

Bowen, J. E. (1994). An expert system for police investigators of economic crimes.
Expert Systems with Applications, 7(2), 235–248.

Cabral, J. E., Pinto, P., Onofre, J., Gontijo, E. M., & Filho, J. R. (2004). Fraud detection in
electrical energy consumers using rough sets. In 2004 IEEE international
conference on systems, man and cybernetics (Vol. 4, pp. 3625–3629).

Cabral, J. E., Pinto, J. O., Linares, K. S. C., & Pinto, A. M. A. (2006). Methodology for
fraud detection using rough sets. In 2006 IEEE international conference on
granular computing. IEEE Press.

Chen, W.-S., & Du, Y.-K. (2009). Using neural networks and data mining techniques
for the financial distress prediction model. Expert Systems with Applications, 36,
4075–4085.

Chen, S., Qu, G., & van Zuylen, H. (2010). A comparison of outlier detection
algorithms for ITS data. Expert Systems with Applications, 36(8), 10976–10986.

Cierniakoski, J. J., De, R., & May, J. H. (1991). MEDIN: An expert system for processing
medical insurance claims. Expert Systems with Applications, 2, 211–218.

Daskalaki, S., Kopanas, I., Goudara, M., & Avouris, N. (2003). Data mining for decision
support on customer insolvency in telecommunication business. European
Journal of Operational Research, 145, 239–255.

Depren, O., Topallar, M., Anarim, E., & Ciliz, M. K. (2005). An intelligent intrusion
detection system (IDS) for anomaly and misuse detection in computer network.
Expert Systems with Applications, 29, 713–722.

Estévez, P. A., Held, C. M., & Pérez, C. A. (2006). Subscription fraud prevention in
telecommunications using fuzzy rules and neural networks. Expert Systems with
Applications, 31, 337–344.

Galván, J. R., Elices, A., Muñoz, A., Czernichow, T., & Sanz-Bobi, M. A. (1998). System
for detection of abnormalities and fraud in customer consumption. In 12th
conference on the electric power supply industry. November 2–6, Pattaya, Thailand.

Gao, S., & Xu, D. (2009). Conceptual modelling and development of an intelligent
agent-assisted decision support system for anti-money laundering. Expert
Systems with Applications, 36, 1493–1504.

Hand, D. J., & Blunt, G. (2001). Prospecting for gems in credit card data. IMA Journal
of Management Mathematics, 12(2), 173–200.

He, H., Wang, J., Graco, W., & Haukins, S. (1997). Application of neural networks
to detection of medical fraud. Expert Systems with Applications, 13(4),
329–336.

Hernández-Pereira, E., Suárez-Romero, J. A., Fontenla-Romero, O., & Alonso-
Betanzos, A. (2009). Conversion methods for symbolic features: A comparison
applied to an intrusion detection problem. Expert Systems with Applications, 36,
10612–10617.

Hilas, C. S. (2009). Designing an expert system for fraud detection in private
telecommunications networks. Expert Systems with Applications, 36,
11559–11569.

Huang, C.-L., Chen, M.-C., & Wang, C.-J. (2007). Credit scoring with a data mining
approach based on support vector machines. Expert Systems with Applications,
33, 847–856.

Kim, H. R., Im, H. K., & Park, S. C. (2010). DSS for computer security incident response
applying CBR and collaborative response. Expert Systems with Applications,
37(1), 852–870.

Kirkos, E., Spathis, C., & Manolopoulos, Y. (2007). Data mining techniques for the
detection of fraudulent financial statements. Expert Systems with Applications,
32, 995–1003.

Liao, S.-H. (2005). Expert system methodologies and applications – A decade review
from 1995 to 2004. Expert Systems with Applications, 28, 93–103.

Liao, S.-H., Hsieh, C.-L., & Huang, S.-P. (2008). Mining product maps for new product
development. Expert Systems with Applications, 34, 50–62.

Liao, S.-H., & Wen, C.-H. (2007). Artificial neural networks classification and
clustering of methodologies and applications – Literature analysis from 1995 to
2005. Expert Systems with Applications, 32, 1–11.

Métais, E. (2002). Enhancing information systems management with natural
language processing techniques. Data & Knowledge Engineering, 41,
247–272.

Quah, J. T. S., & Sriganesh, M. (2008). Real-time credit card fraud detection using
computational intelligence. Expert Systems with Applications, 35, 1721–1732.

Rahman, S., & Lauby, M. (1993). Identification of potential areas for the use of
expert systems in power system planning. Expert Systems with Applications, 6,
203–212.

Richardson, R. (1997). Neural networks compared to statistical techniques.
In Proceedings of the IEEE/IAFE computational intelligence for financial
engineering (CIFEr), 23–25 March 1997 (pp. 89–95).

Sánchez, D., Vila, M. A., Cerda, L., & Serrano, J. M. (2009). Association rules applied to
credit card fraud detection. Expert Systems with Applications, 36, 3630–3640.

Schutzer, D. (1990). Business expert systems: the competitive edge. Expert Systems
with Applications, 1, 17–21.

Sung, N. H., & Chang, Y. S. (2004). Business information extraction from semi-
structured webpages. Expert Systems with Applications, 26, 575–582.

Wang, D., Wang, Q.-Y., Zhan, S.-Y., Li, F.-X., & Wang, D.-Z. (2004). A feature
extraction method for fraud detection in mobile communication networks. In
Fifth world congress on intelligent control and automation, WCICA 2004 (Vol. 2, pp.
1853–1856).

Wang, S.-J., Mathew, A., Chen, Y., Xi, L.-F., Ma, L., & Lee, J. (2009). Empirical analysis
of support vector machine ensemble classifiers. Expert Systems with Applications,
36, 6466–6476.

Wheeler, R., & Aitken, S. (2000). Multiple algorithms for fraud detection. Knowledge-
Based Systems, 13(2–3), 93–99.

Yang, W.-S., & Hwang, S.-Y. (2006). A process-mining framework for the detection of
healthcare fraud and abuse. Expert Systems with Applications, 31, 56–58.

Yang, H.-C., & Lee, C.-H. (2004). A text mining approach on automatic generation of
web directories and hierarchies. Expert Systems with Applications, 27, 645–663.

Yu, L., Yue, W., Wang, S., & Lai, K. K. (2010). Support vector machine based multiagent
ensemble learning for credit risk evaluation. Expert Systems with Applications,
37(2), 1351–1360.
