Goal-oriented sequential pattern for network banking churn analysis

Ding-An Chiang a, Yi-Fan Wang b,*, Shao-Lun Lee a, Cheng-Jung Lin a

a Department of Information Engineering, Tamkang University, Tanshui, Taipei, Taiwan, ROC
b Department of Information Management, Chang Gung Institute of Technology, Kwei-Shan, Tao-Yuan, Taiwan, ROC

* Corresponding author. E-mail addresses: yfwang@mail.cgit.edu.tw (Y.-F. Wang), chiang@cs.tku.edu.tw (D.-A. Chiang).

Expert Systems with Applications 25 (2003) 293–302. doi:10.1016/S0957-4174(03)00073-3

Abstract

Discovering sequential patterns is one of the most important tasks in data mining. In this paper we propose an efficient algorithm, called the Goal-oriented sequential pattern. It can give enterprises warning signs shortly before they lose valuable customers and provide them with a reference for decision making. Experiments comparing the method with Apriori show that the Goal-oriented algorithm is more efficient and performs reasonably well in terms of the rules it produces. © 2003 Elsevier Ltd. All rights reserved.

Keywords: Data mining; Association rule; Sequential pattern; Goal-oriented; Retention analysis

1. Introduction

The Pareto Principle, or the 80:20 rule, is commonly applied in customer relationship management: 80% of a company's income comes from the 20% of customers who are its major customers, while the remaining income is generated by the basic customers. According to the marketing experts Don Peppers and Martha Rogers, most enterprises lose on average 25% of their customers annually. However, the cost of obtaining a new customer is five times higher than that of maintaining an existing one (Gronroos, 1984; Reich & Benbasat, 2003). The major concerns of enterprises today are therefore to maintain customer loyalty and to find the possible reasons why customer loyalty is lost (Reichheld, 1996). A further consideration is to establish a system that can raise an alert before customers are lost, so that actions can be taken in time to keep them. It is very important for every company to fully understand how its customer numbers change and how these changes affect the company. The prime concern of this research is to help the company understand customer behaviour and to list the customers it is likely to lose, in order to retain them. In this research we specify how to judge whether a customer is leaving and what the retention strategies are. Conventional sequential pattern mining cannot restrict the search to a specific item; moreover, most of the rules it produces are irrelevant to the item of interest. This not only makes the rules harder to interpret but is also inefficient. Therefore, to remedy the lack of efficiency and the inability to search for a specific item, a goal-oriented pattern is adopted in this research to implement searches for particular items. We use association analysis, comprising sequential patterns (Agrawal & Srikant, 1995; Han et al., 2000a; Han, Pei, & Yin, 2000b; Masseglia, Cathala, & Poncelet, 1998; Pei et al., 2001; Tseng & Hsu, 2001) and association rules (Agrawal, Imielinski, & Swami, 1993; Agrawal & Srikant, 1994; Park, Chen, & Yu, 1995; Srikant & Agrawal, 1995). The causes of customer loss can be derived from the mined sequential patterns, so that strategy planners can take proper actions to retain customers. We list the possible reasons for customer loss by means of a reversed sequence: the original sequence is reversed so that the item of main concern is placed at the front of the sequence, and comparison with the goal items is then used to exclude the irrelevant items. Finally, the reversed sequence is reversed back to the original order. This enhances the efficiency of searching for rules that involve the goal item.
2. Related research

2.1. Association rules

Association rules are mainly used to find relations between items or features that occur together in a database, for example which groceries are bought at the same time when people shop in a supermarket. For instance, 80% of the people who buy milk also buy bread. Once the decision maker obtains this information, strategies such as placing the related counters near each other or holding a joint promotion can be derived from it. The main purpose of mining association rules is therefore to discover such co-occurrence relationships from raw data as a reference for decision making. Association rules are defined as follows. Let I = {i1, i2, …, im} be the set of items, where each item represents a specific commodity. D is a transaction database in which each transaction T is an itemset, i.e. T ⊆ I. Each transaction is a non-empty subset of I and is identified by a unique identifier TID. Each itemset X ⊆ I has a measure called support, which evaluates its statistical importance in D: Support(X, D) denotes the fraction of transactions in D that contain X. An association rule has the form X → Y, where X, Y ⊂ I and X ∩ Y = ∅. The interpretation of the rule is that if X is purchased, Y tends to be purchased at the same time. Each rule has a measure called confidence, defined as Confidence(X → Y) = Support(X ∪ Y, D) / Support(X, D). Confidence(X → Y) indicates that if a transaction includes X, the chance of it also including Y is relatively high. There are two steps in finding association rules: first, find the large itemsets; second, generate the association rules from the large itemsets. Mining association rules therefore means finding all rules of the form X → Y that satisfy the following conditions:

1. Support(X ∪ Y, D) ≥ Minsup
2. Confidence(X → Y) ≥ Minconf

Both Minsup and Minconf are set by the user. In general, the number of transactions containing X is also called the (absolute) support of X, denoted s_X. Let Minsup be the minimum support. If the support of X satisfies s_X ≥ Minsup, then X is a large itemset.
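To make these definitions concrete, the following minimal Python sketch (ours, not part of the paper's Visual Basic 6.0 implementation) computes the support and confidence of a candidate rule over a small invented transaction database; the item codes follow the worked example given later in this section.

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Confidence(X -> Y) = Support(X u Y, D) / Support(X, D)."""
    return (support(set(antecedent) | set(consequent), transactions)
            / support(antecedent, transactions))

# Toy database D (invented): each transaction is a set of item codes.
D = [frozenset(t) for t in ([10, 40, 70], [20, 40, 80], [40, 70, 80], [30, 40, 70])]
print(support({40, 70}, D))        # 0.75 on this toy data
print(confidence({40}, {70}, D))   # Support({40, 70}) / Support({40})

A rule is reported only when both thresholds above are met, i.e. support({40, 70}) ≥ Minsup and confidence({40} → {70}) ≥ Minconf.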
For mining association rules, many researchers take the Apriori algorithm proposed by Agrawal et al. (Agrawal & Srikant, 1994) as the basic formulation. The Apriori algorithm forms the rules through simple, level-wise passes over the database and is widely adopted for finding the large itemsets, as shown in Fig. 1.

Fig. 1. Apriori algorithm.

In this algorithm, the support of each item is calculated by scanning the transaction database; an item whose support reaches the minimum support belongs to the large 1-itemsets. In each subsequent pass, the itemsets selected in the previous pass are used as seed itemsets to generate new potentially large itemsets, called candidate itemsets. The support of each candidate itemset is then counted to decide whether it is a genuine large itemset and can serve as a seed for the next pass. The algorithm stops when no further candidate itemsets can be generated. For example, in Fig. 2, database D is the original transaction database, and the minimum support is assumed to be 2.

Fig. 2. Original transaction database.

First, count the number of times each item appears in the transaction database, i.e. its support, check whether it is greater than or equal to the minimum support, and determine the large 1-itemsets L1. Next, generate the candidate 2-itemsets C2 from L1 × L1, and count the support of C2 to obtain L2. From L2 × L2 the candidate 3-itemsets C3 are generated. In the join phase we obtain {40, 70, 80}; however, in the pruning phase, although the subsets {40, 70} and {40, 80} of {40, 70, 80} are both contained in L2, {70, 80} is not. Thus {40, 70, 80} does not qualify as a candidate 3-itemset, C3 is empty, and the algorithm stops. The resulting large itemsets are L1 = {{10}, {20}, {30}, {40}, {70}, {80}} and L2 = {{40, 70}, {40, 80}}, as shown in Fig. 3.

Fig. 3. Generation of candidate itemsets and large itemsets.

In step 2, the association rules are created from the large itemsets. Since the large itemsets found in the previous step are L1 = {{10}, {20}, {30}, {40}, {70}, {80}} and L2 = {{40, 70}, {40, 80}}, the association rules to be found are {40} → {70}, {70} → {40}, {40} → {80} and {80} → {40}.
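The following Python sketch, written by us as an illustration rather than taken from the paper, implements the join-prune-count loop just described on a small invented database; it returns the large itemsets only, leaving rule generation to a separate step.

from itertools import combinations

def apriori(transactions, minsup):
    """Return all large itemsets (frozensets) with absolute support >= minsup."""
    counts = {}
    for t in transactions:                       # first pass: count single items
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    large = {s for s, c in counts.items() if c >= minsup}
    all_large, k = set(large), 2
    while large:
        # Join phase: combine large (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in large for b in large if len(a | b) == k}
        # Prune phase: every (k-1)-subset of a candidate must itself be large.
        candidates = {c for c in candidates
                      if all(frozenset(s) in large for s in combinations(c, k - 1))}
        # Count phase: one scan of the database per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        large = {c for c, n in counts.items() if n >= minsup}
        all_large |= large
        k += 1
    return all_large

D = [{10, 40, 70}, {20, 30}, {40, 70, 80}, {40, 80}]     # invented toy data
print(apriori(D, minsup=2))

Rule generation then simply enumerates, for each large itemset, its non-empty proper subsets X and keeps the rules X → (itemset − X) whose confidence reaches Minconf.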
2.2. Sequential patterns

Sequential patterns originate from association rules that occur frequently, but they emphasize the factor of time: the concern is with data that are related on a time basis. We use this technique to analyse the chronological associations between individual events. The major methods for mining frequent patterns are the Apriori algorithm (Agrawal & Srikant, 1994), the DHP algorithm (Park et al., 1995), the FP-tree algorithm (Han, Pei, & Yin, 2000b) and the FPL algorithm (Tseng & Hsu, 2001). According to the time sequence and time section considered, sequential patterns can be divided into sequential procurement (Agrawal & Srikant, 1995) and cyclic procurement (Ozden, Ramaswamy, & Silberschatz, 1998). Sequential procurement is concerned with the chronological pattern; it analyses the association of data only by the order of time. Cyclic procurement, on the other hand, is concerned with events that happen within one time section and whether they recur in other time sections; it emphasizes the variation across time sections. The main concern of this research is sequential procurement. In general, there are two phases in creating sequential patterns: finding the large itemsets, and establishing the sequential pattern rules from the large itemsets. As with association rules, the purpose of the large itemsets in sequential patterns is the same, namely to find the associations between items; the only difference is that in sequential patterns the chronological order is also a major concern. In phase I, the Apriori algorithm introduced in Fig. 1 is applied to find the large itemsets, whereas phase II consists of five steps that generate the sequential pattern rules from the large itemsets.

Sort Phase. The raw data from the original transaction database are sorted with customer ID as the major key and transaction time as the minor key. The raw transaction database is shown in Fig. 2, and the database after sorting by increasing time is shown in Fig. 4.

Fig. 4. The customer sequence is ordered by increasing transaction time.

Large Itemset Phase. Find all the large itemsets and denote each of them with a specific symbol. Assuming the minimum support is 2, we obtain from Section 2.1 the large itemsets L1 = {{10}, {20}, {30}, {40}, {70}, {80}} and L2 = {{40, 70}, {40, 80}}, which are then coded with integers as shown in Fig. 5.

Fig. 5. Large itemsets.

Transformation Phase. Delete the non-large itemsets from the raw transaction database. The items that do not belong to any large itemset are removed, and the remaining data are rearranged by customer as shown in Fig. 6.

Fig. 6. Transformed database.

Sequence Phase. Generate the candidate sequences and determine the large sequences from the transformed data. The sequences are generated by an algorithm similar to Apriori; the difference from finding the large itemsets is that in this phase the order of the itemsets matters. Hence, when moving from one-item to two-item sequences, the number of candidate two-item sequences is the square of the number of single items. Similarly, the Apriori-gen procedure can be used to find the candidate sequences of three or more items. The set LS1, formed from all the large itemsets, is shown in Fig. 7. To produce CS2, we compute LS1 × LS1, which requires 64 combinations (8 × 8). LS2 is obtained by counting the support of CS2, and CS3 is then generated from LS2 × LS2. Although ⟨3,4,6⟩, ⟨3,4,8⟩ and ⟨3,6,8⟩ are generated in the join phase, the subsequences ⟨4,6⟩, ⟨4,8⟩ and ⟨6,8⟩ (subsets of ⟨3,4,6⟩, ⟨3,4,8⟩ and ⟨3,6,8⟩, respectively) are not contained in LS2, so no 3-item candidate sequence (CS3) survives the pruning phase and the algorithm stops.

Fig. 7. Generation of candidate sequences and large sequences.

Maximal Phase. Find the maximal sequences among the large sequences. This phase can be combined with the sequence phase to reduce the time spent counting non-maximal sequences. A sequence generated in the previous step that is not contained in any other sequence is a maximal sequence (Fig. 7). The sequences ⟨3,4⟩, ⟨3,6⟩ and ⟨3,8⟩ are the maximal sequences. Transforming the maximal sequences back to the original items, we obtain the mined sequential patterns ⟨(30)(40)⟩, ⟨(30)(80)⟩ and ⟨(30)(40,80)⟩.
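As a rough illustration of the sequence and maximal phases, the sketch below is ours: it treats each coded large itemset as an atomic symbol (as in the transformed database, with the coding 1..8 assumed), generates the ordered candidate 2-sequences from LS1, and filters a set of large sequences down to the maximal ones.

def candidate_2_sequences(coded_itemsets):
    """CS2: every ordered pair <a, b> of coded large itemsets (|LS1| x |LS1| pairs)."""
    return [(a, b) for a in coded_itemsets for b in coded_itemsets]

def is_subsequence(small, big):
    """True if `small` appears in `big` in order (not necessarily contiguously)."""
    it = iter(big)
    return all(any(x == y for y in it) for x in small)

def maximal(sequences):
    """Keep only sequences not contained in any other large sequence."""
    return [s for s in sequences
            if not any(s != t and is_subsequence(s, t) for t in sequences)]

ls1 = [1, 2, 3, 4, 5, 6, 7, 8]                       # assumed coding of the 8 large itemsets
print(len(candidate_2_sequences(ls1)))               # 64 combinations, as in the text
print(maximal([(3, 4), (3, 6), (3, 8), (3,), (4,)])) # -> [(3, 4), (3, 6), (3, 8)]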
3. Problem statement

In today's business environment, creating a customer retention analysis model is the first step of customer relationship management. Many businesses hope to figure out the real causes of customer loss, or even to be warned by certain clues before the loss occurs, so that they can propose new sales strategies against the loss in advance. Although many businesses are eager to build a customer-loss model, they have not found an effective way to solve this long-standing problem. This research therefore proposes a new method: by mining sequential patterns we can find the behaviour patterns of losing customers, or clues that appear before they stop using a product. In this section we take network banking services as an example to elaborate the new method, which can likewise be used to identify the real causes of customer loss in other trades.

3.1. Definition of loss

First, we give a simple definition of loss for the analysis of customer churn in network banking services. For each user who has applied for network banking services, we examine the periodicity of the user's transaction times. In general, there are two ways to judge whether a customer has been lost. One is to judge by the customer's average interval between transactions in the past: if the interval between the present day and the last transaction day is longer than this average interval, we consider the customer lost. The other is to judge by the longest interval between two consecutive transaction days in the past: we also consider a customer lost if the number of days without a transaction since the last transaction day exceeds this longest interval.
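The two loss criteria can be expressed directly in code. The following sketch is our own illustration, assuming each customer's transaction dates are already sorted in increasing order; the function and parameter names are hypothetical.

from datetime import date

def is_lost(transaction_dates, today, rule="average"):
    """Apply the two loss criteria of Section 3.1 to one customer's sorted dates."""
    gaps = [(b - a).days for a, b in zip(transaction_dates, transaction_dates[1:])]
    if not gaps:
        return False                     # a single transaction gives no interval to compare
    silence = (today - transaction_dates[-1]).days
    if rule == "average":                # criterion 1: silence exceeds the average past interval
        return silence > sum(gaps) / len(gaps)
    return silence > max(gaps)           # criterion 2: silence exceeds the longest past interval

dates = [date(2001, 8, 1), date(2001, 9, 1), date(2001, 9, 20)]
print(is_lost(dates, date(2001, 11, 15), rule="average"))   # True: 56 silent days > 25-day average
print(is_lost(dates, date(2001, 11, 15), rule="longest"))   # True: 56 > 31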
3.2. Methodology

In this section we offer a method to address the constant loss of customers in network banking services, namely to find the main reasons why customers stop using the services by combining a sequential pattern algorithm with the definition of loss given in Section 3.1. Common sequential patterns cannot be applied directly to identify the real causes of customer loss: if we put all of the transaction data into a sequential pattern program, it costs a great deal of computation time to find a rule, and even when one is found, it is often a useless rule.

3.2.1. Time windows and normalization

To solve the problem described above, we use the concepts of time windows and normalization to select the data of interest. The size of the time window affects several factors: the larger the window, the more transaction data have to be processed, the more time is spent on computation, and the more related rules can be found, so the support may also rise. Conversely, the smaller the window, the fewer related rules, or even none, will be found. First, we decide the size of the time window, for example one year, six months, one quarter, one month or one week. Next, in accordance with the characteristics of the problem, we set the time windows in a way that differs from other sequential pattern approaches. Suppose the time window is set to one month; this means that we select only each customer's transaction records in his or her last month from the database. This helps considerably in identifying the real cause of the loss, because what we care about is each customer's transaction activity just before the loss, and these activities are reflected in the transaction data of the last month. Thus, by using time windows and normalization, we obtain a better result when searching for sequential pattern rules. Once the time window has been chosen (e.g. one month), it would be usual simply to select one month of transaction data from the entire data set. However, in order to find the real factors behind the loss of customers, we apply the idea of normalization: we regard the last transaction date of each customer as that customer's datum point, so that different customers have different datum points, and we select the data traced back one month from each customer's datum point. As shown in Fig. 8, the vertical axis is the customer, the horizontal axis is the transaction date, dark cells are the selected data, light cells are the unselected data, and the numbers in the dark cells are the numbers of transactions. With the time window set to one month, the last transaction date of customer ID0001 is 15/11/2001, so the selected data are the four transactions from 16/10/2001 to 15/11/2001; the last transaction date of customer ID0002 is 16/08/2001, so the selected data are the five transactions from 17/07/2001 to 16/08/2001; the last transaction date of customer ID0003 is 20/04/2001, so the selected data are the three transactions between 21/03/2001 and 20/04/2001; and so on for the remaining customers.

Fig. 8. Use of the time window to extract data.
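A possible implementation of this per-customer windowing, sketched by us with pandas on an invented toy log (the column names 'customer', 'date' and 'activity' are our assumptions, not the bank's actual schema):

import pandas as pd

# Hypothetical transaction log: one row per transaction.
tx = pd.DataFrame({
    "customer": ["ID0001"] * 4 + ["ID0002"] * 5,
    "date": pd.to_datetime(["2001-10-16", "2001-10-25", "2001-11-02", "2001-11-15",
                            "2001-07-17", "2001-07-30", "2001-08-05", "2001-08-10", "2001-08-16"]),
    "activity": ["inquiry", "transfer", "login failure", "termination",
                 "inquiry", "inquiry", "transfer", "login failure", "termination"],
})

window = pd.Timedelta(days=30)                            # time-window size: one month
last = tx.groupby("customer")["date"].transform("max")    # each customer's own datum point
selected = tx[tx["date"] >= last - window]                # normalised: last month per customer
print(selected)

Because the window is anchored at each customer's own last transaction date rather than at a single calendar date, the selected rows always describe the activity immediately preceding that customer's (potential) loss.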
3.2.2. Data preprocessing and virtual labels

Taking the four transactions of customer ID0001 as an example, the usual way of applying general sequential patterns is to regard these four transaction descriptions as a single record and then induce rules with a sequential pattern algorithm. However, because of the characteristics of the problem, we introduce the concept of virtual labels before deleting duplicated transaction descriptions. General sequential patterns cannot capture the number of times each transaction activity occurs when dealing with these four transaction records, so some information would certainly be lost by such a computation. Thus, in order to retain the key information, we attach different virtual labels to certain key transaction descriptions. For example, if a customer has two consecutive transactions described as 'login failure' in the same period, we combine these two records and label them with a new description 'login failure 2', which means that two consecutive attempts were made to sign on with an incorrect PIN. Correspondingly, if the customer has three consecutive transaction descriptions of 'login failure' in the same period, we combine them and label them 'login failure 3', which means the PIN was entered incorrectly three times in a row. Descriptions such as 'login failure 2' and 'login failure 3' are virtual labels that each represent important information: for instance, the failures may be caused by a user who has forgotten the PIN, or by hackers attempting to break into the system.
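The virtual labelling amounts to run-length encoding consecutive identical descriptions within a customer's window. A minimal sketch of our own, assuming the descriptions are already ordered by time:

def add_virtual_labels(descriptions):
    """Collapse consecutive repeated descriptions into one virtually labelled record,
    e.g. ['login failure', 'login failure'] -> ['login failure 2']."""
    labelled, i = [], 0
    while i < len(descriptions):
        run = 1
        while i + run < len(descriptions) and descriptions[i + run] == descriptions[i]:
            run += 1
        labelled.append(descriptions[i] if run == 1 else f"{descriptions[i]} {run}")
        i += run
    return labelled

print(add_virtual_labels(["inquiry", "login failure", "login failure", "transfer"]))
# -> ['inquiry', 'login failure 2', 'transfer']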
3.3. Goal-oriented sequential patterns

Traditional sequential patterns cannot mine specific goal items; however, most decision makers need to find the order in which events surrounding specific items occur. We therefore propose a method for searching for specific goal items. In general, if the goal items appear at the beginning of the sequences, we only have to delete the large itemsets that are unrelated to the goal items and then generate the rules. Conversely, if the goal items appear at the end of the sequences, we use the concept of the reversed sequence: we reverse the original sequences so that the items we care about come first, and then compare against the goal items and sift out the rules that are irrelevant to them, which increases the rule-finding efficiency. If the goal items appear in the middle of the sequences, we first delete the parts of the sequences after the goal items and then follow the steps above. As mentioned in Section 2.2, a general way of finding sequential patterns consists of two steps: the first is to find the large itemsets, and the second is to use the large itemsets to derive the sequential pattern rules. In the first step we use the Apriori algorithm to find the large itemsets; for the second step, deriving the rules from the large itemsets, we propose a new algorithm that matches the characteristics of the problem, namely finding the rules of loss. The algorithm deletes in advance the rules that are irrelevant to the goal item, and the efficiency of rule finding is thereby increased by a large margin. Although the five steps of the traditional sequential patterns described above can also find the rules of loss, they are very inefficient, because many rules are found, most of which are useless ones unrelated to the loss. Since the purpose of this research is to find the rules of loss, in other words the transaction activities of a customer before loss, we propose a new kind of sequential pattern, called the Goal-oriented algorithm, to solve such a problem for a specific goal. Mining Goal-oriented sequential patterns is split into six phases.

Descending Sort Phase. The sort phase converts the original transaction database into a database of customer sequences, where each customer sequence is a list of transactions ordered by decreasing transaction time. Sequences that do not begin with the goal item are deleted. Setting the target item to 80, we obtain the new database in Fig. 9 by sorting the raw database (Fig. 2) in decreasing time order and omitting the sequences that do not start with the target item.

Fig. 9. The customer sequence is ordered by decreasing transaction time.

Large Itemset Phase. Find all the large itemsets and denote each of them with a specific symbol. Assuming the minimum support is 2, the large itemsets are L1 = {{30}, {40}, {80}} and L2 = {{40, 80}}; they are coded with integers as shown in Fig. 10.

Fig. 10. Large itemsets.

Transformation Phase. Delete the non-large itemsets from the raw transaction database. The items that do not belong to any large itemset are removed, and the remaining data are rearranged by customer as shown in Fig. 11.

Fig. 11. Transformed database.

Sequence Phase. Generate the candidate sequences and determine the large sequences from the transformed data. In this phase, the number of large itemsets that contain the target item (80) is two: one is the large itemset (80), whose mapped value is 3; the other is (40, 80), whose mapped value is 4. LS1 is formed from all the large itemsets. To produce CS2 we compute LS1 × LS1, but because only two of the large itemsets contain the target item (80), only eight combinations are needed (4 × 2 = 8). The result is shown in Fig. 12. Compared with the traditional method (Section 2.2), which requires 64 computations, this is a major difference. LS2 is then obtained by counting the support of CS2. Similarly, CS3 would be obtained from LS2 × LS2, but since no CS3 can be generated in the join phase, the algorithm stops.

Fig. 12. Generation of candidate sequences and large sequences.

Maximal Phase. Find the maximal sequences among the large sequences. This phase can be combined with the sequence phase to reduce the time spent counting non-maximal sequences. The sequences ⟨3,1⟩ and ⟨4,1⟩ are the maximal large sequences (Fig. 12). Transforming them back to the original items, we obtain the mined sequences ⟨(80)(30)⟩ and ⟨(40,80)(30)⟩.

Reverse Phase. Reverse the maximal sequences. Reversing ⟨(80)(30)⟩ and ⟨(40,80)(30)⟩ gives the sequences ⟨(30)(80)⟩ and ⟨(30)(40,80)⟩. Therefore, the goal-oriented sequential pattern rules we set out to find are ⟨(30)(80)⟩ and ⟨(30)(40,80)⟩.
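The descending sort and reverse phases can be summarised in a few lines. The sketch below is our simplified illustration of the idea (filter on the goal item, reverse, mine with any sequential pattern miner, then reverse the results back); the function names are hypothetical and the mining step itself is omitted.

def prepare_goal_sequences(customer_sequences, goal_item):
    """Keep only sequences whose last itemset contains the goal item, then reverse them
    so that the goal item leads each sequence (the paper's descending sort step)."""
    kept = [seq for seq in customer_sequences if goal_item in seq[-1]]
    return [list(reversed(seq)) for seq in kept]

def restore_order(mined_sequences):
    """Reverse the mined maximal sequences back into chronological order."""
    return [list(reversed(seq)) for seq in mined_sequences]

# Each customer sequence is a list of itemsets (sets of item codes), oldest first.
db = [[{30}, {40, 80}], [{30}, {80}], [{10}, {70}]]
reversed_db = prepare_goal_sequences(db, goal_item=80)   # third sequence is dropped
print(reversed_db)                                       # [[{40, 80}, {30}], [{80}, {30}]]
print(restore_order([[{80}, {30}]]))                     # [[{30}, {80}]]

Mining then proceeds as in Section 2.2 on the reversed sequences, with candidate sequences restricted to those that begin with an itemset containing the goal item.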
4. The experimental results

To examine the execution efficiency of the Goal-oriented sequential pattern algorithm, we design several experiments and compare it with the Apriori algorithm, and we then explain the results. Section 4.1 introduces the experimental environment, including the experimental machine, the implementation language and the data source. Section 4.2 presents the experimental results and compares the execution efficiency of the Goal-oriented sequential pattern algorithm. Section 4.3 presents the customer loss analysis results.

4.1. Experimental environment

1. Experimental platform
   * CPU: Pentium III 600 MHz
   * Memory: 392 MB RAM, 13.9 GB hard disk
   * Operating system: Windows 2000
   * Database: SQL Server 2000
2. Implementation language
   * Visual Basic 6.0
3. Experimental data source
   * We analyse customer data from a well-known network bank in Taiwan, totalling more than one million records; the data cover more than six months of the bank's customer transactions.

4.2. Execution efficiency analysis

Fig. 13 shows the execution frame of the Apriori algorithm, for which we do not have to restrict the items from which rules are produced, whereas in the execution frame of the Goal-oriented algorithm, shown in Fig. 14, we do restrict the items from which rules are produced. The program offers two choices: one is to begin with a target item, the other is to end with a target item. The customer loss model in this paper uses the latter choice: we take the target item 'Termination' as the ending of each rule, so that we can learn what activities happened before the termination.

Fig. 13. The execution frame of the Apriori algorithm.
Fig. 14. The execution frame of the Goal-oriented algorithm.

To compare the execution efficiency of the two algorithms, we use four different minimum supports, 0.1, 0.2, 0.3 and 0.4, and run them with a maximum rule length of 5. The results are shown in Fig. 15. Comparing the two algorithms under the same minimum support of 0.1 on different data sets gives the results shown in Fig. 16. They indicate that the execution efficiency of the Goal-oriented algorithm is evidently better than that of the Apriori algorithm, and the advantage becomes more apparent as the number of activities and items grows and as the minimum support decreases. The difference in execution time between the two algorithms can reach several tens of times.

Fig. 15. Execution time (same data set and different minimum supports).
Fig. 16. Execution time (same minimum support and different data sets).

Fig. 17 shows the numbers of rules generated by the two algorithms. Although the Apriori algorithm produces more rules, this makes it harder for users to read and check them. Comparing the two algorithms under the same minimum support of 0.1 on different data sets gives the results shown in Fig. 18. As in Fig. 13, if we do not restrict an item, the output contains all the rules, including both the rules we care about and those we do not, which is difficult to read. By contrast, the rules produced by the Goal-oriented sequential patterns focus on those related to the target item and therefore have better readability. In Fig. 14 we restrict each rule to end with the description 'transaction inquiry', so the last description of each rule is 'transaction inquiry', which makes the rules easy to read.

Fig. 17. Number of rules (same data set and different minimum supports).
Fig. 18. Number of rules (same minimum support and different data sets).

4.3. The customer loss analysis results

After using the Goal-oriented sequential pattern algorithm to perform the customer loss analysis, the rules of loss are produced as shown in Fig. 19. Take the pattern 'login failure 2' as an example: 77.894% of the customers who match this pattern do not enter the system again within 30 days, which means it is very likely that they have stopped using the service. 'Login failure 2' means the PIN was entered incorrectly twice; according to the terms of the banking service, the bank disables the service after three consecutive attempts to sign on with an incorrect PIN, and this regulation causes some customers to terminate the service. Based on the loss analysis data we can identify the customers of this type; if the bank agent responsible for customer service takes the initiative to contact the customer and solve the problem in time, the rate of continued use of the service can be increased by a large margin, and the same applies to the customers in the other patterns. While the bank blindly promotes its services to acquire new users, it may not realize that its existing customers are gradually being lost; for the bank, such a gradual loss outweighs the benefit of acquiring new customers. Fig. 19 shows the main reasons for customer loss found by the analysis. We list the first 20 rules in order of termination rate and give the customer centre the name lists of the customers classified by each cause. The centre can then follow up the losing customers, and because customer activity is subject to change, the system can be used to add new information and renew the rules of loss when we find that the rates of the causes in these rules have dropped substantially.

Fig. 19. Main losing modes.
5. Conclusion

In this paper, in order to build the customer-loss model, we present a new method that finds the activities of losing customers by mining sequential patterns. Moreover, we propose a new algorithm, called the Goal-oriented algorithm, for searching for specific goal items. The most distinct difference between this algorithm and the traditional one is the use of the reversed sequence: we first reverse the original sequences so that the items we care about are moved to the front of each sequence, and then compare against the goal items to sift out the rules that are irrelevant to them. The algorithm increases the rule-finding efficiency by a large margin. Besides, the rules it concludes are simple because they centre on the goal items of interest, and they are therefore easier to read. Finally, we use the Goal-oriented algorithm to find the main reasons why customers terminate the services. The results show that for up to 80% of the terminated users, the cause of termination is simply that they entered an incorrect PIN twice in a row, after which the customers stop using the network banking services or become inactive accounts. After obtaining such information, with some timely remedies, businesses can effectively prevent the loss from happening.

References

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD international conference on management of data, 207–216.

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th international conference on very large data bases, 478–499.

Agrawal, R., & Srikant, R. (1995). Mining sequential patterns. Proceedings of the 11th international conference on data engineering, 3–14.

Gronroos, C. (1984). A service quality model and its marketing implications. European Journal of Marketing, 18(4), 36–44.

Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., & Hsu, M.-C. (2000a). FreeSpan: frequent pattern-projected sequential pattern mining. Proceedings of the international conference on data mining and knowledge discovery, 355–359.

Han, J., Pei, J., & Yin, Y. (2000b). Mining frequent patterns without candidate generation. Proceedings of the ACM SIGMOD conference on management of data, 241–250.

Masseglia, F., Cathala, F., & Poncelet, P. (1998). The PSP approach for mining sequential patterns. Proceedings of the second European symposium on principles of data mining and knowledge discovery, 1510, 176–184.

Ozden, B., Ramaswamy, S., & Silberschatz, A. (1998). Cyclic association rules. Proceedings of the 14th international conference on data engineering, 412–421.

Park, J. S., Chen, M. S., & Yu, P. S. (1995). An effective hash based algorithm for mining association rules. Proceedings of the ACM SIGMOD conference on management of data, 175–186.

Pei, J., Mortazavi-Asl, B., Pinto, H., Chen, Q., Dayal, U., & Hsu, M.-C. (2001). PrefixSpan: mining sequential patterns efficiently by prefix-projected pattern growth. Proceedings of the international conference on data engineering, 215–224.

Reich, B. H., & Benbasat, I. (2003). An empirical investigation of factors influencing the success of customer-oriented strategic systems. Information Systems Research, 1(3), 325–347.

Reichheld, F. (1996). The loyalty effect: The hidden force behind growth, profit and lasting value. Cambridge: Harvard Business School Press.

Srikant, R., & Agrawal, R. (1995). Mining generalized association rules. Proceedings of the 21st international conference on very large data bases, 407–419.

Tseng, F. C., & Hsu, C. C. (2001). Generating frequent patterns with the frequent pattern list. Proceedings of the Asia Pacific conference on data mining and knowledge discovery, 376–386.