key: cord-024461-xo75855r
authors: Zhang, Yuanzhe; Luo, Ling; Wang, Yang; Wang, Zhiyong
title: FCP Filter: A Dynamic Clustering-Prediction Framework for Customer Behavior
date: 2020-04-17
journal: Advances in Knowledge Discovery and Data Mining
DOI: 10.1007/978-3-030-47426-3_45
sha: 
doc_id: 24461
cord_uid: xo75855r

Customer purchase behavior prediction plays an important role in modern retailing, but the performance of this task is often limited by the randomness of individual historic transaction data. In the meanwhile, Fragmentation and Coagulation Process (FCP), a stochastic partition model, has recently been proposed for identifying dynamic customer groups and modeling their purchase behavior. However, FCP is not able to forecast the purchase behavior because such a data-driven method requires transaction observations to conduct clustering. To tackle this challenge, we propose FCP filter, a clustering-prediction framework based on FCP, which can forecast purchase behavior and filter random noise of individual transaction data. In our model, FCP clusters customers into groups by their temporal interests to filter random noise of individual transaction data. Then a predictor is built on grouped data. The predicted results are also fed to FCP to adjust the parameter for prior knowledge at the next time step. Our model is superior in capturing temporal dynamics and having flexible number of groups. We conduct experiments on both synthetic and real-world datasets, demonstrating that our model is able to discover the latent group of individual customers and provides accurate predictions for dynamic purchase behavior.

In modern retailing, accurate prediction on the quantity that customers are going to purchase over items helps retailers to design effective marketing and warehousing strategies. However, the purchase behavior of individual customer is often random, limiting the accuracy of prediction. For example, Tom normally purchases 1 bottle of milk in his weekly visit to supermarket, but may buy 2 bottles occasionally. The observation of purchase intensity on transaction data is 1 or 2 but the real purchase intensity could be 1.2, so the observational noise is 0.2 and 0.8 for those two observations. In the above example, we can see that the purchase behavior of individual customer is not stable. To address this problem, we propose to cluster customers into groups by their historic transaction data because the purchase intensity of customer group is more stable and can represent the real purchase intensity of individuals. The random purchase noise of a customer can be filtered if his latent group could be accurately found.

For clustering customers into groups, a dynamic and flexible clustering model, Fragmentation and Coagulation Process (FCP) [7] , has been recently proposed. FCP is a data-driven clustering model, with scalable number of customer groups, which does not need to be predefined, and can evolve with the data. This property enables FCP to capture the dynamic purchase behavior of customers accurately. However, FCP clustering can only be conducted when customer purchase data is given, which makes it hard to forecast the behavior of customer groups in the future. The significance of FCP in real-world applications is limited by the unavailability of future transaction data. For retailers, forecasting the behavior of customer groups is more important than just grouping customers in the past time. In order to reduce the purchase random noise of individual customers as well as to enable FCP to forecast group purchase intensity, we propose our FCP filter based on FCP to predict purchase intensity at group level instead of individual customer. For example, as shown in Fig. 1 , there are 3 groups at t = 10. We aim to predict the purchase intensity of these 3 groups at t = 11 and use the predicted value of each group as the purchase intensity of its members. The purchase intensity of individual customer changes rapidly and randomly, but the group-based purchase intensity is more stable and easier to discover regular patterns. We can take the group-level purchase intensity as the actual state of its members, while taking the individual customer purchase intensity as the observation of group purchase intensity. By predicting over actual states instead of observations, the individual randomness can be filtered.

Also, in our model, not only the clustering results influence prediction but also the prediction results influence FCP clustering. In traditional FCP clustering, there is a hyperparameter to represent the prior knowledge of group purchase intensity. Since our FCP filter can get the prediction of group purchase intensity, we propose to update this parameter in a time-evolving manner instead of using a predefined value. This parameter can be calculated from prediction results and influence FCP clustering at the next time step. Theoretically, an accurate prediction leads to better clustering fitness than fixing the parameter.

In summary, we construct a dynamical clustering-prediction framework for modeling customer behavior. The main contributions of our model are (1) from prediction perspective, this framework helps to filter individual random purchase noise, (2) from clustering perspective, we enable FCP, a data-driven clustering model, to forecast group purchase intensity. The flexibility and dynamics of our FCP filter are appropriate for modeling customer behavior. It is flexible that the number of groups do not need to be predefined but estimated from customer transaction data. It is dynamic that the customer membership and group number can change with time. The hyperparameter controlling the priori knowledge of group purchase intensity is also updated dynamically so that the group purchase intensity can be estimated more accurately.

Clustering on customers is also known as customer segmentation, which aims to identify the customers whose purchase behavior is in the same manner [10] . In order to identify customer groups, the data-driven approaches based on clustering analysis are formal and reliable solution [3] . Decision tree [5] was used to segment customers using their demographic information. Clustering models like K-means [4] for static clustering and mixture model based on Non-homogenous Poisson process [6] for tracking dynamic group interests were also proposed. However, the preferences and interests of customers may also change over time. In order to track the customers' temporal shifting across groups, a novel Bayesian non-parametric customer segmentation model FC-CSM [7] based on a random partition process, Fragmentation and Coagulation Process (FCP) [1] , was proposed. It achieves high accuracy in fitting individual purchase frequency. Besides modeling the dynamics of segmentation, another advantage of FC-CSM is the flexibility. There is no need to set the number of customer groups manually, which can be learned automatically from data directly. However, the FC-CSM relies on the observed transaction data, so that the clustering can only be conducted for the past time. It is more meaningful to forecast the purchase behavior of groups instead of only analyzing past data. In this way, we propose to build prediction on FCP. Due to the efficiency of FCP to identify latent groups and model purchase behavior, the prediction could be more accurate than individual prediction.

Our problem can be formally described as follows. Given the transaction data of a product, a matrix X U ×T is generated to record the transaction quantity, for U customers during T time steps. Each entry x it in X U ×T refers to the purchase quantity of customer i at time step t. The task is to forecast the purchase quantities of customers at the next time step T + 1, i.e.Λ U ×1

T +1 in whichλ i(T +1) means the predicted purchase quantity of customer i at time T + 1.

Overall, our model has three main components: (1) customer segmentation based on FCP; (2) tracking model to track group purchase intensity trajectory and (3) predictor to forecast group purchase intensity at next time step.

We adopt Fragmentation and Coagulation Process (FCP) [1, 7] , a dynamic random partition model, to segment customers and capture dynamic interests of customers. The schematic diagram of FCP from time step t to t + 1 is illustrated in Fig. 2 . FCP contains two procedures: fragmentation and coagulation. Given the initial customer partition π t , at the fragmentation step, each customer group can remain the same or be split into several subgroups, forming the intermediate partition π t . Then, at the following coagulation step, a group can remain the same or be merged with other groups, forming the new partition π t+1 . In this way, FCP can capture the evolution of customer segmentation from t to t + 1. Theoretically, FCP is flexible to model any change of segmentation, which means that the new segmentation can be totally different from the previous one.

Priori Probability of Customer Segmentation. FCP defines the priori transition probability from partition π t to π t+1 . Formally, at t = 1, we adopt a random partition process Chinese Restaurant Process (CRP) [8] to model the initial partition of customers, and the probability of customer i in group g is:

where the hyperparameter ρ is to control the probability that the customer starts a new group, and M g denotes the set of customer members in group g. π −i t is the partition of customers except for customer i at t, which assumes customer i is the last one who needs to allocate. In CRP model, the larger groups of a partition tends to attract more members and becomes larger.

Given partition and allocation at previous time step, for customer i, the transition probability from group g in the current partition to group g in fragmentation step is defined as:

which refers to the groups splitting from M g . Equation (2) shows that a customer is more likely to join large groups splitting from M g . The hyperparameter δ controls the probability that customer i starts a new group not existing in the previous group π t (i), which is also the temporal dependency of partitions between consecutive time steps.

Similarly, in the coagulation step, the transition probability of customer i joining group l from the intermediate group g is:

where C t (M l ) = B|B ∈ π −i t+1 , B ⊆ M l , B = φ denotes the set of subgroups merged into M l . The priori knowledge is that a customer is more likely to join the group that merged by more subgroups.

The individual purchase quantity is modeled by Poisson distribution. Given the purchase quantity of customer i at time step t, x it , the likelihood of customer i belonging to group g at t is represented as follows:

where the purchase intensity for customer group g is λ g .

The purchase intensity has Gamma distribution as its prior, due to the conjugacy of Poisson and Gamma distributions. Therefore, we have the Maximum A Posteriori (MAP) of λ g as follows:

where the purchase intensity of a group can be interpreted as the average purchase quantities of its members, and impacted by the hyperparameters of α t (i.e. shape parameter) and β (i.e. scale parameter) of the Gamma prior.

For each customer i, we need to determine the purchase intensity trajectory {λ it } T 1 in order to predict for the future. An intuitive idea is to use the purchase intensity of the group that customer i belongs to along the time as the trajectory of purchase intensity, i.e. λ it = λ πt(i) for any t. However, the customer interests are evolving with time that the groups from the past may not fit the customers' current interests, and those λ πT −n(i) may demonstrate misleading trends for prediction.

Therefore, we propose to predict their purchase intensityλ i(T +1) only considering the current group membership, π T (i) and backtrack the intensities of this group in the past time steps, instead of using the actual groups the customers belonged to. The difficulty for tracking the purchase intensity of group π t (i) is that the group members could be totally different in consecutive time steps i.e. M πt(i) = M πt−1(i) . To address this problem, we build a backward tracking model to get the series of purchase intensities for the current group M πT (i) in partition π T .

Assume the group we are going to track is denoted as g tracking and the members of g tracking as M g tracking . The group g tracking is initialized as π T (i) for current time step t = T . If there exists g ∈ π t−1 satisfying tracking rules (Eq. (6)), we update the group g as the new group to be tracked.

In the tracking rules (Eq. (6)), we require that the majority of group g has shifted to group g tracking and the majority of group g tracking come from group g. The hyperparameter η 1 and η 2 are generally set as >0.5, so there could only be at most one or no tracked group. If there is no group g ∈ π t−1 satisfying the tracking rules, g tracking remains the same:

As to the individual purchase intensity, it is defined as follows based on whether there is a group g found:

If there is no tracked group g found, we use the average purchase intensity at t − 1 of all members of group g tracking to represent tracked group intensity.

By computing backwards from t = T to t = 1, we can finally get the trajectory of group purchase intensity {λ it } T 1 for customer i.

Finally, the prediction model can be applied on the tracked purchase intensity

In traditional FCP, the priori distribution of group purchase intensity is modeled by Gamma distribution with static predefined hyperparameter α t and β in Eq. (5). Since our FCP filter can get the prediction of group purchase intensity, we propose to update this prior hyperparameter with the prediction results so that the priori knowledge of group purchase intensity could be more accurate. For Gamma distribution, we estimate the parameter α T +1 by Maximum Likelihood Estimation (MLE), taking the predicted group purchase intensityλ g at time step T + 1 as observations, and we have:

where |π T | is the total number of groups in the partition. Our framework does not restrict the prediction models to use, and we have tested the performance of using the framework with various models including linear regressions and Long Short Term Memory (LSTM) in our experiments.

The generative graphical model of our FCP filter is shown in Fig. 3 . The initial partition π 1 is sampled based on CRP rules and the partitions in following time steps obey FCP rules as described in Sect. 3.1. Given customer i belonging to group π t (i) at time t, the individual purchase intensity x it is drawn from Poisson distribution with parameter λ πt(i) , which is the purchase intensity of the group he belongs to. The group purchase intensity λ πt(i) at time t is drawn from Gamma distribution with hyperparameters α t (i.e. shape parameter) and β (i.e. scale parameter). It is worth noting that α t is dynamic, which means that different α t at different time step t, that is different from original FCP. The parameter α t is computed by using MLE of Gamma distribution with the predicted group purchase intensityλ t and the scale parameter β as shown in Sect. 3.3. 

The customer partition and allocation are inferred by sampling using the posterior transition probabilities, computed by Eqs. (11) and (12) based on the priori transition probabilities and the observation likelihood terms. For the customer segmentation component, we use Gibbs sampler to infer the group membership of each customer over time π t (i). In more detail, since the FCP is exchangeable and projective [9] , we assume that customer i is the last customer to be sampled, which means that we can allocate customer i given the allocation of all the other customers. According to Bayesian theorem, the sampling posterior transition probabilities for split and merge steps are defined respectively as:

where the priori terms in the equations above can be calculated based on Eqs. (2) and (3) by forward and backward algorithm as used in Hidden Markov Model [2] , with the likelihood terms given in Eq. (4).

In summary, the dynamic customer segmentation is firstly modeled by FCP. Then we build tracking model to get intensity trajectory of each latent group. After that, predictor can be used to predict the purchase intensity of tracked groups. Finally, the predicted results also influence FCP clustering at the next time step by updating α t .

We conducted experiments on synthetic and real-world datasets to illustrate that our model can (1) identify dynamic customer groups based on purchase behavior, (2) achieve more accurate prediction results by filtering individual random noise. The hyperparameters are empirically set using validation dataset as follows: ρ = 0.8, δ = 0.4, η 1 = η 2 = 0.65, α 1 = 2 and β = 0.5. The evaluation metrics in our study is the Mean Absolute Error (MAE). The MAE measures the average error between predicted purchase intensity and the ground truth.

We generate a synthetic dataset to demonstrate our model's capability to identify the latent group and customer shifting over groups. There are 40 products in the synthetic dataset. For each product, we generate purchase quantity X 100 × 10 of 100 customers from 3 latent groups with 10 time steps.

Specifically, we firstly generate the group purchase intensity of those 3 groups at the first 5 time steps randomly Λ 3 × 5 . At each time step, we sort the grouplevel purchase intensities from large to small values, so that those 3 groups can show relevant purchase patterns continuously. To fill in the intensity matrix Λ 3 × 10 of 10 time steps, λ gt from t = 6 to t = 10 is generated by linear regression of 3 orders: λ gt = 3 n=1 a n * λ g(t−n) + b. Then all customers are allocated into those 3 groups randomly at t = 1. We assume that a customer changes group membership over time with probability of 0.1, which means that the customers have 10% of chance shifting into another group. Finally, we generate customer purchase quantities using Poisson distribution with parameter λ = λ πt(i) based on their allocation.

We test the predicting performance using FCP filter model and using individual records. As the purchase intensity of each group in our synthetic data evolves according to the rule of linear regression of 3 orders, the same regression predictor is used for both cases. Accurate prediction results could demonstrate the capability of our model to identify latent groups for customers. The results are shown in Fig. 4 comparing these two models. We can see that our FCP filter achieves lower MAE on almost all products. The average MAE over 40 products of individual prediction and our FCP filter are 5.58 and 3.34, respectively. This means that FCP filter successfully tracked the evolving purchase intensity of latent groups in this dynamic dataset and predicted accurately.

To illustrate the flexibility of FCP filter, we also compared the average MAE using prediction models built on static K-means clustering with different number Our model outperformed static K-means clustering with K = 3 or 5, even when the ground truth for the number of clusters is 3. It shows that the importance of dynamics and flexibility of FCP filter in capturing the evolution of the purchase intensities. Moreover, there is no need to pre-define the number of clusters in our model.

In this section, we use a real-world supermarket dataset 1 to illustrate our model's capability of filtering random purchase noise of individuals to get accurate prediction and usage of various predictors. The dataset contains 2,595,732 transaction records of 2,500 frequent customers on 2,383 products in 711 days (about two years). The transaction data is sparse in the first several months, so that we use the transaction data from 141 days to 420 days (40 weeks) for experiments. We divide 40 weeks into 10 time steps with 4 weeks in each time step. We select 24 popular products, which had the largest number of records and common in our daily life for experiments such as milk, cereal, eggs and so on. For each product, we discard customers who never bought that product and who ranked at the top 5 % based on purchase quantities as outliers. We randomly sample 100 customers for computational convenience and the purchase frequency is defined as the quantity purchased by a customer at one time step (4 weeks).

Several predictors are applied in our experiments to show FCP filter can generally achieve better prediction accuracy. They are LSTM network, 1-order and 3-order linear regressions, and a last-step predictor which takes the value at last time step as predicted value λ i(T +1) = λ πT (i) . The average MAE is shown in Table 1 . Similar to the results on the synthetic data, our FCP filter achieves the best prediction accuracy with all the predictors. It is mainly because our dynamic model is suitable for modeling customers' dynamic interests and identifies the latent groups covered by random individual purchase behavior. We notice that the 3-order regression is not accurate and stable, and the possible reason could be that it is sensitive to the input purchase intensity series data. Specially, Fig. 5 shows the prediction results of our model and individual model with the simple last-step predictor on all products. Our FCP filter gets higher prediction accuracy than individual prediction for every product. Since the predictor is quite simple, this result implies that the evolving customer purchase intensity is closer to group purchase intensity than individual one, and our FCP filter is able to find the latent group of customers and filter the noise in the individual records to produce accurate prediction results.

We build a dynamic and flexible clustering-prediction framework FCP filter to predict customer purchase intensity regardless of the random noise of individual customer behavior. Our model segments customers by FCP and then predict customer purchase intensity on the tracked group purchase intensity. After that, the prediction result adjusts priori knowledge of clustering at next time step. We conduct experiments on both synthetic and real-world datasets, and show that FCP filter model is able to (1) identify the latent group and track purchase intensity evolving trends of groups; (2) improve the accuracy of customer purchase intensity prediction. Our framework is scalable with the datasets, without the needs of defining the number of clusters and is flexible to work with different predictors. Generally, our proposed model is not restricted to the domain of customer behavior modeling. It is also useful for other sequential data containing subjects that shifting among latent groups. In our future work, our model will be built on other domains with sequential data to improve prediction accuracy.

Random Fragmentation and Coagulation Processes

Pattern Recognition and Machine Learning

Customer segmentation based on transactional data using stream clustering

Data mining for shopping centres-customer knowledge-management framework

Customer segmentation and strategy development based on customer lifetime value: a case study

Discovering temporal purchase patterns with different responses to promotions

Tracking the evolution of customer purchase behavior segmentation via a fragmentation-coagulation process

Combinatorial stochastic processes

Modelling genetic variations with fragmentation-coagulation processes

A survey of the challenges and pifalls of cluster analysis application in market segmentation