Solution strategy based on Gaussian mixture models and dispersion reduction for the capacitated centered clustering problem

Santiago-Omar Caballero-Morales
Postgraduate Department of Logistics and Supply Chain Management, Universidad Popular Autónoma del Estado de Puebla, Puebla, Puebla, Mexico

Submitted 11 June 2020. Accepted 18 November 2020. Published 3 February 2021.
Corresponding author: Santiago-Omar Caballero-Morales, santiagoomar.caballero@upaep.mx
Academic editor: Angelo Ciaramella
DOI 10.7717/peerj-cs.332
Copyright 2021 Caballero-Morales. Distributed under Creative Commons CC BY 4.0.

ABSTRACT
The Capacitated Centered Clustering Problem (CCCP), a multi-facility location model, is very important within the logistics and supply chain management fields due to its impact on industrial transportation and distribution. However, solving the CCCP is a challenging task due to its computational complexity. In this work, a strategy based on Gaussian mixture models (GMMs) and dispersion reduction is presented to obtain the most likely locations of facilities for sets of client points, considering their distribution patterns. Experiments performed on large CCCP instances, considering updated best-known solutions, show that the GMM-based approach, termed Dispersion Reduction GMMs, achieves a mean error gap smaller than 2.6%. This result is more competitive than those of Variable Neighborhood Search, Simulated Annealing, Genetic Algorithm and CKMeans, and is reached faster than the best-known solutions obtained by Tabu Search and Clustering Search.

Subjects: Algorithms and Analysis of Algorithms, Data Mining and Machine Learning, Optimization Theory and Computation
Keywords: Capacitated centered clustering problem, Gaussian mixture models, Dispersion reduction, Expectation-maximization

INTRODUCTION
Facilities are very important infrastructure within the supply chain as they support production, distribution and warehousing. Due to this, many operative processes associated with facilities are subject to optimization. Fields such as facility layout planning are crucial for smooth material handling and production flow (Mohamadghasemi & Hadi-Vencheh, 2012; Hadi-Vencheh & Mohamadghasemi, 2013; Niroomand et al., 2015). On the other hand, where to locate facilities within specific regions is a central problem for strategic decisions of transportation and distribution (Chaves, Correa & Lorena, 2007). This is because the distance between the facilities and the customers (demand or client points) is crucial to provide efficient transportation and distribution services. Within this context, the Capacitated Centered Clustering Problem (CCCP) is one of the best-known and most challenging multi-facility location (MFL) problems in the fields of operations research, logistics and supply chain management (Chaves & Nogueira-Lorena, 2010; Mahmoodi-Darani et al., 2013).
The CCCP is focused on determining clusters or groups of demand or client points which lead to minimum average distances to their centroids (where facilities are to be located). The cumulative demand of the client points assigned to a cluster cannot exceed the capacity of the facility located at its centroid (Chaves & Nogueira-Lorena, 2010; Chaves & Nogueira-Lorena, 2011; Negreiros & Palhano, 2006). The mathematical formulation of the CCCP is presented as follows (Chaves & Nogueira-Lorena, 2011; Negreiros & Palhano, 2006):

\min \sum_{i \in I} \sum_{k \in K} \| x_i - m_k \|^2 \, y_{ik}   (1)

subject to

\sum_{k \in K} y_{ik} = 1 \quad \forall i \in I,   (2)

\sum_{i \in I} y_{ik} = n_k \quad \forall k \in K,   (3)

\sum_{i \in I} x_i \, y_{ik} = n_k m_k \quad \forall k \in K,   (4)

\sum_{i \in I} d_i \, y_{ik} \le C_k \quad \forall k \in K,   (5)

m_k \in \mathbb{R}^l \quad \forall k \in K, \qquad y_{ik} \in \{0, 1\} \quad \forall i \in I, \forall k \in K,   (6)

where x_i contains the coordinates of client point i, m_k is the centroid of cluster k, n_k is the number of points assigned to cluster k, d_i is the demand of client point i, C_k is the capacity of cluster k, and y_ik = 1 if point x_i is assigned to cluster k (y_ik = 0 otherwise). Given the NP-hard complexity of the CCCP, the present work focuses on obtaining near-optimal solutions within reasonable times for large instances (> 1,000 points). This is accomplished by the meta-heuristic termed Dispersion Reduction GMMs (DRG), which integrates the following algorithms:

- an adapted Expectation-Maximization (EM) algorithm to estimate the parameters of GMMs and generate clusters of points of maximum likelihood complying with the capacity requirements of the CCCP;
- a dispersion reduction algorithm to compress large CCCP data and achieve near-optimal results within faster computational times.

Both the application of GMMs and the validity of the EM algorithm for the CCCP have not been explored in previous works. Experiments of the DRG on large instances of the CCCP, considering the benchmarks of the Clustering Search (CS) algorithm (De-Oliveira, Chaves & Nogueira-Lorena, 2013) and Tabu Search (TS) (Fernandes-Muritiba et al., 2012), led to a mean error smaller than 2.6%. In general, the DRG algorithm performed better than Variable Neighborhood Search (VNS), Simulated Annealing (SA), Genetic Algorithms (GA) and CKMeans (CKM). The advances of the present work can provide a better understanding of Gaussian modeling and the EM algorithm for their application in facility location problems.

The contributions of this paper are presented as follows. First, in Section "Clustering" an overview of clustering and the mathematical foundations of GMMs are presented, together with the findings regarding the specific dispersion reduction process required to make CCCP data suitable for the EM algorithm. Then, in Section "DRG Meta-heuristic" the details of the DRG meta-heuristic are presented, while the results regarding its performance are discussed in Section "Results and Assessment". Finally, conclusions and improvement recommendations are discussed in Section "Conclusions and Future Work".

CLUSTERING
Most of the solving algorithms for the CCCP perform clustering within the search process for initial partitioning of the set of demand points. This partitioning is then improved by exchanging certain points between the most promising partitions or clusters (Hansen & Jaumard, 1997; Negreiros & Palhano, 2006).
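To make the formulation concrete, the following minimal Python sketch (illustrative only: the function name and the toy data are hypothetical, and the published implementation was developed in MATLAB) evaluates the objective of Eq. (1) and the capacity condition of Eq. (5) for a given hard assignment:

```python
import numpy as np

def cccp_cost(points, demands, assign, capacity):
    """Score a candidate partition: sum of squared distances of each point
    to the centroid of its cluster (Eq. (1)), plus a capacity check (Eq. (5)).
    assign[i] is the cluster index k of point i (i.e., y_ik = 1)."""
    cost, feasible = 0.0, True
    for k in np.unique(assign):
        members = points[assign == k]
        centroid = members.mean(axis=0)            # m_k per Eq. (4)
        cost += ((members - centroid) ** 2).sum()  # squared Euclidean distances
        if demands[assign == k].sum() > capacity:  # Eq. (5) violated
            feasible = False
    return cost, feasible

# Toy usage: two clusters of two points each, unit demands, capacity 2
pts = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 10.0], [11.0, 10.0]])
print(cccp_cost(pts, np.ones(4), np.array([0, 0, 1, 1]), capacity=2.0))  # -> (1.0, True)
```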
Formally, clustering involves the grouping of data points in a way that the homogeneity of points within a group, and the heterogeneity of points between groups, are maximized (Chaves & Nogueira-Lorena, 2010; Negreiros & Palhano, 2006). For the CCCP, the technique used for clustering has an important role in the quality of the solutions, which may lead to significant differences from best-known results (i.e., errors from 15.0% (Radiah, Hasnah & Omar, 2013) to 50% (Negreiros & Palhano, 2006)).

Gaussian mixture models
Gaussian Mixture Models (GMMs) have been widely studied for data modeling within the field of pattern classification based on Bayes decision theory (Theodoridis & Koutroumbas, 2010). For more accurate modeling of multi-dimensional patterns, a mixture of Gaussian distributions can be used. Each mixture component is defined by two main parameters: a mean vector and a covariance matrix (Forsyth, 2012; Theodoridis & Koutroumbas, 1979; Theodoridis & Koutroumbas, 2010). Within this context, "clusters" can be characterized by each Gaussian component (mixture), which can model sub-sets of the whole set of patterns with maximum likelihood.

There are important differences between the clustering that can be obtained with GMMs and centroid-based clustering techniques (such as K-Means or K-Nearest Neighbors). The following can be mentioned:

- Over-fitting (e.g., the model "overreacts" to minor fluctuations in the training data for prediction purposes) can be avoided with Gaussian distribution-based clustering.
- Clusters defined with Gaussian distributions can have different shapes and sizes. By contrast, centroid-based clustering algorithms tend to find clusters of comparable size and (more or less) symmetrical shape (Mohammed, Badruddin & Mohammed, 2016).
- At each iteration, Gaussian distribution-based clustering performs, for a given point, a "soft-assignment" to a particular cluster (there is a degree of uncertainty regarding the assignment). Centroid-based clustering performs a hard-assignment (or direct assignment) where a given point is assigned to a particular cluster and there is no uncertainty.

Due to these differences, GMM-based clustering was considered as an alternative to generate feasible solutions for the CCCP. In terms of the CCCP formulation described in Section "Introduction", a cluster can be modeled by a single Gaussian probability density function (PDF). Hence, the location "patterns" of a set of clients X can be modeled by a mixture of K Gaussian PDFs where each PDF models a single cluster. If the set contains N clients, then X = [x_{i=1}, x_{i=2}, \ldots, x_{i=N}] and the mixture can be expressed as:

p(X) = \sum_{k=1}^{K} P_k \, p(X|k) = \sum_{k=1}^{K} P_k \, \mathcal{N}(X \mid m_k, S_k),   (7)

where k = 1, …, K and |K| = p is the number of Gaussian PDFs, p(X|k) represents the probabilities of each Gaussian PDF describing the set of clients X (Theodoridis & Koutroumbas, 2010), and P_k is the weight associated to each Gaussian PDF (hence, \sum_{k=1}^{K} P_k = 1.0). Each Gaussian component can be expressed as:

p(X|k) = \mathcal{N}(X \mid m_k, S_k) = \frac{1}{\sqrt{2 \pi |S_k|}} \, e^{-\frac{1}{2}(X - m_k)^T S_k^{-1} (X - m_k)} \quad \forall k \in K,   (8)

where m_k is the mean vector and S_k is the covariance matrix of the k-th Gaussian PDF or k-th cluster.
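As a point of reference, Eqs. (7) and (8) translate into a few lines of Python (a minimal sketch with hypothetical helper names; the normalization constant is written exactly as in Eq. (8), and no numerical safeguards are included):

```python
import numpy as np

def gaussian_pdf(x, m, S):
    # Single Gaussian component of Eq. (8): mean vector m, covariance matrix S (l = 2).
    d = x - m
    return np.exp(-0.5 * d @ np.linalg.inv(S) @ d) / np.sqrt(2.0 * np.pi * np.linalg.det(S))

def mixture_pdf(x, P, means, covs):
    # Mixture density of Eq. (7): sum_k P_k * N(x | m_k, S_k), with sum_k P_k = 1.
    return sum(P[k] * gaussian_pdf(x, means[k], covs[k]) for k in range(len(P)))
```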
Note that m_k and each element of X (i.e., any x_i) must have the same size or dimension, which is defined by l (in this case, l = 2 because each point consists of an x-coordinate and a y-coordinate). Finally, S_k is a matrix of size l × l. For clustering purposes, m_k can model the mean vector of a sub-set of points in X which is more likely to be described by the Gaussian PDF k as estimated by Eq. (8). If the points in this sub-set of X are clustered, then m_k can represent the "centroid" of the cluster k and S_k can model the size and shape of the cluster k (Bishop, 2006; Theodoridis & Koutroumbas, 2010). Observe that X − m_k defines a distance or difference between each point in X and the centroid (located at m_k) of each cluster k. Thus, the estimation of probabilities considers the distance between each point x_i and each cluster k. The parameters of the Gaussian PDFs (P_k, m_k and S_k) that best model (describe) each sub-set of the whole pattern X can be estimated by the iterative Expectation-Maximization (EM) algorithm (Bishop, 2006; Theodoridis & Koutroumbas, 2010).

The advantage of this Gaussian approach for clustering is that faster inference about the points x_i that belong to a specific cluster k may be obtained considering all points. In this context, it is important to mention that, due to the probabilistic nature of the inference process, a single point x_i is not directly assigned to a specific cluster (as required by the CCCP) because all points have probabilities associated to all clusters (i.e., "soft-assignment"). Also, this approach does not consider the capacity of each cluster and the demand associated to each client point. Thus, restrictions about the quantity of points x_i associated to each cluster are not integrated in this algorithm. In order to consider these requirements and restrictions, a modification of the EM algorithm was performed. This is described in the following sections.

Dispersion reduction in GMMs performance for the CCCP
Capacitated Centered Clustering Problem data, which consists of x and y coordinates, represents a particular challenge for GMMs. This is because the values of the coordinates can affect the computation of the exponential element of Eq. (8). Also, dispersion of the data may affect convergence of the EM algorithm for P_k, m_k and S_k. Thus, specific re-scaling or coding of CCCP data was required to reduce the effect of dispersion and coordinate values on the computations of GMMs.

In order to reduce the dispersion of the CCCP data, the compression algorithm presented in Fig. 1 was performed. It is important to mention that compression is only performed on the coordinate values, and not on the number of data points (thus, the size of the instance remains the same). As presented, the original x-y coordinates were replaced by their unique indexes, which led to the elimination of empty spaces between data points. Coding of the compressed data was then performed within the interval [0, 1], as presented in Fig. 1. This coding made the compressed data more suitable for fast computation of Eq. (8).

[Figure 1: CCCP data compression and coding algorithm (dispersion reduction algorithm). Example: original coordinates (250, 40), (150, 50), (50, 80), (50, 60); unique x-coordinates [50, 150, 250] with indexes i = 1, 2, 3 (max_i = 3); unique y-coordinates [40, 50, 60, 80] with indexes j = 1, 2, 3, 4 (max_j = 4); coded coordinates i/max_i and j/max_j, i.e., (1.00, 0.25), (0.67, 0.50), (0.33, 1.00), (0.33, 0.75).]
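A minimal Python sketch of this compression and coding step (hypothetical function name) reproduces the example of Fig. 1:

```python
import numpy as np

def compress_and_code(xy):
    """Dispersion reduction of Fig. 1: replace each coordinate value by the
    rank of its unique value (compression), then scale the ranks into (0, 1]
    (coding). The number of data points is unchanged."""
    coded = np.empty(xy.shape, dtype=float)
    for col in range(xy.shape[1]):                   # x column, then y column
        uniq = np.unique(xy[:, col])                 # sorted unique values
        idx = np.searchsorted(uniq, xy[:, col]) + 1  # indexes 1..max_i (or 1..max_j)
        coded[:, col] = idx / idx.max()              # i/max_i, j/max_j
    return coded

# The example of Fig. 1: points (250,40), (150,50), (50,80), (50,60)
print(compress_and_code(np.array([[250, 40], [150, 50], [50, 80], [50, 60]])))
# -> [[1.    0.25], [0.667 0.5], [0.333 1.], [0.333 0.75]]
```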
An important issue regarding the use of GMMs for the CCCP is the restriction on the number of Gaussian components. In this regard, extensive research has been performed to determine the most suitable number of Gaussian components for different sets of data (McLachlan & Rathnayake, 2014). However, available research does not consider the capacity aspect of clustering, which is the main feature of CCCP data. Nevertheless, a restriction on the number of Gaussian components must be considered because, as discussed in Fraley & Raftery (1998), the EM algorithm may not be suitable for cases with very large numbers of components. Due to this situation, a maximum of 30 Gaussian components was considered. This restriction was based on the DONI database, which has some of the CCCP instances with the largest number of client points (Fernandes-Muritiba et al., 2012) (i.e., N = 13,221 client points with p = 30 facilities). A discussion on future extensions of the Gaussian approach and the EM algorithm to address larger numbers of components is presented in the final section.

DRG META-HEURISTIC
Standard EM algorithm
Figure 2 presents the structure of the standard EM algorithm (Bishop, 2006; Theodoridis & Koutroumbas, 1979; Theodoridis & Koutroumbas, 2010) considering the variables defined by Eqs. (7) and (8). In this case, X represents the array of compressed and coded x-y coordinates of all points x_i (the array structures presented in Figs. 2 and 3 follow the EM formulation described in Theodoridis & Koutroumbas (1979)).

[Figure 2: Structure of the standard expectation-maximization (EM) algorithm.]

As presented in Fig. 2, the locations of the N clients are stored into the array X, which consists of a matrix of dimension l × N where: (a) N is the number of client points, and (b) l = 2 as each column vector of X consists of the values cx_i and cy_i that identify the compressed and coded x-y coordinates of the client point x_i, where i = 1, …, N. The EM algorithm starts with initial values for m_k, S_k and P_k. Values for m_k and S_k were randomly generated as follows:

m_k = [\mathrm{random}(cx_{min}, cx_{max}), \; \mathrm{random}(cy_{min}, cy_{max})]^T \quad \forall k \in K,   (9)

S_k = \mathrm{random}(0.0, 0.1) \times I_l \quad \forall k \in K,   (10)

where (cx_{min}, cx_{max}) and (cy_{min}, cy_{max}) are the minimum and maximum values throughout all compressed and coded x and y coordinates respectively, and I_l is the identity matrix of size l × l. For P_k, a lower bound for K was obtained by considering the total demand of the points x_i and the capacity of the clusters C_k. Because all clusters have the same capacity, C_k = C. Then, K and P_k were obtained as follows:

K = \frac{\sum_{i=1}^{N} d_i}{C},   (11)

P_k = \frac{1}{|K|} \quad \forall k \in K.   (12)
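These initialization rules can be sketched as follows (a hypothetical helper under the stated equations; K from Eq. (11) is rounded up here so that the capacity lower bound holds with an integer number of clusters):

```python
import numpy as np

def init_em(coded_xy, demands, C, rng=np.random.default_rng(0)):
    """Initial parameters per Eqs. (9)-(12): random means inside the coded
    bounding box, small random diagonal covariances, K from the total
    demand / capacity lower bound, and uniform mixture weights."""
    K = int(np.ceil(demands.sum() / C))                  # Eq. (11), rounded up
    lo, hi = coded_xy.min(axis=0), coded_xy.max(axis=0)  # (cx_min, cy_min), (cx_max, cy_max)
    means = rng.uniform(lo, hi, size=(K, 2))             # Eq. (9)
    covs = np.array([rng.uniform(0.0, 0.1) * np.eye(2) for _ in range(K)])  # Eq. (10)
    P = np.full(K, 1.0 / K)                              # Eq. (12)
    return P, means, covs
```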
The stage of Expectation starts with these initial values for m_k, S_k and P_k. An initial computation of assignment or "responsibility" scores γ(z_ik) is performed to determine which x_i is more likely to be associated with a particular cluster (and thus, to belong to this cluster) with parameters m_k, S_k and P_k (Bishop, 2006). Observe that, as presented in Fig. 2 (Step 2), γ(z_ik) is computed by means of Eq. (8). These scores provide the basis for the values of the decision variable y_ik of the CCCP objective function (see Eq. (1)); this process is discussed in the following section. Then, the stage of Maximization integrates the scores γ(z_ik) into the re-estimation of the clusters' parameters, which leads to m_k^new, S_k^new and P_k^new. Convergence is achieved if the total error or absolute difference between the previous and new estimates is smaller than a given threshold e (in this case, e = 0.5). If convergence is not achieved, then m_k ← m_k^new, S_k ← S_k^new and P_k ← P_k^new.

Modified EM algorithm with direct assignment and capacity restrictions
As discussed in Section "Gaussian Mixture Models", the advantage of the GMM approach is that faster inference about the points x_i that belong to a specific cluster k can be obtained. This inference is performed based on the probabilities defined by Eq. (8), where the parameters of the Gaussian PDFs are estimated with the standard EM algorithm. However, under this approach a single point x_i is not directly assigned to a specific cluster (as required by the CCCP) because all points have probabilities associated to all clusters (i.e., "soft-assignment"). Also, while the standard EM algorithm can determine the most likely sub-sets of client points in X to be covered by each cluster k, these sub-sets are not restricted by the capacity of the cluster and the demand of each assigned client point. Hence, the standard EM algorithm was modified in order to comply with the requirements and restrictions of the CCCP.

As previously mentioned, the score γ(z_ik) computed in Step 2 of the EM algorithm (see Fig. 2) represents the likelihood or responsibility of the cluster k on the generation of the point x_i (Bishop, 2006). As presented in Fig. 3, all scores are stored into a matrix γ of dimension K × N, where each column vector [γ(z_i1), …, γ(z_iK)]^T contains the assignment scores of the point x_i to all clusters (thus, \sum_{k=1}^{K} γ(z_ik) = 1.0 ∀i ∈ N). These scores represent the basis for defining the decision variable y_ik. As explained in Section "Introduction", y_ik = 1 if the point x_i is assigned to cluster k and y_ik = 0 otherwise (each point x_i can be assigned to only one cluster). Based on the scores of γ(z_ik), it was determined that for each vector x_i the cluster assignment would be defined by the cluster k with maximum γ(z_ik) value. If two or more clusters share the same maximum likelihood, then one of them is randomly assigned. An example of this assignment process is presented in Fig. 3.

[Figure 3: Assignment of values for y_ik from the scores of γ(z_ik).]

By determining the unique assignment of each point x_i to each cluster k at Step 2 of the EM algorithm (see Fig. 2), the number of points assigned to each cluster is obtained, which in turn determines the cumulative demand of the points assigned to each cluster.
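The direct assignment and the per-cluster demands can be sketched as follows (hypothetical names; ties are broken by the first index here, whereas the text above specifies a random choice, and underflow safeguards are omitted):

```python
import numpy as np

def hard_assign(coded_xy, demands, P, means, covs):
    """Step 2 of Fig. 2 plus the direct assignment of Fig. 3: compute the
    responsibilities gamma(z_ik), set y_ik = 1 for the cluster of maximum
    score, and accumulate the cumulative demand D_k of each cluster."""
    K, N = len(P), len(coded_xy)
    gamma = np.zeros((K, N))
    for k in range(K):
        inv, det = np.linalg.inv(covs[k]), np.linalg.det(covs[k])
        d = coded_xy - means[k]
        quad = np.einsum('ij,jk,ik->i', d, inv, d)   # (x_i - m_k)^T S_k^-1 (x_i - m_k)
        gamma[k] = P[k] * np.exp(-0.5 * quad) / np.sqrt(2.0 * np.pi * det)
    gamma /= gamma.sum(axis=0)      # each column sums to 1 (soft-assignment)
    assign = gamma.argmax(axis=0)   # hard assignment: y_ik = 1 for the max score
    D = np.bincount(assign, weights=demands, minlength=K)
    return gamma, assign, D
```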
This information is stored into the vector:

Demands = [D_1, D_2, \ldots, D_K],   (13)

where D_k represents the cumulative demand of the points assigned to cluster k, and it must satisfy D_k ≤ C_k. This vector is important to comply with the capacity restrictions, because it was found that homogenization of the cumulative demands D_k contributes to this objective. Homogenization is achieved by minimizing the coefficient of variation between all cumulative demands:

\min \mathrm{CV} = \frac{\mathrm{StandardDeviation}(Demands)}{\mathrm{Mean}(Demands)}.   (14)

The objective function defined by Eq. (14) is integrated within the evaluation step of the standard EM algorithm (see Step 4 of Fig. 2). This leads to the modified EM algorithm with capacity restrictions, where convergence is based on two objectives:

- minimization of the error (smaller than a threshold of e = 0.5) between the estimates of m_k, P_k and S_k;
- minimization of the coefficient of variation of Demands.

An important issue regarding the minimization of CV is that min CV may not lead to compliance with the condition D_k ≤ C_k for all clusters k. This was addressed by the following strategy: if min CV leads to a cluster k where the condition D_k ≤ C_k is not complied with, then the farthest assigned points are re-assigned to other clusters with closer centroids and available capacity (a sketch of this repair step is given at the end of this section).

Figure 4 presents an example of the clustering achieved without and with the objective function defined by Eq. (14). As presented, the clusters are more accurately defined with the integration of Eq. (14). Also, the clusters comply with the capacity restrictions of the CCCP.

[Figure 4: Clustering with (A) the standard EM algorithm and (B) the DRG meta-heuristic (modified EM algorithm with decoded and uncompressed CCCP data); the centroid of each cluster k is marked in each panel.]

It is important to remember that the proposed DRG meta-heuristic consists of the modified EM algorithm, which provides the cluster assignment for each point x_i considering compressed and coded CCCP data. Thus, Fig. 4 presents the decoded and uncompressed points x_i (i.e., the original x-y coordinates) assigned to each cluster based on the assignments of the DRG meta-heuristic with the modified EM algorithm on compressed and coded data.

Further improvement of the assignments and the centroids at m_k is achieved by a greedy algorithm that verifies that all points x_i are assigned to the closest cluster. This leads to additional exchange and insertion/deletion of complying client points x_i between clusters. These operations must comply with the capacity restrictions of each cluster. The assignments are updated if the insertion/deletion is valid, which leads to re-estimation of the locations of the m_k centroids.
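The CV criterion of Eq. (14) and the capacity repair step described above can be sketched as follows (a hypothetical helper; it assumes positive demands and a total demand not exceeding the total capacity K × C, since otherwise no feasible re-assignment exists):

```python
import numpy as np

def repair_capacity(coded_xy, demands, assign, means, C):
    """While some cluster k violates D_k <= C, move its farthest assigned
    point to the nearest other cluster with spare capacity; return the
    repaired assignment and the CV of Eq. (14)."""
    assign, K = assign.copy(), len(means)
    D = np.bincount(assign, weights=demands, minlength=K)
    for k in range(K):
        while D[k] > C:
            members = np.where(assign == k)[0]
            # farthest point of the violating cluster
            far = members[np.argmax(np.linalg.norm(coded_xy[members] - means[k], axis=1))]
            # nearest cluster (by centroid) that can absorb the point's demand
            order = np.argsort(np.linalg.norm(means - coded_xy[far], axis=1))
            target = next(j for j in order if j != k and D[j] + demands[far] <= C)
            assign[far] = target
            D[k] -= demands[far]
            D[target] += demands[far]
    return assign, D.std() / D.mean()   # Eq. (14)
```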
RESULTS AND ASSESSMENT
Implementation of the DRG meta-heuristic was performed with the software MATLAB on a laptop computer with the following hardware: Intel CORE i7-4500U CPU at 2.40 GHz with 8 GB RAM. For testing purposes, the libraries presented in Table 1 were considered.

Table 1: Libraries of CCCP instances considered for testing and assessment.

Instance   N        K
doni1      1,000     6
doni2      2,000     6
doni3      3,000     8
doni4      4,000    10
doni5      5,000    12
doni6      10,000   23
doni7      13,221   30
SJC1       100      10
SJC2       200      15
SJC3a      300      25
SJC4a      402      30
TA25       25        5
TA50       50        5
TA60       60        5
TA70       70        5
TA80       80        7
TA90       90        4
TA100      100       6

For comparison purposes, the methods and best-known results presented in Chaves & Nogueira-Lorena (2010), Chaves & Nogueira-Lorena (2011), Fernandes-Muritiba et al. (2012), De-Oliveira, Chaves & Nogueira-Lorena (2013), Negreiros & Palhano (2006), Palhano, Negreiros & Laporte (2008) and Pereira & Senne (2008) for the CCCP were considered. These methods are the following:

- Best-known solutions*. The benchmarks reported in De-Oliveira, Chaves & Nogueira-Lorena (2013) have been used for the assessment of the most efficient methods for the CCCP. For this work, these benchmarks were updated with the best-known results obtained with TS (Fernandes-Muritiba et al., 2012), which is considered the current state-of-the-art algorithm for the CCCP (Carvalho, Mendes & Azeredo-Chaves, 2017; Pereira & Carvalho, 2017).
- Clustering Search (CS). A hybrid method which combines meta-heuristics and local search heuristics, in which the search is intensified only in areas of the search space that deserve special attention (promising regions). The main idea of CS is to identify promising areas of the search space by generating solutions through a meta-heuristic and "clustering" them into groups that are further explored with local search heuristics (Chaves & Nogueira-Lorena, 2010; Chaves & Nogueira-Lorena, 2011; De-Oliveira, Chaves & Nogueira-Lorena, 2013). The results reported in De-Oliveira, Chaves & Nogueira-Lorena (2013) were considered for comparison with DRG.
- A two-phase heuristic using VNS. The results reported in Negreiros & Palhano (2006) were considered for comparison with DRG. These results were also reviewed by Fernandes-Muritiba et al. (2012) for comparison with TS.
- CKMeans (CKM). The results reported in Palhano, Negreiros & Laporte (2008) were considered for comparison with DRG. These results were also reviewed by Chaves & Nogueira-Lorena (2011) for comparison with GA.
- Two-Column Generation (TCG). The results reported in Pereira & Senne (2008) were considered for comparison with DRG.
- Simulated Annealing (SA) and Genetic Algorithm (GA). CS performs a clustering strategy on solutions generated by a meta-heuristic, and the research in Chaves & Nogueira-Lorena (2010, 2011) reported the comparison of results of the CS method with the SA and GA meta-heuristics respectively. For assessment purposes, Chaves & Nogueira-Lorena (2010, 2011) also reported the results of SA and GA without the CS strategy (i.e., the independent performance of SA and GA). These independent results were considered for comparison with DRG.
- Tabu Search (TS). The TS algorithm reported in Fernandes-Muritiba et al. (2012) is currently considered the state of the art for the CCCP (Carvalho, Mendes & Azeredo-Chaves, 2017; Pereira & Carvalho, 2017). The results reported in Fernandes-Muritiba et al. (2012) were considered for comparison with DRG.

In order to compute the error, gap or deviation from the updated best-known solutions, the error metric presented by Yousefikhoshbakht & Khorram (2012) was considered:

\mathrm{Error}(\%) = 100 \times \left( \frac{a - b}{b} \right),   (15)

where a is the cost or distance of the best solution found by the algorithm for a given instance, while b is the best-known solution for the same instance. It is important to mention that the best-known solution is not necessarily the optimal solution due to the NP-hard complexity of the CCCP.
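As a small worked example of Eq. (15) (with values taken from Table 2 below):

```python
def error_gap(a, b):
    """Eq. (15): percentage deviation of a solution of cost a from the
    best-known cost b (which is not necessarily optimal for the CCCP)."""
    return 100.0 * (a - b) / b

# DRG vs. best-known on instance DONI7 (Table 2): 18,974.23 vs. 18,484.13
print(round(error_gap(18974.23, 18484.13), 2))  # -> 2.65
```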
Initially, this metric was computed for the DRG, VNS, SA, CS, TS and GA methods because the reference data was available for all sets of instances.

DRG vs. VNS-SA-CS-TS-GA
Table 2 presents the best results of the DRG meta-heuristic for the considered instances. Information regarding the runs performed by each method to report its best result is also presented when available, together with the programming language and the hardware used by the authors of the reviewed methods. As reported in the literature, CS and TS are the most competitive methods to solve the CCCP, with mean error gaps of 0.78% and 0.61% respectively (and thus, they were considered to update the benchmark solutions). The proposed DRG meta-heuristic stands as the next most competitive method, with a mean error gap of 2.58%. The VNS, SA and GA methods show more significant mean error gaps of 4.69%, 13.79% and 7.03% respectively. Also, VNS, SA and GA show higher variability in performance, characterized by a standard deviation that is large when compared with their mean error gap. The DRG shows a balanced mean and standard deviation; thus, its performance is more robust and consistent across all the different CCCP instances.
Table 2: Performance of DRG, VNS, SA, CS, TS and GA on CCCP instances when compared to updated best-known solutions*. Best results over: DRG 10 runs, SA 10 runs, CS 20 runs, TS 25 runs, GA 20 runs. Hardware/software: (a) MATLAB, Intel CORE i7 at 2.4 GHz and 8 GB RAM; (b) AMD ATHLON at 1.6 GHz and 512 MB RAM; (c) C++, Pentium 4 at 3.0 GHz; (d) C++, Pentium 4 at 3.0 GHz; (e) C++, Intel CORE 2 Quad Q9550 CPU at 2.83 GHz and 4 GB RAM; (f) C++, Pentium 4 at 3.0 GHz.

Instance  Best known*  N       K   DRG (a)    Err%   VNS (b)    Err%   SA (c)     Err%    CS (d)     Err%   TS (e)     Err%   GA (f)     Err%
TA25      1,251.44     25      5   1,256.62   0.41   1,251.44   0.00   1,273.46   1.76    1,251.44   0.00   1,251.45   0.00   1,273.46   1.76
TA50      4,474.52     50      5   4,476.92   0.05   4,476.12   0.04   4,478.15   0.08    4,474.52   0.00   4,474.52   0.00   4,474.52   0.00
TA60      5,356.58     60      5   5,356.58   0.00   5,356.58   0.00   5,370.05   0.25    5,356.58   0.00   5,356.58   0.00   5,356.58   0.00
TA70      6,240.67     70      5   6,270.45   0.48   6,241.55   0.01   6,267.89   0.44    6,240.67   0.00   6,240.67   0.00   6,267.89   0.44
TA80      5,515.46     80      7   5,748.30   4.22   5,730.28   3.89   5,780.55   4.81    5,730.28   3.89   5,730.28   3.89   5,775.69   4.72
TA90      8,899.05     90      4   9,069.85   1.92   9,103.21   2.29   9,069.85   1.92    9,069.85   1.92   9,069.85   1.92   9,133.35   2.63
TA100     8,102.04     100     6   8,122.36   0.25   8,122.67   0.25   8,153.64   0.64    8,102.04   0.00   8,102.04   0.00   8,189.44   1.08
SJC1      17,359.75    100    10   17,492.77  0.77   17,696.53  1.94   17,363.47  0.02    17,359.75  0.00   17,359.75  0.00   17,363.47  0.02
SJC2      33,181.65    200    15   33,317.03  0.41   33,423.84  0.73   33,458.40  0.83    33,181.65  0.00   33,181.65  0.00   33,324.04  0.43
SJC3a     45,354.38    300    25   46,395.96  2.30   47,985.29  5.80   46,847.61  3.29    45,354.38  0.00   45,356.35  0.00   46,682.60  2.93
SJC4a     61,931.60    402    30   62,701.95  1.24   66,689.96  7.68   64,981.66  4.92    61,931.60  0.00   61,993.66  0.10   65,978.89  6.54
DONI1     3,021.41     1,000   6   3,074.97   1.77   3,021.41   0.00   3,138.67   3.88    3,022.26   0.03   3,025.12   0.12   3,122.02   3.33
DONI2     6,080.70     2,000   6   6,456.14   6.17   6,080.70   0.00   6,985.30   14.88   6,372.81   4.80   6,384.84   5.00   6,394.96   5.17
DONI3     8,343.49     3,000   8   8,911.87   6.81   8,769.05   5.10   9,653.27   15.70   8,438.96   1.14   8,343.49   0.00   8,945.88   7.22
DONI4     10,777.64    4,000  10   11,453.68  6.27   11,516.14  6.85   13,328.16  23.66   10,854.48  0.71   10,777.64  0.00   11,130.16  3.27
DONI5     11,114.67    5,000  12   11,776.59  5.96   11,635.18  4.68   13,920.49  25.24   11,134.94  0.18   11,114.67  0.00   11,341.52  2.04
DONI6     15,610.46    10,000 23   16,362.38  4.82   18,443.50  18.15  29,102.49  86.43   15,722.67  0.72   15,610.46  0.00   19,226.96  23.17
DONI7     18,484.13    13,221 30   18,974.23  2.65   23,478.79  27.02  29,484.66  59.51   18,596.74  0.61   18,484.13  0.00   29,915.77  61.85
Mean                              2.58              4.69              13.79               0.78              0.61              7.03
StdDev                            2.45              7.18              23.43               1.41              1.48              14.68

DRG vs. CKM-TCG
Tables 3 and 4 present the results of the DRG meta-heuristic for the instances where reference data of the CKM and TCG methods were available. As presented in Table 3, the DRG meta-heuristic is more competitive than the CKM method and, as previously observed, more consistent. When compared to the TCG method, TCG is the more competitive of the two, although the error gaps of the DRG are minimal (less than 1.5%).
Table 3: Performance of DRG and CKM on CCCP instances when compared to best-known solutions*.

Instance  DRG        Error (%)  CKM        Error (%)
SJC1      17,492.77  0.77       20,341.34  17.18
SJC2      33,317.03  0.41       35,211.99  6.12
SJC3a     46,395.96  2.30       50,590.49  11.54
SJC4a     62,701.95  1.24       69,283.05  11.87
DONI1     3,074.97   1.77       3,234.58   7.06
DONI2     6,456.14   6.17       6,692.71   10.06
DONI3     8,911.87   6.81       9,797.12   17.42
DONI4     11,453.68  6.27       11,594.07  7.58
DONI5     11,776.59  5.96       11,827.69  6.42
Mean                 3.52                  10.58
StdDev               2.70                  4.36

Table 4: Performance of DRG and TCG on CCCP instances when compared to best-known solutions*: (a) MATLAB, Intel CORE i7 at 2.4 GHz and 8 GB RAM; (b) C++, Intel CORE 2 Duo at 2 GHz with 2 GB RAM.

Instance  DRG (a)    Error (%)  TCG (b)    Error (%)
TA25      1,256.62   0.41       1,280.49   2.32
TA50      4,476.92   0.05       4,474.52   0.00
TA60      5,356.58   0.00       5,357.34   0.01
TA70      6,270.45   0.48       6,240.67   0.00
TA80      5,748.30   4.22       5,515.46   0.00
TA90      9,069.85   1.92       8,899.05   0.00
TA100     8,122.36   0.25       8,168.36   0.82
SJC1      17,492.77  0.77       17,375.36  0.09
SJC2      33,317.03  0.41       33,357.75  0.53
SJC3a     46,395.96  2.30       45,379.69  0.06
SJC4a     62,701.95  1.24       61,969.06  0.06
Mean                 1.10                  0.35
StdDev               1.28                  0.71

Error and speed vs. size of the instance
Figure 5 presents a graphical review of the error gaps of all methods based on the size of the test instance. TS and CS are located on the x-axis as they are the benchmark methods. It can be observed that, as the size of the instance increases, the error gaps of SA, GA and VNS increase significantly. TCG presents very small error gaps for small instances (fewer than 1,000 points), and CKM reports error gaps comparable to those of SA for small-to-medium instances (fewer than 5,000 points). In contrast, DRG performs consistently across all instances, decreasing its error gap as the instance grows from medium to large size (up to 13,221 points).

[Figure 5: Comparison of error gap vs. size of the instance; error gap (%) against N for CKM, VNS, SA, GA, DRG and TCG, with TS/CS as the best-known baseline on the x-axis.]

Regarding speed, Fig. 6 presents the computational (running) times reported by the reviewed methods. While TS and CS are the benchmark methods, they take over 25,000 s to reach the best-known solution for the largest instance. Note that for these methods, computational times increase exponentially for instances larger than 6,000 points. In contrast, SA is very consistent, with a computational time of approximately 1,000 s across all instances, while the time of GA increases significantly for instances larger than 6,000 points (up to 7,000 s for the largest instance). However, these methods have the largest error gaps, as reviewed in Fig. 5. The speed pattern of DRG is very similar to that of GA; however, as reviewed in Fig. 5, its error gap is the closest to the benchmark methods for instances larger than 6,000 points.

[Figure 6: Comparison of speed vs. size of the instance; execution time (seconds) against N for SA, GA, DRG, TCG, CS and TS.]
It is important to mention that this comparison may not be fair due to the differences in the programming languages and the hardware used for implementation and testing of the methods. In order to compare running speed, all methods should be tested on the same hardware and be implemented in the same programming language by the same software developer (the developer's expertise may also affect the speed performance of the software). Due to the difficulty of achieving this task, in practice the comparison is commonly performed on the best results obtained by other methods, as in Chaves & Nogueira-Lorena (2011) and Fernandes-Muritiba et al. (2012). Running time is measured in order to determine if the proposed method can provide a solution within reasonable time considering standard hardware and software resources. Under these considerations, the DRG can provide very suitable results (< 2.6% error gap) within very reasonable time.

CONCLUSIONS AND FUTURE WORK
The application of Gaussian probability functions and the EM algorithm had not been explored in the literature as solving techniques for facility location problems (e.g., the CCCP). An important aspect for the application of GMMs is the reduction of dispersion to accomplish more efficient clustering and convergence of the EM algorithm. Hence, the proposed DRG meta-heuristic provides important insights about the suitability of these techniques for the CCCP and the challenges to improve its performance.

Regarding performance, TS and CS are the most competitive solving methods for the CCCP and thus were considered as benchmark methods, with mean error gaps of 0.61% and 0.78% respectively. The proposed DRG meta-heuristic performed as the closest best method, with a mean error gap smaller than 2.6%, and there is evidence that it can provide these results faster than TS and CS for large instances (> 6,000 points). Hence, the DRG meta-heuristic can be a suitable alternative to CS and TS regarding time, and a more efficient method when compared to VNS, SA, GA and CKM.

Future work is focused on improving the performance of the DRG based on the following key aspects:

- The EM algorithm was found to be functional with up to 30 Gaussian components for the clustering process (see the discussion in Section "Dispersion Reduction in GMMs Performance for the CCCP"). In this case, the analysis of the Infinite Gaussian Mixture Model described in Rasmussen (2010) may lead to overcoming this restriction.
- While most optimization methods, such as those based on Mixed Integer Linear Programming (MILP) or meta-heuristics, are purely quantitative, modeling of qualitative criteria may improve the optimization task. For example, in Hadi-Vencheh & Mohamadghasemi (2013) a methodology that integrated the Analytic Hierarchy Process (AHP) and a Nonlinear Programming model (NLP) provided a very suitable solution for a facility layout problem. Such an approach may lead to improved solving methods for other logistic problems such as the facility location problem.
- Integration with state-of-the-art meta-heuristics such as Migrating Birds Optimization (MBO) (Niroomand et al., 2015) and mat-heuristics (Sartori & Buriol, 2020).

ADDITIONAL INFORMATION AND DECLARATIONS

Funding
The author received no funding for this work.

Competing Interests
The author declares that they have no competing interests.

Author Contributions
Santiago-Omar Caballero-Morales conceived and designed the experiments, performed the experiments, analyzed the data, performed the computation work, prepared figures and/or tables, authored or reviewed drafts of the paper, and approved the final draft.

Data Availability
The following information was supplied regarding data availability: Code and test instances of the considered databases are available in the Supplemental Files.

Supplemental Information
Supplemental information for this article can be found online at http://dx.doi.org/10.7717/peerj-cs.332#supplemental-information.

REFERENCES
Bishop CM. 2006. Pattern recognition and machine learning. New York: Springer Science and Business Media, LLC.
Carvalho R, Mendes G, Azeredo-Chaves F. 2017. A heuristic for the capacitated clustering problem. In: Proceedings of the XLIX Simposio Brasileiro de Pesquisa Operacional. 1–12.
Chaves AA, Correa FA, Lorena LAN. 2007. Clustering search heuristic for the capacitated p-median problem. Advances in Software Computing Series 44:136–143.
Chaves AA, Nogueira-Lorena LA. 2010. Clustering search algorithm for the capacitated centered clustering problem. Computers & Operations Research 37(3):552–558 DOI 10.1016/j.cor.2008.09.011.
Chaves AA, Nogueira-Lorena LA. 2011. Hybrid evolutionary algorithm for the capacitated centered clustering problem. Expert Systems with Applications 38(5):5013–5018 DOI 10.1016/j.eswa.2010.09.149.
De-Oliveira ACM, Chaves AA, Nogueira-Lorena LA. 2013. Clustering search. Pesquisa Operacional 33(1):105–121 DOI 10.1590/S0101-74382013000100007.
Fernandes-Muritiba AE, Negreiros M, Gois-Oria HL, Ferreira de Souza M. 2012. A tabu search algorithm for the capacitated centred clustering problem. In: 2012 Congreso Latino-Iberoamericano de Investigacion Operativa (CLAIO) - 2012 Simposio Brasileiro de Pesquisa Operacional (SBPO). 2344–2359.
Forsyth D. 2012. Lecture notes of CS 498 signals AI: Gaussian classifiers. Illinois: University of Illinois, Urbana-Champaign.
Fraley C, Raftery A. 1998. How many clusters? Which clustering method? Answers via model-based cluster analysis. Computer Journal 41(8):578–588 DOI 10.1093/comjnl/41.8.578.
Hadi-Vencheh A, Mohamadghasemi A. 2013. An integrated AHP–NLP methodology for facility layout design. Journal of Manufacturing Systems 32(1):40–45 DOI 10.1016/j.jmsy.2012.07.009.
Hansen P, Jaumard B. 1997. Cluster analysis and mathematical programming. Mathematical Programming 79:191–215.
Herda M. 2015. Combined genetic algorithm for capacitated p-median problem. In: Proceedings of the 16th IEEE International Symposium on Computational Intelligence and Informatics, CINTI 2015. Piscataway: IEEE, 151–154.
Mahmoodi-Darani N, Ahmadi V, Saadati Eskandari Z, Yousefikhoshbakht M. 2013. Solving the capacitated clustering problem by a combined meta-heuristic algorithm. Journal of Advances in Computer Research 4(1):1–13.
McLachlan GF, Rathnayake S. 2014. On the number of components in a Gaussian mixture model. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 4(5):341–355 DOI 10.1002/widm.1135.
Mohamadghasemi A, Hadi-Vencheh A. 2012. An integrated synthetic value of fuzzy judgments and nonlinear programming methodology for ranking the facility layout patterns. Computers & Industrial Engineering 62(1):342–348 DOI 10.1016/j.cie.2011.10.004.
Mohammed M, Badruddin M, Mohammed E. 2016. Machine learning: algorithms and applications. Milton Park: CRC Press, Taylor & Francis Group.
Negreiros M, Palhano A. 2006. The capacitated centred clustering problem. Computers & Operations Research 33(6):1639–1663 DOI 10.1016/j.cor.2004.11.011.
Niroomand S, Hadi-Vencheh A, Sahin R, Vizvari B. 2015. Modified migrating birds optimization algorithm for closed loop layout with exact distances in flexible manufacturing systems. Expert Systems with Applications 42(19):6586–6597 DOI 10.1016/j.eswa.2015.04.040.
Palhano A, Negreiros M, Laporte G. 2008. A constrained k-median procedure for the capacitated centered clustering problem. In: Proceedings of the XIV Congreso Latino Ibero Americano de Investigacion de Operaciones, CLAIO 2008.
Pereira WM, Carvalho R. 2017. ILS-PR: an efficient heuristic for the capacitated centered clustering problem. Proceeding Series of the Brazilian Society of Computational and Applied Mathematics 5(1):104271–104272.
Pereira MA, Senne LF. 2008. A column generation method for the capacitated centred clustering problem. In: Proceedings of the VI ALIO/EURO Workshop on Applied Combinatorial Optimization. 1–6.
Radiah S, Hasnah N, Omar M. 2013. An alternative heuristic for capacitated p-median problem (CPMP). In: Proceedings of the 2013 IEEE Business Engineering and Industrial Applications Colloquium, BEIAC 2013. 916–921.
Rasmussen C. 2010. The infinite Gaussian mixture model. In: Solla SA, Leen TK, Müller KR, eds. Advances in Neural Information Processing Systems. Vol. 12. Cambridge: MIT Press, 554–560.
Sartori CS, Buriol LS. 2020. A study on the pickup and delivery problem with time windows: matheuristics and new instances. Computers & Operations Research 124(2):1–17 DOI 10.1016/j.cor.2020.105065.
Theodoridis S, Koutroumbas K. 1979. Pattern recognition. Second Edition. San Diego: Elsevier Academic Press.
Theodoridis S, Koutroumbas K. 2010. An introduction to pattern recognition: a MATLAB approach. Cambridge: Academic Press.
Yousefikhoshbakht M, Khorram E. 2012. Solving the vehicle routing problem by a hybrid meta-heuristic algorithm. Journal of Industrial Engineering International 8(11):1–9 DOI 10.1186/2251-712X-8-11.