Your Paper's Title Starts Here: Please Center


Gathering Correlated Data in Wireless Sensor Networks Using a 
Heuristic Algorithm 

Yuexian Wang, Cheng Chew Lim 
School of Electrical and Electronic Engineering, the University of Adelaide, Adelaide, 

Australia, {jwang, cclim}@eleceng.adelaide.edu.au 

Abstract. We propose a routing scheme for data gathering and aggregation in wireless sensor networks. The scheme 

aims to optimise an aggregation tree in order to minimise the energy dissipation of data aggregation and transmission.  

A modified particle swarm optimisation algorithm is developed in the proposed scheme. In addition, the routing scheme 

uses a generic data aggregation model which accommodates different correlation conditions. The performance of the 

proposed scheme is evaluated and compared with three other existing data gathering algorithms. Simulation results 

show that the proposed scheme outperforms existing algorithms in terms of energy consumption and that the scheme 

can adapt to the change of network connectivity and data correlation conditions.  

Keywords: Sensor networks, data aggregation, particle swarm optimisation, correlation coefficient, energy 

consumption 

1. Introduction 

In wireless sensor networks, a large number of sensor nodes are scattered in an observed region and left unattended 

after deployment, and all nodes in the network are thus energy-constrained. Energy consumption, scalability, and load 

balancing are important requirements for many data gathering sensor network applications. The tree-based data 

gathering algorithm is proposed for achieving development in this area, such as the shortest path tree algorithm [1], the 

minimum spanning tree algorithm [2] and the shallow light tree algorithm [3]. In a tree-based network, sensor nodes are 

organised into a tree structure where data aggregation is performed at intermediate nodes along the tree and a concise 

representation of the data is transmitted to the root node. Therefore, the research problem is to find out a routing tree 

with data aggregation to achieve minimum energy expense. This problem is an NP-complete problem [4,] because it can 

be reducible to a weighted set cover problem in graph theory, which has been shown to be NP complete [5].  

In this paper, we propose a routing scheme for data gathering and aggregation in wireless sensor networks. The 

scheme utilises a heuristic algorithm to optimise data traffic and the transmission structure in terms of energy 

consumption. Using this heuristic algorithm, significant data traffic can be reduced since more reasonable aggregator 

nodes to eliminate data redundancy can be selected. In order to reduce the energy consumption for data transmission, 

the heuristic algorithm optimises transmission structure as well. In addition, a generic aggregation model which does 

not depend on any specific relationship among correlated data is exploited in order to adapt to a variety of applications.  

The remainder of this paper is organised as follows: Section 2 gives the introduction to the standard particle swarm 

optimisation algorithm. The system models of our new scheme are presented in Section 3. The details of the proposed 

modified particle swarm optimisation algorithm for the routing scheme are described in Section 4. Section 5 discusses 

the performance evaluation and simulation results and Section 6 concludes the paper. 

2. Standard Particle Swarm Optimisation Algorithm 

The particle swarm optimisation (PSO) algorithm stems from the simulation of a simplified society model, and it 

was proposed as a typical heuristic algorithm inspired by the behaviour of bird flocking [6]. The PSO was used to 

search optimal or near-optimal solutions in large search space. Assume that in a D-dimensional searching space m 

particles compose a swarm and fly with a certain velocity, where the particle i denotes a D-dimensional vector 

},...,,{
21 idiii

xxxx  , i=1, 2, ..., m. The performance of 
i

x  is assessed by its fitness value which is calculated 

through substituting 
i

x  into the objective function. The motion of particles changes according to the following 

equations: 

)()()()(
21

1 t

id

t

gbest

t

id

t

ibest

t

id

t

id
xprandcxprandcvv 


 ,                                                                      (1) 

11 


t

id

t

id

t

id
vxx ,                                                                                                                                                      (2) 

Where t is the iteration number, 
id

v  denotes the velocity of the particle i in the dth dimensional (1 ≤ d ≤ D) particle 

swarm space, 
id

x  represents the particle i current position, 
ibest

p  is its best previous position, and 
gbest

p  is the best 

position among all particles in the population. rand() is a random function with a range [0, 1]. 
1

c  and 
2

c  are learning 

factors used to accelerate the motion speed of particles. The inertia weight, ω is a user-specified parameter that controls, 

mailto:cclim%7d@eleceng.adelaide.edu.au


with 
1

c  and 
2

c , the impact of previous historical values of particle velocities on its current one. A large inertia weight 

facilitates global exploration while a small inertia weight facilitates fine-tuning local search.  

3. System Models 

A.Network Model. One base station and n sensors are assumed to be distributed uniformly in a field and quasi-

stationary. The base station sends a query and k (k   n) sensors respond to that query. Wireless links are considered 
bidirectional and symmetric so that any two nodes can communicate using the same transmission power levels only if 

they are within range of each other. Sensor nodes are homogenous and arbitrarily allocated with equal initial energy. In 

addition, nodes are not location-aware and left unattended after deployment. 

B.Data Correlation and Aggregation Model. We assume that there is correlation among the data generated by the 

k sensors. To possibly accommodate a wide range of scenarios, we abstract data redundancy among two sensor nodes 

using a single value ρ, termed the correlation coefficient. Assume that data amount after aggregation is not less than any 

of its inputs and not more than the sum of all inputs. Let 
iv

R  and 
jv

R  be the amount of data generated by the node 
i

v  

and 
j

v  in response to a query. Without loss of generality, if
ji

vv
RR  , the output ),(

ji
vvR after data aggregation is: 

ji vvji
RRvvR )1(),(  ,                                                                                                                                 (3) 

C.Energy Model. We adopt both the free-space propagation model to approximate path loss sustained because of 

wireless channel transmission [7]. The energy expanded by the radio for transmitting and receiving an l bit message 

over a distance d, is given by: 
2

),( dllEdlE
fselecTx

 ,                                                                                                                            (4)                                     

elecRx
lElE )(                                                                                                                                                (5) 

Where 
Tx

E  is the energy dissipated in the transmitter, 
elec

E  is the per bit energy dissipation for running the 

transceiver circuitry, 
fs

  is the amplifier parameter for the free space propagation model, and 
Rx

E  is the radio expends 

of receiver.  

4. Problem Formulation 

Given the set of source node S and the base station d in wireless sensor networks G(V, E), our objective is to find a 

connected subgraph G′(V′, E′)  G, where S  V′, and d  V′, in order to minimise the energy consumption for 
transmitting data from all source nodes to the base station. The objective function is formulated as follows:  


 '

),(min
Vv

iv

i

i
dvR  ,                                                                                                                                               (6) 

Where ),( dv
i

 is the total weight of edges on path from node 
i

v  to base station on constructed aggregation tree. 

In wireless networks, the weight of each edge can be considered as the energy expended for per unit data 

communication and data aggregation. 
iv

R  is the amount of data generated by node 
i

v . 

5. PSO Modified by Genetic Algorithm for Routing with Aggregation 

The standard PSO algorithm is the real valued PSO, and it cannot operate addition or subtraction directly on the 

path. Hence, the standard PSO algorithm should be extended to deal with the discrete optimisation problems which 

require the ordering or arranging of discrete elements, eg. the routing tree with aggregation problem. Facing the 

problems, we propose a PSO algorithm which is modified by the genetic algorithm [8] in order to address the discrete 

nature of our optimisation problem. The core of this algorithm is an equivalent form of velocity and displacement 

formulae combining the thought of genetic algorithm. Hence, Eq. 1and Eq. 2 are replaced by Eq. 7 and Eq. 8. 

)()(
1 t

id

t

gbest

t

id

t

ibest

t

id

t

id
xpxpvv 


 ,                                                                                                   (7) 

   
11 


t

id

t

id

t

id
vxx ,                                                                                                                                                     (8) 

Where the operator   represents a composition, and the operator   indicates information exchange which is 
implemented by crossover.  

A.Encoding. Encoding involves coding paths serial numbers into a feasible solution (or a position) in the search 

space. We represent the individual, for a specific aggregation tree, as a string of node numbers. The length of each 

individual is always equal to the number of relay nodes. A routing scheme for a network with 7 relay nodes, and one 

base station, is shown in Fig. 1(a) and the corresponding particle is shown in Fig. 1(b).  


 (a) Routing tree                                   (b) Individual (Solution representation) 

Fig.1 Encoding a routing tree as an individual 

B.Population Initialisation. In general, there are two ways to generate the initial population, heuristic initialisation 

and random initialisation. For most large scale problems, such as network communication design, random initialisation 

has better effect on global optimal solutions. The main idea of our work is that the correspondence of nodes behaviour 

through local optimisation is greater than the correspondence of nodes behaviour in random initialisation. The SPT 

algorithm and the MST algorithm have good results in the initial population, when correlation coefficient is equal to 0 

and 1 respectively. Hence the initial population should include the shortest path tree and the minimum spanning tree.  

C.Fitness Function. We define the fitness value as the energy consumption of the network, and our metric for total 

energy dissipation takes into consideration including the transmit energy, the receive energy and aggregation energy.  

D.Selection. The function of selection operator is to select individuals which have relative large fitness values with 

relative large probabilities from their parent generation. The PSO algorithm then puts the chosen individuals in the 

offspring generation and waits for further evolution implemented by crossover and mutation. We adopt the ranking 

selection [9] as a part of implementation of the modified PSO algorithm in this paper. 

E.Crossover. The operator   in Eq. 7 indicates information exchange which is similar to the operation of 

crossover in the genetic algorithm, so we use the crossover as the equivalent form of the   operation. Because of the 
requirement of encoding, source nodes are known at the stage of crossover. First, we randomly select a gene in the locus 

of the same source node between the optimal individual and a generic individual and put it into the same locus of the 

offspring. We then make the same selection at the locus of the relay node which is indicated by the previous chosen 

gene. This process is repeated until the base station is found. For the rest nodes which are not used, genes from the same 

locus between the two individuals are selected randomly. Fig. 2 is an example for the crossover operation. 

 
Fig.2 Crossover between the optimal individual and the generic individual. 

F.Mutation. Multiplying ω by 
k

id
v  in Eq. 7, indicating that a particle searches toward a new space, is similar to the 

operation of mutation in the genetic algorithm, so we use the mutation as the equivalent form of
k

id
v . We select a node 

3 to substitute for node 6 from set Ω where the node 3 has the same previous hop node 7 and next hop node 8 as the nod 

6. This process is illustrated by Fig. 3.    

 
Fig.3 Examples of the mutation operation. 

G.Addition. The addition operation indicates the sum of several processing steps. Through orderly executing the 

equivalent subitems in the PSO algorithm, we can obtain the effect of addition. In the equivalence of the addition, the 

outputs of previous steps are the inputs of latter steps. According to Eq. 7 and Eq. 8, the computation sequence is from 

right to left. Applying two crossover operations and one mutation operation to a generic aggregation tree, we can 

acquire an offspring and then put it in the population for the next iteration. 

6. Simulation Results and Discussions 

In our simulation, average energy dissipation and the maximum run (the first node dies) are metrics used to compare 

the PSO algorithm with three existing tree-based data gathering algorithms: the shortest path tree (SPT) algorithm, the 

minimum spanning tree (MST) algorithm, and the shallow light tree (SLT) algorithm. 

A square region of size 50m×50m is uniformly divided into 100 grids, and each sensor is deployed in one grid. 

Since the coordinate of each sensor is determined by a random function which can generate uniformly distributed 

random numbers within the interval [0, 50], the sensor nodes are randomly distributed. Each node is equipped with 1J 

initial energy at the beginning of the simulation. Every node transmits a 4000 bit message per round to the base station. 


The calculation of energy dissipation in the simulations is based on Eq. 4 and Eq. 5. We use the same parameters as 

that in [10]: 
elec

E =50nJ/bit and
fs

 =10pJ/bit/
2

m . Moreover, the per bit energy dissipation for aggregating data is set 

as 
da

E =15nJ/bit/signal and the transmission range is set as R =20 m . 

 
(a)                                                                                       (b) 

Fig.4 Influence of the number of source nodes on (a) mean of energy consumption versus the number when ρ 
approaches 0 and (b)  mean of energy consumption versus the number of source nodes when ρ approaches 1 

Fig.4 (a) shows the mean of energy consumption per round versus the number of source nodes using an aggregation 

tree with the PSO algorithm, the MST algorithm, the SPT algorithm, and the SLT algorithm. Clearly, when ρ 

approaches 0 the PSO algorithm performs as well as the SPT algorithm and reduces energy consumption significantly 

compared with the MST algorithm and the SLT algorithm. A reduction of average energy consumption of about 40% 

and 50% can be obtained by the PSO algorithm as well as the SPT algorithm over the SLT algorithm and the MST 

algorithm, respectively. In a network with poor correlation, the PSO algorithm reduces to the SPT algorithm, and data 

should be transmitted directly to the base station via the shortest paths instead of aggregator nodes by detouring, since 

there is no redundancy among the data, and aggregation at any relay node is not efficient in reducing the data amount.  

When ρ approaches 1 as illustrated in Fig.4 (b), the PSO algorithm performs better than all other algorithms with 

regard to the mean of energy consumption. A reduction of average energy consumption of about 5%, 5%, and 40% can 

be obtained by the PSO algorithm over the MST algorithm, the SLT algorithm, and the SPT algorithm respectively. The 

performance of the MST algorithm is the closest to that of the PSO algorithm. However, since some nodes within one 

hop to the base station would waste energy for detouring to aggregator nodes rather than transmitting directly to the 

base station, the MST algorithm cannot achieve the optimal performance. Because the nodes can not sufficiently take 

advantage of data aggregation to reduce the transmission task, the SPT algorithm expends the most energy compared 

with the three other algorithms. The SLT algorithm can balance between data aggregation and direct transmission, but it 

gets the benefit implicitly. Hence the energy consumption of the PSO algorithm increases more slowly than the SLT 

algorithm with the increase in the number of source nodes. 

Fig.5(a) illustrates the average energy consumption of the four algorithms when the number of source node k=15. 

The costs of all algorithms decrease with the increase in ρ, the correlation coefficient. This exemplifies that data 

aggregation in sensor networks can greatly benefit the routing performance by reducing redundancy among correlated 

data. In addition, the PSO algorithm evidently outperforms other algorithms in the energy consumption with the 

increase in the correlation coefficient. A reduction of average energy consumption of about 25%, 30%, and 35% can be 

obtained by the PSO algorithm over the SLT algorithm, the MST algorithm, and the SPT algorithm, respectively. When 

ρ is small, the SPT algorithm performs well. However, it does not benefit from the increase in ρ as it cannot efficiently 

perform data aggregation to eliminate redundancy among data. In the contrast, the MST algorithm gets the worst 

performance when ρ is small since it pursues data aggregation but data are actually low correlated. The PSO algorithm 

performs much better than the SLT algorithm since the PSO algorithm recalculates total energy dissipation in every 

stage to get perfect matching and thus can adapt to the correlation among nodes. 

We can see from Fig. 5(b) that the maximum run of all algorithms grows with the increase in ρ. The main reason is 

that data aggregation can significantly reduce traffic by eliminating redundancy among correlated data and hence 

consume less energy. Since aggregator nodes receive data from different nodes and perform data aggregation, they 

always die earlier than others due to the heavy load. When ρ < 0.8, the SPT algorithm outperforms other algorithms 

markedly because data are opportunistically aggregated and the amount of data for aggregation is not large. The SPT 

algorithm does not benefit well when ρ increases from 0.8 to 1 due to insufficient data aggregation. Because the MST 

algorithm and the SLT algorithm prefer to detour to pursue data aggregation, large amount of data may be aggregated 

on nodes near the sources, and these aggregator nodes will exhaust energy earlier than that in the PSO algorithm. In 

addition, the maximum run of the PSO algorithms drops dramatically when ρ varies from 0 to 0.2 because the PSO 


algorithm can find out more aggregator nodes to share the same paths and the load of aggregator nodes becomes heavier.  

 
                                                   (a)                                                                                   (b) 

Fig.5 Influence of correlation coefficient on (a) mean of energy consumption versus the correlation coefficient when 
k=15 and (b)  The number of run versus the correlation coefficient when k=15 

7. Conclusions 

We propose an energy-efficient data gathering and aggregation routing scheme that caters for the energy challenges 

in wireless sensor networks. The particle swarm optimisation algorithm is utilised to optimise a joint objective between 

data traffic and transmission structure. The scheme also adopts a generic data aggregation model which accommodates 

different correlation conditions. Simulation results show that the particle swarm optimisation algorithm outperforms its 

counterparts in terms of energy consumption. Although there is a small sacrifice in the maximum run due to the heavy 

load of aggregator nodes, the particle swarm optimisation algorithm can be regarded as a good tradeoff since it can save 

energy and adapt to the change of network connectivity and data correlation conditions. 

References 

[1] W. Heinzelman, J. Kulik, and H. Balakrishnan, “Adaptive protocols for information dissemination in wireless sensor 

networks,” in Proc. ACM MobiCom, Aug. 1999, pp. 174-185. 

[2] R. C. Prim, “Shortest connection networks and some generalization,” Bell System Technical, vol. 36, no. 6, pp. 

1389-1401, 1957. 

[3] A. Goel and K. Munagala, “Balancing steiner trees and shortest path trees online,” in Proc. 11th ACM-SIAM Symp. 

on Discrete Algorithms, Jan. 2000, pp. 562-563. 

[4] R.Cristescu, B. Beferull-Lozano, and M. Vetterli, “On network correlated data gathering,” in Proc. IEEE 

INFOCOM, vol. 4, Mar. 2004, pp. 2571-2582. 

[5] M. R. Garey and D. Johnson, Computers and intractability: A guide to the theory of NP-Completeness. W. H. 

Freeman, 1979. 

[6] J. Kennedy and R. C. Eberhart, “Particle Swarm Optimization,” in the IEEE Conf. on Neural Networks, IV, 1995, 

pp. 1942-1948. 

[7] H. Xia, H. L. Bertoni, and L. R. Maciel, “Radio propagation characteristics for line-of-sight microcellular and 

personal communications,” Antenna and Propagation, vol. 41, no. 10, pp. 1439-1447, Oct. 1993. 

[8] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 3rd ed., Springer-Verlag, 1996. 

[9] J. E. Baker, “Adaptive selection methods for genetic algorithms,” in Proc. the 1st International Conference on 

Genetic Algorithms, 1985, pp. 101-111. 

[10] W. R. Heinzelman, A. Chandrakasan, and H. Balakrishnan, “An application-specific protocol architecture for 

wireless microsensor networks,” Wireless Communications, vol. 1, no. 4, pp. 660-670, October 2002.