key: cord-0120370-9ssvdn72
authors: Rahman, Aniq Ur; Fourati, Fares; Ngo, Khac-Hoang; Jindal, Anish; Alouini, Mohamed-Slim
title: Network Graph Generation through Adaptive Clustering and Infection Dynamics: A Step Towards Global Connectivity
date: 2021-11-20
journal: nan
DOI: nan
sha: 9bf79b705fb4c5ad8342a6c37f85f23acada79ba
doc_id: 120370
cord_uid: 9ssvdn72

More than 40% of the world's population is not connected to the internet, majorly due to the lack of adequate infrastructure. Our work aims to bridge this digital divide by proposing solutions for network deployment in remote areas. Specifically, a number of access points (APs) are deployed as an interface between the users and backhaul nodes (BNs). The main challenges include designing the number and location of the APs, and connecting them to the BNs. In order to address these challenges, we first propose a metric called connectivity ratio to assess the quality of the deployment. Next, we propose an agile search algorithm to determine the number of APs that maximizes this metric and perform clustering to find the optimal locations of the APs. Furthermore, we propose a novel algorithm inspired by infection dynamics to connect all the deployed APs to the existing BNs economically. To support the existing terrestrial BNs, we investigate the deployment of non-terrestrial BNs, which further improves the network performance in terms of average hop count, traffic distribution, and backhaul length. Finally, we use real datasets from a remote village to test our solution.

T HE research and development of future communication networks has been driven towards providing faster and more reliable connection for urban and developed regions. Provisioning connectivity to remote regions has been relegated to the bottom. In 2019, about 87% of people in developed countries were connected to the Internet, while in striking contrast only 19% of people in the least developed countries were connected [1] . This means that the most vulnerable to the COVID-19 pandemic were also those do not have access to online tools to respond to the impact of the pandemic. The pandemic has thus exacerbated the lingering digital divide. This calls for a consensus to provide broadband connectivity to rural/remote regions in 6G [2] , [3] .

One of the main challenges in establishing broadband connectivity in remote areas is the deployment of mobile backhaul solutions. Due to high deployment costs, network operators have been reluctant to deploy fiber optics. Therefore, rural/remote backhaul relies mostly on wireless solutions, such as microwave, free-space optics (FSO), and satellite [4] . In any case, taking both capital expenditures and long-term operational expenditures into account, only few of backhaul nodes (BNs) would be deployed in denser areas. However, a non-negligible fraction of rural population are scattered in isolated villages with geographic barriers, such as mountains and forests, to the main BNs. Therefore, the deployment of access points (APs) in proximity to the users for fronthaul connectivity should be carefully designed. A cost analysis of different solutions for fronthaul and backhaul connectivity in rural areas was reported in [4] . Design and analysis of rural networks based on different solutions have been reported, such as long-range Wi-Fi [5] , drones [6] , and satellites [7] . For example, Viasat has a fleet of satellites capable of providing global coverage in the Ka-band [8] . Having such satellite BNs can bring connectivity to the most remote locations on earth, thereby bridging the digital divide. The existing literature does not provide a general algorithm to deploy frugal networks [9] in any location, so as to connect its unconnected population to the internet. Therefore, in this work, we address the problem of network deployment in rural/remote areas in a systematic manner. Given a set of few terrestrial BNs available in a sparsely populated region, we aim to design the deployment of APs to serve the scattered users. Specifically, we optimize the number and locations of the APs and the network configuration to connect those APs to the BNs. We also explore the use of non-terrestrial BNs to further improve the connectivity performance of the network.

Contributions and Organization. The contributions of our work are summarized as follows.

• We propose a metric called connectivity ratio to assess the quality of network deployment, balancing user coverage and deployment cost. This metric is used as the optimization objective to determine the optimal set of APs. • We convert the connectivity ratio maximization problem to relaxed sub-problems, where we first determine the number of APs through an agile search algorithm and then perform weighted clustering to place the APs. • We propose a novel algorithm inspired by infection dynamics, to economically connect all the deployed APs to the BNs. • We investigate the effect of adding non-terrestrial BNs on the network performance.

The remainder of the paper is organised as follows. In Sec. II, we define the coverage ratio and solve the maximization of this metric. In Sec. III, we propose a graph generation algorithm inspired by infection dynamics, and also investigate the use of non-terrestrial BNs. The paper is finally concluded in Sec. IV with some comments on future works.

Notation. We denote the set of integers from m to n by [[m, n]]; Area(U) denotes the area of the convex hull of a set U in a 2D space; · denotes the Euclidean distance. 

acess points Finding optimal number of clusters population Fig. 1 : Generating AP locations through iterative clustering, based on the spatial distribution of population.

We consider a set of users U scattered in a two-dimensional region of interest in the presence of a small number of terrestrial BNs. To cover the users, we deploy a set of APs A. Each AP a in A covers the users in a circular region of radius R, denoted by u a U ∩ B (a, R), where B(x, l) denotes a circle of radius l centred at x. The number of users covered by at least one AP in A is denoted by C A a∈A u a . Note that due to the limited range of the APs, not all the users are guaranteed to be covered, i.e., |C A | ≤ |U |.

To effectively deploy the APs, we need to determine the number of APs and their positions. On the one hand, the number of APs needs to be sufficiently large so that A can collectively cover the region. On the other hand, an excessive number of APs increases the deployment cost. This calls for a design metric that balances between user coverage and deployment cost, which remains unclear in the literature. To this end, we propose a metric called connectivity ratio. Definition 1. The connectivity ratio ρ(A) associated with the set of APs A is defined as:

The connectivity ratio is the product of two important metrics: (i) the average number of users per AP |C A | |A| , and (ii) the coverage ratio |C A | |U | . A deployment with large connectivity provides connectivity to a majority of the users while minimizing the number of APs, as interpreted in the following remark.

. If the two deployments use the same number of APs, i.e., |A 1 | = |A 2 |, then A 1 covers a larger number of users, i.e., |C A1 | > |C A2 |. If they covers the same number of users, i.e., |C A1 | = |C A2 |, then A 1 uses a smaller number of APs, i.e., |A 1 | < |A 2 |, thus saves the deployment cost.

Therefore, to balance between maximizing coverage and minimizing the deployment cost, we maximize the connectivity ratio.

Problem 1. Generate a set of AP locations A * such that the connectivity ratio ρ(A * ) is maximized, i.e.,

For a fixed number of APs k = |A|, one can optimize the positions of the APs by clustering [10] - [12] the set of all users U into k clusters and place an AP at the centroid of each cluster. We denote the set of APs generated from this clustering by A = ψ(k). Nevertheless, in our setup, k is unknown a priori and also needs to be optimized. In order to simplify Problem 1 while exploiting existing clustering algorithms, we decouple the optimization of k and of the positions of the APs as follows. First, we optimize the number of clusters k as

Then, the set of clusters is generated as A * = ψ(k * ).

Solving the optimization of k in (3) through exhaustive search has a worst-case complexity [10] of: O |U | k=1 k|U| k+1 , which is computationally expensive, especially when the number of users |U| is large. Therefore, we aim to reduce the search space.

With a slight abuse of notation, hereafter we write ρ(ψ(k)) simply as ρ(k) for convenience. When k increases from a small value, the connectivity ratio ρ(k) increases since for small number of clusters, each added AP helps covering more users. Specifically, in this regime, |C A | 2 increases faster than k, thus it follows from (1) that ρ(k) increases. However, for large values of k, the coverage zones of the APs begin to overlap and cover the same population. Once the majority of the users have been connected, adding more APs increases the denominator of ρ(k) while the numerator |C A | 2 remains approximately the same. This suggests that ρ(k) decreases after a certain value of k. This is made precise in the following proposition, where we invoke the definition of covering in Appendix A.

where U R/2 denotes the union of the circles of radius R/2, each centered at a point in U.

Proof. By definition of covering, k max is the smallest number of APs for which all users are covered, i.e., |C ψ(kmax) | = |U|.

Since |C ψ(k) | is non-decreasing in k, it holds that |C ψ(k) | = |U|, ∀k > k max . Note that k max is guaranteed to be finite as k max ≤ |U |. It follows that for k > k max , the connectivity ratio is given by ρ(k) = |U | k , which is obviously a decreasing function of k. The bound (4) follows directly from Proposition 2.

It follows from Proposition 1 that ρ(k) ≤ ρ(k max ), ∀k ∈ [[k max , |U|]]. Therefore, the search space in (3) 

This k * is guaranteed to exist as the search space is discrete and bounded. Although the search space has been reduced, it remains big for large k max . Specifically, we see from (4) that k max is large when R is small and when Area(U) is large, i.e., the set of users is scattered in a large region, which is the case in remote/rural areas. To further reduce the space, we propose a heuristic method to estimate a valuek in proximity to the optimal value k * and then search in the neighborhood ofk. Specifically, it follows from Proposition 2 that k max is lower-bounded by P (2R, U) , which is the largest number of APs such that the circles with radius R centered at these APs do not overlap. Since ρ(k) starts decreasing when the overlap between the clusters becomes significant, we predict that ρ(k) is maximized near P (2R, U), i.e., k * ∈ [[P (2R, U)− κ, P (2R, U) + κ]] for sufficiently large κ. Therefore, we first estimate P (2R, U) and then search for k * in the neighborhood of the estimate. To estimate P (2R, U), we start from a value k 0 larger than P (2R, U) (e.g., k 0 = 4Area(U R/2 ) πR 2 ), partition the population into k 0 clusters, and then progressively remove the APs whose radius-R circle intersects with other APs' circles. In this way, we expect to form a dense 2R-packing of U and thus the resulting number of APsk closely approaches the packing number P (2R, U). Then, we find k * using an exhaustive search the extensively reduced search space [[k −κ,k +κ]]. Finally, we perform clustering with k * clusters to determine the positions of the APs. The proposed method is presented in Algorithm 1 and illustrated in Fig. 1 and Fig. 2 . We next demonstrate our algorithm using a real dataset of the population of Kilimambogo, Kenya [13] , which has one of the lowest gross domestic product (GDP) per capita in the world. The connectivity ratio for various coverage radius R is shown in Fig. 3 . Moreover, in Fig. 4 , we show the optimal location of the APs and the spatial distribution of the population for R = 750 m in an area of roughly 400 km 2 .

Input: the population U, radius R, and initial guess k 0 Output: set of APs A * 1 A ← clustering(k 0 , U) Weighted clustering 2 N ← 0 Count of overlapping APs 

Through clustering, we have obtained the set of APs A * , and we also have available with us, the set of terrestrial BNs I. 1 Now, our focus is to provide backhaul to all the APs while optimally using the backhaul resources. We want every AP to be connected to one of the BNs, such that the total length of the backhaul resource we utilize is minimized. Formally, we frame the problem as follows.

Problem 2. Generate a graph G(A * ∪ I, E * ) such that all the points in A * are directly or indirectly connected to one of the points in I and the sum of the edges' length is minimized:

where a ↔ a denotes that the vertices a and a are directly or indirectly connected, and e is the length of the edge e. To reduce complexity, we propose in the next subsection an algorithm to approximately solve Problem 2.

Our proposed algorithm is inspired by the concept of infection dynamics [15] , [16] . The vertices belonging to I (infected) compete among themselves to infect the vertices in A * by sweeping out a circle whose radius increases nonlinearly with time. Once the circle touches a vertex in A * , this vertex also gets infected and begins competing with the 1 The location of the existing BNs, i.e., cell towers is obtained through Open Cell ID [14] . other infected vertices. This process is illustrated in Fig. 5 . Mathematically, we model the growth of the radius of an infected vertex i at time t aṡ

where α, β are hyper parameters, t i 0 denotes the time of infection of vertex i, and 1{·} denotes the indicator function.

Instead of running the algorithm over continuous time, we discretize it into time-steps of size δ each. In this way, at each time-step j, we update the radius r i [j] and speed s i [j] of the infected vertex i as

where j i 0 = t i 0 δ . Initially j i 0 = 0, ∀i ∈ I. The vertex connects to its infector, in other words, it connects to the vertex whose circle touches it first. The algorithm terminates once all the vertices are infected and the resulting graph gives the network topology. The algorithm is presented in Algorithm 2. The change in the infection speed and infection radius with iterations is shown in Fig. 6 . The infection radius increases quite rapidly in the beginning since the infected node wants to be faster than its infector in capturing the neighboring nodes.

Input: the population U, the set of APs A * Output: the edges E * 1 E * = ∅ Initially the vertices are unconnected 2 j ← 0

Initialize time-step to zero 3 I j = I Initialize the set of infected vertices 4 while |I j | < |I ∪ A * | do Relying entirely on the fixed terrestrial BNs may not be optimal. These terrestrial BNs are typically deployed in densely populated area, as is the case for the region in Kilimambogo, Kenya. Therefore, the APs in the sparse locations are connected indirectly to the BNs through many hops. To further improve the network configuration, we suggest the deployment of non-terrestrial BNs which receive backhaul from satellites or high altitude platforms. The position of these non-terrestrial BNs can be dynamically changed, which increases the chance to reliably connect the remote APs. We generate the graph using the infection algorithm by adding the non-terrestrial BNs to the initially infected vertices. The resulting graph for the Kilimambogo region as the number of non-terrestrial BNs increases is shown in Fig. 7 .

With Network X library [17] , we perform analysis on the resulting networks and show the improvement in the network design in Fig. 8 . In Fig. 8(a) , we see that the average hop count decreases as we introduce more non-terrestrial BNs. Similarly, in Fig. 8(b) , the number of APs supported by each BN decreases. Adding more non-terrestrial BNs also makes the AP distribution per BN more fair, as seen in Fig. 8(c) . Moreover, this also lowers the use of backhaul links, and we see the reduction in the total backhaul length in Fig. 8(d) . These results suggest that the addition of non-terrestrial BNs significantly improve the deployment of a realistic network.

1XPEHURI17%QRGHV $YHUDJHQXPEHURIKRSV 

In this work, we have proposed an algorithmic pipeline to deploy a communication network to connect the unconnected population in remote/rural areas. We made use of the high resolution population data to first plan the AP deployment, and then connected them to the BNs in a cost-effective manner. To support the existing terrestrial BNs, we suggest the deployment of non-terrestrial BNs to further improve the network performance in terms of average hop count, traffic distribution, and backhaul length. The next task is to choose the best backhaul type [18] based on the distance and geographical conditions. The number of backhaul and fronthaul nodes are constrained by the budget, but they also need to be sufficiently high to meet the traffic demands of the users. Solving this optimization problem relies on the subjective costs such as per capita GDP, and operational and capital expenses related to the infrastructure [19] - [21] .

The work is partially supported by the Klaus Tschira Foundation through Alumnode Project Funding 2021-2022.

We define covering and packing in a two-dimensional space. See [22, Sec. 4.2] for a reference.

Definition 2 (Covering). An -cover of a set T in R 2 is a set {t 1 , . . . , t N } ⊂ T such that for all t ∈ T there exists an i ∈ [ [1, N ] ] such that t i − t ≤ . The -covering number N ( , T ) is the cardinality of the smallest -cover.

Definition 3 (Packing). An -packing of a set in R 2 is a set {t 1 , . . . , t P } ⊂ T such that t i − t j > for all i, j ∈ [[1, P ]]. The -packing number P ( , T ) is the cardinality of the largest -packing.

and

where T /2 denotes the union of the circles of radius /2, each centered at a point in T . (T /2 is an inflated set of T .)

Proof. The bounds (10) and (11) follows Lemma 4.2.8 and Proposition 4.2.12, respectively, in [22] .

The state of broadband 2021: People-centred approachesfor universal broadband

6G for bridging the digital divide: Wireless connectivity to remote areas

Big communications: Connect the unconnected

Efficient fronthaul and backhaul connectivity for IoT traffic in rural areas

Self-sustainable energy efficient long range wifi network for rural communities

Coverage analysis for UAV-assisted cellular networks in rural areas

A techno-economic cost framework for satellite networks applied to low earth orbit constellations: Assessing starlink, oneweb and kuiper

Connecting the unconnected: Toward frugal 5g network architecture and standardization

Applications of weighted voronoi diagrams and randomization to variance-based k-clustering

How slow is the k-means method

Scikit-learn: Machine learning in Python

High resolution population density maps. Humanitarian Data Exchange (HDX)

Cell tower data

Propagation and immunization of infection on general networks with both homogeneous and heterogeneous components

Infection dynamics on scale-free networks

Exploring network structure, dynamics, and function using NetworkX

A comparative study on wireless backhaul solutions for beyond 4g network

Heterogeneous wireless sensor network deployment and topology control based on irregular sensor model

Financial analysis of 4g network deployment

Techno-economic analysis and prediction for the deployment of 5g mobile network

High-dimensional probability: An introduction with applications in data science