key: cord-0443664-r98rx6lq
authors: Materwala, Huned; Ismail, Leila
title: Performance and Energy-Aware Bi-objective Tasks Scheduling for Cloud Data Centers
date: 2021-04-25
journal: nan
DOI: nan
sha: 9770eb428562fc272fcb3efd8fc3cef7e0be8d14
doc_id: 443664
cord_uid: r98rx6lq

Cloud computing enables remote execution of users' tasks. The pervasive adoption of cloud computing in smart city services and applications requires timely execution of tasks that adheres to Quality of Service (QoS) requirements. However, the increasing use of computing servers exacerbates the issues of high energy consumption, operating costs, and environmental pollution. Maximizing performance while minimizing energy consumption in a cloud data center is challenging. In this paper, we propose a performance and energy optimization bi-objective algorithm to trade off the conflicting performance and energy objectives. An evolutionary algorithm-based multi-objective optimization using system performance counters is proposed for the first time. The performance of the proposed model is evaluated using a realistic cloud dataset in a cloud computing environment. Our experimental results show higher performance and lower energy consumption compared to a state-of-the-art algorithm.

Cloud computing [1] has become a very promising paradigm for both consumers and service providers, allowing convenient, on-demand network access to a shared pool of configurable computing resources. With the advancement of technological paradigms such as the Internet of Things (IoT) and Big Data analytics for smart city applications, data center traffic is exploding with the rapid growth of cloud applications. It is predicted that global cloud data center traffic will increase from 6 zettabytes (ZB) in 2016 to 19.5 ZB by the year 2021 [2]. Furthermore, with the current COVID-19 pandemic, essential services such as healthcare, work, food, and education have moved online. These services rely heavily on the cloud computing paradigm. Consequently, cloud computing infrastructure must maintain large-scale data centers, consisting of thousands of computing nodes that consume a large amount of electrical power.

Correspondence: Leila Ismail (email: leila@uaeu.ac.ae)

It is estimated that data centers will become the world's largest energy consumers, with their share of total energy consumption rising from 3% in 2017 to 4.5% in 2025 [3]. The data center energy cost increases by 100% every 5 years [4]. High energy consumption not only incurs a high cost but also harms the environment. It is predicted that by 2025 data centers will account for nearly 3.5% of global carbon emissions [5]. According to a report by the Natural Resources Defense Council (NRDC), data centers are expected to emit nearly 100 million metric tons of carbon pollution per year [6]. Consequently, it becomes crucial to address the issue of cloud energy consumption.

Several works in the literature have proposed energy-efficient task scheduling algorithms using evolutionary algorithms in the cloud computing environment [7]-[12]. As task scheduling in the cloud is an NP-hard problem, evolutionary algorithms, such as the genetic algorithm [13], are well suited for task optimization problems due to their parallel and efficient global search. However, the tasks' performance, i.e., Quality of Service (QoS), should be considered while minimizing energy consumption. Very few works in the literature focus on multi-objective performance and energy-aware task scheduling in the cloud using evolutionary algorithms [8]-[12]. Moreover, none of these works considers the tasks' resource utilization in terms of all the system performance metrics, i.e., CPU, memory, disk, and network, while computing the energy consumption. This is crucial considering the dynamic nature of the tasks submitted to the cloud.

In this paper, we develop an intelligent autonomous agent for performance and energy-aware bi-objective task scheduling in a cloud data center based on an evolutionary algorithm. We consider the task's execution time as a measure of performance. The task scheduling is modeled as a bi-objective optimization problem to minimize tasks' execution time and energy consumption. We use the Locally Corrected Multiple Linear Regression (LC-MLR) [14] power consumption model, which is based on CPU, memory, disk, and network utilization, to predict the computing server's power consumption. The predicted power is then used to compute the server's energy consumption. The performance of the proposed model is evaluated in terms of energy consumption and execution time using a realistic cloud dataset, in a cloud data center simulated with CloudSim 3.0.3 [15], a software tool for cloud computing simulation. The performance of the proposed model is compared with the genetic algorithm-based task scheduling model in the literature that uses a power model based on CPU and memory utilization values [8].

The rest of the paper proceeds as follows. Section 2 provides an overview of the related work. The cloud system model is presented in Section 3. Section 4 describes the optimization problem and its formulation using an evolutionary algorithm. The experiments and the performance evaluation are presented in Section 5. Section 6 concludes our work.

Several works in the literature have proposed the use of the evolutionary genetic algorithm for energy-efficient multi-objective task scheduling in a cloud computing environment [8]-[12]. However, [11] and [12] do not mention the power model used for the computation of energy consumption. A hardware-based power model using the computing server's voltage and frequency is considered by [10]. However, a hardware-based power model often requires physical sensors for monitoring the hardware resources. This leads to high hardware cost and sensor energy consumption when the sensors are attached to thousands of servers in a cloud data center [16]. A software-based power model consisting of system performance metrics such as CPU, memory, disk, and/or network resources is used by [9] and [8]. However, the power model used by [9] is based only on CPU utilization, and the one used by [8] is based only on CPU and memory utilization values. To the best of our knowledge, none of the works on performance and energy-optimized cloud task scheduling based on the evolutionary genetic algorithm uses an energy consumption formulation based on the full set of system performance counters, i.e., CPU, memory, disk, and network. In this work, we propose an evolutionary algorithm-based intelligent agent for task scheduling in cloud computing that minimizes the task's execution time and energy consumption. The energy consumption in the proposed bi-objective optimization method considers system performance counters. We compare the performance of our proposed model with the genetic algorithm-based bi-objective optimization model in the literature that uses a power model based on CPU and memory utilization values [8].

The cloud computing architecture consists of 'v' heterogeneous virtual machines (VMs) that operate on 'p' heterogeneous physical machines (PMs), as shown in Figure 1. The set of VMs is represented as V = {VM1, VM2, …, VMv} and the set of PMs is represented as P = {PM1, PM2, …, PMp}. The cloud users' tasks are submitted to the cloud broker, which implements an intelligent agent that schedules the tasks on VMs such that the energy consumption and the task execution time are minimized. The task analyzer monitors and records the resource and service requirements of the tasks submitted by the cloud users. The resource requirements of a task include the CPU, memory, disk, and network utilization values, while the service requirements involve performance metrics such as task deadline and execution time. Based on the task's requirements in terms of CPU, memory, disk, and network, the agent calculates the execution time and energy consumption on each VM. To do so, the agent communicates with the VM manager, which is responsible for monitoring the resource utilization of running VMs. It reads the current energy consumption of the VMs, which is maintained by the energy consumption monitor of the cloud.

The power consumption of executing a task on a computing server is predicted using a power model. We use the Locally Corrected Multiple Linear Regression (LC-MLR) power model as stated in Equation 3 . LC-MLR is selected in this paper because it is found to be accurate in a cloud computing environment [14] .

where the error terms are calculated as the difference between the actual and the predicted power consumption values obtained from the MLR model.
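To make the idea of a utilization-based power model concrete, the following minimal sketch predicts power as a linear function of CPU, memory, disk, and network utilization and converts it to energy. It is not the LC-MLR model of [14]: the coefficient values, the names `Utilization`, `predict_power`, and `energy_joules`, and the single additive `correction` argument standing in for the local error correction are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Utilization:
    """Resource utilization of a server, each value a fraction in [0, 1]."""
    cpu: float
    memory: float
    disk: float
    network: float


# Hypothetical regression coefficients in watts; not taken from the paper.
INTERCEPT = 100.0
COEFFS = {"cpu": 80.0, "memory": 20.0, "disk": 10.0, "network": 5.0}


def predict_power(u: Utilization, correction: float = 0.0) -> float:
    """Linear power estimate in watts plus an optional additive correction.

    The correction argument is a placeholder for an error-based adjustment;
    how LC-MLR computes that adjustment is not reproduced here.
    """
    linear = (INTERCEPT
              + COEFFS["cpu"] * u.cpu
              + COEFFS["memory"] * u.memory
              + COEFFS["disk"] * u.disk
              + COEFFS["network"] * u.network)
    return linear + correction


def energy_joules(power_watts: float, seconds: float) -> float:
    """Energy is the predicted power multiplied by the execution time."""
    return power_watts * seconds
```

For example, `energy_joules(predict_power(Utilization(0.5, 0.5, 0.0, 0.0)), 10.0)` combines the two steps for a task that keeps a server half loaded on CPU and memory for ten seconds.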

The energy consumption of a task on a VM can then be calculated using Equation 6, based on the energy function proposed in [17]. When calculating the energy consumption of a new task on a VM, this function accounts for the increase in the energy consumption of the tasks already running on that VM caused by the increase in their execution time.

where the new execution time of the ongoing task tj−1 is its execution time on VMi once task tj is scheduled on the same VMi. This execution time increases because the processing speed of VMi is shared between tj−1 and tj.

Likewise, the execution time of tj is its execution time when running in parallel with tj−1. It consists of the time during which tj is executed in parallel with tj−1 and the time during which tj is executed alone.
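The effect of sharing a VM's processing speed can be illustrated with a small sketch. It assumes, purely for illustration, that the two tasks split the VM speed equally and that both start sharing at time zero; the function name and its parameters (remaining task lengths in MI, VM speed in MIPS) are hypothetical and not part of the paper's formulation.

```python
def shared_execution_times(len_ongoing_mi, len_new_mi, vm_mips):
    """Return (t_parallel, t_alone, finish_ongoing, finish_new) in seconds.

    t_parallel is the time during which both tasks run together at half
    speed; t_alone is the time the longer task then runs at full speed.
    """
    shared_speed = vm_mips / 2.0  # equal-sharing assumption
    shorter = min(len_ongoing_mi, len_new_mi)
    longer = max(len_ongoing_mi, len_new_mi)

    # Both tasks progress at the shared speed until the shorter one ends.
    t_parallel = shorter / shared_speed
    # The remaining work of the longer task then runs at full speed.
    t_alone = (longer - shorter) / vm_mips

    finish_short = t_parallel
    finish_long = t_parallel + t_alone
    if len_ongoing_mi <= len_new_mi:
        return t_parallel, t_alone, finish_short, finish_long
    return t_parallel, t_alone, finish_long, finish_short
```

With an ongoing task of 6000 MI, a new task of 12000 MI, and a 1000 MIPS VM, the ongoing task needs 12 seconds instead of the 6 seconds it would need alone, which is the kind of increase the energy function has to account for.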

Let us consider a set of tasks, T = {t1, t2, …, tm}, that needs to be scheduled on a set of virtual machines, V. The scheduling of the tasks on the VMs is represented using a matrix S of size m × v. For instance, Sji = 1 indicates that the task tj is scheduled on VMi for execution. The bi-objective optimization problem is to schedule tasks in a cloud computing environment such that the execution time and the energy consumption are minimized. These objectives are represented using a weighted sum cost function as stated in Equation 7.

where a weight between 0 and 1 is assigned to the execution time objective, and one minus that weight is assigned to the energy consumption objective.
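A weighted sum of this kind can be sketched in a few lines. The function name `weighted_cost` and the parameter name `weight` are illustrative choices, and the sketch assumes the two objective values are already expressed on comparable scales; any normalization used in the paper is not reproduced here.

```python
def weighted_cost(exec_time, energy, weight):
    """Weighted-sum cost of two objectives.

    weight applies to the execution time and (1 - weight) to the energy
    consumption, with weight restricted to the interval [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * exec_time + (1.0 - weight) * energy
```

Setting the weight to 1 optimizes execution time only, setting it to 0 optimizes energy only, and intermediate values trade one objective against the other.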

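The bi-objective tasks scheduling optimization problem can now be formulated as follows:

Objective:

\min_{S} \; \mathrm{Cost}(S) \qquad (8)

Constraints:

\sum_{i=1}^{v} S_{ji} = 1, \quad \forall j \in \{1, 2, 3, \ldots, m\} \qquad (9)

U_i < U_{th}, \quad \forall i \in \{1, 2, 3, \ldots, v\} \qquad (10)

where Cost(S) is the cost function of Equation 7 for the schedule S, U_i is the total utilization of VMi, and U_th is the threshold utilization. Equation 8 shows the optimization objective, i.e., minimizing the cost function. Equations 9 and 10 represent the constraints. Equation 9 states that each task should be executed on only one VM, and Equation 10 indicates that the total utilization of a VM should always be less than a threshold utilization to avoid performance degradation.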
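The two constraints can be checked directly on a schedule matrix. The sketch below is an illustration under simplifying assumptions: each task contributes a single scalar utilization value to its VM (the paper considers CPU, memory, disk, and network separately), and the names `is_feasible`, `task_util`, and `threshold` are hypothetical.

```python
def is_feasible(S, task_util, threshold):
    """Check a schedule matrix against the two constraints described above.

    S[j][i] is 1 if task j is scheduled on VM i and 0 otherwise.
    task_util[j] is the utilization that task j adds to its VM.
    """
    num_vms = len(S[0])

    # Each task must be placed on exactly one VM.
    for row in S:
        if any(x not in (0, 1) for x in row) or sum(row) != 1:
            return False

    # The total utilization of every VM must stay below the threshold.
    for i in range(num_vms):
        total = sum(task_util[j] for j, row in enumerate(S) if row[i] == 1)
        if total >= threshold:
            return False
    return True
```

A genetic algorithm can use such a check to discard or repair infeasible mappings before their fitness is compared.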

The task scheduling optimization problem in a cloud computing environment can be designed as an autonomous agent system in which the agent schedules the tasks on the VMs to minimize the objective function. The task analyzer, the VM manager, the power consumption monitor, and the resource utilization monitor components in the cloud broker represent the sensors of the system environment, and the mapping of tasks to the VMs represents the actuator output. The agent's system environment for task scheduling is fully observable, stochastic, sequential, dynamic, discrete, and single-agent. The intelligent autonomous agent for task scheduling can be classified as a utility-based agent [18], as shown in Figure 2. This is because the tasks' scheduling problem involves conflicting optimization objectives with a trade-off between energy consumption and execution time.

The evolutionary genetic algorithm [13] is a search-based heuristic. Its main components are as follows:
• Initial tasks-VMs mapping (population): The initial population in cloud task scheduling is a set of mappings of tasks to VMs. Each solution in the population is represented as a chromosome, i.e., one mapping of tasks to VMs.
• Fitness function: The fitness function for the problem under study is the inverse of the cost function for task scheduling (Equation 7), which is minimized with respect to the energy consumption and the execution time.
• Crossover: The crossover operation selects two parent chromosomes from the population and creates a new mapping by alternating some or all of the genes of the parents. Each element of a chromosome is known as a gene.
• Mutation: The mutation operator produces offspring by tweaking the genes of a single chromosome.
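The crossover and mutation operators described above can be sketched on a chromosome encoded as a list in which position j holds the index of the VM assigned to task j. This encoding, the choice of one-point crossover, and the random-reset mutation are illustrative assumptions; the paper does not specify which operator variants it uses.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Build a child from the head of parent_a and the tail of parent_b."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must map the same number of tasks")
    if len(parent_a) < 2:
        return list(parent_a)
    cut = rng.randrange(1, len(parent_a))  # cut point strictly inside
    return list(parent_a[:cut]) + list(parent_b[cut:])


def mutate(chromosome, num_vms, rate, rng):
    """Reassign each gene to a random VM with probability `rate`."""
    return [rng.randrange(num_vms) if rng.random() < rate else gene
            for gene in chromosome]
```

A usage example: with `rng = random.Random(0)`, calling `mutate(one_point_crossover([0, 0, 0, 0], [1, 1, 1, 1], rng), num_vms=2, rate=0.1, rng=rng)` yields a new task-to-VM mapping of the same length. Because every gene names exactly one VM, this encoding always places each task on a single VM.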

In this paper, we use an energy-efficient task scheduling algorithm, Modified Worst Fit Decreasing (MWFD), for the selection of the initial population. MWFD is chosen because it performs better than other energy-aware task scheduling algorithms [19]. This reduces the time to obtain a global solution. In MWFD, each task is assigned to a VM where the increase in power consumption after scheduling the task is the maximum. Algorithm 1 shows the pseudocode for population initialization. Algorithm 2 shows the pseudocode for bi-objective optimization using the evolutionary algorithm.

Algorithm 1: Population initialization using MWFD
Input: TaskList, VMList
Output: allocation of Tasks
foreach Task in TaskList do
    maxPower = MIN
    allocatedVM = null
    foreach VM in VMList do
        if VM has enough resources for Task then
            powerAfterAllocation = Calculate power using Equation 7
            powerDiff = VM.getPower() - powerAfterAllocation
            if powerDiff > maxPower then
                allocatedVM = VM
                maxPower = powerDiff
    if allocatedVM ≠ null then
        allocation.add(Task, allocatedVM)
Return allocation

Algorithm 2: Performance and energy-aware bi-objective tasks' scheduling using evolutionary algorithm
Input: TaskList, VMList
Output: Scheduling of Tasks
Generate the initial tasks and VMs mapping using Algorithm 1
while (non-termination condition) do
    SelectFitTasksVMsMapping  //select initial tasks-VMs mapping
    Perform_crossover_NewTasksVMsMapping  //create new scheduled mapping
    Perform_mutation
    foreach newMapping do  //check each new scheduled mapping
        if Fitness.newMapping < Fitness.Mapping then  //check the fitness value of the new mapping
            add.Mapping(newMapping)  //if the new mapping is more efficient than the previous one, add it to the list
    replace.currentMapping(feasiblenewMapping)
Return allocation  //return the new tasks and VMs mapping for the scheduled tasks
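The greedy seeding step of Algorithm 1 can be sketched as follows. This is an illustration only: it uses a hypothetical per-VM linear power function (idle power plus a slope times the load), a single scalar demand per task, and it follows the comparison as written in the pseudocode, keeping the VM with the largest value of current power minus power after allocation. It is not claimed to reproduce MWFD as defined in [19].

```python
def initial_mapping(task_demands, idle_watts, slope_watts, capacity=1.0):
    """Greedy seeding sketch: return {task_index: vm_index}.

    Tasks that fit on no VM are left out of the returned allocation.
    idle_watts[i] and slope_watts[i] define a hypothetical linear power
    function for VM i; they are not values from the paper.
    """
    load = [0.0] * len(idle_watts)
    allocation = {}
    for task, demand in enumerate(task_demands):
        best_vm, best_diff = None, float("-inf")
        for vm in range(len(load)):
            if load[vm] + demand > capacity:
                continue  # this VM lacks resources for the task
            power_now = idle_watts[vm] + slope_watts[vm] * load[vm]
            power_after = idle_watts[vm] + slope_watts[vm] * (load[vm] + demand)
            diff = power_now - power_after  # same sign convention as powerDiff
            if diff > best_diff:
                best_vm, best_diff = vm, diff
        if best_vm is not None:
            load[best_vm] += demand
            allocation[task] = best_vm
    return allocation
```

The returned mapping can serve as one chromosome of the initial population; a full initial population would need additional, varied mappings, which this sketch does not generate.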

To evaluate the performance of the proposed model in a cloud computing environment with a large number of hosts and VMs, we simulate a cloud data center using the CloudSim 3.0.3 simulation software [15]. We create a heterogeneous data center by using six different host types (Table 1) and five different VM types (Table 2). Servers 1 and 2 from the host types are part of our Intelligent Distributed Computing and Systems (INDUCE) research laboratory at the College of Information Technology of the United Arab Emirates University. The specifications of servers 3-6 are taken from the SPEC Power benchmark suite [20] such that they belong to the same family as the ones present in the laboratory, but with different architectures and capabilities.

Table 1 (host types), surviving entry: Acer Incorporated Acer AR585 F1 with a 12-core AMD Opteron 6238 CPU at 2.60 GHz, 16 MB L3 cache, 4 GB 2Rx8 PC3L-10600E memory, and 1 x 500 GB SATA2 7200 RPM 3.5" HDD disk drive [24]

To evaluate the performance of our proposed evolutionary algorithm scheduler using LC-MLR (GA_LC-MLR), we first simulate the data center with an appropriate number of hosts and VMs. We create 800 hosts with the host types equally distributed. We then create the VMs with each of the four VM types equally distributed. We generate tasks with random lengths between 6000 and 12500 MI at an interval of 3 seconds each. For the CPU utilization of the tasks, we use real-life workload traces from the CoMon project, a monitoring infrastructure for PlanetLab [25]. The workload consists of the CPU utilization values collected from more than 1000 VMs on servers located at 500 different places across the globe. We use the data of 3 March 2011 for the experiments. For the memory, disk, and network utilization values, we randomly generate memory sizes, read/write bytes per second, and data transferred per second, respectively. We compare the performance of GA_LC-MLR with the genetic algorithm in the literature that uses a linear power model based on CPU and memory utilization (GA_LM) [8]. We measure the total energy consumption for the tasks' execution and the mean execution time.

Figure 3 shows the energy consumption of GA_LC-MLR and GA_LM. It shows that GA_LC-MLR consumes less energy than GA_LM. There are two reasons for this. First, the proposed algorithm considers the impact of ongoing tasks running on a server while scheduling an incoming task on that server. Consequently, the algorithm selects a server where the increase in the energy consumption is the minimum, considering the execution of the tasks when running alone and when running in parallel with the task to place. Second, the energy consumption in the proposed algorithm considers the CPU, memory, disk, and network resource utilization values, whereas the GA_LM algorithm considers only the CPU and memory utilization values. Consequently, the proposed model predicts the power consumption of a server while scheduling a task more accurately than the GA_LM algorithm, leading to higher energy savings.

Figure 4 shows the comparison of mean execution time. The mean execution time of GA_LC-MLR is 0.5 seconds lower than that of GA_LM.

Cloud computing is an emerging technology, enabling companies to consume computing resources as a utility rather than building and maintaining an in-house infrastructure. Due to the development of smart cities and the worldwide pandemic, the use of cloud computing is increasing. This extensive utilization of cloud computing resources leads to high energy consumption in the underlying data centers. In addition to electricity consumption, the environmental threats become considerable. In this work, we proposed an intelligent autonomous agent scheduler that schedules a user's task such that the energy consumption and execution time are minimized. The developed bi-objective optimization model is based on an evolutionary algorithm. We evaluated the performance of the model in terms of energy consumption and execution time.

The NIST Definition of Cloud Computing Recommendations of the National Institute of Standards and Technology

Cloud data center traffic

Energy consumption and emission mitigation prediction based on data center traffic and PUE for global data centers

Mastering Cloud Computing: Foundations and Applications Programming

'Tsunami of data' could consume one fifth of global electricity by 2025

America's Data Centers Consuming and Wasting Growing Amounts of Energy | NRDC

Energy-efficient multi-job scheduling model for cloud computing and its genetic algorithm

A new multi-objective optimal programming model for task scheduling using genetic gray wolf optimization in cloud computing

CLPS-GA: A case library and Pareto solution-based hybrid genetic algorithm for energy-aware cloud service scheduling

Multi-objective task scheduling to minimize energy consumption and makespan of cloud computing using NSGA-II

Multi-objective genetic algorithm for task assignment on heterogeneous nodes

A hybrid energy-aware resource allocation approach in cloud manufacturing environment

A genetic algorithm tutorial

Linear Power Modeling for Cloud Data Centers: Taxonomy, Locally Corrected Linear Regression, Simulation Framework and Evaluation

CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms

Computing Server Power Modeling in a Data Center

EATSVM: Energy-Aware Task Scheduling on Cloud Virtual Machines

Artificial intelligence: a modern approach

Energy-Aware VM Placement and Task Scheduling in Cloud-IoT Computing: Classification and Performance Evaluation

Server 3: SPECpower_ssj2008

Server 4: SPECpower_ssj2008

Server 5: SPECpower_ssj2008

Server 6: SPECpower_ssj2008

PlanetLab Workload Traces

This research was funded by the National Water and Energy Center of the United Arab Emirates University (Grant 31R215).