key: cord-0060320-y826hpue authors: Gand, Fabian; Fronza, Ilenia; El Ioini, Nabil; Barzegar, Hamid R.; Azimi, Shelernaz; Pahl, Claus title: Fuzzy Container Orchestration for Self-adaptive Edge Architectures date: 2021-03-23 journal: Cloud Computing and Services Science DOI: 10.1007/978-3-030-72369-9_9 sha: 9a8d23149ab425792b399268fc1ec9c3bf72d0e0 doc_id: 60320 cord_uid: y826hpue

The edge, or edge computing, refers to the computational infrastructure between the sensor and Internet-of-Things world on the one side and centralised cloud data centres on the other. Edge clusters may consist of small single-board devices, which are widely used in different applications, including microcontrollers regulating an industrial process or controllers monitoring and managing roadside traffic. Although the hardware capabilities of such devices are generally growing, resources at the edge are often still limited, requiring intelligent resource management. This can be done through self-adaptive scaling mechanisms that allow components of the application in the cluster to be scaled. We introduce here an auto-scalable cluster architecture for lightweight container-based edge devices, with a serverless architecture at its core. The auto-scaling component is based on fuzzy logic in order to better manage the uncertainty problems arising in such contexts. Our evaluations show that the platform architecture and the auto-scaling functionality satisfy the need for lightweight edge architectures.

Today, many processing tasks in distributed environments are executed directly on nodes at the edge of a network rather than being sent to central, but remote, processing hubs. This principle is often called Edge Computing [52]. The management of such edge systems often requires different concepts for scaling system components dynamically in order to deal with changing resource requirements in a constrained environment.
Proposed solutions range from simple algorithms that define scale values based on set thresholds [76] to systems using neural networks for decision making [42]. However, more work on auto-scaling algorithms in the edge context [18] is needed. Our objective is the evaluation of a serverless architecture managed by an auto-scaling component. The goal is to provide a lightweight edge computing solution, which extends our previous work in [18] by focusing more on the architecture implementation. At the core is a serverless approach that delegates the deployment, scaling and maintenance of software to the cloud/edge provider [6]. The experimental edge platform we use here is implemented as a cluster of eight single-board devices. Such small clusters have so far been evaluated only in basic settings. We aim here to evaluate a complete system based on real-life requirements and constraints, with dynamic scalability included, as would arise for instance in automotive and mobility applications. We deploy Raspberry Pis as single-board devices and evaluate under which conditions a cluster of these devices can support low-latency needs that are tightly constrained by given performance requirements. We use technologies and tools such as MQTT for inter-cluster communication, openFaas and Docker Swarm for the implementation of the serverless concept, and Prometheus for monitoring. This is combined with fuzzy logic at the core of the central auto-scaling component. We look at the performance of the system in a range of cases, aiming to detect bottlenecks, and evaluate the implemented auto-scaling algorithm experimentally. In the remainder of this paper, relevant concepts, tools and technologies are introduced in Sect. 2. Then, Sect. 3 starts with a high-level architecture, followed by details of the auto-scaling mechanism in Sect. 4. In Sect. 5, we evaluate the platform and the auto-scaler.
Then, related work on distributed edge systems and auto-scaling is reviewed, before concluding with a summary and ideas for future research.

Serverless Computing. Serverless computing is a new concept for the deployment of cloud applications that has seen an increase in popularity since it was first introduced a few years ago [6]. It allows developers to focus on the application without having to manage the deployment servers, which are handled by the cloud provider. This allows features such as fault tolerance and auto-scaling to be centrally managed [6]. Serverless computing is related to the Functions-as-a-Service (FaaS) approach, in which small chunks of functionality are deployed in the cloud and scaled dynamically by the cloud provider [40]. These functions are usually smaller than microservices, are short-lived and have clear input and output parameters. If the component to be deployed is more complex than a simple function and is supposed to stay active for a longer period of time, a stateless microservice is another option [12]. Managing and deploying such microservices is similar to serverless functions. In addition to the major cloud providers offering serverless functionality as part of their cloud solutions, several open-source frameworks have been developed and released in recent years. These solutions usually involve self-hosting the serverless framework on one's own hardware instead of relying on hardware provided by third parties. Here, we mainly focus on open-source, self-hosted solutions. Four open-source frameworks are compared in Table 1, which aids us in the selection of a suitable one.

Self-adaptive systems [2, 28, 29] adapt their behavior dynamically to either preserve or enhance required quality attributes in the presence of uncertain operating conditions. The development of microservice applications as self-adaptive systems is still a challenge [48, 62-64].
In practice, platforms such as the Kubernetes container orchestrator facilitate the deployment and management of microservice applications. However, Kubernetes natively supports only basic auto-scaling by automatically changing the number of service instances.

Fuzzy logic aims to bridge between machine and human reasoning. Computers traditionally work well for tasks that involve formal calculations. Human reasoning, however, is often more complex. Natural language is rarely precise in a way that quantifies something as one thing or the other: words can have uncertain, ambiguous meanings [21]. Thus, human reasoning is fuzzy. Uncertainty also arises in edge environments through incomplete or potentially incorrect or conflicting observations. Fuzzy logic addresses this by mapping inputs (e.g., observations) to outputs (e.g., analyses or reactions) based on gradually changing functions and a set of rules rather than fixed thresholds. A so-called membership function represents a fuzzy set and decides to which degree an item belongs to a certain set. Fuzzification is the process in which input values are mapped to membership functions to derive their degree of membership in a set [18, 46]. This differs from a binary approach, where an element is either part of a set or not. Fuzzy rules define how, after fuzzification, the values are matched against if-else rules. Defuzzification is the final step, in which a numerical output value is generated.

An example shall illustrate this. A sample goal might be to calculate the money that should be saved each month based on a flexible salary and the expected expenses. The amount of money to be put into a long-term savings account is returned by the fuzzy system. We illustrate the ingredients of a fuzzy system: - Membership functions - The variables salary, expected expenses and the money that is suggested to be saved are visualized in Figs. 1a, 1b and 2.
Each variable consists of three membership functions: low, medium and high. These represent, e.g., the degree to which a salary of 50 can be considered a low, average or high salary. - Rules - The rules of the fuzzy system can be defined as if-then statements over these sets, for instance: if the salary is high and the expected expenses are low, then the suggested savings are high.

The above abstract principles shall be complemented by the introduction of the concrete infrastructure and software technologies used in the platform implementation.

Raspberry Pi. A Raspberry Pi is a widely used single-board computer based on an ARM processor. Its original purpose was to introduce school children to programming [8, 73, 74]. Due to its low price and various use cases, it has found its way into more industrial IT projects. Since the start of the project in 2012 there have been four major iterations of the Raspberry Pi platform. The version 2 B models used in this cluster include a 900 MHz quad-core ARM Cortex-A7 CPU and 1 GB of RAM [60].

Docker and Docker Swarm. Docker is software for containerization. Containerization is a virtualization technology that, instead of virtualizing hardware, separates processes from each other by utilizing certain features of the Linux kernel. It has been successfully used on Raspberry Pis [65, 66]. Docker containers bundle an application along with all of its dependencies. Docker offers the ability to create, build and ship containers [11]. Compared to virtual machines, containers make far better use of the host resources while providing similar isolation advantages. Images are the blueprints of Docker containers: each container is created from an image. Images, in turn, are built from Dockerfiles, which describe the system to be constructed. Docker (specifically the Docker Engine) is based on a client-server architecture. The client communicates with the Docker daemon via a command-line interface. The Docker daemon is in charge of managing the components and containers.
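To make the fuzzy savings example above concrete, a minimal inference step can be sketched in pure Python. The triangular membership shapes, the rule combinations and the defuzzification anchors (10/50/90) are illustrative assumptions, not values from the actual system:

```python
def tri(x, a, b, c):
    """Triangular membership function rising on [a, b] and falling on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a) if b > a else 1.0
    return (c - x) / (c - b) if c > b else 1.0

# Illustrative low/medium/high sets on a 0..100 universe.
low    = lambda x: tri(x, -1, 0, 50)
medium = lambda x: tri(x, 25, 50, 75)
high   = lambda x: tri(x, 50, 100, 101)

def suggest_savings(salary, expenses):
    """Fuzzify the inputs, fire the rules, and defuzzify to one number."""
    # Rules (min = AND, max = OR):
    fire_high = min(high(salary), low(expenses))        # earn much, spend little
    fire_med  = max(medium(salary), medium(expenses))   # anything average
    fire_low  = min(low(salary), high(expenses))        # earn little, spend much
    # Defuzzification: membership-weighted average of representative
    # output points (low=10, medium=50, high=90).
    total = fire_low + fire_med + fire_high
    if total == 0:
        return 0.0
    return (10 * fire_low + 50 * fire_med + 90 * fire_high) / total
```

A salary of 80 with expenses of 20, for instance, fires only the "save much" rule and yields a suggestion near the high end of the scale.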
Docker services represent the actual application logic of a container in production. Using services, a distributed web application could be split into one service for the front-end components, one for the database and another for the content management system used to update the website. Docker Swarm is the cluster management tool integrated into the Docker Engine. Instead of running the given services and their corresponding containers on one host, they can be deployed on a cluster of nodes that is managed like a single, Docker-based system. By setting the desired number of replicas of a service, basic scaling is also possible.

Hypriot OS. Hypriot OS is an operating system based on Debian that is specifically tailored towards using Docker containerization technology on ARM devices such as the Raspberry Pi [22]. The OS comes prepackaged with the latest Docker version ready to be used. Hypriot OS also comes bundled with a few additional tools that we will use. Cloud-init - Cloud-init is a lightweight approach to creating templates of an operating system [9]. This can be used to ensure that two nodes are identical clones of each other. The instances are configured in .yml configuration files, which can be used, for example, to set the hostname of a node or to execute certain commands at boot time. Avahi - Avahi is a tool that simplifies networking by allowing nodes in a network to address each other by hostname without having to configure static IP addresses or rely on DHCP or DNS services [3]. If the hostname of the master node has been set in the cloud-init configuration file, all worker nodes are able to connect to the master node directly by addressing it by its hostname.

Ansible. Ansible is a tool for automating a variety of tedious tasks in cluster and cloud environments, such as individual node configuration or application deployment [1]. The nodes are usually connected via SSH, and Ansible usually uses SSH keys for authorization [1].
The individual nodes need to be defined in a host configuration file. After defining the nodes, Ansible can be used to execute instructions on all nodes synchronously. These instructions can either be raw commands for simple cases or "playbooks" that contain a set of instructions for more complex tasks.

MQTT. MQTT is a network protocol that is primarily used in unreliable networks with limited bandwidth, which makes it suitable for Internet of Things applications. MQTT uses a publisher-subscriber approach. Clients establish a connection to a broker and subscribe to topics. Clients may also publish a message to a topic. When a message is published, the broker relays the message to all clients that are subscribed to the corresponding topic. If a message is flagged as a retained message, it is kept by the broker after relaying it; clients receive the retained message as soon as they subscribe to the topic.

Prometheus. Prometheus is a monitoring tool used to gather and process application metrics. In contrast to other monitoring tools, it does not rely on the application delivering the metrics to the monitoring tool. Instead, Prometheus "scrapes" the metrics from a predetermined interface at a given interval. This means that the metrics are expected to be exposed by the application.

OpenFaas. As a Functions-as-a-Service (FaaS) framework for containers, openFaas can be deployed on top of a Docker Swarm or a Kubernetes cluster. When the openFaas framework is started, a number of standard Docker containers are deployed:
- Gateway: used as the central gateway for calling functions from anywhere in the cluster. It exposes a web interface for managing functions.
- Prometheus: a simple Prometheus instance runs in this container. In addition, the Prometheus web interface is exposed.
- Alertmanager: reads Prometheus metrics and issues alerts to the gateway.
openFaas does provide a simple form of auto-scaling [50] that leverages the default metrics aggregated by Prometheus and scales based on given thresholds.
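The publish/subscribe and retained-message semantics described above can be illustrated with a toy in-memory broker model. This is a conceptual sketch only, not a real MQTT implementation such as the broker used in the platform:

```python
class ToyBroker:
    """In-memory model of MQTT-style publish/subscribe with retained messages."""

    def __init__(self):
        self.subscribers = {}  # topic -> list of callbacks
        self.retained = {}     # topic -> last retained payload

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)
        # A new subscriber immediately receives the retained message, if any.
        if topic in self.retained:
            callback(topic, self.retained[topic])

    def publish(self, topic, payload, retain=False):
        # A retained message is kept by the broker after relaying it.
        if retain:
            self.retained[topic] = payload
        for cb in self.subscribers.get(topic, []):
            cb(topic, payload)
```

A client that subscribes after a retained status message was published still receives that last status, which is why retained messages suit slowly changing state such as configuration or the latest sensor reading.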
Functions and serverless microservices can be written in a variety of programming languages. Functions are usually created using the faas-cli command line tool. An example of creating a function in Python is:

faas-cli new --prefix=examples --lang python3-armhf example-function

This command creates the following directory structure:

example-function
    example-function
        handler.py
        requirements.txt
    example-function.yml

The .yml configuration file contains information such as the IP address of the gateway or the name of the Docker image. The programming logic can be found in the handler.py file. It needs to contain a handle function that receives a request parameter and returns a response after executing its logic. The requirements.txt file contains the Python pip dependencies needed for the project; the corresponding pip install command is executed automatically once the function is built. The --lang parameter determines the language and architecture template used for the function. In the given case, a Python function running on an ARM architecture is created. The --prefix parameter is used as a prefix for the resulting Docker image that is uploaded to the Docker Hub, which is used for distributing the images across the cluster. Stateless microservices can be built by using the Dockerfile template. In this case, a Dockerfile has to be set up manually to create a container for the service to run in.

Among the serverless frameworks compared in Table 1, we selected openFaas for the implementation of the application. The reasons are its wide array of supported languages, openFaas being a complete, all-in-one framework, Prometheus as an integrated, extendable monitoring solution, the simple set-up process, and its out-of-the-box support for Docker Swarm.
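A minimal handler.py for the scaffold above might look as follows; the greeting logic is a placeholder, the only contract being the handle function that takes the request body and returns the response:

```python
# handler.py - body of an openFaas Python function
def handle(req):
    """Handle a request: `req` is the raw request body as a string;
    the return value becomes the response body."""
    name = req.strip() or "world"
    return "Hello, {}!".format(name)
```

After `faas-cli build` and `faas-cli deploy`, calling the function through the gateway invokes handle with the request body.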
Other frameworks either lacked a simple monitoring solution (serverless), included more custom components that needed to be configured manually (openWhisk) or did not support Docker Swarm (Kubeless).

The architectural approach is to decompose the application into microservices and containerize them, utilizing the above technologies and allowing the hardware to be reallocated dynamically. OpenFaas allows building and deploying services in the form of functions across a cluster. Some of its features, such as the built-in Prometheus services or the gateway, can be extended to make openFaas the central building block [19]. An overview of the different technologies in the context of the given platform is presented in Fig. 3. OpenFaas also enables scaling the different parts of the application. Even though openFaas scaling is limited, we use it as a foundation for our self-adaptation and implement a more fine-grained scaling algorithm using the built-in monitoring options. As the reasoning foundation for scaling, we use fuzzy logic.

We introduce here the core elements of our architecture before covering low-level implementation details of the platform and its auto-scaling component in the next section. The proposed building blocks of the application, such as the serverless and microservices-based architecture, can be reused for different applications in different contexts. The scaling component is also usable in other applications by reconfiguring a few parameters. We use a Traffic Management (TM) System as a sample application. We assume a constant exchange of messages between the traffic management and vehicle components, which in our prototype implementation contains simulations of vehicles. A control system is used to scale the TM System based on the monitored and analysed data. While we investigate a specific case, the approach is transferable.

Fig. 4. Interaction between different systems [18].

We build on a three-layered architecture.
The platform layer represents the hardware architecture of the cluster. The system layer comprises the central management components. On top of these, the controller layer scales the components of the platform. Figure 4 shows the interaction between the system and controller layers and additional components. The application is deployed on a cluster managed by Docker Swarm. The cluster includes one master node and an arbitrary number of worker nodes. Ansible is used to execute commands on all nodes without having to connect to each node individually. All nodes are able to connect to the MQTT broker that runs on the master device after startup. Using Docker Swarm and openFaas, the RPIs can be connected so that they can be seen as one system. If a service is to be deployed, openFaas distributes it among the available nodes. There is no need to specify a particular node, as this abstraction layer is hidden behind the openFaas framework. The services and functions are built and deployed using the openFaas command line interface. OpenFaas is also utilized to scale the services independently. Communication between the services is achieved by relying on the openFaas gateway as well as on the MQTT broker. These elements of the platform layer are shown in Fig. 5.

Cluster Setup. Our cluster architecture comprises eight Raspberry Pi 2 Model B devices connected to a mobile switch via 10/100 Mbit/s Ethernet, which powers the RPIs via PoE (Power over Ethernet). We have documented our solution on GitHub. The system components are split into three repositories. The rpicluster repository 1 contains the cloud-init configuration files for setting up the Raspberry Pis. The rpicluster-application repository 2 includes all the microservices and scripts that make up the application logic. The third repository 3 includes the openFaas repository in a modified form. All nodes of the cluster run Hypriot OS. Cloud-init is used to define the initial boot steps of the nodes.
The master initiates the Docker swarm. The only command that needs to be executed on the workers is the swarm join command, which can be distributed among the nodes using Ansible. After this command is executed on each node, the swarm is fully set up. The worker nodes contain almost no additional dependencies, since these are all included in the Docker containers. The only dependencies installed directly on the nodes are used to run a Python script that monitors system metrics and publishes them to the metrics service. The complete steps to set up the system on the cluster are as follows: 1. Flash one master configuration of Hypriot to an SD card and insert it into the master RPI. The system can now be started from within the rpicluster-application folder by executing the control.py script, which uses openFaas to start all application components and enable auto-scaling. Monitoring is done through the openFaas Prometheus instance, provided as one of the predefined containers.

Metrics Service. The metrics service acquires metrics about the system, mainly by serving as a central hub that accumulates all cluster-wide metrics and publishes them via a Flask HTTP endpoint. This endpoint is the central interface for Prometheus to scrape from. The Prometheus Python API is used to implement the metrics. The metrics service exposes the number of messages (e.g., the number of active cars in our case) as well as the cumulative memory and CPU usage. The number of messages is implemented as a counter, which is continuously increasing. The CPU/memory usage, on the other hand, is realized as a gauge, which can be set to an arbitrary value.

OpenFaas: Prometheus. This instance is used to store metrics and query them when needed. Prometheus provides a REST API along with a language called PromQL to aggregate and query metrics [59]. The aggregated data is returned in JSON format.
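The counter/gauge distinction used by the metrics service, and the text format that Prometheus scrapes, can be sketched by hand. The actual implementation uses the prometheus_client Python API; the classes and metric names below are illustrative stand-ins:

```python
class Counter:
    """Monotonically increasing metric, e.g. total messages processed."""
    def __init__(self, name, help_text):
        self.name, self.help, self.value = name, help_text, 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount

class Gauge:
    """Metric that can be set to an arbitrary value, e.g. CPU usage."""
    def __init__(self, name, help_text):
        self.name, self.help, self.value = name, help_text, 0.0

    def set(self, value):
        self.value = value

def render(metrics):
    """Render metrics in the Prometheus text exposition format, i.e. what
    a scrape of the /appmetrics endpoint would return."""
    lines = []
    for m in metrics:
        kind = "counter" if isinstance(m, Counter) else "gauge"
        lines.append("# HELP {} {}".format(m.name, m.help))
        lines.append("# TYPE {} {}".format(m.name, kind))
        lines.append("{} {}".format(m.name, m.value))
    return "\n".join(lines) + "\n"
```

Serving the render() output over HTTP is all Prometheus needs in order to scrape these values at its configured interval.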
Before startup, Prometheus needs to be informed about the endpoints metrics should be collected from. The Prometheus instance shipped with openFaas only scrapes metrics from the openFaas gateway, since only metrics related to function execution are monitored by default. Configuring the openFaas Prometheus instance to aggregate custom application metrics is not documented; exploring this possibility and implementing it was part of the scope of this work. In order to add a second endpoint for the additional metrics, the openFaas repository had to be forked and the metrics endpoint added to the prometheus/prometheus.yml configuration file. The metrics microservice is accessible via the gateway. Therefore, it is possible to address the metrics endpoint by calling the gateway, and there is no need to specify the static IP address of the node the metrics service is running on. Specifying the metrics path (/appmetrics) as well as the port (8080) is also mandatory.

The central component in our architecture that manages performance is the auto-scaling controller. A key requirement for it is to be lightweight: consuming too many system resources, such as storage space or CPU, has to be avoided, because the algorithm is meant to be deployed on the RPI cluster itself, with most of the cluster's resources reserved for the application components themselves. At the core is fuzzy logic, which offers a good compromise between a powerful decision-making process and limited resource consumption. The initial fuzzy knowledge base is, however, difficult to obtain. The following solution combines reactive and proactive configuration methods by initially anticipating demand in the calibration and configuration (proactive) and then continuously re-adjusting it if needed (reactive). In order to set up an initial fuzzy knowledge base, values of previous runs of the system are used.
The reactive part of the algorithm continuously updates parts of the knowledge base dynamically. The auto-scaling algorithm builds on the MAPE-K [38] controller loop for self-adaptive systems. The steps of this MAPE-K controller loop are: Monitor the application and collect metrics, Analyze the gathered data, Plan actions accordingly in order to maintain objectives, and Execute the planned actions. The Knowledge component defines a shared, continuously updated knowledge base.

Scaling Configuration and Calibration: The scaling algorithm, based on the four main phases of the MAPE-K loop, is implemented in a Python script (control.py). The script is run independently on a single cluster node. The algorithm starts by building the fuzzy membership functions, essentially calibrating them based on existing experience. The values used for constructing the functions are part of the MAPE-K knowledge base and are calculated from metrics of previous runs of the system. Therefore, before the scaling algorithm can be started effectively, the system needs to have been run at least once to determine behaviour that can be anticipated. Based on the initial membership functions, a first global scale value is computed, according to which the system is scaled. This forms the proactive part of the controller, which provides settings for future runs based on anticipated load and performance. The MAPE-K loop for the given system is presented in Fig. 6.

Continuous Scaling: The reactive part of the algorithm is executed as the default after start. It aims to adjust the current settings to the specified requirements. The script receives current performance metrics from the Prometheus API. These metrics are evaluated against the allowed threshold values stored in the knowledge component, which is implemented within the control script. Based on these computations, a plan is devised that involves updating the membership functions.
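Abstractly, one iteration of the reactive loop follows the four MAPE-K phases over the shared knowledge base; a minimal skeleton might look as follows (the callable parameters are placeholders, not the actual control.py interfaces):

```python
def mape_k_step(monitor, analyze, plan, execute, knowledge):
    """One iteration of a MAPE-K loop over a shared knowledge base."""
    metrics = monitor(knowledge)            # Monitor: collect current metrics
    symptoms = analyze(metrics, knowledge)  # Analyze: compare against thresholds
    actions = plan(symptoms, knowledge)     # Plan: e.g. adjust membership functions
    execute(actions, knowledge)             # Execute: scale the services
    return knowledge
```

Running this step in an infinite loop, with the knowledge base carrying the SLO thresholds and current membership-function definitions, yields the continuous scaling behaviour described below.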
The goal is to continuously update the membership functions such that the fuzzy service provides optimal scale values for all load scenarios. Optimal here means a system that is scaled to meet the SLO without wasting resources, i.e., the ultimate goal is to scale up only as much as necessary. The membership functions are passed to the fuzzy service, which calculates a scale value. After scaling, the loop repeats by monitoring and analyzing the effects of the previous iteration. The definitions of the membership functions are also part of the knowledge component and are continuously updated (Fig. 7).

The task of the fuzzy service is to determine the scale value. Its rules are predefined, and it uses the membership functions as input. Another parameter is the range of the scale values, which defines the minimum and maximum scale values that are considered acceptable. The service returns a global scale value as output. The calc function is used to simulate a function that is called continuously. It is always scaled to the maximum. If only a few messages are processed by the system, the calc function is allowed to scale higher; when the number rises, the allowed number of replicas decreases.

Fig. 7. Details of services providing scaling functionality [18]. (a) Initial membership functions. (b) Re-adjusted membership functions: the system can handle fewer messages than expected.

The minimal Alpine distribution, which is used for the other images, could not be used in this case, since skfuzzy, the Python fuzzy-logic module used here, relies on the numpy module, which requires a more complete base system. Therefore, the image of the fuzzy service is based on ubuntu:18.04. The metrics gathered during earlier system runs are used as the basis for calibrating the fuzzy membership functions in order to better manage the anticipated demand.
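The mapping performed by the fuzzy service can be sketched as follows. The Gaussian membership functions mirror the construction used for the message rate, but the rule base ("low load keeps the minimum scale, average load the middle, high load the maximum") and the weighted-average defuzzification are simplifying assumptions about the actual skfuzzy-based implementation:

```python
from math import exp

def gaussian(x, mean, sigma):
    """Gaussian membership value of x for a set centred at `mean`."""
    return exp(-((x - mean) ** 2) / (2 * sigma ** 2))

def scale_value(load, mfs, scale_min, scale_max):
    """Map a load measurement to a rounded global scale value.

    `mfs` maps the set names low/average/high to (mean, sigma) pairs.
    Each rule ties one input set to one point in the allowed scale
    range; defuzzification is a membership-weighted average."""
    mid = (scale_min + scale_max) / 2
    targets = {"low": scale_min, "average": mid, "high": scale_max}
    weights = {name: gaussian(load, *mfs[name]) for name in targets}
    total = sum(weights.values())
    if total == 0:
        return scale_min
    return round(sum(targets[n] * weights[n] for n in targets) / total)
```

A load near the mean of the "high" set is thus mapped close to the configured maximum scale value, and intermediate loads to intermediate replica counts.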
The metric used here to measure the current load of the system is the number of exchanged messages in a given time frame. The underlying metric, messages_total, only reports the total number of messages the system has had to process since monitoring started. This requires a way to measure how many messages are processed in a given time frame, which is done using the Prometheus rate() function: it returns the per-second rate of increase. The query used throughout the application returns the per-second rate of increase measured over the past 20 s. The increase rate r is then calculated as

r = (x_i - x_{i-t}) / t

where x_{i-t} is the number of messages t seconds in the past and x_i is the most recent number of messages. This value is divided by t, the time frame considered, since r is the per-second value. Based on the rate, the Prometheus query language is used to obtain three values: the global median, the global standard deviation and the global maximum. Global in this context refers to metrics calculated from previous runs of the system. These values are used to create the initial membership functions of the rate of messages. In sk-fuzzy, Gaussian membership functions are created by defining their mean and their standard deviation; they are simple to create and re-adjust. All membership functions use the previously obtained global standard deviation σ_g as their own standard deviation. The mean of the average membership function is placed at the global median:

μ_avg = median(a)

with a being a set of data, in this case the rates of messages. The mean of the low membership function is

μ_low = μ_avg - σ_g

with σ_g again being the global standard deviation. The mean of the high membership function is thus

μ_high = μ_avg + σ_g

The initial membership functions are shown in Fig. 8a. The next step is the calculation of the three global variables.
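The two steps above, deriving per-second rates from the monotone message counter and placing the three Gaussian membership functions around the global median, can be sketched as follows; placing the low/high means exactly one σ_g below and above the median follows the construction described here:

```python
import statistics

def rate(x_now, x_past, t):
    """Per-second rate of increase over the past t seconds,
    mirroring r = (x_i - x_{i-t}) / t."""
    return (x_now - x_past) / t

def initial_membership_params(rates):
    """Derive the initial Gaussian membership functions from historic
    message rates: every function uses the global (population) standard
    deviation, and the low/average/high means sit at median - sigma,
    median, and median + sigma respectively."""
    median = statistics.median(rates)
    sigma = statistics.pstdev(rates)
    return {"low": (median - sigma, sigma),
            "average": (median, sigma),
            "high": (median + sigma, sigma)}
```

The (mean, sigma) pairs returned here are exactly the parameters sk-fuzzy needs to instantiate Gaussian membership functions.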
As an example, a query can provide the median value over all rates of messages (i.e., the rates covering a time span of 20 s each) of the last 90 days. While Prometheus does not support a median function, the built-in quantile function can be used to calculate the median: a quantile splits a probability distribution into groups according to a given threshold, and the 0.5 quantile equals the median.

The standard deviation over time is calculated using the corresponding PromQL function. This call returns the standard deviation of the rates of messages (covering a time span of 20 s each); the total time span considered is the last 90 days. The standard deviation for the given context is

σ_g = sqrt( (1/N) Σ_i (x_i - μ)² )

with N being the number of rates of messages considered, x_i a single rate of messages, and μ the mean of all rates of messages. Finally, only the global maximum of the data set remains to be calculated, again using the corresponding PromQL function.

Auto-scaling is based on the initial membership functions described previously, which are re-adjusted continuously. This reactive part is implemented as an infinite loop. After the initial configuration and calibration part is concluded, the initial membership functions are passed to the fuzzy service, which returns a scale value according to which the relevant services are scaled. The new scale value s that a service is scaled to is derived from D, the default value the service is scaled to at startup, and G, the rounded global scale value that has been calculated by the fuzzy service and is used for all dynamically scaled components. The scalable components include the gatherer service and the decision function that were introduced earlier. The Message Roundtrip Time (MRT) is calculated after a scaling action. For the sample Traffic Management use case, this metric indicates the average time a vehicle needs to wait for a response from the gatherer after publishing its latest status.
In the analysis phase, the MRT is compared to the maximum threshold defined in the SLA. If the average MRT is above the SLO, the membership functions need to be re-adjusted. The initial membership functions shown in Fig. 8a result in a scale value that is too low for the given load: the measured MRT after scaling was above the defined threshold. Therefore, the membership functions need to be shifted to the left. The result of this shift can be seen in Fig. 8b. If we assume the current load to be 0.5 and compare the degree of membership in both figures, we find that in the initial figure a load value of 0.5 is considered an "average" load, whereas with the re-adjusted functions a value of 0.5 is seen as more of a "high" load. Since the fuzzy service now classifies a value of 0.5 as a "higher" load, it also maps it to a higher scale value. Consequently, the system now scales up at a load value of 0.5, where it had previously taken no action. Similarly, we need to consider the situation in which the MRT is well below the defined threshold; in this case, the functions need to be shifted to the right. The rate at which the functions are adjusted (shifted) in each iteration of the loop can be controlled by (manually) updating the adjustment factor. Hence, the means of the membership functions are shifted by the adjustment factor A, which is computed in each iteration: a positive re-adjustment factor results in a shift to the right, a negative one in a shift to the left. To avoid constantly moving the functions back and forth, shifting right is only allowed until a situation is encountered where the SLO threshold is no longer met. After completing the shifting process, the control script starts the reactive part of the algorithm again, providing the fuzzy service with the re-adjusted membership functions. Algorithm 1 outlines the continuous reactive scaling functionality explained above.
if invocationTime < SLO then
    shift membership functions right
if invocationTime >= SLO then
    shift membership functions left
Scale(membershipFcts)

Performance is the critical property. Thus, the evaluation shall focus on the performance of the proposed serverless architecture with the auto-scaling mechanism at its core and determine crucial bottlenecks. The auto-scaling approach needs to be evaluated in terms of its effectiveness as the core solution component. Being effective means that the auto-scaling algorithm is able to maintain the set SLO thresholds by re-adjusting the fuzzy membership functions. This would result in a smooth scaling process where the scalable components are gradually scaled up or down, avoiding sudden leaps of the scale value. For the given application, we also determined the maximum load, i.e., here the maximum number of vehicles the architecture (including the network it is operated in) can support. The evaluation goals were analyzed for two different cluster set-ups. In order to obtain a first understanding of the system and the possible range of variables, a pilot calibration evaluation was conducted on a small cluster of three RPIs. As a second step, the evaluation procedure was repeated for a complete cluster (consisting of eight RPIs). For the evaluation, we report a range of performance metrics that indicate the effectiveness of the system or provide insight into an internal process. The function invocation time (FIT) is listed separately in order to individually report on serverless performance aspects. All MRT and FIT values are average values aggregated over the last 20 s after the previous scaling operation was completed. Here, the maximum scale value was unknown. In concrete scenarios, this value could be determined prior to execution. We use different MRT thresholds. In concrete settings, the maximum response time could be given in advance and would be unlikely to change.
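The reactive part outlined in Algorithm 1 can be sketched in Python as follows. This is a minimal sketch: `measure_mrt`, `shift_functions` and `scale` stand in for the monitoring, shifting and scaling steps described above, and the SLO and adjustment values are illustrative:

```python
SLO = 2.0          # maximum acceptable MRT in seconds (illustrative)
ADJUSTMENT = 0.05  # manually tuned adjustment factor A (illustrative)

def reactive_loop(measure_mrt, shift_functions, scale, iterations):
    """Reactive re-adjustment loop; the real system loops indefinitely."""
    for _ in range(iterations):
        mrt = measure_mrt()
        if mrt < SLO:
            shift_functions(+ADJUSTMENT)   # below SLO: shift right
        else:
            shift_functions(-ADJUSTMENT)   # SLO violated: shift left
        scale()                            # rescale with adjusted functions
```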
For all set-ups and iterations that were evaluated, the hardware workload was measured by computing the average CPU and memory usage over all nodes of the cluster. Some configurations were made to facilitate the evaluation. Prometheus is used to gather and aggregate the metrics for the evaluation. The cloud-init configuration was adjusted so that an additional Python script is executed on each node. The script uses the psutil Python module to record the CPU and memory usage of the node and publish them to a specific MQTT topic. Additional functionality was also added to the metrics service: the service receives the CPU/memory metrics of all nodes and stores them internally. When the metrics are collected, the service calculates the average CPU and memory usage across all nodes, which is then processed by Prometheus. Pilot Calibration. An initial evaluation (calibration pilot) was conducted to obtain a first idea of the system's capabilities and to adjust the manually tuned parameters accordingly. It was also used to evaluate whether the scaling functionality yields promising results before putting it to use in a bigger set-up. The evaluation was started with a cluster consisting of three RPIs: a master and two worker nodes. The maximum scale value was initially set to 5 in order to avoid scaling unreasonably high. Table 2 reports on the initial set of metrics for different numbers of vehicles. The scaling functionality was active when the metrics were monitored. Table 3 includes data of the scaling algorithm for an initial calibration run. Full Cluster. A cluster of eight RPIs was used. Here, the decision-making functionality is included in the gatherer service, which is now scaled independently. Hence, there is no longer a need to call the decision function for each message. Results for the auto-scaling algorithm can be found in Table 4. The CPU usage started at 47% and showed a linear increase up to about 56% at 14 vehicles.
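The node-level metrics collection described above can be sketched as follows. This is a minimal sketch: the topic name and node identifier are illustrative, psutil is treated as optional so the sketch degrades gracefully, and publishing via an MQTT client such as paho-mqtt is only indicated in a comment:

```python
import json
import time

try:
    import psutil  # third-party module used on the cluster nodes
except ImportError:
    psutil = None  # fall back to stub values if psutil is unavailable

METRICS_TOPIC = "cluster/metrics"  # illustrative MQTT topic name

def sample_metrics(node_id):
    """Record CPU and memory usage (in percent) for this node."""
    if psutil is not None:
        cpu = psutil.cpu_percent(interval=0.1)
        mem = psutil.virtual_memory().percent
    else:
        cpu, mem = 0.0, 0.0  # stub values without psutil
    return {"node": node_id, "cpu": cpu, "mem": mem, "ts": time.time()}

def to_payload(metrics):
    """Serialise a metrics sample for publishing to the MQTT topic."""
    return json.dumps(metrics)

# Publishing would use an MQTT client, e.g. with paho-mqtt:
#   client.publish(METRICS_TOPIC, to_payload(sample_metrics("rpi-1")))
```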
The hardware does not seem to be the limiting factor. The initial set-up, however, shows an MRT of about 1.6 s for only 2 vehicles. At 12 vehicles, a roundtrip time of 4.5 s is reached, and at 14 vehicles the MRT is already above 10 s, which is clearly too high for many real-world scenarios. However, based on these findings, an initial evaluation run of the auto-scaling algorithm was conducted with the roundtrip time threshold set to 2.0 s. Even though this is not realistic, it allows for testing the auto-scaling functionality. The experimental runs yielded promising results and showed that the algorithm is able to adaptively rescale the system: the MRT is initially below the predefined threshold at about 1.7 s. In the second iteration, the MRT value is recorded above the threshold at 2.5 s. The system reacts to this situation by slowly re-adjusting the membership functions, resulting in a higher fuzzy scale value, which consequently scales the system up until the measured MRT drops below the threshold again. We derived starting values for all variables using the calibration pilot on a smaller cluster set-up. The evaluation of the architecture indicated that serverless function calls should be restricted since they introduce network latency problems. The bottleneck here is the network. The given set-up was not able to process more than 75 user processes at a time. The CPU and memory usage numbers as well as the steady but slow increase of the MRT imply that the hardware itself is able to process a higher number of vehicles. Future extensions should explore different network set-ups that allow for a dependable solution overcoming the limits identified here. Our fuzzy auto-scaling solution works as intended, i.e., it is able to scale the system in a balanced manner. Apart from the SLO, which is usually given in the system specifications, only the maximum scale value and the adjustment factor need to be set manually.
We can conclude that the full version of our system architecture provides satisfactory results in terms of resource consumption and performance (MRT). Furthermore, we could demonstrate acceptable scalability of the system. The discussion of serverless technology in academia is still in its early stages, as noted by Baldini et al. [6]. This is in stark contrast to the attention it has received from industry and the software engineering community. Baldini et al. go on to introduce several commercial implementations such as Amazon's AWS Lambda or Microsoft Azure Functions. The major advantage mentioned is that the responsibility of managing servers and infrastructure no longer rests with the developer. Another advantage is that the provider may offer additional services such as logging or monitoring to be used on top of the client's applications. Kritikos et al. review several serverless frameworks and report on challenges faced when designing and implementing an application leveraging serverless technology [40]. They report on the need for new methods for creating an architecture for applications that contain both serverless components, such as functions, and "classic" components such as microservices running inside a Docker container [71]. They also note that the decision on how the application components should scale is largely left to the developer and suggest further research into automating this process. Lastly, they add that monitoring could be improved. Especially the option to use and aggregate custom metrics is noted as "missing in most of the frameworks" [40]. Kiss et al. gather requirements for applications making use of Edge Computing [39], specifically in the context of combining them with the capabilities of the 5G standard. They mention that recently released single-board devices open up the possibility of processing some of the data at the edge of the cluster.
This, however, goes along with the challenge of orchestrating the available processing power. The IoT system needs to be able to "reorganize itself" [39] based on changing conditions. In order to create such a system, they propose the following steps: In the discovery phase, the granular parts of the system are identified and the application is split up accordingly. Based on the observations, a deployment plan is created. This is followed by the execution phase, in which the parts of the application are deployed to their corresponding locations and the system is started. The system's performance is monitored at run-time and reconfiguration steps are taken if necessary. The cycle is concluded by the learning and predicting step. To make use of the data gathered at run-time, the system needs to be able to learn from it using AI in order to improve its reconfiguration steps. Different examples of comparable IoT systems have been documented. In order to be comparable to the solution proposed in this paper, the systems had to be comprised of different hardware nodes that needed to communicate with each other as well as with a central node. Tata et al. outline the state of the art as well as the challenges in modeling the application architecture of IoT edge applications [72]. One of the exemplary scenarios they introduce is a system for smart trains based on the principles of edge computing. The system is comprised of a set of different sensors attached to crucial components of a car. The data is gathered by a central unit for each car that sends its own data to the central processing unit of the train. This unit is then able to communicate with the cloud. The task of gathering and processing data is shifted to nodes at the edge of the network before passing it on to the cloud.
They go on to suggest further research in the field of finding the optimal balance between modeling an IoT application independently of its environment and designing it in a way that accurately represents the structure of the environment it is deployed in. Another issue they mention is the scalability of such applications, since the number of cars and sensors on different trains can vary significantly. The modeling of the architecture should therefore consider a way to deploy some parts of the application only to a specific set of nodes. The mentioned papers offer an overview of the requirements and create proposals for the architecture of distributed IoT systems. They also offer guidance on where more research could be conducted. They remain, however, on an abstract level and do not implement a prototype of a corresponding application. The work presented in this paper aims at making use of the proposed approaches to implement a concrete, distributed IoT system based on a real-life scenario that is executable, observable and analyzable. The task of leveraging custom application metrics to enable auto-configuration is also addressed. The devised solution is evaluated in a second step. A system architecturally similar to ours, a Raspberry Pi-based implementation of an application system, has been introduced in [68]. The authors introduce a containerized cluster based on single-board devices that is tailored towards applications that process and compute large amounts of data. They deploy their application, a weather forecast model, to the Raspberry Pi cluster and evaluate its performance. They note that the performance of the RPi cluster is acceptable within limits and could be a suitable option for comparable use cases, although the networking performance of the Raspberry Pis has been identified as a bottleneck. We address performance degradations here through auto-scaling. There is a range of scalability approaches.
[31] categorize them into reactive, proactive and hybrid approaches. Reactive approaches react to changes in the environment and take action accordingly. Proactive approaches predict the need for resources in advance. Hybrid systems combine both approaches, making use of previously collected data and resource provisioning at run time. AI approaches can be used to improve the network response to certain environmental factors such as network traffic or threats to the system [45]. The authors propose a general architecture of a smart 5G system that includes an independent AI controller that communicates with the components of the application. The controller obtains certain metrics from these interfaces, uses its own functionality (based on a cycle of sensing, mining, prediction and reasoning) to analyse results and sends these back to the system, while adapting its own parameters dynamically to improve its accuracy. The paper categorizes machine learning, the subset of AI it focuses on, into three categories: supervised, unsupervised and reinforcement learning. Examples of implementations of algorithms for auto-scaling and auto-configuration exist. [61] propose tuning configuration parameters using an evolutionary algorithm called Covariance Matrix Adaptation. Using this approach, potential candidates for solving a problem are continuously evolved, evaluated and selected, guiding the subsequent generations towards a desired outcome. This algorithm does not rely on pre-existing information about the system and is therefore a black-box approach. The aim of the algorithm is to find an optimal set of parameters to "maximize throughput while keeping the response time below a threshold" [61]. This threshold is defined in a so-called Service Level Agreement (SLA) [7, 55, 57]. Another paper introduces a Smart Hill Climbing algorithm for finding the optimal configuration for a web application [76].
The proposed solution is based on two phases: in the global search phase, the search space is broadly scanned for a potential starting point for the local search phase. Another interesting approach aims at optimizing the configuration of Hadoop, a framework for distributed programming tasks [36]. This is an example of an offline, proactive tuning algorithm that does not reallocate resources at run-time, but tries to find the best configuration in advance. The proposed algorithm starts by computing a signature of the resource consumption. This signature is then compared against a database of signatures of other applications. This database maps the signatures of the different applications to predefined sets of optimal parameters. The configuration parameters of the current application are then updated with the values of the application from the database that it most closely resembles. There are some proposals using fuzzy logic for auto-configuration. [31] introduce an elasticity controller that uses fuzzy logic to automatically reallocate resources in a cloud-based system. Time-series forecasting is used to obtain the estimated workload at a given time in the future. Based on this information, the elasticity controller calculates and returns the number of virtual machines that need to be added to or removed from the system. The allocation of VMs is consequently carried out by the cloud platform. There, the rule base and membership functions are based on the experience of experts. Reinforcement learning in the form of Fuzzy Q-learning has been evaluated in this context [26, 33]. The aim of these systems is to find the optimal provisioning configuration by interacting with the system. The controller, making use of the Q-learning algorithm, selects the scaling actions that yield the highest long-term reward. It also keeps updating the mapping of reward values to state-action pairs [26].
The idea of using supervised learning approaches and relying on training data has also been discussed. Yigitbasi et al. introduce an approach based on the Support Vector Regression model [77]: a machine-learning technique based on training data that is used for classification and regression. Fuller introduces the idea of using neural networks for building and improving the fuzzy knowledge base [17]. Adaptive neuro-fuzzy inference systems (ANFIS), combining neural networks and fuzzy logic, are discussed in [34]. The author states that neural networks excel at pattern recognition, but that it is usually difficult to follow their "reasoning" process. This process is easier to comprehend for fuzzy logic systems, but they rely on a knowledge base that is not trivial to obtain. Fuzzy logic and neural networks are shown to complement each other in a hybrid system. Newer papers highlight a "self-adaptive neural fuzzy controller" for server provisioning [42]. The proposed architecture makes use of a neural network of four layers. The layers of the neural network represent the fuzzy membership functions and rule base. The neural network constantly learns and adapts itself, leading to the membership functions and rules being established and refined over time. It outperforms a regular fuzzy controller based on manual tuning of the knowledge base. Our work here focused on introducing a practical implementation approach for a lightweight auto-scaling controller based on fuzzy logic. So far, fuzzy auto-scaling for lightweight edge architectures has not been investigated in the literature. At the core of our platform is a containerized architecture on a cluster of single-board devices. Combined with a fuzzy auto-scaler, this results overall in a reconfigurable, scalable and dependable system. The implementation is a proof-of-concept, with the constraints of the environment being a crucial factor in the implementation.
We aimed at experimentally evaluating a self-adaptive auto-scaler based on the openFaas framework and a microservices-based architecture. The evaluation of the system was carried out to analyze its performance. OpenFaas was used for inter-service communication as well as for monitoring. The auto-scaling algorithm was specifically designed to support dependable resource management for lightweight edge device clusters. Using fuzzy logic, the auto-scaling controller can gradually scale the system. Unlike previous examples of fuzzy auto-scalers that were deployed in large cloud infrastructures, the fuzzy scaling functionality in our case was constrained in its processing needs, since it was deployed together with the main system components on the limited hardware cluster. Consequently, the algorithm was focused on using data and technologies that were already available within the cluster. This also applies to fuzzy systems based on neural networks. There are still parameters that need to be tuned manually to achieve the desired outcomes. This leaves a certain degree of uncertainty when applying the algorithm to a black-box system. In our use case scenario, the evaluation demonstrated that the set-up is only able to process up to 75 vehicles simultaneously. While our focus was on compute/storage resources, network limitations emerge as the reason here. Exploring these more deeply is part of future work. We shall also focus on improving the application management components towards more realistic behavior. We discussed related work on Adaptive Neuro-Fuzzy Inference Systems (ANFIS). Working examples of ANFIS are relatively uncommon. One example uses Python and the machine-learning framework TensorFlow to combine fuzzy logic and a neural network [10]. Relying on the Ubuntu distribution used for the fuzzy service, it is possible to create an ANFIS service for our setting.
Previously, compiling and running the TensorFlow framework on hardware such as the Raspberry Pi was a challenging task. With release 1.9 of the framework, TensorFlow officially started supporting the Raspberry Pi [75]. Pre-built packages for the platform can now be installed directly. This adds new possibilities for creating a lightweight ANFIS service for the given system. An ANFIS relies on training data. This introduces the challenge of finding sample data that maps the rate of messages and a scale value to an indicator that classifies the performance as acceptable or not acceptable: (rate of messages, scale value) → acceptable/not acceptable performance. This dataset could be aggregated manually or in an automated manner. We also plan to address more complex architectures with multiple clusters to be coordinated [15, 16]. We propose Particle Swarm Optimization (PSO) [4] here, a bio-inspired optimization method suitable for coordinating autonomous entities such as the clusters in our case, combined with machine learning techniques for the construction of the controller [5]. PSO distinguishes personal (here local cluster) best fitness and global (here cross-cluster) best fitness in the allocation of load to clusters and their nodes. This shall be combined with the fuzzy scaling at cluster level. Our abstract application scenario, with cars just being simulated data providers, allows us to generalise the results beyond the concrete application case [23, 27]. We considered traffic management and coordinated cars, where traffic and car movement is captured and processed, possibly supplemented by infotainment information with image and video data. Another application domain is mobile learning, which equally includes heavy use of multimedia content being delivered to mobile learners and their devices [37, 44, 47, 49, 51].
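As an illustration of the PSO principle outlined above, a minimal global-best PSO can be sketched in Python. The parameter values are illustrative, and in the envisaged multi-cluster setting the fitness function f would encode the quality of a cross-cluster load allocation:

```python
import random

random.seed(7)  # fixed seed for reproducibility of the sketch

def pso(f, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5,
        bounds=(-5.0, 5.0)):
    """Minimise f over R^dim with a basic global-best PSO."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]            # personal best positions
    pbest_val = [f(p) for p in pos]        # personal best fitness
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # velocity update: inertia + pull towards personal/global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val
```

The distinction between `pbest` (personal, per-cluster) and `gbest` (global, cross-cluster) fitness corresponds directly to the coordination scheme sketched in the text.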
These systems also rely on close interaction with semantic processing of interactions in order to support cognitive learning processes. Some of these can be provided at the edge layer to enable a satisfactory user experience based on sufficient platform performance.

References

Ansible: Overview: How Ansible works
Certification-based cloud adaptation
Avahi: What is Avahi?
Particle swarm optimization for managing performance in multi-cluster
Root cause analysis and remediation for quality and value improvement in machine learning driven information models
Serverless computing: current trends and open problems
Model driven distribution pattern design for dynamic web service compositions
The Raspberry Pi: reviving the lost art of children's computer programming
cloud-init: The standard for customising cloud instances
Docker: What is Docker?
Introducing stateless microservices for openFaas
An agility-oriented and fuzziness-embedded semantic model for collaborative cloud service search, retrieval and recommendation
A classification and comparison framework for cloud service brokerage architectures
A comparison framework and review of service brokerage solutions for cloud architectures
Neural Fuzzy Systems
A fuzzy controller for self-adaptive lightweight edge container orchestration
Serverless container cluster management for lightweight edge clouds
A lightweight virtualisation platform for cooperative, connected and automated mobility
Induction of fuzzy rules and membership functions from training examples
A review of distributed ledger technologies
Trustworthy orchestration of container based edge computing using permissioned blockchain
A decision framework for blockchain platforms for IoT and edge computing
Self-optimizing memory controllers: a reinforcement learning approach
Cloud migration patterns: a multicloud service architecture perspective
Fuzzy self-learning controllers for elasticity management in dynamic cloud architectures
Managing uncertainty in autonomic cloud elasticity controllers
Pattern-based multi-cloud architecture migration
Autonomic resource provisioning for cloud-based software
Microservices: the journey so far and challenges ahead
Self-learning cloud controllers: fuzzy Q-learning for knowledge evolution
ANFIS: adaptive-network-based fuzzy inference system
Ontology change management and identification of change patterns
Towards optimizing Hadoop provisioning in the cloud
Automated tutoring for a database skills training environment
The vision of autonomic computing
Deployment of IoT applications on 5G edge
A review of serverless frameworks
Kubeless: The Kubernetes native serverless framework
Autonomic provisioning with self-adaptive neural fuzzy control for end-to-end delay guarantee
Blockchain based service continuity in mobile edge computing
An evaluation technique for content interaction in web-based teaching and learning environments
Intelligent 5G: when cellular networks meet artificial intelligence
Neural-network-based fuzzy logic control and decision system
Constraint-based validation of adaptive e-learning courseware
Developing self-adaptive microservice systems: challenges and directions
A tool-mediated cognitive apprenticeship approach for a computer engineering course
Supporting active database learning and training through interactive multimedia
An architecture pattern for trusted orchestration in IoT edge clouds
Architectural principles for cloud software
An ontology for software component matching
An ontology-based approach for modelling architectural styles
A review of architectural principles and patterns for distributed mobile information systems
Layered ontological modelling for web service-oriented model-driven architecture
Microservices and containers
Raspberry Pi: products
Autotuning configurations in distributed systems for performance improvements using evolutionary strategies
Anomaly detection and analysis for clustered cloud computing reliability
A controller architecture for anomaly detection, root cause analysis and self-adaptation for cluster architectures
Detecting and localizing anomalies in container clusters using Markov models
A containerized big data streaming architecture for edge cloud computing on clustered single-board devices
A containerized edge cloud architecture for data stream processing
A containerized tool to deploy scientific applications over SoC-based systems: the case of meteorological forecasting with WRF
Microservices anti-patterns: a taxonomy
Microservices in agile software development: a workshop-based study into issues, advantages, and disadvantages
Patterns for serverless functions (Function-as-a-Service): a multivocal literature review
Living in the cloud or on the edge: opportunities and challenges of IoT application architecture
A performance exploration of architectural options for a middleware for decentralised lightweight edge cloud architectures
A Lightweight Container Middleware for Edge Cloud Architectures
TensorFlow 1.9 officially supports the Raspberry Pi
A smart hill-climbing algorithm for application server configuration
Towards machine learning-based auto-tuning of MapReduce