FLRA: A Reference Architecture for Federated Learning Systems
Authors: Sin Kit Lo, Qinghua Lu, Hye-Young Paik, Liming Zhu
Date: 2021-06-22

Abstract: Federated learning is an emerging machine learning paradigm that enables multiple devices to train models locally and formulate a global model without sharing the clients' local data. A federated learning system can be viewed as a large-scale distributed system involving different components and stakeholders with diverse requirements and constraints. Hence, developing a federated learning system requires both software system design thinking and machine learning knowledge. Although much effort has been put into federated learning from the machine learning perspective, our previous systematic literature review on the area shows a distinct lack of consideration for software architecture design in federated learning. In this paper, we propose FLRA, a reference architecture for federated learning systems, which provides a template design for federated learning-based solutions. The proposed FLRA reference architecture is based on an extensive review of existing patterns of federated learning systems found in the literature and in existing industrial implementations. The FLRA reference architecture consists of a pool of architectural patterns that address the frequently recurring design problems in federated learning architectures. It can serve as a design guideline to assist architects and developers with practical solutions for their problems, which can be further customised.

The ever-growing use of industrial-scale IoT platforms and smart devices contributes to the exponential growth in data dimensions [23], which, in turn, empowers research and applications in AI and machine learning.
However, the development of AI and machine learning also significantly elevates data privacy concerns, and the General Data Protection Regulation (GDPR) 3 stipulates a range of data protection measures with which many of these systems must comply. This is a particular challenge for machine learning systems, as the data available for model training is often insufficient and systems frequently suffer from "data hungriness" issues. As data privacy is now one of the most important ethical principles of machine learning systems [17], there needs to be a solution that can deliver a sufficient amount of data for training while respecting the privacy of the data owners. To tackle this challenge, Google proposed federated learning [28] in 2016. Federated learning is a variation of distributed machine learning that enables model training on a highly distributed network of client devices. The key feature of federated learning is that models are trained using locally collected data, without transferring the data out of the client devices. A global model is initialised on a central server and broadcast to the participating client devices for local training. The locally trained model parameters are then collected by the central server and aggregated to update the global model parameters. The updated global model parameters are broadcast again for the next training round. Each local training round usually takes a step in the gradient descent process. Fig. 1 presents an overview of the federated learning process. A federated learning system can be viewed as a large-scale distributed system, involving different components and stakeholders with diverse requirements and constraints. Hence, developing a federated learning system requires both software system design thinking and machine learning knowledge [25].
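As a concrete illustration of the round-based process described above, the following is a minimal, self-contained simulation of FedAvg-style training on synthetic linear-regression data. It is a sketch, not the paper's implementation: the toy model, the single gradient step per round, and all names are illustrative assumptions.

```python
import numpy as np

def local_update(global_params, X, y, lr=0.1):
    """One gradient-descent step of linear regression on a client's local data."""
    grad = X.T @ (X @ global_params - y) / len(y)   # mean-squared-error gradient
    return global_params - lr * grad

def federated_round(global_params, clients):
    """Broadcast the global model, train locally, aggregate by data-size-weighted average."""
    updates = [local_update(global_params, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    weights = sizes / sizes.sum()
    return sum(w * u for w, u in zip(weights, updates))

# Synthetic clients: each holds private data generated from the same true model.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = [(X, X @ true_w) for X in (rng.normal(size=(20, 2)) for _ in range(5))]

params = np.zeros(2)          # global model initialised on the "server"
for _ in range(150):          # 150 federated rounds
    params = federated_round(params, clients)
```

Note that the local data never leaves each client tuple; only the locally updated parameters are passed back for aggregation, mirroring the broadcast/train/aggregate loop described above.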
Further, despite various reference architectures for machine learning, big data, industrial IoT, and edge computing systems, to the best of our knowledge there is still no reference architecture for an end-to-end federated learning system. Based on the findings of several federated learning reviews [24, 18], the application of federated learning is still limited and immature, with only certain stages of an end-to-end federated learning architecture extensively studied, leaving many unfilled gaps in architecture and pipeline development. At the same time, many reusable solutions and components have been proposed to solve the different challenges of federated learning systems, and this motivates the design of a general federated learning system reference architecture. Therefore, this paper presents a pattern-oriented reference architecture that serves as an architecture design guideline and facilitates the end-to-end development and operations of federated learning systems, while taking different quality attributes and constraints into consideration. This work provides the following contributions: -A pattern-oriented federated learning reference architecture named FLRA, generated from the findings of a systematic literature review (SLR) and the mining of industrial best practices in machine learning system implementations. -A pool of patterns associated with the different components of the FLRA reference architecture that address the recurring design problems in federated learning architectures. The structure of the paper is as follows. Section 2 introduces the methodology for the reference architecture design, followed by the presentation of the reference architecture in Section 3. Section 4 presents the related work. Section 5 discusses this work and concludes the paper. We have employed parts of an empirically-grounded design methodology [11] to design the federated learning reference architecture.
Firstly, the design and development of this reference architecture are based on empirical evidence collected through our systematic literature review of 231 academic papers on federated learning, conducted from a software engineering perspective [24]. The review followed Kitchenham's guideline [19], with which we designed a comprehensive protocol for the review's initial paper search, paper screening, quality assessments, data extraction, analysis, and synthesis. We also adopted the software development practices of machine learning systems in [35] to describe the software development lifecycle (SDLC) for federated learning. Using the stages of this lifecycle as a guide, we formulated our research questions around: (1) background understanding; (2) requirement analysis; (3) architecture design; and (4) implementation & evaluation. One major finding of the SLR is that federated learning research and applications are still highly immature, and certain stages of an end-to-end federated learning architecture still lack extensive study [24]. However, we also identified many solutions and components proposed to solve the different challenges of federated learning systems, which can be reused and adapted. This motivates the design of a federated learning system reference architecture. Based on the findings, we specifically adopted the qualitative methods in empirical studies of software architecture [34] to develop and confirm the theory for the reference architecture design. The proposition is generated based on the synthesis and validation of the different recurring customer and business needs of federated learning systems, in addition to the collection and analysis of the reusable patterns that address these architectural needs. We then studied some of the best practices in centralised and distributed machine learning systems to cover components that are not addressed in the federated learning studies.
The methodology comprises two main processes: (1) generation of theory and (2) confirmation of theory. The architecture design methodology is illustrated in Fig. 2. The initial design of the reference architecture theory is generated in the first stage. Since there is no standard reference architecture for federated learning yet, we generated the theory by referring to the architecture of a machine learning system. Here, we adopted cross-case analysis [34] as the theory generation method, an analysis method that compares two different cases based on some attributes and examines their similarities and differences. We performed a cross-case analysis on the pipeline designs of conventional machine learning and federated learning systems. We reviewed several machine learning architectures proposed by well-known companies, such as Google 4, Microsoft 5, and Amazon 6, focusing on their machine learning pipeline designs. Furthermore, based on our previous project implementation experience, we defined a general federated learning pipeline, following the standards proposed by these industry players, that covers the job creation, data collection, data preprocessing (cleaning, labeling, augmentation, etc.), model training, model evaluation, model deployment, and model monitoring stages. Since federated learning was first introduced by Google, the pipeline component analysis and mining were performed heavily on the federated learning standards proposed by Google researchers in [28, 5, 18], and on frameworks for federated learning benchmarking and simulation, such as TensorFlow Federated (TFF) 7, LEAF 8, and FedML 9. From the findings, we concluded that the data collection stage is fairly similar, whereas the data preprocessing, model training, model evaluation, model deployment, and model monitoring stages of federated learning systems differ from conventional machine learning pipelines.
Especially for the model training stage, federated learning pipelines encapsulate the model broadcast, local model training, model upload and collection, and model aggregation operations under this single stage. Furthermore, the iterative interaction between multiple client devices and one central server is the key design consideration of a federated learning architecture; therefore, most academic work has extensively studied the model training stage and proposed many solutions that can be adapted as reusable components or patterns to address different requirements. Besides observing the pipeline design and components, we performed qualitative content analyses on existing machine learning and federated learning systems proposed by industry practitioners and academics to extract requirements, reusable patterns, and components for the design of the reference architecture. In the SLR, a series of system quality attributes were defined based on the ISO/IEC 25010 System and Software Quality model 10 and the ISO/IEC 25012 Data Quality model 11 to record the different challenges of federated learning systems addressed by researchers. The empirical evidence associated with each quality attribute and business need was analysed and validated as support for the design proposition of the reference architecture. After generating the theory for the hypothesis of the reference architecture, we designed the reference architecture according to the theory. In the second stage, we confirmed and verified the viability and applicability of the proposed reference architecture. Since this reference architecture is built from scratch based on the patterns and requirements collected through qualitative analyses, we evaluated the architecture by building a convincing body of evidence to support it, which differs from conventional evaluation approaches. We employed the qualitative validation method known as triangulation [34].
The basic idea is to gather different types of evidence to support a proposition. The evidence may come from different sources and be collected using different methods; in our case, the evidence comes from the SLR, the industrial implementations of machine learning systems from renowned institutions and companies, and our previous implementation experience. We reviewed this evidence against the SDLC of machine learning systems we developed for the SLR to identify the adoption of the different reusable patterns or components, in addition to the basic machine learning pipeline components mentioned in the evidence. These mentions and adoptions were collected to establish the applicability of the instantiated components in the federated learning reference architecture. In short, the triangulation process justified that the reference architecture is applicable, as it is supported by the various empirical evidence we collected and analysed. In this section, we present FLRA, a pattern-oriented reference architecture for federated learning systems. Fig. 3 illustrates the overall reference architecture. A base version of a federated learning system consists of two main participants: (1) the central server and (2) the client devices. The central server initialises a machine learning job and coordinates the federated training process, whereas client devices perform model training using local data and computation resources. Under the two participants, there are two types of components: (1) mandatory components and (2) optional components. The mandatory components provide the basic functions required by a federated machine learning pipeline. To fulfil the different software quality requirements and design constraints in federated learning systems, we collected and defined a set of patterns based on our SLR results and the mining of existing federated learning simulation frameworks.
Each pattern is embedded as an optional component to facilitate the architecture design. We summarise all the mandatory and optional components of the reference architecture and briefly highlight the functionality and responsibility of each component in Table 1. The table presents the details of each component associated with the federated learning pipeline stages. The federated learning process starts with the creation of a model training job (including the initial model and training configurations) via the job creator on the central server. Within the job creator component, three optional components can be considered: client registry, client cluster, and client selector. In a federated learning system, client devices may be owned by different parties and may constantly connect to and drop out of the system. Hence, it is challenging to keep track of all the participating client devices, including dropout devices and dishonest devices. This differs from distributed or centralised machine learning systems, in which both clients and the server are typically owned and managed by a single party [27]. A client registry is required to maintain the information of all registered client devices (e.g., ID, resource information, number of participating rounds, local model performance, etc.). Both the IBM Federated Learning Framework 12 and doc.ai 13 adopted a client registry in their designs to improve the maintainability and reliability of the system, since the system can manage the devices effectively and quickly identify problematic ones via the client registry component. FedML, a federated learning benchmarking and simulation framework, also explicitly includes a client manager module that serves the same purpose as the client registry. However, the system may sacrifice client data privacy due to the recording of device information on the central server.
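To make the client registry concrete, here is a minimal sketch of what such a component might record and expose. The class, field names, and dropout heuristic are our own illustrative choices, not an API taken from IBM's framework, doc.ai, or FedML:

```python
import time
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ClientRecord:
    client_id: str
    resources: dict                 # e.g. {"cpu_cores": 4, "bandwidth_mbps": 5.0}
    rounds_participated: int = 0
    last_local_loss: Optional[float] = None
    last_seen: float = field(default_factory=time.time)

class ClientRegistry:
    """Central-server record of registered client devices and their status."""

    def __init__(self):
        self._clients = {}

    def register(self, client_id, resources):
        self._clients[client_id] = ClientRecord(client_id, resources)

    def get(self, client_id):
        return self._clients[client_id]

    def report_round(self, client_id, local_loss):
        """Called when a client finishes a local training round."""
        record = self._clients[client_id]
        record.rounds_participated += 1
        record.last_local_loss = local_loss
        record.last_seen = time.time()

    def stale_clients(self, max_age_seconds):
        """Clients not seen recently -- candidates for dropout handling."""
        now = time.time()
        return [c.client_id for c in self._clients.values()
                if now - c.last_seen > max_age_seconds]

registry = ClientRegistry()
registry.register("dev-1", {"cpu_cores": 4, "bandwidth_mbps": 5.0})
registry.report_round("dev-1", local_loss=0.42)
```

The privacy trade-off noted above is visible here: every field the server records about a device is information the client must be willing to disclose.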
The non-IID 14 data characteristics of local raw data and the data-sharing restriction translate into a model performance challenge [18, 28, 42, 21]. When the data from client devices are non-IID, the aggregated global model generalises less well across the entire data. To improve the generalisation of the global model and speed up model convergence, a client cluster component can be added to cluster the client devices into groups according to their data distribution, gradient loss, and feature similarities. This design has been used in Google's IFCA algorithm 15, the TiFL system [8], and Massachusetts General Hospital's patient system [13]. The side effect of client clustering is the extra computation cost caused by client relationship quantification. The central server interacts with a massive number of client devices that are heterogeneous both in system resources and in data statistics. The number of client devices is also several times larger than in distributed machine learning systems [18, 24]. To increase model and system performance, client devices can be selected every round according to predefined criteria (e.g., resource, data, or performance) via the client selector component. This has been integrated into Google's FedAvg [28] algorithm and IBM's Helios [39]. Each client device gathers data using different sensors through the data collector component and processes the data (feature extraction, data cleaning, labeling, augmentation, etc.) locally through the data preprocessor component, due to the data-sharing constraint.

12 https://github.com/IBM/federated-learning-lib
13 https://doc.ai/
14 Non-identically and independently distributed: highly skewed and personalised data distributions that vary heavily between different clients and affect model performance and generalisation [33].
This differs from centralised or distributed machine learning systems, in which the non-IID data characteristic is negligible since the data collected on client devices are usually shuffled and processed on the central server. Thus, within the data preprocessor, an optional heterogeneous data handler component can be adopted to deal with non-IID and skewed data distribution issues through data augmentation techniques. Known uses of the component include Astraea 16, the FAug scheme [14], and the Federated Distillation (FD) method [2]. Local model training. Once the client receives the job from the central server, the model trainer component performs model training based on the configured hyperparameters (number of epochs, learning rate, etc.). In the standard federated learning training process proposed by McMahan et al. in [28], only the model parameters (i.e., weights/gradients) are sent from the central server, whereas in this reference architecture, the models include not only the model parameters but also the hyperparameters. For multi-task machine learning scenarios, a multi-task model trainer component can be chosen to train task-related models to improve model performance and learning efficiency. Multi-task learning is a machine learning approach that transfers and shares knowledge through the training of individual models. It improves model generalisation by using the domain information contained in the parameters of related tasks as an inductive bias. It does this by learning tasks in parallel while using a shared representation; what is learned for each task can help other tasks be learned better [7]. In federated learning scenarios, this technique is particularly relevant when faced with non-IID data, which can produce personalised models that may outperform the best possible shared global model [18]. This best-practice solution is identified based on Google's MultiModel 17 architecture and Microsoft's MT-DNN 18. Model evaluation.
The local model evaluator component measures the performance of the local model and uploads the model to the model aggregator on the central server if the performance requirement is met. In distributed machine learning systems, performance evaluation is not conducted locally on client devices; only the aggregated server-side model is evaluated. For federated learning systems, however, local model performance evaluation is required for system operations such as client selection, model co-versioning, contribution calculation, incentive provision, and client clustering. Model uploading. The trained local model parameters or gradients are uploaded to the central server for model aggregation. Unlike centralised machine learning systems, which perform model training on a central server, or distributed machine learning systems, which deal with a fairly small number of client nodes, the cost of transmitting model parameters or gradients between bandwidth-limited client devices and the central server is high when the system scales up [18, 24]. A message compressor component can be added to improve communication efficiency. The embedded patterns are extracted from Google's Sketched Update [20] and IBM's PruneFL [16]. Model aggregation. The model aggregator formulates the new global model based on the received local models. There are four types of aggregator-related optional components within the model aggregator component: secure aggregator, asynchronous aggregator, decentralised aggregator, and hierarchical aggregator. A secure aggregator component prevents adversarial parties from accessing the models during model exchanges, through techniques such as secure multiparty computation protocols, differential privacy, or other cryptographic methods. These techniques provide security proofs to guarantee that each party knows only its own input and output.
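The core idea of secure aggregation can be sketched with pairwise additive masking, in the spirit of SecAgg: each pair of clients shares a random mask that one adds and the other subtracts, so the masks cancel in the server's sum. This toy version omits the key agreement, dropout recovery, and finite-field arithmetic a real protocol requires, and all names are illustrative:

```python
import random

def make_pairwise_masks(client_ids, dim, seed=0):
    """Give every pair of clients a shared random mask: one adds it, the other
    subtracts it, so all masks cancel when the server sums the messages."""
    rng = random.Random(seed)
    masks = {cid: [0.0] * dim for cid in client_ids}
    for i in range(len(client_ids)):
        for j in range(i + 1, len(client_ids)):
            shared = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for d in range(dim):
                masks[client_ids[i]][d] += shared[d]
                masks[client_ids[j]][d] -= shared[d]
    return masks

def mask_update(update, mask):
    """What a client actually sends: its update plus its accumulated masks."""
    return [u + m for u, m in zip(update, mask)]

updates = {"c1": [1.0, 2.0], "c2": [3.0, 4.0], "c3": [5.0, 6.0]}
masks = make_pairwise_masks(list(updates), dim=2)
messages = [mask_update(u, masks[cid]) for cid, u in updates.items()]

# The server only sums the masked messages; individual updates stay hidden.
aggregate = [sum(vals) for vals in zip(*messages)]
```

Each individual message looks random to the server, yet the sum equals the sum of the raw updates, which is exactly what the model aggregator needs.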
For centralised and distributed machine learning settings that practice centralised system orchestration, communication security between clients and the server is not the main concern. In contrast, for federated learning settings, this best practice is used in SecAgg [6], HybridAlpha [38], and the TensorFlow Privacy Library 19. The asynchronous aggregator is identified from ASO-Fed [9], AFSGD-VP [12], and FedAsync [37]. The asynchronous aggregator component enables global model aggregation to be conducted asynchronously whenever a local model update arrives. Similar techniques have been adopted in distributed machine learning approaches such as iHadoop [10], and it is proven that this can effectively reduce the overall training time. The conventional design of a federated learning system relies on a central server to orchestrate the learning process, which might lead to a single point of failure. A decentralised aggregator performs model exchanges and aggregation in a decentralised manner to improve system reliability. Known uses of the decentralised aggregator include BrainTorrent [31] and FedPGA [15]. Blockchain can be employed as a decentralised solution for federated learning systems. In distributed machine learning systems, a P2P network topology is employed in P2P-MapReduce [27] to resolve the single-point-of-failure threat on parameter servers. A hierarchical aggregator component can be selected to improve system efficiency by adding an intermediate edge layer that partially aggregates the model updates from related client devices before the final global aggregation. This pattern has been adopted by HierFAVG [22], Astraea, and HFL [1]. In addition to the aggregator-related optional components, a model co-versioning registry component can be embedded within the model aggregator component to map all the local models to their corresponding global models. This enables model provenance and improves system accountability.
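The mapping a model co-versioning registry maintains can be sketched as follows. This is a toy illustration of the idea only; the class, method, and field names are assumptions rather than an interface from any particular version-control tool:

```python
from dataclasses import dataclass, field

@dataclass
class ModelCoVersioningRegistry:
    """Maps each global model version to the local updates that produced it."""
    lineage: dict = field(default_factory=dict)

    def record_round(self, global_version, local_contributions):
        """local_contributions: {client_id: local_model_hash}."""
        self.lineage[global_version] = dict(local_contributions)

    def contributors_of(self, global_version):
        """Which clients' local models fed into a given global model?"""
        return sorted(self.lineage.get(global_version, {}))

registry = ModelCoVersioningRegistry()
registry.record_round("global-v1", {"dev-1": "sha:ab12", "dev-3": "sha:cd34"})
```

Such a lineage record is what makes provenance queries possible, e.g. tracing a poorly performing global model back to the client updates that contributed to it.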
The model co-versioning registry pattern is summarised and adopted from the version control methods in DVC 20, Replicate.ai 21, and Pachyderm 22. After the aggregation, the global model evaluator assesses the performance of the global model. One example is TensorFlow Extended (TFX) 23, which provides a model validator function to assess federated learning model performance. If the global model performs well, the model deployer component deploys the global model to the client devices for decision-making through the decision-maker component. For instance, TensorFlow Lite 24 prepares the final validated model for deployment to the client devices for data inference. Within the model deployer component, there are two optional components for selection: deployment selector and incentive registry. The deployment selector component examines the client devices and selects clients to receive the global model based on their data characteristics or applications. The deployment selector pattern has been applied in Azure Machine Learning 25, Amazon SageMaker 26, and Google Cloud 27 to improve model performance. The incentive registry component maintains all the client devices' incentives, based on their contributions and agreed rates, to motivate clients to contribute to the training. Blockchain has been leveraged in FLChain [3] and DeepChain [36] to build an incentive registry. After the deployment of models for actual data inference, a model monitor keeps track of the model performance continuously. If the performance degrades below a predefined threshold, the model replacement trigger component notifies the model trainer for local fine-tuning or sends an alert to the job creator for new model generation. The model replacement trigger pattern is identified based on known uses including Microsoft Azure Machine Learning Designer 28, Amazon SageMaker 29, and Alibaba Machine Learning Platform 30.
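The threshold behaviour of a model replacement trigger can be sketched in a few lines. The metric, window size, and threshold below are illustrative deployment choices; the production monitors cited above track many more signals than a single rolling accuracy:

```python
def should_trigger_replacement(recent_accuracies, threshold=0.8, window=5):
    """Fire when the rolling mean of recent accuracy falls below a threshold.

    `threshold` and `window` are deployment-specific choices, assumed here
    for illustration only.
    """
    if len(recent_accuracies) < window:
        return False                      # not enough evidence yet
    recent = recent_accuracies[-window:]
    return sum(recent) / window < threshold

history = [0.91, 0.90, 0.88, 0.79, 0.75, 0.72, 0.70]   # degrading model
triggered = should_trigger_replacement(history)
```

When the trigger fires, the component would either notify the model trainer for local fine-tuning or alert the job creator to start a new training job, as described above.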
The most widely cited definition of a reference architecture is by Bass et al. [4]: "a reference model mapped onto software elements (that cooperatively implement the functionality defined in the reference model) and the data flow between them. Whereas a reference model divides the functionality, a reference architecture is the mapping of that functionality onto a system decomposition." Nakagawa et al. collected a series of definitions of reference architectures by various researchers and summarised them as follows: "the reference architecture encompasses the knowledge about how to design system architectures of a given application domain. It must address the business rules, architectural styles (sometimes also defined as architectural patterns that address quality attributes in the reference architecture), best practices of software development (architectural decisions, domain constraints, legislation, and standards), and the software elements that support the development of systems for that domain" [29]. Reference architectures for machine learning applications and big data analysis have been researched comprehensively. For instance, Pääkkönen and Pakkala proposed a reference architecture of big data systems for machine learning in an edge computing environment [30]. The IBM AI Infrastructure Reference Architecture is proposed as a reference for data scientists and IT professionals who are defining, deploying, and integrating AI solutions into an organisation [26]. Reference architectures for edge computing systems are also widely studied. For example, the H2020 FAR-Edge project, Edge Computing Reference Architecture 2.0, the Intel-SAP Reference Architecture, the IBM Edge Computing Reference Architecture, and the Industrial Internet Reference Architecture (IIRA) have been proposed by practitioners to support the development of multi-tenant edge systems. There are also existing works that support federated learning system and architecture design.
For instance, Google was the earliest to introduce a system design approach for federated learning [5], describing from a high-level perspective a scalable production system for federated learning in the domain of mobile devices, based on TensorFlow. A collection of architectural patterns for the design of federated learning systems is summarised and presented in [25]. There are also many architectures and adoptions of federated learning systems proposed by researchers for diverse applications. For instance, Zhang et al. [40] proposed a blockchain-based federated learning architecture for industrial IoT that improves client motivation through an incentive mechanism. Samarakoon et al. [32] adopted federated learning to improve reliability and communication latency for vehicle-to-vehicle networks. Another real-world federated learning adoption, by Zhang et al. [41], is a dynamic fusion-based federated learning approach for medical diagnostic image analysis to detect COVID-19 infections. We observed that there have been multiple studies on federated learning from different aspects, and that their design methods are highly diverse and isolated, which makes their proposals challenging to reproduce. Motivated by the works mentioned above, we intend to fill the research gap by putting forward an end-to-end reference architecture for the development and deployment of federated learning systems, which has been distinctly lacking in the current state of the art. A reference architecture can serve as a standard guideline for system designers and developers to quickly select best-practice solutions for their problems, which can be further customised as required. To the best of our knowledge, there is still no reference architecture proposed for an end-to-end federated learning system, while many reusable components and patterns have been proposed.
Thus, in this paper, we proposed FLRA, a pattern-oriented reference architecture for federated learning system design, to increase the real-world adoption of federated learning. To design the reference architecture, we developed an empirically-grounded qualitative analysis method as the basis of design theory generation. The empirical evidence supporting the reference architecture design is a collection of findings (requirements, patterns, and components) gathered and defined by our previous systematic literature review on federated learning and by well-known industry practices of machine learning systems. After developing the reference architecture, we compared it with the existing machine learning architectures of Google, Amazon, Microsoft, and IBM to examine its applicability. The key differences between centralised or distributed machine learning and federated learning are the non-IIDness of the training data, the variation in data partitioning (e.g., vertical, horizontal, and transfer federated learning) and device partitioning (e.g., cross-device, cross-silo), the ownership and security requirements of different client devices, the system heterogeneity, and the participation of client nodes. The proposed FLRA architecture adopts many reusable machine learning and federated learning patterns while maintaining most of the mandatory machine learning pipeline components. This ensures that the reference architecture is general enough to support basic model training tasks in the real world. While there are different constraints when developing a federated learning system for different applications and settings, the possible trade-offs and the pattern solutions to these challenges are discussed comprehensively. The confirmation of theory justified the applicability of FLRA and its associated patterns with the support of the empirical evidence collected. Hence, the proposed FLRA is applicable in the real world for the general, end-to-end development of federated learning systems.
Our future work will focus on developing an architecture decision model for federated learning system design. We will also work on architecture design for trust in federated learning systems.

References
[1] Hierarchical federated learning across heterogeneous cellular networks.
[2] Wireless federated distillation for distributed edge learning with heterogeneous data.
[3] FLChain: A blockchain for auditable federated learning with trust and incentive.
[4] Software architecture in practice.
[5] Towards federated learning at scale: System design.
[6] Practical secure aggregation for privacy-preserving machine learning.
[7] Multitask learning.
[8] TiFL: A tier-based federated learning system.
[9] Asynchronous online federated learning for edge devices with non-IID data.
[10] iHadoop: Asynchronous iterations for MapReduce. 2011 IEEE Third International Conference on Cloud Computing Technology and Science.
[11] Empirically-grounded reference architectures: A proposal.
[12] Privacy-preserving asynchronous federated learning algorithms for multi-party vertically collaborative learning.
[13] Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records.
[14] Communication-efficient on-device machine learning: Federated distillation and augmentation under non-IID private data.
[15] Decentralised federated learning with adaptive partial gradient aggregation.
[16] Model pruning enables efficient federated learning on edge devices.
[17] The global landscape of AI ethics guidelines.
[18] Advances and open problems in federated learning.
[19] Systematic literature reviews in software engineering: A systematic literature review. Information and Software Technology.
[20] Federated learning: Strategies for improving communication efficiency.
[21] On the convergence of FedAvg on non-IID data.
[22] Client-edge-cloud hierarchical federated learning.
[23] An interoperable component-based architecture for data-driven IoT system.
[24] A systematic literature review on federated machine learning: From a software engineering perspective.
[25] Architectural patterns for the design of federated learning systems.
[26] IBM AI Infrastructure Reference Architecture.
[27] P2P-MapReduce: Parallel data processing in dynamic cloud environments.
[28] Communication-efficient learning of deep networks from decentralized data.
[29] Reference architecture and product line architecture: A subtle but critical difference.
[30] Extending reference architecture of big data systems towards machine learning in edge computing environments.
[31] BrainTorrent: A peer-to-peer environment for decentralized federated learning.
[32] Federated learning for ultra-reliable low-latency V2V communications.
[33] Robust and communication-efficient federated learning from non-i.i.d. data.
[34] Qualitative methods in empirical studies of software engineering.
[35] How does machine learning change software development practices?
[36] DeepChain: Auditable and privacy-preserving deep learning with blockchain-based incentive.
[37] Asynchronous federated optimization.
[38] HybridAlpha: An efficient approach for privacy-preserving federated learning.
[39] Helios: Heterogeneity-aware federated learning with dynamically balanced collaboration.
[40] Blockchain-based federated learning for device failure detection in industrial IoT.
[41] Dynamic fusion-based federated learning for COVID-19 detection.
[42] Federated learning with non-IID data.