key: cord-0631783-ln21c4pg
authors: Zhou, Jiehan; Zhang, Shouhua; Lu, Qinghua; Dai, Wenbin; Chen, Min; Liu, Xin; Pirttikangas, Susanna; Shi, Yang; Zhang, Weishan; Herrera-Viedma, Enrique
title: A Survey on Federated Learning and its Applications for Accelerating Industrial Internet of Things
date: 2021-04-21
sha: 3d5c3bfff46006aac71e741d43825eb3b0b8eb75
doc_id: 631783
cord_uid: ln21c4pg

Federated learning (FL) brings collaborative intelligence into industries without centralized training data, accelerating Industry 4.0 at the edge computing level. FL resolves the dilemma of enterprises that wish to make use of data intelligence but have security concerns. To accelerate the industrial Internet of Things through further use of FL, this survey reviews existing achievements on FL from three aspects: 1) it defines terminology and elaborates a general FL framework that accommodates various scenarios; 2) it discusses the state of the art of fundamental FL research, including data partitioning, privacy preservation, model optimization, local model transportation, personalization, motivation mechanisms, platforms and tools, and benchmarks; 3) it discusses the impact of FL from the economic perspective. To attract more attention from industrial academia and practice, an FL-transformed manufacturing paradigm is presented, future research directions of FL are given, and possible immediate applications in the Industry 4.0 domain are proposed.

Google first proposed FL [1] to aggregate distributed intelligence without compromising data privacy and security. The growing attention to FL comes from the combined force of emerging technologies and applications. Although Industry 4.0 was proposed in 2013 [2] and the Internet of Things (IoT) is widely applied in mobile services, there are few reports on applying large-scale data and deep learning (DL) to implement large-scale enterprise intelligence.
One of the reasons is the lack of machine learning (ML) approaches that make distributed learning available without infringing users' data privacy. FL trains a model by enabling individual devices to act as local learners and send local model parameters, instead of training data, to a federal server (defined in Section II). This gives a clear advantage for privacy-oriented industrial applications. Another key advantage is that FL does not need large data sets to be moved to a central repository (edge/cloud), so it avoids known problems related to sink node congestion and overloading. A further advantage is that FL gives small and medium-sized enterprises (SMEs) an opportunity to make full use of data intelligence: SMEs may lack large data sets and are therefore more eager to apply FL to balance data intelligence against proprietary concerns, promoting innovation and enhancing competitiveness.

There have been several surveys on FL. For example, Yang et al. [3] made a seminal survey that introduces the basic concepts in FL and a secure FL framework. Aledhari et al. [4] provided a study of FL with an emphasis on enabling software and hardware platforms, protocols, real-life applications and use cases. Li et al. [5] discussed the unique characteristics and challenges of FL, provided a broad overview of current approaches, and outlined several directions for future work. Lo et al. [6] performed a systematic literature review on FL from the software engineering perspective. Li et al. [7] conducted a review of FL systems, introduced the definition of FL systems and analyzed the system components. Mothukuri et al. [8] provided a study of FL's security and privacy aspects and outlined the areas that require in-depth research and investigation. The early reviews introduced the basic concepts and optimization models of FL.
Recently, related platforms and tools are developed, incentive mechanisms are considered, and benchmarks and personalized FL are added as well. The FL architecture needs to be updated as well to accommodate the increasing FL research and development. Meanwhile, it is noted that most FL pioneers come from the fields of the computer and information communication community, and may not put enough emphasis on the communication with industrial engineering, which seriously hinders the application of FL on industrial Internet of Things (IIoT) and the development of IIoT. Therefore, we revisit this hot topic from the perspective of promoting Industry 4.0, incorporating the consideration from the practice of industrial big data [9] and edge computing [10] . Our contribution in this survey lies in two aspects: a comprehensive investigation of the state of the art on FL, including fundamental and applied research; attracting and aggregating attentions from informatics and industrial expertise to advance the application of FL into Industry 4.0 by presenting our insights on promoting industrial data protection and intelligence. The remainder of the paper is organized as follows. Section II goes over the origin and development of FL, defines the terminology used in FL and this paper, and describes the FL mechanism in our terminology. Section III reviews the state of the art on fundamental FL and future opportunities. Section IV presents the FL-transformed manufacturing paradigm and reviews the state of the practice on FL and future opportunities, specially in Industry 4.0. Section V concludes the paper and presents the insights for advancing FL studies. FL is one of the future generation of artificial intelligence (AI), and it is also based on the latest stage of information communication technology (ICT) and new hardware technologies. After AlphaGo successfully defeated professional Go players in 2015, AI once again attracted worldwide attention [11] . ML is a part of AI. 
ML algorithms build models based on sample data (called "training data") in order to make predictions or decisions without explicit programming [12] . ML and Data Mining (DM) have a lot of overlap, but ML focuses on prediction based on learned information from training data, while data mining focuses on discovering unknown information in the data. DL is a part of ML based on artificial neural networks with representation learning. Learning can be supervised, semi-supervised or unsupervised. DL has various learning structures, such as deep neural networks, recurrent neural networks and convolutional neural networks. They have been used in machine vision, speech recognition, natural language processing, etc., where their produced results can be comparable to and surpass the performance of human experts in some cases [13] [14] . Distributed machine learning (DML) is a multi-node-based ML where the master node cooperates with each slave node to train a model in parallel to improve learning performance from large amounts of data [15] [16] . This traditional "centralized" distributed learning still has some drawbacks [17] : low efficiency with high transmission cost and lack of privacy preservation which significantly reduce application levels of DL in domains, for example, manufacturing. Besides, limitations on providing enough training data and computing power prevent many industries from adopting ML. Also, most industrial manufacturers would not share their data for security and privacy reasons. FL is a part of DML, which is defined in the next section. FL, also called federated machine learning, is an ML framework that can effectively make use of data and perform ML without having to share local data. Based on the mathematical formulation given in [3] [7], we refine the following conditions 1) and 2) for describing the accuracy of an FL for facilitating the following discussions. The terms relevant to FL are listed in Table 1 . 
Assume that there are $N$ different learners who aim to train the FM together. Each learner is denoted by $L_i$, where $i \in [1, N]$. $D_i$ denotes the raw data owned by $L_i$ and used in FL. In the first non-federated setting, all the data are put together and $D = D_1 \cup \dots \cup D_N$ is used to train a model $M_{center}$. The predictive accuracy of $M_{center}$ is denoted as $A_{center}$. In the second non-federated setting, each learner $L_i$ trains a local model $LM_i$ with $D_i$ separately. The predictive accuracy of $LM_i$ is denoted as $A_i$. In the federated setting, all the learners collaboratively train a model $M_{fed}$ while each learner $L_i$ protects its own data $D_i$ according to its privacy constraint. The predictive accuracy of $M_{fed}$, denoted as $A_{fed}$, should be very close to $A_{center}$. Formally, let $\varepsilon$ be a non-negative real number; if

$$|A_{fed} - A_{center}| < \varepsilon \tag{1}$$

$$A_{fed} > A_i, \quad i \in [1, N] \tag{2}$$

then we say that the FL algorithm has $\varepsilon$ accuracy loss.

Let $SI_i$ denote the sample id space of $D_i$, $X_i$ the feature space of $D_i$, and $Y_i$ the label space of $D_i$. We use $(SI_i, X_i, Y_i)$ to represent $D_i$.

FL itself does not guarantee data privacy. After each round of training, learner $L_i$ shares the local model $LM_i$, and other learners or the organizer can reconstruct part of $L_i$'s information from $LM_i$. We propose a privacy measurement method based on reverse reconstruction. Suppose $X_i = [x_i(1), x_i(2), \ldots, x_i(M_i)]$, where $M_i$ is the number of features of $X_i$. The learner $L_i$ reversely reconstructs $X_i$, which is expressed in Equation (3). The learner $L_j$ or the organizer reversely reconstructs $X_i$, which is expressed in Equation (4). When $L_j$ or the organizer has no data, it can randomly initialize a dummy $X_j$ and $Y_j$. The privacy measurement of $L_i$ reconstructing its own original data with $LM_i$ is expressed in Equation (5), and the privacy measurement of others reconstructing the original data of $L_i$ with $LM_i$ is expressed in Equation (6). Equation (3) is the benchmark that has the highest similarity.
Here, $d$ denotes the measurement method used in the privacy measurements, such as Euclidean distance. The calculation of FL privacy and the choice of $d$ are beyond the scope of this paper.

We first take industrial equipment health monitoring as a typical example of HFL to describe the FL procedure (Figure 1) with the above terminology. It uses a common centralized architecture; the decentralized architecture is described in Section III. Suppose that $N$ companies are participating in FL, that is, there are $N$ learners. The basic learning steps are as follows [19]:
1) The organizer chooses an FM and initializes its parameters.
2) The organizer calls the FM transmitter to send the FM to all the learners participating in the learning.
3) The FM receiver of $L_i$ ($i \in [1, N]$) receives and stores it.
4) $L_i$ calls the trainer to train $LM_i$ with local data and the FM.
5) $L_i$ calls the LM transmitter to send $LM_i$ to the organizer.
6) The LM receiver receives each $LM_i$.
7) The optimizer updates the FM with the aggregation algorithm and the received LMs.
8) Repeat steps 2 to 7 until convergence.

Second, we describe a typical example of applying VFL. Suppose that dealer A and company B want to build a sales forecast model for company B's products based on the data owned by both parties. We denote dealer A as $L_A$ and company B as $L_B$. The sales data owned by dealer A are denoted as $D_A$, which can be represented by $(SI_A, X_A, Y_A)$. The product processing data owned by company B are denoted as $D_B$, which can be represented by $(SI_B, X_B)$. The basic learning steps are as follows [19]:
1) $L_A$ and $L_B$ align the sample data by sample id.
2) $L_A$ and $L_B$ choose an FM. $L_A$ initializes part of the parameters of the FM according to $D_A$, that is, $LM_A$. $L_B$ initializes part of the parameters of the FM according to $D_B$, that is, $LM_B$. $LM_A$ and $LM_B$ together make up all the parameters of the FM.
3) $L_B$ calls the LM trainer for a round of training and sends the result $M_B$ to $L_A$, and $L_A$ calls the LM trainer for a round of training and sends the result $M_A$ to $L_B$ [3].
4) $L_A$ calculates the loss $LS$ and the gradient $G_A$ with $M_A$ and $M_B$. $L_B$ calculates the gradient $G_B$ with $M_A$ and $M_B$.
5) $L_A$ updates $LM_A$ with $G_A$, and $L_B$ updates $LM_B$ with $G_B$.
6) Repeat steps 3 to 5 until convergence.

Third, a typical example of applying FTL is as follows. Transfer learning aims at shifting knowledge from existing domains to a new domain. When dealer A has sold only a small number of company B's products (Figure 1), dealer A and company B still want to build a sales forecast model for company B based on the data owned by both parties. The learning process of FTL is similar to that of VFL, except that the details of the intermediate results exchanged between A and B are changed [3] [20].

These three kinds of FL mechanisms can help all participants in the above examples make full use of the original data of federation members to realize intelligence sharing based on large-scale data, while protecting the privacy of the original data.

In this section, we present a comprehensive review and analysis of the fundamental studies on FL in the past two years, excluding studies that integrate other learning paradigms such as unsupervised learning [21] [22].

Data partitioning is significant in the learning process. HFL is the most commonly adopted approach in both cross-device and cross-silo scenarios, where data can hardly be centralized due to privacy or legal concerns. Cross-device FL aims to train application-centered models through the collaboration of a large-scale distributed network with a massive number of smart devices, whilst cross-silo FL does not allow data to be shared between the organizations involved [23] [24]. In the cross-device setting, HFL handles the situation in product/service design where data analysis is integrated as a feature of a personalized product but raises data privacy concerns, e.g., Google's mobile virtual keyboard prediction [25] and device failure detection [26].
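As a rough illustration of the centralized HFL procedure described in Section II (steps 1 to 8), the following Python sketch simulates an organizer and several learners in a single process. It is a minimal sketch under stated assumptions, not the implementation of any surveyed system: the linear model, the squared-error loss, the synthetic data, the sample-size-weighted averaging (in the spirit of FedAvg) and all function names are hypothetical choices made for illustration, and "transmission" is replaced by plain function calls.

```python
import numpy as np

def local_train(fm, X, y, lr=0.1, epochs=5):
    """Steps 3-4: a learner copies the received FM and refines it on local data."""
    w = fm.copy()
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)  # gradient of the mean squared error
        w -= lr * grad
    return w  # step 5: only the local model leaves the learner, never X or y

def aggregate(local_models, sizes):
    """Step 7: sample-size-weighted average of the received local models."""
    weights = np.asarray(sizes, dtype=float)
    return np.average(np.stack(local_models), axis=0, weights=weights)

def run_hfl(datasets, dim, rounds=50):
    fm = np.zeros(dim)                                       # step 1: initialize the FM
    for _ in range(rounds):                                  # step 8: repeat
        lms = [local_train(fm, X, y) for X, y in datasets]   # steps 2-6
        fm = aggregate(lms, [len(y) for _, y in datasets])   # step 7
    return fm

def make_learners(n_learners=3, n_samples=40, seed=0):
    """Hypothetical synthetic data: every learner observes the same linear relation."""
    rng = np.random.default_rng(seed)
    true_w = np.array([2.0, -1.0])
    data = []
    for _ in range(n_learners):
        X = rng.normal(size=(n_samples, 2))
        y = X @ true_w + 0.01 * rng.normal(size=n_samples)
        data.append((X, y))
    return data, true_w

datasets, true_w = make_learners()
fm = run_hfl(datasets, dim=2)
print("federal model:", fm)
```

In this toy setting every learner shares the same underlying relation, so the averaged model ends up close to the weights used to generate the data; with Non-IID data the plain average can behave considerably worse, which is the issue the model optimization studies discussed later address.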
In the cross-silo setting, HFL has been applied to the case when organizations share the same ML problems but under restricted data sharing policies, e.g., COVID-19 detection using diagnostic images from different medical institutions [27] . VFL is usually considered in the cross-silo setting when two organizations have the shared set of sample data but different ML objectives [3] [28] [29] [30], e.g., between a bank and an insurance company located in the same city, or between smart refrigerators and smart air conditioners produced by different manufacturers. In the cross-silo setting, when the participating organizations (usually only two organization involved) only have the partial shared set of sample space or feature space, transfer learning techniques [31] can be adopted in FL to train models collaboratively [31] [32][33] [34] . Data privacy is still the major challenge of FL since it is possible to leak private information through analysis on updates of local model parameters or gradients [6] [35] . There are mainly two ways to address this issue: secure multiparty computation and differential privacy. Homomorphic encryption is a technique to realize secure multiparty computation, which only allows the central server to conduct homomorphic computing based on the encrypted local model updates [36] [37] . Trusted Execution Environment can empower the detection of dishonest actions (e.g., tampering with client models, delaying local training, etc.), to guarantee the integrity of FL processes [38] . Differential privacy is often used to protect client data privacy by adding noise to model parameter data sent by each client [39] . Additionally, a hybrid approach combining secure multiparty computation with differential privacy is explored in [40] . Recently, Blockchain is used to share data generated and used in the model training, and clients can control the access to shared data [30] . 
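The differential privacy idea mentioned above, adding noise to the model parameter data each client sends, can be illustrated with a small Python sketch. This is only a sketch under assumptions: norm clipping followed by Gaussian noise is a common pattern and is not necessarily the scheme used in [39]; the function name and the `clip_norm` and `noise_multiplier` parameters are hypothetical, and choosing the noise level to meet a stated privacy budget is not covered here.

```python
import numpy as np

def privatize_update(update, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """Clip a client's update to a maximum L2 norm, then add Gaussian noise.

    Assumption: the noise standard deviation is noise_multiplier * clip_norm.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(update)
    # Scale the update down only if its norm exceeds clip_norm.
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=update.shape)
    return clipped + noise

# A client would apply this to its update before sending it to the server.
raw_update = np.array([3.0, 4.0])
noisy_update = privatize_update(raw_update, rng=np.random.default_rng(0))
print(noisy_update)
```

Clipping bounds how much any single client can influence the aggregate, and the added noise masks the individual contribution; a larger `noise_multiplier` hides more but degrades the aggregated model more.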
In such blockchain-based schemes, a directed acyclic graph can be incorporated to improve the efficiency of data sharing, while an asynchronous FL scheme can minimize the total cost [41]. Further, model updates can be directly exchanged and verified on-chain [42], which requires clients to be separated into groups, with each group assigned a miner to gather the model updates. Another option is to store the original global updates off-chain and save only a pointer to the global updates on-chain to improve efficiency [43].

The federated averaging (FedAvg) algorithm, proposed by Google [44], is the first and most well-known FL algorithm; it aggregates local model updates sent from clients into a federal model. However, FedAvg fails to achieve satisfactory model and system performance when the datasets produced by different clients are not independent and identically distributed (Non-IID) and the communication cost is high [45]. To solve this issue, particularly in the context of industrial IoT [46], algorithm optimization plays an important role in FL. A centroid distance based FedAvg approach is proposed that uses the centroid distance between classes as a metric of data heterogeneity and takes it into account in the weighted averaging [26]. Bounds expanding is used to handle data skew; it extends the bounds of each dataset by exchanging some data to make the data distributions similar [47]. A self-organized FL framework is proposed in [48], where the server is able to recognize heterogeneity and schedule a stable collaboration plan for client selection. An optimal tuning on the distributed training set is achieved by a collaborative teaching approach, and models are trained on this tuning for better performance [49].

FL itself adopts a distributed topology via collaboration among participating clients in ML. However, it still maintains a settled centralized architecture where a server is required for model aggregation and distribution. Some studies investigate further decentralization to overcome the restrictions of a fixed server-client architecture. The approach in [50] removes the central server, and clients need to communicate with each other to update the model in each round. The whole network can be split into several subsets, each responsible for a certain part of the expected model [51]. Gossip learning is also considered as a decentralized alternative to FL [52]. In addition, blockchain can be exploited as a component to enable a decentralized infrastructure in FL [53] [54].

Figure 1. The general FL implementation platform

Because model updates are uploaded for aggregation by client devices that may have slow connections to the server, it is valuable to improve the communication efficiency between clients and the server. The initial research focused on the synchronous update scheme [44]. In each epoch, some clients are randomly selected, and the server sends the current federal model to each of these clients. Then, each client performs local training based on the federal model and its local dataset, and sends updates to the server. The server then updates the federal model with these updates, and the process repeats. Asynchronous aggregation updates the federal model asynchronously to reduce the response time of the server [27] [45] [55] [56]. Sparse ternary compression is proposed to satisfy high-frequency and low-bit-width communication; it compresses both upstream and downstream communications and enables optimal Golomb encoding of the weight updates [57]. Lyapunov optimization-based load balancing is used to reduce communication overhead [58]. To reduce the number of uploads that are irrelevant to the improvement of the federal model, each client receives a global tendency of model updating as feedback and checks its updates against the global tendency.
If client model updates do not align with the global tendency, the client will not upload the upgrades to the server [59] . The concept of personalized FL emerged to reduce heterogeneity and preserve the high-quality of client contributions. In order to tackle the challenges of device heterogeneity, statistical heterogeneity and model heterogeneity, an effective method is to implement personalization in device, data and model levels to reduce heterogeneity and obtain highquality personalized models for each device. Researchers from Google proposed three approaches to FL personalization [60] : 1) user clustering where the clients are divided into different groups and collectively train a model for each group; 2) data interpolation in which some data is shared as global data, and a model is trained using both local and global data; 3) model interpolation that combines the learned and optimized models. Based on these methods, a synergistic cloud-edge framework is proposed, which allows each client to offload its computationally intensive learning task to the edge [61] . Besides the mixture of local and federal models, the efficient optimization of communication shows better performance on convergence [62] . Furthermore, the Model Agnostic Meta Learning framework is similar to the personalization of FL, and can be used for the interpretation of existing FL algorithms [63] [64] . Incentive mechanisms are considered as an effective way to ensure the long-term stability of FL and motivate clients to provide learned models with higher quality. Data size and quality can be considered in the design of incentive mechanisms [26] . With a limited budget, incentives given to clients can be designed by computing solutions for payoffsharing with instalment [65] . Furthermore, the theory of Stackelberg game can be applied, in which the central server is a buyer for training service provided by clients [66] . 
Clients can decide the CPU power for gradient calculation based on the given incentive. To ensure both clients' enthusiasm and the quality of the aggregated model with diverse metrics, three kinds of fairness (i.e., contribution fairness, regret distribution fairness, and expectation fairness) are taken into account to optimize the collective utility while minimizing the corresponding inequalities [65]. A reputation mechanism has been shown to be a feasible way to ensure the trustworthiness of clients; it can record reputation histories on a blockchain, which provides tamper resistance in a decentralized manner [67]. Blockchain can also be leveraged for voting on clients' rewards, where the clients chosen in the current round vote on the previous model updates [68].

As FL involves multiparty computation to gather model updates for optimization, developing a user-friendly platform can ease operations and maintenance [69] [70]. There are several mature FL platforms from industry, including Federated AI Technology Enabler (FATE), TensorFlow Federated (TFF), OpenMined PySyft, PaddleFL, and LEAF. Further, Flower is an open-source framework for practitioners to conduct experiments and implement their federated learning schemes [71].

In edge computing scenarios, various devices and cloud servers are coordinated to maintain communication and data analysis, which requires computing power, data storage and bandwidth. Therefore, a unified testbed is required to support the development of FL systems in complicated scenarios. Edge AIBench is proposed by BenchCouncil for edge AI benchmarks [72].

The emergence of FL has brought many opportunities to ML for IIoT, but it also faces more challenges. Based on the application of fundamental FL research to IIoT, we emphasize some future works that deserve further investigation.
• Privacy preservation. Quantifying data privacy exposure has not been fully studied. Current research focuses on learning accuracy and does not study data privacy measurement. We believe that it is necessary to establish a mechanism to evaluate data privacy exposure, as is done for model accuracy. Meanwhile, learners have different data privacy needs, but current approaches are limited to a single level of privacy protection for all learners.
• Model evaluation criteria. Current model evaluations are all based on a third party and lack a universal and unified evaluation standard, such as representative data sets for evaluation, load, etc. Therefore, establishing a benchmark for FL is an important direction.
• Personalization. The storage, computing, and communication capabilities of each client device in the federal network may vary due to differences in hardware, network connections, and power. Due to connectivity or energy constraints, it is also common for client devices to lose communication during an iteration. These bring challenges to straggler mitigation and fault tolerance. The differences in equipment and data collection methods violate the independent and identically distributed assumption, and may increase the complexity of problem modeling and theoretical analysis.
• Incentive mechanism. There is currently a lack of effective incentive mechanisms in FL, such as contracts under which more work earns more reward. Asynchronous methods can be difficult to combine with technologies such as differential privacy or secure aggregation. Standard FL is usually hosted and operated by a central server, and this centralized mode has drawn criticism. A higher level of decentralization can be further studied to alleviate this problem and to support fairness in coordination among multiple parties within industrial IoT.
• Platform and tools. A comprehensive platform is needed to cover the functional requirements of raw data processing, model storage, model training, model transportation, aggregation algorithms, data privacy preservation, incentive mechanisms, personalization, etc.
• Security. FL is still vulnerable to some attack models such as inference attacks and poisoning attacks [17]. Adversaries can upload malicious updates to the server for aggregation, which may have a significant impact on the federal model. Curious or malicious servers can easily use the shared computing power to build malicious tasks in the federal ML model. Adversaries can partially reveal each participant's original training data from the local models it uploads. Emerging challenges still exist when applying FL to IIoT.

IV. FL-BASED APPLICATIONS

Figure 2 illustrates how FL could be applied to product life cycle management under the concept of Industry 4.0. FL is expected to be widely applied to harvest powerful intelligence for enhancing product life cycle management (PLCM) as Industry 4.0 is implemented more deeply. In the product R&D phase, market demand discovery and product innovation can be devised based on FL. In the production phase, FL paves the way for using industrial big data across enterprises to achieve effective and efficient utilization of manufacturing resources such as energy, devices, manpower and tools. In the marketing phase, FL can improve product marketing efficiency through the analysis of market data contributed by federal members. The FL-transformed manufacturing paradigm offers a broad spectrum of ways to utilize FL.

Following the conditions for applying FL described in Section II, we provide our analysis and summary of important FL applications reported in the past two years, organized by application area. Few applications span the whole of PLCM, and more attention should be paid to utilizing FL in IIoT.
Table II summarizes the related literature. Zhang et al. [26] proposed an FL method based on blockchain to detect device failures in IIoT. A platform architecture of FL system based on blockchain is designed, which supports verifiable integrity of client data. Each client periodically creates a Merkle tree where each leaf node represents a client data record and the root is stored on the blockchain. Moreover, a new centroid distance weighted federated averaging (CDW_FedAvg) algorithm is proposed to solve the data heterogeneity, which considers the distance between positive and negative classes of each client dataset. Ge et al. [73] gave the empirical research results of FL based production line fault prediction. Federated support vector machine (SVM) and federated random forest (RF) algorithms for HFL and VFL are designed respectively. An experimental process is proposed to evaluate the effectiveness of FL and centralized learning algorithms. It is found that there is no significant difference in the performance of between FL and centralized learning algorithms on global test data, random partial test data and estimated unknown Bosch data. Zhang et al. [74] designed an FL method for machinery fault diagnosis based on DL. A dynamic verification scheme based on FL framework is proposed to adjust the model aggregation process adaptively, which ignores the low quality data of some clients. Furthermore, a self supervised learning scheme is proposed to learn structural information from limited training data. This scheme has dual effects of data augmentation and multi task learning. Experiments on two rotating machinery datasets show that this method provides a promising FL method for fault diagnosis. However, there is still a significant gap between the proposed method and the traditional centralized training method with the Non-IID. Edge device failures seriously affect the production of industrial products in IIoT. In order to solve this problem, Liu et al. 
[75] proposed a new communication-efficient on-device FL-based deep anomaly detection framework for sensing time-series data in IIoT. It enables distributed edge devices to train an anomaly detection model cooperatively, so as to improve its generalization ability. An attention mechanism-based CNN-LSTM (AMCNN-LSTM) model is proposed to detect anomalies accurately. It uses a CNN module based on the attention mechanism to capture important fine-grained features, so as to prevent memory loss and gradient dispersion, and an LSTM module to detect anomalies accurately and in a timely manner. A gradient compression mechanism based on Top-k selection is proposed to improve the communication efficiency and meet the timeliness requirements of industrial anomaly detection.

Table II. The 'Bas' in the PP column denotes that the application applies the basic privacy preservation built into FL. The 'SM' in the Ben column denotes that the application does not use a standard benchmark, but a self-made benchmark instead.

The digital twin in IIoT maps the running state and behavior of devices to the digital world in real time. By considering the deviation between the digital twin and the actual value of the device state in a trust-weighted aggregation strategy, Sun et al. [76] quantified the contribution of devices to the global aggregation of FL, which improves the reliability and accuracy of the learning model. Based on a deep Q network (DQN), an adaptive calibration method for the global aggregation frequency is proposed, which minimizes the loss function of FL under a given resource budget and realizes a dynamic tradeoff between computing energy and communication energy in a time-varying communication environment. To further adapt to the heterogeneous IIoT, an asynchronous FL framework was proposed, which eliminates the straggler effect through node clustering and improves the learning efficiency through an appropriate time-weighted inter-cluster aggregation strategy.
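The Top-k gradient compression mentioned above for the anomaly detection framework of Liu et al. [75] can be illustrated generically in Python. The details of their mechanism are not given in this survey, so the following is only a sketch of plain Top-k sparsification; the function names are hypothetical.

```python
import numpy as np

def top_k_sparsify(grad, k):
    """Keep the k entries of largest magnitude; return their flat indices and values."""
    flat = grad.ravel()
    k = max(1, min(k, flat.size))
    # argpartition places the k largest-magnitude entries in the last k positions.
    idx = np.argpartition(np.abs(flat), flat.size - k)[flat.size - k:]
    return idx, flat[idx]

def densify(idx, values, shape):
    """Server side: rebuild a dense gradient with zeros at the dropped positions."""
    out = np.zeros(int(np.prod(shape)))
    out[idx] = values
    return out.reshape(shape)

grad = np.array([[0.1, -5.0], [0.3, 2.0]])
idx, values = top_k_sparsify(grad, k=2)       # the client sends only 2 of 4 entries
restored = densify(idx, values, grad.shape)   # the server reconstructs a sparse gradient
print(restored)
```

Only the selected indices and values need to be transmitted, so the upload size scales with k rather than with the model size. Practical schemes often accumulate the dropped entries locally and add them to later updates; that refinement is omitted here.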
The asynchronous framework determines the clustering frequencies of different clusters through DQN-based adaptive frequency calibration.

Li et al. [77] created an FL-based intrusion detection model named DeepFed, built with a CNN and a gated recurrent unit (GRU), to detect network threats against industrial cyber-physical systems. The designed FL framework allows multiple industrial cyber-physical systems to establish a comprehensive intrusion detection model in a privacy-preserving way. A secure communication protocol based on the Paillier cryptosystem was designed to keep the model parameters secure and private throughout the training process. Experiments on a data set from a real industrial cyber-physical system show that the model is highly effective in detecting various types of network threats in industrial cyber-physical systems.

Liu et al. [33] proposed a learning architecture for cloud robotic system navigation, lifelong federated reinforcement learning (LFRL). LFRL enables navigation-learning robots to use prior knowledge effectively and adapt to a new environment quickly. A knowledge fusion algorithm (KFA) was designed for upgrading the shared model deployed on the cloud, and transfer methods were introduced. LFRL is consistent with human cognitive science and suitable for cloud robotic systems. Liu et al. [78] proposed an imitation learning framework for cloud robotic systems with heterogeneous sensor data, called federated imitation learning (FIL). FIL can use the knowledge of other robots in the cloud robotic system to improve the efficiency and accuracy of local robots' imitation learning. In addition, a KFA based on RGB images, depth images and semantic segmentation images was proposed, and a transfer method was introduced in FIL.

In industrial working environment monitoring, it is very important yet difficult to follow the changing trend of time-series monitoring data when they come from different types of sensors and are collected by different companies.
FL structure can not only keep the data privacy but also extract and fuse the trend features of time-series monitoring data of multi-sensors. Hu et al. [79] considered the conduction model and feature aggregation framework in FL, and proposed a trend following method to put all the fusion features of the multi-sensor timeseries monitoring data into the echo state network to realize the multi-sensor electromagnetic radiation intensity time-series monitoring data sampling of the actual mine. Protecting highly sensitive information is the shared responsibility of all parties including hospitals, AI companies, and corresponding regulatory agencies. Chen et al. [80] proposed the first FTL framework for wearable healthcare -FedHealth. FedHealth can achieve accurate and personalized healthcare without compromising privacy security. Xiong et al. [81] established a cross-silo federal drug discovery learning framework based on FATE for predicting drug-related properties and solving the dilemma of small and biased data in drug discovery. Pfohl et al. [82] studied the efficacy of centralized learning and FL in private and non-private environments. The clinical prediction tasks are to predict the prolonged length of stay and the mortality rate of thirty-one hospitals. They found that while training in a centralized setting, differential private stochastic gradient descent can be directly applied to achieve a strong privacy boundary, it is much more difficult to do so in a federated setting. Huang et al. [83] introduced a community-based federated learning (CBFL) algorithm. The algorithm clusters distributed data into clinically meaningful communities that capture similar diagnoses and geographic locations, and learns a model for each community. Li et al. [84] studied the feasibility of applying differential privacy to protect patient data in an FL setting. An FL system was implemented and evaluated for brain tumor segmentation on the BraTS dataset. Duan et al. 
[85] proposed a joint cloud video recommendation framework based on deep learning -JointRec. It integrates the JointCloud architecture into the mobile IoT to realize joint training among distributed cloud server for video recommendation. Qi et al. [86] proposed a FedNewsRec framework to coordinate a large number of users, and jointly train an accurate news recommendation model from the behavior data of these users without uploading raw data. Muhammad et al. [87] introduced a federated collaborative filtering (FCF) method for personalized recommendations. This method federates the standard collaborative filtering (CF) with stochastic gradient descent. Hartmann et al. [88] introduced an FL system built for use in Firefox. Users can type half a character less to find what they want. Samarakoon et al. [89] proposed a distributed, FL-based, joint power and resource allocation (FL-JPRA) framework for enabling ultra-reliable and low-latency vehicular communication. An FL mechanism is proposed in which vehicular users partially estimate the tail distribution with the help of roadside units. Liu et al. [90] proposed an FL-based recurrent unit neural network algorithm (FedGRU) for predicting traffic flow. Because of the low cost and easy implementation of localization based on received signal strength fingerprints (RSSFs), many studies have been conducted. It has promoted the emergence of many commercial applications based on localization services. Ciftler et al. [91] proposed a localization technology based on FL and RSSFs (FL-RSSF) to provide privacy-preserving crowdsourcing for localization. A new collaborative positioning and location data processing framework, FedLoc, is proposed, and all the building blocks required to build this framework were reviewed [92] . They put more efforts into the actual user cases of FedLoc and their implementation. Bakopoulou et al. 
[93] applied a Federated SVM (F-SVM) for Mobile Packet Classification, which allows mobile devices to collaborate and train global models without sharing the original training data. A reduced feature space, HTTP key, is proposed, which limits the sensitive information shared by users. Liu et al. [94] proposed a blockchain-based payment system, FedCoin, to enable FL. It can mobilize free computing resources in the community to perform the expensive computing tasks required by the FL incentive plan. FedCoin can correctly determine the contribution of the FL client to the global FL model based on the Shapley value, and has an upper limit on the computing resources required to reach an agreement. In this information age, the continuous generation of data has brought the problem of finding a needle in a haystack to determine useful data from a bunch of irrelevant data. Doku et al. [95] proposed a consensus mechanism called proof of common interest (PoCI) to store the most relevant data found when users interact with mobile devices by combining the trust mechanism of blockchain and FL. Through joint learning, the challenge of using image data owned by different organizations to establish an effective visual target detection model is solved. Liu et al. [96] built a FedVision platform, an end-to-end ML engineering platform that supports the easy development of FLpowered computer vision applications. The challenge of using image data owned by different organizations to establish an effective visual target detection model is solved with FL. How to accurately detect and classify targets and perfectly combine the corresponding virtual content with the real world is a major challenge for AR technology. Chen et al. [97] proposed a framework combining FL and MEC, FL-MEC, to solve the corresponding challenge. The amount of labeled data collected in smart cities is small, and there is a lot of unlabeled data. Albaseer et al. 
[98] proposed a semi-supervised federated edge learning method, called FedSem, to utilize unlabeled data in smart cities. FedSem can use unlabeled data to improve learning performance even when the ratio of labeled data is low. Saputra et al. [99] proposed a federated energy demand learning method that allows charging stations to share their information without exposing their real datasets. A cluster-based energy demand learning method is applied at charging stations to further improve the accuracy of energy demand prediction. Nguyen et al. [100] developed DÏoT, a federated self-learning anomaly detection system for IoT, which uses the unlabeled crowdsourced data captured in the customer's IoT to learn anomaly detection models autonomously. Leroy et al. [101] studied a resource-constrained wake word detector trained with FL on crowdsourced speech data. Using an adaptive averaging strategy instead of standard weighted model averaging can greatly reduce the number of communication rounds required to reach the target performance. Liang et al. [102] presented an online federated reinforcement learning transfer process for real-time knowledge extraction, in which all participants take corresponding actions based on the knowledge of others.
As illustrated by the FL-transformed manufacturing in Figure 2, FL could be applied to the entire product life cycle. FL also gives small-data users (such as SMEs) an opportunity to make full use of intelligence. Specifically, to our understanding, FL could be seamlessly integrated into the following industrial applications:
• Product recommendation systems. In the non-FL setting, manufacturers can make product recommendations based only on their own sales. Companies could obtain more accurate recommendation services if they use an FL mechanism to train the recommendation model.
• Industrial equipment health monitoring. Modern industrial equipment is being connected to the Internet via IoT, and its health status can be monitored by big data intelligence. However, few companies have enough data to support data intelligence. In this case, industrial companies with similar equipment can apply an FL mechanism to harvest federated intelligence for monitoring equipment health more accurately.
• AR/VR-guided operations. AR/VR has been widely used in industry, for example in remote operation guidance, virtual assembly, and machine operation training. Industrial companies can use FL strategies to train optimal models that improve the accuracy of object detection.
• Precise robotics collaboration. Traditional RFID-based positioning accuracy is not high. RSSF positioning based on FL can achieve higher accuracy. This FL-enhanced precise positioning can be applied to robotics collaboration.
• Industrial environmental monitoring. It is very important yet difficult to track time-series monitoring data on the industrial environment collected by different types of sensors and different companies. At the same time, the privacy of data on the operating environment needs to be protected. An FL strategy can be used to solve such problems.
• Product defect detection. DL has broad application prospects in the field of automatic detection. One of the biggest challenges of applying DL-based methods to product defect detection is the lack of data samples for the defect classification task. Multiple enterprises that produce similar products can be attracted to join FL to realize sample expansion.
• Optimal supply chain scheduling. Traditionally, sales forecast data across regional distributors and industry associations are private. To realize efficient supply chain scheduling, manufacturers can encourage suppliers to participate in FL to extract the optimal model for predicting demand orders, supply quantity, inventory, and supply schedule.
• Generative product design. The design data of different companies are available only to themselves for privacy reasons. To shorten the design cycle and reduce design iterations, FL is expected to let companies optimize the generative product design process across enterprises based on the modeling of the human/machine/material resources in each enterprise.
• Security. Most existing AI intrusion detection schemes for IIoT are designed on the strong assumption that there are always enough high-quality network attack instances for IIoT [77]. However, in real-world scenarios, a company usually has only a limited number of attack cases, which makes building a model a great challenge. In addition, companies are usually reluctant to share such attack instances (including normal behavior instances) with third parties, because these data always involve highly sensitive information. Intrusion detection schemes based on FL can be used to solve this problem.
In this paper, we revisit FL from the perspective of Industry 4.0, emphasizing its application in advancing intelligent manufacturing. To facilitate a common understanding of the FL paradigm, we elaborate and update relevant concepts of the roles, algorithms, and tools used in FL, such as learner, organizer, local model, and federal model. With the comprehensive survey, the state of the art of fundamental FL research is analyzed across eight topics, and further work and challenges are presented. Before reviewing the FL applications in more than thirteen economic sectors, we present the paradigm of FL-transformed manufacturing. Clearly, more attention should be paid to investigating the integration of FL into Industry 4.0. Meanwhile, for IIoT researchers and practitioners, we list some industrial areas into which FL could be seamlessly and immediately integrated. Our other findings are summarized as follows:
• Recently, the attention and research on FL have increased exponentially. However, there is not much research on Industry 4.0 and smart manufacturing. This deserves more attention from industrial academia and practice.
• The fundamental research corresponding to the recent applications is distributed over the eight areas, and most of it focuses on data distribution, model optimization, and privacy protection. However, privacy protection lacks a measurement standard, and suitable quantitative evaluation is missing. We initially present and define the problem in this paper. In addition, there are few benchmarks and tool platforms. It can be seen that FL is still in its infancy.
• Almost all the surveyed applications are based on CFL, and most of them are based on HFL. Few are based on VFL and FTL, which need more attention and effort in the future.
• Applications in the IIoT are increasing, such as fault prediction, device failure detection, and cloud robotic systems. However, there is huge potential for FL to accelerate PLCM in the context of IIoT. Other applications mainly fall into the categories of healthcare & medical and recommendation systems. Medical care focuses on drug discovery, medical image processing, privacy preservation of electronic health records, and activity recognition. Recommendations include entertainment, news, videos, and automatic text input in the browser.
[1] Federated optimization: distributed machine learning for on-device intelligence
[2] Recommendations for implementing the strategic initiative Industrie 4.0
[3] Federated machine learning: concept and applications
[4] Federated learning: a survey on enabling technologies, protocols, and applications
[5] Federated learning: challenges, methods, and future directions
[6] A systematic literature review on federated machine learning: from a software engineering perspective
[7] A survey on federated learning systems: vision, hype and reality for data privacy and protection
[8] A survey on security and privacy of federated learning
[9] Revisiting industry 4.0 with a case study
[10] Industrial edge computing, enabling embedded intelligence
[11] A brief history of artificial intelligence: on the past, present, and future of artificial intelligence
[12] Automated design of both the topology and sizing of analog electrical circuits using genetic programming
[13] Deep learning
[14] ImageNet classification with deep convolutional neural networks
[15] Distributed machine learning
[16] MLbase: a distributed machine-learning system
[17] From distributed machine learning to federated learning: in the view of data privacy and security
[18] A blockchained federated learning framework for cognitive computing in industry 4.0 networks
[19] From federated learning to federated neural architecture search: a survey
[20] A secure federated transfer learning framework
[21] Federated self-supervised learning of multisensor representations for embedded intelligence
[22] Towards federated unsupervised representation learning
[23] Advances and open problems in federated learning
[24] Federated AI for the enterprise: a web services based implementation
[25] Federated learning for mobile keyboard prediction
[26] Blockchain-based federated learning for device failure detection in industrial IoT
[27] Dynamic fusion based federated learning for COVID-19 detection
[28] Federated forest
[29] Multi-participant multi-class vertical federated learning
[30] Consortium blockchain for secure resource sharing in vehicular edge computing: a contract-based approach
[31] A survey on transfer learning
[32] Cartel: a system for collaborative transfer learning at the edge
[33] Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems
[34] Quantifying the performance of federated transfer learning
[35] Comprehensive privacy analysis of deep learning: passive and active white-box inference attacks against centralized and federated learning
[36] On data banks and privacy homomorphisms
[37] Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption
[38] A training-integrity privacy-preserving federated learning scheme with trusted execution environment
[39] Differentially private federated learning: a client level perspective
[40] A hybrid approach to privacy-preserving federated learning
[41] Blockchain empowered asynchronous federated learning for secure data sharing in Internet of vehicles
[42] Blockchained on-device federated learning
[43] Decentralized privacy using blockchain-enabled federated learning in fog computing
[44] Communication-efficient learning of deep networks from decentralized data
[45] Asynchronous federated optimization
[46] On the convergence of FedAvg on non-IID data
[47] Approaches to address the data skew problem in federated learning
[48] Realizing the heterogeneity: a self-organized federated learning framework for IoT
[49] Robust federated learning via collaborative machine teaching
[50] BrainTorrent: a peer-to-peer environment for decentralized federated learning
[51] Decentralized federated learning: a segmented gossip approach
[52] Gossip learning as a decentralized alternative to federated learning
[53] A decentralized federated learning approach for connected autonomous vehicles
[54] A blockchain-based decentralized federated learning framework with committee consensus
[55] Communication-efficient federated deep learning with layerwise asynchronous model update and temporally weighted aggregation
[56] Asynchronous online federated learning for edge devices
[57] Robust and communication-efficient federated learning from non-i.i.d. data
[58] CEFL: online admission control, data scheduling, and accuracy tuning for cost-efficient federated learning across edge nodes
[59] CMFL: mitigating communication overhead for federated learning
[60] Three approaches for personalization with applications to federated learning
[61] Personalised federated learning for intelligent IoT applications: a cloud-edge based framework
[62] Adaptive personalized federated learning
[63] Improving federated learning personalization via model agnostic meta learning
[64] Personalized federated learning: a meta-learning approach
[65] A fairness-aware incentive scheme for federated learning
[66] Motivating workers in federated learning: a Stackelberg game perspective
[67] Incentive mechanism for reliable federated learning: a joint optimization approach to combining reputation and contract theory
[68] Mechanism design for an incentive-aware blockchain-enabled federated learning platform
[69] FL-QSAR: a federated learning based QSAR prototype for collaborative drug discovery
[70] Machine learning for all: a more robust federated learning framework
[71] Flower: a friendly federated learning research framework
[72] Edge AIBench: towards comprehensive end-to-end edge computing benchmarking
[73] Failure prediction in production line based on federated learning: an empirical study
[74] Federated learning for machinery fault diagnosis with dynamic validation and self-supervision
[75] Deep anomaly detection for time-series data in industrial IoT: a communication-efficient on-device federated learning approach
[76] Adaptive federated learning and digital twin for industrial Internet of things
[77] DeepFed: federated deep learning for intrusion detection in industrial cyber-physical systems
[78] Federated imitation learning: a privacy considered imitation learning framework for cloud robotic systems with heterogeneous sensor data
[79] Model and feature aggregation based federated learning for multi-sensor time series trend following
[80] FedHealth: a federated transfer learning framework for wearable healthcare
[81] Facing small and biased data dilemma in drug discovery with federated learning
[82] Federated and differentially private learning for electronic health records
[83] Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records
[84] Privacy-preserving federated brain tumour segmentation
[85] JointRec: a deep-learning-based joint cloud video recommendation framework for mobile IoT
[86] Privacy-preserving news recommendation model learning
[87] Federated collaborative filtering for privacy-preserving personalized recommendation system
[88] Federated learning for ranking browser history suggestions
[89] Distributed federated learning for ultra-reliable low-latency vehicular communications
[90] Privacy-preserving traffic flow prediction: a federated learning approach
[91] Federated learning for localization: a privacy-preserving crowdsourcing method
[92] FedLoc: federated learning framework for data-driven cooperative localization and location data processing
[93] A federated learning approach for mobile packet classification
[94] FedCoin: a peer-to-peer payment system for federated learning
[95] Towards federated learning approach to determine data relevance in big data
[96] FedVision: an online visual object detection platform powered by federated learning
[97] Federated learning based mobile edge computing for augmented reality applications
[98] Exploiting unlabeled data in smart cities using federated edge learning
[99] Energy demand prediction with federated learning for electric vehicle networks
[100] DÏoT: a federated self-learning anomaly detection system for IoT
[101] Federated learning for keyword spotting
[102] Federated transfer reinforcement learning for autonomous driving