key: cord-0195854-6y9x2rpt authors: Liu, Boyi; Yan, Bingjie; Zhou, Yize; Wang, Jun; Liu, Li; Zhang, Yuhan; Nie, Xiaolan title: A Real-time Contribution Measurement Method for Participants in Federated Learning date: 2020-09-08 journal: nan DOI: nan sha: 8f312c9c01668b1133b8844250d16228db4136fd doc_id: 195854 cord_uid: 6y9x2rpt In recent years, individuals, business organizations or the country have paid more and more attention to their data privacy. At the same time, with the rise of federated learning, federated learning is involved in more and more fields. However, there is no good evaluation standard for each agent participating in federated learning. This paper proposes an online evaluation method for federated learning and compares it with the results obtained by Shapley Value in game theory. The method proposed in this paper is more sensitive to data quality and quantity. Today, there are many models that support smart behavior on mobile devices. For example, image classification can be used on mobile phones and tablets to predict. Its photos are most likely to be viewed or shared multiple times in the future. You can also use the language model to predict the next word or even the entire reply [1] . Thereby it improves the speech recognition and text input functions on the touch screen keyboard. In addition, service models such as insurance companies that use data models to integrate information in order to obtain better feedback also use related models [2] . There are some problems in these industries or smart terminals. That is, they all have more or less information leakage and privacy sensitivity in the process of achieving their goals. In image classification and language models, all photos taken by users and potential training data for everything they type on their mobile keyboards (including passwords, messages, etc.) can be sensitive to privacy. In the insurance industry, many companies have been concerned about protecting their data and are reluctant to share it with other entities. And by introducing federated learning(FL), data from multiple agents can be used to solve privacy issues in machine learning. Federated learning(FL) is the equivalent of creating an information system. It can be used by multiple agents to build models while protecting data privacy for participants [3] . Federated learning is not a direct transfer of data to a centralized data warehouse used to build machine learning models. Instead, it allows agents to have data in their place, and still allows agents to build machine learning models together. This can be achieved in the following ways. By building a central model from the sub-models built by the agents, only the model parameters are transmitted. Or by using encryption technology to allow secure communication between different agents. In order to make federated learning this system works effectively. It is necessary to encourage different agents to contribute their data and join collaborative alliances. In this process, credit allocation and contribution reward mechanisms are essential for federated learning to motivate current and potential participants. Fairly measuring the contribution of all agents in federated learning can achieve fair distribution. It plays a key role in the validity and practicability of the model for the data contributed by all agents. Therefore, we need a way to fairly measure overall data quality. In order to determine the contribution of all agents involved in model training. We introduce a contribution incentive mechanism combined with federated learning to achieve this goal. It can be well optimized for federated learning. In the distributed case, the attention mechanism of model aggregation is used to calculate the attention weight of all agents in federated learning. Each agent trains locally and uploads the parameters to the server. The server then adjusts to the central model and distributes it to the agents. In the context of the attention mechanism, "attention" is assigned to each agent. The server then updates the parameters based on the "attention" of each agent. In this way, we can calculate the impact of each agent on the server, and get the impact of each agent on each layer of the server model. Then, the initial and final values of the server parameters are used to obtain the changed value of each parameter of the server. Finally, a mathematical formula is established to solve using the attention-based RNN model [4] . The evaluator can compare the ideal results in the data transfer update to determine the contributions of all agents. And evaluate the contribution of each agent. The method proposed in this paper is an important attempt to study model contributions and reward allocation in the framework of federated learning. In this paper, we introduce the model aggregation attention mechanism in the framework of federated learning. So as to determine the contribution of all agents. The advantages of proposed method over the previous federated learning original paper simple sampling and averaging method are: 1) consider the relationship between the server model and the agents model and their weights, 2) optimize the server model and the agents model in the parameter space distance to learn a generic server model. Our contributions in this paper are as follows: • We propose a method to measure agents' contribution and compare this method with Shapley Value. • The method we propose is sensitive to data volume and data quality, and can be used for mutual comparison between agents. • In the training process, the contribution to each agent can be obtained in time, with low computational complexity. This paper refers to federated learning, contribution evaluation mechanism, attention mechanism. Companies applying federated learning can apply this contribution incentive mechanism in the case of commission distribution based on agents contribution. The training data will be distributed on each mobile device, not all of them will be sent to the central server, and only the updated data on each device will be aggregated to the central server. After federated optimization, the central server returns to the global state of each device, and continues to accept the updated model parameters calculated by each agent in the new global state. This method is Federated Learning [5] . Federated Learning [6] can solve the problem of unprotected large-scale private data and complete updating learning of devices without exchanging large amounts of data. This centralized training model approach provides privacy, security, regulation, and economic benefits [7] . Federated Learning presents new statistical and system challenges when training models on distributed device networks [8] . Federated Learning, which relies on scattered data, brings many aspects of research: Fei Chen et al. identified the combination of Federated Learning and Meta-learning as a major advance in Federated Learning [9] . Konstantin Sozinov et al. have made some progress in applying Federated Learning to human activity identification [10] . The way to assign different excitation by measuring the contribution of different agents to global optimization is called contribution mechanism. Guan Wang et al. measured the contributions of all agents from the perspectives of horizonal federated learning and vertical federated learning [11] . They used the deletion method to calculate the influence of grouping instances on the horizontal federated learning and the Shapley value on the vertical federated learning to calculate the characteristic values of grouping, which equitably and accurately realized the measurement of contributions of all agents. We effectively optimize Federated Learning by distributing different incentives to agents based on their contributions. Contribution incentive mechanism has also been widely used and improved. Robin C.goyer et al. proposed a federated learning optimization algorithm based on differential privacy protection of the agents that can hide the contribution of the agents [12] . In order to balance the contribution of the soft training model and ensure the collaborative convergence, Zirui Xu et al. proposed the corresponding parameter aggregation scheme [13] . Shaoxiong Ji et al. also investigated attention mechanisms and proposed a new model of aggregation. This method considers the contribution of the agents model to the global model and combines optimization techniques in the process of server aggregation. Their method minimizes the weighted distance between the server and the agents and improves the communication efficiency [14] . The attention mechanism is just a vector, used for directed perception, derived from the study of computer vision. Our experiment is based on the RNN model of attention. In order to solve the above issues, we propose a corresponding method which does not need to obtain agents data and data scale and can measure agents contribution. This method of measuring agents contributions is suitable for evaluating agents at a particular time point. In this section, we will introduce the basic architecture of federated learning and how parameters are updated. federated learning is a distributed learning method in which the server maintains an overall primary model and distributes it to individual user terminals. For privacy problems, the server can not obtain the data of the user terminal, so the computing power of the user terminal is used to learn at the user local terminal. A server will set up a fraction C, to extract the user terminal proportionally for the server's central model update. Then the extracted user's improved model parameters are uploaded to the server for parameter updating of the server model. It is then distributed to the user terminal to improve the model of the user terminal. In this way, we continue to improve the central model of the server and the local model of the user terminal. Under the premise of ensuring the correctness and privacy of the user terminal, using this method can make use of the computing power of the user terminal and a large amount of user data for learning, and at the same time maintain an excellent central model. Algorithm 1 Federated Averaging Algorithm 1: Server model update 2: K is the total number of agents, w t is the parameters of current central server model, w k t is the parameters of current agents model with index k. 3: Input: server parameters w t at t, agent parameters w k t+1 at t + 1 4: Output: server parameter w t+1 at t + 1 5: Initialize w 0 6: for each round t = 1, 2, · · · do 7: m ← max{C · K, 1} 8: s t ← random set of m agents from all K agents 9: for each k ∈ S t do 10: w k t+1 ← ClientUpdate(k, w k t ) //agents update model on local device 11: end for 12: w t+1 ← ServerOptimization(w t ,w k t+1 ) 13: end for 1: Local model update 2: K is the total number of agents, B is the local minibatch size, E is the number of local epochs, S is a set of all agents, w t is the parameters of current central server model, w k t is the parameters of current agents model with index k, and η is the learning rate. end for 10: end for 11: return w to server In this section, we will introduce a FedAtt algorithm proposed by Shaoxiong Ji, Shirui Pan, Guodong Long et al. , which is outperformed to the common FedAvg and FedSGD algorithm. The proposed attention mechanism in Federated Learning is used after the agents transmitted the parameters to the server, the server uses (1) to calculate the value of attention α of each layer parameter that should be allocated to the agent, and multiplies each layer parameter by the corresponding attention to update the central model. The server integrates the update parameters passed by the agents according to theirs own parameter information, and finally completes an update of the central model through the communication between the server and the agents. The s k l is the norm difference from the central model. We use w l to represent layer l parameters of the server and w l k for lth layer parameters of the terminal model of user k. So the definition of s k l is as follows So for m user terminals that are selected to update the central model, the method of updating becomes To protect the users data privacy, you can add the randomized mechanism before the agent passes parameters to the server. Randomly generate a random vector that obeys the standard distribution N (0, σ 2 ), multiply the corresponding weight β, the results of the final update parameters as shown in Formula(4) The implementation process of the whole algorithm is shown as Algorithm 2 In this section, the method for measuring agents contribution proposed in this paper will be introduced in detail. Before calculating each agent contribution, we make the following assumptions: we think each terminal does not tamper with the updated gradient itself during the previous process of passing and updating parameters between the server and the agents. The agents contribution to the server should be calculated after each layer of agent parameter attention. When the server has calculated the attention of agent K, it can calculate the impact of this agent K on the server model parameters in the T update. We define af t k t as follows: Algorithm 2 Attentive Federated Optimization 1: l is the ordinal of neural layers; is the stepsize of server optimization 2: Input: server parameters w t at t, agents parameters w 1 t+1 , w 2 t+1 , · · · , w m t+1 at t + 1 3: Output: server parameters w t+1 at t + 1 4: procedure ATTENTIVE OPTIMIZATION 5: Initialize attention α = {α 1 , α 2 , · · · , α m } 6: for each layer l in model do do 7: for each agents k ∈ S t from 1 to m do do where γ ∈ (0, 1) is forgetting coefficient, because large variations in the early stage of the training model, the model is not stable, and the impact of the previous endpoint should be reduced after multiple iterations. If the agent k is not in the selected M agents this time, we think that the impact of the agent t round is af t k t = af t k t−1 , that is, only the number of rounds that the agent participates in the update is calculated. Note that the server's attention to the agents are the attention of each layer, so when calculating the impact it also calculates the impact of each layer by reweighting and averaging. We will be each terminal attention MinMaxScaler normalized limited range, then Softmax, to obtain the contribution of each agent. We use con k t to represent the contribution of agent k at t time. The whole method of measuring agents contribution proposed in this paper is shown in Algorithm 3 Algorithm 3 Measure Agents Contributions 1: Input: l is the ordinal of neural layers; is the stepsize of server optimization, server parameters w t at time t + 1, agents parameters w 1 t+1 , w 2 t+1 , · · · , w m t+1 at time t + 1, γ is forgetting coefficient, S t is selected agents set. 2: Output: Agents contributions con t at t 3: procedure CONTRIBUTION CALCULATION 4: for each agents k from 1 to K do 5: if k ∈ S t then 6: af t k t = af t k t−1 9: end if 10: end for 11: con t = Softmax(MinMaxScaler(af t t )) In this section, we will introduce the verification experiments performed for our proposed agents' contribution evaluation method. The system used in our verification experiment is Ubuntu 18.04 LTS, the backend used is the GPU version of Pytorch, with the NVIDIA GTX1660Ti GPU acceleration for model calculation. The GRU-based agents model used the language processing model of RNN. For a detailed model description, see the subsection C under this Section. We perform experimental verification on public language datasets of Penn Treebank 1 to verify the effectiveness of our proposed algorithm and its sensitivity to special variables. In natural language processing, the LSTM model is often used for processing. We used a smaller GRU-based agent language model. First take texts as input, and convert words into word vectors according to a pre-built dictionary. The converted word vector is then used as an input to the LSTM model, and the prediction result is finally output. This section enumerates the meanings and values of the various parameters involved. At the same time, we believe that the data of each agent is independent and identically distributed. The parameters and their descriptions are listed in Tab. I TABLE I TABLE OF We use testing perplexity as an indicator of the evaluation model. In information theory, perplexity is a measurement of how well a probability distribution or probability model 1 Penn Treebank is available at https://github.com/wojzaremba/lstm/tree/master/data predicts a sample. The perplexity of a discrete probability distribution is defined as: In the above formula, H(p) is the expected probability distribution. If the prediction result of our model is m(x), then the perplexity of the language model is defined as: Obviously, H(p) H(p, m). Therefore, the smaller the perplexity, the more representative the probability distribution can be to better predict the sample distribution. And we compared with the results of Shapley Value evaluation. Shapley Value is originated from coalitional game theory and has proven theoretical properties. It provides a way to measure the impact and contribution of various agents. The definition of Shapley Value is: is the set of all agents, Q ⊂ S = 1, 2, · · · , n is a subset of the agent set S, i is the index of the agents, || represents the size of the set, ∆ Q (x) = af t Q denotes the affect of agent set Q. Because the complexity of directly calculating Shapley value is too high, we use the following estimation method to get the ϕ i (x) of each Agents where M is the number of iterations. ∆ Q m (x) denotes the affect of random set Q m . Finally, all the data obtained by Shapley Value is subjected to the same regularization processing as ours. To ensure the fairness of the evaluation contribution in this experiment, we ensure that each agent is drawn the same number of times. We randomly divided the data in the dataset, and distributed them evenly to all agents. Then, the corresponding special processing is performed on the data of the last few agents: reduce the amount of data by 30% and 70%, randomly generate the word sequence. Comparing the evaluation results after special processing with the results when unprocessed data, it is concluded that the sensitivity of variables such as data size and data quality is evaluated. 1) Experimental results of normal evaluation: We did not do any special treatment to any agents, and randomly divided the data set into each agent. The result is shown in Fig. 2 .In the figure, the abscissa is the index of agents, and the ordinate is the agents' contribution ratios. It can be seen that the deviation of each agent is not large, only individual agents are too high or too low. And the evaluation result is similar to Shapley Value. 2) Experimental results with randomize the word sequence: In this experiment, we modified the datasets of the next four agents into random word sequences. These should be regarded as dirty data by the model, so as to get a smaller return. The results are shown in Fig. 3 . Both the method proposed in this paper and the Shpaley Value method can identify these bad agents and give them a small contribution. 3) Experimental results with reduce the amount of data: To demonstrate the sensitivity of our evaluation method to the amount of data, we performed the following experiments. In this experiment, we reduced the data amount of the last 4 agents by 70%, and the data amount of other agents remained unchanged. As you can see in Fig. 4 , the contribution of the specially treated agents are significantly reduced. But it is relatively not obvious in the evaluation results of Shapley Value. In order to reflect the relative relationship between the amount of data, we processed the data of the last 4 agents: agents with index 16 to 17 reduced the amount of data by 30%, agents with index 18 to 19 reduced the amount of data by 70%. It can be seen in Fig. 5 that the method proposed in this paper can reflect the reduced amount of data, while Shapley Value cannot show it well. A reasonable and fair completion of the assessment of the contribution of each participant in federated learning and the establishment of an incentive mechanism are essential for the development of federated learning. In this paper, we use a "Fe-dAtt" method to train the models on Penn Treebank dataset.Ȧt the same time, we perform several special processing on the agents data, and compare the evaluation results of the special processing with the evaluation results of the unprocessed data. Experimental results show that we use this method to reasonably establish a measurement mechanism to evaluate the sensitivity of indicators such as data size and data quality. The data is evaluated and trained by this method, and the calculated results are real-time and fast, and the contribution rate reflects accurately. For future work, we hope to find a better algorithm to reduce the test confusion of unbundled medium and unbundled large models; at the same time, we should consider the negative contribution of agent evaluation and the comprehensiveness of the evaluation. Preliminary classification (i.e., positive and negative) is carried out at the end to avoid attacks under the federated learning framework mechanism [15] ; at the same time, the concept of game theory is introduced, such as the establishment of a PVCG mechanism on the supply side [16] . Improvements in these directions will promote the implementation and application of the federated learning incentive mechanism. Proceedings of the 29th annual symposium on user interface software and technology Measure contribution of participants in federated learning Federated learning: Strategies for improving communication efficiency Long short-term memory Communication-efficient learning of deep networks from decentralized data Federated learning: Strategies for improving communication efficiency Federated learning with non-iid data Federated multi-task learning Federated meta-learning for recommendation Human activity recognition using federated learning Measure contribution of participants in federated learning Differentially private federated learning: A client level perspective Elfish: Resourceaware federated learning on heterogeneous edge devices Learning private neural language modeling with attentive aggregation How to backdoor federated learning Optimal procurement auction for cooperative production of virtual products: Vickrey-clarkegroves meet cremer-mclean