Personalized Federated Learning with Adaptive Batchnorm for Healthcare
Wang Lu, Jindong Wang, Yiqiang Chen, Xin Qin, Renjun Xu, Dimitrios Dimitriadis, Tao Qin
December 1, 2021

Abstract—There is a growing interest in applying machine learning techniques to healthcare. Recently, federated learning (FL) has been gaining popularity since it allows researchers to train powerful models without compromising data privacy and security. However, the performance of existing FL approaches often deteriorates in non-iid situations, where there are distribution gaps among clients, and few previous efforts have focused on personalization in healthcare. In this article, we propose FedAP to tackle domain shifts and obtain personalized models for local clients. FedAP learns the similarity between clients based on the statistics of their batch normalization layers while preserving the specificity of each client with different local batch normalization layers. Comprehensive experiments on five healthcare benchmarks demonstrate that FedAP achieves better accuracy than state-of-the-art methods (e.g., a 10% accuracy improvement on PAMAP2) with faster convergence.

Machine learning has been widely adopted in many applications in people's daily life [1], [2], [3]. Specifically for healthcare, researchers can build models to predict health status by leveraging health-related data, such as activity sensors [4], images [5], and other health information [6], [7], [8]. To achieve satisfactory performance, machine learning healthcare applications often require sufficient client data for model training. However, with the increasing awareness of privacy and security, more governments and organizations enforce the protection of personal data via regulations [9], [10]. In this situation, federated learning (FL) [11] has emerged as a way to build powerful machine learning models while keeping data privacy well protected.

Personalization is important in healthcare applications since different individuals, hospitals, or countries usually have different demographics, lifestyles, and other health-related characteristics [12], i.e., the non-iid issue (data that are not identically and independently distributed). Therefore, we are interested in achieving better personalized healthcare, i.e., building FL models for each client that preserve their specific information while harnessing their commonalities. As shown in Fig. 1, consider three clients A, B, and C with different data distributions (e.g., the adult A and the child B may have different lifestyles and activity patterns). Even when federated learning is performed in the standard way, the non-iid issue cannot be easily handled, which severely limits the performance of existing federated learning algorithms.

The popular FL algorithm FedAvg [13] has demonstrated superior performance in many situations [14], [15]. However, FedAvg is unable to deal with non-iid data among different clients since it directly averages the parameters of the models coming from all participating clients [16]. Some algorithms target this non-iid situation. FedProx [17] is designed for non-iid data; however, it only learns a single global model for all clients and is therefore unable to obtain personalized models.
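As a concrete reference point, the server step of FedAvg referenced above can be sketched as follows. This is a minimal illustration of data-size-weighted parameter averaging under the usual formulation, our own code rather than any paper's implementation:

```python
# Minimal sketch of FedAvg-style server aggregation (illustration only).
from typing import Dict, List
import torch

def fedavg_aggregate(client_states: List[Dict[str, torch.Tensor]],
                     client_sizes: List[int]) -> Dict[str, torch.Tensor]:
    """Return the data-size-weighted average of client state dicts."""
    total = float(sum(client_sizes))
    global_state = {}
    for name in client_states[0]:
        # Every parameter is averaged uniformly across clients,
        # weighted only by local data size.
        global_state[name] = sum(
            (n / total) * state[name].float()
            for state, n in zip(client_states, client_sizes)
        )
    return global_state
```

Because every client receives the same averaged state, there is no room for personalization, which is the limitation the methods discussed here, and FedAP below, try to address.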
FedHealth [18], another work for personalized healthcare, needs access to a large public dataset, which is often impossible in real applications. FedBN [19] handles the non-iid issue by learning local batch normalization layers for each client, but it ignores the similarities across clients that could be used to boost personalization. In this article, we propose FedAP, a personalized federated learning approach that tackles domain shifts via adaptive batch normalization and obtains personalized models for local clients.

With the rapid development of perception and computing technology, machine learning can be used to help doctors diagnose [20] and assist them during operations [21]. Many methods have been proposed to monitor people's health state [22] and to diagnose diseases, sometimes with better performance than doctors', especially in the field of medical images [23]. Moreover, machine learning can provide early disease warnings by monitoring daily behavior with simple wearable sensors [24]. For instance, certain activities in daily life reflect early signals of some cognitive diseases: through daily observation of gait changes and finger flexibility, a machine can tell whether a person is suffering from Parkinson's disease [25]. In addition, some studies have worked toward better personalization in healthcare [26], [27]. Unfortunately, a successful healthcare application needs a large amount of labeled personal data, while in real applications data are often scattered, and few people or organizations are willing to disclose their private data. Moreover, an increasing number of regulations, such as [9], [10], guard against data leakage. As a result, different clients cannot exchange data directly, and the scattered data form separate data islands, making it impossible to learn a traditional model on aggregated data.

Federated learning is a common way to combine the information of all clients while protecting data privacy and security [11]. It was first proposed by Google [13], who introduced FedAvg to train machine learning models by aggregating information from distributed mobile phones without exchanging data. The key idea is to replace direct data exchanges with exchanges of model parameters, which allows FedAvg to resolve the data islanding problem. Although federated learning is an emerging field, it has attracted much attention [28], [29].

Federated learning can be divided into horizontal federated learning, vertical federated learning, and federated transfer learning according to the distribution characteristics of the data. When the features of two datasets overlap substantially but the clients overlap little, horizontal federated learning can be applied [13]: the datasets are split horizontally, and the clients ultimately share the same feature space. For example, Smith et al. [30] proposed a novel systems-aware optimization method, MOCHA, to address the challenges of federated multi-task learning. When the features of two datasets overlap little but the clients overlap substantially, vertical federated learning can be utilized, where different clients hold different columns of the features [31]. For example, Cheng et al. [32] proposed SecureBoost, a novel lossless privacy-preserving tree-boosting system that jointly trains over multiple parties with partially overlapping samples but different feature sets. When both the clients and the features of the two datasets rarely overlap, federated transfer learning is often utilized [33], [34]. For example, Yoon et al.
[35] proposed a novel federated continual learning framework, Federated Weighted Inter-client Transfer (FedWeIT), which decomposes the network weights into global federated parameters and sparse task-specific parameters; each client receives selective knowledge from other clients by taking a weighted combination of their task-specific parameters. In addition, many techniques, such as differential privacy, have been proposed to further protect data [36], [37]. In this paper, we mainly focus on horizontal federated learning where the training data are not independent and identically distributed (non-iid) across clients.

Although FedAvg works well in many situations, it may still suffer from non-iid data and fail to build personalized models for each client [30], [38], [39]. A survey on federated learning with non-iid data can be found in [40]. FedProx [17] tackled non-iid data by allowing partial information aggregation and adding a proximal term to FedAvg. [41] aggregated the clients' models with weights computed via the $L_1$ distance among the client models' parameters. These works focus on a common model shared by all clients, while some other works try to obtain a unique model for each client. [42] exchanged the information of base layers and preserved personalization layers to combat the ill effects of non-iid data. [43] utilized Moreau envelopes as the clients' regularized loss functions and decoupled personalized model optimization from global model learning in a bi-level problem stylized for personalized FL. [44] evaluated three techniques for the local adaptation of federated models: fine-tuning, multi-task learning, and knowledge distillation. [45] also proposed and analyzed three approaches: user clustering, data interpolation, and model interpolation. [46] tried to jointly learn compact local representations on each device and a global model across all devices, with a theoretical analysis. [47] proposed APFL, where each client trains its local model while contributing to the global model. Another work, Clustered Federated Learning (CFL) [48], groups the client population into clusters with jointly trainable data distributions. The two works most relevant to our method are FedHealth [18] and FedBN [19]: FedHealth needs to share some data with all clients, while FedBN uses local batch normalization to alleviate feature shift before averaging models. Although there are already some works coping with non-iid data, few simultaneously address feature shift together with other shifts while obtaining an individual model for each client in healthcare.

Batch Normalization (BN) [49] is an important component of deep learning. It improves model performance and has a natural advantage in dealing with domain shifts. Li et al. [50] proposed an adaptive BN for domain adaptation, where they learned domain-specific BN layers. Researchers have since explored many other effects of BN, especially in transfer learning [51]. FedBN [19] is one of the few applications of BN in FL. However, FedBN still does not make full use of BN's properties, nor does it consider the similarities among clients.

In federated learning, there are $N$ different clients (organizations or users), denoted as $\{C_1, C_2, \cdots, C_N\}$, and each client has its own dataset, i.e., $\{\mathcal{D}_1, \mathcal{D}_2, \cdots, \mathcal{D}_N\}$. Each dataset $\mathcal{D}_i = \{\mathcal{D}_i^{tr}, \mathcal{D}_i^{te}\}$ contains a train set $\mathcal{D}_i^{tr} = \{(\mathbf{x}_{i,j}^{tr}, y_{i,j}^{tr})\}_{j=1}^{n_i^{tr}}$ and a test set $\mathcal{D}_i^{te} = \{(\mathbf{x}_{i,j}^{te}, y_{i,j}^{te})\}_{j=1}^{n_i^{te}}$. Obviously, $n_i = n_i^{tr} + n_i^{te}$. Our goal is to aggregate the information of all clients to learn a good model $f_i$ for each client on its local dataset $\mathcal{D}_i$ without private data leakage:

$\min_{\{f_k\}_{k=1}^{N}} \frac{1}{N} \sum_{i=1}^{N} \frac{1}{n_i^{te}} \sum_{j=1}^{n_i^{te}} \ell\big(f_i(\mathbf{x}_{i,j}^{te}), y_{i,j}^{te}\big), \qquad (1)$

where $\ell$ is a loss function.
There are mainly two challenges for personalized healthcare: data islanding and personalization. Following FedAvg [13] and other traditional federated learning methods [52], [53], the first challenge is relatively easy to cope with. Personalization, however, is a must in many applications, especially in healthcare, and it is preferable to train a unique model for each client. A single client often lacks enough data to train a highly accurate model in federated learning, and clients do not have access to each other's data. Overall, achieving personalization with high accuracy in federated learning remains a challenge.

As mentioned in [50], batch normalization (BN) layers contain sufficient statistics (mean and standard deviation) of features (the outputs of layers). Therefore, BN has been used to indirectly represent the distributions of training data in many works [50], [54]. We mainly use BN to represent the distributions of clients: on the one hand, we keep BN layers local to preserve clients' feature distributions; on the other hand, we use BN-related statistics to calculate the similarity between clients for better personalization with weighted aggregation.

In this paper, we propose FedAP (Adaptive Federated Learning) to achieve accurate personalized healthcare via adaptive batch normalization without compromising data privacy and security. Fig. 2 gives an overview of its structure. Without loss of generality, we assume there are three clients; the design extends to more general cases. The structure mainly contains five steps:
1) The server distributes the pre-trained model to each client.
2) Each client computes the statistics of the outputs of specific layers according to its local data.
3) The server obtains the client similarities, denoted by a weight matrix $\mathbf{W}$, to guide aggregation.
4) Each client updates its own model with its local training data and pushes the model to the server.
5) The server aggregates the models and obtains $N$ models, which are delivered to the $N$ clients respectively.
For stability and simplicity, we calculate $\mathbf{W}$ only once; our experiments show that computing it once is enough to achieve acceptable performance. Note that none of these steps involves the direct transmission of data, so FedAP avoids leaking private data and ensures security.

The keys of FedAP are obtaining $\mathbf{W}$ and aggregating the models. We first describe the process of model aggregation and then introduce how to compute $\mathbf{W}$. We denote the parameters of each model $f_i$ as $\theta_i = \phi_i \cup \psi_i$, where $\phi_i$ corresponds to the parameters of the BN layers specific to each client and $\psi_i$ denotes the parameters of the other layers (the colored blocks in Fig. 3). $\mathbf{W}$ is an $N \times N$ matrix describing the similarities among the clients: $w_{ij} \in [0, 1]$ represents the similarity between client $i$ and client $j$, and the larger $w_{ij}$ is, the more similar the two clients are. Fig. 3 demonstrates the process of model aggregation: $\phi_i$ is particular to each client, while $\psi_i$ is computed according to $\mathbf{w}_i$, the $i$-th row of $\mathbf{W}$. In other words, $\phi_i$ contains the BN parameters that are not shared across clients, while $\psi_i$ contains the parameters that are shared.

Let $\theta_i^t = \phi_i^t \cup \psi_i^t$ represent the parameters of the model from client $i$ in round $t$. After updating $\theta_i^t$ with the local data of the $i$-th client, we obtain the updated parameters $\theta_i^{t*} = \phi_i^{t*} \cup \psi_i^{t*}$; we use the $*$ notation to denote updated parameters. For aggregation on the server, we then have the following updating strategy:

$\psi_i^{t+1} = \sum_{j=1}^{N} w_{ij}\, \psi_j^{t*}, \qquad \phi_i^{t+1} = \phi_i^{t*}. \qquad (2)$

The overall process of FedAP is described in Algorithm 1:
1: Distribute the pre-trained model $f$ to each client.
2: Each client computes its statistics $(\mu_i, \sigma_i)$, where $\mu_i$ represents the mean values and $\sigma_i$ the covariance matrices, and pushes $(\mu_i, \sigma_i)$ to the server.
3: Compute $\mathbf{W}$ according to the statistics.
4: Update the clients' models with local data and push the updated models to the server.
5: Aggregate the models according to Eq. (2) and distribute them to the corresponding clients.
6: Repeat steps 4–5 until convergence or the maximum round is reached.
In the next sections, we introduce how to compute the weight matrix $\mathbf{W}$.
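To make the aggregation rule in Eq. (2) concrete, the following is a minimal PyTorch sketch of the server-side step (our illustration under stated assumptions, not the authors' code): shared parameters $\psi$ are combined with the weights in row $i$ of $\mathbf{W}$, while BN parameters $\phi$ stay local. The helper `is_bn_param` and its key-matching heuristic are our own assumptions.

```python
# Hedged sketch of FedAP's server-side aggregation (Algorithm 1, step 5).
from typing import Dict, List
import torch

def is_bn_param(name: str) -> bool:
    # Heuristic (our assumption): BN parameters/buffers are identified by
    # their state-dict key; real code would inspect the module types instead.
    return "bn" in name or "running_mean" in name or "running_var" in name

def fedap_aggregate(client_states: List[Dict[str, torch.Tensor]],
                    W: torch.Tensor) -> List[Dict[str, torch.Tensor]]:
    """Server-side FedAP update: for client i, psi <- sum_j W[i, j] * psi_j,
    while phi (BN layers) is kept from client i's own local update."""
    n = len(client_states)
    new_states = []
    for i in range(n):
        state_i = {}
        for name, tensor in client_states[i].items():
            if is_bn_param(name):
                state_i[name] = tensor.clone()  # phi: stays local to client i
            else:
                state_i[name] = sum(W[i, j].item() * client_states[j][name].float()
                                    for j in range(n))  # psi: weighted average
        new_states.append(state_i)
    return new_states
```

With $w_{i,i} = \lambda$ and $w_{i,j} = (1 - \lambda)\hat{w}_{i,j}$ as defined in the next section, each row of $\mathbf{W}$ sums to 1, so this update is exactly the moving-average style described below.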
In this section, we evaluate the weights with a pre-trained model $f$ and propose two alternatives for computing them. We mainly rely on the statistics of the feature outputs of the clients' data in the pre-trained network. We use $l \in \{1, 2, \cdots, L\}$ to index the batch normalization layers of the model, and $\mathbf{z}_{i,l}$ denotes the input of the $l$-th batch normalization layer in the $i$-th client. The input of the classification layer in the $i$-th client is denoted as $\mathbf{z}_i$, which represents the domain features. We assume $\mathbf{z}_{i,l}$ is a matrix, $\mathbf{z}_{i,l} \in \mathbb{R}^{c_{i,l} \times s_{i,l}}$, where $c_{i,l}$ is the number of channels and $s_{i,l}$ is the product of the other dimensions; similarly, $\mathbf{z}_i \in \mathbb{R}^{c_i \times s_i}$. Feeding $\mathcal{D}_i$ into $f$, we obtain $\mathbf{z}_{i,l}$; obviously, $s_{i,l} = e \times n_i$, where $e$ is an integer.

We then compute statistics over the channels, treating $\mathbf{z}_{i,l}$ as Gaussian-distributed. For the $l$-th layer of the $i$-th client, it is easy to estimate its distribution, $\mathcal{N}(\mu_{i,l}, \sigma_{i,l})$. We only compute statistics of the inputs of BN layers, and the BN statistics of the $i$-th client are formulated as:

$(\mu_i, \sigma_i) = \big(\{\mu_{i,l}\}_{l=1}^{L}, \{\sigma_{i,l}\}_{l=1}^{L}\big).$

Now we can calculate the similarity between two clients. It is popular to adopt the 2-Wasserstein distance between two Gaussian distributions:

$W_2\big(\mathcal{N}(\mu_{i,l}, \sigma_{i,l}), \mathcal{N}(\mu_{j,l}, \sigma_{j,l})\big) = \Big( \|\mu_{i,l} - \mu_{j,l}\|_2^2 + \mathrm{tr}\big(\sigma_{i,l} + \sigma_{j,l} - 2(\sigma_{i,l}^{1/2} \sigma_{j,l} \sigma_{i,l}^{1/2})^{1/2}\big) \Big)^{1/2},$

where $\mathrm{tr}$ is the trace of the matrix. Obviously, this is costly and difficult to compute efficiently. Similar to BN, we approximate by treating each channel as independent of the others, so that $\sigma_{i,l}$ is a diagonal matrix, i.e., $\sigma_{i,l} = \mathrm{Diag}(\mathbf{r}_{i,l})$. We therefore compute the approximation of the Wasserstein distance as:

$W_2 \approx \Big( \|\mu_{i,l} - \mu_{j,l}\|_2^2 + \big\|\mathbf{r}_{i,l}^{1/2} - \mathbf{r}_{j,l}^{1/2}\big\|_2^2 \Big)^{1/2}.$

Thus, the distance between two clients $i$ and $j$ is computed as:

$d_{i,j} = \sum_{l=1}^{L} W_2\big(\mathcal{N}(\mu_{i,l}, \sigma_{i,l}), \mathcal{N}(\mu_{j,l}, \sigma_{j,l})\big).$

A large $d_{i,j}$ means the distribution distance between the $i$-th and the $j$-th client is large; hence, the larger $d_{i,j}$ is, the less similar the two clients are and the smaller $w_{i,j}$ should be. We therefore set $\tilde{w}_{i,j}$ as the inverse of the distance, i.e., $\tilde{w}_{i,j} = 1/d_{i,j}$ for $j \neq i$. Normalizing $\tilde{\mathbf{w}}_i$, we have

$\hat{w}_{i,j} = \tilde{w}_{i,j} \Big/ \sum_{k \neq i} \tilde{w}_{i,k}.$

For stability in training, we also take $\psi_i^{t*}$ into account for $\psi_i^{t+1}$: we update $\psi_i^{t+1}$ in a moving-average style and set $w_{i,i} = \lambda$. Therefore,

$w_{i,i} = \lambda, \qquad w_{i,j} = (1 - \lambda)\, \hat{w}_{i,j} \;\; (j \neq i).$

We denote this weighting method as the original FedAP. Similarly, we can obtain the corresponding $\mathbf{W}$ using only the last layer's features $\mathbf{z}_i$; we denote this variant as d-FedAP.

In some extreme cases, a pre-trained model may not exist. In this situation, we can evaluate the weights with a model trained for several rounds of FedBN [19]. As we can see from Fig. 4, the running mean of a BN layer has a positive correlation with the statistical mean of the corresponding layer's inputs, and the variance has a similar relationship. From this, we can use the running means and running variances of the BN layers in place of the computed statistics. That is, when no pre-trained model exists, we perform several rounds of FedBN [19], which preserves local batch normalization, and use the parameters of the BN layers to replace the statistics. We denote this variant as f-FedAP.
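As a companion to the formulas above, here is a hedged NumPy sketch of the weight computation. It is our illustration under the stated per-channel-independence approximation; names like `mus` and `compute_W` are ours. For f-FedAP, the BN layers' running means and running variances would be passed in place of the computed statistics.

```python
# Hedged sketch: build W from per-layer BN statistics (illustration only).
import numpy as np

def wasserstein2_diag(mu1, var1, mu2, var2) -> float:
    # Approximate 2-Wasserstein distance between Gaussians with diagonal
    # covariances Diag(var1), Diag(var2), matching the approximation above.
    return float(np.sqrt(np.sum((mu1 - mu2) ** 2)
                         + np.sum((np.sqrt(var1) - np.sqrt(var2)) ** 2)))

def compute_W(mus, variances, lam=0.5):
    """mus[i][l], variances[i][l]: per-channel mean / variance of the inputs
    to the l-th BN layer on client i. Returns the N x N matrix W with
    W[i, i] = lambda and off-diagonal entries (1 - lambda) * w_hat[i, j]."""
    N = len(mus)
    d = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j:  # d_ij = sum of per-layer Wasserstein distances
                d[i, j] = sum(wasserstein2_diag(mus[i][l], variances[i][l],
                                                mus[j][l], variances[j][l])
                              for l in range(len(mus[i])))
    W = np.zeros((N, N))
    for i in range(N):
        # Inverse-distance similarities, zeroed on the diagonal.
        inv = np.where(np.arange(N) != i, 1.0 / (d[i] + 1e-12), 0.0)
        W[i] = (1.0 - lam) * inv / inv.sum()  # normalized similarities w_hat
        W[i, i] = lam                          # moving-average self-weight
    return W
```

Each row of the returned matrix sums to 1, so it can be used directly in the aggregation sketch given earlier.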
We evaluate the performance of FedAP on five healthcare datasets in time-series and image modalities. The statistical information of each dataset is shown in TABLE 1.

We adopt a public human activity recognition dataset called PAMAP2 [57]. The PAMAP2 dataset contains data from 18 different physical activities performed by 9 subjects wearing 3 inertial measurement units and a heart rate monitor. We use the data of the 3 inertial measurement units, collected at a constant rate of 100 Hz, to form inputs with 27 channels. We exploit the sliding-window technique and keep 10 classes of data (the 10 classes with the most samples). To construct the problem situation for FedAP, we use the Dirichlet distribution, as in [58], to create disjoint non-iid splits of the client training data; Fig. 5(d) visualizes how the samples are distributed among the clients for PAMAP2. We split PAMAP2 in this style mainly for two reasons: on the one hand, the subjects have different amounts of data, which may introduce other problems, e.g., some clients could not be adequately evaluated; on the other hand, this splitting routine is widely adopted in related work [58], [59]. In each client, half of the data are used for training and the remaining data for testing, as in [19].

We also adopt a public COVID-19 posterior-anterior chest radiography image dataset [60]. This is a combined, curated dataset of COVID-19 chest X-ray images obtained by collating 15 public datasets; it contains 9,208 instances of four classes in total (1,281 COVID-19 X-rays, 3,270 normal X-rays, 1,656 viral-pneumonia X-rays, and 3,001 bacterial-pneumonia X-rays). To construct the problem situation for FedAP, we split the dataset similarly to PAMAP2. Fig. 5(e) visualizes how the samples are distributed among 20 clients for COVID-19. Note that this dataset is more imbalanced in its classes, which makes it an ideal testbed for performance under label shift (i.e., imbalanced class distributions across clients). In each client, half of the data are used for training and the remaining data for testing.

MedMNIST [61], [62] is a large-scale MNIST-like collection of standardized biomedical images, comprising 12 datasets for 2D and 6 datasets for 3D. All images are 28 × 28 (2D) or 28 × 28 × 28 (3D). We choose the 3 datasets with the most classes from the 12 2D datasets: OrganAMNIST, OrganCMNIST, and OrganSMNIST [63], [64]. These three datasets all consist of abdominal CT images and all contain 11 classes, with 58,850, 23,660, and 25,221 samples respectively. As with PAMAP2, each dataset is split into 20 clients with Dirichlet distributions, and Fig. 5(a)-5(c) visualize how the samples are distributed for OrganAMNIST, OrganCMNIST, and OrganSMNIST respectively. In each client, half of the data are used for training and the remaining data for testing.

For PAMAP2, we adopt a CNN for training and prediction, composed of two convolutional layers, two pooling layers, two batch normalization layers, and two fully connected layers. For the three MedMNIST datasets, we adopt LeNet5 [65]. For COVID-19, we adopt AlexNet [66], using a three-layer fully connected neural network as the classifier with two BN layers after the first two fully connected layers, following [19].
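For reproducibility, the Dirichlet split described above can be sketched as follows. This is a minimal illustration of the label-skew partition of [58] under our own naming; the paper's exact preprocessing may differ.

```python
# Hedged sketch of the Dirichlet label-skew split used to build non-iid
# clients; alpha controls heterogeneity (smaller = more skewed).
import numpy as np

def dirichlet_split(labels: np.ndarray, n_clients: int,
                    alpha: float = 0.1, seed: int = 0):
    """Return a list of index arrays, one per client."""
    rng = np.random.default_rng(seed)
    client_indices = [[] for _ in range(n_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        # Sample per-client proportions of class c from Dir(alpha).
        proportions = rng.dirichlet(alpha * np.ones(n_clients))
        cuts = (np.cumsum(proportions)[:-1] * len(idx)).astype(int)
        for client, part in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(part.tolist())
    return [np.array(ci) for ci in client_indices]
```

With $\alpha = 0.1$ or $0.05$ as in the experiments below, most clients receive only a few dominant classes, reproducing the label shift visualized in Fig. 5.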
For model training, we use the cross-entropy loss and the SGD optimizer with a learning rate of $10^{-2}$. Unless otherwise specified, the default number of local update epochs is $E = 1$, where $E$ denotes the number of training epochs in one round. We set $\lambda = 0.5$ for our method, since $\lambda$ has little influence on accuracy and only affects convergence speed (see the appendix). In addition, we randomly select 20% of the data to train a model with the same architecture as the pre-trained model. We run three trials and record the average results.

We compare three variants of our method with five baselines, including common FL methods and FL methods designed for non-iid data:
• Base: each client uses its local data to train a local model, without federated learning.
• FedAvg [13]: the server aggregates all client models without any particular treatment of non-iid data.
• FedProx [17]: allows partial information aggregation and adds a proximal term to FedAvg.
• FedPer [42]: each client preserves some local layers.
• FedBN [19]: each client preserves its local batch normalization.

The classification results for each client on PAMAP2 are shown in TABLE 2. From these results, we have the following observations. 1) Our method achieves the best results on average, significantly outperforming the other methods with a remarkable improvement (over 10% on average). 2) In some clients, the Base method achieves the best test accuracy. As can be seen from Fig. 5(d), the distributions across clients are very inconsistent, which inevitably leads to varying difficulty levels in different clients; for some clients, the local distribution is easy enough that local data alone achieve the desired effect. 3) FedBN does not achieve the desired results. This may be because FedBN is designed for feature shifts, while our experiments mainly involve label shifts.

The classification results for each client on the three MedMNIST datasets are shown in TABLEs 3, 4, and 5 respectively. From these results, we have the following observations. 1) Our method significantly outperforms the other methods with a remarkable improvement (over 3.5% on average). 2) On all three benchmarks, Base achieves the worst average accuracy, which demonstrates that without communication, the clients do not have enough information for these relatively difficult tasks. 3) FedBN achieves the second-best results on all three benchmarks, possibly because feature shifts exist among the clients.

The classification results for each client on COVID-19 are shown in Fig. 6. From these results, we have the following observations. 1) Our method achieves the best average accuracy, outperforming the second-best method, FedPer, by 6.3% on average. 2) FedBN obtains the worst results, demonstrating that FedBN is not good at dealing with label shifts, where the label distributions of the clients differ, which is a challenging situation; FedBN does not consider the similarities among different clients. From Fig. 5(e), we can see that the label shifts are severe in COVID-19 since it has only four classes.

We consider the influence of data splits and local iterations in this section.
As shown in Fig. 8(a), we evaluate FedAvg and FedAP on the three MedMNIST benchmarks with two different splits: $\alpha = 0.1$ and $\alpha = 0.05$. A smaller $\alpha$ means the distributions among clients differ more from each other. Fig. 8(a) demonstrates that the performance of FedAvg, which does not account for non-iid data, drops when the client distributions differ more, while our method is not much affected by the degree of non-iid-ness, suggesting that it is more robust. Fig. 8(b) shows the influence of local iterations and total rounds on FedBN and our method. FedBN degrades severely with more local iterations and fewer communication rounds, while our method declines slowly; when communication costs are limited, our method may therefore be more effective.

To demonstrate the effect of weighting, which considers the similarities among different clients, we compare the average accuracy on PAMAP2 and COVID-19 with and without it. Without weighting, our method degenerates to FedBN. From Fig. 7(a), we can see that our method performs much better than FedBN, which does not include the weighting part; moreover, from Fig. 7(b), our method performs better than FedBN on every client. (In Fig. 7(b) and Fig. 7(d), line charts are used merely for better visualization, not to indicate trends.) These results demonstrate that our method with weighting can cope with label shifts while FedBN cannot, which means our method is more applicable and effective.

We also illustrate the importance of preserving local batch normalization. Fig. 7(c) shows the average accuracy of experiments with preserved local batch normalization (LBN) versus shared common batch normalization (SBN), while Fig. 7(d) shows the per-client results. The improvements are not particularly significant compared with those from weighting, probably because our experiments mainly involve label shifts, whereas preserving local batch normalization targets feature shifts. Nevertheless, our method still brings a slight improvement, indicating its superiority.

Different implementations of our method. In the Method section, we proposed three implementations of our method: FedAP, d-FedAP, and f-FedAP, whose main difference lies in how $\mathbf{W}$ is calculated. From Fig. 9(a) and Fig. 9(b), all three implementations achieve better average accuracy on both PAMAP2 and COVID-19 compared with FedAvg and FedBN. In addition, f-FedAP performs slightly worse than the other two variants, perhaps because, for fairness, it uses weighting only during half of the rounds, with the other half spent obtaining $\mathbf{W}$.

We also study the convergence of our method. From Fig. 11, we can see that our method almost converges by the 10th round. In the actual experiments, 20 rounds are enough for our method, while FedBN needs over 400 rounds.

Finally, we evaluate the parameter sensitivity of FedAP. Our method is affected by three parameters: the number of local epochs, the number of clients, and $\lambda$. We change one parameter while fixing the others. From Fig. 10(a), we can see that our method still achieves acceptable results. When the number of clients increases, our method's accuracy goes down, which may be because the scarcity of local data makes the weight estimation inaccurate; in that case, f-FedAP can be used instead.
In Fig. 10(b), we can see that our method performs best, although its accuracy decreases as the number of local epochs increases; this may be because we keep the total number of epochs unchanged, so communication among the clients becomes insufficient. Fig. 10(c)-10(d) demonstrate that $\lambda$ only slightly affects the average accuracy of our method, while it changes the convergence rate. Overall, the results reveal that FedAP is more effective and robust than the other methods under different parameters in most cases.

In this article, we proposed FedAP, a weighted personalized federated transfer learning algorithm via batch normalization for healthcare. FedAP aggregates information from different organizations without compromising privacy and security, and it achieves personalized model learning by combining similarity-based aggregation with preserved local batch normalization. Experiments have demonstrated the effectiveness of FedAP. In the future, we plan to apply FedAP to more personalized and flexible healthcare applications, and we will consider better ways to calculate and update the similarities among clients.

References
[1] Recurrent dictionary learning for state-space models with an application in stock forecasting.
[2] Rethinking zero-shot video classification: End-to-end training for realistic applications.
[3] Local and global alignments for generalizable sensor-based human activity recognition.
[4] Cross-domain activity recognition via substructural optimal transport.
[5] Deep convolution network based emotion analysis towards mental health care.
[6] MiME: Multilevel medical embedding of electronic health records for predictive healthcare.
[7] ESCAPED: Efficient secure and private dot product framework for kernel-based machine learning algorithms with applications in healthcare.
[8] Semantic-discriminative mixup for generalizable sensor-based cross-domain activity recognition.
[9] China's cyber power. Routledge.
[10] The EU General Data Protection Regulation (GDPR).
[11] Federated machine learning: Concept and applications.
[12] Federated learning for healthcare informatics.
[13] Communication-efficient learning of deep networks from decentralized data.
[14] Opportunities and obstacles for deep learning in biology and medicine.
[15] The application of deep learning in cancer prognosis prediction.
[16] On the convergence of FedAvg on non-iid data.
[17] Federated optimization in heterogeneous networks.
[18] FedHealth: A federated transfer learning framework for wearable healthcare.
[19] FedBN: Federated learning on non-iid features via local batch normalization.
[20] Diagnose like a pathologist: Weakly-supervised pathologist-tree network for slide-level immunohistochemical scoring.
[21] Impacts of telemanipulation in robotic assisted surgery.
[22] EEG-based pathology detection for home health monitoring.
[23] Learning calibrated medical image segmentation via multi-rater agreement modeling.
[24] Gait-based identification for elderly users in wearable healthcare systems.
[25] PDAssist: Objective and quantified symptom assessment of Parkinson's disease via smartphone.
[26] Personalized medicine: Part 1: Evolution and development into theranostics.
[27] Deep mixed effect model using Gaussian processes: A personalized and reliable prediction for healthcare.
[28] A survey on federated learning: The journey from centralized to distributed on-site learning and beyond.
[29] A survey on federated learning.
[30] Federated multi-task learning.
[31] Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption.
[32] SecureBoost: A lossless federated learning framework.
[33] A secure federated transfer learning framework.
[34] Group knowledge transfer: Federated learning of large CNNs at the edge.
[35] Federated continual learning with weighted inter-client transfer.
[36] Differential privacy: A survey of results.
[37] cpSGD: Communication-efficient and differentially-private distributed SGD.
[38] Adaptive gradient-based meta-learning methods.
[39] Data-free knowledge distillation for heterogeneous federated learning.
[40] Federated learning on non-iid data: A survey.
[41] Inverse distance aggregation for federated learning with non-iid data.
[42] Federated learning with personalization layers.
[43] Personalized federated learning with Moreau envelopes.
[44] Salvaging federated learning by local adaptation.
[45] Three approaches for personalization with applications to federated learning.
[46] Think locally, act globally: Federated learning with local and global representations.
[47] Adaptive personalized federated learning.
[48] Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints.
[49] Batch normalization: Accelerating deep network training by reducing internal covariate shift.
[50] Adaptive batch normalization for practical domain adaptation.
[51] Batch normalization embeddings for deep domain generalization.
[52] On the convergence of communication-efficient local SGD for federated learning.
[53] Provably secure federated learning against malicious clients.
[54] Domain-specific batch normalization for unsupervised domain adaptation.
[55] Deep leakage from gradients.
[56] See through gradients: Image batch recovery via GradInversion.
[57] Introducing a new benchmarked dataset for activity monitoring.
[58] Bayesian nonparametric federated learning of neural networks.
[59] Quasi-global momentum: Accelerating decentralized deep learning on heterogeneous data.
[60] Curated dataset for COVID-19 posterior-anterior chest radiography images (X-rays).
[61] MedMNIST classification decathlon: A lightweight AutoML benchmark for medical image analysis.
[62] MedMNIST v2: A large-scale lightweight benchmark for 2D and 3D biomedical image classification.
[63] The liver tumor segmentation benchmark (LiTS).
[64] Efficient multiple organ localization in CT image using 3D region proposal network.
[65] Gradient-based learning applied to document recognition.
[66] ImageNet classification with deep convolutional neural networks.