key: cord-0233126-k7yz3ah6 authors: Zhang, Longling; Shen, Bochen; Barnawi, Ahmed; Xi, Shan; Kumar, Neeraj; Wu, Yi title: FedDPGAN: Federated Differentially Private Generative Adversarial Networks Framework for the Detection of COVID-19 Pneumonia date: 2021-04-26 journal: nan DOI: nan sha: 5e0db2ff53e5f3bb3e3a2e24fb4897e4881bf073 doc_id: 233126 cord_uid: k7yz3ah6

Existing deep learning technologies generally learn the features of chest X-ray data generated by Generative Adversarial Networks (GAN) to diagnose COVID-19 pneumonia. However, these methods face a critical challenge: data privacy. A GAN memorizes semantic information about its training data, which attackers can exploit to reconstruct training samples, so such methods can leak patient privacy. Furthermore, because each hospital has only a limited number of training samples, different hospitals often train models jointly through data sharing, which also causes privacy leakage. To solve this problem, we adopt the Federated Learning (FL) framework, an emerging technique for protecting data privacy. Combining the FL framework with differential privacy, we propose a Federated Differentially Private Generative Adversarial Network (FedDPGAN) to detect COVID-19 pneumonia for sustainable smart cities. Specifically, we use DPGAN to privately generate diverse patient data, introducing differential privacy to protect the semantic information of the training dataset. Furthermore, we leverage FL to allow hospitals to collaboratively train COVID-19 models without sharing the original data. The proposed model is evaluated under Independent and Identically Distributed (IID) and non-IID settings on three types of chest X-ray (CXR) images (COVID-19, normal, and ordinary pneumonia). Extensive experiments on real data verify that our model can effectively diagnose COVID-19 without compromising privacy.

COVID-19 is a highly contagious infectious disease caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), and the virus and its variants are still spreading worldwide (Cao 2020). With COVID-19 raging around the world, hospitals lack sufficient staff to diagnose suspected COVID-19 patients. To improve the efficiency of diagnosing COVID-19, researchers generally develop deep learning systems that diagnose patients from chest X-ray images (Liang et al. 2020, Ting et al. 2020, Chang 2018). Dhiman et al. (2021) proposed a J48 decision tree model to identify COVID-19 samples and concluded that the method is effective. Although deep learning technologies have greatly improved diagnostic efficiency, they rely on large amounts of annotated, labeled data, and it is hard for hospitals to collect enough training samples (Abdel-Basset et al. 2020, 2021a,b). Furthermore, because medical image datasets are privacy sensitive, hospitals find it difficult to gather enough samples. These data availability issues therefore pose severe challenges for such diagnostic methods. Inspired by Generative Adversarial Network (GAN) techniques (Goodfellow et al. 2014), researchers commonly use GANs to generate diverse training data and thus address the data availability issues. For example, Waheed et al. (2020) generated CXR images by developing an ACGAN model, and Bao et al. (2020) proposed a model named COVID-GAN to predict the impact of the COVID-19 epidemic.
However, using a GAN as the generative model in this way leaks patients' private information. Since a GAN-based model memorizes semantic information about the distribution of a large number of training samples, an attacker can use reverse engineering to reconstruct patients' private data (Jordon et al. 2018, Chen et al. 2020a). For instance, Gu et al. (2020) used reverse engineering to recover the latent encoding of a real image by combining the corresponding intermediate feature maps produced in the generator, and then reconstructed an image similar to the real one. Therefore, we need to design a generative data model that does not reveal private information.

Researchers have also used data sharing strategies to address data availability issues (Cosgriff et al. 2020). Moorthy et al. (2020) designed a data sharing strategy to give hospitals enough labeled data to train deep learning models. Hospitals used data sharing to expand their training sets until the General Data Protection Regulation (GDPR) was issued (Voigt and Von dem Bussche 2017). Data sharing is no longer viable because the GDPR stipulates that organizations may not share user data arbitrarily, since doing so leaks user privacy. In particular, COVID-19 data is medical data that is highly sensitive for patients. Therefore, we need novel learning methods that avoid data sharing, which would leak privacy and violate the law.

First, since Differential Privacy (DP) is widely used for privacy protection, previous works (Xie et al. 2018, Jordon et al. 2018, Hitaj et al. 2017) focused on using DP to mitigate the privacy leakage of GANs. For example, Xie et al. (2018) and Liu et al. (2019) proposed the Differentially Private GAN (DPGAN) to protect user privacy by leveraging the (ε, δ)-DP technique. However, such DPGAN-based models are only suitable for centralized learning rather than distributed learning. Second, due to the GDPR's restrictions on data sharing, data exists between hospitals in the form of "islands" (Liu et al. 2020e, Li et al. 2020), which inspired researchers to develop a privacy-preserving distributed machine learning paradigm, i.e., Federated Learning (McMahan et al. 2017). In this context, several works (Li et al. 2019, Ge et al. 2020, Sheller et al. 2020, Sui et al. 2020) applied FL in the medical field to develop privacy-preserving applications such as medical imaging (Li et al. 2019), medical relation extraction (Sui et al. 2020), and medical named entity recognition (Ge et al. 2020). Inspired by the above methods, we propose the Federated Differentially Private Generative Adversarial Network (FedDPGAN) model to detect COVID-19 pneumonia without compromising patient privacy. DPGAN is a key component of the proposed model: it protects the privacy of the training samples by adding Gaussian noise to the training gradients. In particular, we introduce a federated learning framework and develop a distributed DPGAN that enables different hospitals to train COVID-19 diagnostic models collaboratively without data sharing.
Specifically, with the help of FL and its aggregation mechanism, FedDPGAN aggregates model parameters from medical institutions in different geographical locations to construct a global GAN model with well-preserved privacy. Extensive experiments on real data show that the FedDPGAN model outperforms existing centralized learning and FL-based models. The contributions of this work are as follows:

- Unlike existing frameworks, we propose a Federated Differentially Private Generative Adversarial Network framework that enables different hospitals to use a privacy-preserving data augmentation method, i.e., a distributed DPGAN, to generate high-quality training samples. This relieves the shortage of COVID-19 training samples, and a ResNet (He et al. 2016) model is then applied in FL to achieve high-precision COVID-19 detection.
- To address the data availability issues in detecting COVID-19, we design a distributed DPGAN by leveraging the FL framework. In particular, we find that the distributed DPGAN can alleviate the non-independent and identically distributed (non-IID) issue in FL. Specifically, with the help of FL and its aggregation mechanism, FedDPGAN constructs global and local data augmentation models by aggregating model parameters from medical institutions in different geographical locations for different medical tasks.
- We conduct extensive case studies on different pneumonia CXR images, demonstrating that FedDPGAN outperforms existing centralized learning and FL-based models. Specifically, under the IID distribution our model outperforms the best centralized model by 1.52% and the FL model by 0.49%; under the non-IID distribution, our model's performance is 3.00% higher than that of the best FL model.

The remainder of this article is organized as follows. Section II reviews the literature on Differentially Private Generative Adversarial Networks and Federated Learning. Sections III and IV present the FedDPGAN algorithm, introducing in particular the FedAvg algorithm and the differential privacy method used in FedDPGAN. Sections V to V-F present and compare the experimental results. Section VI briefly summarizes the article.

In this section, we summarize recent work in the fields of Differentially Private Generative Adversarial Networks (DPGAN) and Federated Learning (FL). Combining deep learning with privacy protection is an emerging research direction. For example, many researchers apply differential privacy (DP) techniques to model training to ensure model security. Abadi et al. (2016b) developed a privacy-preserving deep learning training paradigm by adding well-designed differential privacy noise (i.e., Gaussian noise) when computing the model gradients. Voigt and Von dem Bussche (2017) proposed a DP-based deep learning model to achieve a privacy-preserving disease classification application. In particular, Xie et al. (2018) suggested using DPGAN to protect user privacy by leveraging (ε, δ)-DP techniques. Inspired by the above work, current research focuses on using DPGAN to develop medical applications (Choi et al. 2017, Xie et al. 2018). Chang et al. (2020) utilized DPGAN to develop a medical imaging application.
However, such DPGAN-based models are only suitable for centralized learning rather than distributed learning. As a result, although researchers have addressed the above problems and proposed privacy-preserving methods to protect local model training, a distributed DPGAN has not yet been studied. We propose a distributed DPGAN that can be applied in distributed learning-based applications.

Federated Learning (FL) (McMahan et al. 2017) establishes a data-protecting training paradigm: the datasets remain on the client machines, and locally computed updates are aggregated into a global model, which helps the participating clients achieve results comparable to training on the combined data (Liu et al. 2020c) while maintaining the privacy of the training data. Therefore, as a promising distributed machine learning framework for privacy protection, FL has spawned many emerging applications such as Google Keyboard (Hard et al. 2018), traffic flow prediction (Liu et al. 2020d), anomaly detection, and medical imaging (Sheller et al. 2020). In particular, medical institutions have turned to FL to develop a collaborative, privacy-protecting learning paradigm, thereby avoiding the legal problems caused by data sharing. For example, Ge et al. (2020) applied Medical Named Entity Recognition (NER) in FL, utilizing data from different hospitals to improve each hospital's model training. Chen et al. (2020b) proposed the FL-QSAR model, which improves QSAR prediction performance through collaboration among pharmaceutical institutions in drug discovery.

However, the non-IID problem in FL hinders the rapid development of FL in the medical field. To address this problem, many novel optimization algorithms have been designed to overcome the adverse effects of non-IID data. One line of work designed a mechanism based on deep Q-learning that maximizes a reward by selecting a subset of devices during each communication round. However, such optimization algorithms are intended for mobile IoT and cannot be applied in the medical field, because they require complex client selection and the training of complex deep Q-learning models. In this paper, we find that a GAN can address this problem by generating diverse training samples for FL training. In short, solving the non-IID problem is essential for applying FL in the medical field.

DP is a privacy protection technology widely used to keep models secure and protect training data. The classic definition of DP given below relies on the notion of adjacent databases, i.e., databases that differ in only one element (or sample, as is the case for machine learning datasets). The formal definition of DP is as follows:

Definition 1 ((ε, δ)-DP (Dwork et al. 2014)). A randomized algorithm K satisfies (ε, δ)-DP if, for any two adjacent datasets b and b' given as input and for every subset of outputs S,

Pr[K(b) ∈ S] ≤ e^ε · Pr[K(b') ∈ S] + δ,

where ε is the privacy budget and δ is the failure probability. The smaller δ is, the closer the output distributions of K on b and b' are.

According to Definition 1, DP can be used to protect the privacy of the semantic information of training data. However, DP cannot be applied to deep learning directly, because it is defined for query functions over a database. Therefore, researchers generally apply DP in deep learning by adding well-designed Gaussian noise that is consistent with the definition of differential privacy.
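To make this noise-addition idea concrete before the formal definitions that follow, here is a minimal sketch of adding calibrated Gaussian noise to a clipped gradient. It is our illustration rather than code from the paper, and the function and parameter names (gaussian_mechanism, clip_norm, sigma) are ours.

```python
import numpy as np

def gaussian_mechanism(gradient, clip_norm=1.0, sigma=1.0, rng=None):
    """Toy illustration of DP noise addition: bound the sensitivity of the
    gradient by clipping its L2 norm to clip_norm, then add Gaussian noise
    whose standard deviation is sigma * clip_norm."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(gradient)
    clipped = gradient * min(1.0, clip_norm / (norm + 1e-12))   # bound sensitivity
    noise = rng.normal(0.0, sigma * clip_norm, size=gradient.shape)
    return clipped + noise
```

Clipping bounds how much any single sample can change the gradient (its sensitivity), and the noise scale is tied to that bound, which is exactly the structure formalized in Definitions 2 and 3 below.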
The formal definition of the Gaussian noise mechanism is as follows:

Definition 2 (Gaussian Noise Mechanism). Let S_f be the sensitivity of the query function f with respect to two adjacent datasets b, b', and let S_f² · σ² be the variance of the Gaussian distribution. For each pair of adjacent inputs b and b', the Gaussian noise mechanism is

M(b) = f(b) + N(0, S_f² · σ²),

where N(0, S_f² · σ²) is noise drawn from a Gaussian distribution with mean 0 and standard deviation S_f · σ.

We then define the sensitivity S_f of the function f as follows:

Definition 3 (Sensitivity (Mironov 2017)). The sensitivity of the function f is

S_f = max_{b, b'} ‖f(b) − f(b')‖₂,

where the maximum is taken over all pairs of adjacent datasets b, b'.

According to Definitions 2 and 3, the sensitivity, which determines the magnitude of the noise, captures how much deleting a single record from the dataset can affect the query result; that is, the noise scale of the Gaussian noise mechanism is proportional to the sensitivity. In particular, when δ ≥ (4/5) · exp(−(σ · ε)²/2) and ε < 1, the function f satisfies (ε, δ)-DP after the Gaussian noise is added.

A Generative Adversarial Network (GAN) is an unsupervised model consisting of two parts: a generator G and a discriminator D. The generator draws random samples from a latent space and learns to imitate the real training data ever more closely, while the discriminator receives both real data and generated samples and tries to distinguish the generated samples from the real ones as accurately as possible. Inspired by game theory, the generator G tries to capture the data distribution, and the discriminator D estimates the probability that a sample comes from the real data rather than from G. The two networks compete and continually adjust their parameters during training; the ultimate goal of a GAN is to make the discriminator unable to judge whether the generator's output is real or fake. As noted above, GAN optimization is a game between G and D, i.e., a min-max problem, and Goodfellow et al. (2014) proposed the following formulation to solve it:

Definition 4 (Optimal Generator). The generator learns a distribution P_g over the data. Given the noise prior P_z(z), the generator G(z; θ_g): z → x, and the discriminator D(x; θ_d): x → [0, 1], the optimal generator is obtained by solving

min_G max_D V(D, G) = E_{x∼P_data(x)}[log D(x)] + E_{z∼P_z(z)}[log(1 − D(G(z)))].   (4)

However, the above optimization model suffers from vanishing gradients and limited sample diversity. Therefore, researchers proposed an optimized GAN that solves the vanishing gradient problem, defined as follows:

Definition 5 (Optimized GAN). Let Π(F_r, F_g) be the set of all joint distributions γ whose marginal distributions are F_r and F_g. The (Wasserstein) distance between F_r and F_g is

W(F_r, F_g) = inf_{γ∈Π(F_r, F_g)} E_{(x, y)∼γ}[‖x − y‖].
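To ground the min-max objective in Definition 4, the sketch below expresses the two sides of the game as binary cross-entropy losses. It is our illustration, not the authors' implementation; it assumes the discriminator ends in a sigmoid and returns one probability per sample, and it uses the usual non-saturating generator loss instead of minimizing log(1 − D(G(z))) directly.

```python
import torch
import torch.nn.functional as F

def gan_losses(discriminator, generator, real_batch, z):
    """Minimal GAN objective: D pushes real samples toward 1 and generated
    samples toward 0; G tries to make D output 1 on generated samples."""
    fake_batch = generator(z)
    real_labels = torch.ones(real_batch.size(0), 1)
    fake_labels = torch.zeros(fake_batch.size(0), 1)
    # Discriminator loss: maximize E[log D(x)] + E[log(1 - D(G(z)))].
    d_loss = (F.binary_cross_entropy(discriminator(real_batch), real_labels)
              + F.binary_cross_entropy(discriminator(fake_batch.detach()), fake_labels))
    # Generator loss (non-saturating form): maximize log D(G(z)).
    g_loss = F.binary_cross_entropy(discriminator(fake_batch), real_labels)
    return d_loss, g_loss
```

Detaching the generated batch in the discriminator term keeps the two optimization problems separate, mirroring the alternating updates of the min-max game.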
As a promising privacy-preserving distributed machine learning framework, Federated Learning protects users' private data by keeping local data on the clients and only periodically exchanging model updates with the server, which also reduces communication costs. The classic algorithm for federated optimization is Federated Averaging (FedAvg) (McMahan et al. 2017). In FL, we consider a server S and a subset of the clients K participating in the training of a shared global model F(·). We assume that each client k holds an IID or non-IID dataset D_k.

On the client side, for a data sample x we let ℓ(ω; x) be the loss function, where ω ∈ R^d denotes the model's trainable parameters. On the server side, we let L(ω) = E_{x∼D}[ℓ(ω; x)] be the loss function and let the server optimize the following objective:

min_ω Σ_{k=1}^{K} p_k L_k(ω),

where K represents the clients and p_k (with Σ_k p_k = 1) indicates the relative influence of each client on the global model. In FL, training is conducted between the server and the clients over T communication rounds to minimize the above objective, following a three-step protocol:

- (Step 1, Initialization): In the t-th round of training, the server selects a subset of the clients K to participate in training and broadcasts the initialized global model parameters ω_t to each client.
- (Step 2, Local Training): Each selected client updates the received model on its local dataset D_k and uploads the updated parameters to the server.
- (Step 3, Aggregation): The server aggregates the received updates, e.g., by weighted averaging, into a new global model ω_{t+1}.

These steps are repeated until the global model converges.

We start with an introduction to the medical DPGAN model and then introduce FedDPGAN for diagnosing COVID-19 from CXR images collected on various medical platforms. Specifically, FedDPGAN adopts a client-server architecture in which the server shares the global model and coordinates each client's local privacy-preserving DPGAN model, which is trained with an SGD optimizer.

In this subsection, we present the approaches used to protect users' private data, including the basic DP noise mechanism and the FedDPGAN algorithm. In our framework, there is a central server S and a client set K, where each client k holds a local dataset D_k. Next, we describe the components of the proposed model in detail.

First, we present our distributed DPGAN mechanism. Since COVID-19 data is highly private, we must protect patients' privacy when using a GAN to generate COVID-19 data. Therefore, following existing works (Xie et al. 2018), we add Gaussian noise to the training gradients to keep the dataset secure. Specifically, we add random noise to the discriminator's gradients, which masks the original data distribution and thereby protects the privacy of the training data. Therefore, according to Definitions 1-3 we have

g_σ = g · min(1, C/‖g‖₂) + N(0, σ_n² c_g² I),

where g_σ is the noisy gradient, c_g is the sensitivity of the gradient function, and C is the clipping threshold. Since the gradient, viewed as a random variable, approximately follows a Gaussian distribution, we write g ∼ N(µ, σ²), where µ is its mean. To make the random function f satisfy the definition of differential privacy, we let ∆f = max ‖f(b) − f(b')‖₂ be the L2-sensitivity; for all δ ∈ (0, 1), when we add the Gaussian noise N(0, σ_n² c_g² I) to the gradient with sampling rate q, the function f satisfies the definition of DP provided the noise scale σ_n is chosen appropriately with respect to ε, δ, and q. To prevent the gradient from exploding, we use gradient clipping to constrain values to the range (−c, c).

Following the client-server architecture of FL, the central server S selects a random fraction C of clients and broadcasts the global model to each selected client. After the global model is initialized and local training is performed, each client uploads its generator parameters θ^k_{t+1} to the aggregator. The aggregator then accumulates Σ_{k=1}^{K} (N_k/N) θ^k_{t+1} to obtain the averaged θ_{t+1} and updates the global model. The server coordinates the updates from multiple clients and shares the new global model with them.
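A minimal sketch of this weighted aggregation is shown below. It is our illustration; the function name and the data layout (one list of per-layer NumPy arrays per client) are assumptions, not the authors' code.

```python
import numpy as np

def aggregate_generators(client_params, client_sizes):
    """FedAvg-style weighted average of the clients' generator parameters:
    theta_{t+1} = sum_k (N_k / N) * theta^k_{t+1}.
    client_params: list over clients, each a list of per-layer numpy arrays.
    client_sizes:  list of local dataset sizes N_k."""
    total = float(sum(client_sizes))
    weights = [n / total for n in client_sizes]
    aggregated = []
    for layers in zip(*client_params):   # corresponding layers across clients
        aggregated.append(sum(w * layer for w, layer in zip(weights, layers)))
    return aggregated
```

A client holding N_k samples therefore contributes with weight N_k/N, exactly as in the accumulation step described above.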
The interaction between the clients and the server can be summarized in three steps (a minimal code sketch of the client update is given after Algorithm 1 below):

- (Step 1, FL Initialization): The central server picks a subset of the clients and broadcasts the initialized generator parameters θ_t.
- (Step 2, Distributed DPGAN Training): After initialization, the selected clients K perform SGD training iterations over their local data. Each gradient is first scaled into a bounded range by the factor min(1, C/‖g‖), Gaussian noise N(0, σ_n² c_g² I) is added, and the clipped, noised gradient is used in the next update. After updating the discriminator weights ω, the client truncates them to the range (−c, c).
- (Step 3, FL Aggregation): Each client uploads its updated generator parameters θ^k_{t+1}, and the server aggregates them by weighted averaging, as described above, to obtain the new global generator θ_{t+1} for the next round.

We now describe the COVID-19 dataset characteristics and the COVID-19 diagnosis model. First, we use the publicly available COVID-19 dataset as a benchmark for evaluating the performance of the proposed model; this dataset consists of chest X-ray images of patients. To this end, we need an advanced Convolutional Neural Network (CNN) architecture suited to vision tasks in FedDPGAN to achieve higher performance. Therefore, in this paper we use the ResNet (He et al. 2016) model to diagnose COVID-19 by classifying the chest X-ray images. ResNet is a powerful deep learning model that has attracted considerable attention in recent years. It adds direct (shortcut) connections to the network structure so that the training gradient can be transferred quickly, which greatly improves the efficiency of model training. Specifically, the gradient computed when training the later layers is transmitted directly to the parameters of the earlier layers; that is, the original input information is allowed to pass directly to deeper layers. Moreover, although the correlation between gradients decays as the number of layers increases, ResNet has been shown to effectively reduce this attenuation. This property allows ResNet to build deeper network structures; it is widely used in image classification and is well suited to our COVID-19 medical image classification task. The complete training procedure of FedDPGAN is summarized in Algorithm 1.

Algorithm 1: Federated Differentially Private Generative Adversarial Network (FedDPGAN) Algorithm.
Input: P: the platform (client) set; α: the learning rate; c: the clipping parameter; m: the mini-batch size; N_d: discriminator iterations; N_g: generator iterations; σ_n: noise scale; P_g(Z): noise prior; P_data(X): data-generating distribution.
Output: θ: DP generator.
1:  Server executes:
2:    Initialize generator parameters θ_0 and discriminator parameters ω_0;
3:    for each client k ∈ P in parallel do
4:      θ^k_{t+1} ← ClientUpdate(θ_t, ω_t, k)
5:      θ_{t+1} ← θ_{t+1} + (N_k/N) θ^k_{t+1}   (accumulate)
6:    end for
7:  ClientUpdate(θ_t, ω_t, k):
8:    for t_1 = 1, ..., N_g do
9:      for t_2 = 1, ..., N_d do
10:       Sample a mini-batch {Z_1, ..., Z_m} from P_g(Z)
11:       Sample a mini-batch {X_1, ..., X_m} from P_data(X)
12:       g_ω ← g_ω · min(1, C/‖g_ω‖) + N(0, σ_n² c_g² I)   (adding noise)
13:       ω ← clip(ω + α · SGD(ω, g_ω), −c, c)
14:     end for
15:     g_θ ← g_θ · min(1, C/‖g_θ‖)
16:     θ ← θ − α · SGD(θ, g_θ)
17:   end for
18:   return θ to server
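The following minimal PyTorch sketch illustrates the privacy-preserving discriminator update of Step 2 and Algorithm 1, lines 12-13. It is our simplification, not the authors' released implementation: clipping is applied per parameter tensor rather than per example, and the gradient sensitivity c_g is approximated by the clipping threshold C.

```python
import torch

def dp_discriminator_step(discriminator, d_loss, lr=0.01, C=1.0, sigma_n=1e-4, c=0.01):
    """One noisy, clipped discriminator update: scale each gradient by
    min(1, C / ||g||), add Gaussian noise with std sigma_n * C, take an SGD
    step, then truncate the weights to (-c, c) (weight clipping)."""
    discriminator.zero_grad()
    d_loss.backward()
    with torch.no_grad():
        for p in discriminator.parameters():
            if p.grad is None:
                continue
            p.grad.mul_(min(1.0, C / (p.grad.norm(2).item() + 1e-12)))  # gradient clipping
            p.grad.add_(sigma_n * C * torch.randn_like(p.grad))         # calibrated noise
            p.add_(-lr * p.grad)                                        # SGD update
            p.clamp_(-c, c)                                             # weight clipping
```

Only the discriminator sees this noise; the generator update (lines 15-16) uses the clipped but un-noised gradient, so the privacy cost is paid where the real data enters the computation.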
We use real-world CXR images to comprehensively evaluate the proposed model. First, we give details of the experimental environment, datasets, hyperparameters, and model configuration. Second, we compare against other baseline models, including advanced centralized deep learning models and classical machine learning models, to determine the performance of our proposed model. Then, we compare the model performance with other benchmark models under simulated non-IID distributions. Finally, we discuss the influence of the privacy parameter on model performance.

We evaluate FedDPGAN on the pneumonia image dataset published by Cohen et al. (2020a,b), which consists of normal lung images, ordinary pneumonia images, and COVID-19 pneumonia images. Specifically, the dataset contains 2,000 normal images, 1,250 ordinary pneumonia images, and 350 COVID-19 pneumonia images. As these numbers show, the data suffers from class imbalance, which is why we use the DPGAN model to generate diverse data. We generate synthetic chest X-ray images with the DPGAN model and mix them into our dataset; more details can be seen in Fig. 2. In addition, we resized the images to 28 × 28 pixels, which speeds up the convergence of the model.

In this experiment, we use FedResNet and a centralized ResNet, among other baselines detailed below. We use the FedAvg aggregation algorithm as the update aggregation rule in the proposed framework, because models trained with the classic FedAvg algorithm perform well across a variety of tasks.

In the medical field, since the data of different hospitals is collected by different types of equipment, the data across hospitals is non-IID. In this paper, to obtain a non-IID data distribution, we assign two types of data, i.e., normal chest images and general pneumonia images, to most clients, and place COVID-19 images on only a few clients (a toy partitioning sketch is given at the end of this subsection). More details can be seen in Fig. 3.

Fig. 3: Overview of the non-IID data allocation method.

In this experiment, we set K = 100 clients and place an equal-sized dataset on each client. At each communication round, we randomly select C = 10% of the clients to participate in training and set the local batch size B = 10, local epochs E = 5, the learning rate α = 0.01, and the Gaussian noise generator to a default noise scale of σ = 0.0001. The model is implemented with TensorFlow 2.0 (Abadi et al. 2016a), a powerful framework released by Google that can run on GPUs for acceleration, and PyTorch (Paszke et al. 2019), an open-source ML toolkit that accelerates everything from research prototyping to production deployment. All experiments are conducted using PyTorch and TensorFlow on Ubuntu 16.04, on a Linux server with an NVIDIA GeForce RTX 2080TI GPU and an i7 9900K CPU.
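For concreteness, the toy sketch below simulates the non-IID allocation of Fig. 3 described above. It is our illustration rather than the authors' code, and the number of clients holding COVID-19 images (n_covid_clients) is an assumption, since the paper only states that a few clients receive them.

```python
import numpy as np

def simulate_non_iid_split(labels, n_clients=100, covid_label=2, n_covid_clients=5, seed=0):
    """Toy non-IID allocation: normal and ordinary-pneumonia images are spread
    across all clients, while COVID-19 images are concentrated on only a few
    clients. Returns one array of sample indices per client."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    covid_idx = rng.permutation(np.flatnonzero(labels == covid_label))
    other_idx = rng.permutation(np.flatnonzero(labels != covid_label))
    clients = [list(chunk) for chunk in np.array_split(other_idx, n_clients)]
    for i, chunk in enumerate(np.array_split(covid_idx, n_covid_clients)):
        clients[i].extend(chunk)   # only the first few clients receive COVID-19 images
    return [np.asarray(c) for c in clients]
```

Under such a split, most clients never see a COVID-19 image locally, which is precisely the skew that the DPGAN-generated samples are meant to compensate for.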
We compare the performance of the proposed FedDPGAN-based ResNet model with that of FedResNet, DPGAN-based ResNet, ResNet, CNN, MLP, KNN, and SVM models under the same simulation configuration. Among these seven competing methods, DPGAN-based FedResNet and FedResNet are federated models and the rest are centralized models. ResNet (He et al. 2016) performs well on image classification tasks and has become a widely used baseline model. The SVM, KNN, and MLP models are popular ML models for general classification applications (Hao Zhang et al. 2006). Table 1 reports the accuracy of the proposed model and the compared models in diagnosing COVID-19; for the centralized baselines it lists ResNet (He et al. 2016) at 92.93%, CNN (Tajbakhsh et al. 2016) at 90.72%, MLP (Li et al. 2018) at 92.05%, KNN (Park and Lee 2018) at 92.78%, and SVM (Morra et al. 2009) at 89.41%, none of which provides the two additional properties marked with × in the table. From all of the results in the table, we can see that the proposed model not only protects privacy but also uses data augmentation to improve performance. Fig. 4(a) shows that the DPGAN-based ResNet model outperforms the best centralized baseline, ResNet, by 0.84%; it is 4.36% higher than the worst centralized baseline, SVM, and 3.05% higher than the worst deep learning baseline, CNN.

The reasons are that: (1) the large amount of generated data enables the ResNet model to learn from more samples, and (2) training with DPGAN makes ResNet more generalizable. In federated learning, our model achieves the best performance, 0.49% higher than the baseline DPGAN-based ResNet model, as shown in Fig. 4(b). In a word, the FedDPGAN-based ResNet model achieves accurate diagnosis without compromising privacy.

In this part, we investigate the influence of the data augmentation method under IID and non-IID settings. First, we compare the performance of the FedResNet model under IID and non-IID settings. Fig. 5(a) shows that the prediction error of the FedResNet model under the non-IID setting is 2.75% higher than under the IID setting, indicating that a non-IID distribution affects the convergence of the model and degrades its performance. Second, under the non-IID setting, we evaluate FedResNet with and without the data augmentation method. Fig. 5(b) shows that the prediction error of the FedDPGAN-based ResNet model (with data augmentation) is 3.00% lower than that of FedResNet (without data augmentation) under the non-IID setting. The reason is that the data augmentation method alleviates the non-IID problem by generating diverse data, which makes the convergence of federated training more stable. Third, we compare the FedDPGAN-based ResNet model under the non-IID setting with the FedResNet model under the IID setting. Fig. 6(a) shows that the performance of the FedDPGAN-based ResNet model under the non-IID setting is close to that of FedResNet under the IID setting. Furthermore, Fig. 6(b) shows the prediction error of the proposed models under the IID and non-IID settings. In this case, our model is also superior to the centralized ResNet, which has no privacy protection, as shown in Fig. 7. In a word, our model is more suitable for real-world medical application scenarios.

In this part, we assess the performance of the proposed model under different privacy budgets. Since the privacy parameter σ determines the scale of the Gaussian noise (Fredrikson et al. 2015), we need to explore the relationship between the noise scale and the performance of the proposed model. Specifically, we generate fake images of different quality by adjusting σ and then examine the resulting performance; increasing σ strengthens the privacy protection of the dataset. From Table 2, we conclude that the smaller σ is, the higher the model performance. The experimental results show that we can adjust σ to strike a balance between performance and privacy protection.

We proposed the FedDPGAN model for diagnosing COVID-19 from CXR images without compromising privacy. Such a model enables hospitals in different geographic locations to collaboratively train a COVID-19 diagnostic model without sharing data. Specifically, our method addresses two serious challenges currently encountered in diagnosing COVID-19: data availability and data privacy. First, we design a distributed DPGAN model to address the data availability issue by generating COVID-19 image data. In particular, we use (ε, δ)-DP noise to protect the privacy of the GAN's training gradients.
Second, we introduce the FL framework to protect patients' privacy and apply the ResNet model to diagnose COVID-19. In the experiments, we test the performance of the FedDPGAN model on COVID-19 chest X-ray image datasets and compare it with centralized ResNet, CNN, MLP, KNN, and SVM models. The results show that our method achieves the best model performance and privacy protection among the competing methods. Furthermore, the experimental results indicate that the GAN component in the proposed model can alleviate the non-IID problem in FL, which opens a window for using data augmentation to solve the non-IID problem. In the future, we will design a more realistic semi-supervised federated learning system to address the lack of data labeling and the data privacy issues in the medical field. Furthermore, we will explore how data augmentation methods can mitigate the non-IID problem in FL, which motivates the design of more efficient data augmentation methods for the non-IID problem.

References:
- Tensorflow: A system for large-scale machine learning
- Deep learning with differential privacy
- Hsma woa: A hybrid novel slime mould algorithm with whale optimization algorithm for tackling the image segmentation problem of chest x-ray images
- Fss-2019-ncov: A deep learning architecture for semi-supervised few-shot segmentation of covid-19 infection
- An intelligent framework using disruptive technologies for covid-19 analysis
- Covid-gan: Estimating human mobility responses to covid-19 pandemic through spatio-temporal conditional generative adversarial networks
- Covid-19: immunopathology and its implications for therapy
- Synthetic learning: Learn from distributed asynchronized discriminator gan without sharing medical image data
- Computational intelligence for medical imaging simulations
- Gan-leaks: A taxonomy of membership inference attacks against generative models
- Fl-qsar: a federated learning based qsar prototype for collaborative drug discovery
- Generating multi-label discrete patient records using generative adversarial networks
- Covid-19 image data collection
- Covid-19 image data collection: Prospective predictions are the future
- Data sharing in the era of covid-19
- Adopt: automatic deep learning and optimization-based approach for detection of novel coronavirus covid-19 disease using x-ray images
- The algorithmic foundations of differential privacy
- Model inversion attacks that exploit confidence information and basic countermeasures
- Fedner: Medical named entity recognition with federated learning
- Generative adversarial nets
- Image processing using multi-code gan prior
- Svm-knn: Discriminative nearest neighbor classification for visual category recognition
- Federated learning for mobile keyboard prediction
- Deep residual learning for image recognition
- Deep models under the gan: information leakage from collaborative deep learning
- Pate-gan: Generating synthetic data with differential privacy guarantees
- Federated learning: Challenges, methods, and future directions
- Privacy-preserving federated brain tumour segmentation
- Novel effective connectivity inference using ultra-group constrained orthogonal forward regression and elastic multilayer perceptron classifier for mci identification
- Early triage of critically ill covid-19 patients using deep learning
- Ppgan: Privacy-preserving generative adversarial network
- Deep anomaly detection for time-series data in industrial iot: A communication-efficient on-device federated learning approach
- Federated learning in the sky: Aerial-ground air quality sensing framework with uav swarms
- A secure federated learning framework for 5g networks
- Privacy-preserving traffic flow prediction: A federated learning approach
- Federated learning for 6g communications: Challenges, methods, and future directions
- Rc-ssfl: Towards robust and communication-efficient semi-supervised federated learning system
- Towards communication-efficient and attack-resistant federated edge learning for industrial internet of things
- Communication-efficient learning of deep networks from decentralized data
- Rényi differential privacy
- Data sharing for novel coronavirus (covid-19)
- Comparison of adaboost and support vector machines for detecting alzheimer's disease through automated hippocampal segmentation
- Privacy preserving k-nearest neighbor for medical diagnosis in ehealth cloud
- Pytorch: An imperative style, high-performance deep learning library
- Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data
- Feded: Federated learning via ensemble distillation for medical relation extraction
- Convolutional neural networks for medical image analysis: Full training or fine tuning?
- Digital technology and covid-19
- The eu general data protection regulation (gdpr): A practical guide
- Covidgan: Data augmentation using auxiliary classifier gan for improved covid-19 detection
- Optimizing federated learning on non-iid data with reinforcement learning
- Covid-net: A tailored deep convolutional neural network design for detection of covid-19 cases from chest x-ray images
- Dominant data set selection algorithms for electricity consumption time-series data analysis based on affine transformation
- Differentially private generative adversarial network
- Ganobfuscator: Mitigating information leakage under gan via differential privacy