key: cord-0430019-r2on4n04
authors: Kumar, Rajesh; Khan, Abdullah Aman; Zhang, Sinmin; Kumar, Jay; Yang, Ting; Golalirz, Noorbakhash Amiri; Zakria,; Ali, Ikram; Shafiq, Sidra; Wang, WenYong
title: Blockchain-Federated-Learning and Deep Learning Models for COVID-19 detection using CT Imaging
date: 2020-07-10
journal: nan
DOI: 10.1109/jsen.2021.3076767
sha: 8fb7e0dd7b79af7900e076c4c9b841e164d2a09b
doc_id: 430019
cord_uid: r2on4n04

With the increase of COVID-19 cases worldwide, an effective way is required to diagnose COVID-19 patients. The primary problem in diagnosing COVID-19 patients is the shortage and limited reliability of testing kits; because the virus spreads quickly, medical practitioners face difficulty identifying positive cases. The second real-world problem is sharing data among hospitals globally while respecting the privacy concerns of the organizations. Building a collaborative model and preserving privacy are the major concerns when training a global deep learning model. This paper proposes a framework that collects a small amount of data from different sources (various hospitals) and trains a global deep learning model using blockchain-based federated learning. Blockchain technology authenticates the data, and federated learning trains the model globally while preserving the privacy of the organization. First, we propose a data normalization technique that deals with the heterogeneity of the data, which is gathered from different hospitals with different kinds of CT scanners. Second, we use Capsule Network-based segmentation and classification to detect COVID-19 patients. Third, we design a method that collaboratively trains a global model using blockchain technology with federated learning while preserving privacy. Additionally, we collected real-life COVID-19 patient data, which is open to the research community. The proposed framework can utilize up-to-date data, which improves the recognition of computed tomography (CT) images. Finally, our results demonstrate better performance in detecting COVID-19 patients.

A new type of coronavirus emerged in the city of Wuhan in China. Unfortunately, within weeks this coronavirus spread to several countries, and it has proven fatal. With an estimated 325,000 deaths in 4 months, the COVID-19 virus is considered one of the most deadly viruses [1]. The first confirmed death from COVID-19 [...] This novel coronavirus is the seventh type. Some coronaviruses cause mild symptoms, while others, such as SARS (severe acute respiratory syndrome-related coronavirus) and MERS (Middle East respiratory syndrome), are much more dangerous. Coronavirus can be easily transmitted between humans, mainly through social interaction with an active patient or direct contact with an infected animal.

Without any warning, the number of COVID-19 patients suddenly started to increase, leaving governments and medical practitioners unprepared to handle such a situation. Consequently, there is a shortage of testing kit supplies, and many hospitals worldwide face a challenge in identifying COVID-19 positive patients. The following criteria are used to diagnose COVID-19 patients: clinical symptoms, epidemiological history, and positive CT and pathogenic testing. Radiological imaging is also one of the major diagnosis methods for COVID-19. Most COVID-19 cases exhibit common features (visual symptoms) on CT images, including early ground-glass opacity and late-stage pulmonary consolidation. There is also a rounded morphology and a peripheral lung distribution [2], [3]. While typical CT images may help to screen suspected COVID-19 cases at an early stage, CT images of various viral pneumonias are similar and overlap with other infectious and inflammatory lung diseases [4], [5], [6], [7]. It is worth noting that radiologists distinguish between COVID-19 and other viral pneumonia. Previous work focuses on diagnosing COVID-19 patients using computed tomography [8], [9], [10], [11], [12]. However, these works do not focus on collaboratively learning a model and do not consider the privacy issues of the hospitals.

Our study is motivated by some fundamental problems. COVID-19 is spreading rapidly, with different symptoms in different patients. Thus, hospitals can share their data for the accurate diagnosis of COVID-19 patients. Sharing data securely (without leaking the privacy of users) and training a global model for the detection of positive cases is a challenging task. Moreover, the existing studies are not capable of sharing data collaboratively and training the model accurately. Collecting data from various sources is a big challenge and a bottleneck in the advancement of AI-based techniques. Such confidential data is not available because health care centers lack a privacy-preserving approach [13], [14], [15], [16], [17], [18], [19], [20], [21], [22]. Furthermore, training the deep learning model collaboratively over a public network is another challenge.

The latest report of the World Health Organization reveals that COVID-19 is an infectious disease that, like SARS, primarily affects the lungs, giving them a honeycomb-like appearance [23]. Even after recovering from COVID-19, some patients have to live with permanent lung damage [24]. The first motivation of our work is to find small areas of the lungs infected by COVID-19, which helps professional radiologists avoid missing an infection. The second motivation is to share data to train a better deep learning model while respecting the privacy concerns of the data providers. Sharing data makes it feasible to develop a deep learning-based model for the automatic detection of COVID-19.

• The first challenge is that confidential data is not available due to the absence of a privacy-preserving approach.
• The second challenge is to train the global model (federated model) via a blockchain network.
• The third challenge is the unavailability of a dataset: it is difficult to collect a sufficient amount of training data and build a better prediction model given the privacy concerns of hospitals.
• Finally, recognizing the patterns of COVID-19 in lung screening is also a challenging task.

In this paper, we propose a framework that builds an accurate collaborative model using data from multiple hospitals to recognize CT scans of COVID-19 patients. The proposed blockchain-based federated learning framework learns collaboratively from multiple hospitals with different kinds of CT scanners. First, we propose a data normalization process to normalize the data obtained from the different sources. Then we employ deep learning models to recognize the COVID-19 patterns in lung CT scans. We use SegCaps for image segmentation and further train a Capsule Network [25] for better generalization. We found that the Capsule Network achieved better performance than other learning models. Finally, we train the global model and solve the privacy issue using the federated learning technique. The proposed framework collects the data, collaboratively trains an intelligent model, and then shares this model in a decentralized manner over the public network.

By using federated learning, the hospitals can keep their data private and share only weights and gradients, while blockchain technology is used to distribute the data among the hospitals. The decentralized architecture for data sharing among multiple hospitals shares the data securely without leaking the privacy of the hospitals. Additionally, this article introduces a new dataset, named CC-19, related to the latest family of coronavirus, i.e., COVID-19. The dataset contains the computed tomography (CT) scan slices for 89 subjects. Out of these 89 subjects, 68 were confirmed patients (positive cases) of the COVID-19 virus, and the remaining 21 were found to be negative cases. The dataset contains 34,006 CT scan slices (images) belonging to these 89 subjects. The data for these patients were collected on various days, giving about 231 CT scan volumes in total.

The main contributions of the paper include, but are not limited to, the following:

1) This paper proposes a data normalization technique (to accurately train the federated learning model), as the data is collected from different sources (i.e., hospitals) and devices (CT scanner machines).

2) The proposed technique detects the patterns of COVID-19 in lung CT scans using Capsule Network-based segmentation and classification.

3) This paper proposes a blockchain-empowered method to collect the dataset collaboratively from different sources while respecting the organizations' privacy concerns. Federated learning is employed to protect the organizations' data privacy and to train the global deep learning model using less accurate local models.

4) Additionally, we introduce a new dataset that consists of 89 subjects, of which 68 are confirmed COVID-19 patients. The dataset contains 34,006 CT scan slices (images) belonging to these 89 subjects.

The proposed approach is practical for big data analysis (i.e., lung CT scans), and it efficiently processes the data using blockchain and a deep learning model. Consider a real-time use case of a hospital observing some new symptoms of the COVID-19 virus. To find new symptoms or new information regarding COVID-19, the data needs to be stored on a decentralized network without leaking the privacy of the patients, and the knowledge of the latest symptoms needs to be shared securely. Federated learning secures data through the decentralized network and distributes the training task to train a better model using the latest available patient data.

The proposed framework collects a small amount of data from various sources to train the deep learning model, and the blockchain combines each trained model via federated learning. The model trained over the blockchain network provides better and more accurate prediction because it holds the newest information about COVID-19 symptoms.

The rest of this paper is organized as follows. In Section II, we propose a Capsule Network-based segmentation and classification model and blockchain-based federated learning for secure data sharing without leaking privacy. In Section III, we describe the dataset and the experimental results of our proposed scheme. In Section IV, we present an overview of the studies related to deep learning, COVID-19, and federated learning. Finally, Section V concludes this paper.

In reality, hospitals and other relevant organizations are reluctant to share their patients' data in order to preserve the privacy of the patients. Moreover, it is a known fact that deep learning models require a large amount of data to train a model that can handle real-world problems. For that reason, this paper considers collecting multiple hospitals' data without leaking data privacy. This paper proposes a blockchain-based federated learning framework to train and share a collaborative model. Federated learning is used to combine the weights of the models trained locally by the hospitals into what is referred to as a global or collaborative model.

As the data is collected from multiple sources, we design a normalization technique to deal with data from different kinds of CT scanners (Brilliance ICT, Somatom Definition Edge, Brilliance 16P CT). After normalizing the data, we segment the images and then train the model to recognize COVID-19 suspects using the Capsule Network.

We divide the methodology into two parts: i) the local model and ii) federated learning. First, we solve the problem of heterogeneous CT scan data. Then, we use SegCaps [26] for segmentation and train the local model to detect the patterns of COVID-19. Finally, we share the local model weights with the blockchain network to train the global model.

A major issue with federated learning is dealing with input data from multiple sources and various machines with different parameters. Most of the existing techniques are not efficient enough to deal with this problem for federated learning. To solve this issue, we propose a normalization technique that can deal with any CT scan and bring the images to the same standard. Because of this normalization, federated learning can deal with the heterogeneity of the dataset and train a better learning model. The normalization method has two phases: i) spatial normalization and ii) signal normalization. Spatial normalization deals with the dimension and resolution of the CT scan. Signal normalization deals with the intensity of each voxel of the CT scan, which is based on the lung window.

1) Spatial Normalization: As already discussed, different CT scanners have different parameters for CT scans; for example, a high-resolution scan volume is 0.31 × 0.31 × 0.31 mm³ and a low-resolution one is 0.98 × 0.98 × 2.5 mm³. In our case, we use federated learning for data obtained from multiple sources. We use the standardized volume 334 × 334 × 512 mm³ for the human lung. Moreover, we use Lanczos interpolation [27] to rescale the scans to the standard resolution.
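To make the rescaling step concrete, the following is a minimal sketch of Lanczos resampling along a single axis in pure Python. It is an illustration under stated assumptions, not the authors' implementation: the function names `lanczos_kernel` and `resample_1d`, the kernel size `a = 3`, and the edge-replication border handling are all hypothetical choices. A 3D volume could be brought to a common grid by applying such a pass along each axis in turn.

```python
import math


def lanczos_kernel(x: float, a: int = 3) -> float:
    """Lanczos window: sinc(x) * sinc(x / a) for |x| < a, otherwise 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resample_1d(samples, new_len, a=3):
    """Resample a 1D list of intensities to new_len points (hypothetical helper)."""
    old_len = len(samples)
    scale = old_len / new_len
    out = []
    for i in range(new_len):
        # Centre of output sample i expressed in input coordinates.
        centre = (i + 0.5) * scale - 0.5
        first_tap = math.floor(centre) - a + 1
        acc = 0.0
        weight_sum = 0.0
        for k in range(first_tap, first_tap + 2 * a):
            w = lanczos_kernel(centre - k, a)
            # Assumption: replicate the border samples outside the signal.
            idx = min(max(k, 0), old_len - 1)
            acc += w * samples[idx]
            weight_sum += w
        # Normalise so that a constant signal stays constant.
        out.append(acc / weight_sum)
    return out


if __name__ == "__main__":
    row = [0.0, 10.0, 20.0, 30.0]
    print(resample_1d(row, 8))
```

This sketch does not widen the kernel when shrinking a signal, so it is best read as an upsampling example; a production pipeline would normally rely on a tested imaging library instead.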

2) Signal Normalization: Every CT scan is expressed in Hounsfield Units (HU), and the data collected from different hospitals have different HU ranges (i.e., −400 HU to −600 HU). In medical practice, radiologists set the lung window for every CT scanner. Two window parameters are mostly used: the window level (WL) and the window width (WW), where WL is defined as the central signal value and WW defines the width of the window. The proposed Equation 1 represents the upper bound and the lower bound of the voxel.

$I_{original}$ is the intensity of the data and $I_{normalized}$ is the final intensity. We set the range of the lung window to [−0.5, 0.5] to standardize the embedding space.
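The windowing step can be sketched as follows. Since Equation 1 is not reproduced in this text, the formula below, (I_original − WL) / WW clipped to [−0.5, 0.5], is an assumption that merely agrees with the description of WL as the window centre and WW as the window width. The default values `window_level=-600.0` and `window_width=1600.0` are hypothetical lung-window settings chosen for illustration, not values reported by the authors.

```python
def window_normalize(hu, window_level=-600.0, window_width=1600.0):
    """Map a Hounsfield value into [-0.5, 0.5] using a lung window.

    Assumed form: (I_original - WL) / WW, clipped to the window bounds.
    WL and WW defaults are illustrative, not taken from the paper.
    """
    normalized = (hu - window_level) / window_width
    # Values outside the window saturate at the lower or upper bound.
    return max(-0.5, min(0.5, normalized))


if __name__ == "__main__":
    for value in (-1400.0, -1000.0, -600.0, 200.0, 3000.0):
        print(value, window_normalize(value))
```

With these assumed settings, the window centre maps to 0, and everything denser than +200 HU or less dense than −1400 HU saturates at the bounds.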

This section proposes the segmentation based on [26] . Further, the Capsule Network is trained for the detection of COVID-19 using the segmented CT scan images.

1) Segmentation: We take 2D slices for the segmentation. A standardized volume of 334 × 334 × 512 mm³ is used for human lung segmentation. Each CT scan volume (3D) has three planes: XY, XZ, and YZ. We formalize the XZ or YX planes to easily differentiate the lung infection (as shown in the first row of Figure 10).

Here, $prob_B$ is the probability and B is the infection point. g is the aggregation function that defines the voxel from the three-dimensional views and predicts the $P_{xy}$, $P_{yz}$, and $P_{xz}$ voxel. Because the traditional equation is time-consuming, we modify Equation 2 to:

2) Capsule Networks for Classification of COVID-19: A deep learning framework usually has a feature extraction pipeline that estimates and extracts prominent features. Afterward, a learning process such as an MLP (multi-layer perceptron) is applied to learn the appropriate class from the extracted features. Over the past few years, researchers have used and fine-tuned the feature extraction pipelines of these robust deep learning frameworks. We design a Capsule Network because it achieves high performance in detecting diseases in medical images. Previous techniques need a lot of data to train a more accurate model. The Capsule Network improves the performance of deep learning models within their internal layers. The architecture of our modified Capsule Network is shown in Figure 1; it is similar to Hinton's Capsule Network. The Capsule Network contains four layers: i) a convolutional layer, ii) a hidden layer, iii) a PrimaryCaps layer, and iv) a DigitCaps layer.

A capsule is created when input features are in the lower layer. Each layer of the Capsule Network contains many capsules. To train the Capsule Network, the activation layer represents the instantiation parameters of the entity and computes the length of the capsule to re-compute the scores for the feature part. Capsule Networks are a better replacement for Artificial Neural Networks (ANN). Here, the capsule acts as a neuron. Unlike an ANN, where a neuron outputs a scalar value, Capsule Networks tend to describe an image at a component level and associate a vector with each component. The probability of the existence of a component is represented by this vector's length, and max-pooling is replaced with "routing by agreement". As capsules are independent, the probability of correct classification increases when multiple capsules agree on the same parameters. Every component can be represented by a pose vector $U_i$, rotated and translated by a weight matrix $W_{i,j}$ to a vector $\hat{u}_{i|j}$. Moreover, the prediction vector can be calculated as:

$$\hat{u}_{i|j} = W_{i,j} U_i$$

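The next higher-level capsule, $s_j$, processes the sum of predictions from all the lower-level capsules, with $c_{i,j}$ as a coupling coefficient. Capsule $s_j$ can be represented as:

$$s_j = \sum_i c_{i,j}\, \hat{u}_{i|j}$$

where $c_{i,j}$ can be represented as a routing softmax function over the routing logits $b_{i,j}$ of [25], given as:

$$c_{i,j} = \frac{\exp(b_{i,j})}{\sum_k \exp(b_{i,k})}$$

As can be seen from Figure 1, a squashing function is applied to scale the output probabilities between 0 and 1, which can be represented as:

$$v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \, \frac{s_j}{\lVert s_j \rVert}$$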

For further details, refer to the original study [25]. We perform routing by agreement using Algorithm 1.
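The routing step described above can be sketched in a few lines of pure Python. This is a minimal illustration of dynamic routing by agreement as introduced in [25], not a reproduction of the authors' Algorithm 1: the function names, the nested-list layout `u_hat[i][j]` (prediction of lower capsule i for higher capsule j), and the default of three iterations are assumptions made for the example.

```python
import math


def squash(s):
    """Shrink vector s so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of routing logits."""
    top = max(logits)
    exps = [math.exp(b - top) for b in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, iterations=3):
    """Return (v, c): higher-level capsule outputs and coupling coefficients."""
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_upper for _ in range(n_lower)]  # routing logits b_ij
    v, c = [], []
    for _ in range(iterations):
        # c_ij: softmax of the logits of lower capsule i over all higher capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_upper):
            # s_j: coupling-weighted sum of the predictions for capsule j.
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_lower))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement update: raise b_ij when the prediction aligns with v_j.
        for i in range(n_lower):
            for j in range(n_upper):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


if __name__ == "__main__":
    predictions = [
        [[1.0, 0.0], [0.1, 0.0]],
        [[1.0, 0.0], [-0.1, 0.0]],
        [[0.9, 0.1], [0.0, 0.1]],
    ]
    outputs, couplings = route(predictions)
    print(outputs)
    print(couplings)
```

In the toy input, the three lower capsules agree strongly about the first higher capsule, so their coupling coefficients shift towards it over the iterations.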

Algorithm 1: Routing algorithm.

C is the array obtained after the softmax, and it is determined by dynamic routing by agreement. The main idea of this method is that, over several iterations, the distribution of the output of the low-level capsules to the high-level capsules is gradually adjusted according to the output of the high-level capsules, until an ideal distribution is finally reached. The detailed training algorithm is given in [27]. We use the Capsule Network to train the model and compare it with state-of-the-art deep learning networks. Table 1 shows the difference between traditional networks and the Capsule Network. In Section III, we compare traditional deep learning classifiers with the Capsule Network classifiers.

In this section, we consider a decentralized data sharing scenario with multiple hospitals. Each hospital is willing to share its model (weights); our proposed method helps to hide the user data and share the model over a decentralized network. Further, federated learning is used to combine the net effect of the different models shared by different hospitals. The base architecture of federated learning is shown in Figure 2. The main goal is to utilize federated learning to share the data among the hospitals without leakage of privacy. Let H denote the hospitals and d the union dataset. Each hospital H agrees to share the data without leaking private information. First, we train the global model M without leaking privacy; then we alter a small part of the randomized mechanism through 1) random sub-sampling and 2) distorting. The random sub-sampling model can obtain the final weights R(M) and share the data globally.

1) Random subsampling: Let H be the number of hospitals. In every round of communication, a subset $X_t$ of size $m_t \le H$ is sampled. Then, the weights $w_t$ are distributed among the hospitals. The blockchain stores the local hospitals' models $\{w^{H}\}_{H=0}^{m_t}$. The difference between the local and the distributed model is referred to as hospital H's update, $\Delta w^{k} = w^{k} - w_t$. The updated weights are sent to the decentralized network in each round.

2) Distorting: A Gaussian mechanism is used to distort the sum of updates. It requires knowledge of the sensitivity of the summing operation. The sensitivity of the update is controlled by scaling: scaling ensures that the second norm is limited, $\forall H,\ \lVert \Delta w^{H} \rVert_2 < S$, so the sensitivity of the update operation is bounded by S. The updated model is defined by Equation 8, which combines two terms: the sum of updates clipped at S, and the Gaussian mechanism approximating the sum of updates.

We noticed that the distortion of the $1/m_t$-scaled Gaussian process is regulated by the noise variance $S^2\sigma^2/m$. This distortion should not surpass a certain amount; otherwise, the additional noise removes too much detail from the subsampled average and no learning improvement can be gained. The Gaussian mechanism and sub-sampling are distributed processes. Nevertheless, this kind of averaging is used for gradient averaging, covering a single data point's gradient at each iteration. Together, m and σ describe the loss of privacy incurred when the randomized process produces an average estimate.
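The clipping and distortion steps can be illustrated with a small sketch. Because Equation 8 is not reproduced in this text, the form used below (scale each update by 1 / max(1, ‖Δw‖₂ / S), sum, add zero-mean Gaussian noise with standard deviation σS, and divide by the number of sampled hospitals) is an assumption borrowed from common client-level differentially private federated averaging. The names `clip_update` and `private_aggregate` are hypothetical.

```python
import math
import random


def clip_update(delta, s):
    """Scale an update so that its L2 norm does not exceed the bound s."""
    norm = math.sqrt(sum(x * x for x in delta))
    factor = max(1.0, norm / s)
    return [x / factor for x in delta]


def private_aggregate(w_t, deltas, s, sigma, rng=None):
    """Assumed update rule: w_t + (sum of clipped updates + Gaussian noise) / m."""
    rng = rng or random.Random()
    m = len(deltas)
    clipped = [clip_update(d, s) for d in deltas]
    new_w = []
    for p in range(len(w_t)):
        total = sum(c[p] for c in clipped)
        # The noise standard deviation is tied to the sensitivity bound s.
        noise = rng.gauss(0.0, sigma * s)
        new_w.append(w_t[p] + (total + noise) / m)
    return new_w


if __name__ == "__main__":
    global_w = [0.0, 0.0]
    hospital_updates = [[3.0, 4.0], [0.3, 0.4]]
    print(private_aggregate(global_w, hospital_updates, s=1.0, sigma=0.5,
                            rng=random.Random(0)))
```

A smaller S clips more aggressively and injects less noise, while a larger σ gives stronger distortion at the cost of a noisier average, which mirrors the trade-off discussed in the text.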

[Figure: Step 1: Blockchain sharing the initial learning. Step 2: Uploading the local models and generating new learning using private data. Step 3: Averaging the new learning.]

The choice will then be based on an upper limit on the distortion rate r and a lower limit on the number of data provider sub-samples. We therefore define the variance $V_c$ between patients as a measure of similarity among hospital updates, as shown in the equation below. For the parameter (x, y), the variance throughout the H hospitals is defined as:

where $\mu_{x,y} = \frac{1}{H}\sum_{h=1}^{H} \Delta w^{h}_{x,y}$.

$V_c$ is defined as the sum of all variances in the update matrix:

Finally, the update $U_s$ can be expressed as:

$\Delta w_{x,y}$ denotes the (x, y)-th parameter of the update $\Delta w \in \mathbb{R}^{b \times a}$ for the communication round. Moreover, S defines a trade-off: the smaller the value of S, the smaller the noise.
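As an illustration of the similarity measure, the sketch below computes the per-parameter mean μ_{x,y} across hospitals and sums the per-parameter variances into a single value. The variance formula (mean squared deviation from μ_{x,y}) and the unnormalised sum are assumptions, since the corresponding equations are not reproduced in this text; the function name `update_statistics` is hypothetical.

```python
def update_statistics(updates):
    """Return (mu, v_c) for a list of b x a update matrices, one per hospital.

    mu[x][y] is the mean of parameter (x, y) over the hospitals.
    v_c is the sum of the per-parameter variances (assumed definition).
    """
    h = len(updates)
    rows, cols = len(updates[0]), len(updates[0][0])
    mu = [[sum(u[x][y] for u in updates) / h for y in range(cols)]
          for x in range(rows)]
    var = [[sum((u[x][y] - mu[x][y]) ** 2 for u in updates) / h
            for y in range(cols)]
           for x in range(rows)]
    v_c = sum(sum(row) for row in var)
    return mu, v_c


if __name__ == "__main__":
    hospital_a = [[1.0, 2.0]]
    hospital_b = [[3.0, 2.0]]
    print(update_statistics([hospital_a, hospital_b]))
```

A value of `v_c` close to zero indicates that the hospitals propose nearly identical updates, whereas a large value signals that their local data pull the model in different directions.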

As patients' data is sensitive and its volume is high, placing data on the blockchain, with its limited storage space, is expensive both financially and in terms of computational resources. Thus, the actual CT scan data is stored by the hospital, while the blockchain helps to retrieve the trained model. When a new hospital provides data, it stores a transaction in the block to verify the owner of the data. The hospital data include the type and the size of the data. Each transaction of the data sharing and retrieval process is shown in Figure 3.
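The storage split described above can be sketched as follows: the raw CT data stays with the hospital, and the ledger only records who owns the data, its type and size, and a digest of the trained model. The field names, the use of SHA-256, and the helper functions `make_transaction` and `make_block` are assumptions made for illustration; the paper does not specify the transaction layout.

```python
import hashlib
import json


def make_transaction(hospital_id, data_type, data_size_bytes, model_weights):
    """Describe a hospital's contribution without storing the raw scans."""
    digest = hashlib.sha256(json.dumps(model_weights).encode("utf-8")).hexdigest()
    return {
        "owner": hospital_id,          # verifies who provided the data
        "data_type": data_type,        # e.g. "CT"
        "data_size": data_size_bytes,  # size of the local dataset
        "model_sha256": digest,        # fingerprint of the shared model
    }


def make_block(prev_hash, transactions):
    """Chain a list of transactions to the previous block by hashing both."""
    body = json.dumps({"prev_hash": prev_hash, "transactions": transactions},
                      sort_keys=True)
    return {
        "prev_hash": prev_hash,
        "transactions": transactions,
        "hash": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }


if __name__ == "__main__":
    tx = make_transaction("hospital-1", "CT", 1_000_000, [0.1, -0.2, 0.3])
    block = make_block("0" * 64, [tx])
    print(block["hash"])
```

Because the block hash covers both the previous hash and the transactions, any later change to a recorded transaction produces a different hash and can be detected.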

Consider two nodes $H_i$ and $H_j$ with unique IDs $H_i(id)$ and $H_j(id)$, as shown in Equation 13.

To secure the privacy of data in a decentralized manner, the randomized method for two hospital nodes is shown in Equation 14, where R and R′ are neighboring records of data and O is the outcome set of data. A(R) ∈ S achieves the privacy of the data.

However, to achieve data privacy for multiple hospitals, the Laplace mechanism is applied to the local model training ($m_i$):

where s denotes the sensitivity, as expressed by Equation 16:
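As a rough illustration of the Laplace mechanism mentioned above, the sketch below perturbs a local model's weights with Laplace noise of scale s/ε. Equations 15 and 16 are not reproduced in this text, so the scale s/ε, the privacy budget parameter `epsilon`, and the function name `add_laplace_noise` are assumptions based on the standard Laplace mechanism rather than the authors' exact formulation.

```python
import random


def add_laplace_noise(weights, sensitivity, epsilon, rng=None):
    """Return weights perturbed with Laplace(0, sensitivity / epsilon) noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    if scale == 0:
        return list(weights)
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    return [w + rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
            for w in weights]


if __name__ == "__main__":
    local_model = [0.25, -0.5, 1.0]
    print(add_laplace_noise(local_model, sensitivity=0.1, epsilon=1.0,
                            rng=random.Random(42)))
```

A smaller `epsilon` yields a larger noise scale and therefore stronger privacy at the cost of a less accurate shared model.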

The consensus algorithm is executed to train the global model using the local models. As all nodes collaboratively train the model, we provide proof of work to share the data between the different nodes. During the training phase, the consensus algorithm checks the quality of the local models, and the accuracy is measured by the mean absolute error (MAE).

3) The leader broadcasts the block to the nodes $H_i$ and $H_j$.
4) Verify $H_i$ and $H_j$ and wait for the approval.
5) Finally, store the blocks in the retrieval blockchain database.

1) Data Sharing Process: Current approaches use encryption to protect data. Sharing personal data is a risk for data providers because of certain security attacks. A simple solution is to transmit the data to the requester with legitimate details while preserving the data holders' privacy. Instead of sharing the original data, data providers such as hospitals exchange only the learned models with the requester. Figure 4 shows the process of data sharing. The nodes communicate with each other, and the consensus process learns from the federated data. The provider and the requester search and store the data in the blockchain nodes. More precisely, the steps of data sharing are shown in Figure 4. Integrating the blockchain with federated learning allows data to be retrieved securely for multiple hospitals worldwide, which can provide an effective prediction.
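The quality check performed during consensus, as described above, can be illustrated with a small sketch that scores each local model by its mean absolute error on a shared validation set and keeps only the models under a threshold. The threshold `max_mae`, the idea of a shared validation set, and the function names are assumptions for illustration; the text only states that model quality is measured by MAE.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labels and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def select_models(predictions_by_node, y_true, max_mae):
    """Return {node_id: mae} for the local models that pass the quality check."""
    accepted = {}
    for node_id, preds in predictions_by_node.items():
        mae = mean_absolute_error(y_true, preds)
        if mae <= max_mae:
            accepted[node_id] = mae
    return accepted


if __name__ == "__main__":
    labels = [1.0, 0.0, 1.0, 1.0]
    node_predictions = {
        "H1": [0.9, 0.1, 0.8, 1.0],
        "H2": [0.0, 1.0, 0.0, 0.0],
    }
    print(select_models(node_predictions, labels, max_mae=0.25))
```

In this toy example only node H1 would contribute to the next block, since the predictions of H2 disagree with every validation label.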

To protect the privacy of the data, we share the trained model instead of the original image data. The objective of the proposed architecture is to train the global model using locally trained models. Secure data sharing is illustrated in Figure 5. In the first phase, we select the training data and then use the private federated learning algorithm for collaborative multi-hospital learning. In other words, each hospital shares its locally trained model weights with the blockchain network, and federated learning combines the local models into a global model.
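Combining local models into a global model can be sketched with a weighted average of the shared weights. Weighting each hospital by its number of training samples follows the common federated averaging scheme and is an assumption here; the text does not state how the local weights are weighted. The function name `federated_average` is hypothetical.

```python
def federated_average(local_weights, sample_counts):
    """Average flattened weight vectors, weighted by each hospital's sample count."""
    if len(local_weights) != len(sample_counts):
        raise ValueError("one sample count is required per local model")
    total = sum(sample_counts)
    n_params = len(local_weights[0])
    return [
        sum(w[p] * n for w, n in zip(local_weights, sample_counts)) / total
        for p in range(n_params)
    ]


if __name__ == "__main__":
    hospital_models = [[1.0, 2.0], [3.0, 4.0]]
    slices_per_hospital = [1, 3]
    print(federated_average(hospital_models, slices_per_hospital))
```

Only the weight vectors and the sample counts leave each hospital in this scheme; the CT slices themselves are never transmitted.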

In the past, artificial intelligence (AI) has gained a reputable position in the field of clinical medicine. In such chaotic situations, AI can help medical practitioners validate the disease detection process, hence increasing the reliability of the diagnosis methods and saving precious human lives. Currently, the biggest challenge faced by AI-based methods is the availability of relevant data. AI cannot progress without abundant and relevant data. In this paper, we collected CT scan data comprising 34,006 slices from 3 different hospitals. The data were scanned by 6 different scanners, shown in Table II. In addition, we collected the third-party dataset [29], [30], publicly available via GitHub (https://github.com/abdkhanstd/COVID-19). The collected dataset contains the computed tomography (CT) scan slices for 89 subjects. Out of these 89 subjects, 68 were confirmed patients (positive cases) of the COVID-19 virus, and the remaining 21 were found to be negative cases. The proposed dataset CC-19 contains 34,006 CT scan slices (images) belonging to 89 subjects, of which 28,395 CT scan slices belong to positive COVID-19 patients. Figure 6 shows some 2D slices taken from CT scans of the CC-19 dataset. Moreover, some selected 3D samples from the dataset are shown in Figure 7. The Hounsfield unit (HU) is the measurement of CT scan radiodensity, as shown in Table III. Usually, CT scanning devices are carefully calibrated to measure HU. This unit can be employed to extract the relevant information in CT scan slices. The CT scan slices have cylindrical scanning bounds. For unknown reasons, the pixel information that lies outside this cylindrical bound was automatically discarded by the CT scanner system. Fortunately, this discarding of outer pixels eliminates some preprocessing steps.

Collecting a dataset is a challenging task, as there are many ethical and privacy concerns observed by hospitals and medical practitioners. Respecting these norms, this dataset was collected in the early days of the epidemic from various hospitals in Chengdu, the capital city of Sichuan. Initially, the dataset was in an extremely raw form. We preprocessed the data and found many discrepancies in most of the collected CT scans. The CT scans with discrepancies were discarded from the proposed dataset. All the CT scans differ from each other, i.e., CT scans have a different number of slices for different patients. We believe that the varying number of slices is due to differences in the height and body structure of the patients. Moreover, upon inspecting the literature, we found that the lung volume of an adult female is ten to twelve percent smaller than that of a male of the same height and age [31].

Sensitivity and specificity measure how correctly a model identifies subjects with a disease and without a disease, respectively. In our case, it is critical to detect a COVID-19 patient, as missing a COVID-19 patient can have disastrous consequences. The formulas of the measures are given as follows:

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Sensitivity} = \text{Recall} = \frac{TP}{TP + FN}$$

$$\text{Specificity} = \frac{TN}{TN + FP}$$

$$\text{Total accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

Table III: Hounsfield Unit (HU) values of different substances.

No.  Substance                   Hounsfield Unit (HU)
1    Air                         -1000
2    Bone                        +700 to +3000
3    Lungs                       -500
4    Water                       0
5    Kidney                      30
6    Blood                       +30 to +45
7    Grey matter                 +37 to +45
8    Liver                       +40 to +60
9    White matter                +20 to +30
10   Muscle                      +10 to +40
11   Soft tissue                 +100 to +300
12   Fat                         -100 to -50
13   Cerebrospinal fluid (CSF)   15

A medical diagnosis system needs to have high sensitivity (recall). We present a comprehensive overview of various well-known deep learning frameworks. The results presented in Table IV indicate the superiority of our proposed method.
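As a quick illustration of the four evaluation measures defined above, the sketch below computes them from the entries of a confusion matrix. The counts used in the example are hypothetical and are not results from this study; the function name `diagnostic_metrics` is likewise an illustrative choice.

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Compute precision, sensitivity (recall), specificity, and accuracy."""
    return {
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    for name, value in diagnostic_metrics(tp=90, tn=80, fp=20, fn=10).items():
        print(f"{name}: {value:.3f}")
```

For a screening task such as this one, a false negative (a missed patient) lowers sensitivity, which is why that measure is emphasised over specificity.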

We performed comprehensive experiments using different kinds of deep learning models (VGG16, AlexNet, Inception V3, ResNet with 50 to 152 layers, MobileNet, and DenseNet). We used deep learning models with different layers to compare the performance of the models on the COVID-19 dataset, as shown in Table IV. We evaluate the performance of the Capsule Network for detecting COVID-19 in lung CT images. Figure 8 compares the deep learning models; the Capsule Network achieves high sensitivity and lower specificity, and we achieved high detection performance through the Capsule Network. Figure 9 shows that the SegCaps-based Capsule Network achieved the best performance and provides the highest sensitivity and lowest specificity. These models were tested using three different test lists containing about 11,450 CT scan slices. The COVID-19 infection segmentation shown in Figure 10 indicates that our method outperforms the baseline methods. The proposed technique's results are close to the ground truth, and U-Net++'s performance comes close to our results.

Fig. 7: Selected samples from the "CC-19 dataset". Each row represents a different patient sample with various Hounsfield Units (HU) for CT scans. The first three columns represent the XY, XZ, and YX planes of the 3D volume, respectively. The fourth column represents the whole 3D volume, followed by the bone structure in the fifth column.

The dataset has been gathered from different sources and different hospitals with various kinds of machines. To measure the performance of federated learning, we distribute the dataset over three hospitals. In this model, multiple hospitals can share the data and learn through federated learning. The performance of our proposed model under this distribution is shown in Figure 11; the accuracy changes as the number of hospitals or providers increases. It is better to use more data providers for better results. Figure 12 shows the convergence of the model loss. As we can see in Figure 11, the accuracy does not change smoothly because the samples from different hospitals are not the same. The accuracy depends on the number of patients or slices. The same holds for the model loss. Also, it can be seen that the number of providers is increasing. The global model aggregates the local models, and each hospital normalizes its data before training a local model. The number of hospitals affects the performance of the collaborative model. Additionally, the run time is shown in Figure 13. It varies with the dataset and the number of iterations in the different sub-datasets. We compare federated learning with the local model, as shown in Figure 9. The local model is trained on the whole dataset, and the federated learning model learns from the local models. Figures 11 and 12 indicate that performance increases significantly as the number of data providers increases. Federated learning does not affect the accuracy, but it achieves privacy while sharing the data.

• Differential privacy: Figure 5 describes the differential privacy analysis, a principled approach that enables organizations to learn from most of the data while ensuring that the results do not allow any individual's data to be distinguished or re-identified. On the other hand, Equation 14 obtains the value in the data while ensuring strong data security.
• Trust: The decentralized trust mechanism of the blockchain allows everything to run automatically through a preset program, which improves data security. Relying on a strict set of algorithms, the decentralized blockchain technology can ensure that the data is true, accurate, transparent, traceable, and cannot be tampered with.

[...] control their data. Actual data is uploaded with the signature of the owner in the blockchain database. The owner has the right to control and change the policy of the data using the smart contract. The blockchain uses cryptographic algorithms that enable the security of the data.

Many previous studies have addressed COVID-19 detection, such as [41], [8], [42], [43], [44], but these techniques do not consider data sharing to train a better prediction model. Some techniques used GANs and data augmentation to generate synthetic images; the performance of such methods is not reliable in the case of medical images. Because the number of patients with available data is small [45], data analysis is difficult. Our proposed model collects a large amount of real-time data to build a better prediction model. First, we compare with state-of-the-art studies and with the deep learning models shown in Figure V. Moreover, we compare federated learning with state-of-the-art deep learning models such as VGG, ResNet, ImageNet, MobileNet, DenseNet, and Capsule Network. The results show that the accuracy is similar whether the local model is trained on the whole dataset or the data is divided among different hospitals and the model weights are combined using blockchain-based federated learning.

Finally, we compare our work with blockchain-based data sharing techniques. The authors of [46] proposed a deep learning and blockchain-based technique to share medical images, but the main weakness of the model is that it is not based on federated learning and does not aggregate the neural network weights over the blockchain. Moreover, [47], [48] design frameworks based on federated learning, but they only consider sharing vehicle data. Our proposed framework collects data from different hospitals and trains a collaborative global model.

Artificial Intelligence (AI) based techniques have played an essential role in medical image processing, computer-aided diagnosis, image interpretation, image fusion, image registration, image segmentation, image-guided therapy, image retrieval, and image analysis. AI aids in extracting information from images and representing it effectively and efficiently. It assists doctors and other medical practitioners in diagnosing various diseases while eliminating human error and increasing the speed and accuracy of detection. These techniques enhance the ability of doctors and researchers to analyze the genetic variations that cause the disease in the first place. Deep learning is the core technology behind the rise of artificial intelligence and has reported significant diagnostic accuracy in medical imaging for the automatic detection of lung diseases [59], [60], [61]. Deep learning surpassed human-level performance on the ImageNet image classification task, with one million training images, in 2015 [62]; it showed dermatologist-level performance in classifying skin lesions in 2017 [63] and obtained remarkable results for lung cancer screening in 2019 [59]. Pneumonia can be diagnosed using Computed Tomography (CT) scans of the subject's chest. AI-based automated CT image analysis tools have been developed for the detection, quantification, and monitoring of coronavirus and for distinguishing patients with coronavirus from disease-free individuals [6]. Fei et al. developed a deep learning-based system for automatic segmentation of all lung and infection sites using chest CT [7]. Xiaowei et al. aimed to establish an early screening model to distinguish COVID-19 pneumonia and Influenza-A viral pneumonia from healthy cases using pulmonary CT images and deep learning techniques [24].
Shuai et al. based their study on the COVID-19 radiographic changes in CT images. They developed a deep learning method that can extract the graphical features of COVID-19 to provide a clinical diagnosis before pathogenic testing and thus save critical time in disease diagnosis. Recently, C. Zheng et al. [11] developed a deep learning-based model for automatic COVID-19 detection using 3D CT volumes.

Federated learning was proposed by McMahan et al. [64] to learn a shared model while protecting the privacy of data. In this context, federated learning is used to secure data and to aggregate the parameters of multiple organizations [65], [66], [67], [68], [69]. Hospitals can share the dataset during training, and information about their datasets can be revealed by analyzing the distributed model [70], [16], [71], [72], [73], [74], [75], [76], [77]. This decentralized approach to training models preserves privacy and security. Much research has been done on federated learning for transferring the weight matrices of deep neural networks. However, previous studies do not consider sharing medical data without compromising the privacy of organizations [78], [79]. In this article, we simulate our model to collect data from different sources using federated learning combined with blockchain technology, sharing data without privacy leakage.

This paper proposed a framework that can utilize up-to-date data to improve the recognition of computed tomography (CT) images and share the data among hospitals while preserving privacy. The data normalization technique deals with the heterogeneity of the data. Further, Capsule Network-based segmentation and classification is used to detect COVID-19 patients, along with a method that can collaboratively train a global model using blockchain technology with federated learning. We also collected real-life COVID-19 patient data and made it publicly available to the research community. Extensive experiments were performed on various deep learning models for training and testing on the datasets, and the Capsule Network achieved the highest accuracy. The proposed model is smart in that it can learn from the sources or data shared among various hospitals. In conclusion, the proposed model can help detect COVID-19 patients using lung screening, as hospitals share their private data to train a better global model.

The authors report no declarations of interest.

CT imaging features of 2019 novel coronavirus (2019-NCoV)

The Lancet

Deep Learning-based Image Conversion of CT Reconstruction Kernels Improves Radiomics Reproducibility for Pulmonary Nodules or Masses

Application of artificial neural networks for automated analysis of cystoscopic images: a review of the current status and future prospects

Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs

Clinical Characteristics of 138 Hospitalized Patients with 2019 Novel Coronavirus-Infected Pneumonia in

Epidemiological and clinical characteristics of 99 cases of 2019 novel coronavirus pneumonia in Wuhan, China: a descriptive study

Deep Learning-based Detection for COVID-19 from Chest CT using Weak Label

Covidgan: Data augmentation using auxiliary classifier gan for improved covid-19 detection

A first look at privacy analysis of covid-19 contact tracing mobile applications

Beeptrace: Blockchain-enabled privacy-preserving contact tracing for covid-19 pandemic and beyond

Semi-supervised distributed learning with non-iid data for aiot service platform

Blockchain and Federated Learning for Privacy-Preserved Data Sharing in Industrial IoT

Realizing the heterogeneity: A self-organized federated learning framework for iot

Towards communication-efficient federated learning in the internet of things with edge computing

Federated learning via over-the-air computation

Cefl: Online admission control, data scheduling and accuracy tuning for cost-efficient federated learning across edge nodes

Verifynet: Secure and verifiable federated learning

Enabling efficient and geometric range query with access control over encrypted spatial data

Imaging changes of severe COVID-19 pneumonia in advanced stage

Correlation of Chest CT and RT-PCR Testing in Coronavirus Disease 2019 (COVID-19) in China: A Report of 1014 Cases

Dynamic routing between capsules

Capsules for object segmentation

Capsule networks showed excellent performance in the classification of herg blockers/nonblockers

Kademlia: A peer-to-peer information system based on the XOR metric

A fully automated deep learning-based network for detecting covid-19 from a new and large lung ct scan dataset

Covid-ct-dataset: a ct scan dataset about covid-19

Sex differences in thoracic dimensions and configuration

Very deep convolutional networks for large-scale image recognition

Imagenet classification with deep convolutional neural networks

Xception: Deep learning with depthwise separable convolutions

Deep residual learning for image recognition

Identity mappings in deep residual networks

Rethinking the inception architecture for computer vision

Mobilenets: Efficient convolutional neural networks for mobile vision applications

2018 IEEE Conference on Computer Vision and Pattern Recognition

Densely connected convolutional networks

A Network-Based Stochastic Epidemic Simulator: Controlling COVID-19 with Region-Specific Policies

A Rapid, Accurate and Machine-Agnostic Segmentation and Quantification Method for CT-Based COVID-19 Diagnosis

A Weakly-Supervised Framework for COVID-19 Classification and Lesion Localization from Chest CT

Adaptive Feature Selection Guided Deep Forest for COVID-19 Classification with Chest CT

Cardiac involvement in a patient with coronavirus disease 2019

An integration of blockchain and ai for secure data sharing and detection of ct images for the hospitals

Federated tensor mining for secure industrial internet of things

Blockchain empowered asynchronous federated learning for secure data sharing in internet of vehicles

Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography: a prospective study

Large-scale screening of covid-19 from community acquired pneumonia using infection size-aware classification

Deep learning-based detection for covid-19 from chest ct using weak label

Using artificial intelligence to detect covid-19 and community-acquired pneumonia based on pulmonary ct: evaluation of the diagnostic accuracy

Ai-assisted ct imaging analysis for covid-19 screening: Building and deploying a medical ai system in four weeks

Deep learning enables accurate diagnosis of novel coronavirus (covid-19) with ct images

Engineering

Development and evaluation of an ai system for covid-19 diagnosis

A deep learning algorithm using ct images to screen for corona virus disease

Prior-attention residual learning for more discriminative covid-19 screening in ct images

End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography

Overview of deep learning in medical imaging

Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning

Delving deep into rectifiers: Surpassing human-level performance on imagenet classification

Dermatologist-level classification of skin cancer with deep neural networks

Communication-efficient learning of deep networks from decentralized data

Privacy-preserving asynchronous federated learning mechanism for edge network computing

Federated cooperation and augmentation for power allocation in decentralized wireless networks

Federated learning in vehicular edge computing: A selective model aggregation approach

Federated learning-based cognitive detection of jamming attack in flying ad-hoc network

Federated learning for UAVs-enabled wireless networks: Use cases, challenges, and open problems

Federated Learning : Collaborative Machine Learning without centralized training data

Blockchain Empowered Asynchronous Federated Learning for Secure Data Sharing in Internet of Vehicles

FDC: A Secure Federated Deep Learning Mechanism for Data Collaborations in the Internet of Things

Federated Learning in Mobile Edge Networks: A Comprehensive Survey

Using Federated Learning on Malware Classification

Federated learning via over-the-air computation

Blockchain-based Node-aware Dynamic Weighting Methods for Improving Federated Learning Performance

Federated machine learning: Concept and applications

Federated Learning: Challenges, Methods, and Future Directions

A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection

He is currently pursuing a postdoctoral position in Information Security at the University of Electronic Science and Technology of China. His research interests include machine learning, deep learning, malware detection, Internet of Things (IoT), and blockchain technology

He is currently pursuing a Ph.D. degree in Computer Science and Technology from the School of Computer Science and Engineering

Simin Zhang received her Master's degree in Sichuan University, China. Currently, she is pursuing her Ph.D. in Medical imaging and nuclear medicine from West China Hospital of Sichuan University. Her research interests include machine learning and multi-model MRI applied to glioma

Currently, he is pursuing his Ph.D. in computer science and engineering from the University of Electronic Science and Technology of China (UESTC)

Professor Ting Yang obtained his Bachelor's degree in communication engineering from UESTC, his Master's degree from the School of Computer Science and Engineering of UESTC, and his Ph.D. in electronics engineering from the University of Electronic Science and Technology of China (UESTC)

His main research interests include image processing, satellite and hyperspectral image de-noising, biomedical, signal processing, optimization algorithms, control, systems, pattern recognition

He has vast academic, technical, and professional experience in Pakistan. His research interests include artificial intelligence and computer vision, particularly vehicle re-identification

Professor Wenyong Wang was born in 1967. Currently, he is working as a professor of Computer Science at the University of Electronic Science and Technology of China (UESTC)

ACKNOWLEDGEMENT This work was supported by the China Postdoctoral Science Foundation and the Department of Science and Technology of Sichuan Province, Project Numbers: Y03019023601016201, H04W200533. Authors' contributions