key: cord-0671618-tlg9vzu3 authors: Wicaksana, Jeffry; Yan, Zengqiang; Zhang, Dong; Huang, Xijie; Wu, Huimin; Yang, Xin; Cheng, Kwang-Ting title: FedMix: Mixed Supervised Federated Learning for Medical Image Segmentation date: 2022-05-04 journal: nan DOI: nan sha: 7e480bc568d52cedf72fac075178ee26995df25a doc_id: 671618 cord_uid: tlg9vzu3 The purpose of federated learning is to enable multiple clients to jointly train a machine learning model without sharing data. However, the existing methods for training an image segmentation model have been based on an unrealistic assumption that the training set for each local client is annotated in a similar fashion and thus follows the same image supervision level. To relax this assumption, in this work, we propose a label-agnostic unified federated learning framework, named FedMix, for medical image segmentation based on mixed image labels. In FedMix, each client updates the federated model by integrating and effectively making use of all available labeled data ranging from strong pixel-level labels, weak bounding box labels, to weakest image-level class labels. Based on these local models, we further propose an adaptive weight assignment procedure across local clients, where each client learns an aggregation weight during the global model update. Compared to the existing methods, FedMix not only breaks through the constraint of a single level of image supervision, but also can dynamically adjust the aggregation weight of each local client, achieving rich yet discriminative feature representations. To evaluate its effectiveness, experiments have been carried out on two challenging medical image segmentation tasks, i.e., breast tumor segmentation and skin lesion segmentation. The results validate that our proposed FedMix outperforms the state-of-the-art method by a large margin. 
Medical image segmentation is a representative task for image content analysis supporting computer aided diagnosis, which can not only recognize the lesion category, but also locate the specific areas [1] . In the past few years, this task has been extensively studied and applied in a wide range of underlying scenarios, e.g., lung nodule segmentation [2] , skin lesion boundary detection [4] , and COVID-19 lesion segmentation [3] . The optimization of deep learning models usually relies on a vast amount of training data [5] . For example, for a fully-supervised semantic segmentation model, the ideal scenario is that we can collect the pixel-level annotated images as much as possible from diverse sources. However, this scenario is almost infeasible due to the following two reasons: i) the strict sharing protocol of sensitive patient information between medical institutions and ii) the exceedingly high pixel-level annotation cost. As the expert knowledge usually required for annotating medical images is much more demanding and difficult to obtain, various medical institutions have very limited strong pixel-level annotated images and most available images are unlabeled or weakly-annotated [3, 20, 21] . Therefore, a realistic clinical mechanism which utilizes every available supervision for cross-institutional collaboration without data sharing is highly desirable. Thanks to the timely emergence of Federated Learning (FL), which aims to enable multiple clients to jointly train a machine learning model without sharing data, the problem of data privacy being breached can be alleviated [11] . FL has gained significant attention in the medical imaging community [12, 17] , due to the obvious reason that medical images often contain some personal information. During the training process of a standard FL model, each local client first downloads the federated model from a server and updates the model locally. 
Then, the locally-trained model parameters of each client are sent back to the server. Finally, all clients' model parameters are aggregated to update the global federated model. Most existing FL frameworks [13, 18] require the training data of every local client to carry the same level of labels, e.g., pixel-level labels (as shown in Fig. 1 (d)) for an image semantic segmentation model, which limits the model's learning ability. Although some semi-supervised federated learning methods [31, 33] attempt to utilize unlabeled data in addition to pixel-level labeled images in training, they do not make any use of weakly-labeled images (e.g., image-level class labels in Fig. 1 (b) and bounding box labels in Fig. 1 (c)), which are invaluable. Clients participating in FL may have different labeling budgets, so there may be a wide range of inter-client variations in label availability. Weak labels are easier to acquire and thus more broadly available than pixel-level ones. In practice, there is a wide range of weak labels with varying strengths and acquisition costs. While an image-level label indicating whether a breast ultrasound image is cancerous or not is easier to acquire than a bounding box label pointing out the specific location of the cancerous region, it is also less informative. Therefore, effectively utilizing the information from weakly-labeled data with varying label strengths, as well as from unlabeled data, would be highly beneficial for improving the federated model's robustness while preventing training instability, especially for clients without pixel-level labeled data. In this work, as illustrated in Fig. 2, we propose a label-agnostic Mixed Supervised Federated Learning (FedMix) framework, which is a unified FL model making use of data labeled in any form for medical image segmentation.
Specifically, in the absence of pixel-level labels, FedMix first utilizes unlabeled images as well as the useful information contained in weakly-labeled images (i.e., image-level class labels and bounding box labels) to produce and select high-quality pseudo labels. Through an iterative process, the accuracy of the selected pseudo labels, which are then used for local training on the client side, improves, leading to better model performance. To further improve model robustness, FedMix takes into account the variability of local clients' available labels through an adaptive aggregation procedure for updating the global federated model. Compared to the existing methods, FedMix not only breaks through the constraint of a single type of labels, but also dynamically assigns an optimized aggregation weight to each local client. Experimental results on two challenging segmentation tasks demonstrate the superior performance of FedMix on learning from mixed supervisions, which is valuable in the clinical setting. Our contributions are summarized as follows:
• A mixed supervised FL framework targeting multi-source medical image segmentation through an iterative pseudo label generator followed by a label refinement operation, based on the information derived from weakly-labeled data, to obtain high-quality pseudo labels for training.
• An adaptive weight assignment across clients, where each client can learn an aggregation weight. Adaptive weight assignment is essential to handle inter-client variations in supervision availability.
• Extensive experiments on the challenging breast tumor segmentation and skin lesion segmentation tasks. FedMix outperforms the state-of-the-art methods by a large margin.
The rest of this paper is organized as follows: Existing and related work is summarized and discussed in Section 2. The details of FedMix are introduced in Section 3. In Section 4, we present a thorough evaluation of FedMix compared with the existing methods.
We provide ablation studies as well as analysis in Section 5, and conclude the paper in Section 6. Federated learning (FL) is a distributed learning framework, which is designed to allow different clients, institutions, and edge devices to jointly train a machine learning model without sharing the raw data [11] , which plays a big role in protecting data privacy. In the past years, FL has drawn great attention from the medical image communities [18, 46] and has been validated for multi-site functional magnetic resonance imaging classification [13] , health tracking through wearables [52] , COVID-19 screening and lesion detection [47] , and brain tumor segmentation [12, 17] . In clinical practice, different clients may have great variations in data quality, quantity, and supervision availability. Improper use of these data may lead to significant performance degradation among different clients. To reduce the inter-client variations, FL has been combined with domain adaptation [16, 53, 56] , contrastive learning [54] and knowledge distillation [55] to learn a more generalizable federated model. However, existing works do not consider the variation in supervision availability (i.e., different clients have different levels of image labels), which is often observed in clinical practice. In our work, we use all the available image label information including image-level class labels, bounding box labels, and pixel-level labels to train a medical image segmentation model and propose a mixed supervised FL framework. In a standard federated learning setting, not every local client has access to pixel-level supervision for image segmentation to facilitate model learning with weakly-labeled and unlabeled training data. 
To this end, some semi-supervised federated learning approaches require clients to share supplementary information, e.g., client-specific disease relationship [32] , extracted features from raw data [34] , metadata of the training data [35] , and ensemble predictions from different clients' locally-updated models besides their parameters [33] . Additional information sharing beyond the locally-updated model parameters may leak privacy-sensitive information [45] about clients' data. Yang et al. [31] proposed to avoid additional information sharing by first training a fully-supervised federated learning model only on clients with available pixel-level supervision for several training rounds and then using the model to generate pseudo labels for local clients based on the unlabeled data. Those confident pseudo labels are used to supervise the local model updates on unlabeled clients for subsequent rounds. In this work, we design a unified federated learning framework that utilizes various weakly supervised data in addition to fully-supervised and unlabeled data for training while limiting the information sharing between clients to only locally-updated model parameters for privacy preservation. The deep learning-based image recognition technology has been used for various medical image segmentation tasks, e.g., optic disc segmentation [24] , lung nodules segmentation [2] , skin lesion boundary detection [4] , and COVID-19 lesion segmentation [3] . However, training a fully-supervised deep model for image semantic segmentation often requires access to a mass of pixel-level supervisions, which are expensive to acquire [21] . In particular, the problem of the expensive pixel-level supervision is much more obstructive for medical image segmentation [26] . 
To this end, efforts have been made to explore the use of more easily obtained image supervisions (e.g., scribbles [43], image-level classes [6], bounding boxes [7], points [8], and even unlabeled images [36]) to train a pixel-level image segmentation model. However, most of the existing works are based on only one or two types of image supervision, which greatly limits the model learning efficiency. In most cases, access to some pixel-level annotated data is required to facilitate model training, and such data may not be available for each participating client. In our work, we carefully use image-level class labels, bounding box labels, and pixel-level labels to train local clients and propose an adaptive weight assignment procedure across clients for medical image segmentation.
In this section, we first introduce the notation and experimental setting of the proposed unified federated learning framework, i.e., FedMix, in Section 3.1. Then, we provide a framework overview in Section 3.2. Finally, we present implementation details of the proposed FedMix, including pseudo label generation, selection, and federated model update, in Section 3.3 and Section 3.4.
To emulate the real scenario setting, we focus on deep learning from multi-source datasets, where each client's data is collected from different medical sources. We focus on exploring variations in cross-client supervisions and thus limit each client to a single level of labels. In this paper, we denote $D = [D_1, ..., D_N]$ as the collection of $N$ clients' training data. Given client $i$, $[X_{gt}, Y_{gt}]$, $[X_{un}]$, $[X_{img}, Y_{img}]$, and $[X_{bbox}, Y_{bbox}]$ represent the training data that is pixel-level labeled, unlabeled, image-level class labeled, and bounding box-level labeled, respectively. $X$ and $Y$ represent the sets of the training images and the available labels. To integrate various levels of image labels, in our work, we convert the bounding box labels and image-level class labels to pixel-level labels.
Specifically, the bounding box point representation is converted into a pixel-level label in which the foreground class falls inside the bounding box and the background class falls outside it. For image-level class labels, we constrain the pixel-level label to the corresponding image class. Consequently, $Y_{gt}$, $Y_{img}$, and $Y_{bbox}$ have the same dimension, i.e., $Y \in \mathbb{R}^{(C+1) \times H \times W}$, where $C$ indicates the total number of foreground classes while $W$ and $H$ indicate the width and height of the respective image.
Algorithm 1 (FedMix). Input: $D$. Parameters: $\beta$, $\lambda$: hyperparameters for adaptive aggregation; $T$: maximum training rounds; $\epsilon$: threshold for dynamic sample selection.
As illustrated in Fig. 2 and in the pseudo-code of Algorithm 1, FedMix fully utilizes every level of labels at the various clients through two main components:
1. Pseudo Label Generation and Selection. In the mixed supervised setting, clients without access to pixel-level labels rely on pseudo labels for training. To improve the pseudo labels' accuracy, we design a unified refinement process using every level of labels and dynamically select high-quality pseudo labels for training.
2. Adaptive Aggregation for Federated Model Update. FedMix uses an adaptive aggregation operation where the weight of each client is determined not only by its data quantity but also by the quality of its pseudo labels. Because our aim is to learn a federated model for tumor segmentation, local model updates made without access to pixel-level labels have to be integrated with care. In this way, the reliable clients are assigned higher aggregation weights, leading to a better federated model.
Based on cross pseudo supervision [36], we train two differently initialized models, $F_1(\cdot)$ and $F_2(\cdot)$, to co-supervise each other with pseudo labels when no pixel-level label is available. The training image $X$ is fed to the two models $F_1$ and $F_2$ to generate pseudo labels $Y_1$ and $Y_2$, respectively.
The pseudo labels are then refined, denoted as $\hat{Y}_1$ and $\hat{Y}_2$, and used for training the model of each local client. The refinement strategy for each type of label is as follows:
1. Pixel-level labels: The ground truth is used directly as the training target, i.e., $\hat{Y}_1 = \hat{Y}_2 = Y_{gt}$.
2. Bounding box labels: Each of the predictions $Y_1 = F_1(X)$ and $Y_2 = F_2(X)$ is refined according to the corresponding bounding box label through an element-wise product, i.e., $\hat{Y}_1 = Y_1 \odot Y_{bbox}$ and $\hat{Y}_2 = Y_2 \odot Y_{bbox}$.
3. Image-level class labels: We do not apply pseudo label refinement, i.e., $\hat{Y}_1 = Y_1$ and $\hat{Y}_2 = Y_2$.
4. No labels (i.e., without supervision): We do not refine the pseudo labels, i.e., $\hat{Y}_1 = Y_1$ and $\hat{Y}_2 = Y_2$.
A specific client $i$ is trained by minimizing:
$$\mathcal{L}_i = \mathcal{L}_{dice}(Y_1, \hat{Y}_2) + \mathcal{L}_{dice}(Y_2, \hat{Y}_1), \quad (1)$$
where $\mathcal{L}_{dice}$ is the Dice loss function.
Despite the effectiveness of the above pseudo label generation and refinement processes, the pseudo labels may still be incorrect. Therefore, we propose a dynamic sample selection approach to select high-quality data and pseudo labels. Specifically, given client $i$ and its training data $D_i$, we generate a mask $M_i = \{m_1, ..., m_{|D_i|} \mid m_j \in [0, 1]\}$ to select reliable training samples according to Eq. 2. We measure the consistency between the pseudo labels before refinement, i.e., $Y_1$ and $Y_2$. Higher prediction consistency between $Y_1$ and $Y_2$ indicates a higher likelihood that the pseudo labels are close to the ground truth. The above process is expressed as:
$$m_j = \begin{cases} 1, & \text{if } \mathrm{Dice}(Y_1, Y_2) \geq \epsilon \\ 0, & \text{otherwise,} \end{cases} \quad (2)$$
where $\epsilon \in [0, 1]$ is a threshold: the larger $\epsilon$ is, the fewer training samples are selected. For pixel-level labels, $m_j = 1$ for all training samples as $\hat{Y}_1 = \hat{Y}_2 = Y_{gt}$. As training progresses, the models become capable of generating more accurate pseudo labels. Consequently, $\sum_{j=1}^{|M_i|} m_j$ progressively increases to $|D_i|$, allowing the model to learn from a growing set of training data. More discussion of dynamic sample selection is provided in Section 5.1.
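The refinement rules and the consistency-based selection described above can be illustrated with a minimal sketch. It is not the authors' implementation: plain nested lists stand in for tensors, predictions are hard binary masks for a single foreground class rather than soft network outputs, and the function names (`box_to_mask`, `refine`, `select`, `cross_pseudo_loss`) as well as the end-exclusive box convention are assumptions made for illustration. Masking a prediction with the box mask is one reading of "refined according to the corresponding bounding box label".

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary H x W mask: 1 = foreground, 0 = background


def box_to_mask(height: int, width: int, box: Tuple[int, int, int, int]) -> Mask:
    """Turn a box (top, left, bottom, right; end-exclusive, assumed) into a pixel mask."""
    top, left, bottom, right = box
    return [[1 if top <= r < bottom and left <= c < right else 0
             for c in range(width)] for r in range(height)]


def refine(pred: Mask, supervision: str, label: Optional[Mask] = None) -> Mask:
    """Supervision-specific refinement of one model's pseudo label."""
    if supervision == "pixel":   # ground truth is used as the target
        return label
    if supervision == "bbox":    # drop foreground predicted outside the box
        return [[p * b for p, b in zip(pr, br)] for pr, br in zip(pred, label)]
    return pred                  # image-level or unlabeled: left unchanged


def dice(a: Mask, b: Mask) -> float:
    """Dice coefficient of two binary masks (defined as 1.0 when both are empty)."""
    inter = sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2.0 * inter / total


def select(y1: Mask, y2: Mask, supervision: str, threshold: float = 0.9) -> int:
    """Dynamic sample selection: 1 keeps the sample for this round, 0 skips it."""
    if supervision == "pixel":
        return 1
    return 1 if dice(y1, y2) >= threshold else 0


def cross_pseudo_loss(y1: Mask, y2: Mask, y1_hat: Mask, y2_hat: Mask) -> float:
    """Each model is supervised by the other model's refined pseudo label."""
    return (1.0 - dice(y1, y2_hat)) + (1.0 - dice(y2, y1_hat))


# Toy 4 x 4 example: model 1 predicts one false positive outside the box.
bbox = box_to_mask(4, 4, (1, 1, 3, 3))
y1 = [[1, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
y2 = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
y1_hat, y2_hat = refine(y1, "bbox", bbox), refine(y2, "bbox", bbox)
print(dice(y1, y2))            # 0.75
print(select(y1, y2, "bbox"))  # 0, since 0.75 is below the 0.9 threshold
```

In this toy case the box removes the stray foreground pixel from the first prediction, while the sample itself is skipped until the two models agree more closely.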
At each training round, every local client $i$ first receives the federated model's parameters $\theta_\xi^t$ from the server at time or iteration $t$. Then, every client updates the model locally with its training data $D_i$. Finally, the update from each local client, $\Delta\theta_i^{t+1}$, is sent to the server to update the federated model's parameters according to Eq. 3:
$$\theta_\xi^{t+1} = \theta_\xi^{t} + \sum_{i=1}^{N} w_i \, \Delta\theta_i^{t+1}. \quad (3)$$
In FedAvg [11], the aggregation weight of each client, $w_i$, is defined as
$$w_i = \frac{|D_i|}{\sum_{j=1}^{N} |D_j|}.$$
In the mixed supervised setting, relying only on data quantity for weight assignment is sub-optimal. Thus, the supervision availability of each client should also be taken into account during the aggregation process. To this end, we propose to utilize the client-specific training loss to infer the data quality. Each client's training loss not only provides a more objective measurement of its importance during FedMix optimization but also prevents the federated model from relying on over-fitting clients. The proposed adaptive aggregation function therefore derives each $w_i$ from both the client's data quantity and its training loss, with $\lambda$ and $\beta$ as tunable hyper-parameters that control the degree of reliance on different clients. More discussion of adaptive aggregation can be found in Section 5.2.
Dataset. In our work, experiments are carried out on two challenging medical image segmentation tasks:
• Breast tumor segmentation. In this task, three public breast ultrasound datasets, namely BUS [37], BUSIS [38], and UDIAT [39], are used for evaluation and each of them is regarded as a separate client. More details of these datasets are given in Table 1.
• Skin lesion segmentation. HAM10K [40] consists of four different sources. Each source acts as a client in FL. The statistics of HAM10K are presented in Table 2.
Following standard practice, the data is randomly split patient-wise into 80% for training and 20% for testing. All the breast ultrasound and skin dermoscopy images are resized to 256×256 pixels and then randomly flipped and cropped to 224×224 pixels for training.
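The augmentation step described above can be sketched as follows. This is only an illustration under stated assumptions: the image is taken to be already resized to 256×256, the flip is assumed to be horizontal with probability 0.5, the crop position is assumed to be uniformly random, and the function name `random_flip_and_crop` is hypothetical. Nested lists stand in for image tensors. The one point the sketch is meant to show is that the image and its label mask must receive the same flip and the same crop window.

```python
import random
from typing import List, Tuple

Image = List[List[float]]  # H x W array of pixel values


def random_flip_and_crop(image: Image, mask: Image, crop: int,
                         rng: random.Random) -> Tuple[Image, Image]:
    """Apply one shared random horizontal flip and random crop to image and mask."""
    height, width = len(image), len(image[0])

    # Assumed: horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]
        mask = [row[::-1] for row in mask]

    # random.Random.randint is inclusive on both ends.
    top = rng.randint(0, height - crop)
    left = rng.randint(0, width - crop)

    def cut(array: Image) -> Image:
        return [row[left:left + crop] for row in array[top:top + crop]]

    return cut(image), cut(mask)


# Usage: a synthetic 256 x 256 image whose mask is a copy of itself,
# so any mismatch between the two transforms would be visible.
rng = random.Random(0)
img = [[float(r * 256 + c) for c in range(256)] for r in range(256)]
out_img, out_mask = random_flip_and_crop(img, img, 224, rng)
print(len(out_img), len(out_img[0]))  # 224 224
```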
Evaluation metrics. In this work, the Dice coefficient (DC) is used for the evaluation of the two segmentation tasks. Considering the two-model architecture of FedMix, the predictions of $F_1$ are used for evaluation.
Network architectures. UNet [41] combined with group normalization [42] is selected as the baseline segmentation model.
Supervision types. The following types of labels are included in our experiments: 1) pixel-level labels (denoted as L), 2) bounding box labels (denoted as B), 3) image-level class labels (denoted as I), and 4) unlabeled (denoted as U), i.e., training with only the raw images.
Comparison methods. The following four frameworks are included for comparison:
• Local learning (LL): Each client trains a deep learning network based on its pixel-level labeled data.
• Federated Averaging (FedAvg): All clients, owning pixel-level labels, collaboratively train a federated model.
• Semi-supervised federated learning via self-training [31] (FedST): FedST is proposed to utilize both pixel-level labeled and unlabeled data for federated training. FedST is selected as it does not require additional information sharing beyond the locally-updated model parameters.
• Our proposed federated learning with mixed supervisions (FedMix): FedMix integrates various levels of labels.
The performance of FedAvg under the fully-supervised setting is regarded as an upper bound of federated learning techniques. We evaluate the performance of FedMix under the semi-supervised setting by comparing FedMix with FedST. We also evaluate the performance of FedMix under various settings to show how additional weak labels improve the federated model's performance.
Training details. All the networks are trained using the Adam optimizer with an initial learning rate of 1e-3 and a batch size of 16. All methods are implemented within the PyTorch framework and trained on Nvidia GeForce Titan RTX GPUs for 300 rounds.
The federated training is performed synchronously and the federated model parameters are updated every training round. We set $\epsilon = 0.9$ and $\lambda = 10$, with $\beta = 1.5$ for adaptive aggregation on breast tumor segmentation and $\beta = 3$ on skin lesion segmentation.
Experiment settings. Data from BUS, BUSIS, and UDIAT are represented by C1, C2, and C3 respectively. To better demonstrate the value of weak labels, C3, owning the least amount of data, is selected as the client with pixel-level labels. The corresponding results are reported in Table 4. For LL, the results of C1 and C2 are produced using the model trained on C3. Compared to the locally-learned models under the fully-supervised setting in Table 3, there is a slight performance degradation on C1 and C2, i.e., a 2.24% and 3.97% decrease in DC respectively, indicating the limitation of the model trained only on C3. By utilizing the unlabeled data on C1 and C2 for training, FedST and FedMix are able to train better federated models than LL. The overall improvements of FedST are quite limited, with an average increase of 0.50% in DC, while the segmentation results on C3 are badly affected. Comparatively, FedMix consistently improves the results of all three clients, leading to an average increase of 3.32% and 2.82% in DC over LL and FedST respectively. One interesting observation is that FedMix under semi-supervised learning outperforms LL with full supervision, demonstrating the effectiveness of FedMix in exploiting hidden information in unlabeled data. Quantitative results of FedMix under different settings are presented in Table 5. When C1 owns image-level labels, not only C1 but also C2 and C3 benefit from the federation, as shown by performance improvements across clients, i.e., an average increase of 0.36% in DC. When C1 and C2 have access to bounding box labels, the DC scores of C1 and C3 are further improved, with an average increase of 1.57% and 1.11% compared to FedMix with weaker supervisions.
To validate the effectiveness of adaptive aggregation, we compare FedAvg and adaptive aggregation under the fully-supervised setting. The results are presented in Table 6. Putting more emphasis on more reliable clients via adaptive aggregation improves the DC by 1.12%.
Qualitative evaluation. According to Fig. 3, LL on C3 produces quite a few false positives on C2, indicating poor generalization capability due to limited training data. Under the semi-supervised setting, though the unlabeled data of C1 and C2 is used for training, the segmentation results of FedST are close to those of LL, as learning from incorrect pseudo labels is not helpful and may be detrimental. Comparatively, FedMix can utilize the useful information in unlabeled data, and the model generates predictions close to the experts' annotations. The introduction of stronger supervision signals (i.e., from U to I and B) to FedMix further reduces false positives and improves the shape preservation of tumor regions. The utilization of adaptive aggregation in federated learning is beneficial even under the fully-supervised setting. The adaptively aggregated federated model better captures the boundaries and shapes of the tumor regions and produces fewer false positives than the model learned using FedAvg.
Experiment setting. Images from Rosendahl, Vidir-modern, Vidir-old, and Vidir-molemax are represented by C1, C2, C3, and C4 respectively, and C3, owning the least amount of data, is selected as the client with pixel-level labels. The levels of the labels on C1, C2, and C4 are adjusted accordingly for different cases.
Quantitative results. From Table 7, under the fully-supervised setting, FedAvg improves the performance of the locally-learned models by an average of 0.96% in DC, indicating that cross-client collaboration is beneficial. The key for semi-supervised federated learning is to extract and use accurate information from the unlabeled data.
Under the semi-supervised setting, where only C3 has access to annotation (i.e., L), we present the results in Table 8. Results of FedMix with additional weak labels are presented in Table 9. Incorporating bounding box labels for training improves the pseudo labels' accuracy. Consequently, the segmentation performance of FedMix is further improved by 6.11%, approaching the performance of FedAvg under the fully-supervised setting. Bounding box labels are much easier to obtain than pixel-level labels, making FedMix more valuable in clinical scenarios. We further conduct a comparison between FedAvg and adaptive aggregation under the fully-supervised setting, presented in Table 10. The proposed adaptive aggregation function can better utilize the high-quality data and balance the weights among clients, leading to better convergence and segmentation performance.
Qualitative results. Qualitative results of skin lesion segmentation are shown in Fig. 4. Consistent with the quantitative results, the segmentation maps on C1 and C2, produced by the locally-learned model on C3, are inaccurate, due to large inter-client variations between {C1, C2} and {C3, C4}. While the segmentation maps produced by FedST are slightly more accurate than those of LL, learning from confident pseudo labels alone is insufficient to train a generalizable model, as shown by the inaccurate segmentation maps produced by FedST on C1 and C2. Under the same supervision setting, FedMix produces more accurate segmentation maps by dynamically selecting high-quality pseudo labels for training. Given stronger supervision, e.g., bounding box labels, FedMix improves the segmentation quality, especially on tumor shape preservation. Through the comparison under the fully-supervised setting, we observe that the segmentation maps produced by adaptive aggregation contain fewer false negatives and are more consistent in shape with the manual annotations than those of FedAvg.
We remove the label refinement step in FedMix and utilize FedAvg for comparison.
Quantitative results are presented in Table 11. We can observe that without dynamic sample selection, the model may learn from incorrect pseudo labels, which is detrimental to convergence. Dynamic sample selection is based on the intuition that the prediction consistency between the two models given the same input image is positively correlated with the accuracy of the pseudo labels. We perform separate evaluations on the three datasets for breast tumor segmentation, i.e., BUS (C1), BUSIS (C2), and UDIAT (C3). For each client, we train two differently initialized models, $F_1$ and $F_2$, locally on 80% of the data for 20 training rounds. The prediction consistencies between the two models, measured in DC (%), are used to select the evaluation set from the remaining 20% of the data according to the consistency threshold $\epsilon$. With a smaller $\epsilon$, more samples with lower prediction consistencies are included for evaluation. As $\epsilon$ increases, only the samples with high prediction consistencies are selected, and the overall DC is higher. The findings in Table 12 validate our assumption and demonstrate the value of dynamic sample selection in filtering inaccurate pseudo labels during training.
We compare adaptive aggregation with FedAvg and present the results in Table 13. For breast tumor segmentation, adaptive aggregation consistently improves performance across clients, with an average increase of 1.00% in DC compared to FedAvg. For skin lesion segmentation, due to the inter-client variations between {C1, C2} and {C3, C4}, adaptive aggregation focuses more on minimizing the training losses on C1 and C2. As a result, the average DC increase of {C1, C2} is 1.44% while the corresponding increase on C4 is limited to 0.19%. Overall, adaptive aggregation outperforms FedAvg. Aggregation weight optimization in federated learning remains an open problem and should be further explored in the future.
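For reference, the FedAvg baseline used in this comparison, i.e., quantity-based weights combined with a server-side update that adds the weighted client updates to the current federated parameters, can be sketched as below. This is a minimal illustration, not the authors' code: parameters are flat lists of floats, the function names `fedavg_weights` and `server_update` are hypothetical, and the adaptive aggregation function itself is not reproduced. It would plug in at the same place by passing different `weights`.

```python
from typing import List


def fedavg_weights(sizes: List[int]) -> List[float]:
    """FedAvg weights: each client's share of the total training data."""
    total = sum(sizes)
    return [n / total for n in sizes]


def server_update(theta: List[float], deltas: List[List[float]],
                  weights: List[float]) -> List[float]:
    """Add the weighted sum of client updates to the federated parameters."""
    return [t + sum(w * d[k] for w, d in zip(weights, deltas))
            for k, t in enumerate(theta)]


# Two clients with 30 and 10 training samples and a 2-parameter model.
theta = [0.0, 1.0]
deltas = [[1.0, 0.0],   # update proposed by client 1
          [0.0, 2.0]]   # update proposed by client 2
weights = fedavg_weights([30, 10])
print(weights)                                   # [0.75, 0.25]
print(server_update(theta, deltas, weights))     # [0.75, 1.5]

# Any other weighting scheme only changes the weights argument.
print(server_update(theta, deltas, [0.5, 0.5]))  # [0.5, 2.0]
```

The last call shows why the choice of weights matters: with equal weights the smaller client's update has twice the influence it has under FedAvg.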
FedMix is the first federated learning framework that makes effective use of different levels of labels on each client for medical image segmentation. In FedMix, we first generate pseudo labels on the clients and use supervision-specific refinement strategies to improve the accuracy and quality of the pseudo labels. Then the high-quality data of each client is selected through dynamic sample selection for local model updates. To better update the federated model, FedMix utilizes an adaptive aggregation function to adjust the weights of clients according to both data quantity and data quality. Experimental results on two segmentation tasks demonstrate the effectiveness of FedMix in learning from various supervisions, which is valuable for reducing the annotation burden of medical experts. In the semi-supervised federated setting, FedMix outperforms the state-of-the-art approach FedST. Compared to FedAvg, the proposed adaptive aggregation function achieves consistent performance improvements on the two tasks under the fully-supervised setting. We believe the methods proposed in FedMix are widely applicable in FL for medical image analysis beyond mixed supervisions.
[1] Fully convolutional networks for semantic segmentation
[2] CT-realistic lung nodule simulation from 3D conditional generative adversarial networks for robust lung segmentation
[3] Dual-consistency semi-supervised learning with uncertainty quantification for COVID-19 lesion segmentation from CT images
[4] Skin lesion analysis toward melanoma detection: A challenge at the 2017 international symposium on biomedical imaging
[5] Deep residual learning for image recognition
[6] Learning pixel-level semantic affinity with image-level supervision for weakly supervised semantic segmentation
[7] Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation
[8] What's the point: Semantic segmentation with point supervision
[9] Scribblesup: Scribble-supervised convolutional networks for semantic segmentation
[10] The cityscapes dataset for semantic urban scene understanding
[11] Communication-efficient learning of deep networks from decentralized data
[12] Privacy-preserving federated brain tumour segmentation
[13] Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: ABIDE results
[14] Self-supervised equivariant attention mechanism for weakly supervised semantic segmentation
[15] BBAM: Bounding box attribution map for weakly supervised semantic and instance segmentation
[16] Variation-aware federated learning with multi-source decentralized medical data
[17] Multi-institutional deep learning model without sharing patient data: A feasibility study on brain tumor segmentation
[18] Federated deep learning for detecting COVID-19 lung abnormalities in CT: A privacy-preserving multinational validation study
[19] Federated learning for predicting clinical outcomes in patients with COVID-19
[20] Multi-view semi-supervised 3D whole brain segmentation with a self-ensemble network
[21] Every annotation counts: Multi-label deep supervision for medical image segmentation
[22] The liver tumor segmentation benchmark (LITS)
[23] The multimodal brain tumor image segmentation benchmark (BRATS)
[24] Joint optic disc and cup segmentation based on multi-label deep network and polar transformation
[25] Active cell appearance model induced generative adversarial networks for annotation-efficient cell segmentation and identification on adaptive optics retinal images
[26] Transformation-consistent self-ensembling model for semi-supervised medical image segmentation
[27] Semi-supervised semantic segmentation with cross-consistency training
[28] Guided collaborative training for pixel-wise semi-supervised learning
[29] Shape-aware semi-supervised 3D semantic segmentation for medical images
[30] Mixmatch: A holistic approach to semi-supervised learning
[31] Federated semi-supervised learning for COVID region segmentation in chest CT using multi-national data from China
[32] Federated semi-supervised medical image classification via inter-client relation matching
[33] FedPerl: Semi-supervised peer learning for skin lesion classification
[34] Federated contrastive learning for volumetric medical image segmentation
[35] Federated contrastive learning for decentralized unlabeled medical images
[36] Semi-supervised semantic segmentation with cross pseudo supervision
[37] Dataset of breast ultrasound images
[38] BUSIS: A benchmark for breast ultrasound image segmentation
[39] Automated breast ultrasound lesions detection using convolutional neural networks
[40] The HAM10000 dataset, a large collection of multisource dermatoscopic images of common pigmented skin lesions
[41] U-Net: Convolutional networks for biomedical image segmentation
[42] Group normalization
[43] Learning to segment medical images with scribble-supervision alone
[44] Weakly-supervised learning based-feature localization in confocal laser endomicroscopy glioma images
[45] Soteria: Provable defense against privacy leakage in federated learning from representation perspective
[46] Secure, privacy-preserving and federated machine learning in medical imaging
[47] Federated learning used for predicting outcomes in SARS-COV-2 patients
[48] Federated learning for computational pathology on gigapixel whole slide images
[49] Federated learning for COVID-19 screening from chest x-ray images
[50] The future of digital health with federated learning
[51] Multi-institutional collaborations for improving deep learning-based magnetic resonance image reconstruction using federated learning
[52] FedHealth: A federated transfer learning framework for wearable healthcare
[53] Federated transfer learning for EEG signal classification
[54] Model contrastive federated learning
[55] FedMD: Heterogeneous federated learning via model distillation
[56] FedDG: Federated domain generalization on medical image segmentation via episodic learning in continuous frequency space