Medical Image Segmentation with Limited Supervision: A Review of Deep Network Models
Peng, Jialin; Wang, Ye
2021-02-28

Despite the remarkable performance of deep learning methods on various tasks, most cutting-edge models rely heavily on large-scale annotated training examples, which are often unavailable for clinical and health care tasks. The labeling costs for medical images are very high, especially in medical image segmentation, which typically requires intensive pixel/voxel-wise labeling. Therefore, the strong capability of learning and generalizing from limited supervision, including a limited amount of annotations, sparse annotations, and inaccurate annotations, is crucial for the successful application of deep learning models in medical image segmentation. However, due to its intrinsic difficulty, segmentation with limited supervision is challenging, and specific model designs and/or learning strategies are needed. In this paper, we provide a systematic and up-to-date review of the solutions above, with summaries and comments about the methodologies. We also highlight several problems in this field and discuss future directions that warrant further investigation.

Medical image segmentation, identifying the pixels/voxels of anatomical or pathological structures from background biomedical images, is of vital importance in many biomedical applications, such as computer-assisted diagnosis, radiotherapy planning, surgery simulation, treatment, and follow-up of many diseases. Typical medical image segmentation tasks include brain and tumor segmentation [1]-[3], cardiac segmentation [4], liver and tumor segmentation [5]-[8], segmentation of cells and subcellular structures [9]-[11], multi-organ segmentation [12], lung and pulmonary nodule segmentation [13], vessel segmentation [14], etc., and thus can deliver crucial information about the objects of interest. While semantic segmentation of medical images involves labeling each pixel/voxel with a semantic class, instance segmentation (such as cell segmentation) extends semantic segmentation to discriminate each instance within the same class. Recently, deep learning methods have achieved impressive performance improvements on various medical image segmentation tasks and set the new state of the art. Numerous image segmentation algorithms have been developed in the literature and have made great progress on the designs and performance of deep network models [15], [16]. However, the scarcity of high-quality annotated training data has been a significant challenge for medical image segmentation. The strong generalization capabilities of most cutting-edge segmentation models, which are usually deep and wide networks, rely heavily on large-scale, high-quality pixel-wise annotated data, which are often unavailable for clinical and health care tasks. In fact, manually annotating medical images at the pixel level is an expensive and time-consuming process, since it requires the knowledge of experienced clinical experts. The scarcity of annotated medical imaging data is further exacerbated by differences in patient populations, acquisition parameters and protocols, sequences, vendors, and centers, which may result in obvious statistical shifts.
Thus, it is challenging even to collect a sufficiently large amount of training data, due to the heterogeneous nature of medical imaging data and the strict legal and ethical requirements for patient privacy. The data scarcity problem is much more severe for emerging tasks and new environments, where quick model deployment is expected but only a limited amount of annotations of limited quality is available. Therefore, the high cost of pixel-level labeling and the privacy and security of data hinder model training and its scalability to novel images of emerging tasks and new environments, which subsequently hampers the application of deep segmentation models in real-world clinical and health care usage. Thus, learning strong and robust segmentation models from limited labeled data and readily available unlabeled data is crucial for the successful application of deep learning models in clinical usage and health care. These challenges have inspired many research efforts on learning with limited supervision, where the training data only have a limited amount of annotated examples, accurate but sparse annotations, inaccurate annotations, coarse-level annotations, or combinations thereof. However, due to its intrinsic difficulty, segmentation with limited supervision is challenging, and specific model designs and/or learning strategies are needed. Despite these challenges, researchers have introduced a diverse set of deep network models [16] that can handle incomplete, sparse, inaccurate, or coarse annotations. However, progress has been slower than that of fully supervised learning. In this paper, we take a systematic and up-to-date look at the development of recent technologies that explore unlabeled examples and prior knowledge to address the limited supervision and small data problem. Several comprehensive surveys exist about deep learning methods [15], [17] or their subcategories such as generative adversarial networks (GAN) [18] for general medical image analysis. The main contributions of this review are as follows:
• We categorize the problem of segmentation with limited supervision into semi-supervised segmentation, partially-supervised segmentation, and inaccurately-supervised segmentation and offer a structured review of recent advances in methods that can be used to address these problems. We also offer summaries and comments about the pros and cons of the methodologies in each category, and the connections between methods in different categories.
• We also highlight several problems in this field and discuss the limitations, new trends, and future directions for medical image segmentation with limited supervision.

The paper is organized as follows. In Section II, we provide preliminary knowledge about medical image segmentation and the basic deep network architectures for this task, as well as the categorization of medical image segmentation with limited supervision. Sections III to V provide a detailed review of methods for semi-supervised segmentation, partially-supervised segmentation, and inaccurately-supervised segmentation, respectively. Section VI discusses future directions. Lastly, Section VII concludes this survey.

Medical image segmentation involves delineating the anatomical or pathological structures from medical images of various modalities. As pointed out in [17], medical images are heterogeneous with imbalanced classes and have multiple modalities with sparse annotations. Thus, it is complicated and challenging to analyze various medical images.
Here, we focus on the medical image segmentation problem, which typically consists of semantic segmentation and instance segmentation. Semantic segmentation refers to the task of assigning each pixel/voxel a semantic category label (such as liver).

Fig. 4: U-Net [34].

Dense skip connections were introduced in DenseNet [38] and have been widely used in many segmentation models [15]. Please refer to [15], [39], [40] for comprehensive reviews of recent improvements of FCN models for semantic segmentation of natural and medical images.

Image segmentation with limited supervision. The cost of labor-intensive, pixel-level annotation of large-scale medical imaging data can be reduced by utilizing 1) a small subset of labeled training data, also known as semi-supervised learning or few-shot learning; 2) partial annotations (including sparse annotations), i.e., partially-supervised learning; or 3) inaccurate annotations, including noisy labels, bounding boxes, and boundary scribbles. It is noteworthy that, though the labeled data are scarce in the semi-supervised setting, these annotations are typically assumed to be precise and reliable, which is different from the inaccurate and partial annotation settings. The extreme case of limited supervision is the unsupervised setting, where no labeled data are available at all. However, methods that only explore unlabeled data, such as clustering, are usually task-agnostic and tend to show very low performance on complicated segmentation tasks. Recently, auxiliary tasks, such as the adaptation of a well-trained model from a similar domain with a similar task [11], [41], have been leveraged to mitigate this problem. Although we will not cover unsupervised segmentation and its solutions, such as unsupervised domain adaptation (UDA) [42] and zero-shot learning [43], we mention it here so that all settings can be viewed in the big picture. In this paper, we focus on methods that learn to segment medical images with incomplete, inexact, and inaccurate annotations by jointly leveraging a few labeled data and a large number of unlabeled examples.

Semi-supervised segmentation is a common scenario in medical applications, where only a small subset of the training images is assumed to have full pixel-wise annotations. However, there is also an abundance of unlabeled images that can be used to improve both the accuracy and generalization capabilities. Since unlabeled data do not involve labor-intensive annotations, any performance gain conferred by using unlabeled data comes at a low cost. The major challenge of this learning scenario lies in how to efficiently and thoroughly exploit a large quantity of unlabeled data. The most common approaches for semi-supervised segmentation include 1) general strategies, e.g., transfer learning, data augmentation, prior knowledge learning, curriculum learning, and few-shot learning, and 2) specialized methods that make use of unlabeled data, e.g., self-training [44]-[47], consistency regularization [48], co-training, self-supervised learning, and adversarial learning. In the following subsections, we first review general methods that can be used to address the small labeled data problem, i.e., transfer learning, data augmentation, prior knowledge learning, and curriculum learning, as shown in Fig. 5. Then, we discuss specialized methods designed for semi-supervised learning.
Finally, we elaborate on few-shot learning, which learns to generalize from a few examples with prior knowledge.

Transfer learning refers to reusing a model developed for one task as the starting point for a model on a second task, which may speed up the learning process, alleviate the problem of limited training data, and improve generalization on the second task. In contrast, training an entire deep network from scratch usually requires a large-scale labeled dataset. The "model pretraining and fine-tuning" strategy, a notable example of transfer learning, has been a simple but effective paradigm in many deep network applications since many tasks are related. In many deep learning studies, transfer learning also narrowly refers to the "model pretraining and fine-tuning" strategy. Typically, transfer learning from natural image data to medical datasets involves starting with standard network architectures, e.g., VGG [49] and ResNet [50], using the corresponding weights pre-trained on large-scale external sources of natural images, namely ImageNet [51] and PASCAL VOC [52], as initialization or as a fixed feature extractor, and then fine-tuning the model on medical imaging data. This model-reusing strategy tends to work if the features are general and suitable for both the source and target tasks. The transferability of features in different layers of a deep network was investigated in [53]. They showed that transferring features even from distant tasks can be better than using random features. It is noteworthy that various medical applications involve segmentation of 3D medical images, which hinders the transfer of models pre-trained on 2D natural images to the current task. While it is straightforward to reformulate volume image segmentation as slice-by-slice 2D segmentation, rich 3D spatial context is inevitably lost. Possible solutions for transferring 2D networks to 3D networks include 1) copying the 2D kernels along an axis [54] and 2) padding the pre-trained 2D kernels with zeros along an axis [55], [56]. For instance, Liu et al. [56] proposed to transfer convolutional features learned from 2D images to 3D anisotropic volumes and obtained the desired strong generalization capability of the pre-trained 2D network. For medical image analysis, another challenge for models pre-trained on large-scale natural image sets is the significant domain gap between natural and medical images, and even the obvious gap between medical images of different modalities. Tajbakhsh et al. [57] investigated the effectiveness of pre-trained deep CNNs with sufficient fine-tuning compared to training a deep network from scratch on four different medical imaging applications. They showed that, in most cases, fine-tuned networks could outperform those trained from scratch and showed better robustness. However, Raghu et al. [58] recently evaluated the properties of transfer learning from ImageNet on two large-scale medical imaging tasks. They demonstrated the contrasting result that transfer learning gained little performance benefit, and that simple and lightweight models can perform comparably to large pre-trained networks. Zoph et al. [59] demonstrated that stronger data augmentation and more labeled data diminish the benefit of pretraining for vision applications, but that self-training is always helpful.
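To make the 2D-to-3D weight transfer mentioned above concrete, the following is a minimal sketch, in PyTorch, of inflating pre-trained 2D convolution kernels into 3D kernels by either replicating them along the new axis or zero-padding around a central slice. It is an illustrative assumption-based example, not the exact procedure of [54]-[56]; the function and variable names are ours.

```python
import torch
import torch.nn as nn

def inflate_conv2d_to_3d(conv2d: nn.Conv2d, depth: int = 3, mode: str = "copy") -> nn.Conv3d:
    """Build a Conv3d whose kernels are initialized from a pre-trained Conv2d.

    mode="copy": replicate the 2D kernel along the depth axis and rescale so the
                 response on a constant-in-depth input is preserved.
    mode="zero": place the 2D kernel in the central depth slice and pad the rest with zeros.
    """
    conv3d = nn.Conv3d(conv2d.in_channels, conv2d.out_channels,
                       kernel_size=(depth, *conv2d.kernel_size),
                       stride=(1, *conv2d.stride),
                       padding=(depth // 2, *conv2d.padding),
                       bias=conv2d.bias is not None)
    w2d = conv2d.weight.data                              # (out, in, kH, kW)
    if mode == "copy":
        w3d = w2d.unsqueeze(2).repeat(1, 1, depth, 1, 1) / depth
    else:                                                  # zero padding along the new axis
        w3d = torch.zeros(conv3d.weight.shape)
        w3d[:, :, depth // 2] = w2d
    conv3d.weight.data.copy_(w3d)
    if conv2d.bias is not None:
        conv3d.bias.data.copy_(conv2d.bias.data)
    return conv3d

# Example: inflate a 2D layer for volumetric input of shape (B, C, D, H, W).
conv2d = nn.Conv2d(1, 64, kernel_size=3, padding=1)
conv3d = inflate_conv2d_to_3d(conv2d, depth=3, mode="copy")
print(conv3d(torch.randn(1, 1, 16, 64, 64)).shape)        # -> (1, 64, 16, 64, 64)
```

In practice such inflation is applied layer by layer to a whole pre-trained 2D backbone before fine-tuning it on the 3D target task.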
While a model pre-trained on a large-scale medical image dataset may be more valuable for medical image segmentation, there is no large-scale annotated dataset like ImageNet in the medical domain. To obtain a universal pre-trained model with promising transferability and generalizability for medical image analysis, several studies have proposed to pre-train models on medical datasets that are limited to specific modalities or tasks. Zhou et al. [60] built a 3D pre-trained model, called Genesis Chest CT, using unlabeled 3D chest Computed Tomography (CT) images with a novel self-supervised learning method. Similar pre-trained models for specific image domains were also built, such as Genesis Chest CT 2D and Genesis Chest X-ray, which used 2D chest CT and chest X-ray images, respectively. A universal 3D model was learned in [61] by leveraging a self-supervised learning scheme on multiple unlabeled source datasets of different modalities and distinctive scan regions.

Since deep networks rely heavily on big data to learn discriminative representations and avoid overfitting, data augmentation [62] has been considered a simple yet effective data-space solution to the problem of limited annotated data. Specifically, data augmentation aims to artificially enhance the size, diversity, and quality of the training data without collecting and manually labeling new data. Typical data augmentation methods not only include data warping methods [62] such as random affine and elastic transformations, random cropping [50], random erasing [63], [64], intensity transformations, and adversarial data augmentation [65], [66], but also include methods that can synthesize more diverse and realistic labeled examples, such as mixing images [67]-[70], feature space augmentation [71], and generative adversarial networks [18], [72]-[74]. While general transformation-based augmentation methods such as random affine transformations, elastic transformations, and intensity transformations are easy to implement and have shown performance improvements in abundant applications [2], [34], [75], they do not take advantage of the knowledge in unlabeled training data. Recently, there has been growing interest in developing augmentations that can simulate real variations of the data, and thus task-driven approaches [76]-[78] are a promising direction. Schlesinger et al. [79] provided a recent review of data augmentation methods for brain tumor segmentation. Mixing and cutting images [67]-[69], [80] is a class of simple but effective augmentation methods in many applications [70]. Specifically, Mixup [67] linearly interpolates a random pair of training images and, correspondingly, their labels. Recently, Mixup has been improved in [69] with learned mixing policies to prevent manifold intrusion. Cutout [64] adopts the idea of the regional dropout strategy, that is, occluding a portion of an image, on the training data. Alternatively, CutMix [68] combines aspects of Mixup and Cutout by replacing a portion of an image with a portion of a different image. For the application of medical image segmentation, Panfilov et al. [70] tested the effectiveness of Mixup for knee MRI segmentation and showed improved model robustness. Adversarial data augmentation involves harnessing adversarial examples to train robust models against unforeseen data corruptions or distribution shifts [65], [66], [81] and is thus a plausible way to cope with limited labeled training data.
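As a minimal illustration of the basic idea behind adversarial data augmentation, the sketch below generates one-step (FGSM-style) intensity perturbations that increase the segmentation loss and mixes them into training; the more realistic, anatomy-aware perturbations discussed next require dedicated transformation models. Names, thresholds, and the assumption that intensities lie in [0, 1] are illustrative.

```python
import torch
import torch.nn.functional as F

def adversarial_augment(model, images, masks, epsilon=0.03):
    """Generate adversarially perturbed images for a segmentation model (FGSM-style).

    images: (B, C, H, W) float tensor in [0, 1]; masks: (B, H, W) integer label map.
    Returns perturbed images that increase the segmentation loss within an
    L-infinity ball of radius epsilon.
    """
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), masks)
    loss.backward()
    # One-step ascent along the sign of the loss gradient, clamped to a valid intensity range.
    adv_images = images + epsilon * images.grad.sign()
    return adv_images.clamp(0.0, 1.0).detach()

def train_step(model, optimizer, images, masks):
    """One supervised update on a batch augmented with its adversarial counterpart."""
    adv = adversarial_augment(model, images, masks)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(images), masks) + F.cross_entropy(model(adv), masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```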
When applied to medical image segmentation, designing and constructing more realistic adversarial perturbations is a crucial problem [65], [77], [82]. For MR image segmentation, Chen et al. [77] introduced intensity inhomogeneity as a new type of adversarial attack, using a realistic intensity transformation function learned with adversarial training to amplify intensity non-uniformity in MR images and simulate potential image artifacts, such as bias fields. To obtain adversarial samples subject to a given transformation model, Olut et al. [83] proposed to learn a statistical deformation model that can capture plausible anatomical variations from unlabeled data with deep registration models. A similar idea was adopted in [84]. Generative adversarial networks (GANs) [18], [72] have also been utilized to conduct medical data augmentation by directly synthesizing new labeled data. Costa et al. [85] proposed training a generative model with adversarial learning to synthesize both realistic retinal vessel trees and retinal color images. For semi-supervised medical image segmentation, Chaitanya et al. [86] proposed to learn a generative network that synthesizes new samples from both labeled and unlabeled data by simultaneously learning and applying realistic spatial deformation fields and additive intensity transformation fields. To improve cross-modal segmentation with limited training samples, Cai et al. [87] developed a cross-modality data synthesis approach to generate realistic-looking 2D/3D images of a specific modality as data augmentation. Yu et al. [88] integrated edge information into a conditional GAN [89] for cross-modality MR image synthesis. To segment pulmonary nodules, Qin et al. [90] augmented the training set with synthetic CT images and labels and achieved promising results. For one-shot brain segmentation, Zhao et al. [76] used a data-driven approach for synthesizing labeled images as data augmentation. Specifically, they proposed to model the set of spatial and appearance transformations between all the training data, including both the labeled and unlabeled images, and then applied the learned transformations to the single labeled image to synthesize new labeled images.

A group of methods has addressed semi-supervised segmentation by incorporating prior/domain knowledge, such as anatomical priors about the objects of interest, into the segmentation model as a strong regularization [91]-[96]. In fact, prior knowledge about location, shape, anatomy, and context is also crucial for manual annotation, especially in the presence of fuzzy boundaries or low image contrast. For semantic segmentation with deep networks, model training is typically guided by local or pixel-wise loss functions (e.g., the Dice loss [97] and the cross-entropy loss), which may not be sufficient to learn informative features about the underlying anatomical structures and global dependencies. Anatomical-prior-guided methods usually assume that the plausible solution space can be expressed in the form of a prior distribution, encouraging the network to generate more anatomically plausible segmentations. Atlas-based segmentation [98], [99] with a single atlas or multiple atlases has been widely used in medical image segmentation to exploit prior knowledge from previously labeled training images. An atlas consists of a reference model with labels related to the anatomical structures.
Thus, it can provide crucial knowledge for segmentation, such as information about location, texture, shape, and spatial relationships, especially when only limited labeled data are available for model training. Atlas-based methods essentially treat the segmentation problem as a registration problem, and non-rigid registration is typically used to account for the anatomical differences between subjects. Wang et al. [91] addressed one-shot segmentation of brain structures from Magnetic Resonance Images (MRIs) with single-atlas-based segmentation, where reversible voxel-wise correspondences between the atlas and the unlabeled images were learned with a correspondence-learning deep network. Ito et al. [100] considered semi-supervised segmentation of brain tissue from MRI. Specifically, they relied on image registration with one or more atlases to generate pseudo labels on unlabeled data. The expectation-maximization (EM) algorithm was used to update model parameters and pseudo labels alternately. However, image registration, the process of geometrically aligning two or more images, is computation-intensive, which may hamper its practical application. Please refer to [101] for a comprehensive review of both affine and deformable image registration with deep learning methods. A similar idea was employed by Chi et al. [102], where they generated pseudo-labels by utilizing deformable image registration to propagate atlas labels onto unlabeled images. Xu and Niethammer [92] proposed to jointly learn two deep networks for weakly-supervised image registration and semi-supervised segmentation, assuming that these two tasks can mutually guide each other's training on unlabeled images. He et al. [103] further proposed an improved joint learning model, which added a perturbation factor to the registration to enhance its sustainable data augmentation ability, and a discriminator to extract registration confidence maps for better guidance of the segmentation task. For 3D left ventricle (LV) segmentation on echocardiography with limited annotated data, Dong et al. [104] introduced a deep atlas network with a lightweight registration network and a multi-level information consistency constraint. However, registration, which is itself a computation-intensive and challenging task, is not essential for segmentation. For semi-supervised 3D liver segmentation, Zheng et al. [105] proposed to combine a probabilistic atlas, which can provide shape and position priors, with deep segmentation networks using a prior-weighted cross-entropy loss. The probabilistic atlas was obtained by averaging the manually labeled liver masks after aligning all labeled training images. Vakalopoulou et al. [106] developed AtlasNet, which consists of multiple deep networks trained after co-aligning multiple anatomies through multi-metric deformable registration. The multiple deep networks were used to map all training images to common subspaces to reduce biological variability.

Shape-prior-based segmentation [94]-[96], [107]-[113] has been an active research topic in the context of deep learning to obtain more accurate and anatomically plausible segmentation. While principal component analysis (PCA) based statistical shape models (SSMs) [30] were widely adopted by traditional segmentation methods, it is not straightforward to combine SSMs with deep networks. Ambellan et al. [114] combined 3D SSMs with 2D and 3D deep convolutional networks to obtain robust and accurate segmentation of even highly pathological knee bone and cartilage.
Specifically, they used SSM adjustment as a shape regularization of the outputs of the segmentation networks. Oktay et al. [94] used a stacked convolutional autoencoder to learn non-linear shape representations, which were integrated with the segmentation network to enforce predictions that follow the learned anatomical priors. With the shape prior, their method obtained highly competitive performance for cardiac image segmentation while learning from a limited number (30) of labeled cases. Rather than using the compact codes produced by an autoencoder as the shape constraint as in [94], Yue et al. [113] used the reconstructions of the predicted segmentations to maintain a realistic shape of the resulting segmentation. While the frameworks in [94], [113] incorporated the learned anatomical prior into deep networks through a regularization term, Painchaud et al. [115] incorporated the anatomical priors through an additional post-processing stage. Specifically, they warped initial segmentation results toward the closest anatomically correct cardiac shape, which was learned and generated with a constrained variational autoencoder. Ravishankar et al. [116] introduced a shape regularization network (a convolutional autoencoder) after the segmentation stage. Larrazabal et al. [107] learned lower-dimensional representations of plausible shapes with a denoising autoencoder and used them as a post-processing step to impose shape constraints on the coarse output of the segmentation network. As a novel extension of template deformation methods [30] in the context of deep networks, Lee et al. [110] introduced a template transformer network, where a shape template is deformed to match the underlying structure of interest through an end-to-end trained spatial transformer network. Zotti et al. [117] introduced a probabilistic shape-prior image estimated by computing the pixel-wise empirical proportion of each class based on aligned ground truth label fields of the training images. The probabilistic shape-prior image was concatenated with network features for prior guidance. For the semi-supervised 3D segmentation of renal arteries, He et al. [103] proposed assisting the segmentation network with multi-scale semantic features extracted from unlabeled data with an autoencoder. Other types of anatomical priors, such as star-shape priors [108], [118]-[121], convex shape priors [122], topology [123]-[127], and size [128]-[130], have also been introduced to improve segmentation robustness and anatomical accuracy.

Given the greater complexity of semi-supervised segmentation over classification and the importance of starting small, the concept of curriculum learning [131], or the easy-to-hard strategy, has also been utilized. Curriculum learning describes a type of learning strategy that first starts with easier aspects of the task or easier subtasks and then gradually increases the difficulty level.

Fig. 6: The flowchart of self-training for semi-supervised segmentation and of active learning for interactive segmentation on a conceptual level. For self-training, the segmentation model is initially trained only on the small labeled set and then retrained with the extended data that consist of both the labeled data and confident pseudo-labeled data. Human interaction is required for active learning, and the segmentation model is retrained with the original labeled data and the newly labeled data.

In a broad sense, the most widely used curriculum learning strategies include data curriculum learning and task curriculum learning.
While early studies focused on data curriculum learning by reweighting the target training distribution, recent studies have also investigated the varied easiness among different tasks [132], i.e., task curriculum learning. In data curriculum learning, non-uniform sampling of examples or mini-batches from the entire training data, rather than uniform sampling as in standard deep network training, is used in model training. Therefore, the core tasks are how to rank the training examples and how to guide the order of presentation of examples based on this ranking [133]. This makes it flexible to incorporate prior knowledge about the data and the task. It has been empirically demonstrated that this learning paradigm is useful in avoiding bad local minima and in achieving better generalization ability [134]. Data curriculum learning has recently been used in several medical applications, especially localization and classification tasks [78], [135]-[137], but rarely in segmentation tasks [138], [139]. To train a deep network for the classification and localization of thoracic diseases on chest radiographs, Tang et al. [135] first ranked the training images according to difficulty (indicated by the severity levels of the disease) and then fed them to the deep network to gradually boost representation learning. For fracture classification, Jiménez-Sánchez et al. [137], [140] assigned a degree of difficulty to each training example according to medical decision trees and inconsistencies in multiple experts' annotations. In addition to a curriculum predefined by prior knowledge and kept fixed thereafter, the curriculum can also be determined dynamically to adapt to the feedback of the learner, which is known as self-paced curriculum learning [141] or self-paced learning [142]. For lung nodule segmentation/detection with extreme class imbalance, Jesson et al. [138] introduced an adaptive sampling strategy that favors difficult-to-classify examples. For instance-level segmentation of pulmonary nodules, Wang et al. [139] employed pseudo labels as the surrogate of ground truth labels on unlabeled data. To utilize the pseudo-labeled data, they followed the idea of self-paced curriculum learning [141] and embedded the curriculum design as a regularization term in the learning objective. Task curriculum learning consists of tackling easy but related tasks first to provide auxiliary information for more complicated tasks, which are solved later. Task curriculum learning is highly related to multi-stage learning in segmentation [2], [143], [144], where easier tasks such as localization or coarse segmentation are first solved with a simple method. After that, the more complex pixel-level segmentation is addressed. For example, for cross-domain segmentation of natural images, Zhang et al. [145] proposed to solve easy tasks first to infer necessary properties about the target domain. Specifically, they first estimated label distributions over both global images and some landmark superpixels of the target domain. They then enforced the semantic segmentation network to follow those target-domain properties as much as possible. For left ventricle segmentation in MR images, Kervadec et al. [146] introduced a curriculum-style strategy that first learns the size of the target region, which is the easier task, and then regularizes the segmentation task, which is harder, with the pre-learned region size.
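As a concrete illustration of data curriculum learning, the following minimal sketch (not tied to any specific work above) orders training cases by a precomputed difficulty score and gradually widens the sampled pool as training progresses. The difficulty scores are assumed to be given, e.g., from lesion size or expert-assigned grades, and all names are illustrative.

```python
import random

def curriculum_batches(cases, difficulty, num_epochs, batch_size):
    """Yield (epoch, batch) pairs, exposing easy cases first and harder ones later.

    cases:      list of training case identifiers.
    difficulty: dict mapping case id -> difficulty score (lower = easier); assumed given.
    """
    ordered = sorted(cases, key=lambda c: difficulty[c])
    for epoch in range(num_epochs):
        # Linearly grow the accessible fraction of the curriculum from 30% to 100%.
        frac = 0.3 + 0.7 * epoch / max(1, num_epochs - 1)
        pool = ordered[: max(batch_size, int(frac * len(ordered)))]
        random.shuffle(pool)
        for i in range(0, len(pool), batch_size):
            yield epoch, pool[i:i + batch_size]

# Usage sketch with toy difficulty scores.
cases = [f"case_{i}" for i in range(10)]
difficulty = {c: i for i, c in enumerate(cases)}
for epoch, batch in curriculum_batches(cases, difficulty, num_epochs=3, batch_size=4):
    pass  # train_on(batch)
```

Self-paced variants replace the fixed schedule above with a weighting that the model itself adapts during training.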
Self-training [44], [147]-[149] (also called pseudo-labeling) is an iterative process that alternately generates pseudo-labels on the unlabeled data and retrains the learner on the combined labeled and pseudo-labeled data. An illustration of self-training-based semi-supervised segmentation is shown in Fig. 6 (a) (with a comparison to the human-in-the-loop strategy, i.e., active learning for interactive segmentation, which will be discussed in Sec. IV). The generality and flexibility of self-training have been validated in many applications [59]. A fundamental property of the self-training strategy is that it can be combined with any supervised learner and provides a straightforward but effective manner of leveraging unlabeled data. Self-training usually follows a teacher-student paradigm [150], [151] (as shown in Fig. 7), which consists of first learning a teacher model from ground truth annotations and then using the predictions of the teacher model to generate pseudo-labels on the unlabeled data. The ground truth annotations and the pseudo labels with high confidence are then jointly and iteratively digested to learn a powerful student model. Bai et al. [44] first trained a teacher neural network using the labeled data and then utilized the confident predictions of the teacher model on the unlabeled data (i.e., the probability predictions followed by conditional random field (CRF) refinement) as the pseudo labels. Fan et al. [152] applied a similar strategy to lung infection segmentation from CT images. Typically, the self-training method is iterative, and the quality of the pseudo-labels should be gradually improved for a successful self-training approach [151]. The main challenge of the self-training strategy lies in generating reliable pseudo labels and handling the negative impact of adding incomplete and incorrect pseudo-labels, which may confuse model training. A promising direction to improve the quality of pseudo labels and reduce the negative impact of noisy pseudo labels is uncertainty or confidence estimation [153]-[157]. To this end, it is desirable to let the teacher model simultaneously generate the segmentation predictions as pseudo labels and estimate uncertainty maps for the unlabeled images. The uncertainty maps can be used as guidance to retain reliable predictions [158]-[165]. There are two categories of uncertainty [154], [157] one can model, namely aleatoric uncertainty (data uncertainty), which is an inherent property of the data distribution and is irreducible, and epistemic uncertainty (model uncertainty), which can be reduced through the collection of additional data. Popular approaches to generate pseudo-labels and quantify uncertainties in deep networks include Bayesian neural networks [155], [166], Monte Carlo Dropout [45], [46], [153], [167], Monte Carlo batch normalization [168], and deep ensembles [169]. Bayesian neural networks capture model uncertainty by learning a posterior distribution over parameters. While Bayesian neural networks are often hard to implement and computationally slow to train [153], [169], non-Bayesian strategies, including Monte Carlo Dropout and deep ensembles, are more attractive. Jungo et al. [160] evaluated several widely-used pixel-wise uncertainty measures concerning their reliability and limitations for medical image segmentation, and also highlighted the importance of developing subject-wise uncertainty estimations.
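To make uncertainty-guided pseudo-labeling concrete, the sketch below keeps only those pseudo-labeled pixels whose predictive entropy under Monte Carlo Dropout (explained in more detail in the next paragraph) falls below a threshold. It is a simplified illustration rather than the procedure of any particular paper; names and the threshold are illustrative.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def pseudo_labels_with_uncertainty(model, images, n_passes=8, max_entropy=0.3):
    """Generate pseudo-labels for unlabeled images and mask out uncertain pixels.

    Runs n_passes stochastic forward passes with dropout active (Monte Carlo Dropout;
    note that model.train() also affects BatchNorm), averages the softmax outputs, and
    keeps only pixels whose predictive entropy is low.
    Returns (pseudo_label, keep_mask): a (B, H, W) label map and a boolean reliability mask.
    """
    model.train()  # keep dropout layers stochastic at inference time
    probs = torch.stack([F.softmax(model(images), dim=1) for _ in range(n_passes)]).mean(0)
    entropy = -(probs * torch.log(probs.clamp_min(1e-8))).sum(dim=1)   # (B, H, W)
    pseudo_label = probs.argmax(dim=1)                                  # (B, H, W)
    keep_mask = entropy < max_entropy
    return pseudo_label, keep_mask

def unsupervised_loss(model, images, pseudo_label, keep_mask):
    """Cross-entropy on unlabeled images, restricted to reliable (low-entropy) pixels."""
    loss = F.cross_entropy(model(images), pseudo_label, reduction="none")  # (B, H, W)
    return (loss * keep_mask.float()).sum() / keep_mask.float().sum().clamp_min(1.0)
```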
The model uncertainty estimated with Monte Carlo Dropout [153] can be interpreted as an approximation of Bayesian uncertainty. Concretely, the predictive uncertainty is estimated by averaging the results of multiple stochastic forward passes of the deep network under random dropout. Widely used uncertainty measures include the normalized entropy of the varied probabilistic predictions, the variance of the Monte Carlo samples, mutual information, and the predicted variance. Nair et al. [158] provided an in-depth analysis of the different measures based on medical image segmentation performance.

Fig. 7: The flowchart of a typical self-training procedure in the teacher-student framework on a conceptual level. The segmentation model is initially trained only on the small labeled set and then retrained with the extended training dataset that consists of both the labeled data and confident pseudo-labeled data.

Estimated uncertainty has also been used to guide the learning of the student model, for example in the semi-supervised segmentation of the left atrium. Similarly, Sedai et al. [45] conducted semi-supervised segmentation of retinal layers in OCT images with uncertainty guidance estimated with Monte Carlo Dropout. Deep ensembles [169] were theoretically motivated by the bootstrap and have been empirically demonstrated to be a promising approach for boosting the accuracy and robustness of deep networks. Concretely, multiple networks using different training subsets and/or different initializations are separately trained to enforce variability, and then the predictions are combined by averaging for the uncertainty estimation. Mehrtash et al. [163] used deep ensembles for confidence calibration, where they trained multiple models with different initializations and random shuffling of the training data. They applied their confidence-calibrated model to brain, heart, and prostate segmentation. Ayhan and Berens [171] showed that applying traditional augmentation at test time can be an effective and efficient estimation of heteroscedastic aleatoric uncertainty in deep networks, and they applied their method to fundus image analysis. Kendall and Gal [157] introduced a unified Bayesian framework to combine aleatoric and epistemic uncertainty estimations for deep networks. Wang et al. [164] validated the effectiveness of test-time augmentation as aleatoric uncertainty estimation on the segmentation of fetal brains and brain tumors.

Co-training, initially introduced by Blum and Mitchell [172], exploits multi-view data descriptions to learn from a limited number of labeled examples and a large amount of unlabeled data. The underlying assumption is that the training examples can be described by two or more different but complementary sets of features, called views, which in the ideal case are assumed to be conditionally independent given the category. As an extension of self-training to multiple base learners, the original co-training for classification first learns a separate learner for each view using any labeled examples; then the most confident predictions of all base learners on the unlabeled data are gradually added to the labeled data of the other base learners to continue the iterative training.

Fig. 8: Conceptual illustration of consistency regularization: a) a single model with consistency under perturbation, e.g., the Π model [150]; b) a dual model in the teacher-student framework with a consistency constraint, e.g., Temporal Ensembling [150] and Mean Teacher [151].

By enforcing prediction agreements between the different but related views, the goal is to allow inexpensive unlabeled data to augment a much smaller set of labeled examples.
Moreover, it is essential to ensure that the different base learners give different and complementary information about each instance [173], namely the view difference constraint or diversity criterion. Peng et al. [174] applied the idea of co-training to semi-supervised segmentation of medical images. Concretely, they trained multiple models on different subsets of the labeled training data and used a common set of unlabeled training images to exchange information with each other. Diversity across models was enforced by utilizing adversarial samples generated using both the labeled and unlabeled data, as in [175]. For semi-supervised multi-organ segmentation from 3D medical images, Zhou et al. [176] introduced multi-planar co-training, which involves training different segmentation models on multiple planes, i.e., the axial, coronal, and sagittal planes, of a volume image in the teacher-student paradigm. Xia et al. [177] incorporated uncertainty estimation into the multi-planar co-training approach in [176] to generate more reliable pseudo labels for unlabeled data.

Consistency regularization [48], [178], [179] utilizes unlabeled data by relying on the assumption that a good model should generate consistent predictions for similar inputs, as shown in Fig. 8. More specifically, the trained model should output the same predictions for classification, or equivariant predictions for segmentation, when fed perturbed or transformed input. To this end, methods of this category learn to minimize the difference between the predictions obtained by passing perturbed or transformed versions of a training sample through the deep network, aiming to obtain a model with better generalization ability. The conceptual idea of consistency regularization in both the single-model architecture and the dual-model architecture is shown in Fig. 8. Cui et al. [180] adapted the mean teacher model [151], an improved teacher-student self-training strategy that also considers consistency regularization, to semi-supervised brain lesion segmentation. Specifically, they minimized the differences between the predictions of the teacher model and the student model for the same input under different noise perturbations. Yu et al. [46] further introduced uncertainty estimation into the mean teacher learning framework. Li et al. [181] introduced a geometric-transformation-consistent loss, which was integrated into the mean teacher learning framework [151] and applied to the semi-supervised segmentation of skin lesions, the optic disc, and liver tumors. In the mean teacher framework, Zhou et al. [182] encouraged the predictions from the teacher and student networks to be consistent at both the feature and semantic levels under small perturbations. They applied their model to semi-supervised instance segmentation of cervical cells. In the teacher-student paradigm, Fotedar et al. [183] further considered consistency under extreme transformations, including a diverse set of intensity-based, geometric, and image mixing transformations, and conducted semi-supervised lesion segmentation and retinal vessel segmentation from skin and fundus images, respectively. Liu et al. [184] explicitly enforced the consistency of relationships among different samples under perturbations in the teacher-student framework. Instead of using the self-training strategy (i.e., the teacher-student paradigm), Bortsova et al. [48] enforced transformation consistency on both the labeled and unlabeled data within a Siamese network and achieved state-of-the-art performance on chest X-ray segmentation.
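A minimal sketch of the teacher-student consistency training used by several of the works above (Mean Teacher-style): the teacher is an exponential moving average (EMA) of the student, and an unsupervised consistency loss penalizes disagreement between their predictions on perturbed unlabeled inputs. This is a schematic illustration with assumed names and a simple noise perturbation, not the exact recipe of any cited paper.

```python
import torch
import torch.nn.functional as F

def ema_update(teacher, student, alpha=0.99):
    """Update teacher weights as an exponential moving average of student weights."""
    with torch.no_grad():
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(alpha).add_(s_param, alpha=1 - alpha)

def mean_teacher_step(student, teacher, optimizer, labeled, unlabeled, lambda_c=1.0):
    """One training step combining a supervised loss and a consistency loss.

    labeled:   (images, masks) with pixel-wise ground truth labels.
    unlabeled: images only; additive noise is used as a simple perturbation.
    """
    images, masks = labeled
    sup_loss = F.cross_entropy(student(images), masks)

    noise1 = unlabeled + 0.05 * torch.randn_like(unlabeled)
    noise2 = unlabeled + 0.05 * torch.randn_like(unlabeled)
    with torch.no_grad():
        teacher_prob = F.softmax(teacher(noise1), dim=1)
    student_prob = F.softmax(student(noise2), dim=1)
    cons_loss = F.mse_loss(student_prob, teacher_prob)

    loss = sup_loss + lambda_c * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)
    return loss.item()

# The teacher is typically initialized as a copy of the student and never updated by gradients,
# e.g., teacher = copy.deepcopy(student); every teacher parameter has requires_grad set to False.
```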
A similar transformation-consistency idea was adopted in [185] for weakly supervised segmentation of COVID-19 in CT images. For semi-supervised medical image segmentation, Peng et al. [186] further employed a mutual-information-based clustering loss to explicitly enforce prediction consistency between nearby pixels in the unlabeled images and randomly perturbed unlabeled images. Fang and Li [187] developed a convolutional network with two decoder branches of different architectures and minimized the difference between the soft masks generated by the two decoders. They applied their method to kidney tumor and brain tumor segmentation and showed promising results.

Self-supervised learning [188]-[190] (as shown in Fig. 9), a form of unsupervised learning, has been widely used to explore unlabeled data and has shown soaring performance on representation learning through discovering data-inherent patterns. Self-supervised learning leverages the unlabeled data with automatically generated supervisory signals and benefits the downstream tasks by self-supervised model pretraining. Then, the pretrained model and the learned features are adapted to the target tasks of interest. Therefore, self-supervised learning aims to obtain a good representation of the training data without using any manual label. A popular solution is to learn useful features by introducing various pretext tasks, such as Jigsaw puzzles [189], rotation prediction [191], inpainting [192], colorization [190], relative position [188], or a combination of several of them, for the networks to solve, as demonstrated in Fig. 9. In this way, unlabeled training data can also be leveraged to acquire generic knowledge under different concepts, which can be transferred to various downstream tasks. These pretext tasks share one common property: labels for the pretext task can be automatically generated based on a certain degree of image understanding. For semi-supervised medical image segmentation, Li et al. [47] proposed generating pseudo-labels by recurrently optimizing the neural network with a self-supervised task, where Jigsaw puzzles were used as the pretext task. Tajbakhsh et al. [193] used three pretext tasks, i.e., rotation, reconstruction, and colorization, to pre-train a deep network for different medical image segmentation tasks in the context of having limited quantities of labeled training data. Taleb et al. [194] extended five different pretext tasks, including contrastive predictive coding, rotation prediction, Jigsaw puzzles, relative patch location, and exemplar networks, to the 3D context, and showed competitive results on brain tumor segmentation from 3D MRI and pancreas tumor segmentation from 3D CT images. In [195], Taleb et al. introduced a multimodal puzzle task to pretrain a model from multi-modal images, which was then fine-tuned on a limited set of labeled data for the downstream segmentation task.

Generative adversarial learning, introduced by Goodfellow et al. [72], involves training two subnetworks: one serves as a discriminator that aims to identify whether a sample is drawn from the true data or generated by the generator, and the other serves as a generator that aims to generate samples that are not distinguishable by the discriminator. The generator and discriminator are trained as a minimax two-player game.
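The sketch below illustrates one common way this minimax game is instantiated for semi-supervised segmentation, in the spirit of the straightforward strategy reviewed next but not reproducing any cited method: a discriminator is trained to distinguish ground-truth masks from predicted masks, and the segmentation network is additionally rewarded for producing predictions on unlabeled images that the discriminator judges realistic. The discriminator architecture, channel layout, and names are assumptions.

```python
import torch
import torch.nn.functional as F

def adversarial_semi_sup_step(seg_net, disc, opt_seg, opt_disc,
                              labeled, unlabeled, lambda_adv=0.01):
    """One update of a segmentation network (generator) and a mask discriminator.

    labeled:   (images, masks) with one-hot masks of shape (B, C, H, W).
    unlabeled: images without annotations.
    disc maps a (mask, image) pair, concatenated along channels, to a realness logit.
    """
    images, masks_onehot = labeled

    # 1) Discriminator: real ground-truth masks vs. predicted masks.
    with torch.no_grad():
        pred_l = F.softmax(seg_net(images), dim=1)
    d_real = disc(torch.cat([masks_onehot, images], dim=1))
    d_fake = disc(torch.cat([pred_l, images], dim=1))
    loss_d = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    opt_disc.zero_grad(); loss_d.backward(); opt_disc.step()

    # 2) Segmentation network: supervised loss plus an adversarial loss on unlabeled images,
    #    encouraging predictions that the discriminator judges as realistic.
    sup_loss = F.cross_entropy(seg_net(images), masks_onehot.argmax(dim=1))
    pred_u = F.softmax(seg_net(unlabeled), dim=1)
    d_out = disc(torch.cat([pred_u, unlabeled], dim=1))
    adv_loss = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    loss_g = sup_loss + lambda_adv * adv_loss
    opt_seg.zero_grad(); loss_g.backward(); opt_seg.step()
    return loss_d.item(), loss_g.item()
```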
Adversarial training has been used in many applications, including fully-supervised image segmentation [196]-[198], semi-supervised segmentation [103], [105], [187], [199]-[203], and domain adaptive segmentation [11], [41], [204]-[207]. For semi-supervised segmentation tasks, a straightforward strategy [208], [209] is to augment the standard segmentation network (the generator) with a discriminator network designed to distinguish the predicted segmentations from the ground truths and to choose reliable pseudo labels on the unlabeled data. Zhang et al. [209] applied adversarial learning to biomedical image segmentation with a model consisting of two subnetworks: a segmentation network (generator) to conduct segmentation and an evaluation network (discriminator) to assess segmentation quality. Han et al. [198] introduced Spine-GAN, a recurrent generative adversarial network, to segment multiple spinal structures from MRIs. For semi-supervised medical image segmentation, Nie et al. [208] followed a strategy similar to that introduced in [199] and utilized an adversarial network to select the trustworthy regions of unlabeled data to train the segmentation network. Generative adversarial learning has also been used as a data-space solution to the small data problem by directly synthesizing more realistic-looking data. For the segmentation of unpaired multi-modal cardiovascular volumes with limited training data, Zhang et al. [210] utilized a cycle-consistent adversarial network for training a cross-modality synthesis model, which can synthesize realistic-looking 3D images. Cross-modality shape consistency was enforced to guarantee the shape invariance of the synthetic images.

Few-shot segmentation (FSS) [211] aims at learning a model on base semantic classes but performing segmentation on a novel semantic class with only k labeled images (i.e., k-shot) of this unseen class, without retraining the model. The k image-label pairs for the new class are typically referred to as the support set. Given the support set, FSS predicts a binary mask of the novel class for each query image. It is noteworthy that, in FSS, the base classes for model training are assumed to have sufficient labeled training data, and the novel class, i.e., the testing class, is not seen by the model during training. Although few-shot learning has shown promising performance for classification and detection, its application in segmentation is immensely challenging due to the need for pixel-wise prediction. A comprehensive review of few-shot learning (FSL) [212] has been provided in [213]. The application of FSL to semantic segmentation of natural images was initially introduced in [211]. Rather than fine-tuning the pre-trained model on the small support set as in [214], Shaban et al. [211] introduced a two-branched approach, where the first branch takes the labeled image as input and predicts, in a single forward pass, a set of parameters, which are used by the second branch to generate a prediction for a query image. It is noteworthy that fine-tuning a large network on a very small support set is prone to overfitting. Roy et al. [215] considered the few-shot segmentation of organs from medical volumetric images, where only a few annotated slices are available. Following the two-branch paradigm in [211], they introduced strong interactions at multiple locations between the two branches by using Channel Squeeze & Spatial Excitation modules [216], [217], which is different from the single interaction at the final layer in [211].
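A widely used design in FSS, related to the prototype-based approach discussed next, labels query pixels by comparing their features to class prototypes pooled from the support set. The following is a minimal, assumption-laden sketch of that idea (masked average pooling plus cosine similarity); it is not the architecture of any specific cited work, and the encoder, scaling factor, and names are placeholders.

```python
import torch
import torch.nn.functional as F

def prototype_fss_predict(encoder, support_img, support_mask, query_img, tau=20.0):
    """Predict a foreground mask for a query image from one support image-mask pair.

    encoder: maps (B, C, H, W) images to (B, D, h, w) feature maps.
    support_mask: (1, H, W) binary mask of the novel class on the support image.
    Returns per-pixel background/foreground probabilities for the query image, (1, 2, H, W).
    """
    fs = encoder(support_img)                                   # (1, D, h, w)
    fq = encoder(query_img)                                     # (1, D, h, w)
    m = F.interpolate(support_mask[None].float(), size=fs.shape[-2:], mode="nearest")

    # Masked average pooling: one prototype for the foreground, one for the background.
    fg_proto = (fs * m).sum(dim=(2, 3)) / m.sum().clamp_min(1.0)              # (1, D)
    bg_proto = (fs * (1 - m)).sum(dim=(2, 3)) / (1 - m).sum().clamp_min(1.0)  # (1, D)

    # Cosine similarity between query features and each prototype, scaled by tau.
    protos = torch.stack([bg_proto, fg_proto], dim=1)                          # (1, 2, D)
    sim = F.cosine_similarity(fq.unsqueeze(1), protos[..., None, None], dim=2) # (1, 2, h, w)
    prob = F.softmax(tau * sim, dim=1)
    return F.interpolate(prob, size=query_img.shape[-2:], mode="bilinear", align_corners=False)
```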
Ouyang et al. [218] introduced a superpixel-based self-supervision technique for few-shot segmentation of medical images and showed a promising ability to generalize to unseen semantic classes.

In the previous subsections, we summarized popular techniques for semi-supervised segmentation of medical images. In summary, these methods address three crucial problems: 1) how to learn a reliable model from just a few labeled examples without overfitting, 2) how to make the best use of the unlabeled data, and 3) how to use domain knowledge to learn a robust model with better generalization. Note that the three problems are not independent. The first problem can be easier to address when additional unlabeled data are available or specific domain knowledge can be exploited. The first problem can be partially addressed by data augmentation, curriculum learning, and transfer learning. Data augmentation is a simple yet effective data-space solution that artificially augments the labeled data. However, recent methods also exploit unlabeled data to capture real variations of the data. Data curriculum learning works on the data space by taking advantage of human knowledge about the training data; however, this method is not always effective. Transfer learning relies on the availability of large external benchmark datasets for model pretraining and can be regarded as a model-space solution. However, the effectiveness of transfer learning, that is, of adapting the pretrained network to the current dataset, depends on the nature of the current dataset, such as its similarity to the benchmark dataset and its size. Generally, when the similarity between the two datasets is high and the size of the current dataset is small, the performance gain is significant. When transferring from natural image benchmarks to medical datasets, where the data similarity is relatively low, the benefit of transfer learning is not always significant [58]. Thus, pretraining on relevant domains and applying the model to the current domain with supervised or semi-supervised training, known as domain adaptation [219] or domain generalization, has received growing attention. Please refer to [42] for comprehensive reviews of domain adaptation for semantic segmentation. More methods leverage unlabeled data, including self-training, consistency regularization, adversarial learning, and self-supervised learning. The self-supervised learning strategy also follows the "pretrain, fine-tune" pipeline, but conducts model pretraining on the current unlabeled data in an unsupervised or self-supervised style, which is different from the supervised pretraining in transfer learning. In contrast, the consistency regularization strategy introduces unsupervised losses on the unlabeled data, and the unsupervised losses are learned jointly with the supervised loss on the labeled data. Self-training directly augments the labeled data by pseudo-labeling the unlabeled data. There are many types of domain knowledge, such as anatomical priors from shape or atlas modeling, and data or task priors from curriculum learning and transfer learning. Incorporating domain knowledge has proven to be effective in regularizing model training and is especially valuable in medical image segmentation. Few-shot learning aims to generalize from a few labeled examples with prior knowledge.
Thus, it can help relieve the burden of data collection and annotation and help the model learn from rare cases, which is crucial for biomedical applications.

While semi-supervised segmentation addresses the scenario in which a small subset of the training data is fully annotated, partially-supervised segmentation refers to more challenging cases wherein partial annotations are available for all examples or for a subset of examples. Obviously, a model that requires only partial annotations will further reduce the workload of manual labeling. However, this problem is more challenging than semi-supervised learning.

Volume segmentation with sparsely annotated slices. For 3D medical image segmentation, uniformly sampled slices with annotations were used in [220]-[224] to train a 3D deep network model by assigning a zero weight to unannotated voxels in the loss function. Bai et al. [221] performed label propagation from annotated slices to unannotated slices based on non-rigid registration and introduced an exponentially weighted loss function for model training. Bitarafan et al. [223] considered a partially-supervised segmentation problem where only one 2D slice in each training volume was annotated. They addressed this problem with a self-training framework that alternately generates pseudo labels and updates the 3D segmentation model. Specifically, they utilized the registration of consecutive 2D slices to propagate labels to unlabeled voxels. To segment 3D medical volumes with sparsely annotated 2D slices, Zheng et al. [224] utilized uncertainty-guided self-training to gradually boost the segmentation accuracy. Before training segmentation models with sparsely annotated slices, Zheng et al. [225] first identified the most influential and diverse slices for manual annotation with a deep network. After manual annotation of the selected slices, they conducted segmentation with a self-training strategy. Wang et al. [226] considered a 3D training dataset with mixed types of annotations, i.e., image volumes with a few annotated consecutive slices, a few sparsely annotated slices, or full annotation. Under the self-training framework, they iteratively generated pseudo labels and updated the model with the augmented labeled data. To take advantage of the incompletely annotated data, they introduced a hybrid loss, including a boundary regression loss on labeled data and a voxel classification loss on both labeled and unlabeled data.

Segmentation with partially annotated regions. As sparse annotations, partially annotated regions or scribbles can provide location and label information at a few pixels or partial regions. They have been widely used in segmentation tasks [29], [227]-[230], especially in the context of interactive segmentation [231]-[236], where users give feedback to refine the segmentation iteratively. Scribbles have been recognized as a user-friendly way of interacting for both natural and medical image segmentation [6], [29], [227], [231]. While scribble-supervised segmentation was usually addressed by optimizing a graphical model [29] or a variational model [31], tackling this problem with deep networks has also been a hot topic. Since medical images usually suffer from low tissue contrast, fuzzy boundaries, and image artifacts such as noise and intensity bias, an interactive strategy is also valuable for medical image segmentation.
Zhang et al. [237] considered interactive medical image segmentation via point-based interaction, where the physician clicks on rough central points of the objects in each testing image. For MR image segmentation, Wang et al. [238] employed scribbles as user interaction to fine-tune coarse segmentations in the context of deep learning. Zhou et al. [239] introduced an interactive editing network trained using simulated user interactions to refine an existing segmentation. Liao et al. [240] proposed to solve the dynamic process of iteratively interactive segmentation of 3D medical images with multi-agent reinforcement learning, where they treated each voxel as an agent with shared behaviors. To reduce the annotation effort in the context of interactive segmentation, especially for instance segmentation, researchers have explored methods to suggest annotations [139], [241]-[243] or select informative samples [244]. A promising solution is active learning, the process of selecting the examples or regions that need human labels. In this way, we can obtain a model that achieves the desired accuracy faster and with lower annotation cost. The flowchart of active learning for interactive segmentation on a conceptual level is shown in Fig. 6 (b). A comparison with self-training, which involves no human interaction, is also shown. Yang et al. [241] combined a deep network model with active learning to identify the most representative and uncertain areas for annotation. For instance-level segmentation of pulmonary nodules, Wang et al. [139] utilized active learning to overcome the annotation bottleneck by querying the most confusing unannotated instances for manual annotation; the most confusing unannotated instances were identified by their high uncertainty. For breast cancer segmentation on immunohistochemistry images, Sourati et al. [243] introduced a new active learning method with Fisher information for deep networks to identify a small number of the most informative samples to be manually annotated. To achieve a rapid increase in segmentation performance, Shen et al. [245] designed three criteria, i.e., dissatisfaction, representativeness, and diverseness, in the framework of active learning to select an informative subset for labeling, which can significantly reduce the cost of annotation. Given an initial segmentation, Wang et al. [235] used uncertainty estimation to identify a subset of slices that require user interactions. Rather than performing interactive segmentation, Lin et al. [227] directly used the given sparse scribbles as the supervision to train a deep convolutional network for natural image segmentation. Tang et al. [228] investigated partially-supervised segmentation with scribbles as annotations and introduced several regularized losses, including a CRF [29] loss, a high-order normalized cut loss, and a kernel cut loss, in the context of deep convolutional networks. In [229], Tang et al. further introduced a normalized cut loss. Ji et al. [246] investigated the segmentation of brain tumor substructures with whole tumor/normal brain scribbles and image-level labels as supervision. To capture fine tumor boundaries, they augmented the segmentation network with a dense CRF [247] loss.
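To illustrate how scribble or sparse-slice supervision enters the training objective, the following minimal sketch computes a cross-entropy loss only over annotated pixels, so unannotated pixels receive zero weight; this is the basic ingredient shared by several of the methods above, while the regularized losses of [228], [229], [246] add extra terms on top of it and are not reproduced here. The ignore value and names are illustrative.

```python
import torch
import torch.nn.functional as F

def partial_cross_entropy(logits, scribble, ignore_index=255):
    """Cross-entropy over annotated pixels only.

    logits:   (B, C, H, W) network outputs.
    scribble: (B, H, W) label map in which annotated pixels carry a class index
              and unannotated pixels are marked with ignore_index.
    """
    loss = F.cross_entropy(logits, scribble, ignore_index=ignore_index, reduction="none")
    labeled = (scribble != ignore_index).float()
    return (loss * labeled).sum() / labeled.sum().clamp_min(1.0)

# Toy usage: 2 classes, a 4x4 image where only three pixels carry scribble labels.
logits = torch.randn(1, 2, 4, 4, requires_grad=True)
scribble = torch.full((1, 4, 4), 255, dtype=torch.long)
scribble[0, 0, 0] = 0; scribble[0, 2, 2] = 1; scribble[0, 3, 1] = 1
partial_cross_entropy(logits, scribble).backward()
```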
For 3D instance segmentation in medical images, Zhao et al. [248] considered a mix of 3D bounding boxes for all instances and voxel-wise annotations for a small fraction of the instances. They addressed this problem with a cascade of two stages: an instance detection stage using the bounding-box annotations and an instance segmentation stage using the full annotations of the small number of instances. For semantic segmentation of emphysema with both annotated and unannotated areas in the training data, Peng et al. [249], working in the self-training framework, utilized the similarity of deeply learned features between labeled and unlabeled areas to guide label propagation to the unannotated areas. The selected regions with confident pseudo-labels were then used to enrich the training data. For the segmentation of cancerous regions in gigapixel whole slide images (WSIs), Cheng et al. [230] considered a partially labeled scenario, where only part of the cancer regions in the WSIs were annotated by pathologists due to time constraints or misinterpretation. To tackle this problem, they integrated the teacher-student learning paradigm with self-similarity learning to enforce nearby patches in a WSI to be similar in feature space. A similar prediction-ensemble strategy was used to generate pseudo labels, which were then used to filter out noisy labels. Dong et al. [250] considered neuron segmentation from macaque brain images, where both central points and rough masks were used as supervision. Zheng et al. [251] proposed to use boundary scribbles, i.e., coarse lesion edges, as weak supervision for tumor segmentation. While boundary scribbles provide both the locations of lesions and more boundary information than bounding boxes, they still lack accurate boundary information. For cell segmentation with scribble annotations, Lee and Jeong [252] proposed to generate reliable labels by integrating pseudo-labeling and label filtering within the mean teacher framework [151].

Segmentation with point annotations. An extremely sparse form of annotation is point annotation [253], which labels only one point in each object, as exemplified in Fig. 10. Point annotations [253]-[257] have also been considered to reduce the cost of manual labeling and are especially useful for multi-class segmentation and instance-level segmentation. Point annotation is one of the fastest ways to label objects; as shown in [253], it is significantly cheaper than dense pixel-level annotation. Despite this cost-efficiency, point annotations are extremely sparse and contain only location information. Thus, most studies have utilized point annotations for object detection and counting tasks [258]-[260], such as cell detection [260], [261] and nuclei detection [262]. Yoo et al. [263] investigated nuclei segmentation with point annotations. Since point annotations contain no nucleus boundary information, they augmented the segmentation network with an auxiliary edge-detection network supervised by the Sobel-filtered prediction map of the segmentation network. To segment mitoses from breast histopathology images with centroid labels, Li et al. [264] expanded each single-pixel label into a label of concentric circles, where the inner circle was regarded as a mitotic region and the regions outside the outer ring were regarded as non-mitosis. They introduced a concentric loss so that the segmentation network is trained only on the estimated labels inside the inner circle and outside the outer circle.
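As a rough sketch of the Sobel-based auxiliary supervision described above, the snippet below turns a predicted foreground probability map into an edge map with fixed Sobel kernels; it is only an illustration of the idea, not the exact design of [263].

```python
# Rough sketch: derive an edge map from the predicted foreground probability with
# fixed Sobel kernels; this map can serve as the target for an auxiliary edge branch.
import torch
import torch.nn.functional as F

def sobel_edges(prob):
    """prob: (B, 1, H, W) predicted foreground probability -> (B, 1, H, W) gradient magnitude."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    ky = kx.transpose(2, 3)
    gx = F.conv2d(prob, kx, padding=1)
    gy = F.conv2d(prob, ky, padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
```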
For nuclei segmentation, Qu et al. [256], [265] addressed a more challenging case, where only sparse point annotations, i.e., center points for only a small portion of the nuclei in each image, were available. Their method consists of two stages: the first stage performs nuclei detection with a self-training strategy, and the second stage performs semi-supervised segmentation with pseudo-labels generated from a Voronoi diagram and k-means clustering. A dense CRF loss was utilized during training to refine the segmentation.

Multi-class segmentation from multiple few-class-labeled datasets. Segmentation of various anatomical structures from medical images, such as multi-organ segmentation, is a fundamental problem in medical image analysis and downstream clinical usage. However, beyond the cost of data collection, obtaining sufficient multi-class annotations on a large dataset is in itself labor-intensive and often infeasible. In contrast, various annotated datasets exist from different medical centers, built for their own clinical and research purposes, but with annotations missing for several classes or even with only single-class annotations available. Most publicly available medical image datasets are designed and annotated for a specific clinical or research purpose, such as Sliver07 [23] for the evaluation of liver segmentation [6]-[8], [266], LiTS [267] for liver-tumor segmentation, KiTS [268] for kidney-tumor segmentation [269], LUNA [270] for lung nodule analysis [271], [272], and BRATS [273] for brain tumor analysis [2], [3]. Thus, a significant challenge is how to learn a universal multi-class segmentation model from multiple partially annotated datasets with missing class annotations. Zhou et al. [274] first considered a partially-supervised multi-organ segmentation problem in which a small fully labeled dataset and several partially-labeled datasets are available. They developed a prior-aware neural network that explicitly incorporates anatomical priors on abdominal organ sizes as domain-specific knowledge to guide the training process. Dmitriev et al. [275] further removed the need for fully labeled data and investigated multi-class (e.g., multi-organ) segmentation from single-class (e.g., single-organ) labeled datasets [274]. They proposed to condition a single convolutional network for multi-class segmentation on non-overlapping single-class training datasets. Concretely, they inserted the conditioning information as an intermediate activation between the convolutional operation and the activation function. Huang et al. [276] tackled partially-supervised multi-organ segmentation in the co-training framework, collaboratively training multiple networks, each of which was taught by the other networks on unannotated organs. Yan et al. [277] developed a universal lesion detection algorithm to detect a comprehensive variety of lesions from multiple datasets with partial labels. Specifically, they introduced intra-patient lesion matching and cross-dataset lesion mining to address missing annotations, and utilized feature sharing, proposal fusion, and annotation mining to integrate the different datasets. Shi et al. [278] addressed partially-supervised multi-organ segmentation with two novel losses: a marginal loss that merges all unlabeled organ pixels into the background label, and an exclusion loss that constrains the organs to be mutually exclusive.
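The marginal-loss idea can be sketched as follows, assuming class 0 is the background and each training dataset annotates only a subset of the organ classes; the sketch shows only the probability-merging step and is not the exact formulation of [278].

```python
# Minimal sketch of a marginal loss: for a dataset that annotates only some organs,
# the probabilities of all un-annotated organ classes are merged into the background
# class before computing cross-entropy, so they are not falsely penalized.
import torch

def marginal_ce(logits, labels, labeled_classes):
    """logits: (B, C, H, W); labels: (B, H, W) long, with 0 for background and for
    organs not annotated in this dataset; labeled_classes: e.g. {2, 5} for this dataset."""
    p = torch.softmax(logits, dim=1)                        # (B, C, H, W)
    C = p.shape[1]
    unlabeled = [c for c in range(1, C) if c not in labeled_classes]
    p = p.clone()
    if unlabeled:
        p[:, 0] = p[:, 0] + p[:, unlabeled].sum(dim=1)      # merge into background
    p_true = p.gather(1, labels.unsqueeze(1)).squeeze(1)    # probability of the given label
    return -torch.log(p_true.clamp_min(1e-8)).mean()
```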
To segment multiple organs and tumors from multiple partially labeled datasets, Zhang et al. [279] proposed an encoder-decoder network with a single but dynamic head, in which the kernels are generated adaptively by a controller conditioned on both the input image and the assigned task. Dong et al. [250] considered a special case where full labels for all classes are not available on the whole training set, but labels of different classes are available on different subsets. They addressed this problem with a data augmentation strategy that exploits the assumption that patients share anatomical similarities.

Summary. Reducing the annotation cost essentially echoes real-world environments, where annotations are often incomplete or even sparse. This section covered four types of partial annotations: partially annotated slices of 3D images, partially annotated regions, point annotations, and multiple few-class-labeled datasets. As shown above, most methods addressing these scenarios are based on self-training and regularization techniques. When collecting annotated data with a human in the loop, suggestive annotation can significantly reduce the annotation effort, especially when large-scale data must be annotated or a large number of instances need annotations. A critical question is therefore which data samples or image regions should be selected for annotation so that high-quality performance is achieved faster. This active learning paradigm, as exemplified in Fig. 6(b), has been an active research field where more effort is still needed.

Segmentation with inaccurate or imprecise annotations refers to the scenario where the ground-truth labels are corrupted with (random, class-conditional, or instance-conditional [280], [281]) noise, and thus also relates to noisy-label learning [282], [283]. Imprecise boundaries and mislabeling are also forms of inaccurate annotation. Moreover, bounding-box annotations can be treated as annotations with inaccurate boundaries and mislabeled regions. Note that, as shown in [284], boundary-localized errors are more challenging than random label errors. Learning from noisy labels has recently drawn much attention in many applications, including medical image analysis [25], [285], [286]. It is expensive and sometimes infeasible to obtain accurate labels, especially for medical imaging data, where labeling requires domain expertise and annotating large imaging data is inherently a daunting task. In contrast, noisy labels such as those generated by non-experts [287] or by computers [236] are easy to obtain. Moreover, it is impractical to manually correct the label errors, which is not only time-consuming but also requires a stronger committee of experts. Karimi et al. [25] provided a review of the state-of-the-art deep learning methods (published in 2019 or earlier) for handling label noise. However, most approaches for dealing with noisy (low-quality) annotations are developed for classification [288]-[292] and detection [293]. Herein, we focus on medical image segmentation with noisy labels. While one class of methods strives to model and learn the label noise [294], [295], other methods choose to select confident examples, reducing the side effects without explicitly modeling the label noise [289], [296], [297], such as the co-teaching paradigm [296] and the reweighting strategy [294].
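A compact sketch of the small-loss selection step that underlies co-teaching-style training is given below; the fixed selection ratio and pixel-level selection are simplifying assumptions, and the sketch is not tied to any specific method above (actual co-teaching schedules the ratio over training).

```python
# Compact sketch of co-teaching-style small-loss selection: each network is updated
# only on the pixels its peer assigns a low loss to, on the assumption that low-loss
# pixels are more likely to carry clean labels.
import torch
import torch.nn.functional as F

def small_loss_mask(logits, labels, keep_ratio=0.7):
    """Return a boolean mask keeping the fraction of pixels with the smallest loss."""
    loss = F.cross_entropy(logits, labels, reduction='none')        # (B, H, W)
    k = max(1, int(keep_ratio * loss.numel()))
    thresh = loss.flatten().kthvalue(k).values
    return loss <= thresh

def co_teaching_step(net_a, net_b, images, labels, keep_ratio=0.7):
    logits_a, logits_b = net_a(images), net_b(images)
    mask_a = small_loss_mask(logits_a.detach(), labels, keep_ratio)  # a selects pixels for b
    mask_b = small_loss_mask(logits_b.detach(), labels, keep_ratio)  # b selects pixels for a
    loss_a = F.cross_entropy(logits_a, labels, reduction='none')[mask_b].mean()
    loss_b = F.cross_entropy(logits_b, labels, reduction='none')[mask_a].mean()
    return loss_a, loss_b
```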
To eliminate the disturbance caused by inaccurate labels, Zhu et al. [297] developed a label quality evaluation strategy that uses a deep neural network to automatically assess label quality, and trained the segmentation model only on examples with clean annotations. For chest X-ray segmentation with imperfect labels, Xue et al. [298] adopted a cascade strategy consisting of two stages: a sample selection stage, which selects cleanly annotated examples as in the co-teaching paradigm, and a label correction and model learning stage, which learns the segmentation model from both the corrected and the original labels. To segment skin lesions from noisy annotations, Mirikharaji et al. [299] adopted a spatially adaptive reweighting approach to emphasize learning from clean labels and reduce the side effects of noisy pixel-level annotations; a meta-learning procedure was used to assign the pixel-wise importance weights. Shu et al. [300] proposed to enhance the supervision from noisy labels by capturing local visual saliency features, which are less affected by the supervision signals of inaccurate labels. For noisy-labeled medical image segmentation, Zhang et al. [301] integrated confident learning [302], which identifies label errors by estimating the joint distribution between the noisy annotations and the true (latent) annotations, into the teacher-student framework to identify corrupted labels at the pixel level. Soft label correction based on spatial label smoothing regularization was also adopted to generate high-quality labels. Rather than using fully manual annotations for vessel segmentation, Zhang et al. [236] proposed to learn the segmentation from noisy pseudo labels obtained from automatic vessel enhancement, which usually carries a systematic bias. To tackle this problem, they adopted improved self-paced learning with online guidance from additional sparse manual annotations. The self-paced learning strategy lets the model training focus on easy pixels, which have a higher chance of carrying correct labels. To minimize manual annotation, they introduced a model-vesselness uncertainty estimate for suggestive annotation. To weaken the influence of noisy pseudo labels in semi-supervised segmentation, Min et al. [303] introduced a two-stream mutual attention network with hierarchical distillation, where multiple attention layers were used to discover incorrect labels and indicate potentially incorrect gradients.

Segmentation with bounding-box annotations. An appealing form of weak supervision is bounding-box annotation [304]-[306], which is easy to obtain and yields reliable information about the background and rich information about the foreground. Moreover, a bounding box, as shown in Fig. 10, can be represented simply by two corners, which allows lightweight storage. Given the uncertainty of figure-ground separation within each bounding box [307], one of the core tasks in bounding-box supervised segmentation is to generate accurate pseudo-labels. A popular pseudo-label generation approach is GrabCut [231], which iteratively estimates the foreground and background distributions and conducts segmentation with CRF models such as graph cut [29]. The iterative strategy of alternately updating the segmentation model parameters and the pseudo labels has been widely used to handle bounding-box annotations [304], [308]. In the context of natural image segmentation with deep networks, the BoxSup model [304] iterated between automatically generating region proposals and training convolutional networks.
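As a concrete example of this pseudo-label generation step, the snippet below runs OpenCV's GrabCut initialized from a bounding box; the number of iterations and the decision to keep "probable foreground" pixels are illustrative choices rather than part of any particular method above.

```python
# Sketch: turn a bounding-box annotation into an initial pseudo-mask with GrabCut
# (OpenCV), which can then be refined iteratively together with the network.
import cv2
import numpy as np

def box_to_pseudo_mask(image_bgr, box, iters=5):
    """image_bgr: HxWx3 uint8 image; box: (x, y, w, h) bounding-box annotation."""
    mask = np.zeros(image_bgr.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, box, bgd_model, fgd_model, iters, cv2.GC_INIT_WITH_RECT)
    # keep definite and probable foreground as the pseudo foreground label
    return np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
```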
For fetal brain segmentation from MR images with bounding-box annotations, Rajchl et al. [308] introduced the DeepCut model, an extension of GrabCut that estimates the distributions by training a deep network classifier. Specifically, they iteratively optimized a densely-connected CRF model and a deep convolutional network. Kervadec et al. [309] leveraged the classical bounding-box tightness prior [310] to regularize the output of a deep segmentation network. Concretely, the tightness prior was reformulated as a set of foreground constraints and a global background emptiness constraint, which enforces that the regions outside the bounding box contain no foreground. They solved the resulting energy function with inequality constraints via a sequence of unconstrained losses based on an extension of the log-barrier method. Wang et al. [311] investigated the segmentation of male pelvic organs in CT from 3D bounding-box annotations, addressed with iterative learning of the deep network model and the pseudo labels. A label denoising module, which evaluates the consistency of predictions across consecutive iterations, was designed to identify voxels with unreliable labels. As noted earlier, the boundary scribbles used by Zheng et al. [251] locate the lesions and provide more accurate boundary information than bounding boxes, but still lack fully accurate boundaries.

Summary. Lowering the requirement for precise annotations can also significantly reduce annotation effort. In this section, we have reviewed two types of inaccurate annotations, i.e., noisy labels and bounding-box labels. While the partial annotations reviewed in Section IV are reliable annotations for the positive classes (except for the background class), the inaccurate annotations in this section refer to unreliable labels. For example, noisy labels can be regarded as labels corrupted from the ground truth, and bounding-box annotations, as shown in Fig. 10(d), contain both foreground and background pixels. It is known that deep network models are susceptible to label corruption [25], [312]. Thus, addressing label noise has gained increasing attention in recent years and has become a popular topic at top conference venues.

In this section, we discuss some ongoing and future directions for medical image segmentation with limited supervision.

Task-agnostic versus task-specific use of unlabeled data. Semi-supervised segmentation methods differ in part in how they leverage unlabeled data. There are two typical ways to make use of unlabeled data: 1) the task-agnostic approach, which leverages unlabeled data through unsupervised or self-supervised pretraining, such as the self-supervised learning strategies in Sec. III-H; and 2) the task-specific approach, which jointly leverages the labeled and unlabeled data by enforcing a form of regularization, such as the consistency regularization strategies in Sec. III-G and the self-training strategies in Sec. III-E. While the task-agnostic approaches use the unlabeled data for unsupervised representation learning followed by supervised fine-tuning, the task-specific approaches use the unlabeled data to directly augment the labeled data through pseudo-labeling, or to regularize the supervised model learning through consistency regularization. Both paradigms have shown promising results and received substantial attention in the fields of medical imaging and computer vision.
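A bare-bones illustration of the task-specific, consistency-based route is given below: predictions for the same unlabeled image under two random perturbations are encouraged to agree. Real methods use stronger, structured augmentations and often an exponential-moving-average (mean teacher) model as the second branch; the Gaussian noise and mean-squared-error penalty here are simplifying assumptions.

```python
# Bare-bones consistency regularization on unlabeled images: predictions under two
# random perturbations (here, Gaussian input noise) are encouraged to agree. The
# second branch is treated as a fixed target (no gradient), as in teacher-style setups.
import torch
import torch.nn.functional as F

def consistency_loss(model, unlabeled, noise_std=0.1):
    p1 = torch.softmax(model(unlabeled + noise_std * torch.randn_like(unlabeled)), dim=1)
    with torch.no_grad():
        p2 = torch.softmax(model(unlabeled + noise_std * torch.randn_like(unlabeled)), dim=1)
    return F.mse_loss(p1, p2)

# total objective on a mixed batch (lambda_u is a trade-off hyper-parameter):
# loss = supervised_loss + lambda_u * consistency_loss(model, unlabeled_batch)
```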
Recently, encouraging progress in self-supervised learning has come from contrastive learning [313], [314], which formulates the pretraining task as discriminating between similar and dissimilar instances. The Momentum Contrast model [314] with contrastive unsupervised pretraining outperformed its supervised-pretrained counterpart on several natural image segmentation tasks. The contrastive learning strategy has also been used in medical image segmentation with limited annotations and has shown promising results [315]. However, the gap between the objectives of self-supervised pretraining and the downstream segmentation task is non-negligible. More work in this direction is expected to push the boundaries of medical image segmentation. Another promising direction is integrating the task-agnostic and task-specific approaches in an elegant way. A possible solution is introduced in [316], where the unsupervised pretrained model is first fine-tuned and then distilled into a smaller model using the unlabeled data. More constructive theoretical analysis is needed.

Although diverse strategies, such as self-supervised learning and curriculum learning, have been introduced and have achieved promising results, more studies are needed to identify their working mechanisms. For example, it is still unclear how to automatically design an adaptive curriculum for a given segmentation task instead of using a predefined curriculum [131], and when curriculum-like strategies, especially data curricula, will benefit deep model training. Promising directions for automatic curriculum design may be the self-paced paradigm [141] and the teacher-student paradigm [317]. Whereas curriculum learning has achieved success on classification and detection tasks [78], [135]-[137], it has seen relatively limited application in semi-supervised medical image segmentation [138], [139]. In addition to more extensive experimental results on diverse segmentation tasks, theoretical guarantees of effectiveness [133], [318] are also needed, as a foundation for application to specific tasks. A solid foundation is likewise needed for segmentation methods based on self-supervised learning, especially those using a consistency loss or a contrastive loss [319].

Lightweight and efficient segmentation models are favorable. Deep and wide models are slow to train and, more importantly, may easily overfit on datasets with limited annotation. Lightweight models with few parameters and low computational resource requirements are favorable for model training and for deployment on computationally limited platforms, which may significantly improve clinical efficiency [2], [320], [321]. Two strategies are usually employed: model compression [322] and efficient architecture design [2], [320], [323]. Moreover, model cascades, model ensembles, and schemes that maintain and combine multiple networks, such as the self-training strategy, are commonly used, which inevitably increase system complexity and degrade training efficiency [2]. Thus, maintaining a simpler model system is a challenging future direction.

Hyper-parameter searching is challenging. There are usually more hyper-parameters, such as trade-off parameters, in segmentation methods with limited supervision. However, there are not enough labeled data for reliable hyper-parameter search, resulting in high variance in performance. A possible solution is meta-learning [324], the goal of which is 'learning to learn better'.
In other words, meta-learning seeks to improve the learning algorithm itself with either task-agnostic or task-specific prior knowledge, and thus can improve both data and computational efficiency. Accordingly, there has been rapidly growing interest in meta-learning and its various applications, including medical image segmentation with limited supervision [325]. Utilizing a meta-loss on a small set of labeled data has shown promising results in few-shot learning [324], [326], [327].

Complex label noises are challenging. Label noise in real-world applications is usually a mix of several types, such as class-dependent noise, instance-dependent noise, and adversarial noise, which tends to confuse models on ambiguous regions or instances. Thus, training models with the ability to tackle such complex noise is valuable for real-world clinical applications. Karimi et al. [25] reviewed deep learning methods for dealing with label noise in medical image analysis, where most of the representative studies concern medical image classification. However, the challenge of dealing with label noise is particularly significant in segmentation tasks, since pixel-wise labeling of large datasets is resource-intensive and requires experts' domain knowledge. Limited imaging resolution also makes it difficult for annotators to identify small objects and fuzzy boundaries. More label noise exists in the large number of annotations produced by non-experts or by automatic labeling software with little human refinement. To analyze and address the various kinds of label errors, an important step is to construct large-scale datasets with real noise, which is in itself a challenging task. Currently, most studies still use public datasets with simulated label perturbations [284], [298], [301] or private datasets [25], [300]. Building public benchmarks with real noise is crucial for further breakthroughs, especially for clinical usage.

Learning to represent and integrate domain knowledge is still challenging. Although domain knowledge has dramatically boosted medical image segmentation methods, especially in settings with limited supervision, the selection and representation of prior knowledge remain challenging since they are usually highly dependent on the specific task. Xie et al. [21] summarized recent progress on integrating domain knowledge into deep learning models for medical image analysis. Moreover, translating the original representation of prior knowledge in clinical settings into representations that are ready for integration with deep networks is challenging.

In this review, we covered effective solutions for the segmentation of biomedical images with limited supervision, namely semi-supervised segmentation, partially-supervised segmentation, and inaccurately-supervised segmentation, and reviewed a diverse set of methods for these problems. For semi-supervised segmentation, we provided a taxonomy of existing methods according to their ability to leverage labeled data, unlabeled data, and prior knowledge. For partially-supervised segmentation, we considered segmentation with partially annotated regions, point annotations, or partially annotated slices, interactive segmentation, and multi-class segmentation from multiple partial-class-labeled datasets, and surveyed the current technical status of recent solutions. For inaccurately-supervised segmentation, we summarized methods addressing noisy labels and bounding-box annotations.
We also have discussed possible future directions for further studies. State-of-the-art methods for brain tissue segmentation: A review Hdc-net: Hierarchical decoupled convolution network for brain tumor segmentation Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks Deep learning for cardiac image segmentation: A review Hdenseunet: hybrid densely connected unet for liver and tumor segmentation from ct volumes Liver segmentation with constrained convex variational model 3d liver segmentation using multiple region appearances and graph cuts Automatic 3d liver segmentation based on deep learning and globally optimized surface evolution Deep learning in digital pathology image analysis: a survey Mitochondria segmentation from em images via hierarchical structured contextual forest Unsupervised mitochondria segmentation in em images via domain adaptive multi-task learning Computational anatomy for multi-organ analysis in medical imaging: A review Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19 Blood vessel segmentation algorithms-review of methods, datasets and evaluation metrics A survey on deep learning in medical image analysis Deep semantic segmentation of natural and medical images: A review A review of deep learning in medical imaging: Image traits, technology trends, case studies with progress highlights, and future promises Generative adversarial network in medical imaging: A review Going deep in medical image analysis: Concepts, methods, challenges, and future directions Deep neural network models for computational histopathology: A survey A survey on domain knowledge powered deep learning for medical image analysis A review on brain tumor segmentation of mri images Comparison and evaluation of methods for liver segmentation from ct datasets Image segmentation using deep learning: A survey Deep learning with noisy labels: Exploring techniques and remedies in medical image analysis A survey on deep learning of small sample in biomedical image analysis Embracing imperfect datasets: A review of deep learning solutions for medical image segmentation Semantic texton forests for image categorization and segmentation Interactive graph cuts for optimal boundary & region segmentation of objects in nd images Active shape models-their training and application Algorithms for finding global minimizers of image segmentation and denoising models Fully convolutional networks for semantic segmentation Learning deconvolution network for semantic segmentation U-net: Convolutional networks for biomedical image segmentation Pyramid scene parsing network Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs Attention u-net: Learning where to look for the pancreas Densely connected convolutional networks A brief survey on semantic segmentation with deep learning Deep semantic segmentation of natural and medical images: A review Unsupervised domain adaptation in brain lesion segmentation with adversarial networks Unsupervised domain adaptation in semantic segmentation: a review A survey of zero-shot learning: Settings, methods, and applications Semisupervised learning for network-based cardiac mr image segmentation Uncertainty guided semi-supervised segmentation of retinal layers in oct images Uncertainty-aware self-ensembling model for semi-supervised 3d left atrium segmentation Self-loop uncertainty: A novel pseudo-label for semi-supervised 
medical image segmentation Semi-supervised medical image segmentation via learning consistency under transformations Very deep convolutional networks for large-scale image recognition Deep residual learning for image recognition Imagenet: A large-scale hierarchical image database The pascal visual object classes (voc) challenge How transferable are features in deep neural networks? Quo vadis, action recognition? a new model and the kinetics dataset 3-d convolutional encoder-decoder network for lowdose ct via transfer learning from a 2-d trained network 3d anisotropic hybrid network: Transferring convolutional features from 2d images to 3d anisotropic volumes Convolutional neural networks for medical image analysis: Full training or fine tuning? Transfusion: Understanding transfer learning for medical imaging Rethinking pre-training and self-training Models genesis: Generic autodidactic models for 3d medical image analysis Universal model for 3d medical image analysis Gradient-based learning applied to document recognition Random erasing data augmentation Improved regularization of convolutional neural networks with cutout Generalizing to unseen domains via adversarial data augmentation Towards deep learning models resistant to adversarial attacks mixup: Beyond empirical risk minimization Cutmix: Regularization strategy to train strong classifiers with localizable features Mixup as locally linear out-ofmanifold regularization Improving robustness of deep learning based knee mri segmentation: Mixup and adversarial domain adaptation Dataset augmentation in feature space Generative adversarial nets A gaussian process model based generative framework for data augmentation of multi-modal 3d image volumes Learning to segment brain anatomy from 2d ultrasound with less data Em-net: Centerlineaware mitochondria segmentation in em images via hierarchical viewensemble convolutional network Data augmentation using learned transformations for one-shot medical image segmentation Realistic adversarial data augmentation for mr image segmentation Automatic cnn-based detection of cardiac mr motion artefacts using k-space data augmentation and curriculum learning Deep learning for cardiovascular risk stratification Manifold mixup: Better representations by interpolating hidden states Explaining and harnessing adversarial examples Learning to compose domain-specific transformations for data augmentation Adversarial data augmentation via deformation statistics Anatomical data augmentation via fluid-based image registration End-to-end adversarial retinal image synthesis Semi-supervised and task-driven data augmentation Towards cross-modal organ translation and segmentation: A cycle-and shape-consistent generative adversarial network Eagans: edge-aware generative adversarial networks for cross-modality mr image synthesis Conditional generative adversarial nets Pulmonary nodule segmentation with ct sample synthesis using adversarial networks Lt-net: Label transfer by learning reversible voxelwise correspondence for one-shot medical image segmentation Deepatlas: Joint semi-supervised learning of image registration and segmentation Bb-unet: U-net with bounding box prior Anatomically constrained neural networks (acnns): application to cardiac image enhancement and segmentation Anatomical priors in convolutional networks for unsupervised biomedical segmentation Anatomical priors for image segmentation via post-processing with denoising autoencoders V-net: Fully convolutional neural networks for volumetric medical 
image segmentation Multi-atlas segmentation of biomedical images: a survey A review of atlas-based segmentation for magnetic resonance brain images Semi-supervised deep learning of brain tissue segmentation A deep learning framework for unsupervised affine and deformable image registration Deep learning based medical image segmentation with limited labels Dpa-densebiasnet: Semisupervised 3d fine renal artery segmentation with dense biased network and deep priori anatomy Deep atlas network for efficient 3d left ventricle segmentation on echocardiography Semi-supervised segmentation of liver using adversarial learning with deep atlas prior Atlasnet: multi-atlas non-linear deep networks for medical image segmentation Postdae: Anatomically plausible segmentation via post-processing with denoising autoencoders Star shape prior in fully convolutional networks for skin lesion segmentation Automatic 3d bi-ventricular segmentation of cardiac images by a shape-refined multi-task deep learning approach Tetris: template transformer networks for image segmentation with shape priors Contour transformer network for one-shot segmentation of anatomical structures Shape constrained cnn for cardiac mr segmentation with simultaneous prediction of shape and pose parameters Cardiac segmentation from lge mri using deep neural network incorporating shape and spatial priors Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: Data from the osteoarthritis initiative Cardiac segmentation with strong anatomical guarantees Learning and incorporating shape models for semantic segmentation Convolutional neural network with shape prior applied to cardiac mri segmentation Deep convolutional neural networks with spatial regularization, volume and star-shape priori for image segmentation Star-convex polyhedra for 3d object detection and segmentation in microscopy Combining convolutional neural networks and star convex cuts for fast whole spine vertebra segmentation in mri Cell detection with star-convex polygons Convex shape prior for deep neural convolution network based eye fundus images segmentation Topology aware fully convolutional networks for histology gland segmentation Medical knowledge constrained semantic breast ultrasound image segmentation Explicit topological priors for deep-learning based image segmentation using persistent homology A topological loss function for deep-learning based image segmentation using persistent homology A persistent homology-based topological loss function for multi-class cnn segmentation of cardiac mri Constrained-cnn losses for weakly supervised segmentation Constrained domain adaptation for segmentation Learning active contour models for medical image segmentation Curriculum learning Curriculum learning of multiple tasks On the power of curriculum learning in training deep networks How do humans teach: On curriculum learning and teaching dimension Attention-guided curriculum learning for weakly supervised classification and localization of thoracic diseases on chest radiographs Egdcl: An adaptive curriculum learning framework for unbiased glaucoma diagnosis Medicalbased deep curriculum learning for improved fracture classification Cased: curriculum adaptive sampling for extreme data imbalance Deep active self-paced learning for accurate pulmonary nodule segmentation Curriculum learning for annotation-efficient medical image analysis: scheduling data with prior knowledge and uncertainty Selfpaced curriculum 
learning Self-paced learning for latent variable models Instance-aware semantic segmentation via multi-task network cascades ω-net (omega-net): fully automatic, multi-view cardiac mr detection, orientation, and segmentation with deep neural networks Curriculum domain adaptation for semantic segmentation of urban scenes Curriculum semisupervised segmentation Pseudo-label: The simple and efficient semi-supervised learning method for deep neural networks Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study Signet ring cell detection with a semi-supervised learning framework Temporal ensembling for semi-supervised learning Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep 20 learning results Inf-net: Automatic covid-19 lung infection segmentation from ct images Dropout as a bayesian approximation: Representing model uncertainty in deep learning Aleatory or epistemic? does it matter Weight uncertainty in neural networks Can you trust your model's uncertainty? evaluating predictive uncertainty under dataset shift What uncertainties do we need in bayesian deep learning for computer vision Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation Mild-net: minimal information loss dilated network for gland instance segmentation in colon histology images Assessing reliability and challenges of uncertainty estimations for medical image segmentation Uncertainty aware temporal-ensembling model for semi-supervised abus mass segmentation Bayesian uncertainty matching for unsupervised domain adaptation Confidence calibration and predictive uncertainty estimation for deep medical image segmentation Aleatoric uncertainty estimation with test-time augmentation for medical image segmentation with convolutional neural networks Automated muscle segmentation from clinical ct using bayesian unet for personalized musculoskeletal modeling Towards bayesian deep learning: A framework and some existing methods Uncertainty estimates as data selection criteria to boost omnisupervised learning Bayesian uncertainty estimation for batch normalized deep networks Simple and scalable predictive uncertainty estimation using deep ensembles Quantitative comparison of monte-carlo dropout uncertainty measures for multi-class segmentation Test-time data augmentation for estimation of heteroscedastic aleatoric uncertainty in deep neural networks Combining labeled and unlabeled data with co-training A new analysis of co-training Deep cotraining for semi-supervised image segmentation Deep cotraining for semi-supervised image recognition Semi-supervised multi-organ segmentation via multi-planar cotraining Uncertainty-aware multi-view co-training for semi-supervised medical image segmentation and domain adaptation Regularization with stochastic transformations and perturbations for deep semi-supervised learning Learning with pseudoensembles Semi-supervised brain lesion segmentation with an adapted mean teacher model Transformation-consistent self-ensembling model for semisupervised medical image segmentation Deep semi-supervised knowledge distillation for overlapping cervical cell instance segmentation Extreme consistency: Overcoming annotation scarcity and domain shifts Semi-supervised medical image classification with relation-driven self-ensembling model A weakly supervised consistency-based learning method for covid-19 segmentation in ct images Mutual information deep regularization for 
semi-supervised segmentation Dmnet: Difference minimization network for semi-supervised segmentation in medical images Unsupervised visual representation learning by context prediction Unsupervised learning of visual representations by solving jigsaw puzzles Learning representations for automatic colorization Unsupervised representation learning by predicting image rotations Context encoders: Feature learning by inpainting Surrogate supervision for medical image analysis: Effective deep learning from limited quantities of labeled data 3d self-supervised methods for medical imaging Multimodal selfsupervised learning for medical image analysis Semantic segmentation using adversarial networks Automatic liver segmentation using an adversarial image-to-image network Spine-gan: Semantic segmentation of multiple spinal structures Adversarial learning for semi-supervised semantic segmentation Semi and weakly supervised semantic segmentation using generative adversarial network Semi-supervised segmentation of lesion from breast ultrasound images with attentional generative adversarial network Good semi-supervised learning that requires a bad gan Shape-aware semi-supervised 3d semantic segmentation for medical images Unsupervised cross-modality domain adaptation of convnets for biomedical image segmentations with adversarial loss Pnp-adanet: Plug-and-play adversarial domain adaptation network at unpaired cross-modality cardiac segmentation Patch-based output space adversarial learning for joint optic disc and cup segmentation Unsupervised bidirectional cross-modality adaptation via deeply synergistic image and feature alignment for medical image segmentation Asdnet: Attention based semi-supervised deep networks for medical image segmentation Deep adversarial networks for biomedical image segmentation utilizing unannotated images Translating and segmenting multimodal medical volumes with cycle-and shape-consistency generative adversarial network One-shot learning for semantic segmentation One-shot learning of object categories Generalizing from a few examples: A survey on few-shot learning One-shot video object segmentation squeeze & excite' guided few-shot segmentation of volumetric images Concurrent spatial and channel 'squeeze & excitation' in fully convolutional networks Squeeze-and-excitation networks Selfsupervision with superpixels: Training few-shot medical image segmentation without annotation A theory of learning from different domains 3d u-net: learning dense volumetric segmentation from sparse annotation Recurrent neural networks for aortic image sequence segmentation with sparse annotations A sparse annotation strategy based on attention-guided active learning for 3d medical image segmentation 3d image segmentation with sparse annotation by self-training and internal registration Cartilage segmentation in highresolution 3d micro-ct images via uncertainty-guided self-training with very sparse annotation An annotation sparsification strategy for 3d medical image segmentation via representative selection and self-training Ct male pelvic organ segmentation via hybrid loss network with incomplete annotation Scribblesup: Scribblesupervised convolutional networks for semantic segmentation On regularized losses for weakly-supervised cnn segmentation Normalized cut loss for weakly-supervised cnn segmentation Self-similarity student for partial label histopathology image segmentation grabcut" interactive foreground extraction using iterated graph cuts Content-aware multi-level guidance for 
interactive instance segmentation Deep learning assisted image interactive framework for brain image segmentation Deepigeos: a deep interactive geodesic framework for medical image segmentation Uncertainty-guided efficient interactive refinement of fetal brain segmentation from stacks of mri slices Weakly supervised vessel segmentation in x-ray angiograms by self-paced learning from noisy labels with suggestive annotation Interactive medical image segmentation via a point-based interaction Interactive medical image segmentation using deep learning with image-specific fine tuning Interactive deep editing framework for medical image segmentation Iteratively-refined interactive 3d medical image segmentation with multi-agent reinforcement learning Suggestive annotation: A deep active learning framework for biomedical image segmentation Deep active learning for joint classification & segmentation with weak annotator Intelligent labeling based on fisher information for medical image segmentation using deep learning Efficient active learning for image classification and segmentation using a sample selection and conditional generative adversarial network Deep active learning for breast cancer segmentation on immunohistochemistry images Scribble-based hierarchical weakly supervised learning for brain tumor segmentation Efficient inference in fully connected crfs with gaussian edge potentials Deep learning based instance segmentation in 3d biomedical images using weak annotation Semi-supervised learning for semantic segmentation of emphysema with partial annotations Towards neuron segmentation from macaque brain images: A weakly supervised approach Weakly supervised deep learning for breast cancer segmentation with coarse annotations Scribble2label: Scribble-supervised cell segmentation via self-generating pseudo-labels with consistency What's the point: Semantic segmentation with point supervision Learning to segment medical images with scribblesupervision alone Iterative multi-path tracking for video and volume segmentation with sparse point supervision Weakly supervised deep nuclei segmentation using partial points annotation in histopathology images Nuclei segmentation using mixed points and masks selected from uncertainty Where are the blobs: Counting by localization with point supervision Point in, box out: Beyond counting persons in crowds Renal cell carcinoma detection and subtyping with minimal point-based annotation in whole-slide images You should use regression to detect cells Sfcn-opi: Detection and fine-grained classification of nuclei using sibling fcn with objectness prior interaction Pseudoedgenet: Nuclei segmentation only with point annotations Weakly supervised mitosis detection in breast histopathology images using concentric loss Weakly supervised deep nuclei segmentation using points annotation in histopathology images Automatic 3d liver location and segmentation via convolutional neural network and graph cut The liver tumor segmentation benchmark (lits) The kits19 challenge data: 300 kidney tumor cases with clinical context, ct semantic segmentations, and surgical outcomes The state of the art in kidney and kidney tumor segmentation in contrast-enhanced ct imaging: Results of the kits19 challenge Validation, comparison, and combination of algorithms for automatic detection of pulmonary nodules in computed tomography images: the luna16 challenge Semi-supervised adversarial model for benign-malignant lung nodule classification on chest ct A two-stage convolutional neural 
networks for lung nodule detection The multimodal brain tumor image segmentation benchmark (brats) Prior-aware neural network for partially-supervised multi-organ segmentation Learning multi-class segmentations from single-class datasets Multi-organ segmentation via co-training weight-averaged models from few-organ datasets Learning from multiple datasets with heterogeneous and partial labels for universal lesion detection in ct Marginal loss and exclusion loss for partially supervised multi-organ segmentation Dodnet: Learning to segment multi-organ and tumors from multiple partially labeled datasets Learning from binary labels with instance-dependent noise Learning with bounded instance and label-dependent label noise Learning with noisy labels Learning from noisy examples Imperfect segmentation labels: How much do they matter?" in Intravascular Imaging and Computer Assisted Stenting and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis Pancreatic cancer detection in whole slide images using noisy label annotations Deep label distribution learning with label ambiguity Robust medical image segmentation from non-expert annotations with tri-network Training a neural network based on unreliable human annotation of medical images Robust learning at noisy labeled medical images: Applied to skin lesion classification Masking: A new perspective of noisy supervision Deep learning from small amount of medical data with noisy labels: A metalearning approach Breast tumor classification through learning from noisy labeled ultrasound images Learning to detect brain lesions from noisy annotations Learning to reweight examples for robust deep learning Modelling class noise with symmetric and asymmetric distributions Co-teaching: Robust training of deep neural networks with extremely noisy labels Pick-and-learn: Automatic quality evaluation for noisy-labeled image segmentation Cascaded robust learning at imperfect labels for chest x-ray segmentation Learning to segment skin lesions from noisy annotations Lvc-net: Medical image segmentation with noisy label based on local visual cues Characterizing label errors: Confident learning for noisylabeled image segmentation Confident learning: Estimating uncertainty in dataset labels A two-stream mutual attention network for semi-supervised biomedical segmentation with noisy labels Boxsup: Exploiting bounding boxes to supervise convolutional networks for semantic segmentation Weaklyand semi-supervised learning of a deep convolutional network for semantic image segmentation Simple does it: Weakly supervised instance and semantic segmentation Weakly supervised instance segmentation using the bounding box tightness prior Deepcut: Object segmentation from bounding box annotations using convolutional neural networks Bounding boxes for weakly supervised segmentation: global constraints get close to full supervision Image segmentation with a bounding box prior Iterative label denoising network: Segmenting male pelvic organs in ct from 3d bounding box annotations Adversarial attacks and defences: A survey A simple framework for contrastive learning of visual representations Momentum contrast for unsupervised visual representation learning Contrastive learning of global and local features for medical image segmentation with limited annotations Big self-supervised models are strong semi-supervised learners Teacher-student curriculum learning Theory of curriculum learning, with convex loss functions A theoretical analysis of contrastive unsupervised 
representation learning Mo-bilenetv2: Inverted residuals and linear bottlenecks Lightweight deep learning models for detecting covid-19 from chest x-ray images Model compression and acceleration for deep neural networks: The principles, progress, and challenges Shufflenet: An extremely efficient convolutional neural network for mobile devices Optimization as a model for few-shot learning Domain generalizer: A fewshot meta learning framework for domain generalization in medical imaging Differentiable meta-learning model for few-shot semantic segmentation Meta-learning for semisupervised few-shot classification