Image segmentation is a fundamental problem in computer vision and has been studied for decades. It is also an essential preliminary step for quantitative biomedical image analysis and computer-aided diagnosis and related studies. Recently, deep learning (DL) based methods have achieved remarkable success on various image analysis tasks in terms of both accuracy and generality. However, known semantic segmentation algorithms cannot be applied directly to biomedical images, owing to different imaging techniques and special application scenarios (e.g., volumetric images, multi-modal data, small amounts of annotated data, and domain knowledge from experts). In this dissertation, I develop new deep learning methods that reduce annotation effort, improve model efficacy, and generalize well across biomedical image segmentation tasks.

First, I introduce advanced model architectures and training algorithms that exploit the abundant 3D information in volumetric images (e.g., MR and CT images) for delineating detailed structures. The heterogeneous feature aggregation network uses anisotropic 3D convolutional kernels to explicitly extract and fuse contextual information from orthogonal geometric views (a code sketch of this idea follows below). I also devise a new ensemble learning framework that unifies the merits of 2D and 3D DL models and significantly boosts segmentation performance.

Second, I introduce the representative annotation method, which selects only the most informative areas/samples for annotation, thus saving manual effort. The method decouples the selection process from the segmentation process and makes a one-shot suggestion. It achieves performance comparable to both full annotation and active learning based methods.

Third, observing that sparse annotation leads to severe performance degradation, I introduce two semi-supervised methods that leverage unlabeled images and use automatically generated labels (i.e., pseudo labels) during model training. Specifically, I propose combining representative annotation with ensemble learning to bridge the performance gap with respect to fully annotated methods. I also propose a method that estimates the uncertainty of pseudo labels and uses these estimates to guide iterative self-training (see the second sketch below).

Fourth, I present a new self-supervised learning framework that extracts generic knowledge directly from unlabeled data, and I demonstrate its robustness and efficiency on diverse downstream segmentation tasks.
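As a concrete illustration of the orthogonal-view idea, the following is a minimal PyTorch sketch of an anisotropic 3D convolution block. The module name `OrthogonalViewBlock` and all layer choices are my own illustrative assumptions, not the dissertation's heterogeneous feature aggregation network: each branch convolves within one orthogonal plane (axial, coronal, or sagittal), and the three outputs are fused by a 1x1x1 convolution.

```python
# Minimal sketch (illustrative; not the dissertation's actual architecture):
# three anisotropic 3D convolution branches, one per orthogonal plane,
# whose outputs are fused by a 1x1x1 convolution.
import torch
import torch.nn as nn

class OrthogonalViewBlock(nn.Module):
    """Extract in-plane context along each of the three orthogonal
    views of a volume, then fuse the results."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Kernel sizes are (depth, height, width); each branch convolves
        # within one plane while staying one voxel thin along the
        # remaining axis.
        self.axial    = nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1))
        self.coronal  = nn.Conv3d(in_ch, out_ch, kernel_size=(3, 1, 3), padding=(1, 0, 1))
        self.sagittal = nn.Conv3d(in_ch, out_ch, kernel_size=(3, 3, 1), padding=(1, 1, 0))
        self.fuse = nn.Conv3d(3 * out_ch, out_ch, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        views = [self.axial(x), self.coronal(x), self.sagittal(x)]
        return self.act(self.fuse(torch.cat(views, dim=1)))

if __name__ == "__main__":
    block = OrthogonalViewBlock(in_ch=1, out_ch=8)
    volume = torch.randn(2, 1, 32, 64, 64)  # (batch, channel, D, H, W)
    print(block(volume).shape)              # torch.Size([2, 8, 32, 64, 64])
```

Anisotropic kernels of this kind keep the per-branch parameter count close to that of 2D convolutions while still letting information flow across all three views once the branches are fused.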
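The uncertainty-guided self-training step can likewise be sketched in a few lines. This sketch assumes an entropy-based uncertainty estimator with a hypothetical confidence threshold `tau`, which is not necessarily the estimator used in the dissertation: voxels whose pseudo labels the teacher model is uncertain about are simply masked out of the student's loss.

```python
# Illustrative sketch of uncertainty-gated pseudo-label training
# (hypothetical names and estimator; the dissertation's exact method may differ).
import torch
import torch.nn.functional as F

def pseudo_label_loss(student_logits, teacher_logits, tau=0.5):
    """Cross-entropy on pseudo labels, keeping only voxels where the
    teacher's predictive entropy is below the threshold `tau`."""
    with torch.no_grad():
        probs = F.softmax(teacher_logits, dim=1)
        pseudo = probs.argmax(dim=1)  # hard pseudo labels, shape (N, D, H, W)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
        mask = (entropy < tau).float()  # 1 = confident voxel, 0 = masked out
    loss = F.cross_entropy(student_logits, pseudo, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp_min(1.0)

# Usage in one self-training round (hypothetical model names):
#   loss = pseudo_label_loss(student(x_unlabeled), teacher(x_unlabeled))
```

In an iterative self-training loop, the student trained this way would typically become the next round's teacher, so that pseudo labels and their uncertainty estimates are refined together.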