Context Matters: Graph-based Self-supervised Representation Learning for Medical Images
Li Sun, Ke Yu, Kayhan Batmanghelich (University of Pittsburgh, USA)
2020-12-11

Supervised learning requires large volumes of annotated data. Collecting such datasets is time-consuming and expensive, and to date very few annotated COVID-19 imaging datasets are available. Although self-supervised learning enables us to bootstrap the training by exploiting unlabeled data, the generic self-supervised methods for natural images do not sufficiently incorporate context. For medical images, a desirable method should be sensitive enough to detect deviation from normal-appearing tissue in each anatomical region; here, anatomy is the context. We introduce a novel approach with two levels of self-supervised representation learning objectives: one at the regional anatomical level and another at the patient level. We use graph neural networks to incorporate the relationship between different anatomical regions. The structure of the graph is informed by anatomical correspondences between each patient and an anatomical atlas. In addition, the graph representation has the advantage of handling arbitrarily sized images in full resolution. Experiments on large-scale Computed Tomography (CT) datasets of lung images show that our approach compares favorably to baseline methods that do not account for the context. We use the learnt embedding to quantify the clinical progression of COVID-19 and show that our method generalizes well to COVID-19 patients from different hospitals. Qualitative results suggest that our model can identify clinically relevant regions in the images.

While deep neural networks trained with supervision have made breakthroughs in many areas, their performance relies heavily on large-scale annotated datasets. Learning informative representations without human-crafted labels has achieved great success in the computer vision domain (Wu et al. 2018; Chen et al. 2020a; He et al. 2020). Importantly, the unsupervised approach has the capability of learning robust representations, since the features are not optimized towards solving a single supervised task. Self-supervised learning has emerged as a powerful form of unsupervised learning. It derives inputs and labels from an unlabeled dataset and formulates heuristics-based pretext tasks to train a model. Contrastive learning, a more principled variant of self-supervised learning, relies on instance discrimination (Wu et al. 2018) or contrastive predictive coding (CPC) (Oord, Li, and Vinyals 2018). It has achieved state-of-the-art performance in many settings and can produce features comparable to those produced by supervised methods (He et al. 2020; Chen et al. 2020a). However, for medical images, the generic formulation of self-supervised learning does not incorporate domain-specific anatomical context.

For medical imaging analysis, a large-scale annotated dataset is rarely available, especially for emerging diseases such as COVID-19. However, large amounts of unlabeled data are available, so self-supervised pre-training presents an appealing solution in this domain. There are some existing works that focus on self-supervised methods for learning image-level representations. One line of work learns image semantic features by restoring computerized tomography (CT) images from corrupted input images. (Taleb et al.
2019 ) introduced puzzle-solving proxy tasks using multi-modal magnetic resonance images (MRI) scans for representation learning. (Bai et al. 2019) proposed to learn cardiac MR image features from anatomical positions automatically defined by cardiac chamber view planes. Despite their success, current methods suffer from two challenges: (1) These methods do not account for anatomical context. For example, the learned representation is invariant with respect to body landmarks which are highly informative for clinicians. (2) Current methods rely on fix-sized input. The dimensions of raw volumetric medical images can vary across scans due to the differences in subjects' bodies, machine types, and operation protocols. The typical approach for pre-processing natural images is to either resize the image or crop it to the same dimensions, because the convolutional neural network (CNN) can only handle fixed dimensional input. However, both approaches can be problematic for medical images. Taking chest CT for example, reshaping voxels in a CT image may cause distortion to the lung (Singla et al. 2018) , and cropping images may introduce undesired artifacts, such as discounting the lung volume. To address the challenges discussed above, we propose a novel method for context-aware unsupervised representation learning on volumetric medical images. First, in order to incorporate context information, we represent a 3D im-age as a graph of patches centered at landmarks defined by an anatomical atlas. The graph structure is informed by anatomical correspondences between the subject's image and the atlas image using registration. Second, to handle different sized images, we propose a hierarchical model which learns anatomy-specific representations at the patch level and learns subject-specific representations at the graph level. On the patch level, we use a conditional encoder to integrate the local region's texture and the anatomical location. On the graph level, we use a graph convolutional network (GCN) to incorporate the relationship between different anatomical regions. Experiments on a publicly available large-scale lung CT dataset of Chronic Obstructive Pulmonary Disease (COPD) show that our method compares favorably to other unsupervised baselines and outperforms supervised methods on some metrics. We also show that features learned by our proposed method outperform other baselines in staging lung tissue abnormalities related to COVID-19. Our results show that the pre-trained features on large-scale lung CT datasets are generalizable and transfer well to COVID-19 patients from different hospitals. Our code and supplementary material are available at https://github.com/batmanlab/Context Aware SSL In summary, we make the following contributions: • We introduce a context-aware self-supervised representation learning method for volumetric medical images. The context is provided by both local anatomical profiles and graph-based relationship. • We introduce a hierarchical model that can learn both local textural features on patch and global contextual features on graph. The multi-scale approach enables us to handle arbitrary sized images in full resolution. • We demonstrate that features extracted from lung CT scans with our method have a superior performance in staging lung tissue abnormalities related with COVID-19 and transfer well to COVID-19 patients from different hospitals. • We propose a method that provides task-specific explanation for the predicted outcome. 
The heatmap results suggest that our model can identify clinically relevant regions in the images.

Our method views the image of every patient as a set of nodes, where nodes correspond to image patches covering the lung region of that patient. A larger lung (image) results in more spread-out patches. We use image registration to an anatomical atlas to maintain the anatomical correspondences between nodes. An edge connects two nodes whose patches are neighbors after applying the image deformation derived from the registration. Our framework consists of two levels of self-supervised learning, one at the node level (i.e., patch level) and one at the graph level (i.e., subject level). In the following, we explain each component separately. The schematic is shown in Fig. 1.

Figure 1: Schematic diagram of the proposed method. We represent every image as a graph of patches. The context is imposed by anatomical correspondences among patients via registration, and a graph-based hierarchical model is used to incorporate the relationship between different anatomical regions. We use a conditional encoder E(·, ·) to learn patch-level textural features and a graph convolutional network G(·, ·) to learn a graph-level representation through contrastive learning objectives. The detailed architectures of the networks are presented in the Supplementary Material.

We use X_i to denote the image of patient i. To define a standard set of anatomical regions, we divide the atlas image into a set of N equally spaced 3D patches with some overlap. We use {p^j}_{j=1}^N to denote the center coordinates of the patches in the atlas coordinate system. We need to map {p^j}_{j=1}^N to their corresponding locations for each patient. This operation requires a transformation that densely maps every coordinate of the atlas to a coordinate of the patient. To find the transformation, we register a patient's image to the anatomical atlas by solving the following optimization problem:

φ_i = argmin_φ Sim(X_atlas, X_i ∘ φ^{-1}) + Reg(φ),

where Sim is a similarity metric (e.g., the ℓ2 norm), φ_i(·) is the fitted subject-specific transformation, and Reg(φ_i) is a regularization term that ensures the transformation is smooth. The transformation φ_i maps the coordinates of patient i to the atlas. After solving this optimization for each patient, we use the inverse of this transformation to map {p^j}_{j=1}^N to each subject (i.e., {φ_i^{-1}(p^j)}). We use the well-established image registration software ANTs (Tustison et al. 2014) to ensure the inverse transformation exists. To avoid clutter in notation, we use p_i^j as a shorthand for φ_i^{-1}(p^j). In this way, patches with the same index across all subjects map to the same anatomical region on the atlas image.

To incorporate the relationship between different anatomical regions, we represent an image as a graph of patches (nodes), whose edge connectivity is determined by the Euclidean distance between the patches' centers. With a minor abuse of notation, we let V_i = {x_i^j}_{j=1}^N denote the set of patches that cover the lung region of subject i. More formally, the image X_i is represented as G_i = (V_i, E_i), where V_i is the node (patch) information and E_i denotes the set of edges. We use an adjacency matrix A_i ∈ {0, 1}^{N×N} to represent E_i, defined as

A_i[j, k] = 1 if dist(p_i^j, p_i^k) ≤ ρ, and A_i[j, k] = 0 otherwise,

where dist(·, ·) denotes the Euclidean distance; p_i^j, p_i^k ∈ R^3 are the coordinates of the centers of patches x_i^j and x_i^k, respectively; and ρ is a threshold hyper-parameter that controls the density of the graph.
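To make the graph construction concrete, the following is a minimal sketch (not the authors' code) of how one subject's patch graph could be assembled once the atlas patch centers have been mapped into the subject's coordinate system with the inverse registration transform. The function name, the default threshold value, the boundary handling, and the exclusion of self-loops are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def build_patch_graph(volume, centers, patch_size=32, rho=40.0):
    """Represent one CT volume as a graph of patches (sketch).

    volume:  3D numpy array, the subject's CT image at full resolution.
    centers: (N, 3) array of patch-center coordinates, i.e. the atlas landmarks
             p^j already mapped into this subject's space via phi_i^{-1}.
    rho:     Euclidean-distance threshold controlling graph density
             (40.0 is an illustrative placeholder, not the paper's value).
    Assumes every center lies at least patch_size/2 voxels from the border.
    """
    half = patch_size // 2
    patches = []
    for x, y, z in np.round(centers).astype(int):
        patches.append(volume[x - half:x + half, y - half:y + half, z - half:z + half])
    patches = np.stack(patches)                      # (N, 32, 32, 32) node inputs

    # A_i[j, k] = 1 if the two patch centers are within rho of each other.
    dists = cdist(centers, centers)                  # (N, N) pairwise distances
    adjacency = (dists <= rho).astype(np.float32)
    np.fill_diagonal(adjacency, 0.0)                 # drop self-loops (assumption)
    return patches, adjacency
```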
Local anatomical variations provide valuable information about the health status of the tissue. For a given anatomical region, a desirable method should be sensitive enough to detect deviation from normal-appearing tissue. In addition, the anatomical location of a lesion plays a role in patients' survival outcomes, and the types of lesions vary across different anatomical locations in the lung. In order to extract anatomy-specific features, we adopt a conditional encoder E(·, ·) that takes both the patch x_i^j and its location index j as input. The encoder is composed of a CNN feature extractor C(·) and an MLP head f_l(·); the encoded patch-level feature is

h_i^j = f_l(C(x_i^j) ⊕ j),

where ⊕ denotes concatenation of the CNN feature with the (encoded) anatomical location index. We adopt the InfoNCE loss (Oord, Li, and Vinyals 2018), a form of contrastive loss, to train the conditional encoder at the patch level:

L_l = -log [ exp(q_i^j · k_+ / τ) / (exp(q_i^j · k_+ / τ) + Σ_{k_-} exp(q_i^j · k_- / τ)) ],

where q_i^j denotes the representation of the query patch x_i^j, k_+ and k_- denote the representations of the positive and negative keys respectively, and τ denotes the temperature hyper-parameter. We obtain a positive pair by generating two randomly augmented views of the same query patch x_i^j, and we obtain a negative sample by augmenting the patch x_v^j at the same anatomical region j from a random subject v ≠ i.

We adopt the Graph Convolutional Network (GCN) (Duvenaud et al. 2015) to summarize the patch-level (anatomy-specific) representations into a graph-level (subject-specific) representation. We consider each patch as one node in the graph, and the subject-specific adjacency matrix determines the connections between nodes. Specifically, the GCN model G(·, ·) takes the patch-level representations H_i and the adjacency matrix A_i as inputs, and propagates information across the graph to update the node-level features:

H_i' = σ(A_i H_i W),

where H_i is an N × F matrix containing F features for each of the N nodes in the image of subject i, W is a learnable projection matrix, and σ is a nonlinear activation function. We then obtain the subject-level representation by global average pooling over all nodes in the graph, followed by an MLP head f_g:

r_i = f_g(Pool(H_i')).

We adopt the InfoNCE loss to train the GCN at the graph level:

L_g = -log [ exp(r_i · t_+ / τ) / (exp(r_i · t_+ / τ) + Σ_{t_-} exp(r_i · t_- / τ)) ],

where r_i denotes the representation of the entire image X_i, t_+ and t_- denote the representations of the positive and negative keys respectively, and τ denotes the temperature hyper-parameter. To form a positive pair, we take two views of the same image X_i under random augmentation at the patch level. We obtain a negative sample by randomly sampling a different image X_v.

The model is trained end-to-end by combining the two InfoNCE losses from the patch level and the graph level, L = L_l(E) + L_g(G). Directly backpropagating gradients from L_g(G) to the parameters of the conditional encoder E is infeasible because of the excessive memory footprint caused by the large number of patches per image. We therefore propose an interleaving algorithm (Algorithm 1) that alternates training between the patch level and the graph level: the conditional encoder E is updated with the patch-level loss L_l, and the GCN G is updated with the graph-level loss L_g computed on patch features that are not backpropagated through E.
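The PyTorch-style sketch below illustrates one possible form of the two InfoNCE objectives and the interleaved update described above. It is not the authors' implementation: the momentum encoder and negative-sample queue used in the paper are omitted, the encoder/GCN interfaces and batch layouts are assumptions, and shapes are indicated in comments.

```python
import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, tau=0.2):
    """InfoNCE: classify the positive key against K shared negatives per query."""
    query, positive, negatives = (F.normalize(t, dim=-1) for t in (query, positive, negatives))
    l_pos = (query * positive).sum(dim=-1, keepdim=True) / tau     # (B, 1)
    l_neg = query @ negatives.t() / tau                            # (B, K)
    logits = torch.cat([l_pos, l_neg], dim=1)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)

def interleaved_step(encoder, gcn, patch_batch, graph_batch, opt_enc, opt_gcn, tau=0.2):
    """One alternation of the interleaving update (sketch): patch level, then graph level."""
    # --- patch level: update the conditional encoder E only -----------------
    x_q, x_k, x_neg, loc, loc_neg = patch_batch   # two augmented views of each patch,
                                                  # negatives from the same anatomical
                                                  # location in other subjects
    q = encoder(x_q, loc)                         # (B, D)
    with torch.no_grad():
        k_pos = encoder(x_k, loc)                 # (B, D) positive keys
        k_neg = encoder(x_neg, loc_neg)           # (K, D) negative keys
    loss_patch = info_nce(q, k_pos, k_neg, tau)
    opt_enc.zero_grad(); loss_patch.backward(); opt_enc.step()

    # --- graph level: update the GCN G on detached patch features -----------
    (h, a), (h_pos, a_pos), (h_neg, a_neg) = graph_batch   # node features + adjacency
    r = gcn(h.detach(), a)                        # (B, D) query graphs
    with torch.no_grad():
        t_pos = gcn(h_pos, a_pos)                 # (B, D) second views of the same images
        t_neg = gcn(h_neg, a_neg)                 # (K, D) other images
    loss_graph = info_nce(r, t_pos, t_neg, tau)
    opt_gcn.zero_grad(); loss_graph.backward(); opt_gcn.step()
    return loss_patch.item(), loss_graph.item()
```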
Understanding how the model makes predictions is important for building trust in medical imaging analysis. We therefore propose a method that provides a task-specific explanation for the predicted outcome, extending the class activation maps (CAM) of (Zhou et al. 2016). Without loss of generality, assume a logistic regression model is fitted for a downstream binary classification task (e.g., the presence or absence of a disease) on the extracted subject-level features S_i. The log-odds of the target variable Y_i = 1 is

log [ p(Y_i = 1) / p(Y_i = 0) ] = β + W S_i = β + (1/N) Σ_j W h_i^j,

where S_i = Pool(H_i') is the pooled GCN output (the MLP head f_g is discarded when extracting features for downstream tasks, following the practice in (Chen et al. 2020a,b)), and β and W are the learned logistic regression weights. We then take M_i^j = W h_i^j as the activation score of anatomical region j for the target classification. We use a sigmoid function to normalize {M_i^j}_{j=1}^N, and use a heatmap to show the discriminative anatomical regions in the image of subject i.

We train the proposed model for 30 epochs. We set the learning rate to 3×10^-2 and use momentum = 0.9 and weight decay = 1×10^-4 in the Adam optimizer. The patch size is set to 32×32×32. The batch size at the patch level and the subject level is 128 and 16, respectively. We set the representation dimension F to 128. The lung region is extracted using lungmask (Hofmanninger et al. 2020). Following the practice in MoCo, we maintain a queue of data samples and use a momentum update scheme to increase the number of negative samples during training; as shown in previous work, this improves the performance of downstream tasks (He et al. 2020). The number of negative samples during training is set to 4096. The data augmentation includes random elastic transforms, additive random Gaussian noise, and random contrast adjustment. The temperature τ is set to 0.2. There are 581 patches per subject/graph; this number is determined by the atlas image size and two hyper-parameters, the patch size and the step size. The experiments are performed on 2 GPUs, each with 16 GB of memory. The code is available at https://github.com/batmanlab/Context Aware SSL.

Unsupervised learning aims to learn meaningful representations without human-annotated data. Most unsupervised learning methods can be classified into generative and discriminative approaches. Generative approaches learn the distribution of the data and a latent representation through generation; these methods include adversarial learning and autoencoder-based methods. However, generating data in pixel space can be computationally intensive, and generating fine detail may not be necessary for learning an effective representation. Discriminative approaches use pretext tasks for representation learning. Different from supervised approaches, both the inputs and the labels are derived from an unlabeled dataset. Discriminative approaches can be grouped into (1) pretext tasks based on heuristics, including solving jigsaw puzzles (Noroozi and Favaro 2016), context prediction (Doersch, Gupta, and Efros 2015), and colorization (Zhang, Isola, and Efros 2016), and (2) contrastive methods. Among them, contrastive methods achieve state-of-the-art performance in many tasks. The core idea of contrastive learning is to bring different views of the same image (called 'positive pairs') closer and to spread apart representations of views from different images (called 'negative pairs'). Similarity is measured by the dot product in feature space (Wu et al. 2018). Previous works have suggested that the performance of contrastive learning relies on large batch sizes (Chen et al. 2020a) and large numbers of negative samples (He et al. 2020).

Graphs are a powerful way of representing entities with arbitrary relational structure (Battaglia et al. 2018). Several algorithms use random walk-based methods for unsupervised representation learning on graphs (Grover and Leskovec 2016; Perozzi, Al-Rfou, and Skiena 2014; Hamilton, Ying, and Leskovec 2017).
These methods are powerful but rely more on local neighborhoods than on structural information (Ribeiro, Saverese, and Figueiredo 2017). Graph convolutional networks (GCN) (Duvenaud et al. 2015; Kipf and Welling 2016) were proposed to generalize convolutional neural networks to graphs. Recently, Deep Graph Infomax (Velickovic et al. 2019) was proposed to learn node-level representations by maximizing the mutual information between patch representations and corresponding high-level summaries of graphs.

We evaluate the performance of the proposed model on two large-scale datasets of 3D medical images and compare it with various baseline methods, including both supervised and unsupervised approaches. The experiments are conducted on three volumetric medical imaging datasets.

COPDGene Dataset: COPD is a lung disease that makes it difficult to breathe. The COPDGene Study (Regan et al. 2011) is a multi-center observational study designed to identify the underlying genetic factors of COPD. We use a large set of 3D thorax computerized tomography (CT) images of 9,180 subjects from the COPDGene dataset in our study.

MosMed Dataset: We use 3D CT scans of 1,110 subjects from the MosMed dataset (Morozov et al. 2020) provided by municipal hospitals in Moscow, Russia. Based on the severity of lung tissue abnormalities related to COVID-19, the images are classified into five severity categories associated with different triage decisions. For example, patients in the mild category are followed up at home with telemedicine monitoring, while patients in the critical category are immediately transferred to the intensive care unit.

COVID-19 CT Dataset: To verify whether the learned representation can be transferred to COVID-19 patients from other sites, we collect a multi-hospital 3D thorax CT dataset of COVID-19. The combined dataset has 80 subjects, of which 35 positive subjects come from multiple publicly available COVID-19 datasets (Jun et al. 2020; Bell 2020; Zhou et al. 2020), and 45 healthy subjects are randomly sampled from the LIDC-IDRI dataset (Armato III et al. 2011) as negative samples.

We evaluate the performance of the proposed method by using the extracted subject-level representations to predict clinically relevant variables. We first perform self-supervised pretraining with our method on the COPDGene dataset. Then we freeze the extracted subject-level features and use them to train a linear regression model to predict two continuous clinical variables, the percent predicted value of Forced Expiratory Volume in one second (FEV1pp) and its ratio with Forced Vital Capacity (FEV1/FVC), on the log scale. We report average R^2 scores with standard deviations over five-fold cross-validation; a minimal sketch of this linear-probe protocol is shown below.
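The following scikit-learn sketch illustrates the linear-probe protocol just described. It is not the authors' evaluation code; the array names and the assumption that the clinical target is strictly positive (so the logarithm can be taken) are ours.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def linear_probe_r2(features, target, n_folds=5):
    """Five-fold cross-validated R^2 of a linear probe on frozen features.

    features: (n_subjects, 128) subject-level representations from the
              pre-trained model (with the MLP head f_g removed).
    target:   (n_subjects,) clinical variable, e.g. FEV1pp, assumed positive;
              the regression is done on the log scale as in the text.
    """
    scores = cross_val_score(LinearRegression(), features, np.log(target),
                             cv=n_folds, scoring="r2")
    return scores.mean(), scores.std()
```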
We also train a logistic regression model for each of the six categorical variables, including (1) Global Initiative for Chronic Obstructive Lung Disease (GOLD) score, which is a four-grade categorical value indicating the severity of airflow limitation, (2) Centrilobular emphysema (CLE) visual score, which is a six-grade categorical value indicating the severity of emphysema in centrilobular, (3) Paraseptal emphysema (Para-septal) visual score, which is a three-grade categorical value indicating the severity of paraseptal emphysema, (4) Acute Exacerbation history (AE history), which is a binary variable indicating whether the subject has experienced at least one exacerbation before enrolling in the study, (5) Future Acute Exacerbation (Future AE), which is a binary variable indicating whether the subject has reported experiencing at least one exacerbation at the 5-year longitudinal follow up, (6) Medical Research Council Dyspnea Score (mMRC), which is a five-grade categorical value indicating dyspnea symptom. We compare the performance of our method against: (1) supervised approaches, including Subject2Vec (Singla et al. 2018) , Slice-based CNN (González et al. 2018 ) and (2) unsupervised approaches, including Models Genesis (Zhou et al. 2019) , MedicalNet (Chen, Ma, and Zheng 2019) , MoCo (3D implementation) (He et al. 2020) , Divergencebased feature extractor (Schabdach et al. 2017) , K-means algorithm applied to image features extracted from local lung regions (Schabdach et al. 2017) , and Low Attenuation Area (LAA), which is a clinical descriptor. The evaluation results are shown in Table 1 . For all results, we report average test accuracy in five-fold cross-validation. The results show that our proposed model outperforms unsupervised baselines in all metrics except for Future AE. While MoCo is also a contrastive learning based method, we believe that our proposed method achieves better performance for three reasons: (1) Our method incorporates anatomical context. (2) Since MoCo can only accept fixedsize input, we resize all volumetric images into 256 × 256 × 256. In this way, lung shapes may be distorted in the CT images, and fine-details are lost due to down-sampling. In comparison, our model supports images with arbitrary sizes in full resolution by design. (3) Since training CNN model with volumetric images is extremely memory-intensive, we can only train the MoCo model with limited batch size. The small batch size may lead to unstable gradients. In comparison, the interleaving training scheme reduces the usage of memory footprint, thus it allows us to train our model with a much larger batch size. Our method also outperforms supervised methods, including Subject2Vec and 2D CNN, in terms of CLE, Paraseptal, AE History, Future AE and mMRC; for the rest clinical variables, the performance gap of our method is smaller than other unsupervised methods. We believe that the improvement is mainly from the richer context information incorporated by our method. Subject2Vec uses an unordered set-based representation which does not account for spatial locations of the patches. 2D CNN only uses 2D slices which does not leverage 3D structure. Overall, the results suggest that representation extracted by our model preserves richer information about the disease severity than baselines. Ablation study: We perform two ablation studies to validate the importance of context provided by anatomy and the relational structure of anatomical regions: (1) Removing conditional encoding (CE). 
In this setting, we replace the proposed conditional encoder with a conventional encoder which only takes images as input. (2) Removing graph. In this setting, we remove GCN in the model and obtain subject-level representation by average pooling of all patch/node level representations without propagating information between nodes. As shown in Table 1 , both types of context contribute significantly to the performance of the final model. We first perform self-supervised pretraining with our method on the MosMed dataset. Then we freeze the extracted patient-level features and train a logistic regression classifier to predict the severity of lung tissue abnormalities related with COVID-19, a five-grade categorical variable based on the on CT findings and other clinical measures. We compare the proposed method with benchmark unsupervised methods, including MedicalNet, ModelsGenesis, MoCo, and one supervised model, 3D CNN model. We use the average test accuracy in five-fold cross-validation as the metric for quantifying prediction performance. Table 2 shows that our proposed model outperforms both the unsupervised and supervised baselines. The supervised 3D CNN model performed worse than the other unsupervised methods, suggesting that it might not converge well or become overfitted since the size of the training set is limited. The features extracted by the proposed method show superior performance in staging lung tissue abnormalities related with COVID-19 than those extracted by other unsupervised benchmark models. We believe that the graph-based feature extractor provides additional gains by utilizing the full-resolution CT images than CNN-based feature extractor, which may lose information after resizing or downsampling the raw images. The results of ablation studies support that counting local anatomy and relational structure of different anatomical regions is useful for learning more informative representations for COVID-19 patients. Since the size of COVID-19 CT Dataset is very small (only 80 images are available), we don't train the networks from scratch with this dataset. Instead, we use models pre-trained on the COPDGene dataset and the MosMed dataset to extract patient-level features from the images in the COVID-19 CT Dataset, and train a logistic regression model on top of it to classify COVID-19 patients. We compare the features extracted by the proposed method to the baselines including MedicalNet, ModelsGenesis, MoCo (unsupervised), and 3D CNN (supervised). We report the average test accuracy in five-fold cross-validation. Table 3 shows that the features extracted by the proposed model pre-trained on the MosMed dataset perform the best for COVID-19 patient classification. They outperform the features extracted by the same model pre-trained on the COPDGene dataset. We hypothesize that this is because the MosMed dataset contains subjects with COVID-19 related pathological tissue, such as ground glass opacities and mixed attenuation. However, the COPDGene dataset also shows great performance for transfer learning with both the ModelsGenesis model and our model, which shed light on the utility of unlabeled data for COVID-19 analysis. To visualize the learned embedding and understand the model's behavior, we use two methods to visualize the model. The first one is embedding visualization, we use UMAP (McInnes, Healy, and Melville 2018) to visualize the patient-level features extracted on the COPDGene dataset in two dimensions. 
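A minimal sketch of this embedding visualization, using the umap-learn and matplotlib packages, is shown below; the variable names are illustrative assumptions rather than the authors' code.

```python
import matplotlib.pyplot as plt
import umap  # umap-learn package

def plot_subject_embedding(features, gold_scores):
    """Project frozen patient-level features to 2D with UMAP, colored by GOLD score."""
    embedding = umap.UMAP(n_components=2, random_state=0).fit_transform(features)
    scatter = plt.scatter(embedding[:, 0], embedding[:, 1], c=gold_scores, s=6, cmap="viridis")
    plt.colorbar(scatter, label="GOLD score")
    plt.xlabel("UMAP-1")
    plt.ylabel("UMAP-2")
    plt.tight_layout()
    plt.show()
```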
Figure 2 shows a trend, from lower-left to upper-right, along which the value of FEV1pp decreases, i.e., the severity of disease increases. In addition, we use the model explanation method introduced above to obtain the activation heatmap relevant to the downstream task, COVID-19 classification. Figure 3 (left) shows the axial view of the CT image of a COVID-19 positive patient, and Figure 3 (right) shows the corresponding activation map; brighter colors indicate higher relevance to the disease severity. The anatomical regions that received high activation scores overlap with the peripheral ground glass opacities on the CT image, which are a known indicator of COVID-19. We also found that the activation maps of non-COVID-19 patients usually show no obvious signal, which is expected. These results suggest that our model can highlight the regions that are clinically relevant to the prediction. More examples can be found in the Supplementary Material.

In this paper, we introduce a novel method for context-aware unsupervised representation learning on volumetric medical images. We represent a 3D image as a graph of patches with anatomical correspondences across patients and incorporate the relationship between anatomical regions. In addition, we introduce a multi-scale model that includes a conditional encoder for local textural feature extraction and a graph convolutional network for global contextual feature extraction. Moreover, we propose a task-specific method for model explanation. Experiments on multiple datasets demonstrate that our proposed method is effective, generalizable, and interpretable.

Supplementary Material

The detailed architectures of the conditional encoder E(·, ·), including C(·) and f_l(·), and of the graph convolutional network G(·, ·) are given in the supplementary tables. The patch size is set to 32 × 32 × 32, and a cosine schedule is used to update the learning rate. For MoCo, we implement a 3D encoder to handle the 3D data and train the model on the COPDGene and MosMed datasets. For ModelsGenesis, we train the model on the COPDGene and MosMed datasets with the original settings. For MedicalNet, since its training requires segmentation masks, we use the pretrained weights provided by the authors.

Supplementary Figure 1: Embedding of subjects in 2D using UMAP. Each dot represents one subject, colored by GOLD score. There is a trend, from lower-left to upper-right, along which the GOLD score increases.

In the UMAP embedding of supplementary Figure 1, subjects with GOLD scores of 0-1 and 3-4 are separable in two dimensions, whereas subjects with a GOLD score of 2 are scattered; the embedding pattern of these subjects requires further investigation. We also use the model explanation method described before to visualize the discriminative image regions used by our model for prediction in downstream tasks.
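The per-region activation scores described in the main text (M_i^j = W h_i^j, followed by a sigmoid) can be computed from the GCN node features and the fitted logistic-regression weights roughly as in the sketch below; the array names are assumptions, not the authors' code.

```python
import numpy as np

def region_activation_scores(node_features, logreg_weights):
    """Sigmoid-normalized relevance score for each anatomical region (sketch).

    node_features:  (N, F) GCN output features h_i^j for the N patches of one
                    subject; their average is the subject-level feature S_i.
    logreg_weights: (F,) weight vector W of the downstream logistic regression.
    The returned (N,) scores can be painted onto the corresponding patches to
    form the heatmap shown in the figures.
    """
    m = node_features @ logreg_weights        # M_i^j = W h_i^j
    return 1.0 / (1.0 + np.exp(-m))           # sigmoid normalization
```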
In supplementary Figure 2, we apply the explanation method using the target logit of GOLD score = 4 to a GOLD 4 subject from the COPDGene dataset. The dark area in the right lung, where the lung tissue is severely damaged, receives the highest activation value. For the COVID-19 classification task, Figure 3 in the main text likewise shows that the anatomical regions with high activation scores overlap with the peripheral ground glass opacities, a known indicator of COVID-19, suggesting that our model highlights regions that are clinically relevant to the prediction.

References
The lung image database consortium (LIDC) and image database resource initiative (IDRI): a completed reference database of lung nodules on CT scans
Self-supervised learning for cardiac MR image segmentation by anatomical position prediction
Relational inductive biases, deep learning, and graph networks
COVID-19 CT segmentation dataset
Self-supervised learning for medical image analysis using image context restoration
Med3D: Transfer learning for 3D medical image analysis
A simple framework for contrastive learning of visual representations
Improved baselines with momentum contrastive learning
Unsupervised visual representation learning by context prediction
Convolutional networks on graphs for learning molecular fingerprints
Disease staging and prognosis in smokers using deep learning in chest computed tomography
node2vec: Scalable feature learning for networks
Inductive representation learning on large graphs
Momentum contrast for unsupervised visual representation learning
Automatic lung segmentation in routine imaging is a data diversity problem, not a methodology problem
Towards efficient COVID-19 CT annotation: A benchmark for lung and infection segmentation
Semi-supervised classification with graph convolutional networks
UMAP: Uniform manifold approximation and projection for dimension reduction
Related Findings Dataset
Unsupervised learning of visual representations by solving jigsaw puzzles
Representation learning with contrastive predictive coding
DeepWalk: Online learning of social representations
Genetic epidemiology of COPD (COPDGene) study design
struc2vec: Learning node representations from structural identity
A likelihood-free approach for characterizing heterogeneous diseases in large-scale studies
Subject2Vec: generative-discriminative approach from a set of image patches to a vector
Multimodal self-supervised learning for medical image analysis
Large-scale evaluation of ANTs and FreeSurfer cortical thickness measurements
Unsupervised feature learning via non-parametric instance discrimination
Colorful image colorization
Learning deep features for discriminative localization
A rapid, accurate and machine-agnostic segmentation and quantification method for CT-based COVID-19 diagnosis
Models Genesis: Generic autodidactic models for 3D medical image analysis