authors: Zhou, S. Kevin; Greenspan, Hayit; Davatzikos, Christos; Duncan, James S.; Ginneken, Bram van; Madabhushi, Anant; Prince, Jerry L.; Rueckert, Daniel; Summers, Ronald M. title: A review of deep learning in medical imaging: Image traits, technology trends, case studies with progress highlights, and future promises date: 2020-08-02

Since its renaissance, deep learning has been widely used in various medical imaging tasks and has achieved remarkable success in many medical imaging applications, thereby propelling us into the so-called artificial intelligence (AI) era. It is known that the success of AI is mostly attributed to the availability of big data with annotations for a single task and the advances in high performance computing. However, medical imaging presents unique challenges that confront deep learning approaches. In this survey paper, we first highlight both clinical needs and technical challenges in medical imaging and describe how emerging trends in deep learning are addressing these issues. We cover the topics of network architecture, sparse and noisy labels, federated learning, interpretability, uncertainty quantification, etc. Then, we present several case studies that are commonly found in clinical practice, including digital pathology and chest, brain, cardiovascular, and abdominal imaging. Rather than presenting an exhaustive literature survey, we instead describe some prominent research highlights related to these case study applications. We conclude with a discussion and presentation of promising future directions.

Figure 1. The main traits of medical images and the associated technological trends for addressing these traits.

As described below and illustrated in Figure 1, medical images have several traits that influence the suitability and nature of deep learning solutions. Medical images have multiple modalities and are dense in pixel resolution. There are many existing imaging modalities, and new modalities such as spectral CT continue to be invented. Even for commonly used imaging modalities, the pixel or voxel resolution has become higher and the information density has increased. For example, the spatial resolution of clinical CT and MRI has reached the sub-millimeter level, the spatial resolution of ultrasound is even better, and its temporal resolution exceeds real-time. Medical images are heterogeneous and isolated. Although medical imaging data exist in large numbers in the clinic, due to differences in equipment, scan protocols, and the patients themselves, their appearance is heterogeneous, leading to the so-called "distribution drift" phenomenon. Due to patient privacy and clinical data management requirements, images are scattered among different hospitals and imaging centers, and truly centralized, open-source medical big data are rare. The disease patterns in medical images are numerous, and they exhibit long-tail distributions. The Radiology Gamuts Ontology [2] defines 12,878 "symptoms" (conditions that lead to results) and 4,662 "diseases" (imaging findings). The incidence of disease has a typical long-tailed distribution: while a small number of common diseases have sufficient observed cases for large-scale analysis, most diseases are infrequent in the clinic. In addition, novel contagious diseases that are not represented in the current ontology, such as COVID-19, emerge with some frequency.
The labels associated with medical images are sparse and noisy. Labeling or annotating a medical image is time-consuming and expensive. Also, different tasks require different forms of annotation, which creates the phenomenon of label sparsity. Because of variable experience and different conditions, both inter-user and intra-user labeling inconsistency is high [3], and labels must therefore be considered noisy. In fact, the establishment of gold standards for image labeling remains an open issue. Samples are imbalanced and follow a multi-modal distribution. Because the appearance variations among images are large, the probability distribution from which the already-labeled positive or negative samples are drawn is typically multi-modal. Moreover, the ratio between positive and negative samples is extremely uneven. For example, the number of pixels belonging to a tumor is usually one to many orders of magnitude smaller than that of normal tissue. Medical imaging processing and analysis tasks are complex and diverse. Medical imaging has a rich body of tasks. At the technical level, there is an array of technologies including reconstruction, enhancement, restoration, classification, detection, segmentation, and registration. When these technologies are combined with multiple image modalities and numerous disease types, a very large number of highly complex tasks can be defined, associated with numerous applications. Medical imaging is often a key part of the medical diagnosis and treatment process. Typically, a radiologist will review the acquired medical images and write a report summarizing their findings. The referring physician will define a diagnosis and treatment plan based on the images and the radiologist's report. Often, medical imaging will be ordered as part of a patient's follow-up to verify successful treatment. In addition, images are becoming an important component of invasive procedures, being used both for surgical planning and for real-time imaging during the procedure itself. As a specific example, we can look at what we term the "radiology challenge". In the past decade, technology has focused on improving the image acquisition process, such that devices have improved in speed and resolution. For example, in 1990 a CT scanner might acquire 50-100 slices, whereas today's CT scanners might acquire 1000-2500 slices per case. A single whole-slide digital pathology image corresponding to a single prostate biopsy core can easily occupy 10 GB of space at 40x magnification. Overall, there are billions of medical imaging studies conducted per year worldwide, and this number is growing. Most interpretations of medical images are performed by physicians and, in particular, by radiologists. Image interpretation by humans, however, is limited by human subjectivity, the large variations across interpreters, and fatigue. Radiologists who review cases have limited time to review an ever-increasing number of images, which leads to missed findings, long turn-around times, and a paucity of numerical results or quantification. This, in turn, drastically limits the medical community's ability to advance towards more evidence-based, personalized healthcare. AI tools such as deep learning technology can provide support to physicians by automating image analysis, leading to what we can term "Computational Radiology".
Among the automated tools that can be developed are detection of pathological findings, quantification of disease extent, characterization of pathologies (e.g., as benign vs. malignant), and assorted software tools that can be broadly characterized as decision support. This technology can also extend physicians' capabilities to include the characterization of three-dimensional and time-varying events, which are often not included in today's radiological reports because of both limited time and limited visualization and quantification tools. Several key technologies arise from the various medical imaging applications, including [4], [5], [6], [7]:
• Medical image reconstruction, which aims to form a visual representation (aka an image) from signals acquired by a medical imaging device such as a CT or MRI scanner. Reconstruction of high-quality images from low doses and/or fast acquisitions has important clinical implications.
• Medical image enhancement, which aims to adjust the intensities of an image so that the resultant image is more suitable for display or further analysis. Enhancement methods include denoising, super-resolution, MR bias field correction, and image harmonization. Recently, much research has focused on modality translation and synthesis, which can be considered image enhancement steps.
• Medical image segmentation, which aims to assign labels to pixels so that the pixels with the same label form a segmented object. Segmentation has numerous applications in clinical quantification, therapy, and surgical planning.
• Medical image registration, which aims to align the spatial coordinates of one or more images into a common coordinate system. Registration finds wide use in population analysis, longitudinal analysis, and multimodal fusion, and is also commonly used for image segmentation via label transfer.
• Computer-aided detection (CADe) and diagnosis (CADx). CADe aims to localize or find a bounding box that contains an object (typically a lesion) of interest. CADx aims to further classify the localized lesion as benign/malignant or as one of multiple lesion types.
• Other technologies include landmark detection, image or view recognition, automatic report generation, etc.
Mathematically, the above technologies can be regarded as function approximation methods, which approximate the true mapping F that takes an image x (or multiple images if multi-modality is accessible) as input and outputs a specific y, that is, y = F(x). The definition of y varies depending on the technology, which itself depends on the application or task. In CADe, y denotes a bounding box. In image registration, y is a deformation field. In image segmentation, y is a label mask. In image enhancement, y is a quality-enhanced image of the same size as the input image x. There are many ways to approximate F; however, deep learning (DL) [8], a branch of machine learning (ML), is one of the most powerful methods for function approximation. Since its renaissance, deep learning has been widely used in various medical imaging tasks and has achieved substantial success in many medical imaging applications. Because of its focus on learning rather than modeling, the use of DL in medical imaging represents a substantial departure from previous approaches in medical imaging. Take supervised deep learning as an example. Assume that a training dataset
$\{(x_n, y_n);\ n = 1, \ldots, N\}$ is available and that a deep neural network is parameterized by $\theta$, which includes the number of layers, the number of nodes of each layer, the connecting weights, the choices of activation functions, etc. The neural network that is found to approximate F can be written as $\phi_{\hat{\theta}}(x)$, where $\hat{\theta}$ are the parameters that minimize the so-called loss function

$\hat{\theta} = \arg\min_{\theta} \sum_{n=1}^{N} \left[ l(\phi_{\theta}(x_n), y_n) + R_1(\phi_{\theta}(x_n)) \right] + R_2(\theta).$

Here, $l(\phi_{\theta}(x), y)$ is the item-wise loss function that penalizes the prediction error, $R_1(\phi_{\theta}(x_n))$ reflects the prior belief about the output, and $R_2(\theta)$ is a regularization term on the network parameters. Although the neural network $\phi_{\hat{\theta}}(x)$ does represent a type of model, it is generally thought of as a "black box" since it does not represent a designed model based on well-understood physical or mathematical principles. There are many survey papers on deep learning based key technologies for medical image analysis [9], [10], [11], [12], [13], [14], [15], [16], [17], [18]. To differentiate the present review paper from these works, we specifically omit any presentation of the technical details of DL itself, which is no longer considered new and is well covered in numerous other works, and focus instead on the connections between emerging DL approaches and the specific needs of medical imaging, and on several case examples that illustrate the state of the art. Here, we briefly outline the development timeline of DL in medical imaging. Deep learning was termed one of the 10 breakthrough technologies of 2013 [19]. This followed the 2012 large-scale image categorization challenge, which demonstrated the superiority of CNNs on the ImageNet dataset [20]. At that point, DL emerged as the leading machine-learning tool in the general imaging and computer vision domains. At that time, the medical imaging community began a debate about whether DL would be applicable in the medical imaging space. The concerns were due to the challenges we have outlined above, with the main challenge being the lack of sufficient labeled data, known as the Data Challenge. Several steps can be pointed to as enablers of DL technology within the medical imaging space: In 2015-2016, techniques were developed using "transfer learning" (TL) (or what was also called "learning from non-medical features" [21]) to apply the knowledge gained via solving a source problem to a different but related target problem. A key question was whether a network pre-trained on natural imagery would be applicable to medical images. Several groups showed this to be the case (e.g., [22], [23], [21]); using a deep network trained on ImageNet and fine-tuning it to a medical imaging task helped speed up training convergence and improve accuracy. In 2017-2018, works emerged that focused on a second solution developed in the medical imaging community to cope with limited datasets: synthetic data augmentation. Classical augmentation is a key component of any network training. Still, key questions to address were whether it was possible to synthesize medical data using schemes such as generative modeling, and whether the synthesized data would serve as viable medical examples and would in practice increase performance on the medical task at hand. Several works across varying domains demonstrated that this was in fact the case.
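Before turning to such examples, the supervised objective introduced above can be made concrete with a minimal, hypothetical training-step sketch (PyTorch-style; the network phi, the choice of cross-entropy as the item-wise loss, the entropy term standing in for R1, and the weights lambda1 and lambda2 are illustrative placeholders, not the formulation of any particular paper):

```python
import torch
import torch.nn.functional as F

def training_step(phi, optimizer, x, y, lambda1=0.1, lambda2=1e-4):
    """One step of minimizing  sum_n l(phi_theta(x_n), y_n) + R1(phi_theta(x_n)) + R2(theta)."""
    optimizer.zero_grad()
    logits = phi(x)                                    # network output phi_theta(x)
    item_loss = F.cross_entropy(logits, y)             # item-wise loss l(phi_theta(x), y)
    probs = logits.softmax(dim=1).clamp_min(1e-8)
    entropy = -(probs * probs.log()).sum(dim=1).mean()
    r1 = lambda1 * entropy                             # one possible output prior: prefer confident predictions
    r2 = lambda2 * sum(p.pow(2).sum() for p in phi.parameters())  # weight penalty R2(theta)
    loss = item_loss + r1 + r2
    loss.backward()
    optimizer.step()
    return loss.item()
```

In practice the item-wise loss and both regularizers are chosen per task (e.g., Dice or cross-entropy for segmentation, smoothness priors on deformation fields for registration).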
Among such demonstrations, GAN-based synthetic image augmentation in [24], for example, was shown to generate lesion image samples that were not recognized as synthetic by expert radiologists and also increased CNN performance in classifying liver lesions. GANs, variational autoencoders, and variations on these are still being explored and advanced in recent works, as will be described in the following section. For image segmentation, one of the key contributions that emerged from the medical imaging community was the U-Net architecture [25]. Originally designed for microscopic cell segmentation, the U-Net has proven to efficiently and robustly learn many medical image segmentation tasks. Network architectures. Deep neural networks have larger model capacity and stronger generalization capability than shallow neural networks. Deep models trained on large-scale annotated databases for a single task achieve outstanding performance, far beyond traditional algorithms or even human capability. Making it deeper. Starting from AlexNet [20], there was a research trend to make networks deeper, as represented by VGGNet [26], Inception Net [27], and ResNet [28]. The use of skip connections makes a deep network more trainable, as in DenseNet [29] and U-Net [25]. U-Net was first proposed to tackle segmentation, while the other networks were developed for image classification. Deep supervision [30] further improves discriminative power. Adversarial and attention mechanisms. In the generative adversarial network (GAN) [31], Goodfellow et al. propose to accompany a generative model with a discriminator that tells whether a sample is from the model distribution or the data distribution. Both the generator and discriminator are represented by deep networks, and their training is done via minimax optimization. Adversarial learning is widely used in medical imaging [14], including medical image reconstruction [32], image quality enhancement [33], and segmentation [34]. The attention mechanism [35] allows automatic discovery of "where" and "what" to focus on when describing image contents or making a holistic decision. Squeeze-and-excitation [36] can be regarded as a channel attention mechanism. Attention is combined with GANs in [37] and with U-Net in [38]. Neural architecture search (NAS) and lightweight design. NAS [39] aims to automatically design the architecture of a deep network for high performance on a given task. Zhu et al. [40] successfully apply NAS to volumetric medical image segmentation. Lightweight design [41], [42], on the other hand, aims to design architectures for computational efficiency on resource-constrained devices such as mobile phones while maintaining accuracy. Annotation-efficient approaches. To address sparse and noisy labels, we need DL approaches that are efficient with respect to annotations. A key idea is to leverage the powerful and robust feature representations derived from existing models and data, even when those models or data are not from the same domain or task, and to adapt such representations to the task at hand. To do this, a handful of methods have been proposed in the literature [15], including transfer learning, domain adaptation, self-supervised learning, semi-supervised learning, weakly/partially supervised learning, etc. Transfer learning (TL) aims to apply the knowledge gained via solving a source problem to a different but related target problem.
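As a minimal, hypothetical sketch of the most common instantiation of TL, discussed next (assuming torchvision is available; the backbone choice, class count, and learning rates are placeholders):

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

# Load an ImageNet-pretrained backbone and replace its classification head.
# Grayscale medical images are often replicated to three channels to match the pretrained stem.
model = models.resnet18(pretrained=True)
num_classes = 3                                   # placeholder: e.g., number of lesion types
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Fine-tune: a smaller learning rate for pretrained layers, a larger one for the new head.
optimizer = optim.SGD([
    {"params": [p for name, p in model.named_parameters() if not name.startswith("fc")],
     "lr": 1e-4},
    {"params": model.fc.parameters(), "lr": 1e-3},
], momentum=0.9)
# ... then train as usual on the (typically small) labeled medical dataset.
```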
The most commonly used TL method, as sketched above, is to take a deep network trained on ImageNet and fine-tune it to a medical imaging task in order to speed up training convergence and improve accuracy. With the availability of a large number of annotated datasets, such TL methods [23] achieve remarkable success. However, ImageNet consists of natural images, its pretrained models are for 2D images only, and they are not necessarily the best for medical images, especially in small-sample settings [43]. Liu et al. [44] propose a 3D anisotropic hybrid network that effectively transfers convolutional features learned from 2D images to 3D anisotropic volumes. In [45], Chen et al. combine multiple datasets from several medical challenges with diverse modalities, target organs, and pathologies and learn one 3D network that provides an effective pretrained model for 3D medical image analysis tasks. Domain adaptation is a form of transfer learning in which the source and target domains have the same feature space but different distributions. In [46], domain-invariant features are learned via an adversarial mechanism that attempts to classify the domain of the input data. Zhang et al. [47] propose to synthesize and segment multimodal medical volumes using generative adversarial networks with cycle- and shape-consistency. In [48], a domain adaptation module that maps the target input to features aligned with the source domain feature space is proposed for cross-modality biomedical image segmentation, using a domain critic module to discriminate the feature spaces of the two domains. Huang et al. [49] propose a universal U-Net comprising domain-general and domain-specific parameters that deals with multiple organ segmentation tasks on multiple domains. This integrated learning mechanism offers a new possibility for dealing with multiple domains and even multiple heterogeneous tasks. Self-supervised learning, a form of unsupervised learning, learns a representation through a proxy task in which the data themselves provide the supervisory signal. Once the representation is learned, it is fine-tuned using annotated data. The Models Genesis method [50] uses a proxy task of recovering the original image from a distorted version given as input. The possible distortions include non-linear gray-value transformation, local pixel shuffling, and image out-painting and in-painting. In [51], Zhu et al. propose solving a Rubik's Cube proxy task that involves three operations, namely cube ordering, cube rotating, and cube masking. This allows the network to learn features that are invariant to translation and rotation and robust to noise as well. Semi-supervised learning often trains a model using a small set of annotated images, then generates pseudo-labels for a large set of unannotated images, and learns a final model by mixing both sets of images. Bai et al. [52] implement such a method for cardiac MR segmentation. In [53], Nie et al. propose an attention-based semi-supervised deep network for segmentation. It adversarially trains a segmentation network, from which a confidence map is computed as part of a region-attention based semi-supervised learning strategy to include the unlabeled data in training. Weakly or partially supervised learning. In [54], Wang et al. solve weakly-supervised multi-label disease classification from chest x-rays.
To relax the stringent pixel-wise annotation required for image segmentation, weakly supervised methods that use image-level annotations [55] or weak annotations such as dots and scribbles [56] have been proposed. For multi-organ segmentation, Shi et al. [57] learn a single multi-class network from a union of multiple datasets, each with a low sample size and partial organ labels, using newly proposed marginal and exclusion losses. Schlegl et al. [58] build a deep model from only normal images to detect abnormal regions in a test image. Unsupervised learning and disentanglement. Unsupervised learning does not rely on the existence of annotated images. A widely used design is a disentangled network structure trained with an adversarial learning strategy that promotes the statistical matching of deep features. In medical imaging, unsupervised learning and disentanglement have been used in image registration [59], motion tracking [60], artifact reduction [61], improving classification [62], domain adaptation [63], and general modeling [64]. Embedding knowledge into learning. Knowledge arises from various sources, such as imaging physics, statistical constraints, and task specifics, and the ways of embedding it into a DL approach vary as well. For chest x-ray disease classification, Li et al. [65] encode anatomy knowledge embedded in unpaired CT into a deep network that decomposes a chest x-ray into lung, bone, and the remaining structures (see Fig. 2). With augmented bone-suppressed images, classification performance is improved in predicting 11 out of 14 common lung diseases. In [66], lung radiographs are enhanced by learning to extract lung structures from CT-based simulated x-rays (DRRs) and fusing them with the original x-ray image. The enhancement was shown to improve pathology characterization in real x-ray images. In [67], a dual-domain network is proposed to reduce metal artifacts in both the image and sinogram domains, which are seamlessly integrated into one differentiable framework through a Radon inversion layer rather than treated as two separate modules. Federated learning. To combat issues related to data privacy, data security, and data access rights, it has become increasingly important to be able to learn a common, robust model through distributed computing and model aggregation strategies so that no data are transferred outside a hospital or an imaging lab. This research direction is called federated learning (FL) [68], in contrast to conventional centralized learning in which all the local datasets are uploaded to one server. There are many ongoing research challenges related to federated learning, such as reducing the communication burden [69], handling data heterogeneity across local sites [70], and vulnerability to attacks [71]. Despite its importance, work on FL in medical imaging has only been reported recently. Sheller et al. [72] present the first use of FL to train a multi-institutional DL model without sharing patient data and report similar brain lesion segmentation performance between models trained in a federated or centralized way. In [73], Li et al. study several practical FL methods that protect data privacy for brain tumour segmentation on the BraTS dataset and demonstrate a trade-off between model performance and privacy protection costs. Recently, FL has been applied, together with domain adaptation, to train a model with boosted analysis performance and reliable discovery of disease-related biomarkers [74].
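The model-aggregation step at the heart of most of these FL systems is federated averaging; a minimal sketch (the per-site data loaders, the local training routine, and the size-based weighting are placeholder assumptions) is:

```python
import copy
import torch

def federated_round(global_model, sites, local_train):
    """One communication round: each site trains locally, and only model weights are shared."""
    site_states, site_sizes = [], []
    for loader in sites:                              # each site keeps its data in-house
        local_model = copy.deepcopy(global_model)
        local_train(local_model, loader)              # placeholder local optimization
        site_states.append(local_model.state_dict())
        site_sizes.append(len(loader.dataset))
    total = float(sum(site_sizes))
    new_state = copy.deepcopy(site_states[0])
    for key in new_state:                             # weighted average of the parameters
        new_state[key] = sum(
            (n / total) * state[key].float() for state, n in zip(site_states, site_sizes)
        )
    global_model.load_state_dict(new_state)
    return global_model
```

Practical systems add the refinements discussed above (compressed communication, heterogeneity-aware weighting, differential privacy), but the data themselves never leave the participating sites.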
Interpretability. Clinical decision-making relies heavily on evidence gathering and interpretation. Lacking evidence and interpretation, it is difficult for physicians to trust an ML model's prediction, especially in disease diagnosis. In addition, interpretability is also a source of new knowledge. Murdoch et al. [75] define interpretable machine learning as leveraging machine-learning models to extract relevant knowledge about domain relationships contained in data, aiming to provide insights for a user into a chosen domain problem. Most interpretation methods are categorized as model-based or post-hoc interpretability. The former is about constraining the model so that it readily provides useful information (such as sparsity, modularity, etc.) about the uncovered relationships. The latter is about extracting information about what relationships the model has learned. Model-based interpretability. For cardiac MRI classification [76], diagnostically meaningful concepts are encoded in the latent space. In [77], the model for classifying healthy versus hypertrophic cardiomyopathy leverages interpretable, task-specific anatomic patterns learned from 3D segmentations. Post-hoc interpretability. In [78], feature importance scores are calculated for a graph neural network, and its interpretation ability is compared with that of a Random Forest. Li et al. [79] propose a brain biomarker interpretation method that uses a frequency-normalized sampling strategy to corrupt an image. In [80], various interpretability methods are evaluated in the context of semantic segmentation of polyps from colonoscopy images. In [81], a hybrid RBM-Random Forest system for brain lesion segmentation is learned with the goal of enhancing the interpretability of automatically extracted features. Uncertainty quantification characterizes the model prediction with a confidence measure [82], which can be regarded as a form of post-hoc interpretability, even though the uncertainty measure is often calculated along with the model prediction. Recently, works have emerged that quantify uncertainty in deep learning methods for medical image segmentation [83], [84], [85], lesion detection [86], chest x-ray disease classification [87], and diabetic retinopathy grading [88], [89]. One additional extension of uncertainty is its combination with the knowledge that the given labels are noisy. Works are now starting to emerge that take label uncertainty into account in the modeling of the network architecture and its training [90]. Given that DL has been used in a vast number of medical imaging applications, it is nearly infeasible to cover all related literature in a single paper. Therefore, we cover several selected cases that are commonly found in clinical practice, which include chest, neuro, cardiovascular, abdominal, and microscopy imaging. Further, rather than presenting an exhaustive literature survey for each case, we provide some prominent progress highlights in each case study. Lung diseases have high mortality and morbidity. Among the top ten causes of death worldwide we find lung cancer, chronic obstructive pulmonary disease (COPD), pneumonia, and tuberculosis (TB). At the moment of writing this overview, COVID-19 has a death rate comparable to TB. Imaging is highly relevant for diagnosis, treatment planning, and learning more about the causes and mechanisms underlying these and other lung diseases. In addition, pulmonary complications are common in hospitalized patients.
As a result, chest radiography is by far the most common radiological examination, often comprising over a third of all studies in a radiology department. Plain radiography and computed tomography are the two most common modalities for imaging the chest. The high contrast in density between air-filled lung parenchyma and tissue makes CT ideal for in vivo analysis of the lungs, yielding high-quality and high-resolution images even at very low radiation dose. Nuclear imaging (PET or PET/CT) is used for diagnosing and staging oncology patients. MRI is somewhat limited in the lungs but can yield unique functional information. Ultrasound imaging is also difficult because sound waves reflect strongly at boundaries of air and tissue, but point-of-care ultrasound is used in the emergency department and is widely used to monitor COVID-19 patients, for which the first decision support applications based on deep learning have already appeared [91]. Segmentation of anatomical structures. For analysis and quantification from chest CT scans, automated segmentation of major anatomical structures is an important prerequisite. Recent publications demonstrate convincingly that deep learning is now the state-of-the-art method to achieve this. This is evident from inspecting the results of LOLA11, a competition started in 2011 for lung and lobe segmentation in chest CT. The test dataset for this challenge included many challenging cases with lungs affected by severe abnormalities. For years, the best results were obtained by interactive methods. In 2019 and 2020, seven fully automatic methods based on U-Nets or variants thereof made the top 10 for lung segmentation, and for lobe segmentation, two recent methods obtained results outperforming the best interactive methods. Both of these methods [92], [93] were trained on thousands of CT scans from the COPDGene study [94], illustrating the importance of large, high-quality datasets for obtaining good results with deep learning. These data are publicly available on request. Both methods used a multi-resolution U-Net-like architecture with several customizations. Gerard et al. [92] integrated a previously developed method for finding the fissures [95]. Xie et al. [93] added a non-local module with self-attention and fine-tuned their method on data of COVID-19 suspects to accurately segment the lobes in scans affected by ground-glass opacities and consolidations. Segmentation of the vasculature, separated into arteries and veins, and of the airway tree, including labeling of the branches and segmentation of the bronchial wall, is another important area of research. Although methods that use convolutional networks in some of their steps have been proposed, developing an architecture entirely based on deep learning that can accurately track and segment intertwined tree structures and take advantage of the known geometry of these complex structures is still an open challenge. Detection and diagnosis in chest radiography. Recently, the number of publications on detecting abnormalities in the ubiquitous chest x-ray has increased enormously. This trend has been driven by the availability of large public datasets, such as ChestXRay14 [54], CheXpert [96], MIMIC [97], and PadChest [98], totaling 868k images. Labels for the presence or absence of over 150 different abnormal signs were gathered by text-mining the accompanying radiology reports. This makes the labels noisy. Most publications use a standard approach of inputting the entire image into a popular convolutional network architecture.
Methodological contributions include novel ways of preprocessing the images, handling label uncertainty and the large number of classes, suppressing the bones [65], and exploiting self-supervised learning as a way of pretraining. So far, only a few publications analyze multiple exams of the same patient to detect interval change or analyze the lateral views. Decision support in lung cancer screening. Following the positive outcome of the NLST trial, the United States has started a screening program for heavy smokers for early detection of lung cancer with annual low-dose CT scans. Many other countries worldwide are expected to follow suit. In the United States, screening centers have to use a reporting system called Lung-RADS [99]. Reading lung cancer screening CT scans is time-consuming, and therefore automating the various steps in Lung-RADS has received a lot of attention. The most widely studied topic is nodule detection [100]. Nodules may represent lung cancer. Many DL systems were compared in the LUNA16 challenge. Lung-RADS classifies scans into categories based on the most suspicious nodule, which is determined by the nodule type and size. DL systems to determine nodule type have been proposed [101], and measuring the size can be done by traditional methods based on thresholding and mathematical morphology but also with DL networks. Finally, Lung-RADS contains the option to directly refer scans with a nodule that is highly suspicious for cancer. Many DL systems to estimate nodule malignancy have been proposed. The advantage of automating the Lung-RADS guidelines step by step is that this leads to an explainable AI solution that can directly support radiologists in their reading workflow. Alternatively, one could ask a computer to directly predict whether a CT scan contains an actionable lung cancer. This was the topic of a Kaggle challenge organized in 2017 in which almost 2000 teams competed for a one-million-dollar prize purse. The top 10 solutions all used deep learning and are open source. Two years later, a team from Google published an implementation [102] following the approach of the winning team in the Kaggle challenge, employing modern architectures such as a 3D inflated Inception architecture (I3D) [103]. The I3D architecture builds upon the Inception v1 model for 2D image classification but inflates the filters and pooling kernels into 3D. This enables the use of an image classification model pre-trained with 2D data for a 3D image classification task. The paper showed that the model outperformed six radiologists who followed Lung-RADS. The model was also extended to handle follow-up scans, where it obtained performance slightly below human experts. COVID-19 case study. As an illustration of how DL with pretrained elements can be used to rapidly build applications, we briefly discuss the development of two tools for COVID-19 detection, one for chest radiographs and one for chest CT. In March 2020, many European hospitals were overwhelmed by patients presenting at the emergency department with respiratory complaints. There was a shortage of molecular testing capacity for COVID-19, and the turnaround time for test results was often days. Hospitals therefore used chest x-ray or CT to obtain a working diagnosis and to decide whether to hospitalize patients and how to treat them. In just six weeks, researchers from various Dutch and German hospitals, research institutes, and a company managed to create a solution for COVID-19 detection from an x-ray and from a CT scan.
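As an aside on the inflation idea mentioned above, bootstrapping a 3D filter from a pretrained 2D one amounts to repeating the kernel along a new depth axis and rescaling it; a minimal sketch (shapes are placeholders, and this covers only the weight-inflation trick, not the full I3D architecture):

```python
import torch

def inflate_conv_weight(weight_2d, depth):
    """Inflate a pretrained 2D conv kernel [out, in, k, k] into 3D [out, in, depth, k, k]."""
    weight_3d = weight_2d.unsqueeze(2).repeat(1, 1, depth, 1, 1)
    return weight_3d / depth        # rescale so a static volume initially yields the 2D response

# Example with a stand-in for a pretrained 2D kernel (shape is a placeholder).
w2d = torch.randn(64, 3, 7, 7)
w3d = inflate_conv_weight(w2d, depth=7)
print(w3d.shape)                    # torch.Size([64, 3, 7, 7, 7])
```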
Figure 3 shows the output of the resulting CORADS-AI system for a COVID-19 positive case. The x-ray solution started from a convolutional network using local and global labels, pretrained to detect tuberculosis [104], fine-tuned using public and private data of patients with and without pneumonia to detect pneumonia in general, and subsequently fine-tuned on x-ray data from patients of a Dutch hospital in a COVID-19 hotspot. The system was subsequently evaluated on 454 chest radiographs from another Dutch hospital and shown to perform comparably to six chest radiologists [105]. The system is currently being field-tested in Africa. The CT solution, called CO-RADS [106], aimed to automate a clinical reporting system for CT of COVID-19 suspects. This system assesses the likelihood of COVID-19 infection on a scale from CO-RADS 1 (highly unlikely) to CO-RADS 5 (highly likely) and quantifies the severity of disease using a score per lung lobe from 0 to 5, depending on the percentage of affected lung parenchyma, for a maximum CT severity score of 25 points. The previously mentioned lobe segmentation [93] was employed. Abnormal areas in the lung were segmented using a 3D U-Net built with the nnU-Net framework [107], trained in a cross-validated fashion with 108 scans and corresponding reference delineations to segment ground-glass opacities and consolidation in the lungs. The CT severity score was derived from the segmentation results by computing the percentage of affected parenchymal tissue per lobe. nnU-Net was compared with several other approaches and performed best. For assessing the CO-RADS score, the previously mentioned I3D architecture [103] performed best. In recent years, deep learning has seen a dramatic rise in popularity within the neuroimaging community. Many neuroimaging tasks, including segmentation, registration, and prediction, now have deep learning based implementations. Additionally, through the use of deep generative models and adversarial training, deep learning has enabled new avenues of research in complex image synthesis tasks. With the increasing availability of large and diverse pooled neuroimaging studies, deep learning offers interesting prospects for improving accuracy and generalizability while reducing inference time and the need for complex preprocessing. CNNs, in particular, have allowed for efficient network parameterization and spatial invariance, both of which are critical when dealing with high-dimensional neuroimaging data. The learnable feature reduction and selection capabilities of CNNs have proven effective in high-level prediction and analysis tasks and have reduced the need for highly specific domain knowledge. Specialized networks such as U-Nets [25], V-Nets [108], and GANs [31] are also popular in neuroimaging and have been leveraged for a variety of segmentation and synthesis tasks. Neuroimage segmentation and tissue classification. Accurate segmentation is an important preprocessing step that informs much of the downstream analytic and predictive work done in neuroimaging. Commonly used tools such as FreeSurfer [110] rely on atlas-based methods, whereby an atlas is deformably registered to the scan, which requires time-consuming optimization problems to be solved. Proposed deep learning based approaches, however, are relatively computationally inexpensive during inference.
Recent research has focused on important tasks such as deep learning based brain extraction [111], cortical and subcortical segmentation [112], [113], [114], [115], and tumor and lesion segmentation [116], [117]. Some interesting research has looked at improving the generalization performance of deep learning based segmentation methods across neuroimaging datasets acquired on different scanners. In particular, Kamnitsas et al. [46] have proposed a training schema that leverages adversarial training to learn scanner-invariant feature representations. They use an adversarial network to classify the origin of the input data based on the downstream feature representation learned by the segmentation network. By penalizing the segmentation network for improved performance of the adversarial network, they show improved segmentation generalization across datasets. Brain tumor segmentation has been another active area of research in the neuroimaging community where deep learning has shown promise. In the past, brain tumor datasets have remained relatively small, particularly ones with subjects imaged at a single institution. The Brain Tumor Segmentation Challenge (BraTS) [118] has provided the community with an accessible dataset as well as a way to benchmark various approaches against one another. While deep learning has difficulty training on datasets with relatively few scans, new architectures and training methods are becoming increasingly effective at this. Havaei et al. [116] demonstrate the performance of their glioblastoma segmentation network on the BraTS dataset, achieving high accuracy while being substantially faster than previous methods. Another task where deep networks are finding increasing success is semantic segmentation, in which anatomical labels are not necessarily well-defined by image intensity changes but can be identified by relative anatomical locations. A good example is cerebellum parcellation, where deep networks performed best in a recent comparison of methods [119]. The even newer ACAPULCO method [109] uses two cascaded deep networks to produce cerebellar lobule labels, as shown in Figure 4. Deformable image registration. Image registration allows for imaging analysis within a single subject across imaging modalities and time points. Deep learning based deformable registration with neuroimaging data has proven to be a difficult problem, especially considering the lack of ground truth. Still, some unique and varied approaches have achieved state-of-the-art results with relatively fast run times [120], [121], [122], [123]. Li et al. [121] propose a fully convolutional "self-supervised" approach to learn the appropriate spatial transformations at multiple resolutions. Balakrishnan et al. [120] propose a method for unsupervised image registration that attempts to directly compute the deformation field. Neuroimaging prediction. With many architectures being borrowed from the computer vision community, deep learning based prediction in neuroimaging has quickly gained popularity. Traditionally, machine learning based prediction on neuroimaging data has relied on careful feature selection and engineering, often taking the form of regional summary measures that may not account for all the informative variation for a particular task. In deep learning, by contrast, it is common to work with raw imaging data, where the appropriate feature representations can be learned through optimization.
This can be particularly useful for high-level prediction tasks in which we do not know what imaging features will be informative. Further, by working on the raw image, reliance on complex and time-consuming preprocessing can be reduced. In recent years, a large amount of work has been published on deep learning based prediction tasks such as brain age prediction [124], [125], [126], Alzheimer's disease classification and trajectory modeling [127], [128], [129], [130], and schizophrenia classification [131], [132]. Some work has considered the use of deep Siamese networks for longitudinal image analysis. Siamese networks have gained popularity for their success in facial recognition. They work by jointly optimizing a set of weights on two images with respect to some distance metric between them. This setup makes them effective at identifying longitudinal changes along some chosen dimension. Bhagwat et al. [133] consider the use of a longitudinal Siamese network for the prediction of future Alzheimer's disease onset, using two early time points. They show substantially improved performance in identifying future Alzheimer's cases with the use of two time points versus only a baseline scan. The use of GANs in neuroimaging. GANs have enabled complex image synthesis tasks in neuroimaging, many of which have no comparable analogs in traditional machine learning. GANs and their variants have been used in neuroimaging for cross-modality synthesis [134], motion artifact reduction [135], resolution upscaling [136], [137], [138], estimating full-dose PET images from low-dose PET [139], [140], [141], image harmonization [33], [142], heterogeneity analysis [143], and more. To help facilitate such work, the popular MedGAN [144] proposes a series of modifications and new loss functions for traditional GANs, aimed at preserving anatomically relevant information and fine details. It uses auxiliary classifiers on the translated image to ensure that the resulting image feature representation is similar to the expected representation for a given task. Additionally, it uses a style-transfer loss in combination with the adversarial loss to ensure that fine structures and textural details are matched in the translation. Some promising new work attempts to reduce the amount of radioactive tracer needed in PET imaging, potentially reducing associated costs and health risks. This problem can be framed as an image synthesis task, whereby the final image is synthesized from the low-dose image. In [145], pixel location information is integrated into a deep network for image synthesis. Kaplan and Zhu [141] propose a deep generative denoising method that uses paired scans of subjects imaged with both low and full dose PET. They show that despite a ten-fold reduction in tracer material, they are able to preserve important edge, structural, and textural details. Consistent quantification in neuroimaging has been hampered for decades by the high variability in MR image intensities and resolutions between scans. Dewey et al. [33] use a U-Net style architecture and paired subjects who have been scanned with two different protocols to learn a mapping between the two sites. Resolution differences are addressed by applying a super-resolution method to the images acquired at lower resolutions [146]. They are able to use the network to reduce site-based variation, which improves consistency in segmentation between the two sites.
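As a minimal sketch of the Siamese setup described above (the encoder, input size, and prediction head are placeholders and not the architecture of [133]), the same weights are applied to both time points and the prediction is made from the baseline embedding together with its longitudinal change:

```python
import torch
import torch.nn as nn

class LongitudinalSiamese(nn.Module):
    """Shared-weight encoder applied to two time points of the same subject."""
    def __init__(self, embed_dim=128, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(            # placeholder 3D encoder
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )
        self.head = nn.Linear(2 * embed_dim, num_classes)

    def forward(self, scan_t0, scan_t1):
        z0, z1 = self.encoder(scan_t0), self.encoder(scan_t1)   # identical weights for both scans
        change = z1 - z0                          # longitudinal change in embedding space
        return self.head(torch.cat([z0, change], dim=1))

model = LongitudinalSiamese()
out = model(torch.randn(2, 1, 32, 32, 32), torch.randn(2, 1, 32, 32, 32))
print(out.shape)                                  # torch.Size([2, 2])
```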
While deep learning in neuroimaging has certainly opened up many interesting avenues of investigation, certain areas still lack a rigorous understanding. Important lines of research such as learning from limited data, optimal hyperparameter selection, domain adaptation, semi-supervised designs, and improving robustness require further investigation. The quantification and understanding of cardiac anatomy and function have been transformed by recent progress in the field of data-driven deep learning. There has been significant recent work in a variety of sub-areas of cardiovascular imaging, including image reconstruction [147], end-to-end learning of cardiac pathology from images [148], and incorporation of non-imaging information (e.g., genetics [149] and clinical information) for analysis. Here we briefly focus on three key aspects of deep learning in this field: cardiac chamber segmentation, cardiac motion/deformation analysis, and analysis of cardiac vessels. Motion tracking and segmentation both play crucial roles in the detection and quantification of myocardial chamber dysfunction and can help in the diagnosis of cardiovascular disease (CVD). Traditionally, these tasks are treated separately and solved as distinct steps. Oftentimes, motion tracking algorithms use segmentation results as an anatomical guide to sample points and regions of interest used to generate displacement fields, e.g., [150]. In part because of this, there have also been efforts to combine motion tracking and segmentation. Cardiac image segmentation is an important first step for many clinical applications. The aim is typically to segment the main chambers, e.g., the left ventricle (LV), right ventricle (RV), left atrium (LA), and right atrium (RA). This enables the quantification of parameters that describe cardiac morphology, e.g., LV volume or mass, or cardiac function, e.g., wall thickening and ejection fraction. There has been significant deep learning work on cardiac chamber segmentation, mostly characterized by the type of images (modalities) employed and whether the work is in 2D or 3D. One of the first efforts to apply a fully convolutional network (FCN) [151] to segment the left ventricle (LV), myocardium, and right ventricle from 2D short-axis cardiac magnetic resonance (MR) images was by Tran [152], significantly outperforming traditional methods in accuracy and speed. Since then, a variety of other FCN-based strategies have been developed [153], especially focusing on the popular U-Net approach, often including both 2D and 3D constraints (e.g., [154]). The incorporation of spatial and temporal context has also been an important research direction, including efforts to simultaneously segment the heart in both the end-diastolic and end-systolic states [155]. Shape-based constraints had previously been found useful for LV chamber segmentation using other types of machine learning (e.g., [156]) and were nicely included in an anatomically-constrained deep learning strategy by [157]. This stacked convolutional autoencoder approach was also successfully applied to LV segmentation from 3D echocardiography data. Other important work has been aimed at atrial segmentation from MRI [158], whole-heart segmentation from CT [159], and LV segmentation from 3D ultrasound image sequences (e.g., [160]), the latter using a combination of atlas registration and adversarial learning. Progress in deep learning for cardiac segmentation is enabled by a number of ongoing challenges in the field (e.g., [161], [162]).
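As an example of the quantification such segmentations enable, LV volume and ejection fraction follow directly from end-diastolic and end-systolic label masks; a minimal sketch (the label convention and voxel spacing are placeholder assumptions):

```python
import numpy as np

def lv_volume_ml(mask, voxel_spacing_mm, lv_label=1):
    """Volume of the LV blood pool in millilitres from a labelled segmentation mask."""
    voxel_volume_mm3 = float(np.prod(voxel_spacing_mm))
    return (mask == lv_label).sum() * voxel_volume_mm3 / 1000.0

def ejection_fraction(ed_mask, es_mask, voxel_spacing_mm):
    edv = lv_volume_ml(ed_mask, voxel_spacing_mm)   # end-diastolic volume
    esv = lv_volume_ml(es_mask, voxel_spacing_mm)   # end-systolic volume
    return 100.0 * (edv - esv) / edv                # ejection fraction in percent

# Toy example with synthetic masks (1 = LV blood pool); spacing is slice thickness x in-plane spacing.
ed = np.zeros((10, 64, 64), dtype=np.uint8); ed[3:8, 20:40, 20:40] = 1
es = np.zeros((10, 64, 64), dtype=np.uint8); es[4:7, 25:35, 25:35] = 1
print(ejection_fraction(ed, es, voxel_spacing_mm=(8.0, 1.25, 1.25)))
```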
Cardiac motion tracking is key for deformation/strain analysis and is important for analyzing the mechanical performance of heart chambers. A variety of image registration, feature-based tracking, and regularization methods using both biomechanical models and data-driven learning have been developed. One special type of dataset useful for tracking is the MRI tagging acquisition, and deep learning has recently played a role in tracking these tags and quantifying the displacement information for motion tracking and analysis [164], using a combination of recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to estimate myocardial strain from short-axis MR tag image sequences. Estimating motion displacements and strain is also possible from both standard MR image sequences and 4D echocardiography, most often by integrating ideas of image segmentation and mapping between frames using some type of image registration. Recent efforts for cardiac motion tracking from magnetic resonance (MR) imaging have adopted approaches from the computer vision field which suggest that the tasks of motion tracking and segmentation are closely related and that information used to complete one task may complement and improve the overall performance of the other. In particular, an interesting deep learning approach proposed for joint learning of video object segmentation and optical flow (motion displacements) is SegFlow [165], an end-to-end unified network that simultaneously trains both tasks and exploits their commonality through bi-directional feature sharing. Among the first to integrate this idea into cardiac analysis were Qin et al. [60], who successfully implemented the idea of combining motion and segmentation on 2D cardiac MR sequences by developing a dual Siamese-style recurrent spatial transformer network and fully convolutional segmentation network to simultaneously estimate motion and generate segmentation masks. This work was mainly aimed at 2D MR images, which have higher SNR than echocardiographic images and, therefore, more clearly delineated LV walls. It remains challenging to directly apply this approach to echocardiography. Very recent efforts by Ta et al. [163] (see Fig. 5) propose a 4D (3D+t) semi-supervised joint network to simultaneously track LV motion while segmenting the LV wall. The network is trained in an iterative manner where results from one branch influence and regularize the other. Displacement fields are further regularized by a biomechanically-inspired incompressibility constraint that enforces realistic cardiac motion behavior. The proposed model differs from other models in that it expands the network to 4D in order to capture out-of-plane motion. Finally, clinical interpretability of deep learning-derived motion information will be an important topic in the years ahead (e.g., [166]). Cardiac vessel segmentation is another important task for cardiac image analysis and includes the segmentation of the great vessels (e.g., aorta, pulmonary arteries and veins) as well as the coronary arteries. The segmentation of large vessels such as the aorta is important for accurate mechanical and hemodynamic characterization, e.g., for the assessment of aortic compliance. Several deep learning approaches have been proposed for this segmentation task, including the use of recurrent neural networks to track the aorta in cardiac MR image sequences in the presence of noise and artefacts [167].
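Once a dense displacement field has been estimated, strain follows from its spatial gradients; a minimal sketch of the Green-Lagrange strain computation (the array layout and voxel spacing are assumptions, and the incompressibility constraint mentioned above corresponds to the determinant of F staying close to 1):

```python
import numpy as np

def green_lagrange_strain(disp, spacing=(1.0, 1.0, 1.0)):
    """Green-Lagrange strain tensor field from a dense 3D displacement field.

    disp: array of shape (3, Z, Y, X) holding the z, y, x displacement components.
    Returns an array of shape (3, 3, Z, Y, X)."""
    grads = np.stack([np.stack(np.gradient(disp[i], *spacing), axis=0)
                      for i in range(3)], axis=0)          # grads[i, j] = d u_i / d x_j
    identity = np.eye(3).reshape(3, 3, 1, 1, 1)
    deformation = identity + grads                          # F = I + grad(u)
    # E = 0.5 * (F^T F - I), contracting over the row index of F
    ftf = np.einsum('ki...,kj...->ij...', deformation, deformation)
    return 0.5 * (ftf - identity)
```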
A similarly important task is the segmentation of the coronary arteries as a precursor to quantitative analysis for the assessment of stenosis or the simulation of blood flow for the calculation of fractional flow reserve from CT angiography (CTA). The approaches for coronary artery segmentation can be divided into those that extract the vessel centerline and those that segment the vessel lumen. One end-to-end trainable approach for the extraction of the coronary centerline has been proposed in [168]. In this approach, the centerline is extracted using a multi-task fully convolutional network which simultaneously computes centerline distance maps and detects branch endpoints. The method generates single-pixel-wide centerlines with no spurious branches. An interesting aspect of this technique is that it can handle an arbitrary vessel tree with no prior assumption regarding the depth of the vessel tree or its bifurcation pattern. In contrast to this, Wolterink et al. [169] propose a CNN that is trained to predict the most likely direction and radius of the coronary artery within a local 3D image patch. Starting from a seed point, the coronary artery is tracked by following the vessel centerline using the predictions of the CNN. Alternative approaches to centerline extraction are based on techniques that instead aim to segment the vessel lumen, e.g., using CNN segmentation methods that predict vessel probability maps. One elegant approach has been proposed by Moeskops et al. [170]: here, a single CNN is trained to perform three different segmentation tasks, including coronary artery segmentation in cardiac CTA. Instead of performing voxelwise segmentation, Lee et al. [171] introduce a tubular shape prior for the vessel segments. This is implemented via a template transformer network, through which a shape template can be deformed via network-based registration to produce an accurate segmentation of the input image as well as to guarantee topological constraints. More recently, geometric deep learning approaches have also been applied to coronary artery segmentation. For example, Wolterink et al. [172] used graph convolutional networks for coronary artery segmentation. Here, vertices on the surface of the coronary artery are used as graph nodes and their locations are optimized in an end-to-end fashion. Recently there has been accelerating progress in automated detection, classification, and segmentation of abdominal anatomy and disease using medical imaging. Large public data sets such as the MICCAI Data Decathlon and DeepLesion data sets have facilitated progress [173], [174]. Organs and lesions. Multi-organ approaches have been popular methods for anatomy localization and segmentation [175]. For individual organs, the liver, prostate, and spine are arguably the most accurately segmented structures and the most actively investigated with deep learning. Other organs of interest to deep learning researchers include the pancreas, lymph nodes, and bowel. A number of studies have used U-Net to segment the liver and liver lesions and to assess for hepatic steatosis [176], [177], [178]. Dice coefficients for liver segmentation typically exceed 95%. In the prostate, gland segmentation and lesion detection have been the subject of an SPIE/AAPM challenge (competition) and numerous publications [179], [180].
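For reference, the Dice coefficients quoted in this subsection measure the voxel overlap between an automated and a reference segmentation; a minimal implementation is:

```python
import numpy as np

def dice_coefficient(pred, ref, label=1):
    """Dice overlap between a predicted and a reference label mask (1.0 = perfect agreement)."""
    p = (pred == label)
    r = (ref == label)
    denom = p.sum() + r.sum()
    if denom == 0:
        return 1.0                 # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(p, r).sum() / denom
```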
Several groups have used data sets such as the TCIA CT pancreas collection to improve pancreas segmentation, with Dice coefficients reaching the mid-0.80s [181], [182], [183], [184]. Automated detection of pancreatic cancer using deep learning has also been reported [184]. Deep learning has been used for determining pancreatic tumor growth rates in patients with neuroendocrine tumors of the pancreas [185]. The spleen has been segmented with a Dice score of 0.962 [186]. Recently, marginal loss and exclusion loss [57] have been proposed to train a single multi-organ segmentation network from a union of partially labelled datasets. Enlarged lymph nodes can indicate the presence of inflammation, infection, or metastatic cancer. Studies have assessed abdominal lymph nodes on CT in general and for specific diseases such as prostate cancer [187], [22], [188]. The TCIA CT lymph node dataset has enabled progress in this area [189]. In the bowel, CT colonography computer-aided polyp detection was a hot topic in abdominal CT image analysis over a decade ago. Recent progress with deep learning has been limited, but studies have reported improved electronic bowel cleansing and higher sensitivities with lower false-positive rates for precancerous colonic polyp detection [187], [190]. Deep learning using persistent homology has recently shown success for small bowel segmentation on CT [191]. Colonic inflammation can be detected on CT with deep learning [192]. Appendicitis can be detected on CT scans by pre-training with natural-world videos [193]. The Inception V3 convolutional neural network could detect small bowel obstruction on abdominal radiographs [194]. Kidney function can be predicted using deep learning of ultrasound images [195]. Potentially diffuse disorders such as ovarian cancer and abnormal blood collections were detectable using deep learning [196], [197]. Organs at risk for radiation therapy of the male pelvis, such as the bladder and rectum, have been segmented on CT using U-Net [198]. Universal lesion detectors [174], [199] have been developed for body CT, including abdominal CT (Figure 6). The universal lesion detector identifies, classifies, and measures lymph nodes and a variety of tumors throughout the abdomen. This detector was trained using the publicly available DeepLesion data set. Opportunistic screening to quantify and detect under-reported chronic diseases has been an area of recent interest. Example deep learning methods for opportunistic screening in the abdomen include automated bone mineral densitometry, visceral fat assessment, muscle volume and quality assessment, and aortic atherosclerotic plaque quantification [200]. Studies indicate that these measurements can be done accurately and generalize well to new patient populations. These opportunistic screening assessments also enable prediction of survival and cardiovascular morbidity such as heart attack and stroke [200]. Deep learning for abdominal imaging is likely to continue to advance rapidly. For translation to the clinic, some of the most important advances sought will be in demonstrating generalizability across different patient populations and variations in image acquisition. With the advent of whole-slide scanning and the development of large digital datasets of tissue slide images, there has been a significant increase in the application of deep learning approaches to digital pathology data [201].
While the initial application of these approaches in digital pathology primarily focused on the detection and segmentation of individual primitives like lymphocytes and cancer nuclei, they have now progressed to addressing higher-level diagnostic and prognostic tasks, as well as predicting the underlying molecular characteristics and mutational status of the disease. Below we briefly describe the evolving applications of DL approaches to digital pathology.
Nuclei detection and segmentation. One of the early applications of DL to whole slide pathology images was in the detection and segmentation of individual nuclei. Xu et al. [202] presented a stacked sparse autoencoder approach to identify the locations of individual cancer nuclei in breast cancer pathology images. Subsequently, work from Janowczyk et al. [203] demonstrated the utility of DL approaches for identifying and segmenting a number of different histologic primitives, including lymphocytes, tubules, mitotic figures, and cancer extent, and also for classifying different disease categories pertaining to leukemia. This comprehensive tutorial also went into great detail regarding best practices for annotation, network training, and testing protocols. Subsequently, Cruz-Roa et al. demonstrated that convolutional neural networks could accurately identify cancer presence and extent on whole slide breast cancer pathology images [204]. The approach was shown to have 100% accuracy at identifying the presence or absence of cancer at the slide or patient level. Cruz-Roa et al. also demonstrated a high-throughput adaptive sampling approach for improving the efficiency of the CNN presented previously [205]. In [206], Veta and colleagues discussed the diagnostic assessment of DL algorithms for the detection of lymph node metastases in women with breast cancer, as part of the CAMELYON16 challenge. The work found that at least 5 DL algorithms performed comparably to a pathologist interpreting the slides in the absence of time constraints, and that some DL algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic a routine pathology workflow. In a related study of lung cancer pathology images, Coudray et al. [207] trained a deep CNN (Inception V3) on WSIs from The Cancer Genome Atlas (TCGA) to accurately and automatically classify them as lung adenocarcinoma, squamous cell carcinoma, or normal lung tissue, yielding an area under the curve of 0.97. One of the challenges with the CNN-based approaches described in [204], [205], [206], [207] is the need for detailed annotations of the target of interest. This is a labor-intensive task given that annotations of disease extent typically need to be provided by pathologists, who have little time to spare. In a comprehensive paper by Campanella et al. [208], the team employed a weakly supervised approach for training a deep learning algorithm to identify the presence or absence of cancer at the slide level. In a large-scale study of over 44,000 WSIs from over 15,000 patients, they demonstrated that for prostate cancer, basal cell carcinoma, and breast cancer metastases to axillary lymph nodes, the corresponding areas under the curve were all above 0.98. The authors suggested that the approach could be used by pathologists to exclude 65-75% of slides while retaining 100% cancer detection sensitivity.
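The weakly supervised, slide-level training described above is commonly formulated as multiple-instance learning, where each slide is treated as a bag of patch features and only the bag label is known. The sketch below shows a simplified max-pooling variant in PyTorch; it illustrates the general idea only and is not the pipeline of [208], and the feature dimension, model, and optimizer settings are assumptions.

import torch
import torch.nn as nn

class MaxPoolMIL(nn.Module):
    # A slide is a bag of patch features; the slide-level cancer logit is the
    # maximum over per-patch logits, so only slide-level labels are required.
    def __init__(self, feat_dim=512):
        super().__init__()
        self.patch_scorer = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, patch_feats):                          # (num_patches, feat_dim)
        logits = self.patch_scorer(patch_feats).squeeze(-1)  # (num_patches,)
        return logits.max()                                  # slide-level logit

# Illustrative training step with a hypothetical bag of patch features.
model = MaxPoolMIL()
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

bag = torch.randn(1000, 512)        # e.g., CNN features of 1000 tiles from one slide
slide_label = torch.tensor(1.0)     # slide-level label only (cancer present)
optimizer.zero_grad()
loss = criterion(model(bag), slide_label)
loss.backward()
optimizer.step()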
Disease grading. Pathologists can reliably identify disease type and extent on H&E slides and have observed for decades that there are features of disease that correlate with its behavior. However, they are unable to reproducibly identify or quantify these histologic hallmarks of disease behavior with enough rigor to use these features routinely to predict disease outcome and treatment response. One area in which DL has been applied is in mimicking the pathologist's identification of disease hallmarks, especially in the context of cancer. For instance, in prostate cancer, pathologists typically aim to place the cancer into one of five categories, referred to as the Gleason grade groups [209]. However, this grading system, as with those for many other cancers and diseases, is subject to inter-reader variability and disagreement. Consequently, a number of recent DL approaches for prostate cancer grading have been presented. Bulten et al. [210] and Strom et al. [211] both recently published large cohort studies, involving 1243 and 976 patients respectively, showing that DL approaches could achieve Gleason grading performance comparable to that of pathologists [212].
Mutation identification and pathway association. Disease morphology reflects the sum of all genetic and epigenetic changes and alterations accumulated within the disease over time. Recognizing this, some groups have begun to explore the role of DL approaches in identifying disease-specific mutations and associations with biological pathways. Oncotype DX is a 21-gene expression assay that is prognostic and predictive of the benefit of adjuvant chemotherapy for early-stage estrogen receptor positive breast cancers. In two related studies [213], [214], Romo-Bucheli et al. showed that DL could be used to identify tubule density and mitotic index from pathology images and demonstrated a strong association between these measurements and the Oncotype DX risk categories (low, intermediate, and high) for breast cancers. Interestingly, tubule density and mitotic index are important components of breast cancer grading. Microsatellite instability (MSI) is a condition that results from impaired DNA mismatch repair. To assess whether a tumor exhibits MSI, genetic or immunohistochemical tests are required. A study by Kather et al. [215] showed that DL could predict MSI from histology images in gastrointestinal cancers with an AUC of 0.84. Coudray et al. [207] showed that a DL network can be trained to recognize many of the commonly mutated genes in non-small cell lung adenocarcinoma. They showed that six of these mutated genes (STK11, EGFR, FAT1, SETBP1, KRAS, and TP53) can be predicted from pathology images, with AUCs from 0.733 to 0.856.
Survival and disease outcome prediction. More recently, there has been interest in applying DL algorithms to pathology images to directly predict survival and disease outcome. In a recent paper, Skrede et al. [216] used DL on histology slides to predict outcome in colorectal cancer, and Courtiol et al. [217] presented an approach employing DL for predicting patient outcome in mesothelioma. Saillard et al. [218] used DL to predict survival after hepatocellular carcinoma resection. While the studies described above clearly reflect the growing influence and impact of DL on a variety of image analysis and classification problems in digital pathology, there are still concerns with regard to interpretability, the need for large training sets, the need for annotated data, and generalizability.
Attempts have been made to use approaches like visual attention mapping [219] to provide some degree of transparency with respect to where in the image the DL network appears to be focusing its attention. Another way to imbue interpretability is via hybrid approaches, wherein DL is used to identify specific primitives of interest (e.g., lymphocytes) in the pathology images, in other words serving as a detection and segmentation tool, and handcrafted features derived from these primitives (e.g., spatial patterns of arrangement of lymphocytes) are then used to perform prognosis and classification tasks [220], [221], [222]. However, as Bera et al. [201] noted in a recent review article, while DL approaches might be feasible for diagnostic indications, clinical tasks relating to outcome prediction and treatment response might still require approaches that provide greater interpretability. While research in DL and its application to digital pathology seem highly likely to continue to grow, it remains to be seen how these approaches fare in prospective and clinical trial settings, which in turn might ultimately determine their translation to the clinic.
III. DISCUSSION
Technical challenges ahead. In this overview paper, many technological challenges across several medical domains and tasks were reviewed. In general, most challenges are met by continuous improvement of solutions to the well-known data challenge. The community as a whole is continuously developing and improving transfer learning based solutions and data augmentation schemes. As systems start to be deployed across datasets, hospitals, and countries, a new spectrum of challenges is arising, including system robustness and generalization across acquisition protocols, machines, and hospitals. Here, data pre-processing, continuous model learning, and fine-tuning across systems are a few of the developments ahead. Detailed reviews of the topics presented herein, as well as additional topics such as robustness to adversarial attacks and interpretability, can be found in several recent DL review articles such as [223].
How do we get new tools into the clinic? The question of whether DL tools are used in the clinic is often raised. This question is particularly relevant because results on many tasks and challenges show radiologist-level performance. In several recent works conducted to estimate the utility of AI-based technology as an aid to the radiologist, it has been consistently shown that human experts with AI perform better than those without AI [224]. The excitement in the field has led to the emergence of many AI medical imaging startup companies. Still, to date not much technology has made its way from research to actual clinical use. There are a variety of reasons for this, including: users being cautious regarding the technology, specifically the prospect of being replaced by AI; the need to prove that the technology addresses real user needs and brings quantifiable benefits; regulatory pathways that are long and costly; patient safety considerations; and economic factors such as who will pay for AI tools. The forecast is that this is an emerging field with enormous promise. How would we get there?
An interesting possibility is that the worldwide experience with the COVID-19 pandemic will actually serve to bridge the gap between clinical need and AI, with users more eager to receive support and even regulatory processes more adaptive, facilitating the transition of general computational tools and of COVID-related AI tools in particular. Within the last several months, we have seen several interesting advances. We experienced the ability of AI to rapidly adapt from existing pretrained models to the new disease manifestations of COVID-19, using the many tools described in this article and in [225], [105]. We see strong and robust DL-based solutions for COVID-19 detection, localization, quantification, and characterization starting to support the initial diagnosis and, even more so, the follow-up of hospitalized patients. AI-based tools are being developed to support the assessment of disease severity and, more recently, to assess treatment and predict treatment success [226], [227].
Future promise. As we envision future possibilities, one immediate step forward is to combine the image with additional clinical context, from the patient record to additional clinical descriptors (such as blood tests, genomics, medications, vital signs, and non-imaging data such as the ECG). This step will provide a transition from image space to patient-level information. Collecting cohorts will enable population-level statistical analysis to learn about disease manifestations, treatment responses, adverse reactions to and interactions between medications, and more. This step requires building complex infrastructure, along with the generation of new privacy and security regulations, between hospitals and academic research institutes, across hospitals, and in multi-national consortia. As more and more data become available, DL and AI will enable unsupervised exploration within the data, providing for new discoveries of drugs and treatments and for the advancement and augmentation of healthcare as we know it.
ACKNOWLEDGEMENT
Ronald M. Summers was supported in part by the National Institutes of Health Clinical Center. Anant Madabhushi acknowledges research support from the National Institutes of Health under award numbers 1U24CA199374-01, R01CA202752-01A1, R01CA208236-01A1, R01 CA216579-01A1, R01 CA220581-01A1, 1U01 CA239055-01, 1U01CA248226-01, 1R43EB028736-01, and the VA Merit Review Award IBX004121A from the United States Department of Veterans Affairs Biomedical Laboratory Research and Development Service. We extend our special acknowledgement to Vishnu Bashyam for his help.
Handbook of Medical Imaging Informatics in radiology: radiology gamuts ontology: differential diagnosis for the semantic web Statistical validation of image segmentation quality based on a spatial overlap index1: scientific reports Handbook of Medical Imaging: Medical image processing and analysis Medical Imaging Signals and Systems Handbook of Medical Image Processing and Analysis Handbook of Medical Image Computing and Computer Assisted Intervention Deep learning Guest editorial deep learning in medical imaging: Overview and future promise of an exciting new technique A survey on deep learning in medical image analysis Deep learning for medical image analysis Deep learning in medical image analysis Deep learning applications in medical image analysis Generative adversarial network in medical imaging: A review Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis Deep learning techniques for medical image segmentation: Achievements and challenges Biomedical imaging and analysis in the age of big data and deep learning Deep learning in medical image registration: a survey ImageNet classification with deep convolutional neural networks Chest pathology detection using deep learning with non-medical training Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs GAN-based synthetic medical image augmentation for increased CNN performance in liver lesion classification U-net: Convolutional networks for biomedical image segmentation Very deep convolutional networks for large-scale image recognition Going deeper with convolutions Deep residual learning for image recognition Densely connected convolutional networks Holistically-nested edge detection Generative adversarial nets Machine learning for tomographic imaging DeepHarmony: a deep learning approach to contrast harmonization across scanner changes Automatic liver segmentation using an adversarial image-to-image network Show, attend and tell: Neural image caption generation with visual attention Squeeze-and-excitation networks Self-attention generative adversarial networks Attention u-net: Learning where to look for the pancreas Neural architecture search: A survey V-nas: Neural architecture search for volumetric medical image segmentation MobileNets: Efficient convolutional neural networks for mobile vision applications ShuffleNet: An extremely efficient convolutional neural network for mobile devices Transfusion: Understanding transfer learning for medical imaging 3D anisotropic hybrid network: Transferring convolutional features from 2D images to 3D anisotropic volumes Med3D: Transfer learning for 3D medical image analysis Unsupervised domain adaptation in brain lesion segmentation with adversarial networks Translating and segmenting multimodal medical volumes with cycle-and shape-consistency generative adversarial network Unsupervised cross-modality domain adaptation of convnets for biomedical image segmentations with adversarial loss 3D U 2 -net: A 3D universal u-net for multi-domain medical image segmentation Models genesis: Generic autodidactic models for 3d medical image analysis Rubik's cube+: A self-supervised feature learning framework for 3D medical image analysis Semisupervised learning for network-based cardiac MR image segmentation ASDNet: Attention based semi-supervised deep 
networks for medical image segmentation Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases Weakly supervised histopathology cancer image segmentation and classification Constrained-CNN losses for weakly supervised segmentation Marginal loss and exclusion loss for partially supervised multi-organ segmentation f-AnoGan: Fast unsupervised anomaly detection with generative adversarial networks Unsupervised deformable registration for multi-modal images via disentangled representations Joint learning of motion estimation and segmentation for cardiac MR image sequences ADN: Artifact disentanglement network for unsupervised metal artifact reduction Improving CNN training using disentanglement for liver lesion classification in CT Unsupervised domain adaptation via disentangled representations: Application to cross-modality liver segmentation Disentangled representation learning in cardiac image analysis Encoding CT anatomy knowledge for unpaired chest X-Ray image decomposition Lung structures enhancement in chest radiographs via ct based fcnn training DudoNet: Dual domain network for CT metal artifact reduction Federated machine learning: Concept and applications Federated learning: Strategies for improving communication efficiency Federated learning with non-IID data How to backdoor federated learning Multi-institutional deep learning modeling without sharing patient data: A feasibility study on brain tumor segmentation Privacy-preserving federated brain tumour segmentation Multi-site fMRI analysis using privacy-preserving federated learning and domain adaptation: Abide results terpretable machine learning: definitions, methods, and applications Global and local interpretability for cardiac MRI classification Learning interpretable anatomical features through deep generative models: Application to cardiac remodeling Graph neural network for interpreting task-fMRI biomarkers Brain biomarker interpretation in ASD using deep learning and fMRI Uncertainty and interpretability in convolutional neural networks for semantic segmentation of colorectal polyps Enhancing interpretability of automatically extracted machine learning features: application to a RBM-random forest system on brain lesion segmentation Dropout as a bayesian approximation: Representing model uncertainty in deep learning Phiseg: Capturing uncertainty in medical image segmentation Estimating uncertainty in MRFbased image segmentation: A perfect-MCMC approach Assessing reliability and challenges of uncertainty estimations for medical image segmentation Exploring uncertainty measures in deep networks for multiple sclerosis lesion detection and segmentation Quantifying and leveraging classification uncertainty for chest radiograph assessment Expert-validated estimation of diagnostic uncertainty for deep neural networks in diabetic retinopathy detection DR|GRADUATE: uncertaintyaware deep learning-based diabetic retinopathy grading in eye fundus images Training a neural network based on unreliable human annotation of medical images Deep learning for classification and localization of covid-19 markers in point-of-care lung ultrasound Pulmonary lobe segmentation using a sequence of convolutional neural networks for marginal learning Relational modeling for robust and efficient pulmonary lobe segmentation in CT scans Genetic epidemiology of COPD (COPDGene) study design FissureNet: A deep learning approach for pulmonary fissure detection in ct images 
CheXpert: A large chest radiograph dataset with uncertainty labels and expert comparison MIMIC-CXR-JPG, a large publicly available database of labeled chest radiographs PadChest: A large chest x-ray image dataset with multi-label annotated reports Lung CT Screening Reporting & Data System v1.1 Evolving the pulmonary nodules diagnosis from classical approaches to deep learning-aided decision support: three decades' development course and future prospect Towards automatic pulmonary nodule management in lung cancer screening with deep learning End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography Quo Vadis Computer aided detection of tuberculosis on chest radiographs: An evaluation of the CAD4TB v6 system COVID-19 on the chest radiograph: A multi-reader evaluation of an AI system CO-RADS -a categorical CT assessment scheme for patients with suspected COVID-19: definition and evaluation Automated design of deep learning methods for biomedical image segmentation V-net: Fully convolutional neural networks for volumetric medical image segmentation Automatic cerebellum anatomical parcellation using u-net with locally constrained optimization FreeSurfer Deep MRI brain extraction: A 3D convolutional neural network for skull stripping 3D fully convolutional networks for subcortical segmentation in MRI: A large-scale study VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images DeepNAT: Deep convolutional neural network for segmenting neuroanatomy 3d whole brain segmentation using spatially localized atlas network tiles Brain tumor segmentation with deep neural networks Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation Identifying the best machine learning algorithms for brain tumor segmentation, progression assessment, and overall survival prediction in the BRATS challenge Comparing fully automated state-of-the-art cerebellum parcellation from magnetic resonance images VoxelMorph: a learning framework for deformable medical image registration Non-rigid image registration using fully convolutional networks with deep self-supervision A CNN regression approach for real-time 2D/3D registration QuickSilver: Fast predictive image registration-a deep learning approach MRI signatures of brain age and disease over the lifespan Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker Brain age prediction using deep learning uncovers associated sequence variants Early diagnosis of Alzheimer's disease with deep learning Multimodal and multiscale deep neural networks for the early diagnosis of Alzheimer's disease using structural MR and FDG-PET images Brain MRI analysis for Alzheimer's disease diagnosis using an ensemble system of deep convolutional neural networks Landmark-based deep multiinstance learning for brain disease diagnosis Discriminating schizophrenia using recurrent neural network applied on time courses of multi-site FMRI data Multi-site diagnostic classification of schizophrenia using discriminant deep learning with functional connectivity MRI Modeling and prediction of clinical symptom trajectories in Alzheimer's disease using longitudinal data Deep MR to CT synthesis using unpaired data Motion artifact reduction in abdominal MR imaging using the U-NET network Brain MRI super resolution using 3D deep densely connected neural networks Brain MRI super-resolution using deep 3D convolutional networks Super-resolution PET 
imaging using convolutional neural networks Ultra-low-dose 18F-florbetaben amyloid PET imaging using deep learning with multi-contrast MRI inputs 200x low-dose PET reconstruction using deep learning Full-dose PET image estimation from lowdose PET image using deep learning: a pilot study Interscanner harmonization of high angular resolution DW-MRI using null space deep learning Smile-GANs: Semi-supervised clustering via GANs for dissecting brain disease heterogeneity from medical images MedGAN: Medical image translation using GANs Cross-domain synthesis of medical images using efficient location-sensitive deep network A deep learning based anti-aliasing self super-resolution algorithm for MRI From compressedsensing to artificial intelligence-based cardiac mri reconstruction Explainable cardiac pathology classification on cine MRI with motion characterization by semisupervised learning of apparent flow Artificial intelligence for cardiac imaging-genetics research Flow network tracking for spatiotemporal and periodic point matching: Applied to cardiac motion analysis Fully convolutional networks for semantic segmentation A fully convolutional neural network for cardiac segmentation in short-axis MRI Automated cardiovascular magnetic resonance image analysis with fully convolutional networks Automatic cardiac disease assessment on cine-MRI via timeseries segmentation and domain specific features Automatic segmentation and disease classification using cardiac cine mr images Contour tracking in echocardiographic sequences via sparse representation and dictionary learning Anatomically Constrained Neural Networks (AC-NNs): Application to Cardiac Image Enhancement and Segmentation Fully automatic left atrium segmentation from late gadolinium enhanced magnetic resonance imaging using a dual fully convolutional neural network Multi-depth fusion network for whole-heart CT image segmentation 3D left ventricle segmentation on echocardiography with atlas guided generation and voxel-to-voxel discrimination Left ventricle full quantification challenge (miccai 2019) Multi-centre, multi-vendor & multi-disease cardiac image segmentation challenge (m&ms))," in Medical Image Computing and Computer Assisted Intervention (MICCAI) A semisupervised joint network for simultaneous left ventricular motion tracking and segmentation in 4d echocardiography Fully Automated Myocardial Strain Estimation from Cardiovascular MRI-tagged Images Using a Deep Learning Framework in the UK Biobank SegFlow: Joint learning for video object segmentation and optical flow Deeplearning cardiac motion analysis for human survival prediction Recurrent neural networks for aortic image sequence segmentation with sparse annotations DeepCenterline: A multi-task fully convolutional network for centerline extraction Coronary artery centerline extraction in cardiac CT angiography using a CNN-based orientation classifier Deep learning for multi-task medical image segmentation in multiple modalities TETRIS: Template transformer networks for image segmentation with shape priors Graph convolutional networks for coronary artery segmentation in cardiac CT angiography A large annotated medical image dataset for the development and evaluation of segmentation algorithms DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning Computational anatomy for multi-organ analysis in medical imaging: A review Modified U-Net (mU-Net) with incorporation of object-dependent high level features for improved liver 
and liver-tumor segmentation in CT images Automated liver fat quantification at nonenhanced abdominal CT for population-based steatosis assessment E2Net: An edge enhanced network for accurate liver and tumor segmentation on CT scans PROSTATEx challenges for computerized classification of prostate lesions from multiparametric magnetic resonance images Fully automated prostate whole gland and central gland segmentation on MRI using holistically nested networks with short connections Spatial aggregation of holistically-nested convolutional neural networks for automated pancreas localization and segmentation Data from Pancreas-CT An application of cascaded 3D fully convolutional networks for medical image segmentation Application of deep learning to pancreatic cancer detection: Lessons learned from our initial experience Spatio-temporal convolutional LSTMs for tumor growth prediction by learning 4d longitudinal patient data Fully automatic volume measurement of the spleen at ct using deep learning Improving computer-aided detection using convolutional neural networks and random view aggregation One click lesion RECIST measurement and segmentation on CT scans CT lymph nodes Deep learning electronic cleansing for single-and dual-energy CT colonography Deep small bowel segmentation with cylindrical topological constraints Detection and diagnosis of colitis on computed tomography using deep convolutional neural networks AppendiXNet: Deep learning for diagnosis of appendicitis from a small dataset of CT exams using video pretraining Detection of high-grade small bowel obstruction on conventional radiography with convolutional neural networks Automation of the kidney function prediction and classification through ultrasoundbased kidney imaging using deep learning Deep learning provides a new computed tomographybased prognostic biomarker for recurrence prediction in high-grade serous ovarian cancer Performance of a deep learning algorithm for automated segmentation and quantification of traumatic pelvic hematomas on CT Segmentation of the prostate and organs at risk in male pelvic CT images using deep learning Bounding maps for universal lesion detection Automated CT biomarkers for opportunistic prediction of future cardiovascular events and mortality in an asymptomatic screening population: a retrospective cohort study Artificial intelligence in digital pathology-new tools for diagnosis and precision oncology Stacked sparse autoencoder (SSAE) for nuclei detection on breast cancer histopathology images Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases Accurate and reproducible invasive breast cancer detection in wholeslide images: A deep learning approach for quantifying tumor extent Highthroughput adaptive sampling for whole-slide histopathology image analysis (HASHI) via convolutional neural networks: Application to invasive breast cancer detection Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning Clinical-grade computational pathology using weakly supervised deep learning on whole slide images Reliability of gleason grading system in comparing prostate biopsies with total prostatectomy specimens Automated deep-learning system for gleason grading of prostate cancer using biopsies: a diagnostic study Artificial intelligence for diagnosis and grading of prostate cancer in 
biopsies: a population-based, diagnostic study Deep-learning approaches for gleason grading of prostate biopsies Automated tubule nuclei quantification and correlation with oncotype DX risk categories in ER+ breast cancer whole slide images A deep learning based strategy for identifying and associating mitotic activity with gene expression derived risk categories in estrogen receptor positive breast cancers Deep learning can predict microsatellite instability directly from histology in gastrointestinal cancer Deep learning for prediction of colorectal cancer outcome: a discovery and validation study Deep learning-based classification of mesothelioma improves prediction of patient outcome Predicting survival after hepatocellular carcinoma resection using deep-learning on histological slides Attention-based deep neural networks for detection of cancerous and precancerous esophagus tissue on histopathological slides Relevance of spatial heterogeneity of immune infiltration for predicting risk of recurrence after endocrine therapy of ER+ breast cancer Spatial organization and molecular correlation of tumor-infiltrating lymphocytes using deep learning on pathology images Spatial architecture and arrangement of tumor-infiltrating lymphocytes for predicting likelihood of recurrence in early-stage non-small cell lung cancer Model-based and data-driven strategies in medical image computing Integration of chest CT CAD into the clinical workflow and impact on radiologist efficiency Rapid AI development cycle for the coronavirus (COVID-19) pandemic: Initial results for automated detection & patient monitoring using deep learning CT image analysis Review of artificial intelligence techniques in imaging data acquisition, segmentation and diagnosis for covid-19 Position paper on COVID-19 imaging and AI: From the clinical needs and technological challenges to initial ai solutions at the lab and national level towards a new era for ai in healthcare