key: cord-0958415-ilsmc0vf authors: Xie, Chen-Yi; Pang, Chun-Lap; Chan, Benjamin; Wong, Emily Yuen-Yuen; Dou, Qi; Vardhanabhuti, Varut title: Machine Learning and Radiomics Applications in Esophageal Cancers Using Non-Invasive Imaging Methods—A Critical Review of Literature date: 2021-05-19 journal: Cancers (Basel) DOI: 10.3390/cancers13102469 sha: 67de3dd383b4ac1cdd38543bd954bf2cbe8ade21 doc_id: 958415 cord_uid: ilsmc0vf SIMPLE SUMMARY: Non-invasive imaging modalities are commonly used in clinical practice. Recently, the application of machine learning (ML) techniques has provided a new scope for more detailed imaging analysis in esophageal cancer (EC) patients. Our review aims to explore the recent advances and future perspective of the ML technique in the disease management of EC patients. ML-based investigations can be used for diagnosis, treatment response evaluation, prognostication, and investigation of biological heterogeneity. The key results from the literature have demonstrated the potential of ML techniques, such as radiomic techniques and deep learning networks, to improve the decision-making process for EC patients in clinical practice. Recommendations have been made to improve study design and future applicability. ABSTRACT: Esophageal cancer (EC) is of public health significance as one of the leading causes of cancer death worldwide. Accurate staging, treatment planning and prognostication in EC patients are of vital importance. Recent advances in machine learning (ML) techniques demonstrate their potential to provide novel quantitative imaging markers in medical imaging. Radiomics approaches that could quantify medical images into high-dimensional data have been shown to improve the imaging-based classification system in characterizing the heterogeneity of primary tumors and lymph nodes in EC patients. 
In this review, we aim to provide a comprehensive summary of the evidence of the most recent developments in ML application in imaging pertinent to EC patient care. According to the published results, ML models evaluating treatment response and lymph node metastasis achieve reliable predictions, ranging from acceptable to outstanding in their validation groups. Patients stratified by ML models in different risk groups have a significant or borderline significant difference in survival outcomes. Prospective large multi-center studies are suggested to improve the generalizability of ML techniques with standardized imaging protocols and harmonization between different centers. Esophageal cancer (EC) is a malignancy affecting more than 500,000 people worldwide, ranking as the sixth leading cause of cancer death [1, 2] . ECs are classified based on cell types on histopathology, most commonly identified as either adenocarcinoma or squamous cell carcinoma (SCC). Late presentation is a frequent occurrence which means a higher information embedded in medical images could aid in improving prediction and reflect more accurate clinical endpoints is the driving force of radiomics. More specifically for cancer imaging, extracting image features based on tumor texture, phenotype, and habitat, radiomics' potential in quantifying intratumoral and intertumoral heterogeneity brings hope for more personalized and effective treatments for cancer patients, particularly where contemporary treatments may fail due to intratumor heterogeneity [20] . Varying methods of feature extraction exist between different radiomics pipelines [21] . The image biomarker standardization initiative (IBSI) is an independent international collaboration, which aims to improve the reproducibility and standardization of radiomics studies [22] . The IBSI reference manual has made recommendations for a set of standardized radiomics feature calculation and processing guidelines to practice radiomic studies. 
It is suggested to check a radiomics software package's compliance with the IBSI standard before application for better clinical translation.
The workflow of a radiomic study can be categorized into six stages: problem identification, data curation, feature extraction, feature reduction, modelling, and model validation [23, 24]. The first step is to identify the clinical problem and what researchers are aiming to predict. A prediction endpoint (PE) usually refers to a clinical endpoint relevant to the patient and attending clinician. PEs could be categorical (for example treatment response patterns) or scalar (for example disease-free survival in months). Information about the imaging modality that would be applicable to the clinical problem is needed. Image acquisition follows and involves obtaining the medical images from all patients in full accordance with the protocol to ensure consistency. Radiomics features are typically extracted from the regions of interest (ROIs). Delineation of the ROI is conducted manually, semi-automatically or fully automatically. Manual segmentation requires experienced radiologists to contour the full tumor regions, and is still regarded as the gold standard in many applications [25] [26] [27]. Semi-automatic segmentation is usually realized by thresholding and region-growing techniques, and is more efficient as it requires less manual intervention [28, 29]. Manual or semi-automatic segmentation could be conducted using open-source or commercial software, such as ITK-SNAP [30], LIFEx [31], MITK [32], ImageJ [33], and 3D Slicer [34]. Bias caused by inter-algorithm and inter-reader variability [35] could be limited by multiple segmentation, such as manual segmentation by multiple clinicians or using a diverse combination of segmentation algorithms. A recent systematic review by Traverso et al. [36] pointed out that radiomics textural and shape features could be sensitive to manual segmentation differences.
Semiautomated segmentation methods are found to provide a better alternative to manual slice-by-slice delineations in terms of feature reproducibility [37, 38] . Automatic segmentation methods are suggested to be broadly applied in EC studies to optimize the clinical applicability. Auto-segmentation using deep learning is now a rapidly developing technique and has been proved to be potentially a reliable and reproducible tool for tumor delineation [39, 40] . U-Net is a commonly used network architecture for auto-segmentation in medical images with exceptional performance demonstrated on many applications, which is a convolutional neural network consisting of a contracting path and a symmetric expanding path [41] . More complex deep neural network architectures with residual connections, skip connections, dense connections, or attention mechanisms have also been investigated for boosting automated segmentation accuracy. In addition to medical imaging data, researchers will also need to collect related clinical data (for example survival outcomes, demographics, clinical categorization). Feature extraction with high throughput is the essence of radiomics and involves quantitative descriptions of characteristics of ROI. There are several features that can be extracted, including shape-based features, first order, and higher-order features. Currently, there are several open-source packages and software available that can be utilized to create up to thousands of features. Readers are advised to refer to the following excellent articles for further insight [22, 31, 42] . It is important to note that features can be computed based on pixel spacing, quantization method, and the number of gray levels but care must be taken to ensure a consistent approach is taken when generating these features. 
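To illustrate why the number of gray levels must be kept consistent, the following is a minimal sketch of one common quantization approach, fixed-bin-number discretization, in which ROI intensities are rescaled to integer gray levels between 1 and the chosen number of bins. The function name, the toy intensity values, and the bin counts are assumptions made for illustration and are not taken from any specific radiomics package.

```python
import math


def discretize_fixed_bin_number(intensities, n_bins):
    """Map ROI intensities to integer gray levels 1..n_bins (fixed bin number).

    Hypothetical helper for illustration only.
    """
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        # A flat ROI carries a single gray level.
        return [1 for _ in intensities]
    levels = []
    for x in intensities:
        # Rescale to [0, n_bins], take the floor, and shift to 1-based levels.
        level = math.floor(n_bins * (x - lo) / (hi - lo)) + 1
        # The maximum intensity would land in bin n_bins + 1, so clamp it.
        levels.append(min(level, n_bins))
    return levels


# Toy ROI intensities (assumed values, e.g., Hounsfield units).
roi = [-50.0, 0.0, 25.0, 60.0, 150.0]
levels_8 = discretize_fixed_bin_number(roi, 8)    # [1, 3, 4, 5, 8]
levels_64 = discretize_fixed_bin_number(roi, 64)
print(levels_8)
print(levels_64)
```

The same ROI yields different gray-level maps for 8 and 64 bins, so texture features computed from them are not directly comparable unless the quantization settings are fixed across the whole study.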
Moreover, radiomics features could be either 2-dimensional (2D), depicting the lesion in a representative layer (usually the layer with the maximal diameter of the largest tumor cross-section), or 3-dimensional (3D), contouring the whole lesion volume. Controversy still exists over the discriminative ability of 2D and 3D radiomics features. The 3D ROIs are considered to contain more comprehensive information and have shown good classification ability [43]. However, Shen et al. demonstrated that 2D radiomics features showed better prognostic performance than 3D features in non-small cell lung cancer, probably due to inconsistent spatial resolution in CT images [44]. Yang et al. showed that a combination of 2D and 3D features could optimize the prediction performance [45].
Radiomics features are sensitive to CT acquisition and reconstruction parameters. It is impractical to standardize platforms and parameters across all institutions in clinical settings. To reduce the batch effect, ComBat harmonization has recently been shown to correct the differences in extracted radiomics feature values resulting from different image acquisition protocols [46, 47]. This method has been used to correct for the scanner effect while preserving the specific characteristics of each texture pattern.
Increasing the number of features does not necessarily mean better performance. In fact, it has been shown that a high number of features often leads to high redundancy and poor repeatability, necessitating feature reduction [48]. The purpose of feature reduction is to optimize the feature space (by selecting a few features from the original list, by engineering new features, or by reducing the dimension) before inputting it into the model. Feature selection is a common method used to remove irrelevant and redundant features. Traditionally, feature selection methods are divided into filter, embedded, and wrapper methods.
Filter methods have been most favored, and these are independent of the machine learning classifier. Ranker filters, a subclass of filter methods, were the techniques most commonly employed in early machine learning applications in radiomics. In practical terms, the process usually begins with the removal of correlated features, usually based on a correlation matrix. For example, a Pearson's correlation matrix can be computed on the feature set, and one feature from each highly correlated pair (set for example as |r| > 0.95) can then be excluded. Association scores with the PE can further be computed from univariate or multivariate analyses, and the features ranked accordingly. A common method is to then choose the top-ranked features (either in number or proportion). Additionally, features can also be subsequently ranked based on their predictive power, for example using Maximum Relevance Minimum Redundancy (mRMR) and recursive feature elimination (RFE) [49, 50]. Care must be taken when using other feature selection methods, such as wrapper methods, because selection bias could be introduced, although this could be mitigated by approaches such as the Borda method or the bootstrap method [51, 52]. Investigators have also used regularization techniques. Regularization techniques in prediction models, such as least absolute shrinkage and selection operator (LASSO) and ridge regression, penalize the model coefficients: LASSO can shrink coefficients to exactly zero and thereby remove redundant or highly correlated features, whereas ridge regression shrinks coefficients without eliminating them [53]. Finally, investigators have also used dimension reduction methods that reduce the feature number, for example principal component analysis (PCA), which is unsupervised and does not introduce PE information, or linear discriminant analysis (LDA), which in contrast uses class labels. These have been shown to reduce overfitting and improve prediction performance in ML models [54, 55].
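The correlation-based filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure of any cited study: the helper names, the toy feature values, and the greedy rule of keeping the first feature of each correlated group are all hypothetical choices, and the threshold is applied to the absolute Pearson correlation coefficient.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant feature: treated here as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.95):
    """Greedy filter: keep a feature only if |r| <= threshold with all kept ones.

    `features` maps a feature name to its values across patients.
    """
    kept = []
    for name, values in features.items():
        if all(abs(pearson_r(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Toy feature table (assumed values): one entry per patient.
features = {
    "volume": [1.0, 2.0, 3.0, 4.0, 5.0],
    "surface_area": [2.1, 4.0, 6.2, 7.9, 10.0],  # nearly proportional to volume
    "entropy": [0.9, 0.2, 0.7, 0.1, 0.5],
}
selected = drop_correlated(features)
print(selected)  # ['volume', 'entropy']
```

Because the filter never looks at the prediction endpoint, it can be run on the training set without leaking outcome information; which member of a correlated pair is retained depends on feature order in this sketch, and other rules (for example keeping the feature with the stronger univariate association) are equally possible.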
Reducing the number of features helps with interpretation, making a plausible linkage between feature and outcome easier to identify.
Model development aims to classify patients into different risk groups in relation to the PE. ML methods explore the correlation between selected features and the targeted PE in the training set [25]. Logistic regression is a classic binary classification tool that fits a logistic function to classify data into one of two classes and can be extended to classify multiple groups of events [56]. Support vector machine is one of the most commonly used ML models for radiomics studies [49, 57, 58]. This technique identifies a hyperplane or set of hyperplanes in the high-dimensional feature space for the classification of new data [59]. K-nearest neighbor classifies an unknown datum by comparing it with labelled data [60]. Decision trees are a series of classification branches that help visualize the process of making a prediction about the PE [61]. Random forests are a combination of multiple decision trees developed from the same training set to correct for the potential overfitting of a single decision tree [62]. Previous studies comparing the predictive performances across different classifiers have shown the optimal performance achieved by random forests [63, 64]. Overfitting may result from complex model architectures that are overparameterized. The model may perform well in training data but poorly in validation data, even when both are split from the same cohort. This can be reduced by acquiring sufficient data for training or making the training data more diverse so that it can be representative of the population. If the rate of the expected outcome is low, making the dataset imbalanced, there are resampling techniques that may help to boost the minority class and improve predictive performance [63, 65].
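As one simple illustration of resampling for an imbalanced dataset, the sketch below performs random oversampling, duplicating minority-class cases until every class is as large as the majority class. The text above does not specify which resampling technique the cited studies used, so this choice, the function name, and the toy data are assumptions.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until classes balance.

    Hypothetical helper; intended for the training set only.
    """
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw with replacement to fill the gap to the majority class size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy training set (assumed): 8 non-responders (0) and 2 responders (1).
train_samples = [[0.1 * i, 1.0 - 0.1 * i] for i in range(10)]
train_labels = [0] * 8 + [1] * 2
balanced_samples, balanced_labels = random_oversample(train_samples, train_labels)
print(balanced_labels.count(0), balanced_labels.count(1))  # 8 8
```

Resampling should be applied after the data are split and only to the training portion; oversampling before splitting would place copies of the same patient in both training and test sets and inflate the apparent performance.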
While it is ideal to keep the number of features as low as possible, the number of features is not a major issue if the model is validated externally with reasonable performance [18]. The selection of the optimal model for one specific dataset requires an understanding of the classifier mechanism and characterization of the dataset or exploratory data experiments. For better clinical application, a nomogram incorporating the radiomics signature and clinical risk factors could be built. This graphical calculation instrument is a user-friendly tool for patient-clinician communication [66]. The Akaike information criterion (AIC), which measures the goodness of fit of the model, is commonly used for selection of important clinical variables for the final nomogram model [67].
In machine learning model development, it is preferred for the input dataset to be divided into a training set and a test set. A "training set" is only used to train the model. A "test set" can be internal or external, and is only used to evaluate the model. If the test set is from an independent center, it is often termed an "external validation" set in clinical studies. This means that there is no overlap with the training data, and that it is also from a different center. Nonetheless, it may be difficult to recruit significant amounts of independent data in preliminary or pilot radiomics studies. If this is not possible, then internal validation is used. Note that the term "internal validation" has different meanings in the clinical and machine learning literature. For the purpose of this article, "internal validation" refers to validation with the dataset from one single center. In some cases, data can be used for model hyperparameter optimization requiring "cross-validation", in which part of the training dataset is used for training and part for validation. There are several methods to do this.
Firstly, in k-fold cross-validation [13], the dataset is divided into k (usually k = 10) groups (called folds). (k-1) groups will be used as the training set whilst the remaining group is the testing set. A small proportion of data from the training set is used to validate the model during training time, in order to select hyper-parameters of ML models. This process is repeated k times, with each group being in the training set k-1 times and in the testing set once. This ensures that all data are used to test the model for performance evaluation. Secondly, in leave-one-out cross-validation [68], which is similar to k-fold cross-validation, the data is divided such that each patient is treated as one fold. In each cycle, one patient will be the testing set whilst all others will be the training set; this process repeats until all patients have been used in the testing set once. This scheme is used only when the dataset is small, in order to keep more data for model training. Thirdly, hold-out validation [69], which is the simplest form of internal validation in one center, involves randomly separating the data into a training set (again including part of the data for model hyper-parameter validation) and a testing set [70]. Besides these, a more ideal way is to validate the model in a prospective dataset.
Several performance metrics have been used in the evaluation of ML performance, and a few commonly used methods will be discussed here. The most commonly used method typically involves an analysis of the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. Comparison between AI algorithms can be done by evaluating the AUC and the individual sensitivity and specificity values of each ROC curve. The optimal AI algorithm depends on the context in which the model will be implemented.
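The k-fold scheme and the AUC described above can be sketched in a few lines. This is a minimal, library-free illustration under stated assumptions: the function names, the fixed random seed, and the toy labels and scores are hypothetical, and the AUC is computed through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half).

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 5-fold split of 20 patients; setting k = n_samples gives leave-one-out.
splits = list(k_fold_indices(n_samples=20, k=5))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))  # 16 4 on every iteration

# Toy predictions (assumed values) for four patients.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

In a real study, any feature selection, harmonization, or hyper-parameter tuning would be repeated inside each training fold, so that the held-out fold stays untouched until evaluation.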
For comparison between human and AI performance, a summary ROC curve, rather than the convention of using one point on an ROC graph, can more accurately quantify human performance and allow for robust comparison between ML models and humans [71]. Other metrics such as the Matthews correlation coefficient (MCC), which incorporates true and false positives and negatives, have also been used [72]. A coefficient of +1 represents a perfect prediction, 0 no better than chance, and −1 total disagreement between prediction and the actual outcome. The Precision-Recall (PR) curve has also been used, which plots precision against recall to show the trade-off between them for different feature or parameter settings. For classification tasks involving datasets with very few positive cases, the area under the PR curve (AUPRC) is a better measure of accuracy than AUROC [73]. Validation reflects the discrimination and calibration capacity of the model, which respectively quantify model sensitivity and specificity, and the agreement between predicted and observed outcomes. Relying purely on the performance metrics outlined above may have limitations. For example, risk prediction models may be highly discriminatory but poorly calibrated [74] or have sensitivity/specificity cut-points that fail to maximize clinical utility [75].
Unlike conventional low-level radiomics features with hand-crafted descriptors, deep learning [76] has recently emerged as a way to automatically learn high-level features in a data-driven manner, and is capable of capturing the complicated characteristics of medical images. Among different classes of deep learning algorithms, the convolutional neural network is one of the most widely used solutions for analyzing image data to extract hierarchical feature representations.
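Returning to the Matthews correlation coefficient discussed above, a short worked example may make its range concrete. The sketch below computes MCC from the four cells of a confusion matrix using the standard formula MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); the function name and the toy counts are assumptions, and returning 0 when the denominator is zero is a common convention rather than part of the definition.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Toy cohort (assumed): 10 responders and 90 non-responders.
perfect = matthews_corrcoef(tp=10, fp=0, tn=90, fn=0)
always_negative = matthews_corrcoef(tp=0, fp=0, tn=90, fn=10)
inverted = matthews_corrcoef(tp=0, fp=90, tn=0, fn=10)
print(perfect, always_negative, inverted)
```

A model that labels every patient as a non-responder reaches 90% accuracy on this toy cohort yet scores an MCC of 0, which is why MCC and precision-recall analysis are often preferred over plain accuracy when positive cases are rare.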
A convolutional neural network (CNN) is composed of multiple convolutional layers, each with a set of filters (also called kernels), which are the core component, combined with pooling operations and non-linear activation functions. The weight-sharing of convolution kernels over the entire input image significantly reduces the number of trainable parameters. CNNs with 2D kernels (namely 2D CNNs) are generally applicable to different types of imaging modalities, but for volumetric images, 3D CNNs have been demonstrated to be more effective in capturing important contextual information along the third dimension [77, 78]. For recent deep learning models, the classifier learning can be performed in a unified network with the feature extraction and trained in an end-to-end process. As previously mentioned, CNNs are the most popular deep networks in medical image analysis and have shown exceptional performance. AlexNet [79] is one of the commonly used CNN architectures in cancer imaging [80], and is an early, relatively shallow network with 11 layers. The Inception network [81] is another popular architecture [13], which has a more complex design with a deeper architecture and multiple filters of different sizes operating on the same level. ResNet [82] is another recent CNN architecture with skip connections to benefit gradient backpropagation [83]. Besides using standard network architectures, some studies have designed specific CNNs for the targeted problem. For example, Bizzego et al. [72] used two identical and parallel CNN streams for CT and PET data respectively, with each stream consisting of five convolutional layers combined with normalization, dropout, and pooling layers, and the outputs of the two streams merged at the fully connected layer. Effective training of a CNN usually requires large amounts of data to sufficiently optimize the large number of parameters in a network and to reduce the overfitting problem.
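The effect of weight sharing on the number of trainable parameters can be shown with simple arithmetic. The sketch below compares a standard 2D convolutional layer with a hypothetical layer that applies the same local filters but learns a separate set of weights at every output position; the layer sizes (a 64 × 64 single-channel input, 16 filters of size 3 × 3, no padding) are assumed values chosen only for illustration.

```python
def conv2d_params(in_channels, out_channels, kernel_size):
    """Trainable parameters of a 2D convolution: shared weights plus one bias per filter."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


def unshared_local_params(in_channels, out_channels, kernel_size, height, width):
    """Same local connectivity, but separate weights at every output position."""
    out_h = height - kernel_size + 1  # no padding, stride 1
    out_w = width - kernel_size + 1
    return out_h * out_w * conv2d_params(in_channels, out_channels, kernel_size)


shared = conv2d_params(in_channels=1, out_channels=16, kernel_size=3)
unshared = unshared_local_params(1, 16, 3, height=64, width=64)
print(shared)    # 160
print(unshared)  # 615040
```

With these assumed sizes, sharing the kernels across the image reduces the layer from 615,040 to 160 parameters, and the convolutional count does not grow with the image size. Deep networks nevertheless accumulate many parameters across layers, which is why large training sets remain important.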
If the size of available dataset is large enough, training a deep network from scratch can better optimize the model for the specific problem and dataset. However, obtaining annotations from medical experts is costly and typically limits the amount of available data for network training. When data is limited, data augmentation techniques, such as generative or associative modelling can generate additional, artificial training data based on real training sets [84, 85] . Transfer learning is commonly adopted to mitigate the problem of inadequate training data. With transfer learning, a deep network is firstly trained with available large-scale natural images or medical images of different problems, the pre-trained network is then fine-tuned with the desired data and task. The effectiveness of transfer learning in reducing the overfitting problem caused by a small number of training samples and improving the model's performance has been well studied in previous works on mammographic tumor classification [86] and thoraco-abdominal lymph node detection [87] . Deep features can be extracted from medical images by training CNNs with pre-defined prediction tasks such as disease classification, tumor stage diagnosis, or survival prediction. For deep learning-based radiomics with end-to-end training, the extracted features can be directly used by the classifier layer of the rest of the deep networks for the targeted analysis tasks. There are also studies separating the learned features from the deep networks to be combined with the hand-crafted radiomics features for subsequent modelling [72] . The major advantage of deep learning-based feature extraction is that no specific domain knowledge is required for feature engineering, and the representative and high-level features can be learned in a completely automatic manner. Recent works [88, 89] have shown that the automatically learned deep features of neural networks can outperform the hand-crafted ones in some applications. 
One key challenge of applying deep networks in clinical decision making is that deep networks are black box models with multilayer nonlinear operations, thus the reasoning behind the results from deep networks are very difficult to interpret clinically. Explainable AI is an emerging field of active research in trying to address this challenge [90, 91] . Recent studies [72, 80] have shown that the conventional handcrafted radiomics and the deep learning-based radiomics can be fused to improve analysis accuracy. The fusion operation can be performed at the decision level or at the feature level. For decision-level fusion, the hand-crafted radiomics and deep radiomics are first independently trained to produce prediction respectively, then the outputs are then combined together with a certain voting strategy to achieve the final decision. For feature-level fusion, the hand-crafted features and deep features are extracted separately, and then the two types of features are combined to go through a classifier for the final prediction. On the other hand, the combination of handcrafted radiomics and deep learning features did not enhance the prediction performance in Yun's previous study [92] . This is an area of active research, and further investigations are needed to properly address the value of radiomics in combination with deep learning methods [93] . The use of deep learning methods, however, is advantageous in potentially being more automated than radiomics features extraction, as the latter requires segmentation, which depending on the disease entity may not be automated. For the identification of optimal pattern for a specific dataset, the discovered machine learning pipeline could be quite variable, as the selection of the ML approach and setting of parameters are dependent on individual researchers. In recent years, there has been a trend for automation of machine learning pipeline in clinical workflows. Cai et al. 
[94] provided an online calculator based on the radiomics model for the prediction of treatment response of bevacizumab in brain necrosis after radiotherapy. Such a user-friendly tool could help the facilitation of personalized and precise treatment of patients. Su et al. [95] applied the Tree-based Pipeline Optimization Tool for the construction of the optimal radiomics model. Without human annotation, the automatically optimized machine learning pipeline showed good prediction accuracy for H3 K27M mutation status in patients with midline gliomas. More recently, automation of machine learning pipeline has found promising applications for fast diagnosis and risk stratification for patients with infectious diseases [96] [97] [98] . Wang et al. [96] developed a novel fully automatic deep learning system using CT imaging for fast screening of COVID-19 to identify potential high-risk patients, which could be fast and more repeatable as it required no time-consuming human involvement. Automation of machine learning pipeline in clinical workflows could significantly improve the optimization process of medical resource. In this paper, a comprehensive review of studies based on ML methods for the diagnosis of any aspect in EC patients using non-invasive medical imaging was performed. According to the scope of the review, we consider ML-based investigations aimed at relevant objectives in clinical practice: treatment response evaluation, prognostication prediction, diagnosis, and biological characterization. Non-invasive imaging modalities considered are CT, PET, PET-CT, and MRI. Studies using invasive endoscopy or endoscopic ultrasound were not included. We searched for articles in 3 database resources including PubMed, EMBASE and Cochrane Library. All the English publications from 1 January 2000 until 16 October 2020 were searched. The reference articles of the selected papers were also checked. The complete search strategy is shown in Supplementary Material. 
We included any study design, with a minimum of 10 patients, except for letters to the author, comments, and case reports. Studies with only correlation analyses between individual imaging features and outcomes, without utilizing ML approaches were excluded. The measurements obtained in the validation groups were used as the main results. For papers with multiple clinical outcomes, the primary aim was chosen. The AUC was preferred to summarize the predictive value of proposed models. If unavailable, other accuracy metrics were recorded (for example sensitivity, specificity, C-index, etc.). We rated AUCs by the following range: 0.60-0.70 as poor, 0.70-0.80 as acceptable, 0.80-0.90 as excellent, and 0.90-1.00 as outstanding [99] . For prognostic groups, cox-proportional hazards ratios were used as outcome measures for risk stratification. For articles with multiple test models for the same cohort of subjects, the final model with the best performance was recorded. Confidence intervals of these accuracy metrics were retrieved if available. The results from the included studies are summarized in Table 1 . The studies for imaging machine learning applications in EC commonly followed certain steps, which are shown in Figure 1 . Most published studies (n = 12) focused on the prediction of treatment response for patients receiving chemoradiotherapy or nCRT [83, [100] [101] [102] [103] [104] [105] [106] [107] [108] [109] [110] .
The ML algorithms achieved an AUC of 0.78-1.00. There were six studies using PET, four studies using CT, one study using MRI, and one study using PET-CT. Zhang et al. [110] constructed a support vector machine model based on PET radiomics features combined with conventional imaging and clinical variables, and achieved the best prediction result (AUC = 1.00) with 20 patients, although without validation on a test set. Ypsilantis et al. [109] provided initial evidence for the potential predictive power of CNNs with an averaged sensitivity and specificity of 80.7% and 81.6%, respectively. Desbordes et al. [108] applied a random forest classifier based on conventional and textural features for treatment response prediction by using baseline PET images. Both individual features and the constructed model with combined features were of predictive value, with an AUC of 0.810 and 0.836, respectively. Beukinga et al. published two studies evaluating radiomics for the prediction of response to nCRT. The first study [107] selected only pre-treatment clinical and radiomics features from PET/CT scans. For the latter study [104], significant features were derived from PET images of two phases (both baseline and restaging). The discrimination accuracy was improved from acceptable (AUC = 0.78) to excellent (AUC = 0.81). Van Rossum et al. [106] also showed the added value of post-treatment radiomics features for disease evaluation. Hou et al. studied two modalities for the evaluation of CRT response. Radiomics features extracted from both CT [105] and MRI [103] showed predictive capabilities with an AUC of 0.972 for CT and an AUC of 0.929 for MRI, using imaging interpretation as a reference standard. Yang et al. [102] developed three predictive models for treatment response after nCRT and noted that overfitting was a problem for small sample size studies. Hu et al.
[101] found that a combination of peritumoral radiomics features appeared to improve the predictive performance of intratumoral radiomics for pre-treatment prediction of pCR (AUC = 0.85, 95% CI, 0.75-0.95) for nCRT using CT radiomics features, with a cohort of 231 patients and external validation of the results. The same cohort of patients was used for the exploration of the transfer learning approach and the results showed that ResNet50-based deep learning features had the predictive ability for treatment response in esophageal SCC [83] . Cao et al. [100] used the limma method, which is commonly used for genetic analysis, for the identification of significant radiomics features. The selected features were then fitted into the least absolute shrinkage and selection operator (LASSO) logistic regression model and achieved an AUC of 0.84 in the test set to predict response for concurrent CRT. The majority of studies [83, 101, 102, 104, 106, 107, 109, 110] evaluated pCR for nCRT before surgery and used histology results as the reference standard. Other studies [100, 103, 105, 108] collected patients receiving chemoradiotherapy and adopted PET/CT/MRI imaging and follow-up information for pCR evaluation because surgical resection was not performed and, therefore, pathological reference was not available. ML methods have shown the potential to act as prognostic tools for risk stratification in ECs. There are seven studies in the prognostic group [111] [112] [113] [114] [115] [116] [117] , including three PET, three CT, and one PET-CT studies. Patients in different risk groups had a significant or borderline significant difference in survival outcomes. The primary outcomes were overall survival (OS), recurrence-free survival (RFS), and disease-free survival (DFS) predictions. Xiong et al. [117] extracted PET radiomics features from different treatment time points and concluded that mid-treatment features could be more informative for predicting local control. Foley et al. 
[116] demonstrated that a clinical prognostic model incorporating PET radiomics features could bring additional benefit for patients' risk stratification. Larue et al. [115] validated their radiomics-based model for EC patients receiving nCRT in an independent external cohort, in which the risk groups differed in survival rates with borderline significance. Xie et al. [114] found that sub-regional radiomics features had prognostic value. The authors further correlated the imaging traits with clinical variables and gene data to verify their biological significance. Yang et al. [113] showed that deep learning-based prediction could be an independent survival prognosticator for EC patients. Chen et al. [112] constructed a scoring system based on both clinical and PET radiomics features, which enabled better stratification of patients into different long-term prognostic groups. Qiu et al. [111] developed three prognostic models; the nomogram based on both radiomics and clinical features achieved the best performance, with a C-index of 0.72 in the validation set.

Four studies [88, 118-120] focused on the lymph node metastasis status of EC, with three studies using CT and one study using MRI. Shen et al. [120] built a nomogram incorporating radiomics features, CT-reported suspicious lymph node number, and tumor location, which showed good discrimination of lymph node status with a C-index of 0.75 in the validation set. Tan et al. [119] demonstrated that a radiomics nomogram provided a good estimation of lymph node metastasis (AUC = 0.77) and outperformed size criteria. Most studies delineated the whole tumor volume in all slices as the ROI for feature extraction. Qu et al. [118] provided evidence that MRI-based radiomics features had predictive value for lymph node involvement. Wu et al. [88] showed that two-dimensional (2D) ROIs based on the largest cross-sectional area of the tumor lesion also had predictive value (AUC = 0.84).
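The C-index values quoted above summarize how well a model ranks patients by risk. As a minimal sketch, and not the implementation used in any of the reviewed studies, Harrell's concordance index for right-censored survival data could be computed as follows. The function name `harrell_c_index` and the toy data are illustrative assumptions, and pairs with tied follow-up times are skipped for simplicity.

```python
from itertools import combinations


def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times  : follow-up time per patient
    events : 1 if the event (e.g., death) was observed, 0 if censored
    risks  : model risk score per patient (higher = worse prognosis)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: ignore pairs with tied times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient was censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0  # higher risk had the earlier event
        elif risks[first] == risks[second]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: four patients, the third one censored at 6 months.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c_index = harrell_c_index(toy_times, toy_events, toy_risks)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. For a binary outcome without censoring, such as lymph node status, the same pairwise concordance idea yields the AUC.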
Two studies focused on the diagnosis of EC and one study focused on gene expression [121-123]. All three used CT imaging. Li et al. [121] found differences in the same parts of the normal esophageal wall and EC lesions using a multivariate regression classifier based on radiomics features. Ou et al. [122] investigated the value of radiomic models related to the resectability of ECs. Hoshino et al. [123] found that the gene expression level of miR-1246 could be inferred from imaging radiomics features, which was predictive of the prognosis of EC patients.

In general, all studies were retrospective in design and most adopted radiomics approaches. Four studies [88, 109, 113, 115] applied deep learning networks in model construction. Incorporation of clinical features into data-based ML models could improve the prediction accuracy in 11 studies [88, 104, 106, 110-113, 116, 119, 120]. The majority of studies [83, 88, 100-103, 105, 111-114, 119, 121-123] evaluated SCC patients and some [104, 106-110, 115, 116] mainly focused on adenocarcinoma. Many studies (n = 11) had an insufficient sample size, with fewer than 100 patients in the study cohort [102-105, 107, 108, 110, 112, 117, 121, 123]. Four studies [104, 107, 121, 123] lacked independent validation because they used the same dataset as both the training and test set. Feature harmonization was conducted in only two studies to remove inter-site technical variability [83, 101]. Nine studies [88, 103, 104, 106, 111, 116, 119, 120, 122] applied additional statistical analysis to the prediction outcomes to evaluate model fitting, calibration, and clinical usefulness. Three studies [83, 101, 114] explored the correlation between imaging features extracted by ML and genomic profiles for biological interpretation.
Two studies [83, 109] used visualization techniques to highlight important regions in the medical images. The images generated from specific layers of the deep learning model highlighted the regions of interest for patient characterization. Visualizing the hot zones in the feature map could help to evaluate the interpretability of the models.

This review summarizes the main results and basic characteristics of ML techniques relevant to clinical practice in EC patients. Surgery is one of the most frequently used treatments for resectable EC, but long-term survival remains unsatisfactory even after curative operation [124, 125]. The optimal treatment for EC is still unclear. Neoadjuvant or adjuvant therapies administered as chemotherapy, radiotherapy, or simultaneous chemoradiotherapy have been adopted in clinical practice. Adjuvant chemotherapy or radiation therapy has not been shown to provide additional survival benefit compared to surgery alone [126]. More recently, landmark trials have shown that nCRT plus surgery could be the most effective strategy for improving resectability and maximizing long-term survival in locally advanced EC patients [3, 4, 127, 128]. The updated CROSS study showed a median overall survival of 48.6 months in the nCRT plus surgery cohort and 24.0 months in the surgery cohort (p = 0.030) [128]. The recent NEOCRTEC5010 trial also showed that EC patients receiving nCRT followed by surgery had a significantly longer median overall survival than those receiving surgery alone (100.1 vs. 66.5 months, p = 0.025) [4]. However, patients respond differently to nCRT, which significantly affects survival outcomes. According to a recent review by Eyck et al. [129], the ability of commonly used imaging modalities to detect pCR after nCRT in EC patients was insufficient, as indicated by the pooled sensitivities and specificities.
An accurate estimation of residual disease could improve patient management. For this purpose, an increasing number of studies in the literature have used new methods for more accurate prediction. In our review, we assessed the potential of ML approaches for a more accurate evaluation of pCR. We found that ML models evaluating treatment response achieved reliable predictions in the validation group, ranging from acceptable (AUC = 0.70 to 0.80, n = 3) to excellent (AUC = 0.80 to 0.90, n = 7) and outstanding (AUC > 0.90, n = 1). The reviewed studies adopted a variety of ML approaches. Advanced imaging features were extracted using radiomics calculation algorithms or transfer learning from various pre-trained CNNs. In the study by Hu et al., which had a relatively large sample size and externally validated the developed models, the model using deep learning features extracted from ResNet performed better than the handcrafted radiomics model [83]. These results suggest that non-invasive ML-based imaging applications, such as radiomic techniques and deep learning networks, are potentially useful for individualized tumor characterization.

Patients' risk stratification mainly relies on the American Joint Committee on Cancer (AJCC) tumor, node, and metastasis (TNM) staging classification of epithelial cancers of the esophagus. Improving the staging method would aid current clinical practice. Lymph node involvement has been shown to be an important prognostic factor for both esophageal SCC and adenocarcinoma [130, 131]. The recent refinement of the N descriptor (regional lymph node invasion) subcategory in the eighth edition has been shown to provide a more accurate and reliable risk stratification of EC patients [132]. Compared with the seventh edition, this new N subcategory excludes some regional lymph node stations shared with the staging system for lung cancer [133].
Determination of clinical N (regional lymph node invasion) mainly relies on current imaging techniques. CT, one of the most commonly used non-invasive imaging techniques, is often used initially to evaluate tumor growth and other structures. However, it is not optimal for lymph node determination [134, 135]. In addition, the efficacy of MRI for the accurate estimation of N staging is still uncertain [136]. Improving the imaging assessment of regional lymph nodes with ML techniques is of clinical importance for prognosis and treatment decisions in EC patients. Radiomics, as an emerging tool, has shown potential value in predicting lymph node (LN) metastasis by extracting high-throughput quantitative features from medical images. Little attention was paid to the prediction of LN metastasis in previous years; since the first study was published in 2018, the evaluation of LN metastasis in EC patients has received specific attention. The reviewed studies [118-120] mainly focused on the application of radiomics, and Wu et al. [88] showed the predictive value of deep learning features for LN metastasis. According to our review, the new ML approaches significantly improved the evaluation of N status using non-invasive CT and MRI modalities (AUC = 0.762-0.840 in the validation set), even outperforming size criteria [88, 118-120]. However, currently published studies were limited to binary discrimination of LN status. The specific N staging information (N0/N1/N2/N3), determined from evidence acquired before treatment, affects the therapy scheme. Moreover, lymph node drainage in EC is more complicated than in some other cancers, such as lung and breast cancer. The regional lymph node stations for EC patients extend from the cervical to the celiac regions, and lymphatic drainage can differ between anatomic sites. An optimal lymphadenectomy is necessary to maximize patients' survival.
The extent of lymphadenectomy is variable in current clinical practice [137, 138]. A greater extent of lymphadenectomy has been associated with patients' long-term survival and with postoperative complications [139, 140]. The evaluation of individual LNs requires laborious and time-consuming work from experienced radiologists. More accurate evaluation of N staging or of individual lymph node status could lead to a more precise assessment of the ideal range of lymph node dissection.

In our review, we have uncovered several shortcomings of existing clinical applications using machine learning. We will discuss these issues more broadly and then more specifically in the following paragraphs. Explainability has been shown to be an important aspect of ML research [141]. Doctors may find it difficult to apply ML models routinely. CNNs, for example, can contain millions of trainable parameters without biological patterns that are clearly understandable to human readers. It is crucial that clinicians and patients are able to understand the reasoning behind the predictions of these models for better informed decision-making. The accountability of ML approaches is of vital importance if computerized ML systems are introduced into clinical practice. Currently, the relationship between radiomics and the underlying tumor biology is poorly understood. The exact relationship between the combination of radiomics features used in predictive modelling and tumor biology remains unclear. Further exploration has been attempted, for example through the radiogenomics analyses performed in a few of the reviewed studies, which provide proof of a potential biological explanation. The reliability of the established prediction models could also be validated by statistical methods to reduce concerns regarding applicability. To this end, some models were validated by statistical analysis of model fitting, calibration, and clinical usefulness.
Simple data-driven correlation may not be robust, and links between imaging, clinical, and genetic features should be built. Note that the precise relationship between imaging features and tumor heterogeneity is not straightforward. These relationships should be investigated further with more advanced statistical methods. Visualization is suggested as a way to improve the interpretability of ML models. More work should be dedicated to making ML models more interpretable or explainable for decision-making.

A few more specific limitations are noted from our review. Firstly, the ML algorithms in some studies were trained and validated on the same dataset, which does not give an accurate estimate of model performance. Secondly, sample size is a vital factor in clinical studies and can significantly affect repeatability, reproducibility, and statistical power. An adequate sample size is needed to train and test ML methods in order to minimize overfitting and improve the estimation of performance. Some ML studies of EC patients included fewer than 100 patients, which limits the reliability of the proposed models. A clinically meaningful sample size is required for significant discrimination of patients in different risk groups. Thirdly, the reference standard for each outcome in question may differ, reflecting heterogeneity in study design that hampers comparison. Most studies evaluating treatment response, for example, compared their model with histology findings by pathologists, which is the current gold standard. However, some studies [100, 103, 105, 108] used pre-treatment and post-treatment CT images to assess the response based on the Response Evaluation Criteria in Solid Tumors (RECIST) [142] when resected specimens were not available.
RECIST is commonly used for residual disease evaluation in solid tumors but has limited ability in EC because of obscure boundaries or scarred tissue after chemoradiation therapy [143]. Fourthly, standardization of imaging protocols (for example, CT with or without intravenous contrast) and of radiomics extraction methods, and attempts to harmonize imaging data prior to predictive modelling, were rarely performed. These factors are known to affect the extracted features and remain major obstacles to generalizability.

We have a few recommendations for future investigations. Firstly, prospective large multi-center studies should be performed to improve ML techniques and their generalizability, with standardized imaging protocols and harmonization between different centers. In our review, only six studies [83, 88, 100, 101, 114, 115] included multi-center datasets. Secondly, the techniques of machine learning analysis should be standardized. Stricter adherence to a standard working pipeline, as well as openly available source code, is suggested to increase reproducibility and generalizability, particularly for radiomics feature pipelines [144]. Thirdly, the application of ML methods should be expanded from restricted data-processing computers to portable machines accessible through cloud services, or to real-time intraoperative decisions. Some challenges remain to be overcome, such as automated segmentation and real-time inference processing, which currently limit wider utility in the clinical environment. If this could be done, it would allow for prospective study design and, if randomized, could add significant value to the assessment of the utility of these techniques in real-world clinical practice. Finally, the discriminative impact of these models needs to be clinically meaningful. Several studies based their performance assessment on AUC alone. Measures of clinical usefulness, such as decision curve analysis, should be included more routinely.
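To illustrate the decision curve analysis recommended above, the following minimal sketch computes the standard net benefit of a model at a given threshold probability, together with the "treat all" reference strategy. It is not taken from any of the reviewed studies; the function names and the toy labels and probabilities are illustrative assumptions.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of acting on predictions with probability >= threshold.

    y_true    : 1 if the patient has the outcome (e.g., LN metastasis), else 0
    y_prob    : predicted probability of the outcome
    threshold : threshold probability p_t, strictly between 0 and 1
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data: ten patients, four with the outcome.
toy_labels = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
toy_probs = [0.9, 0.8, 0.7, 0.2, 0.6, 0.1, 0.3, 0.4, 0.35, 0.05]
model_nb = net_benefit(toy_labels, toy_probs, threshold=0.5)
treat_all_nb = net_benefit_treat_all(toy_labels, threshold=0.5)
```

A decision curve is obtained by evaluating these functions over a range of clinically plausible thresholds and comparing the model against the "treat all" and "treat none" (net benefit of zero) strategies. Unlike AUC, the comparison depends on how false positives are weighted relative to true positives at each threshold.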
Our review has summarized the predictive performance of non-invasive imaging ML applications in EC patients. Recent advances and future perspectives of the ML technique demonstrate its potential to provide novel quantitative imaging markers in medical imaging. A few recommendations are made to improve study design and future applicability.

Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/cancers13102469/s1, Table S1: Systematic search.

Author Contributions: C.-Y.X., C.-L.P., and V.V. contributed to literature search, figures, study design, data collection, data analysis, data interpretation, manuscript editing, and writing; B.C. and E.Y.-Y.W. contributed to data collection, data analysis, data interpretation, manuscript editing, and writing; Q.D. contributed to data collection, data analysis, data interpretation, manuscript editing, and writing. All authors have read and agreed to the published version of the manuscript.

Preoperative Chemoradiotherapy for Esophageal or Junctional Cancer Neoadjuvant Chemoradiotherapy Followed by Surgery Versus Surgery Alone for Locally Advanced Squamous Cell Carcinoma of the Esophagus (NEOCRTEC5010): A Phase III Multicenter, Randomized, Open-Label Clinical Trial Patterns and risk of recurrence in patients with esophageal cancer with a pathologic complete response after chemoradiotherapy followed by surgery Role of Chemoradiotherapy in Oesophageal Cancer-Adjuvant and Neoadjuvant Therapy State-of-the-art molecular imaging in esophageal cancer management: Implications for diagnosis, prognosis, and treatment Does [18F] fluorodeoxyglucose-positron emission tomography/computed tomography have a role in cervical nodal staging for esophageal squamous cell carcinoma?
Radiomics: The process and the challenges Artificial intelligence in oncology Identifying Medical Diagnoses and Treatable Diseases by Image-Based Deep Learning The Application of Deep Learning in Cancer Prognosis Prediction Dermatologist-level classification of skin cancer with deep neural networks Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning Federated deep learning for detecting COVID-19 lung abnormalities in CT: A privacy-preserving multinational validation study Deep learning-based thigh muscle segmentation for reproducible fat fraction quantification using fat-water decomposition MRI Radiomics with artificial intelligence: A practical guide for beginners Committee on the Review of Omics-Based Tests for Predicting Patient Outcomes in Clinical Trials; Board on Health Care, Services; Board on Health Sciences Policy Evolution of Translational Omics: Lessons Learned and the Path Forward Intratumor Heterogeneity and Branched Evolution Revealed by Multiregion Sequencing Comparison of radiomics tools for image analyses and clinical prediction in nasopharyngeal carcinoma The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping Radiomics: Images Are More than Pictures, They Are Data The bridge between medical imaging and personalized medicine Introduction to Radiomics A Radiomics Nomogram for Preoperative Prediction of Microvascular Invasion in Hepatocellular Carcinoma A CT-based radiomics model to detect prostate cancer lymph node metastases in PSMA radioguided surgery patients A radiomics approach to assess tumour-infiltrating CD8 cells and response to anti-PD-1 or anti-PD-L1 immunotherapy: An imaging biomarker, retrospective multicohort study Radiomics in Vulvar Cancer: First Clinical Experience Using 18F-FDG PET/CT Images An interactive 
tool for semi-automatic segmentation of multi-modality biomedical images LIFEx: A Freeware for Radiomic Feature Calculation in Multimodality Imaging to Accelerate Advances in the Characterization of Tumor Heterogeneity The medical imaging interaction toolkit (MITK): A toolkit facilitating the creation of interactive software by extending VTK and ITK Image processing with 3D Slicer as an image computing platform for the Quantitative Imaging Network A semiautomatic CT-based ensemble segmentation of lung tumors: Comparison with oncologists' delineations and with the surgical specimen Repeatability and Reproducibility of Radiomic Features: A Systematic Review Robust Radiomics Feature Quantification Using Semiautomatic Volumetric Segmentation Comparison of different methods for delineation of 18F-FDG PET-positive tissue for target volume definition in radiotherapy of patients with non-Small cell lung cancer Classification of Cancer at Prostate MRI: Deep Learning versus Clinical PI-RADS Assessment Multiple U-Net-Based Automatic Segmentations and Radiomics Feature Stability on Ultrasound Images for Patients with Ovarian Cancer Computational Radiomics System to Decode the Radiographic Phenotype A radiomics evaluation of 2D and 3D MRI texture features to classify brain metastases from lung cancer and melanoma 2D and 3D CT Radiomics Features Prognostic Performance Comparison in Non-Small Cell Lung Cancer Development of a radiomics nomogram based on the 2D and 3D CT features to predict the survival of non-small cell lung cancer patients Buvat, I. 
Validation of A Method to Compensate Multicenter Effects Affecting CT Radiomics Performance comparison of modified ComBat for harmonization of radiomic features for multicenter studies Feature reduction for classification of multidimensional data A radiomics approach based on support vector machine using MR images for preoperative lymph node status evaluation in intrahepatic cholangiocarcinoma Radiomics and Machine Learning With Multiparametric Preoperative MRI May Accurately Predict the Histopathological Grades of Soft Tissue Sarcomas Evaluating reproducibility of AI algorithms in digital pathology with DAPPER Entropy-based gene ranking without selection bias for the predictive classification of microarray data Regression Shrinkage and Selection Via the Lasso What is principal component analysis? Linear Discriminant Analysis-A Brief Tutorial Logistic Regression Discrimination of pulmonary ground-glass opacity changes in COVID-19 and non-COVID-19 patients using CT radiomics analysis Radiomic machine-learning classifiers for prognostic biomarkers of advanced nasopharyngeal carcinoma Support vector machines Decision Trees. 
In Data Mining and Knowledge Discovery Handbook Random Forests Radiomics-based Prognosis Analysis for Non-Small Cell Lung Cancer Effect of machine learning re-sampling techniques for imbalanced datasets in 18F-FDG PET-based radiomics model on prognostication performance in cohorts of head and neck cancer patients How to Build and Interpret a Nomogram for Cancer Prognosis Application of Akaike's information criterion (AIC) in the evaluation of linear pharmacokinetic equations A Discriminative Model-Constrained Graph Cuts Approach to Fully Automated Pediatric Brain Tumor Segmentation in 3-D MRI A clinically applicable deep-learning model for detecting intracranial aneurysm in computed tomography angiography images Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer Summary Receiver Operating Characteristic Curve Analysis Techniques in the Evaluation of Diagnostic Tests Integrating deep and radiomics features in cancer bioimaging T2 Maucort-Boulch, D. The precision-recall curve overcame the optimism of the receiver operating characteristic curve in rare diseases Big Data and Predictive Analytics Making Machine Learning Models Clinically Useful Deep learning Transfer learning for 3d medical image analysis. 
arXiv 2019 Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation Imagenet classification with deep convolutional neural networks New polyp image classification technique using transfer learning of network-in-network structure in endoscopic images Rethinking the inception architecture for computer vision Deep Residual Learning for Image Recognition T2 Computed tomographybased deep-learning prediction of neoadjuvant chemoradiotherapy treatment response in esophageal squamous cell carcinoma A survey on Image Data Augmentation for Deep Learning The effectiveness of data augmentation in image classification using deep learning Digital mammographic tumor classification using transfer learning from deep convolutional neural networks Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning Multiple Level CT Radiomics Features Preoperatively Predict Lymph Node Metastasis in Esophageal Cancer: A Multicentre Retrospective Study Special Section Guest Editorial: Radiomics and Deep Learning Radiomics Model to Predict Early Progression of Nonmetastatic Nasopharyngeal Carcinoma after Intensity Modulation Radiation Therapy: A Multicenter Study Artificial intelligence: Deep learning in oncological radiomics and challenges of interpretability and data harmonization Radiomic features and multilayer perceptron network classifier: A robust MRI classification strategy for distinguishing glioblastoma from primary central nervous system lymphoma Handcrafted versus deep learning radiomics for prediction of cancer therapy response A Radiomics Model for Predicting the Response to Bevacizumab in Brain Necrosis after Radiotherapy Automated machine learning based on radiomics features predicts H3 K27M mutation in midline gliomas of the brain A fully automatic deep learning system for COVID-19 diagnostic and prognostic analysis Automated detection and quantification of COVID-19 
pneumonia: CT imaging analysis by a deep learning-based software The study of automatic machine learning base on radiomics of non-focus area in the first chest CT of different clinical types of COVID-19 pneumonia Receiver Operating Characteristic Curve in Diagnostic Test Assessment Development and validation of a radiomics signature on differentially expressed features of 18F-FDG PET to predict treatment response of concurrent chemoradiotherapy in thoracic esophagus squamous cell carcinoma Assessment of Intratumoral and Peritumoral Computed Tomography Radiomics for Predicting Pathological Complete Response to Neoadjuvant Chemoradiation in Patients With Esophageal Squamous Cell Carcinoma CT-based radiomic signatures for prediction of pathologic complete response in esophageal squamous cell carcinoma after neoadjuvant chemoradiotherapy Radiomic analysis in T2W and SPAIR T2W MRI: Predict treatment response to chemoradiotherapy in esophageal squamous cell carcinoma Prediction of Response to Neoadjuvant Chemotherapy and Radiation Therapy with Baseline and Restaging 18F-FDG PET Imaging Biomarkers in Patients with Esophageal Cancer Radiomic analysis in contrast-enhanced CT: Predict treatment response to chemoradiotherapy in esophageal carcinoma The Incremental Value of Subjective and Quantitative Assessment of 18F-FDG PET for the Prediction of Pathologic Complete Response to Preoperative Chemoradiotherapy in Esophageal Cancer Predicting Response to Neoadjuvant Chemoradiotherapy in Esophageal Cancer with Textural Features Derived from Pretreatment18F-FDG PET/CT Imaging Predictive value of initial FDG-PET features for treatment response and survival in esophageal cancer patients treated with chemo-radiation therapy using a random forest classifier Predicting Response to Neoadjuvant Chemotherapy with PET Imaging Using Convolutional Neural Networks Modeling Pathologic Response of Esophageal Cancer to Chemoradiation Therapy Using Spatial-Temporal 18F-FDG PET Features, Clinical 
Parameters, and Demographics Development and Validation of a Radiomics Nomogram Model for Predicting Postoperative Recurrence in Patients With Esophageal Squamous Cell Cancer Who Achieved pCR After Neoadjuvant Chemoradiotherapy Followed by Combining the radiomic features and traditional parameters of 18F-FDG PET with clinical profiles to improve prognostic stratification in patients with esophageal squamous cell carcinoma treated with neoadjuvant chemoradiotherapy and surgery Deep Convolutional Neural Network-Based Positron Emission Tomography Analysis Predicts Esophageal Cancer Outcome Sub-region based radiomics analysis for survival prediction in oesophageal tumours treated by definitive concurrent chemoradiotherapy Pre-treatment CT radiomics to predict 3-year overall survival following chemoradiotherapy of esophageal cancer Development and validation of a prognostic model incorporating texture analysis derived from standardised segmentation of PET in patients with oesophageal cancer The Role of PET-Based Radiomic Features in Predicting Local Control of Esophageal Cancer Treated with Concurrent Chemoradiotherapy The MR radiomic signature can predict preoperative lymph node metastasis in patients with esophageal cancer Radiomics nomogram outperforms size criteria in discriminating lymph node metastasis in resectable esophageal squamous cell carcinoma Building CT Radiomics Based Nomogram for Preoperative Esophageal Cancer Patients Lymph Node Metastasis Prediction Heterogeneity of T3 stage esophageal squamous cell carcinoma in different parts based on enhanced CT radiomics CT radiomic features for predicting resectability of oesophageal squamous cell carcinoma as given by feature analysis: A case control study Radiogenomics predicts the expression of microRNA-1246 in the serum of esophageal cancer patients National Cancer Data Base report on esophageal carcinoma Advances in the surgical treatment of esophageal cancer Survival After Neoadjuvant and Adjuvant 
Treatments Compared to Surgery Alone for Resectable Esophageal Carcinoma Phase III Trial of Trimodality Therapy With Cisplatin, Fluorouracil, Radiotherapy, and Surgery Compared With Surgery Alone for Esophageal Cancer: CALGB 9781 Neoadjuvant chemoradiotherapy plus surgery versus surgery alone for oesophageal or junctional cancer (CROSS): Long-term results of a randomised controlled trial Accuracy of Detecting Residual Disease After Neoadjuvant Chemoradiotherapy for Esophageal Cancer Numeric pathologic lymph node classification shows prognostic superiority to topographic pN classification in esophageal squamous cell carcinoma Association Between Clinically Staged Node-Negative Esophageal Adenocarcinoma and Overall Survival Benefit from Neoadjuvant Chemoradiation Cancer of the esophagus and esophagogastric junction-Major changes in the American Joint Committee on Cancer eighth edition cancer staging manual Clinical implication of the innovations of the 8th edition of the TNM classification for esophageal and esophagogastric cancer Comparison of endoscopic ultrasonography (EUS), positron emission tomography (PET), and computed tomography (CT) in the preoperative locoregional staging of resectable esophageal cancer Comparison between positron emission tomography and computed tomography in the use of the assessment of esophageal carcinoma Whole Body MRI in the Staging of Esophageal Cancer-A Prospective Comparison with Whole Body 18F-FDG PET-CT What Should Be the Gold Standard for the Surgical Component in the Treatment of Locally Advanced Esophageal Cancer Optimum Lymphadenectomy for Esophageal Cancer Three-Field or Two-Field Resection for Thoracic Esophageal Cancer: A Meta-Analysis Current Applications and Future Impact of Machine Learning in Radiology New response evaluation criteria in solid tumours: Revised RECIST guideline (version 1.1) Evaluation of Response to Neoadjuvant Chemotherapy for Esophageal Cancer: PET Response Criteria in Solid Tumors Versus Response 
Evaluation Criteria in Solid Tumors IBSI: An international community radiomics standardization initiative

Funding: This research received no external funding.

Data Availability Statement: Primary data cited in this review are openly available in the PubMed, Embase, and Cochrane databases.

The authors declare no conflict of interest.