key: cord-0031985-xt9ycphc authors: Tong, Xu; Li, Jing title: Noninvasively predict the micro-vascular invasion and histopathological grade of hepatocellular carcinoma with CT-derived radiomics date: 2022-05-16 journal: Eur J Radiol Open DOI: 10.1016/j.ejro.2022.100424 sha: d541df29c9febc20856a66a38e0bb989060bffae doc_id: 31985 cord_uid: xt9ycphc
OBJECTIVES: This research aims to predict the micro-vascular invasion (MVI) and histopathologic grade of hepatocellular carcinoma (HCC) with CT-derived radiomics.
METHODS: The clinical and image data of 82 patients were accessed from the TCGA-LIHC collection in The Cancer Imaging Archive. Radiomics features were then extracted from the CT images. To obtain an appropriate feature subset, redundant features were removed by means of intra-class agreement analysis, the Student t test, LASSO regression and support vector machine recursive feature elimination (SVM-RFE). Several machine-learning-based classifiers, including SVM and random forest (RF), were then established. To evaluate the tumor grade and MVI by integrating radiomics with clinical insights, nomogram-based clinical models were constructed. The diagnostic performance was evaluated with ROC analysis.
RESULTS: 7 and 10 radiomics features were selected via LASSO regression and SVM-RFE, respectively, for identifying the tumor grade, and 13 and 10 features were selected via LASSO regression and SVM-RFE, respectively, for evaluating MVI. The RF classifier combined with SVM-RFE feature selection yielded the best performance for grading HCC (AUC: 0.898). In contrast, the RF classifier combined with LASSO regression feature selection yielded the best performance for identifying MVI (AUC: 0.876). Finally, two nomograms were constructed with the radiomics score (Rscore) and clinical risk factors, which showed excellent predictive value for both tumor grade (AUC: 0.928) and MVI (AUC: 0.945).
CONCLUSION: CT-derived radiomics is valuable for noninvasively assessing the micro-vascular invasion and histopathologic grade of hepatocellular carcinoma.
Hepatocellular carcinoma (HCC) has long been a major threat to human health because of its high incidence and high mortality [1]. At present, the first-line treatment options for different types of HCC include surgical resection, radiofrequency ablation, transcatheter arterial chemoembolization (TACE) and so forth [2] [3] [4]. However, poor prognosis remains a major challenge. Accurate prognostic prediction and evaluation may, to some extent, guide the clinical management of HCC [5]. Currently, evaluating prognosis-related histopathological markers is accepted as an effective approach to prognostic prediction. For example, a high histopathological grade and the presence of micro-vascular invasion (MVI) indicate a high probability of recurrence, lymphatic metastasis, strong tumor invasion and metastasis [6] [7] [8] [9]. The assessment of prognosis-related histopathological factors such as tumor grade [10], pathological stage [11], micro-vascular invasion [9], and the expression of histopathological markers including Ki67 [12] and CK-19 [13] has drawn considerable attention. Nevertheless, the current gold standard for evaluating these prognostic markers, histopathological examination, has several disadvantages, including invasiveness, long turnaround time and potential sampling bias. Novel approaches with complementary advantages are urgently required.
During the past few years, growing attention has been paid to image-based prognostic prediction. Various prognosis-related histopathological markers of HCC, including the tumor grade, micro-vascular invasion, capsule formation, and the expression of Ki67 and CK-19, have been broadly assessed by exploring representative image predictors of cancer [6, 9, 14].
Built on the core idea that images are more than pictures and are in fact data, radiomics has opened a new way to explore diagnostic markers and models from images [15]. Additionally, the integration of high-throughput radiomics features with robust artificial intelligence-based models has been widely reported to yield extra clinical benefits in lesion discrimination, disease diagnosis and treatment efficacy prediction [16] [17] [18]. CT-based radiomics has shown great value in evaluating the prognostic markers of HCC [19, 20]. Several previous studies applied CT-derived radiomics to characterize the histopathological grade, micro-vascular invasion or other pathological markers of HCC [19, 21, 22]. However, only a limited number of studies have applied CT-derived radiomics to predict multiple prognostic markers of HCC simultaneously. Besides, previous results varied across studies and cohorts, which indicates that further investigation is needed. Therefore, this research aims to extract radiomics features from CT images and then establish machine-learning-based diagnostic models for identifying the histopathological grade and micro-vascular invasion of HCC.
Both the image data and the other clinical data were accessed from the TCGA-LIHC collection (https://wiki.cancerimagingarchive.net/display/Public/TCGA-LIHC) in The Cancer Imaging Archive (TCIA). The local ethical approval (20-1574AB) and the written informed consent of all patients were obtained, as declared in the data source. In brief, The Cancer Genome Atlas Liver Hepatocellular Carcinoma (TCGA-LIHC) data collection provides a convenient resource for researchers to investigate hepatocellular carcinoma using radiological findings, pathology, clinical outcome and genotype. A detailed description of the multi-institutional TCGA-LIHC data can be found in previous research [23].
In total, a dataset of 97 subjects was downloaded from the TCGA-LIHC collection. The available data were screened according to the following criteria. Inclusion criterion: 1) pathologically confirmed HCC without preoperative treatment. Exclusion criteria: 1) the absence of CT image data; 2) the absence of pathological results regarding micro-vascular invasion or histopathological grade; 3) poor CT image quality.
All subjects included in this research underwent abdominal multiphasic dynamic contrast-enhanced CT on multi-detector row CT (MDCT) units (GE LightSpeed QX/I, GE Healthcare, USA or Siemens Sensation 16, Siemens, Germany). The imaging parameters were as follows: 120 kV, auto tube current, field of view (FOV): 320-500 mm × 320-500 mm, scanning matrix: 512 × 512, reconstruction kernel: standard, scan type: helical, slice thickness: 5 mm, slice gap: 5 mm, reconstructed section thickness: 2 mm. The arterial-phase (AP), venous-phase (VP) and delay-phase (DP) CT scans were performed at 30-35 s, 65-70 s and 150-180 s after intravenous injection of contrast agent (Ultravist 370, Bayer Schering Pharma, Berlin, Germany; dose: 1.5 mL/kg, injection rate: 3.0 mL/s).
Pathological and clinical characteristics were likewise accessed from the TCGA-LIHC collection. Gender was coded as 0 (female) or 1 (male). HCC was pathologically staged as IA (1), IB (2), II (3), IIIA (4), IIIB (5), IVA (6) and IVB (7). Percutaneous fine needle aspiration biopsy of the liver lesion was performed under local infiltration anesthesia. Formalin-fixed, paraffin-embedded sections of the biopsy specimens were then stained with hematoxylin and eosin (H&E) for subsequent histopathological evaluation. The schematic flowchart of this study is shown in Fig. 1.
The detailed processes were as follows. The entire tumor in the CT images was segmented by two abdominal radiologists with 17 and 13 years of experience, respectively, using the open-source software ITK-SNAP (http://www.itksnap.org/pmwiki/pmwiki.php). The lesions were segmented on the venous-phase (VP) CT images, and the volumes of interest were then copied to the other phases. In total, 321 image features were obtained from the CT images (arterial phase, venous phase, delayed phase) of each patient.
According to statistical and algorithmic guidelines [25], redundant and meaningless radiomics features unnecessarily increase model complexity and carry a risk of overfitting, which means that the established models perform well in training cohorts but poorly in validation cohorts. It has therefore been widely reported that well-designed feature selection strategies are necessary for establishing robust models [26, 27]. In this study, the image features were selected according to the following steps: (1) The intraclass correlation coefficient (ICC) of each image feature was calculated to quantify agreement and reproducibility. Image features with an ICC of less than 0.8 were removed. (2) The Student t test was used to screen for image features with significant differences between subgroups (with micro-vascular invasion vs without micro-vascular invasion, high-grade HCC vs low-grade HCC). (3) Next, Least Absolute Shrinkage and Selection Operator (LASSO) regression or Support Vector Machine-Recursive Feature Elimination (SVM-RFE) was carried out to determine the final feature subset.
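Screening steps (1) and (2) above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the ICC values are assumed to be precomputed for each feature, the function names `screen_features`, `student_t_test` and `two_sided_p` and the toy data are hypothetical, the p-value is obtained by numerically integrating the Student t density, and step (3) (LASSO regression or SVM-RFE) is not shown.

```python
import math
from statistics import mean, variance


def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value; the area from 0 to |t| is integrated with Simpson's rule."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3
    return max(0.0, 1.0 - 2.0 * central)


def student_t_test(a, b):
    """Two-sample Student t test with pooled variance (assumes non-constant groups)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


def screen_features(features, icc, labels, icc_min=0.8, alpha=0.05):
    """Keep features with ICC >= icc_min and a significant group difference."""
    kept = []
    for name, values in features.items():
        if icc[name] < icc_min:
            continue  # step (1): drop features with poor inter-observer agreement
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        _, p = student_t_test(pos, neg)
        if p < alpha:  # step (2): keep features that differ between subgroups
            kept.append(name)
    return kept


# Hypothetical toy data: 4 negative and 4 positive patients, 3 features.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "separating": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
    "noisy": [1.0, 3.0, 1.0, 3.0, 1.0, 3.0, 1.0, 3.0],
    "unstable": [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1],
}
icc = {"separating": 0.95, "noisy": 0.90, "unstable": 0.50}
kept = screen_features(features, icc, labels)
```

In this toy example only `separating` survives: `unstable` fails the ICC threshold and `noisy` has identical group means. The surviving features would then be passed to LASSO regression or SVM-RFE.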
For predicting both micro-vascular invasion and high-grade HCC, machine-learning-based classifiers, namely random forest (RF) and support vector machine (SVM), were established on each feature subset using the R packages randomForest and e1071. Consequently, a total of 8 models were constructed. 5-fold cross-validation was then used to select the best radiomics-based model, defined as the model with the highest area under the curve (AUC) of the receiver operating characteristic (ROC) curve. To avoid sampling bias, stratified sampling was performed in this study.
A binary logistic regression model was first used to screen the independent clinical risk factors and establish the clinical model. The predicted probability of the best radiomics model was defined as the radiomics score (Rscore). Then, nomogram-based predictors were constructed with the Rscore and the independent clinical risk factors.
The ICC was calculated to quantify the intraclass agreement of the feature values given by the two observers. The Student t test was performed to identify image features showing significant differences between subgroups. The diagnostic performance of the different models was evaluated by ROC curve analysis. The indexes of diagnostic performance included the sensitivity, specificity, AUC and Youden index. To obtain reliable statistical results, the establishment, evaluation and comparison of the radiomics model, clinical model and nomogram predictor were based on the same 5-fold data split. P values of less than 0.05 were regarded as statistically significant. All statistical analyses were conducted with SPSS 26.0 (SPSS, Chicago, IL, USA), R (R language 4.0.3, R Core Team, 2020) and MedCalc (MedCalc Software, Belgium).
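The evaluation scheme described above (stratified 5-fold splitting, AUC, sensitivity, specificity and Youden index) can be illustrated with a short standard-library Python sketch. It is an assumption-laden illustration rather than the analysis code of the study: the function names `stratified_folds` and `roc_summary` and the toy scores are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation.

```python
import random


def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so that each class is spread evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices of this class round-robin
    return folds


def roc_summary(scores, labels):
    """Return the AUC and the cut-off that maximizes the Youden index."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # AUC: probability that a positive case outscores a negative one (ties count 1/2).
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    auc = wins / (len(pos) * len(neg))
    best = None
    for cut in sorted(set(scores)):
        sens = sum(p >= cut for p in pos) / len(pos)  # true positive rate
        spec = sum(n < cut for n in neg) / len(neg)   # true negative rate
        youden = sens + spec - 1
        if best is None or youden > best["youden"]:
            best = {"cutoff": cut, "sensitivity": sens,
                    "specificity": spec, "youden": youden}
    return auc, best


# Hypothetical predicted probabilities for 4 negative and 3 positive cases.
scores = [0.1, 0.2, 0.3, 0.5, 0.4, 0.7, 0.9]
labels = [0, 0, 0, 0, 1, 1, 1]
auc, best = roc_summary(scores, labels)   # auc = 11/12, best cut-off = 0.4

# Hypothetical cohort with 10 positive and 20 negative cases split into 5 folds.
cohort = [1] * 10 + [0] * 20
folds = stratified_folds(cohort, k=5)
```

Evaluating every candidate model on the same folds, as the study does, keeps the AUC comparison between the radiomics model, the clinical model and the nomogram predictor paired.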
A total of 97 patients were accessed from the TCGA-LIHC collection (https://wiki.cancerimagingarchive.net/display/Public/TCGA-LIHC) in The Cancer Imaging Archive (TCIA). 10 patients were excluded because complete CT images were absent. 3 patients were excluded because histopathological results were unavailable. 2 patients were excluded because of poor image quality or incomplete images. Ultimately, 82 patients (54 male, 28 female; age: 61.8 ± 14.0 years, range: 20-85) were included. Detailed baseline clinical characteristics are listed in Table 1.
A total of 321 radiomics features were extracted from the arterial-, venous- and delayed-phase CT images of each included patient. 39 features were removed because of a low ICC (ICC < 0.8). For identifying the histopathological grade and MVI, 217 and 251 features, respectively, were then removed because they showed no significant differences between the subgroups (high grade vs low grade and MVI (+) vs MVI (-)). Next, the two feature selection strategies, LASSO regression and SVM-RFE, were each performed to further eliminate redundant features. As Table 2 and Table 3 show, 7 and 10 radiomics features were selected via LASSO regression and SVM-RFE, respectively, for assessing the tumor grade, and 13 and 10 features were selected via LASSO regression and SVM-RFE, respectively, for evaluating MVI (in the tables, AP, VP and DP denote the arterial phase, venous phase and delay phase, respectively). Next, pairing each of the two feature selection methods with each of the two machine-learning classifiers resulted in four radiomics-based predictive models for evaluating grade and four for evaluating MVI. The 5-fold diagnostic performance of these eight models is shown in Fig. 2. Consequently, the 10 features selected via SVM-RFE and the 13 features selected via LASSO regression served as the best feature subsets for evaluating the grade and MVI, respectively.
Fig. 3 shows the categorical distribution of these two feature subsets. In the best feature subset for grading HCC, the numbers of first-order, GLCM, GLDM, GLRLM, GLSZM, NGTDM and shape features were 1 (10%), 2 (20%), 4 (40%), 0, 0, 1 (10%) and 2 (20%), respectively. In the best feature subset for assessing MVI, the numbers of first-order, GLCM, GLDM, GLRLM, GLSZM, NGTDM and shape features were 2 (15.4%), 3 (23.1%), 1 (7.7%), 1 (7.7%), 2 (15.4%), 1 (7.7%) and 3 (23.1%), respectively. Fig. 4 displays the value distribution of the selected features in the different subgroups (high grade, low grade, MVI (+), MVI (-)).
According to the binary logistic regression established with the clinical factors as independent variables and grade or MVI status as the dependent variable, age, gender, alpha-fetoprotein (AFP) and tumor stage were identified as independent risk factors of tumor grade (p < 0.05), and age, gender, AFP, tumor stage and Eastern Cooperative Oncology Group (ECOG) score were identified as independent risk factors of MVI (p < 0.05) (Table 5). Therefore, these risk factors and the Rscore were used to construct the nomograms. The nomograms for assessing the HCC grade and MVI status are displayed in Fig. 5 and Fig. 6. In addition, Fig. 5 and Fig. 6 also show the diagnostic performance of the different models: the clinical models established with clinical factors, the radiomics models established with radiomics features, and the nomogram predictors. The nomogram predictor had the best performance for predicting the tumor grade (AUC: 0.928), followed by the radiomics model (AUC: 0.876) and the clinical model (AUC: 0.731).
Similarly, the nomogram predictor also had the best performance for identifying the MVI status (AUC: 0.945), followed by the radiomics model (AUC: 0.890) and the clinical model (AUC: 0.716) (Fig. 5, Fig. 6 and Table 6). As shown in Table 6, for predicting the HCC grade and MVI status, the diagnostic efficacy of the radiomics models was significantly higher than that of the clinical models. Furthermore, the diagnostic performance of the nomogram predictors for evaluating both grade and MVI was significantly better than that of both the clinical models and the radiomics models (p < 0.05).
The highlights of this research are as follows: (1) CT-derived radiomics features were used to construct diagnostic models for predicting two prognostic markers, the histopathological grade and MVI. Compared with the many previously reported studies that evaluated a single prognostic factor, assessing several prognostic indicators provides a more comprehensive characterization of the tumor during clinical management. (2) The patients included in this study came from multiple centers. Besides, as displayed in Table 1, the patients belonged to multiple races. This diverse data source helps to support the applicability of the strategy proposed in this research. (3) By integrating the clinical risk factors and radiomics features derived from CT images, the nomogram predictors showed excellent diagnostic efficacy for evaluating the histopathological grade and MVI.
In this research, most features selected for predicting tumor grade and MVI status were high-order texture features rather than the widely used first-order features. Only 10.0% and 15.4% of the features in the best feature subsets were first-order features, which is similar to many previous studies [28] [29] [30].
In daily clinical practice, diagnostic conclusions from CT images are usually drawn by the naked eye. These diagnostic insights are essentially based on first-order features such as the overall attenuation (mean or median value). Although invisible to the naked eye, many high-order texture features are of great importance for clinical application [31] [32] [33]. On the one hand, texture features can serve as quantitative image markers for biomedical application. On the other hand, texture features can be used to construct diagnostic models for various clinical applications such as tumor diagnosis, treatment efficacy evaluation and prognostic prediction. Moreover, the rapid development of artificial intelligence technology, including machine learning, deep learning, reinforcement learning and transfer learning, also opens up many possibilities for radiomics [34, 35]. These points further indicate the advantage of extracting CT-derived radiomics features for biomedical application.
In detail, the best feature subset for grading HCC contained 10 features: Median, Autocorrelation, Contrast, Low Gray Level Emphasis (LGLE), Dependence Entropy (DE), Dependence Non-Uniformity (DN), Large Dependence Emphasis (LDE), Coarseness, Elongation and Flatness. Similarly, the best feature subset for identifying the MVI status contained 13 features including Mean, Joint Average, Autocorrelation, Cluster Shade, Difference Entropy, Small Dependence Emphasis (SDE), Gray Level Non-Uniformity (GLN), Low Gray Level Zone Emphasis (LGLZE), Zone Percentage (ZP) and Spherical Disproportion. These features characterize tumor micro-structural heterogeneity in terms of gray level distribution, inhomogeneity of signal intensity, morphological differences and so forth. For example, Autocorrelation quantifies the fineness and coarseness of texture.
Tumors with high heterogeneity tend to have a coarser texture [36]. Contrast quantifies the variation of local signal intensity [37]. Elongation serves as a measure of the irregularity of the ROI shape [38]. With the assistance of high-order features that are hidden from the naked eye, different clinical models can be established to achieve different clinical goals.
In this research, good diagnostic performance for evaluating the tumor grade and MVI was achieved with the radiomics-based predictive models. The potential reasons are as follows: (1) The feature selection strategy was carefully designed. Feature selection in this study comprised 3 main steps. Unstable and meaningless features were first removed according to the ICC and the Student t test. Then, LASSO regression and SVM-RFE were each performed. LASSO regression and SVM-RFE are two machine learning-based feature selection strategies that show great potential in constructing clinical predictive models [39, 40]. (2) Two machine learning classifiers, SVM and RF, were then established. Compared with conventional linear classifiers such as regression-based models, SVM uses a nonlinear transformation to a high-dimensional feature space, constructs a discriminant function in that space to classify the samples, and avoids the "curse of dimensionality" [41]. By integrating multiple classification trees, the random forest can achieve higher classification accuracy. In addition, owing to the randomness it introduces, it has a certain robustness to noise [42].
(3) Rather than using a single feature selection approach and a single classifier to construct a single model for clinical application, this research paired two feature selection approaches (LASSO regression and SVM-RFE) with two classifiers (SVM and RF), which altogether yielded 8 predictive models and made it possible to identify the model with the best performance.
To obtain more powerful predictors, nomogram-based predictors were constructed with the clinical risk factors and the radiomics model. Age, gender, tumor stage and AFP were screened as independent risk factors of HCC grade, and age, gender, AFP, tumor stage and ECOG score were selected as independent risk factors of MVI. High tumor stage and higher expression of AFP were more common in patients with MVI and high-grade HCC. Furthermore, in this study, a higher ECOG score also indicated a high probability of MVI. These results are consistent with many previous studies [43] [44] [45]. Besides, our results also demonstrated that age and gender were associated with the histopathologic grade and MVI of HCC, which corresponds to previous findings in which age and gender served as independent risk factors and were incorporated into nomogram-based predictors [46] [47] [48]. Importantly, this study suggests that the integration of clinical indicators and radiomics results in strong diagnostic power for assessing the tumor grade and MVI (AUC > 0.900). The diagnostic efficacy of the nomogram-based predictors was significantly better than that of either the radiomics model or the clinical model. These results indicate that CT-based radiomics can be applied to predict multiple important prognostic markers simultaneously, which holds clinical potential for applications beyond hepatic disease, including other cancers.
The excellent predictive power may result from the following factors: (1) The combination of clinical risk factors and radiomics led to a comprehensive characterization of HCC from multiple perspectives. (2) The radiomics-based model laid a solid foundation for the excellent performance of the nomogram-based predictors.
Several limitations of this study should be acknowledged. Firstly, although approaches such as cross-validation were used, the sample size of this study is not large, which may carry a risk of statistical bias. In subsequent studies, efforts need to be made to enroll more patients and strengthen the evidence. Secondly, no patient cohort was used as an external validation group. Thirdly, only two prognostic factors, MVI and grade, were incorporated as predictive targets. More markers should be incorporated to evaluate the feasibility of applying the radiomics-based model to predict multiple markers.
This research indicated that CT-derived high-throughput radiomics features can serve as quantitative biomarkers for characterizing hepatocellular carcinoma. Furthermore, with the assistance of machine learning, accurate and non-invasive prediction of histopathological grade as well as micro-vascular invasion can be achieved, which holds great potential for guiding clinical management and predicting the prognosis of patients with HCC.
The local ethical approval (20-1574AB) was obtained from the Institutional Review Board of Qiqihar Medical University. The written informed consent of all patients was obtained, as declared in the data source.
[1] Tumour evolution in hepatocellular carcinoma
[2] Combination of interventional therapies in hepatocellular carcinoma
[3] EASL clinical practice guidelines: management of hepatocellular carcinoma
[4] AASLD guidelines for the treatment of hepatocellular carcinoma
[5] Circulating tumour DNA methylation markers for diagnosis and prognosis of hepatocellular carcinoma
[6] Prediction of the histopathological grade of hepatocellular carcinoma using qualitative diffusion-weighted, dynamic, and hepatobiliary phase MRI
[7] Role of baseline volumetric functional MRI in predicting histopathologic grade and patients' survival in hepatocellular carcinoma
[8] Diffusion-weighted imaging (DWI) of hepatocellular carcinomas: a retrospective analysis of the correlation between qualitative and quantitative DWI and tumour grade
[9] Blood oxygen level-dependent liver MRI: Can It predict microvascular invasion in HCC?
[10] Preoperative assessment of hepatocellular carcinoma tumor grade using needle biopsy: implications for transplant eligibility
[11] Increased prevalence of regulatory T cells in the tumor microenvironment and its correlation with TNM stage of hepatocellular carcinoma
[12] DNA topoisomerase IIα and Ki67 are prognostic factors in patients with hepatocellular carcinoma
[13] CK19 and glypican 3 expression profiling in the prognostic indication for patients with HCC after surgical resection
[14] Diagnostic value of Gd-EOB-DTPA-enhanced MRI for the expression of Ki67 and microvascular density in hepatocellular carcinoma
[15] Radiomics: images are more than pictures, they are data
[16] A deep look into radiomics
[17] A review of original articles published in the emerging field of radiomics
[18] Noninterpretive uses of artificial intelligence in radiology
[19] Radiomic analysis of contrast-enhanced CT predicts microvascular invasion and outcome in hepatocellular carcinoma
[20] Preoperative radiomics nomogram for microvascular invasion prediction in hepatocellular carcinoma using contrast-enhanced CT
[21] A radiomics nomogram for preoperative prediction of microvascular invasion risk in hepatitis B virus-related hepatocellular carcinoma
[22] Preoperative prediction for pathological grade of hepatocellular carcinoma via machine learning-based radiomics
[23] Radiology data from the cancer genome atlas liver hepatocellular carcinoma [TCGA-LIHC] collection
[24] Primary carcinoma of the liver. A study of 100 cases among 48,900 necropsies
[25] An overview of overfitting and its solutions
[26] MR imaging of rectal cancer: radiomics analysis to assess treatment response after neoadjuvant therapy
[27] CT radiomics, radiologists, and clinical information in predicting outcome of patients with COVID-19 pneumonia
[28] Radiomics analysis of susceptibility weighted imaging for hepatocellular carcinoma: exploring the correlation between histopathology and radiomics features
[29] Development and validation of a contrast-enhanced CT-based radiomics nomogram for prediction of therapeutic efficacy of anti-PD-1 antibodies in advanced HCC patients
[30] Clear cell renal cell carcinoma: CT-based radiomics features for the prediction of Fuhrman grade
[31] The predictive value of CT-based radiomics in differentiating indolent from invasive lung adenocarcinoma in patients with pulmonary nodules
[32] CT-based radiomics and machine learning to predict spread through air space in lung adenocarcinoma
[33] Correlation between CT based radiomics features and gene expression data in non-small cell lung cancer
[34] Machine learning-based radiomics for molecular subtyping of gliomas
[35] Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer
[36] Classification of the glioma grading using radiomics analysis
[37] Computational radiomics system to decode the radiographic phenotype
[38] Radiomic mapping model for prediction of Ki-67 expression in adrenocortical carcinoma
[39] Development and validation of a radiomics nomogram for preoperative prediction of lymph node metastasis in colorectal cancer
[40] Radiomics assessment of bladder cancer grade using texture features from diffusion-weighted imaging
[41] What is a support vector machine?
[42] Random forest for bioinformatics. Ensemble Machine Learning
[43] A model combining TNM stage and tumor size shows utility in predicting recurrence among patients with hepatocellular carcinoma after resection
[44] Correlation analysis of preoperative serum alpha-fetoprotein (AFP) level and prognosis of hepatocellular carcinoma (HCC) after hepatectomy
[45] Brain metastases from hepatocellular carcinoma: prognostic factors and outcome
[46] A genomic-clinicopathologic nomogram for predicting overall survival of hepatocellular carcinoma
[47] Nomograms based on inflammatory biomarkers for predicting tumor grade and micro-vascular invasion in stage I/II hepatocellular carcinoma
[48] Preoperative radiomics nomogram for microvascular invasion prediction in hepatocellular carcinoma using contrast-enhanced CT
European Journal of Radiology Open 9 (2022) 100424