key: cord-0847544-x0lq0ogm
authors: Yang, Han; Chen, Hongjie; Zhang, Guorui; Li, Hongyi; Ni, Ran; Yu, Yali; Zhang, Yepeng; Wu, Yongjun; Liu, Hong
title: Diagnostic value of circulating genetically abnormal cells to support computed tomography for benign and malignant pulmonary nodules
date: 2022-04-09
journal: BMC Cancer
DOI: 10.1186/s12885-022-09472-w
sha: b3cf39c842a14c829f8d4684cb8a7d4f174bab43
doc_id: 847544
cord_uid: x0lq0ogm

BACKGROUND: The accuracy of CT and tumour markers in screening for lung cancer needs to be improved. Computer-aided diagnosis has been reported to effectively improve the diagnostic accuracy of imaging data, and recent studies have shown that circulating genetically abnormal cells (CACs) have the potential to become a novel marker of lung cancer. The purpose of this research is to explore new ways of lung cancer screening. METHODS: From May 2020 to April 2021, patients with pulmonary nodules who had received CAC examination within one week before surgery or biopsy at First Affiliated Hospital of Zhengzhou University were enrolled. CAC counts, CT scan images, serum tumour marker (CEA, CYFRA21–1, NSE) levels and demographic characteristics of the patients were collected for analysis. CT images were uploaded to the Pulmonary Nodules Artificial Intelligence Diagnostic System (PNAIDS) to assess the malignancy probability of nodules. We compared diagnosis based on PNAIDS, CAC, the Mayo Clinic Model, tumour markers alone and their combination. The combination models were built through logistic regression and were compared using the area under the ROC curve (AUC). RESULTS: A total of 93 of 111 patients were included. The AUC of PNAIDS was 0.696, which increased to 0.847 when combined with CAC. The sensitivity (SE), specificity (SP), and positive (PPV) and negative (NPV) predictive values of the combined model were 61.0%, 94.1%, 94.7% and 58.2%, respectively. In addition, we evaluated the diagnostic value of CAC, which showed an AUC of 0.779, an SE of 76.3%, an SP of 64.7%, a PPV of 78.9%, and an NPV of 61.1%, higher than those of any single serum tumour marker and the Mayo Clinic Model. The combination of PNAIDS and CAC exhibited significantly higher AUC values than the PNAIDS (P = 0.009) or the CAC (P = 0.047) indicator alone. However, including additional tumour markers did not significantly alter the performance of CAC and PNAIDS.
CONCLUSIONS: CAC had a higher diagnostic value than traditional tumour markers in early-stage lung cancer and a supportive value for PNAIDS in the diagnosis of cancer based on lung nodules. The results of this study offer a new mode of screening for early-stage lung cancer using lung nodules. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12885-022-09472-w.

Lung cancer is the main contributor to cancer mortality globally [1, 2]. Low-dose spiral CT (LDCT) is the main screening method for the early diagnosis of lung cancer when lung nodules are small and therefore unsuitable for aspiration biopsy. However, according to the report of the National Lung Screening Trial (NLST), only 3.6% of lung nodules screened by LDCT are diagnosed as lung cancer [3], and this situation often causes overdiagnosis or a significant delay in the early diagnosis of lung cancer, and patients may lose the opportunity to receive timely treatment [4]. Moreover, due to differences in the experience and understanding of imaging readers, there remains a need for a method to assist in the analysis of CT results.

To date, there have been considerable efforts to improve the efficiency of imaging-based diagnosis of lung cancer, including the development of computer-aided diagnosis (CAD) systems. Indeed, CAD systems can help in detecting lung nodules in LDCT and in determining the nature of nodules by extracting and analysing the imaging characteristics of nodules, including their size, shape, and density, among others [5]. A matched case-control study using NLST data found that the CAD image analysis method significantly improves diagnostic accuracy for lung nodules detected at low-dose CT [6]. Nevertheless, the imaging features of early-stage lung cancer are usually atypical, and it is still a challenge to use CAD alone to separate small malignant nodules from the majority of benign nodules. Furthermore, CAD lacks rigorous evidence to make explainable medical decisions because of the black-box-based inference process of deep learning [7]. Therefore, CAD cannot be applied for medical diagnosis and decision-making alone, yet the combination of multiple clinical indicators may help to improve diagnostic accuracy [7] [8] [9].

Besides a more reliable method to analyse and interpret CT results, blood-based biomarker tests also have great potential in lung cancer diagnosis. In addition to traditional tumour markers, noninvasive liquid biopsies, such as circulating free nucleic acids (RNA and DNA) and circulating tumour cells (CTCs), have been reported in recent years.

However, liquid biopsy has not yet been adopted in routine clinical practice owing to many limiting factors [10], and traditional tumour markers are limited because of their low sensitivity and false positives caused by infection or other factors [11]. Moreover, CTCs of lung cancer often display nonepithelial characteristics, and they are difficult to detect through epithelial cell adhesion molecule (EpCAM)-dependent methods [12]. The recently proposed biomarker of circulating genetically abnormal cells (CACs) may solve this dilemma.

CACs are defined as peripheral blood mononuclear cells carrying mutations on chromosome 3 (3p22.1, 3q29) and chromosome 10 (10q22.3, CEP10); the detection of these cells is not EpCAM dependent and therefore overcomes the limitation of CTC detection [13]. Abnormalities at the above loci have been shown through comparative genomic hybridization analysis to commonly occur in lung cancer [14]. Katz et al. then confirmed genomic abnormalities in the sputum, tissue and blood of patients with non-small-cell lung cancer (NSCLC) [15] [16] [17]. Katz et al. also showed that CACs have auxiliary diagnostic value in different stages of lung cancer, with the latest research showing a sensitivity and specificity of 88.8% and 100%, respectively, for lung cancer diagnosis [17]. Therefore, CACs have great potential for diagnosing pulmonary nodules [18].

In this work, we retrospectively analysed data for patients with pulmonary nodules and attempted to identify a novel biomarker to support the ability of CT to differentiate malignant from benign pulmonary nodules. The objective of this study was to explore new ways of diagnosing pulmonary nodules by establishing new diagnostic models based on artificial intelligence-based CAD and comparing the diagnostic efficiency of different models.

This was a retrospective study of patients with pulmonary nodules detected by CT at First Affiliated Hospital of Zhengzhou University. In total, 111 patients were initially screened from May 2020 to April 2021.

The inclusion criteria for the study were as follows: (1) ≥ 18 years of age; (2) pulmonary nodule diameter no more than 30 mm (measured by CT scan), including single and multiple pulmonary nodules; (3) diagnosis histologically confirmed using nonsurgical biopsy (including fibre bronchoscope biopsy, computed tomography or ultrasonic-guided percutaneous transthoracic biopsy) or surgical resection; and (4) CAC tests performed within 1 week prior to surgery or biopsy. The exclusion criteria were as follows: (1) CT slice thickness greater than 2 mm; (2) a history of malignant tumours; (3) malignant nodules that were not classified as stage I based on the 8th edition of the American Joint Committee on Cancer (AJCC) staging system [19]; and (4) malignant nodules that were not primary malignant tumours of the lung. Ultimately, 93 patients were enrolled and divided into benign and malignancy groups based on histopathologic results (Fig. 1). Tumour pathology was classified according to the World Health Organization (WHO) classification standard of lung tumours (2015 edition) [20].

Keywords: Circulating genetically abnormal cells (CAC), Pulmonary nodules, Lung cancer, Early diagnosis, Computed tomography (CT)

Clinical data for the patients were collected, including sex, age, smoking history and family history of malignant tumours. Preoperative serum tumour marker levels, including carcinoembryonic antigen (CEA), cytokeratin fragment 21-1 (CYFRA21-1) and neuron-specific enolase (NSE), were available for 66 patients. The chest CT imaging data for the enrolled patients were separately exported, and the imaging features of nodules (including the diameter, type, location, number and spiculation of nodules) were independently assessed by two senior physicians. When opinions differed, a consistent conclusion was reached through discussion with a third senior physician.

PNAIDS is an artificial intelligence-based CAD system that applies machine learning technology and a deep convolutional neural network to perform 3D reconstruction and segmentation of nodules and to predict the malignancy probability of pulmonary nodules [21]. All chest CT scans were obtained during deep inspiration; the images were acquired with a layer thickness of no more than 5 mm and reconstructed with a slice thickness of less than 2 mm. Lung-window images were downloaded in DICOM format and uploaded to a cloud platform in the same format. The malignancy probability of each nodule was calculated. For patients with multiple nodules, the highest malignancy probability among all nodules was used for analysis.

Ten millilitres of peripheral venous blood was collected within one week before surgery or biopsy. Blood samples were collected into an anticoagulation tube containing EDTA and fixed within 2 h with cell preservation solution (comprising solution A, containing phosphatase inhibitor and protease inhibitor, and solution B, containing formaldehyde). Peripheral blood mononuclear cells (PBMCs) were isolated by Ficoll-Hypaque density gradient centrifugation within 96 h. PBMCs were diluted to 40,000/100 μl, and a smear was prepared. Four-colour (3p22.1, 3q29 and 10q22.3, CEP10) fluorescence in situ hybridization was performed using a mononuclear cell chromosome abnormality detection kit (Zhuhai SanMed Biotech Inc.). The scanning, imaging and analysis procedures were automatically completed by a pathological section scanner (The Duet System, Allegro Plus, Bioview Ltd.). A total of 10,000 cells were randomly selected for a 15-layer cell scan, and the number of CACs was calculated. CACs were defined as cells exhibiting abnormal amplification at specific sites and at least three fluorescent signals at two or more specific probe sites (as presented in Fig. 2).
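The CAC definition can be read as a simple rule over per-probe signal counts. The following is a minimal sketch of that rule only; the function names, the dictionary representation of a cell and the example cells are illustrative assumptions, and the actual scoring in the study was performed automatically by the scanner software.

```python
from typing import Dict, Iterable

# The four probe sites named in the text.
PROBES = ("3p22.1", "3q29", "10q22.3", "CEP10")


def is_cac(signals: Dict[str, int]) -> bool:
    """Return True if at least two probe sites show three or more signals.

    `signals` maps a probe name to the number of fluorescent signals counted
    in one cell (hypothetical representation, not the scanner's data format).
    """
    gained_sites = sum(1 for probe in PROBES if signals.get(probe, 0) >= 3)
    return gained_sites >= 2


def count_cacs(cells: Iterable[Dict[str, int]]) -> int:
    """Count the cells that satisfy the CAC rule among the scanned cells."""
    return sum(1 for cell in cells if is_cac(cell))


# Illustrative cells: a normal diploid pattern and a gain at both chromosome 3 sites.
normal_cell = {"3p22.1": 2, "3q29": 2, "10q22.3": 2, "CEP10": 2}
abnormal_cell = {"3p22.1": 3, "3q29": 3, "10q22.3": 2, "CEP10": 2}
print(count_cacs([normal_cell, abnormal_cell]))
```

In this reading, a gain at a single probe site is not sufficient for a cell to be counted as a CAC.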

The widely accepted Mayo Clinic model [22] was also applied to predict the malignancy probability of nodules. The model expresses the malignancy probability as a function of six predictors: (1) probability of malignancy = $e^{x} / (1 + e^{x})$; (2) $x = -6.8272 + (0.0391 \times \text{age}) + (0.7917 \times \text{smoking}) + (1.3388 \times \text{cancer}) + (0.1274 \times \text{nodule diameter}) + (1.0407 \times \text{spiculation}) + (0.7838 \times \text{upper lobe})$; (3) $e$ is the base of the natural logarithm; age is the patient's age (years); if the patient is a current or former smoker, smoking = 1 (otherwise = 0); if the patient has a history of extrathoracic malignancy diagnosed more than 5 years earlier, cancer = 1 (otherwise = 0); the nodule diameter is the diameter of the nodule (mm); if there are burrs at the edge of the nodule, spiculation = 1 (otherwise = 0); if the nodule is located in an upper lobe, upper lobe = 1 (otherwise = 0).
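The Mayo Clinic model can be evaluated directly from the coefficients given above. The sketch below is an illustration rather than the authors' implementation: the function and argument names are assumptions, the spiculation and upper-lobe indicators are assumed to be coded 1/0 in the same way as the other binary predictors, and the example patient is hypothetical.

```python
import math


def mayo_probability(age: float, smoker: bool, prior_cancer: bool,
                     diameter_mm: float, spiculation: bool, upper_lobe: bool) -> float:
    """Malignancy probability from the Mayo Clinic model coefficients in the text."""
    x = (-6.8272
         + 0.0391 * age
         + 0.7917 * int(smoker)          # current or former smoker
         + 1.3388 * int(prior_cancer)    # extrathoracic malignancy > 5 years earlier
         + 0.1274 * diameter_mm          # nodule diameter in mm
         + 1.0407 * int(spiculation)     # assumed 1/0 coding
         + 0.7838 * int(upper_lobe))     # assumed 1/0 coding
    return math.exp(x) / (1.0 + math.exp(x))


# Hypothetical patient: 60-year-old smoker, 15 mm spiculated nodule outside the upper lobes.
p = mayo_probability(age=60, smoker=True, prior_cancer=False,
                     diameter_mm=15, spiculation=True, upper_lobe=False)
print(f"Mayo malignancy probability: {p:.3f}")
```

Because the model is a logistic function of a linear score, each predictor shifts the log-odds additively, which makes the contribution of individual risk factors easy to inspect.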

Statistical analyses were performed using SPSS 21.0. Quantitative variables are expressed as the mean ± standard deviation (X ± S) or median and quartiles [M(QL, QU)], and independent sample t-tests or Mann-Whitney U tests were applied. Categorical variables are expressed as n (%) and analysed using the Chi-square test or Fisher's exact test. Receiver operating characteristic (ROC) curves were drawn, and the area under the curve (AUC), sensitivity (SE), specificity (SP), positive predictive value (PPV) and negative predictive value (NPV) were calculated; the Youden index was used to determine the cut-off value. To validate the robustness of the diagnostic model, logistic regression and Fisher discriminant analysis were both performed. The Chi-square test was applied for correlation analysis of classification variables. Two-sided P < 0.05 was considered significant. Correlation between numerical variables was analysed by calculating the Spearman rank correlation coefficient, with two-sided P < 0.01 considered significant. Boxplots, forest plots, and heatmaps were drawn in R (v4.0.10). DeLong's test was applied to compare AUC between ROC curves (R package pROC).
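To make the diagnostic metrics concrete, the sketch below computes SE, SP, PPV, NPV and the Youden index from a 2 × 2 table and selects the cut-off that maximizes the Youden index. It is a plain-Python illustration, not the SPSS or R workflow used in the study. The function names are assumptions, a score at or above the cut-off is assumed to count as a positive test, and the counts in the usage example (45, 14, 22, 12) are back-calculated from the CAC percentages reported in the abstract for 59 malignant and 34 benign cases rather than taken from a published table.

```python
from typing import Dict, Sequence, Tuple


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """SE, SP, PPV, NPV and Youden index from confusion-matrix counts.

    Assumes every denominator is non-zero.
    """
    se = tp / (tp + fn)
    sp = tn / (tn + fp)
    return {
        "SE": se,
        "SP": sp,
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
        "Youden": se + sp - 1.0,
    }


def best_cutoff_by_youden(scores: Sequence[float],
                          labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cut-off, Youden index) maximizing SE + SP - 1.

    labels: 1 = malignant, 0 = benign; both classes must be present.
    A case is called positive when its score is >= the cut-off.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, float("-inf")
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
        j = tp / n_pos + (n_neg - fp) / n_neg - 1.0
        if j > best_j:  # keep the lowest cut-off on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Counts inferred from the reported CAC percentages (an assumption, see lead-in).
cac_metrics = diagnostic_metrics(tp=45, fn=14, tn=22, fp=12)
print({name: round(value, 3) for name, value in cac_metrics.items()})
```

Scanning every observed score as a candidate cut-off is adequate for a sample of this size; larger datasets would normally use a sorted single-pass ROC computation.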

A total of 111 patients were initially screened in this study, among whom 18 were excluded for different reasons (7 cases were not stage I, 3 were not primary lung cancer, 4 involved a malignancy history, and 4 lacked thin-slice CT data) (Fig. 1). Ninety-three patients were ultimately included in the analysis, of whom 59 (63.4%) were diagnosed with lung cancer and 34 (36.6%) with benign nodules. There were 39 males (41.9%) and 54 females (58.1%), with a mean age of 53.11 ± 10.74 years.

There were statistically significant differences in sex (P = 0.003), smoking history (P = 0.035), and type of nodules (P = 0.001), whereas no differences in age, family history of cancer, diameter of nodules, multiple nodules, upper lobe nodules, or burr signs were found between the benign group and the malignancy group. As none of the females had a history of smoking, a subgroup analysis of smoking history was performed, stratified by sex. Stratified analysis showed a nonsignificant difference in smoking history between the benign and malignancy groups in the male subgroup, with Chi-square test statistic of 0.300 (P = 0.584). The basic characteristics of the two groups are shown in Table 1 .

Fig. 2 (caption fragment) a) The blue and yellow probes located on chromosome 10 each have two signals (see blue and yellow arrows in Fig. 2a), indicating a normal cell. b) Both the green and red probes, which are located on chromosome 3, have three signals (see green and red arrows in Fig. 2b), whereas the blue and yellow probes located on chromosome 10 have two signals (see blue and yellow arrows in Fig. 2b), indicating that the cell has abnormal amplification on chromosome 3 and is therefore a CAC.

There was no significant difference between the benign and malignancy groups in the time between CAC testing and surgery or biopsy (3 (1, 5) days for the benign group and 4 (2, 5) days for the malignancy group; P = 0.393). The median CAC count was 1.5 (0, 3) in the benign group and 4 (3, 6) in the malignancy group; the median PNAIDS was 67.5% (59.5%, 78.8%) and 82.0% (70.0%, 90.0%), respectively. The differences in the distribution of CAC (U = 1562.5) and PNAIDS (U = 1396.5) between the benign and malignancy groups were statistically significant, at P < 0.001 and P = 0.002, respectively (Fig. 3).
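The reported U statistics can be cross-checked against the AUC values given in the abstract, because for two groups of sizes n1 and n2 the AUC equals U / (n1 × n2). The short sketch below applies this identity with the group sizes from this study (59 malignant, 34 benign). It assumes that the reported U is the statistic counted in favour of the malignancy group; the function name is illustrative.

```python
def auc_from_mann_whitney_u(u: float, n_pos: int, n_neg: int) -> float:
    """AUC implied by a Mann-Whitney U statistic (U counted for the positive group)."""
    return u / (n_pos * n_neg)


N_MALIGNANT, N_BENIGN = 59, 34  # group sizes reported in the text

auc_cac = auc_from_mann_whitney_u(1562.5, N_MALIGNANT, N_BENIGN)
auc_pnaids = auc_from_mann_whitney_u(1396.5, N_MALIGNANT, N_BENIGN)
print(f"CAC: {auc_cac:.3f}, PNAIDS: {auc_pnaids:.3f}")
```

Under that assumption the two values round to 0.779 and 0.696, matching the AUCs reported for CAC and PNAIDS.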

Based on PNAIDS, CAC counts, Mayo Clinic model, and tumour marker levels in the benign and malignancy groups, the ROCs were drawn (Fig. 4) . The AUC, 95% confidence interval (CI), and Youden index of all these indicators are presented in Fig. 5 . SE, SP, PPV and NPV are shown in 

Correlation analysis among CAC counts, PNAIDS, age, CEA, CYFRA21-1, NSE and nodule diameter showed weak correlations between CAC counts and age (r = 0.311, P = 0.002) and between NSE and the diameter of lung nodules (r = 0.323, P = 0.008). PNAIDS did not exhibit a significant correlation with any of these indicators (Fig. 6). Notably, no significant correlation between PNAIDS and CAC was observed.

Numerical variables (PNAIDS and CAC counts) were converted to categorical variables according to cutoff values. Because PNAIDS was accurate to 2 decimal places, it was classified by whether it was less than 70.0%. In correlation analysis of PNAIDS, CAC counts and other categorical variables, PNAIDS correlated significantly with the type of nodule (P = 0.035), but no significant correlation with other indicators was detected. In addition, CAC counts did not correlate significantly with any of these categorical variables (Table 3).

As tumour markers tend to be used in combination in the clinic, a logistic regression model named TM was established using CEA, CYFRA21-1 and NSE, and its ROC curve for diagnosing lung nodules was drawn (Fig. 7). First, Model 1 was established by combining TM and PNAIDS. Second, PNAIDS and CAC counts were combined to build Model 2. To better apply CAC counts to the model, we transformed this marker into ln (CAC counts + 1), which was then applied to the logistic regression model. Similarly, the transformed data were used to build Model 3, which combined PNAIDS, CAC counts and TM. The ROCs of the three models are shown in Fig. 7 (see also Fig. 5 and Table 2); the formulas of the logistic regression models are presented in Additional file 2.
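The structure of Model 2 can be sketched as a logistic function of PNAIDS and the transformed CAC count. The fitted coefficients are given only in Additional file 2 and are not reproduced here, so the coefficient values in the example below are hypothetical placeholders chosen purely for illustration. The function names and the choice to pass PNAIDS as a fraction between 0 and 1 are also assumptions.

```python
import math


def ln_cac_transform(cac_count: int) -> float:
    """Transform described in the text: ln(CAC counts + 1)."""
    return math.log(cac_count + 1)


def model2_probability(pnaids: float, cac_count: int,
                       intercept: float, beta_pnaids: float,
                       beta_ln_cac: float) -> float:
    """Logistic combination of PNAIDS and ln(CAC counts + 1).

    The coefficients must come from a fitted model; none are assumed here.
    """
    x = intercept + beta_pnaids * pnaids + beta_ln_cac * ln_cac_transform(cac_count)
    return 1.0 / (1.0 + math.exp(-x))


# HYPOTHETICAL coefficients, not the values fitted in the study.
example = model2_probability(pnaids=0.82, cac_count=4,
                             intercept=-3.0, beta_pnaids=2.0, beta_ln_cac=1.0)
print(f"Illustrative Model 2 probability: {example:.3f}")
```

Adding 1 before taking the logarithm keeps the transform defined for patients with a CAC count of 0 and compresses the influence of very high counts.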

The best cut-off point was obtained according to the maximum Youden index in the ROC curve of Model 2 to divide patients into predicted benign and predicted malignancy groups. The distribution of CACs (U = 1936.0) and PNAIDS (U = 1550.5) between these groups was significantly different, with both at P < 0.001 (Fig. 8 ).

In this study, the value of CT in discriminating lung cancer from small lung nodules was questioned, even when applying a current advanced artificial intelligence screening method which significantly increased the [23]. Therefore, the traditional mode of small lung nodule diagnosis should be investigated. Previous works as well as this work suggest that CACs are an ideal candidate marker, as such a test only requires the simple process of blood collection and its accuracy has been reported in early lung cancer patients [18, 24, 25].

Despite the unsatisfactory diagnostic efficiency of CT alone, this approach still performed better than the Mayo Clinic model and tumour markers. As a common clinical imaging examination, CT has unquestionable value in the diagnosis of a variety of lung diseases [26]. A multicentre study involving 534 patients showed that PNAIDS had a higher diagnostic accuracy than the Mayo Clinic model and radiologists [21], consistent with our results. Several classic clinical indicators and imaging features are included in the Mayo Clinic model but show poor efficiency in distinguishing early lung cancer from benign pulmonary nodules. Thus, more specific imaging data may have higher diagnostic value than traditional imaging features.

Moreover, the diagnostic value of CACs in comparison with other traditional biomarkers has been confirmed using a cohort of patients with lung nodules, which provides independent validation of the work conducted by Ye et al. [24]. In the present study, the highest diagnostic efficiency was achieved when a CAC count of 3 was chosen as the cut-off value, a result similar to those of the studies by Qiu et al. [25] and Ye et al. [18]. Overall, CAC counts presented better diagnostic value than commonly used tumour markers (CEA, CYFRA21-1, and NSE), which agrees with the results of several studies reporting the advantages of CACs for the diagnosis of lung cancer [17, 18, 24, 25, 27]. CACs therefore have the potential to become a better novel diagnostic marker of lung cancer.

Biomarkers and imaging are often used in combination to improve diagnostic accuracy [8, 28], and our results also showed that efficiency was greatly improved when CAC was combined with PNAIDS in the diagnosis of lung nodules. Correlation analysis further suggested that PNAIDS and CACs are independent of each other, which is consistent with the premise of the model that variables are independent. Interestingly, Model 2, which combined CAC counts and PNAIDS, displayed significantly higher diagnostic efficiency than CAC counts or PNAIDS alone. CAC counts and PNAIDS reflect the biogenetics and imaging features of patients, respectively. The 95% confidence intervals of the AUC of NSE, CYFRA21-1, CEA and the combined index TM all contained 0.5. Moreover, Models 1 and 3, which further included TM, did not show improved diagnostic efficiency compared with PNAIDS or with PNAIDS combined with CAC. This result also suggests the limitation of the currently clinically used TM, that is, a lack of sensitivity and specificity.

In addition, we analysed the correlation between the indicators and demographic characteristics, which indicated a weak correlation between CAC counts and age. In the study by Liu [27], there was no relationship between a positive CAC result (CAC counts ≥ 1) and age (age ≥ 60), which is contrary to our finding. Liu et al. treated age and CAC counts as dichotomous variables, which may have led to poorer testing efficiency, whereas we directly analysed the correlation between age and CAC counts. The observed correlation may be due to genetic mutations that accumulate in cells with age, which also suggests that the age of the population may be a factor that needs to be controlled for or corrected in CAC detection. It should be noted that age is also a risk factor for NSCLC; further research is needed to explore whether the association between CAC and age has biological significance. The serum level of NSE had a significant correlation with nodular diameter, which can be explained by tumour burden [29, 30]. PNAIDS only showed a correlation with the type of nodule, suggesting the independence of imaging features and the value of imaging data for early screening of lung cancer.

Nevertheless, there are limitations in this study. First, as most malignant nodules screened by CT were adenocarcinoma, stratified analysis of different pathological types could not be applied to further explore the potential bias resulting from other pathological types of lung cancer. Second, smoking history was not common among the female patients, who comprised most cases; therefore, a larger sample size is required to assess the association between smoking and CAC counts or other indicators. Third, the sample size of this study was relatively small. Although the main statistical analysis yielded positive results, more studies with larger sample sizes are still needed to further confirm the practicability of our findings. Fourth, there is still scope to improve the diagnostic accuracy of PNAIDS; more data will be included in the future to train the PNAIDS model and construct a predictive model with higher accuracy. It is noteworthy that detection results of CACs can be obtained within 5 working days, quickly providing a more reliable auxiliary diagnostic basis when combined with PNAIDS, with a wide range of clinical application prospects. Our results indicate that this diagnostic model is promising for lung nodule diagnosis; with more supporting data in the future, it may be extended worldwide.

In conclusion, this work suggests that CACs, as a novel lung cancer biomarker from liquid biopsy, show higher diagnostic value than traditional tumour markers in early-stage lung cancer and a supportive value for CT scans in the diagnosis of cancer based on small lung nodules. The results of this study pave the way for further applications of CACs and offer a potential new mode for screening early-stage lung cancer using small lung nodules.


1. Cancer incidence, mortality, and burden in China: a time-trend analysis and comparison with the United States and United Kingdom based on the global epidemiological data released in 2020

2. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries

3. Reduced lung-cancer mortality with low-dose computed tomographic screening

4. Lung cancer LDCT screening and mortality reduction - evidence, pitfalls and future perspectives

5. Evolving the pulmonary nodules diagnosis from classical approaches to deep learning-aided decision support: three decades' development course and future prospect

6. Added value of computer-aided CT image features for early lung cancer diagnosis with small pulmonary nodules: a matched case-control study

7. Automated lung nodule detection and classification using deep learning combined with multiple strategies

8. Development of a machine learning-based multimode diagnosis system for lung cancer

9. Autoantibodies as diagnostic biomarkers for lung cancer: a systematic review

10. Liquid biopsy for early detection of lung cancer

11. Thymidine kinase 1 combined with CEA, CYFRA21-1 and NSE improved its diagnostic value for lung cancer

12. Circulating tumor cells in patients with lung cancer: developments and applications for precision medicine

13. Genetically abnormal circulating cells in lung cancer patients: an antigen-independent fluorescence in situ hybridization-based case-control study

14. Genomic profiles in stage I primary non small cell lung cancer using comparative genomic hybridization analysis of cDNA microarrays

15. Automated detection of genetic abnormalities combined with cytology in sputum is a sensitive predictor of lung cancer

16. 3p22.1 and 10q22.3 deletions detected by fluorescence in situ hybridization (FISH): a potential new tool for early detection of non-small cell lung Cancer (NSCLC)

17. Identification of circulating tumor cells using 4-color fluorescence in situ hybridization: validation of a noninvasive aid for ruling out lung cancer in patients with low-dose computed tomography-detected lung nodules

18. Circulating genetically abnormal cells add non-invasive diagnosis value to discriminate lung cancer in patients with pulmonary nodules ≤10 mm

19. The new 8th TNM staging system of lung cancer and its potential imaging interpretation pitfalls and limitations with CT image demonstrations

20. The 2015 World Health Organization classification of lung tumors: impact of genetic, clinical and radiologic advances since the 2004 classification

21. Artificial intelligence based on deep learning for differential diagnosis between benign and malignant pulmonary nodules: a real-world, multicenter, diagnostic study

22. The probability of malignancy in solitary pulmonary nodules. Application to small radiologically indeterminate nodules

23. Diagnostic value of artificial intelligence in early-stage lung cancer

24. Detection of circulating genetically abnormal cells using 4-color fluorescence in situ hybridization for the early detection of lung cancer

25. Application of circulating genetically abnormal cells in the diagnosis of early-stage lung cancer

26. Diagnostic value and key features of computed tomography in Coronavirus Disease

27. Detection of circulating genetically abnormal cells in peripheral blood for early diagnosis of non-small cell lung cancer

28. Significance of tumor-associated autoantibodies in the early diagnosis of lung cancer

29. Comprehensive analysis of marker gene detection and computed tomography for the diagnosis of human lung cancer

30. Prognostic value of neuron-specific enolase for small cell lung cancer: a systematic review and meta-analysis

The authors would like to thank Professor Bingjie Li for study design work and for his critical review of the manuscript.

The online version contains supplementary material available at https://doi.org/10.1186/s12885-022-09472-w. Additional file 1.

Authors' contributions: HY, HC and YW designed the study; data collection was performed by RN, YY and YZ; HYL and GZ analyzed and interpreted the data; the first draft of the manuscript was written by HY and HL. All authors read and approved the final manuscript.

This study was funded by the program for Health Commission of Henan Province (SB201901016). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the manuscript.

To preserve patient confidentiality the datasets generated for this study are not publicly available, but are available from the corresponding author on reasonable request.

Ethics approval and consent to participate: Approval was obtained from the ethics committee of First Affiliated Hospital of Zhengzhou University (Ethics approval number: 2021-KY-0606-001). Due to the retrospective study design, the ethical review board approved a waiver of written informed consent.

Not applicable.

There is no conflict of interest involved in this study.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.