key: cord-0821670-rhxv9fu4 authors: Kumar, Yogesh; Gupta, Surbhi; Singla, Ruchi; Hu, Yu-Chen title: A Systematic Review of Artificial Intelligence Techniques in Cancer Prediction and Diagnosis date: 2021-09-27 journal: Arch Comput Methods Eng DOI: 10.1007/s11831-021-09648-w sha: b437347cb6f52903ef697b95bdb2e0f7a904e852 doc_id: 821670 cord_uid: rhxv9fu4

Artificial intelligence has aided the advancement of healthcare research. The availability of open-source healthcare statistics has prompted researchers to create applications that aid cancer detection and prognosis. Deep learning and machine learning models provide a reliable, rapid, and effective solution for dealing with such challenging diseases. PRISMA guidelines were used to select articles published in Web of Science, EBSCO, and EMBASE between 2009 and 2021. In this study, we performed an efficient search and included research articles that employed AI-based learning approaches for cancer prediction. A total of 185 papers were considered impactful for cancer prediction using conventional machine learning and deep learning-based classification. In addition, the survey discusses the work done by different researchers, highlights the limitations of the existing literature, and compares the approaches using parameters such as prediction rate, accuracy, sensitivity, specificity, Dice score, detection rate, area under the curve, precision, recall, and F1-score. Five investigations were designed, and solutions to them were explored. Although multiple techniques recommended in the literature have achieved strong prediction results, cancer mortality has not yet been reduced. Thus, more extensive research is required to address the challenges in cancer prediction.

The word cancer comes from the ancient Greek karkinos (καρκίνος), which refers to both a crab and a tumor.
Cancer was introduced to the medical world in the 1600s and is associated with abnormally growing cells that can invade or spread to other parts of the body [136]. The uncontrolled growth of cells starts at one site in the human body and can spread to other body parts, a process known as cancer metastasis [43, 172]. Tumor cells are categorized as benign or malignant. Benign cells do not spread to other parts of the body, while malignant cells metastasize and are considered more destructive. Because of the high mortality and recurrence rates, cancer treatment is long and costly, so accurate early diagnosis is needed to improve cancer patients' survival rate. Cancer is a genetic disease triggered by mutations in the genes that control how cells function, especially how they grow and divide. As tumor cells continue to grow, additional changes occur. In short, cancer cells carry more genetic changes, such as DNA mutations, than normal cells [110, 116]. Although the immune system generally removes damaged or abnormal cells from the body, some cancer cells can hide from it. Tumors can also exploit the immune system to grow and survive [179]. A cancer type is named after the site where the tumor cells originate; for example, cancer that arises in the lungs and spreads to the liver is called lung cancer. Cancer diagnosis involves three prediction tasks: cancer risk assessment, cancer recurrence prediction, and cancer survivability prediction. First, the probability of cancer occurrence is assessed; second, cancer recurrence is predicted. The last step is to predict aspects such as progression, life expectancy, tumor-drug sensitivity, and survivability [95]. The motivation behind this research is the rapid growth in cancer incidence and mortality worldwide [10].
The reasons are complex but reflect both the aging and growth of the population and changes in the prevalence and distribution of the main risk factors for cancer. Figure 1 depicts the cancer incidence and death statistics reported by the American Cancer Society and other reliable sources. Multiple investigations have been carried out in cancer research; for example, Rong et al. [142] conducted a mortality and survival study by gender. Dolatkhah et al. [49] presented an investigation of survival data and trends of breast cancer in Iran. Goodarzi et al. [65] presented an assessment based on distinct cross-sectional cancer studies. Azamjah et al. [13] aimed to determine the 25-year breast cancer mortality rate in the 7 super-regions defined by the Institute for Health Metrics and Evaluation (IHME). Momenimovahed et al. [115] determined that breast cancer incidence varies significantly with race and ethnicity and is higher in developed countries. Haggar et al. [66] reviewed the incidence, mortality, and survival rates for colorectal cancer, with attention to regional variations and changes over time. Zhang et al. [184] collected colorectal cancer (CRC) incidence data from Cancer Incidence in Five Continents. Wong et al. [174] observed a positive correlation between incidence and country-specific socio-economic development. Nguyen et al. [124] summarized the diagnosis and treatment of thyroid cancer, including the American Thyroid Association recommendations regarding thyroid nodules and differentiated thyroid cancer. Lee et al. [176] reported that 800 patients with a diagnosis of cancer and symptomatic COVID-19 were analyzed between March 18 and April 26, 2020. Of these, 412 (52%) patients had a mild COVID-19 disease course.
226 (28%) patients died, and the risk of death was significantly associated with advancing patient age. Al-Zhou et al. [6] evaluated the demographic characteristics and histological trends of skin cancer in southern areas of Yemen. Artificial intelligence (AI) is one of the exceptional achievements of computer science, conceived around the 1940s [5, 130]. AI has established its significance in advanced clinical diagnostics by providing unique opportunities to incorporate its tools into healthcare [4, 131]. AI aims to analyze the associations between treatment techniques and patient outcomes. In cancer research, AI has shown its potential to affect several facets of cancer therapy, improving the accuracy and speed of diagnosis and supporting more reliable clinical decisions, which leads to better health outcomes [182, 183]. AI has been reported to achieve higher cancer prediction accuracy than a general statistical expert [152, 180]. Thus, AI-based cancer detection models can assist health centers and help medical experts confirm their diagnoses. Hence, this article aims to highlight the contributions made by researchers in applying artificial intelligence techniques to the early detection and diagnosis of cancer. We conducted an extensive survey of the conventional machine learning and deep learning models proposed in cancer research. The paper presents a comparative analysis of existing research works that use AI-based techniques, medical imaging, and automated analysis for cancer diagnosis. Most of the techniques proposed in the reviewed papers were based on deep learning frameworks and provided appreciable prediction outcomes.
The paper describes cancer complications and clinical applications, cancer classification using AI-based techniques, the role of deep learning in cancer research, the limitations of cancer prediction using automated learning, multiple investigations, and the challenges of cancer research using AI-based techniques. The rest of the paper is organized as follows. Section 2 elaborates on the research methodology, i.e., the approach used for selecting the literature. Section 3 highlights cancer complications and clinical applications. Section 4 presents the reported work, which covers the deep learning perspective in cancer, together with a comparative analysis that includes the challenges of current work and performance evaluation using various parameters. Section 5 delivers a thorough discussion of all the investigations. Section 6 concludes the paper and discusses future directions. We conducted this systematic review under the PRISMA guidelines [40]. We performed an efficient search to select research articles from three electronic databases: Web of Science, EBSCO, and EMBASE. These are web-based indexes that list the full content or metadata of academic publications. The articles were selected using the query ((Artificial Intelligence) or (Cancer Diagnosis) or (Early Detection) or (Machine Learning) or (Deep Learning)). The inclusion and exclusion criteria used to select the articles are discussed in Sect. 2.1. Figure 2 presents the PRISMA flowchart depicting the detailed screening of the collected papers. Articles published from 2009 to April 2021 were included in this study. A total of 350 studies were identified, and 275 remained after removing duplicates.
Subsequently, studies focused on diseases other than cancer, on treatment and surgery, or written in a language other than English were excluded, leaving 210 papers. The full texts of these articles were then evaluated, and research articles that used methods other than AI-based techniques were also excluded from further analysis. Finally, the 185 selected articles were analyzed in this study. The DNA inside a cell is packaged into a vast number of individual genes, which carry instructions that govern the cell's functions [15]. DNA mutations are the reason for cancer development: normally functioning cells ultimately turn cancerous because of errors in a multistage process [104, 185]. Figure 3 shows different factors that affect the spread of cancers. Tobacco, alcohol, an unhealthy diet, and physical inactivity are the leading cancer risk factors worldwide. Some chronic infections are also risk factors for cancer and are of major significance in low- and middle-income countries. During cancer treatment, patients can experience many complications that affect their health. Not all cancers are painful, but patients may still experience some pain during treatment, and several medications and other approaches help treat cancer-related pain [129, 184]. Patients with cancer may experience fatigue and many other symptoms, but these are usually manageable [3]. Tiredness is often caused by radiation therapy or chemotherapy; however, it is generally short-term. Breathing difficulty is another complication of cancer or cancer treatment [120], although treatment may bring relief. Some types of cancer and cancer treatments can also lead to nausea [34]. Cancerous cells deprive normal cells of required nutrients, which may ultimately cause weight loss.
Even when nutrients are provided artificially through tubes into a vein or the stomach, this often does not prevent weight loss [21, 26, 169]. Cancer can also cause severe complications by disrupting the normal chemical balance of the body. Frequent urination, confusion, excessive thirst, and constipation may be signs and symptoms of such chemical imbalances [46]. In some instances, cancer can cause the immune system to attack normal, healthy cells instead of cancer cells. Paraneoplastic syndrome, a very uncommon reaction of this kind, can produce several signs and symptoms, such as difficulty walking and seizures [7]. Cancer can also impair the functioning of the affected body part, as the tumor may press on nearby nerves. If it involves the brain, it can cause headaches, signs and symptoms of stroke, and weakness on one side of the body [47]. Even when cancer has been successfully treated, the success may be only temporary, because cancer survivors remain at risk of recurrence [36]. The patient therefore needs guidance from the doctor about precautions. Doctors can develop a follow-up plan consisting of scans and examinations at regular intervals (months or years) after treatment. In radiation treatment, cancerous cells are targeted [30, 54]. A significant fraction of cancer cases and deaths could be prevented through a sound epidemiological and mechanistic understanding of environmental and behavioral risk factors. Cancer therapeutics currently have the lowest clinical trial success rate of any major disease. Because of the scarcity of successful anti-cancer drugs, cancer is expected to become the leading cause of death in developed countries.
As a disease embedded in the fundamentals of our biology, cancer presents difficult challenges that would benefit from bringing together experts from a wide cross-section of related and unrelated fields [55]. Along with its causes, there are factors that help identify cancer at an initial stage. Diagnosing cancer at an early stage leads to higher survival rates, less morbidity, and less expensive treatment [27]. Three essential steps need to be taken in a timely manner:
• Awareness and seeking care
• Clinical evaluation, diagnosis, and staging
• Access to treatment
Early diagnosis is relevant in every setting and for most cancers. Programs can be designed to reduce delays in and barriers to care, allowing patients to receive treatment in time [31]. This section describes the clinical practices currently applied in the medical sector for cancer prediction. The methodologies are described as follows: 1. Screening: Screening aims to identify people with a particular cancer or pre-cancer who have not yet developed any symptoms and refer them promptly for diagnosis and treatment. For a specific type of cancer, screening can be effective when tests are used according to need and stage [149]. Moreover, screening is a more complicated process than early diagnosis. Screening is essential for accurate diagnosis [10]. Each type of cancer requires a specific treatment schedule that includes one or more modalities, such as chemotherapy, surgery, and radiotherapy [16]. The main aim is to treat the tumor and significantly extend lifespan, because improving the patient's quality of life is also an important goal [ Over the last couple of years, artificial intelligence has captured society's imagination and generated interest in its potential to improve our lives [91].
The use of AI has been increasing rapidly to improve disease recognition, disease management, and the development of therapies. The growing number of patients diagnosed with cancer and the large amount of data gathered during the treatment process [77, 119] create a need for AI to improve oncologic care. Cancer prediction can reduce the mortality rate [57, 118]. This section covers cancer diagnosis based on deep learning methods, medical imaging for cancer, the mortality rate for different cancers, cancer datasets, and automated and semi-automated methods for cancer detection. In clinical imaging, computer-aided detection (CADe) or computer-aided diagnosis (CADx) systems are frameworks that help specialists make decisions rapidly [70]. Medical imaging provides the image data that clinical specialists need to assess and examine abnormalities in a timely manner [182, 183]. Clinical images processed with AI techniques can improve accuracy at various cancer stages [121]. Clinical imaging is thus a robust method for early cancer diagnosis and detection, and it has been widely used for early cancer detection, monitoring, and follow-up after treatment [44, 101, 102]. Figure 4 shows different kinds of scans used for cancer diagnosis. A computed tomography (CT) scan can help doctors diagnose cancer and determine the shape and size of the tumor. Nuclear medicine scans can help medical experts determine cancer metastasis. The most common nuclear scans are bone scans, PET (positron emission tomography) scans, thyroid scans, MUGA (multigated acquisition) scans, and gallium scans. MRI helps specialists find cancer in the body and look for signs that it has spread.
X-rays can also help specialists plan cancer therapy, such as surgery or radiation, and mammograms are low-dose X-rays that can help detect breast cancer. Cancer detection usually involves radiological imaging that examines the extent of cancer and its improvement after treatment. Oncological imaging is becoming increasingly comprehensive and precise [95]. Suberi et al. [162] proposed an image-based computer-aided system for cancer immunotherapy. The proposed approach improved vaccine preparation for dendritic cell (DC) immunotherapy. The study incorporated various image-based algorithms into the system with low computational time. Nirupama and Damodhar [126] predicted lung cancer using MRI scans (DICOM images). Win et al. [171] developed a computer-aided decision system to detect cancer cells in cytological pleural effusion images. Initially, median filtering and intensity adjustment were applied to enhance image quality. They then used a hybrid segmentation method based on simple linear iterative clustering and K-means clustering to extract cell nuclei. In the K-means clustering algorithm, the error of each data point is the (Euclidean) distance between the data point and its nearest centroid, and the squared errors are summed to give the objective in Eq. (1):

$$ D = \sum_{i=1}^{m} \sum_{j=1}^{n} \left\lVert x_j^{(i)} - c_i \right\rVert^2 \tag{1} $$

In Eq. (1), D, m, and n represent the objective function, the number of clusters, and the number of cases, respectively. Also, x_j^{(i)} represents the jth case of the ith cluster and c_i is the centroid of the ith cluster. Another measure that can be used with K-means clustering is cosine similarity, expressed mathematically in Eq. (2):

$$ \cos(\theta) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert} \tag{2} $$

In Eq. (2), ‖a‖ and ‖b‖ are the Euclidean norms of vectors a and b, respectively. Rosalidar et al. [140] studied the asymmetrical thermal distribution in breast thermograms using computer-assisted technology.
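The K-means objective in Eq. (1) and the cosine similarity in Eq. (2) can be made concrete with a short sketch. The Python below is illustrative only and is not the implementation used by Win et al. [171]: it assumes points are plain tuples of floats, performs only the nearest-centroid assignment step of K-means (not the full iterative algorithm, and not SLIC), and the function names are hypothetical.

```python
import math

def squared_distance(x, c):
    """Squared Euclidean distance between a case x and a centroid c."""
    return sum((xk - ck) ** 2 for xk, ck in zip(x, c))

def assign_to_nearest(points, centroids):
    """Assignment step of K-means: put each case in the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for x in points:
        dists = [squared_distance(x, c) for c in centroids]
        clusters[dists.index(min(dists))].append(x)
    return clusters

def kmeans_objective(clusters, centroids):
    """Eq. (1): sum over clusters i and their cases j of ||x_j^(i) - c_i||^2."""
    return sum(squared_distance(x, c)
               for points, c in zip(clusters, centroids)
               for x in points)

def cosine_similarity(a, b):
    """Eq. (2): dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / (math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b)))

# Toy 2-D data (hypothetical values, not from any cited study).
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 8.5)]
centroids = [(1.0, 1.5), (8.5, 8.0)]
clusters = assign_to_nearest(points, centroids)
print(clusters)                                    # two clusters of two points each
print(kmeans_objective(clusters, centroids))       # 1.5
print(cosine_similarity((1.0, 2.0), (2.0, 4.0)))   # approximately 1.0 (same direction)
```

On the toy data each point joins the cluster of its nearest centroid, and D sums four squared distances of 0.25 or 0.5. A full K-means run would alternate this assignment step with recomputing each centroid as the mean of its cluster until D stops decreasing.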
The work of Rosalidar et al. [140] showed that current neural learning models have increased the classification accuracy for breast cancer thermograms. Taher et al. [165] worked on a CAD system to diagnose lung cancer. They used a database of 100 sputum color images of different patients collected from the Tokyo Centre of Lung Cancer. The new CAD system processed the sputum images and classified them into benign or cancerous cells. Another finding of the study was the superior performance of Bayesian classification over rule-based heuristic classification. The Bayesian algorithm works by computing posterior probabilities, as shown in Eq. (3):

$$ f(c \mid x) = \frac{f(x \mid c)\, f(c)}{f(x)} \tag{3} $$

In Eq. (3), f(c) and f(x) are the prior probabilities of the class and the predictor, respectively. Also, f(c|x) denotes the posterior probability of the target c given the predictor x, and f(x|c) denotes the probability of x given c. Naeem et al. [117] introduced machine learning (ML) strategies for liver cancer classification using a fused dataset of two-dimensional (2D) computed tomography (CT) and magnetic resonance imaging (MRI) scans. A combination of the MRI and CT scan datasets produced the fused, optimized hybrid-feature dataset. Among all the deployed classifiers, the multilayer perceptron (MLP) achieved a promising accuracy of 99%. Kalaiselvi et al. [80] proposed a fuzzy c-means method to automatically detect brain tumors in T2-weighted MRI brain images using the principle of modified minimum error thresholding (MET). Lee et al. [99] studied the most common cancer types, particularly breast cancer, prostate cancer, lung cancer, and skin cancer. A newly proposed distributed computing framework has motivated researchers to build on existing work in image-based cancer analysis and develop a more flexible CAD framework for detection [87]. introduced an edge-based technique for segmenting mammographic images to identify breast cancer in its early stages.
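As a minimal sketch of how Eq. (3) turns priors and likelihoods into a class decision, the Python below computes posteriors for two hypothetical classes ("benign" and "cancerous"). The probabilities are invented for illustration and are not taken from Taher et al. [165]; the evidence f(x) is obtained by summing f(x|c) f(c) over the classes, which assumes the classes are exhaustive and mutually exclusive.

```python
def posteriors(priors, likelihoods):
    """Apply Eq. (3), f(c|x) = f(x|c) f(c) / f(x), to every class c.

    priors[c] is f(c) and likelihoods[c] is f(x|c) for one observed feature vector x.
    The evidence f(x) is the total probability of x over all classes.
    """
    evidence = sum(likelihoods[c] * priors[c] for c in priors)
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}

# Hypothetical numbers for one sputum-image feature vector x.
priors = {"benign": 0.7, "cancerous": 0.3}        # f(c)
likelihoods = {"benign": 0.2, "cancerous": 0.6}   # f(x|c)

post = posteriors(priors, likelihoods)
prediction = max(post, key=post.get)   # maximum a posteriori class
print(post)        # roughly {'benign': 0.4375, 'cancerous': 0.5625}
print(prediction)  # cancerous
```

Even though "benign" has the larger prior here, the higher likelihood of the observed features under "cancerous" shifts the posterior, which is the behaviour a Bayesian classifier relies on.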
[127] evaluated a computer-aided diagnosis (CADx) system for lung nodule classification. The retrospective study combined hand-crafted imaging features with machine learning algorithms and compared support vector machine (SVM) and gradient tree boosting (XGBoost) classifiers. Boosting classifiers work by first computing the error made on misclassified instances, as shown in Eq. (4), and then increasing the weight of the misclassified instances for the next weak learner, as shown in Eq. (5). (This instance-reweighting scheme is the one used by AdaBoost; gradient boosting methods such as XGBoost instead fit each new learner to the gradient of the loss, i.e., the residual errors of the current ensemble.) Here, E denotes the error, w is the weight associated with each instance, m is the size of the dataset, and p denotes the number of weak learners. The hypothesis ℏ for each of the s instances is evaluated under the condition function C. The weight update formula is given in Eq. (5). Deep learning is a subfield of machine learning, which in turn falls under artificial intelligence. Deep learning learns features directly from data such as text, images, or sound, and it is one of the most significant branches of AI [101, 102]. Traditional machine learning methods require several steps to accomplish the classification task, including preprocessing, feature extraction, careful feature selection, learning, and classification [113]. The performance of these systems depends strongly on the selected features, which may not be the right features to discriminate between classes. In contrast, deep learning enables the automated learning of features for different tasks and can achieve feature learning and classification in one shot [114]. Figure 5 shows the steps of deep learning methods for cancer diagnosis and detection from medical images. This section discusses the use of various deep learning models and techniques, such as auto-encoders, transfer learning, convolutional neural networks, gradient descent, generative adversarial networks, and Boltzmann machines, for cancer diagnosis and detection. Yu et al.
[178] developed a data-based detection technique that used deep learning strategies for lincRNA identification from DNA genome analysis [82]. A second objective was to validate the annotated lincRNA transcript sites and to test the performance of the deep learning strategy by comparing it with conventional procedures. For the primary objective, the auto-encoder method achieved a 100% rate. The auto-encoder method consists of three main steps, as shown in Fig. 6: building, pre-training, and validation. In the first step, the basic architecture, including an input layer, a hidden layer, and activation functions, is built. Second, the encoder and the decoder are trained layer by layer following pre-defined iterations. Third, fine-grained training/validation is performed on the whole model. In summary, the first step builds the basic structure of the deep neural network, the second trains the layer-wise nodes, and the last passes through all layers for validation. Brosch et al. [35] described a method that learned features of 3D brain images using a deep belief network. Their approach required little computational time and memory. Kadam et al. [79] proposed feature ensemble learning based on sparse auto-encoders and softmax regression to classify breast cancer as benign (non-cancerous) or malignant (cancerous). An auto-encoder is an artificial neural network, consisting of an encoder part and a decoder part, that is trained with unsupervised learning using back-propagation (Fig. 6: Working of auto-encoder method [126]). A sparse auto-encoder (SA) is an auto-encoder with sparsity constraints imposed on all hidden nodes through a sparse penalty term. The cost function for training a sparse auto-encoder, given by Eq. (6), includes three terms:

$$ E = \frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K}\left(x_{kn} - \hat{x}_{kn}\right)^2 + \lambda\,\Omega_{\mathrm{weights}} + \beta\,\Omega_{\mathrm{sparsity}} \tag{6} $$

The first term is the mean squared error, which measures the discrepancy between each input x_n (with K elements) and its reconstruction \hat{x}_n over the N training samples.
Here, λ is the coefficient for the L2 regularization term and β is the coefficient for the sparsity regularization term. Mean squared error (MSE) computes the average squared difference between the observed and predicted values, as expressed in Eq. (7):

$$ \mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(G_i - \hat{G}_i\right)^2 \tag{7} $$

where G_i and \hat{G}_i are the observed and predicted values, respectively. Li [100] also proposed a practical and self-interpretable invasive cancer diagnosis solution for breast cancer. Krithiga et al. [88] carried out a systematic review on breast cancer that focused on the call for specific action in the diagnostic processes. Similarly, Bulten et al. [32] and Sajja et al. [145] proposed a deep neural network based on GoogLeNet with an increased dropout ratio to reduce the processing time for detecting lung cancer in CT scan images. In the proposed approach, 60% of the neurons in the fully connected layer are dropped, a higher drop rate than in the original GoogLeNet. Experiments were conducted using three pre-trained CNN architectures, AlexNet, GoogLeNet, and ResNet50, on the pre-processed LIDC dataset. ResNet50 achieved higher accuracy than the other pre-trained architectures and the state-of-the-art methods. The main components of a deep learning architecture are the "neurons", each of which computes a weighted sum of the input vector k, where q denotes the column vector of weights; this is expressed mathematically in Eq. (8). A bias b, updated with each iteration, is added to adjust the output, as shown in Eq. (9). The functioning of layer k is explained in Eq. (10), where g and a are the non-linear function and the activation function, respectively. The function of each is further computed as shown in Eq. (11). Kassani et al. [78] proposed a deep learning-based technique using a DCNN descriptor and a pooling operation to classify breast cancer.
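To connect the neuron description (Eqs. (8)-(10)), the MSE in Eq. (7), and the three-term sparse auto-encoder cost in Eq. (6), here is a small pure-Python sketch. It is not the implementation of Kadam et al. [79]: it assumes a single hidden layer with sigmoid activations, takes the L2 term as half the sum of squared weights and the sparsity term as the Kullback-Leibler divergence between a target activation rho and each hidden unit's average activation (a common textbook formulation, assumed here), and all function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(k, weights, biases, g=sigmoid):
    """Dense layer: each neuron forms the weighted sum q.k of the input vector k,
    adds its bias b, and applies the non-linearity g."""
    return [g(sum(qi * ki for qi, ki in zip(q, k)) + b) for q, b in zip(weights, biases)]

def mse(observed, predicted):
    """Eq. (7): average squared difference between observed and predicted values."""
    return sum((g - g_hat) ** 2 for g, g_hat in zip(observed, predicted)) / len(observed)

def kl(rho, rho_hat):
    """KL divergence between the target activation rho and an average activation rho_hat."""
    return rho * math.log(rho / rho_hat) + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))

def sparse_autoencoder_cost(samples, enc_w, enc_b, dec_w, dec_b, lam, beta, rho):
    """Eq. (6): reconstruction error + lam * L2 weight penalty + beta * sparsity penalty."""
    hidden = [layer_forward(x, enc_w, enc_b) for x in samples]
    recon = [layer_forward(h, dec_w, dec_b) for h in hidden]
    n = len(samples)
    # Term 1: squared reconstruction error summed over the K inputs, averaged over N samples.
    error = sum(sum((xk - xk_hat) ** 2 for xk, xk_hat in zip(x, x_hat))
                for x, x_hat in zip(samples, recon)) / n
    # Term 2: half the sum of squared encoder and decoder weights (L2 regularization).
    omega_weights = 0.5 * sum(w * w for mat in (enc_w, dec_w) for row in mat for w in row)
    # Term 3: one KL term per hidden unit, comparing rho with that unit's mean activation.
    rho_hat = [sum(h[j] for h in hidden) / n for j in range(len(enc_b))]
    omega_sparsity = sum(kl(rho, r) for r in rho_hat)
    return error + lam * omega_weights + beta * omega_sparsity

# Hypothetical toy setup: 2 inputs, 3 hidden units, all weights and biases zero.
samples = [[0.0, 1.0], [1.0, 0.0]]
enc_w, enc_b = [[0.0, 0.0] for _ in range(3)], [0.0] * 3
dec_w, dec_b = [[0.0, 0.0, 0.0] for _ in range(2)], [0.0] * 2
print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))   # about 1.333
print(sparse_autoencoder_cost(samples, enc_w, enc_b, dec_w, dec_b, lam=0.001, beta=4.0, rho=0.5))  # 0.5
```

With all-zero weights every sigmoid outputs 0.5, so the reconstruction term is 0.5 and both penalties vanish when rho = 0.5. Choosing a small target such as rho = 0.05 makes the sparsity term positive, which during training pushes the hidden units toward being mostly inactive.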
Kassani et al. also used different data augmentation strategies to boost classification performance and explored the impact of various stain normalization methods. The proposed approach using the pre-trained Xception model achieved 92.50% classification accuracy. Chen et al. [37] proposed a transfer learning-based snapshot ensemble (TLSE) strategy that integrates snapshot ensemble learning with transfer learning in a unified and coordinated manner. The snapshot ensemble provides ensemble benefits within a single model training procedure, while transfer learning addresses the small-sample problem in cervical cell classification. Figure 7 depicts the transfer learning-based snapshot ensemble strategy for cervical cell classification. The TLSE method was evaluated on a Pap smear dataset called the Herlev dataset and was shown to have several advantages over existing methods. This shows that TLSE can improve accuracy with just one training process for small samples in fine-grained cervical cell classification. Alzubaidi et al. [9] introduced a hybrid deep convolutional neural network to classify hematoxylin-eosin-stained breast biopsy images into four classes: invasive carcinoma, in-situ carcinoma, benign tumor, and normal tissue. The model combined two ideas: parallel convolutions with different filter sizes and residual connections. The main distinguishing attributes of the proposed model are better feature representation and the combination of features at multiple levels. This study achieved a precision of 90% in predicting breast cancer. Sasikala et al. [151] classified skin cancer lesions as malignant (melanoma) or benign using a CNN. The system's performance was evaluated using accuracy and error rate with varying learning rates. Hosny et al.
[76] introduced an automatic skin lesion classification system with a higher classification rate using transfer learning and a pre-trained deep neural network. Transfer learning was applied to AlexNet in various ways, including replacing the classification layer with a softmax layer. The performance of the system was measured on the ISIC dataset, achieving 93% precision. Nivaashini and Soundariya [128] proposed a system that uses a Deep Boltzmann Machine (DBM) to find an efficient set of features. A Deep Neural Network (DNN) classifier is used to classify the tumor into benign or malignant breast cancer groups. The proposed system obtained a detection rate of 99.73%, higher than that of conventional machine learning models. Figure 8 shows typical segmentation with deep learning using a convolutional neural network (CNN)-based model. The model first compresses the input image through a stack of convolution, activation, and pooling layers; the inverse operation then expands the compressed latent representation. The network is trainable end to end. At test time, a forward pass through these layers produces the segmentation labels. Altaf et al. [1] and Gomez et al. [59] also proposed CNN-based breast cancer diagnosis techniques using thermal images. The authors showed that a well-defined dataset split method is required to reduce bias and overfitting during the training process. They also presented studies on the DMR-IR dataset. Experimental results confirmed that the dataset split approach limits overfitting and bias during training [1]. The authors also benchmarked state-of-the-art CNN models, such as ResNet, SeResNet, VGG16, Inception, Inception-ResNetV2, and Xception, on the DMR-IR dataset.
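The compress-then-expand flow described for Figure 8 can be illustrated with a deliberately tiny sketch in plain Python. This is not a trained CNN and not the model of any cited study: the 3x3 kernel is hand-set (a trained network would learn many kernels per layer), the decoder uses simple nearest-neighbour upsampling instead of learned layers, the final threshold stands in for a per-pixel classifier, and the image sides are assumed to be even. All names are hypothetical.

```python
def conv3x3_same(img, kernel):
    """3x3 convolution (cross-correlation, as in CNN layers) with zero padding,
    so the output keeps the input size."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += img[ii][jj] * kernel[di + 1][dj + 1]
            out[i][j] = total
    return out

def relu(img):
    return [[max(0.0, v) for v in row] for row in img]

def max_pool2x2(img):
    """Halve height and width by keeping the maximum of each 2x2 block (even sizes assumed)."""
    return [[max(img[i][j], img[i][j + 1], img[i + 1][j], img[i + 1][j + 1])
             for j in range(0, len(img[0]), 2)]
            for i in range(0, len(img), 2)]

def upsample2x(img):
    """Nearest-neighbour upsampling: repeat every value in a 2x2 block."""
    out = []
    for row in img:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def segment(img, kernel, threshold):
    encoded = max_pool2x2(relu(conv3x3_same(img, kernel)))  # compress: conv -> activation -> pool
    decoded = upsample2x(encoded)                            # expand back to the input size
    return [[1 if v > threshold else 0 for v in row] for row in decoded]  # per-pixel label

# Hypothetical 4x4 "scan" with a bright 2x2 region in the top-left corner.
image = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(max_pool2x2(image))              # [[1, 0], [0, 0]]
print(segment(image, identity, 0.5))   # mask with the same layout as the bright region
```

The pooled representation is a quarter of the input size, yet the upsampled output recovers a mask aligned with the original pixels; real segmentation networks add skip connections and learned decoders to recover finer boundaries than this coarse example can.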
Albahar [8] proposed a prediction model that classified skin lesions as benign or malignant based on a novel regularizer method. The proposed model achieved an average accuracy of 97.49%, which indicated its superiority over other state-of-the-art methods. The performance of the CNN with the embedded novel regularizer, in terms of AUC-ROC, was tested on various use cases; the area under the curve (AUC) achieved for nevus versus melanoma lesions is 77%. Ragab et al. [135] proposed a computer-aided diagnosis (CAD) framework for classifying benign and malignant mass tumors in breast mammography images. A deep convolutional neural network (DCNN) is used for feature extraction. A well-known DCNN architecture named AlexNet is used and fine-tuned to classify two classes instead of 1,000 classes. The last fully connected layer is connected to a support vector machine (SVM) classifier to improve accuracy. The results are obtained using the following publicly available datasets: (1) the Digital Database for Screening Mammography (DDSM) and (2) the Curated Breast Imaging Subset of DDSM (CBIS-DDSM). The linear, polynomial, and radial basis function (RBF) kernels are expressed mathematically in Eqs. (12), (13), and (14), respectively:

$$ K(k_i, k_j) = k_i^{\top} k_j \tag{12} $$

$$ K(k_i, k_j) = \left(k_i^{\top} k_j + r\right)^{t} \tag{13} $$

$$ K(k_i, k_j) = \exp\left(-\gamma \left\lVert k_i - k_j \right\rVert^2\right) \tag{14} $$

Here, k_i and k_j are n-dimensional inputs, r is a constant, t is the degree of the polynomial, and γ is a free parameter. Saraf and Kalpana [148] presented work on classifying benign and malignant thyroid nodules in ultrasound images. The authors performed pre-processing, segmentation, feature extraction, and classification for thyroid nodule detection. Edge detection techniques were used for segmentation, and malignant nodules were detected using an ANN. Similarly, Dov et al. [51] predicted thyroid malignancy from ultrahigh-resolution whole-slide cytopathology images.
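The three kernels in Eqs. (12)-(14) are short enough to write out directly. The sketch below is an illustration, not the configuration used by Ragab et al. [135]: the default values r = 1, t = 3, and gamma = 0.5 are arbitrary placeholders, and `gram_matrix` is a hypothetical helper showing how a kernel SVM evaluates a kernel over all pairs of training inputs.

```python
import math

def linear_kernel(ki, kj):
    """Eq. (12): inner product of two n-dimensional inputs."""
    return sum(a * b for a, b in zip(ki, kj))

def polynomial_kernel(ki, kj, r=1.0, t=3):
    """Eq. (13): (ki . kj + r) raised to the polynomial degree t."""
    return (linear_kernel(ki, kj) + r) ** t

def rbf_kernel(ki, kj, gamma=0.5):
    """Eq. (14): exp(-gamma * squared Euclidean distance between ki and kj)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(ki, kj)))

def gram_matrix(inputs, kernel):
    """Kernel value for every pair of inputs, as used when training a kernel SVM."""
    return [[kernel(a, b) for b in inputs] for a in inputs]

# Hypothetical 2-D feature vectors (e.g. two CNN features per mammogram patch).
x = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
print(linear_kernel(x[0], x[1]))            # 11.0
print(polynomial_kernel(x[0], x[1], t=2))   # 144.0
print(gram_matrix(x, rbf_kernel)[0][0])     # 1.0 (identical inputs)
```

The RBF value is 1 for identical inputs and decays toward 0 as inputs move apart, with gamma controlling how quickly; this lets an SVM draw non-linear boundaries in the original feature space.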
The deep-learning-based algorithm of Dov et al. [51] was designed to support cytopathologists in diagnosing the slides. It assigns local malignancy scores to relevant image regions, and these scores are then aggregated into a global malignancy prediction. Using this multiple-instance learning (MIL) approach, the authors reported an area under the curve (AUC) of 0.87 and an average precision (AP) of 0.743. Ma et al. [106] proposed a CNN to diagnose thyroid diseases from SPECT images. The method uses a modified DenseNet architecture together with an improved training method, and it achieved accuracies of 99.08% for Graves' disease, 99.25% for Hashimoto's disease, and 99.67% for subacute thyroiditis. Sokoutil et al. [161] presented a method for detecting tumors in the thyroid gland that combines image-processing techniques with a simple intelligent search method, the hill-climbing algorithm. Malathi et al. [107] presented a CNN method for brain tumor segmentation that achieved high prediction accuracy. [132] compared three segmentation algorithms, including a Random Forest (RF) classifier and a convolutional neural network; RF and CNN yielded average Dice coefficients (DC) of 0.862 and 0.876, respectively. The RF classifier computes the information gain of a split using the entropy E, expressed in Eq. (15):

$$E = -\sum_{n=1}^{y} p_n \log_2 p_n \tag{15}$$

Here, y is the number of classes (binary or multi-class) and p_n is the probability that an instance belongs to class n. Image processing techniques have been widely used in various health sectors, especially for the early detection and diagnosis of cancer. Huidrom et al. [75] proposed a fully automated lung segmentation method that includes juxta-pleural nodules and consists of two main stages. In the first stage, the lung region is extracted (lung field extraction); in the second stage, the lungs are segmented using boundary analysis and segmentation techniques.
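To make the entropy criterion of Eq. (15) concrete, the sketch below computes the entropy of a set of class labels and the information gain of a candidate split, which is the quantity a Random Forest tree maximizes when it uses the entropy criterion. It assumes base-2 logarithms and describes a split by the labels that fall into each child node; the function names are hypothetical, and the per-split random feature sampling that distinguishes a Random Forest from a single tree is omitted.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Eq. (15): E = -sum_n p_n * log2(p_n), written as sum_n p_n * log2(1 / p_n)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def information_gain(parent: Sequence[Hashable],
                     children: List[Sequence[Hashable]]) -> float:
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted


if __name__ == "__main__":
    parent = ["benign"] * 4 + ["malignant"] * 4
    perfect = [["benign"] * 4, ["malignant"] * 4]
    mixed = [["benign"] * 3 + ["malignant"], ["benign"] + ["malignant"] * 3]
    print(entropy(parent))                    # 1.0 (two equally likely classes)
    print(information_gain(parent, perfect))  # 1.0 (children are pure)
    print(information_gain(parent, mixed))    # about 0.189
```

A tree compares this gain across candidate splits and keeps the largest; a pure split removes all uncertainty, while a split that leaves both children mixed recovers only a small fraction of it.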
Huidrom et al. [75] reported that their method yielded better results than existing methods. Asiedu et al. [12] proposed a technique that automatically extracts and classifies features from acetic acid and Lugol's iodine cervigrams. The study used several techniques to combine the cervigram features and a support vector machine model to classify the cervigrams. Cheng et al. [38] used a CAD system to detect and classify breast cancer in four stages: pre-processing, segmentation, feature extraction, and feature classification. Patil et al. [131] presented an automated system for building a mammogram breast cancer detection model with improved hybrid classifiers; image processing, tumor segmentation, feature extraction, and diagnosis are the well-designed steps of the proposed breast cancer detection pipeline. [122] proposed an automated multi-strategy lung nodule detection and classification system whose objective is to reduce false positives at the early stages. Cui et al. [41] proposed a strategy to recognize lung nodules. The comparative analysis section highlights the studies of different researchers on cancer detection using AI techniques. The prediction outcomes are compared on the basis of parameters such as accuracy, sensitivity/recall, precision, specificity, Dice score, and area under the curve (AUC). Figure 9 describes these evaluation parameters. Table 1 presents the comparative analysis based on multiple evaluation parameters for various cancer types. As the comparative analysis shows, many research works on cancer diagnosis and detection using conventional machine learning and deep learning methods have been analyzed. Most of the deep learning techniques performed well and achieved high accuracy in terms of the reported prediction scores. Also, most of the research articles were published recently (2020).
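The evaluation parameters listed above (accuracy, sensitivity/recall, precision, specificity, Dice score) are all derived from the confusion-matrix counts of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The sketch below shows the standard definitions for a binary benign/malignant task; the function name and the convention of returning 0 when a ratio is undefined are assumptions. For segmentation masks, the same Dice formula is applied with the counts taken over pixels. AUC is omitted because it needs continuous scores rather than hard labels.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Confusion-matrix metrics for labels in {0, 1}, with 1 = malignant."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Assumed convention: report 0 when the denominator is empty.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # identical to recall
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
    }


if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(truth, preds).items():
        print(f"{name}: {value:.3f}")
```

For binary labels, the F1-score and the Dice coefficient coincide (both equal 2TP / (2TP + FP + FN)); the literature reports them separately mainly because "Dice" is the customary name in segmentation studies.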
Most of the studies also focused on the diagnosis of breast cancer. In the current review, we have presented recently published research studies that employed AI-based learning techniques for predicting malignancy. This study highlights research works related to cancer diagnosis prediction and to predicting the post-operative life expectancy of cancer patients using AI-based learning techniques. Other than breast and kidney, most researchers have worked on brain, colorectal, cervical, and prostate cancer prediction. Figure 11 depicts the distribution of the research works by cancer site. The type of data used to train the prediction model significantly affects its performance, because the reliability of the prediction outcomes depends on the data used to train the classification model. Most of the research studies reviewed in this paper used Magnetic Resonance Imaging (MRI). The second most commonly used data type is Computed Tomography (CT) scan images. Other image types, such as dermoscopic, mammographic, endoscopic, and pathological images, were also used in the literature. Figure 12 shows the distribution of papers by the type of data used to train the prediction model. There are few papers from 2021 because we could only extract papers published up to April 2021. Based on the analysis of Fig. 13, the number of research studies has increased gradually in recent years.
• Regarding the types of classification models used for specific cancers: Convolutional Neural Network models have been used to predict almost every type of cancer, such as brain, colorectal, skin, thyroid, and lung cancer. Most of the studies that explored breast cancer diagnosis used hybrid models or novel approaches. Neural networks have also been applied to almost all breast and cervical cancer datasets. For stomach cancer, only Convolutional Neural Networks have been used.
Support vector machines have been used to predict liver and breast cancer. In a nutshell, Convolutional Neural Networks can be applied to different datasets, and ensemble learners have been used with almost every kind of cancer.
• Investigation 5: Challenges faced by researchers in constructing AI-based prediction models. Although AI-based techniques have demonstrated their significance in cancer prediction research, researchers still face many challenges that need to be addressed.
i. Limited data size: The most common challenge faced by the studies was insufficient data to train the model. A small sample size implies a small training set, which cannot establish the efficiency of the proposed approaches; a larger sample trains the model better than a limited one.
ii. High dimensionality: Another data-related issue in cancer research is high dimensionality, that is, a vast number of features compared with the number of cases. Multiple dimensionality reduction techniques [155] are available to deal with this issue; however, a generic approach to handle it is still needed.
iii. Class imbalance problem: A leading challenge in medical datasets, especially cancer data, is the uneven distribution of classes. Class imbalance arises from a mismatch in the sample size of each class, and classification models tend to be biased towards the majority class. Most existing techniques handle imbalance well for binary classes but fail on multi-class problems.
iv. Computational time: About 90% of the studies endorsed deep learning approaches over other techniques for predicting cancer from medical images. However, deep learning approaches are highly complex. About 41% of the studies used a CNN classifier, which performed well but at the cost of high computational time and space.
v. Efficient feature selection technique: Many studies have achieved exceptional prediction outcomes. However, a computationally efficient feature selection method is still needed to eliminate data-cleaning procedures while maintaining high cancer prediction accuracy.
vi. Model generalizability: Research needs to shift towards improving the generalizability of prediction models. Most studies proposed models validated on a single site; validating them on multiple sites would help improve their generalizability.
vii. Clinical implementation: AI-based models have proved their dominance in cancer research, yet they have not been incorporated into clinical practice. These models need to be validated in a clinical setting to assist medical practitioners in confirming diagnostic verdicts.

This review summarizes the various research directions for AI-based cancer prediction models. AI has demonstrated its significance in healthcare, especially in cancer prediction. The paper provides a critical and analytical examination of current state-of-the-art cancer diagnosis and detection approaches, including a thorough examination of the machine and deep learning models used for early cancer detection from medical images. AI techniques play a significant role in early cancer prognosis and detection, using machine and deep learning to extract and classify disease features. Our study concluded that most previous works employed deep learning techniques, especially Convolutional Neural Networks. Another significant finding is that most studies have worked on breast cancer data. We also observed that deep learning models applied to pre-processed and segmented medical images perform better in classification metrics such as AUC, sensitivity, Dice coefficient, and accuracy.
There is scope to work on the early detection of head and neck cancers, because few studies have been conducted on either type. Federated learning can also be used for cancer detection based on distributed datasets; hence, we intend to use a federated learning model for cancer detection by creating a decentralized training model for cancer datasets in remote places. This study highlights the challenges faced by researchers in constructing AI-based prediction models. Although multiple studies have displayed significant results, the challenges in cancer research still need to be addressed in future work.

The authors declare no conflict of interest.

Going deep in medical image analysis
Breast cancer detection using image enhancement and segmentation algorithms
Computer aided assessment of diagnostic images for epidemiological research
Deep learning techniques for skin lesion analysis and melanoma cancer detection: a survey of state-of-the-art
Artificial intelligence techniques for cancer detection and classification
Lung cancer detection and classification with 3D convolutional neural network
Skin lesion classification using convolutional neural network with novel regularizer
Optimizing the performance of breast cancer classification by employing the same domain transfer learning from hybrid deep convolutional neural network model
Mortality results from a randomized prostate cancer screening trial
Computational models and simulations of cancer metastasis
Development of algorithms for automated detection of cervical pre-cancers with a low-cost, point-of-care
Global trend of breast cancer mortality rate: A 25-year study
Breast tumor classification using an ensemble machine learning method
Automatic human brain tumor detection in MRI image using template-based K means improved fuzzy C means clustering algorithm
Machine learning approach for brain tumor detection
A machine learning approach for the classification of kidney cancer subtypes using miRNA genome data
A fully-automated deep learning pipeline for cervical cancer classification
Computer-aided diagnosis systems for lung cancer: challenges and methodologies
Deep learning for lung cancer detection and classification
Automatic lung cancer prediction from chest X-ray images using the deep learning approach
Deep learning with convolutional neural networks for identification of liver masses and hepatocellular carcinoma: a systematic review
Benefits and harms of CT screening for lung cancer
Lysine-specific histone demethylase 1A (LSD1) in cervical cancer
Federated learning systems for healthcare: perspective and recent progress
Screening for cervical cancer using automated analysis of PAP-smears
Single circulating tumor cell detection and overall survival in nonmetastatic breast cancer
Risk of second primary cancers in individuals diagnosed with index smoking- and nonsmoking-related cancers
Abiraterone and increased survival in metastatic prostate cancer
Oncology services in corona times: a flash interview among German cancer patients and their physicians
Unsupervised prostate cancer detection on H&E using convolutional adversarial autoencoders
Machine learning to predict occult nodal metastasis in early oral squamous cell carcinoma
Thyroid cancer: zealous imaging has increased detection and treatment of low risk tumors
Manifold learning of brain MRIs by deep learning
Pharmacogenomics of breast cancer: highlighting CYP2D6 and tamoxifen
Improving computer-aided cervical cells classification using transfer learning based snapshot ensemble
Automated breast cancer detection and classification using ultrasound images
High precision localization of pulmonary nodules on chest CT utilizing axial slice number labels
Automatic liver tumor segmentation in CT with fully convolutional neural networks and object-based postprocessing
Automatic lung nodule detection using multi-scale dot nodule-enhancement filter and weighted support vector machines in chest computed tomography
Deep learning based liver cancer detection using watershed transform and Gaussian mixture model techniques
Learning where to attend with deep architectures for image tracking
Classification of cervical cancer using artificial neural networks
Automatic brain tumor detection and classification of grades of astrocytoma
Biomarkers for the early detection of acute kidney injury
Detection of blast-related traumatic brain injury in US military personnel
No end in sight: the skin cancer epidemic continues
Breast cancer survival and incidence: 10 Years cancer registry data in the northwest, Iran
Automatic brain tumor detection and segmentation using U-Net based fully convolutional networks
A deep-learning algorithm for thyroid malignancy prediction from whole slide cytopathology images. 1-10
Breast cancer classification using machine learning
Breast cancer: are long-term and intermittent endocrine therapies equally effective?
Blood plasma surface-enhanced Raman spectroscopy for non-invasive optical detection of cervical cancer
Estimates of worldwide burden of cancer in 2008: GLOBOCAN
Automatic polyp detection in pillcam colon 2 capsule images and videos: preliminary feasibility report
Automated detection of polyps in CT colonography images using deep learning algorithms in colon cancer diagnosis
CNN based methodology for breast cancer diagnosis using thermal images. arXiv
A joint deep learning approach for automated liver and tumor segmentation
Deep learning based classification of ultrasound images for thyroid nodules: a large scale of pilot study
Screening of cervical cancer by artificial intelligence based analysis of digitized papanicolaou-smear images
A comprehensive data-level investigation of cancer diagnosis on imbalanced data
Computational prediction of cervical cancer diagnosis using ensemble-based classification algorithm
Epidemiology, incidence and mortality of thyroid cancer and their relationship with the human development index in the world: an ecology study
Colorectal cancer epidemiology: incidence, mortality, survival, and risk factors
The classification of renal cancer in 3-phase CT images using a deep learning method
Skin cancer detection using convolutional neural network
Integrated use of rough sets and artificial neural network for skin cancer disease classification
Machine learning with autophagy-related proteins for discriminating renal cell carcinoma subtypes
Oral cancer awareness campaign in Northern Germany: first positive trends in incidence and tumour stages
Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images
Artificial intelligence and polyp detection
Breast cancer and background parenchymal enhancement at breast magnetic resonance imaging: a meta-analysis
Automated lung segmentation on computed tomography image for the diagnosis of lung cancer
Classification of skin lesions using transfer learning and augmentation with Alexnet
Artificial intelligence in gastric cancer: a systematic review
Breast cancer diagnosis with transfer learning and global pooling. arXiv
Breast cancer diagnosis using feature ensemble learning based on stacked sparse autoencoders and softmax regression
A rapid automatic brain tumor detection method for MRI images using modified minimum error thresholding technique
Automatic three-dimensional cephalometric annotation system using three-dimensional convolutional neural networks
Intellectual detection and validation of automated mammogram breast cancer images by multi-class SVM using deep learning classification
Automated segmentation technique with self driven post processing for histopathological breast cancer images
Classification of melanoma and nevus in digital images for diagnosis of skin cancer
Deep learning for gastric pathology detection in endoscopic images
Multi-categorical classification using deep learning applied to the diagnosis of gastric cancer
Mammographic cancer detection using computer aided diagnosis system
Breast cancer detection, segmentation and classification on histopathology images analysis: a systematic review
PI3K pathway protein analyses in metastatic breast cancer patients receiving standard everolimus and exemestane
Big data analytics and its benefits in healthcare
Intelligent behavior of fog computing with IOT for healthcare system
Segmentation and classification of cervical cells using deep learning
Oral cancer analysis using machine learning techniques
Cancer cells vs normal cell. Cancer research from technology networks
Identification of prognostic biomarkers for major subtypes of non-small-cell lung cancer using genomic and clinical data
Liver tumor segmentation from MR images using 3D fast marching algorithm and single hidden layer feedforward neural network
Supervised classification of histopathological images using convolutional neuronal networks for gastric cancer detection
Prostate cancer diagnosis using deep learning with 3D multiparametric MRI
Computer-aided diagnosis in endoscopy: a novel application toward automatic detection of abnormal lesions on magnifying narrowband imaging endoscopy in the stomach
Discriminative pattern mining for breast cancer histopathology image classification via fully convolutional autoencoder. arXiv
Evolving the pulmonary nodules diagnosis from classical approaches to deep learning-aided decision support: three decades' development course and future prospect
MicroRNAs that regulate PTEN as potential biomarkers in colorectal cancer: a systematic review
Image based computer aided diagnosis system for cancer detection
Prometeo: a CNN-based computer-aided diagnosis system for WSI prostate cancer detection
Automated detection and segmentation of thoracic lymph nodes from CT using 3D foveal fully convolutional neural networks
Thyroid diagnosis from SPECT images using convolutional neural network with optimization
Brain tumour segmentation using convolutional neural network with tensor flow
A method for melanoma skin cancer detection using dermoscopy images
Automated detection of nonmelanoma skin cancer using digital images: a systematic review
Exciting new advances in oral cancer diagnosis: avenues to early detection
Automatic segmentation and analysis of thermograms using texture descriptors for breast cancer detection
Classification using deep learning neural networks for brain tumors
Cancer diagnosis using deep learning
Deep learning-based breast cancer classification through medical imaging modalities: state of the art and research challenges
Epidemiological characteristics of and risk factors for breast cancer in the world
Nature meets nurture: molecular genetics of gastric cancer
Machine learning based hybrid-feature analysis for liver cancer classification using fused images
Skin cancer prevention practices among malignant melanoma survivors: a systematic review
Detection of oral bacteria in cardiovascular specimens
Ultraviolet radiation and skin cancer
Robust machine learning for colorectal cancer risk prediction and stratification
Automated lung nodule detection and classification using deep learning combined with multiple strategies
Breast cancer detection using machine learning way
Diagnosis and treatment of patient with thyroid cancer
A deep learning method for lincRNA detection using auto-encoder algorithm
A GSM based computer aided diagnosis system for lung cancer detection
Computer aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization
Deep Boltzmann machine based breast cancer risk detection for healthcare systems
Automating skin disease diagnosis using image classification
Radiomics and deep learning: Hepatic applications
Automated mammogram breast cancer detection using the optimized combination of convolutional and recurrent neural network
Evaluation of commonly used algorithms for thyroid ultrasound images segmentation and improvement using machine learning approaches
Automatic breast segmentation and cancer detection via SVM in mammograms
Automatic diagnosis of skin cancer using neural networks
Breast cancer detection using deep convolutional neural networks and support vector machines
Methods used in computer aided diagnosis for breast cancer detection using mammograms
Automated liver tumor detection using Markov random field segmentation
Automatic diagnosis of liver tumor in CT images
Deep learning for lung cancer nodules detection and classification in CT scans
A review on recent progress in thermal imaging and deep learning approaches for breast cancer detection
Automated cervical cancer detection using pap smear images
Analysis of mortality and survival rate of liver cancer in Zhejiang Province in China: A general population-based study
Automated cervical cancer detection through RGVF segmentation and SVM classification
Automated cervical cancer detection through RGVF segmentation and SVM classification
A study of association of oncotype DX recurrence score with DCE-MRI characteristics using multivariate machine learning models
Lung cancer detection based on CT scan images by using deep transfer learning
Automatic detection of early gastric cancer in endoscopic images using a transferring convolutional neural network
Kidney tumor segmentation using an ensembling multi-stage deep learning approach. A contribution to the KiTS19 challenge. 1-11
Thyroid cancer detection using image processing
Segmentation of cervical cells for automated screening of cervical cancer: a review
Lung cancer detection and classification using deep CNN
Towards improving skin cancer detection using transfer learning
Breast cancer detection in mammogram images using deep learning technique
Lung cancer detection using image segmentation by means of various evolutionary algorithms
Automatic lung cancer detection from CT image using improved deep neural network and ensemble classifier
A systematic review of applications of machine learning in cancer prediction and diagnosis
Brain tumor segmentation by cascaded deep neural networks using multiple image scales
MRI-based radiomics approach for differentiation of hypovascular non-functional pancreatic neuroendocrine tumors and solid pseudopapillary neoplasms of the pancreas
Automated detection and segmentation of early gastric cancer from endoscopic images using mask R-CNN
Automatic colon polyp detection using region based deep CNN and post learning approaches
Kidney tumor segmentation and detection on computed tomography data
Computer aided diagnosis of thyroid cancer using image processing techniques
Dendritic cell recognition in computer aided system for cancer immunotherapy
Advanced morphological technique for automatic brain tumor detection and evaluation of statistical parameters
Pan-renal cell carcinoma classification and survival prediction from histopathology images using deep learning
Computer aided diagnosis system for early lung cancer detection
Using machine learning to predict progression in the gastric precancerous process in a population from a developing country who underwent a gastroscopy for dyspeptic symptoms
Generative adversarial neural networks for pigmented and non-pigmented skin lesions detection in clinical images
Serum microRNA-29a is a promising novel marker for early detection of colorectal liver metastasis
"Hedgehog pathway": a potential target of itraconazole in the treatment of cancer
Deep learning-based segmentation of the lung in MR images acquired by a stack-of-spirals trajectory at ultra-short echo-times
Computer-assisted screening for cervical cancer using digital image processing of pap smear images
Computer aided diagnosis system for detection of cancer cells on cytological pleural effusion images
Automatic classification of cervical cancer from cytological images by using convolutional neural network
Incidence and mortality of kidney cancer: temporal patterns and global trends in 39 countries
Development of a real-time endoscopic image diagnosis support system using deep learning technology in colonoscopy
COVID-19 mortality in patients with cancer on chemotherapy or other anticancer treatments: a prospective cohort study
Prostate cancer detection using deep convolutional neural networks
A deep learning method for lincRNA detection using auto-encoder algorithm
Machine learning with applications in breast cancer diagnosis and prognosis
Smart software can diagnose prostate cancer as well as pathologist
Optimization of the convolutional neural networks for automatic detection of skin cancer
Low molecular weight heparin and cancer survival: clinical trials and experimental mechanisms
Automatic detection and classification of colorectal polyps by transferring low-level CNN features from nonmedical domain
Nanotechnology in cancer diagnosis: progress, challenges and opportunities
Current challenges in cancer treatment