Breast cancer classification using deep belief networks

Ahmed M. Abdel-Zaher, Ayman M. Eldeib
Department of Systems and Biomedical Engineering, Cairo University, Giza, Egypt

Expert Systems With Applications 46 (2016) 139–144

Keywords: Breast cancer diagnosis; CAD; Classification; Deep learning based classifier; Pattern recognition

Abstract

Over the last decade, the ever-increasing worldwide demand for early detection of breast cancer at many screening sites and hospitals has opened new research avenues. According to the World Health Organization (WHO), early detection of cancer greatly increases the chance of choosing a successful treatment plan. Computer-Aided Diagnosis (CAD) systems are widely applied in the detection and differential diagnosis of many kinds of abnormalities, so improving the accuracy of CAD systems has become a major research area. In this paper, a CAD scheme for the detection of breast cancer is developed using an unsupervised deep belief network path followed by a supervised back propagation path. The resulting classifier is a back propagation neural network with the Levenberg-Marquardt learning function whose weights are initialized from the deep belief network path (DBN-NN). Our technique was tested on the Wisconsin Breast Cancer Dataset (WBCD). The classifier complex gives an accuracy of 99.68%, a promising result compared with previously published studies. The proposed system therefore provides an effective classification model for breast cancer. In addition, we examined the architecture at several train-test partitions.

1. Introduction

Breast cancer is the most common cancer among women, with nearly 1.7 million new cases diagnosed in 2012 (Centers for Disease Control and Prevention, 2014; World Cancer Research Fund, 2014). Breast cancer represents 18.3% of the total cancer cases in Egypt, and 37.3% of breast cancer cases could be fully healed, especially in case of early detection (Salama, Abdelhalim, & Zeid, 2012). In Egypt and the Arab countries, breast cancer targets women around the age of 30 and represents 42 cases per 100 thousand of the population (Salama et al., 2012).

An accurate classifier is the most important component of any CAD scheme developed to assist medical professionals in the early detection of mammographic lesions. CAD systems are designed to support radiologists in the process of visually screening mammograms, avoiding misdiagnosis caused by fatigue, eyestrain, or lack of experience. The use of an accurate CAD system for early detection could definitely save precious lives. In this study, a back propagation neural network initialized with the weights of a trained deep belief network of similar architecture (DBN-NN) was used to diagnose breast cancer. Our data source is the Wisconsin Breast Cancer Dataset (WBCD) taken from the University of California at Irvine (UCI) machine learning repository (Wisconsin breast cancer dataset (WBCD) (original), 2014).
2. Background

A variety of classification techniques have been developed for breast cancer CAD systems, and the accuracy of many of them has been evaluated on the dataset taken from the UCI machine learning repository. For example, Goodman, Boggess, and Watkins tried different methods that produced the following accuracies: the optimized learning vector quantization (optimized-LVQ) method reached 96.7%, the big-LVQ method reached 96.8%, and AIRS, a method they proposed based on the artificial immune system, obtained 97.2% classification accuracy (Goodman, Boggess, & Watkins, 2002).

Quinlan reached 94.74% classification accuracy using 10-fold cross-validation with the C4.5 decision tree method (Quinlan, 1996). Abonyi and Szeifert used the Supervised Fuzzy Clustering (SFC) technique and obtained 95.57% accuracy (Abonyi & Szeifert, 2003). Salama et al. (2012) performed an experiment on the WBC dataset, and the results showed that the fusion of MLP and J48 classifiers with feature selection (PCA) is superior to the other classifiers.

Hamilton, Shan, and Cercone (1996) obtained 96% accuracy with the RIAC method. Polat and Günes (2007) examined the robustness of the least square Support Vector Machine (SVM) using classification accuracy, analysis of sensitivity and specificity, the k-fold cross-validation method, and the confusion matrix; they obtained a classification accuracy of 98.53%.

Nauck and Kruse (1999) obtained 95.06% with neuro-fuzzy techniques. Pauline and Santhakumaran used feed forward artificial neural networks trained with the back propagation algorithm (Paulin, 2011). The performance of the network was evaluated on the Wisconsin breast cancer dataset for various training algorithms; the highest accuracy, 99.28%, was achieved with the Levenberg-Marquardt algorithm.

The accuracy obtained by Pena-Reyes and Sipper (1999) was 97.36% using a fuzzy-GA method. Akay (2009) combined SVM with feature selection, obtaining the highest classification accuracy (99.51%) for an SVM model that contains five features. Moreover, Setiono (2000) reached 98.1% using the Neuro-rule method. Übeyli (2007) used SVM and obtained 99.54% accuracy at a 37% train and 63% test partition.

Mert, Kılıç, Bilgili, and Akan (2015) explored the feature-reduction properties of independent component analysis (ICA) in a breast cancer decision support system. They showed that a one-dimensional feature vector obtained from ICA makes the Radial Basis Function Neural Network (RBFNN) classifier more discriminative, increasing accuracy from 87.17% to 90.49%.

Nahato, Nehemiah, and Kannan (2015) used a rough set indiscernibility relation method with a back propagation neural network (RS-BPNN). This work has two stages: the first handles missing values to obtain a smooth dataset and selects appropriate attributes from the clinical dataset by the indiscernibility relation method; the second is classification using a back propagation neural network. The accuracy obtained by the proposed method on the breast cancer dataset was 98.6%.
Dheeba, Singh, and Selvi (2014) investigated a new classification approach for the detection of breast abnormalities in digital mammograms using a Particle Swarm Optimized Wavelet Neural Network (PSOWNN). The proposed abnormality detection algorithm is based on extracting Laws texture energy measures from the mammograms and classifying the suspicious regions with a pattern classifier. They achieved 93.671% accuracy, 92.105% specificity, and 94.167% sensitivity.

In our study, we applied a deep belief network (DBN) in an unsupervised phase to learn the input feature statistics of the original WBCD dataset. We then transferred the obtained DBN weight matrix to a back propagation neural network of similar architecture to start the supervised phase. In the supervised phase, we tested both the conjugate gradient and the Levenberg-Marquardt algorithms for training the back propagation neural network.

3. From back propagation (BP) to deep belief network (DBN)

In 1985, second-generation neural networks with the back propagation algorithm emerged. The learning algorithm adjusts the network weights so that the output neuron state y represents the learning example t. A common method for measuring the discrepancy between the expected output t and the actual output y is the squared error measure:

$E = (t - y)^2$ (1)

The change in weight, which is added to the old weight, is equal to the product of the learning rate and the gradient of the error function, multiplied by −1:

$\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}$ (2)

In real-world applications, however, almost all data is unlabeled, while a back propagation neural network requires labeled training data. The biggest issues with back propagation NNs are that they can get stuck in poor local optima and that the learning time grows huge with multiple hidden layers.
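As a concrete illustration (a minimal sketch, not code from this study), the update rule of Eqs. (1) and (2) for a single sigmoid output neuron can be written in a few lines of MATLAB; the variable names and values here are purely illustrative:

```matlab
% Minimal sketch of the squared-error gradient-descent update, Eqs. (1)-(2),
% for a single sigmoid neuron; eta, x, t, and w are illustrative names only.
eta = 0.1;                    % learning rate
x   = [0.3; 0.7; 0.1];        % one training example (column vector)
t   = 1;                      % its target label
w   = 0.01 * randn(3, 1);     % randomly initialized weights

for epoch = 1:100
    y  = 1 / (1 + exp(-w' * x));          % forward pass (sigmoid output)
    E  = (t - y)^2;                       % squared error, Eq. (1)
    dE = -2 * (t - y) * y * (1 - y) * x;  % dE/dw by the chain rule
    w  = w - eta * dE;                    % weight update, Eq. (2)
end
```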
In 1963, Vapnik et al. invented the original support vector machine (SVM) algorithm, and Boser, Guyon, and Vapnik (1992) suggested a way to create nonlinear classifiers by applying the kernel trick to maximum-margin hyperplanes. In the classification task, the weight of each feature is computed by an optimization technique. In non-linear classification, SVMs perform the task efficiently through the kernel trick, mapping their inputs so that the non-linear classification task is converted into a linear classification problem in a high-dimensional feature space. The biggest limitation of the SVM approach lies in the choice of the kernel; in practice, the most serious problem with SVMs is the high algorithmic complexity and the extensive memory requirements of the required quadratic programming in large-scale tasks (Suykens, Horvath, Basu, Micchelli, & Vandewalle, 2003).

In recent years, attention has shifted to deep learning. Deep learning is a set of algorithms in machine learning that attempts to model high-level abstractions in data using model architectures composed of multiple non-linear transformations (Bengio, Courville, & Vincent, 2013; Schmidhuber, 2014). A Restricted Boltzmann Machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. A Deep Belief Network (DBN), in turn, is a generative graphical model, or alternatively a type of deep neural network, composed of multiple layers of latent variables ("hidden units"), with connections between the layers but not between units within each layer (Hinton, 2009b).

From Hinton's perspective, a DBN can be viewed as a composition of simple learning modules, each of which is a restricted type of Boltzmann machine containing a layer of visible units that represents the data and a layer of hidden units that represents features capturing higher-order correlations in the data. The two layers are connected by a matrix of symmetrically weighted connections (W), and there are no connections within a layer (Hinton, 2009b).

The key idea behind the DBN is that its weights (w), learned by an RBM, define both p(v|h, w) and the prior distribution over hidden vectors, p(h|w) (Hinton, 2009b). The probability of generating a visible vector can therefore be written as

$p(v) = \sum_{h} p(h \mid w)\, p(v \mid h, w)$ (3)

As learning a DBN is a computationally intensive task, Hinton showed that RBMs can be stacked and trained in a greedy manner to form the DBN (Hinton, Osindero, & Teh, 2006), and he introduced a fast algorithm for learning DBNs. The weight update between visible units v and hidden units h is simply

$\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{1} \right)$ (4)

where ε is a learning rate and the superscripts 0 and 1 designate the network data and reconstruction states, respectively.
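Eq. (4) is the core of one-step contrastive divergence (CD-1) training for an RBM. The sketch below shows one such update for a binary RBM, with bias terms omitted for brevity; the function name, matrix orientations, and batch averaging are our illustrative assumptions, not the authors' implementation:

```matlab
% Sketch of a single CD-1 weight update for a binary RBM, Eq. (4).
% v0: batch of visible vectors (numCases x numVisible), values in [0,1];
% W: weights (numVisible x numHidden); epsilon: learning rate.
function W = rbm_cd1_update(W, v0, epsilon)
    sigm = @(z) 1 ./ (1 + exp(-z));
    h0  = sigm(v0 * W);                 % p(h = 1 | v0): data phase
    h0s = double(h0 > rand(size(h0)));  % sample binary hidden states
    v1  = sigm(h0s * W');               % reconstruct the visible units
    h1  = sigm(v1 * W);                 % p(h = 1 | v1): reconstruction phase
    % <v_i h_j>^0 - <v_i h_j>^1 in Eq. (4), estimated as batch averages
    W = W + epsilon * (v0' * h0 - v1' * h1) / size(v0, 1);
end
```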
The DBN is competitive for five reasons: it can be fine-tuned as a neural network; it has many non-linear hidden layers; it is generatively pre-trained; it can act as a non-linear dimensionality reduction for the input feature vector; and the network teacher is another sensory input. In addition, the real performance of the DBN encourages its use. For example, in a pattern recognition application, Hinton reported that the generalization performance of the DBN was 1.25% errors on the 10,000 digits of the MNIST handwritten digits database (Hinton, 2009a). The DBN's performance beats the 1.5% error achieved by the best back propagation nets (Hinton et al., 2006). Platt obtained 1.6% using back propagation, while K-Nearest Neighbor produced a 3.3% classification error (LeCun, Bottou, Bengio, & Haffner, 2001). It is also better than the 1.4% error reported by Decoste and Schölkopf (2002) for SVMs on the same task.

In addition, Hinton discussed the possibility of unsupervised DBN training followed by a back propagation pass, which gives better accuracy if good data priors are available (Hinton, 2009a). In 2010, an experiment performed by Erhan et al. (2010) suggested that unsupervised pre-training prior to supervised learning guides the learning towards basins of attraction of minima that support better generalization from the training dataset; the evidence from these results supports a regularization explanation for the effect of pre-training.

4. Experiment conditions and methodology

We used Matlab 2014a and Palm's DBN implementation (Palm, 2012). After the deep belief network was fully trained, we transferred its weight matrix to a native Matlab back propagation neural network with a similar architecture, i.e., the same number of input, hidden, and output neurons, and then performed several supervised back propagation passes.
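A minimal sketch of this pipeline, assuming the DeepLearnToolbox interface (dbnsetup/dbntrain) and the Neural Network Toolbox, is given below; the epoch count, batch size, and [0,1] feature scaling are illustrative assumptions rather than the settings used in the experiments:

```matlab
% Sketch: unsupervised DBN pre-training with Palm's DeepLearnToolbox,
% then weight transfer to a native MATLAB back propagation network.
% features: N x 9 raw WBCD attributes (values 1..10); labels: N x 1 targets.
x = (features - 1) / 9;        % scale features into [0,1] for the binary RBMs

dbn.sizes = [4 2];             % two hidden layers, matching the 9-4-2-1 architecture
opts.numepochs = 50;           % illustrative values
opts.batchsize = 10;           % must divide the number of training cases
opts.momentum  = 0;
opts.alpha     = 0.1;
dbn = dbnsetup(dbn, x, opts);
dbn = dbntrain(dbn, x, opts);  % greedy layer-wise RBM training

net = feedforwardnet([4 2], 'trainlm');  % native BPNN, Levenberg-Marquardt
net = configure(net, x', labels');       % fix layer dimensions (samples as columns)
net.IW{1,1} = dbn.rbm{1}.W;              % transfer first RBM weights (4 x 9)
net.LW{2,1} = dbn.rbm{2}.W;              % transfer second RBM weights (2 x 4)
net.b{1}    = dbn.rbm{1}.c;              % hidden biases (rbm.c in the toolbox)
net.b{2}    = dbn.rbm{2}.c;
net = train(net, x', labels');           % supervised fine-tuning
```

Since the DBN carries no output layer, the weights into the single output neuron remain randomly initialized; for the conjugate gradient runs, 'trainscg' (scaled conjugate gradient) would replace 'trainlm' as the training function.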
We applied this approach to the Wisconsin breast cancer original database, with nine features and two classes (benign, malignant). 690 of the 699 samples were accepted; nine samples were rejected for incomplete features. We then reduced the used samples to 683 entries to compare our results more easily with others, e.g., Akay (2009) used 683 samples.

Sampling is repeated randomly (random sub-sampling validation) at different (train+validate) to test partitions, which varied from (0.5–99.5%) to (80–20%), while the train-validate partition is fixed at 70–30%. Our methodology is to calculate the misclassified sample percentage from the confusion matrix of a Randomly Initialized Weight Back-Propagation Neural Network (RIW-BPNN) side by side with a back propagation neural network initialized with the weights of a trained deep belief network of similar architecture (DBN-NN) at the different (train+validate) to test partitions.

The RIW-BPNN and DBN-NN architecture is nine inputs - four hidden - two hidden - one output. To increase classifier performance, for both architectures we tested conjugate gradient back propagation and Levenberg-Marquardt in the neural network learning phase. The experiment conducted by Pauline and Santhakumaran indicates that the Levenberg-Marquardt learning algorithm gives better classifier accuracy when used with a back propagation neural network (Paulin, 2011).
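The sweep over partitions might be sketched as follows; the partition grid, the 0.5 decision threshold, and the variable names are our assumptions, while the sensitivity and specificity formulas are those used in Section 5:

```matlab
% Sketch of the random sub-sampling sweep over (train+validate) to test
% partitions. x: 9 x 683 features; t: 1 x 683 logical labels (1 = malignant).
net = feedforwardnet([4 2], 'trainlm');  % or the DBN-initialized network
net.divideParam.trainRatio = 0.7;        % fixed 70-30 train-validate split
net.divideParam.valRatio   = 0.3;
net.divideParam.testRatio  = 0;          % test set is held out manually below

for tv = 0.005:0.05:0.80                 % (train+validate) fraction of the samples
    net   = init(net);                   % fresh random weights (RIW-BPNN case)
    idx   = randperm(683);
    nTV   = round(tv * 683);
    teIdx = idx(nTV+1:end);              % held-out test samples
    net   = train(net, x(:, idx(1:nTV)), double(t(idx(1:nTV))));
    yhat  = net(x(:, teIdx)) > 0.5;      % classify the test samples

    TP = sum( yhat &  t(teIdx));  FN = sum(~yhat &  t(teIdx));
    TN = sum(~yhat & ~t(teIdx));  FP = sum( yhat & ~t(teIdx));
    sens = 100 * TP / (TP + FN);            % sensitivity, as in Section 5
    spec = 100 * TN / (TN + FP);            % specificity
    err  = 100 * (FP + FN) / numel(teIdx);  % misclassified sample percentage
end
```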
5. Results

5.1. Scaled conjugate gradient back propagation

[Fig. 1. Confusion matrix of DBN-NN.]
[Fig. 2. Confusion matrix of RIW-BPNN.]

Figs. 1 and 2 show the confusion matrices obtained from the experiment as the (train+validate) to test partition varied from (0.5–99.5%) to (80–20%), while the train-validate partition is fixed at 70–30%. The results show that the best classifier accuracy was 99.59% for the DBN-NN complex, at a (train+validate) to test partition of 63.84–36.16%. In comparison, the best accuracy of the RIW-BPNN was 98.86%, reached at a (train+validate) to test partition of 61.35–38.65%.

At the best accuracy of DBN-NN, the total test samples are (1 − 0.6384) × 683 ≈ 247 samples, with true positives (TP) = 82, true negatives (TN) = 164, false positives (FP) = 1, and false negatives (FN) = 0:

Sensitivity = 100 × TP/(TP + FN) = 100%
Specificity = 100 × TN/(TN + FP) = 100 × 164/165 = 99.39%

At the best accuracy of RIW-BPNN, the total test samples are (1 − 0.6134) × 683 ≈ 264 samples, with TP = 95, TN = 166, FP = 3, and FN = 0:

Sensitivity = 100 × TP/(TP + FN) = 100%
Specificity = 100 × TN/(TN + FP) = 100 × 166/169 = 98.22%

[Fig. 3. Dual vertical bars represent the misclassified sample percentage; the shorter bar is the better. The first bar (left/blue) is for DBN-NN and the second (right/red) for RIW-BPNN (conjugate gradient case). The horizontal axis concentrates on (train+validate) to test partitions from (60.5–39.5%) to (64.5–35.5%).]

Fig. 3 demonstrates part of the sample partition domain and shows the accuracy reaching 99.6% for the DBN-NN complex at a (train+validate) to test partition of 63.84–36.16%. In comparison, the best RIW-BPNN accuracy, 98.86%, is reached at a (train+validate) to test partition of 61.35–38.65%.

5.2. Levenberg-Marquardt

[Fig. 4. Confusion matrix of DBN-NN.]
[Fig. 5. Confusion matrix of RIW-BPNN.]

Figs. 4 and 5 show the confusion matrices obtained from the experiment at several (train+validate) to test partitions, again varied from (0.5–99.5%) to (80–20%) with the train-validate partition fixed at 70–30%. The best classifier accuracy of the DBN-NN complex was 99.68%, obtained at a (train+validate) to test partition of 54.9–45.1%, with a misclassified sample percentage of 0.32%. In comparison, the best RIW-BPNN accuracy was 99.03%, with a misclassified sample percentage of 0.97%, obtained at a (train+validate) to test partition of 54.76–45.24%.

At the best accuracy of DBN-NN, the total test samples are (1 − 0.549) × 683 ≈ 308 samples, with TP = 118, TN = 189, FP = 1, and FN = 0:

Sensitivity = 100 × TP/(TP + FN) = 100%
Specificity = 100 × TN/(TN + FP) = 100 × 189/190 = 99.47%

At the best accuracy of RIW-BPNN, the total test samples are (1 − 0.548) × 683 ≈ 309 samples, with TP = 114, TN = 192, FP = 2, and FN = 1:

Sensitivity = 100 × TP/(TP + FN) = 99.13%
Specificity = 100 × TN/(TN + FP) = 100 × 192/194 = 98.97%

[Fig. 6. Dual vertical bars represent the misclassified sample percentage; the shorter bar is the better. The first bar (left/blue) is for DBN-NN and the second (right/red) for RIW-BPNN (Levenberg-Marquardt case). The horizontal axis concentrates on (train+validate) to test partitions from (53.5–46.5%) to (57.2–42.8%).]

From Fig. 6, we can observe that the accuracy of the DBN-NN complex reaches 99.68% at a (train+validate) to test partition of 54.9–45.1%. Table 1 summarizes the performance of different classifiers, including our technique.

Table 1. Summary of classifier performance.

| Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%) | (Train+validate) to test partition (%) |
|---|---|---|---|---|
| Ls-SVM, Akay (2009), Wisconsin 683 entries | 99.51 | 100 | 97.91 | 80–20 |
| Übeyli (2007), SVM | 99.54 | – | – | 37–63 |
| Dheeba et al. (2014), PSOWNN | 93.67 | 94.17 | 92.11 | – |
| Mert et al. (2015), ICA-RBFNN | 90.49 | – | – | – |
| Nahato et al. (2015), RS-BPNN | 98.6 | – | – | – |
| Our tested RIW-BPNN, Wisconsin 683 entries, conjugate gradient back propagation | 98.86 | 100 | 98.22 | 61.35–38.65 |
| Our tested DBN-NN, Wisconsin 683 entries, conjugate gradient back propagation | 99.59 | 100 | 99.39 | 63.84–36.16 |
| Our tested RIW-BPNN, Wisconsin 683 entries, Levenberg-Marquardt | 99.03 | 99.13 | 98.97 | 54.76–45.24 |
| Our tested DBN-NN, Wisconsin 683 entries, Levenberg-Marquardt | 99.68 | 100 | 99.47 | 54.9–45.1 |

6. Conclusion

In this research, we presented an automatic diagnosis system for detecting breast cancer based on a DBN unsupervised pre-training phase followed by a supervised back propagation neural network phase (DBN-NN). The back propagation neural network pre-trained with the unsupervised DBN phase achieves higher classification accuracy than a classifier with just one supervised phase. The rationale behind this enhancement could be that learning the statistics of the input feature space in the DBN phase initializes the back propagation neural network to search the objective function near a good local optimum in the supervised learning phase.

In our experiment with the specified network architecture, the DBN-NN complex outperforms the RIW-BPNN when the back propagation neural network uses the conjugate gradient algorithm for learning, and it still outperforms the RIW-BPNN when Levenberg-Marquardt is used for training in the back propagation phase. The enhanced overall neural network accuracy reaches 99.68%, with 100% sensitivity and 99.47% specificity, in the breast cancer case. The results show classifier performance improvements over previous studies.

Although Hinton developed a fast algorithm for training DBNs, the DBN learning process still requires substantial computational effort on legacy hardware. Therefore, the main limitation and challenge of our approach is to build a CAD scheme based on DBNs using commercial hardware to assist medical professionals in the early detection of breast abnormality.

Future research effort should be allocated to evaluating this classifier complex for the automatic diagnosis of other abnormalities such as epilepsy based on EEG datasets, cardiac arrhythmia, and diabetic retinopathy (DR). Further, the availability of general-purpose computing on graphics processing units (GPGPU) and the distributed nature of the DBN may also encourage the development of efficient parallel algorithms for learning such a classifier complex.
References

Abonyi, J., & Szeifert, F. (2003). Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognition Letters, 24, 2195–2207.

Akay, M. F. (2009). Support vector machines combined with feature selection for breast cancer diagnosis. Expert Systems With Applications, 36, 3240–3247.

Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1798–1828.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the 5th annual ACM workshop on computational learning theory (pp. 144–152). ACM Press.

Centers for Disease Control and Prevention (2014). Cancer prevention and control. http://www.cdc.gov/cancer/dcpc/data/women.htm. Accessed 03.09.14.

Decoste, D., & Schölkopf, B. (2002). Training invariant support vector machines. Machine Learning, 46, 161–190.

Dheeba, J., Singh, N. A., & Selvi, S. T. (2014). Computer-aided detection of breast cancer on mammograms: A swarm intelligence optimized wavelet neural network approach. Journal of Biomedical Informatics, 49, 45–52.

Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., & Bengio, S. (2010). Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11, 625–660.

Goodman, D. E., Boggess, L. C., & Watkins, A. B. (2002). Artificial immune system classification of multiple-class problems. In Proceedings of the intelligent engineering systems (pp. 179–184). ASME.

Hamilton, H. J., Shan, N., & Cercone, N. (1996). RIAC: A rule induction algorithm based on approximate classification.

Hinton, G. E. (2009a). Deep belief nets. http://www.cs.toronto.edu/~hinton/nipstutorial/nipstut3.pdf. Accessed 03.09.14.

Hinton, G. E. (2009b). Deep belief networks. http://www.scholarpedia.org/article/Deep_belief_networks. Accessed 03.09.14.

Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (2001). Gradient-based learning applied to document recognition. In Intelligent signal processing (pp. 306–351). IEEE Press.

Mert, A., Kılıç, N. Z., Bilgili, E., & Akan, A. (2015). Breast cancer detection with reduced feature set. Computational and Mathematical Methods in Medicine, 1–11.

Nahato, K. B., Nehemiah, H. K., & Kannan, A. (2015). Knowledge mining from clinical datasets using rough sets and backpropagation neural network. Computational and Mathematical Methods in Medicine, 1–13.

Nauck, D., & Kruse, R. (1999). Obtaining interpretable fuzzy classification rules from medical data. Artificial Intelligence in Medicine, 16, 149–169.

Palm, R. B. (2012). Deep learning toolbox. https://github.com/rasmusbergpalm/DeepLearnToolbox. Accessed 03.09.14.

Paulin, F. (2011). Classification of breast cancer by comparing backpropagation training algorithms. International Journal on Computer Science and Engineering, 3, 327–332.

Pena-Reyes, C. A., & Sipper, M. (1999). A fuzzy-genetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine, 17, 131–155.

Polat, K., & Günes, S. (2007). Breast cancer diagnosis using least square support vector machine. Digital Signal Processing, 17, 694–701.

Quinlan, J. R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 77–90.

Salama, G. I., Abdelhalim, M. B., & Zeid, M. A. (2012). Breast cancer diagnosis on three different datasets using multi-classifiers. International Journal of Computer and Information Technology, 1, 36–43.

Schmidhuber, J. (2014). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

Setiono, R. (2000). Generating concise and accurate classification rules for breast cancer diagnosis. Artificial Intelligence in Medicine, 18, 205–219.

Suykens, J. A. K., Horvath, G., Basu, S., Micchelli, C., & Vandewalle, J. (2003). Advances in learning theory: Vol. 190. IOS Press.

Übeyli, E. D. (2007). Implementing automated diagnostic systems for breast cancer detection. Expert Systems With Applications, 33, 1054–1062.

Wisconsin breast cancer dataset (WBCD) (original) (2014). https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29. Accessed 03.09.14.

World Cancer Research Fund (2014). Breast cancer statistics. http://www.wcrf.org/int/cancer-facts-figures/data-specific-cancers/breast-cancer-statistics. Accessed 03.09.14.