Breast cancer classification using deep belief networks

Ahmed M. Abdel-Zaher, Ayman M. Eldeib
Department of Systems and Biomedical Engineering, Cairo University, Giza, Egypt

Expert Systems With Applications 46 (2016) 139–144

Keywords: Breast cancer diagnosis; CAD; Classification; Deep learning based classifier; Pattern recognition

Abstract

Over the last decade, the ever-increasing worldwide demand for early detection of breast cancer at many screening sites and hospitals has opened new research avenues. According to the World Health Organization (WHO), early detection of cancer greatly increases the chance of choosing a successful treatment plan. Computer-Aided Diagnosis (CAD) systems are widely applied in the detection and differential diagnosis of many kinds of abnormalities, so improving the accuracy of CAD systems has become a major research area. In this paper, a CAD scheme for the detection of breast cancer is developed using an unsupervised deep belief network path followed by a supervised back propagation path. The resulting classifier is a back propagation neural network with the Levenberg-Marquardt learning function whose weights are initialized from the deep belief network path (DBN-NN). Our technique was tested on the Wisconsin Breast Cancer Dataset (WBCD). The classifier complex gives an accuracy of 99.68%, a promising result compared with previously published studies. The proposed system therefore provides an effective classification model for breast cancer. In addition, we examined the architecture at several train-test partitions.

1. Introduction

Breast cancer is the most common cancer among women, with nearly 1.7 million new cases diagnosed in 2012 (Centers for Disease Control and Prevention, 2014; World Cancer Research Fund, 2014). Breast cancer represents 18.3% of the total cancer cases in Egypt, and 37.3% of breast cancer cases could be fully healed, especially in case of early detection (Salama, Abdelhalim, & Zeid, 2012). In Egypt and the Arab countries, breast cancer targets women around the age of 30 and represents 42 cases per 100 thousand of the population (Salama et al., 2012).

An accurate classifier is the most important component of any CAD scheme developed to assist medical professionals in the early detection of mammographic lesions. CAD systems are designed to support radiologists in the process of visually screening mammograms, avoiding misdiagnosis caused by fatigue, eyestrain, or lack of experience. The use of an accurate CAD system for early detection could definitely save precious lives. In this study, a back propagation neural network initialized with the weights of a trained deep belief network of similar architecture (DBN-NN) was used to diagnose breast cancer. Our data source is the Wisconsin Breast Cancer Dataset (WBCD) taken from the University of California at Irvine (UCI) machine learning repository (Wisconsin breast cancer dataset (WBCD) (original), 2014).
2. Background

A variety of classification techniques have been developed for breast cancer CAD systems, and the accuracy of many of them has been evaluated on the dataset taken from the UCI machine learning repository. For example, Goodman, Boggess, and Watkins tried different methods that produced the following accuracies: the optimized learning vector quantization (optimized-LVQ) method reached 96.7%, the big-LVQ method reached 96.8%, and AIRS, a method they proposed based on the artificial immune system, obtained 97.2% classification accuracy (Goodman, Boggess, & Watkins, 2002).

Quinlan reached 94.74% classification accuracy using 10-fold cross-validation with the C4.5 decision tree method (Quinlan, 1996). Abonyi and Szeifert used the Supervised Fuzzy Clustering (SFC) technique and obtained 95.57% accuracy (Abonyi & Szeifert, 2003). Salama et al. (2012) performed an experiment on the WBC dataset, and the results showed that the fusion of MLP and J48 classifiers with feature selection (PCA) is superior to the other classifiers.

Hamilton, Shan, and Cercone (1996) obtained 96% accuracy with the RIAC method. Polat and Günes (2007) examined the robustness of the least square Support Vector Machine (SVM) using classification accuracy, analysis of sensitivity and specificity, the k-fold cross-validation method, and the confusion matrix; they obtained a classification accuracy of 98.53%.

Nauck and Kruse (1999) obtained 95.06% with neuro-fuzzy techniques. Pauline and Santhakumaran used feed forward artificial neural networks trained with the back propagation algorithm (Paulin, 2011). The performance of the network was evaluated on the Wisconsin breast cancer dataset for various training algorithms; the highest accuracy, 99.28%, was achieved with the Levenberg-Marquardt algorithm.

The accuracy obtained by Pena-Reyes and Sipper (1999) was 97.36% using a fuzzy-GA method. Akay (2009) combined SVM with feature selection, obtaining the highest classification accuracy (99.51%) for an SVM model that contains five features. Moreover, Setiono (2000) reached 98.1% using the Neuro-rule method. Übeyli (2007) used SVM and obtained 99.54% accuracy at a 37% train and 63% test partition.

Mert, Kılıç, Bilgili, and Akan (2015) explored the feature-reduction properties of independent component analysis (ICA) in a breast cancer decision support system. They showed that a one-dimensional feature vector obtained from ICA makes the Radial Basis Function Neural Network (RBFNN) classifier more discriminative, increasing accuracy from 87.17% to 90.49%.

Nahato, Nehemiah, and Kannan (2015) used a rough set indiscernibility relation method with a back propagation neural network (RS-BPNN). This work has two stages: the first handles missing values to obtain a smooth dataset and selects appropriate attributes from the clinical dataset by the indiscernibility relation method; the second is classification using a back propagation neural network. The accuracy obtained by the proposed method on the breast cancer dataset was 98.6%.
Dheeba, Singh, and Selvi (2014) investigated a new classification approach for the detection of breast abnormalities in digital mammograms using a Particle Swarm Optimized Wavelet Neural Network (PSOWNN). The proposed abnormality detection algorithm is based on extracting Laws texture energy measures from the mammograms and classifying the suspicious regions with a pattern classifier. They achieved 93.671% accuracy, 92.105% specificity, and 94.167% sensitivity.

In our study, we applied a deep belief network (DBN) in an unsupervised phase to learn the input feature statistics of the original WBCD dataset. We then transferred the obtained DBN weight matrix to a back propagation neural network of similar architecture to start the supervised phase. In the supervised phase, we tested both the conjugate gradient and the Levenberg-Marquardt algorithms for training the back propagation neural network.

3. From back propagation (BP) to deep belief network (DBN)

In 1985, second-generation neural networks with the back propagation algorithm emerged. The learning algorithm adjusts the network weights so that the output neuron state y represents the learning example t. A common method for measuring the discrepancy between the expected output t and the actual output y is the squared error measure:

$E = (t - y)^2$ (1)

The change in weight, which is added to the old weight, is equal to the product of the learning rate and the gradient of the error function, multiplied by −1:

$\Delta w_{ij} = -\eta \, \frac{\partial E}{\partial w_{ij}}$ (2)

In real-world applications, however, almost all data is unlabeled, while a back propagation neural network requires labeled training data. The biggest issues with back propagation NNs are that they can get stuck in poor local optima and that the learning time grows huge with multiple hidden layers.
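As a concrete illustration (a minimal sketch, not code from this study), the update rule of Eqs. (1) and (2) for a single sigmoid output neuron can be written in a few lines of MATLAB; the variable names and values here are purely illustrative:

```matlab
% Minimal sketch of the squared-error gradient-descent update, Eqs. (1)-(2),
% for a single sigmoid neuron; eta, x, t, and w are illustrative names only.
eta = 0.1;                    % learning rate
x   = [0.3; 0.7; 0.1];        % one training example (column vector)
t   = 1;                      % its target label
w   = 0.01 * randn(3, 1);     % randomly initialized weights

for epoch = 1:100
    y  = 1 / (1 + exp(-w' * x));          % forward pass (sigmoid output)
    E  = (t - y)^2;                       % squared error, Eq. (1)
    dE = -2 * (t - y) * y * (1 - y) * x;  % dE/dw by the chain rule
    w  = w - eta * dE;                    % weight update, Eq. (2)
end
```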
In 1963, Vapnik et al. invented the original support vector machine (SVM) algorithm, and Boser, Guyon, and Vapnik (1992) suggested a way to create nonlinear classifiers by applying the kernel trick to maximum-margin hyperplanes. In the classification task, the weight of each feature is computed by an optimization technique. In non-linear classification, SVMs perform the task efficiently through the kernel trick, mapping their inputs so that the non-linear classification task is converted into a linear classification problem in a high-dimensional feature space. The biggest limitation of the SVM approach lies in the choice of the kernel; in practice, the most serious problem with SVMs is the high algorithmic complexity and the extensive memory requirements of the required quadratic programming in large-scale tasks (Suykens, Horvath, Basu, Micchelli, & Vandewalle, 2003).

In recent years, attention has shifted to deep learning. Deep learning is a set of algorithms in machine learning that attempts to model high-level abstractions in data using model architectures composed of multiple non-linear transformations (Bengio, Courville, & Vincent, 2013; Schmidhuber, 2014). A Restricted Boltzmann Machine (RBM) is a generative stochastic artificial neural network that can learn a probability distribution over its set of inputs. A Deep Belief Network (DBN), in turn, is a generative graphical model, or alternatively a type of deep neural network, composed of multiple layers of latent variables ("hidden units"), with connections between the layers but not between units within each layer (Hinton, 2009b).

From Hinton's perspective, a DBN can be viewed as a composition of simple learning modules, each of which is a restricted type of Boltzmann machine containing a layer of visible units that represents the data and a layer of hidden units that represents features capturing higher-order correlations in the data. The two layers are connected by a matrix of symmetrically weighted connections (W), and there are no connections within a layer (Hinton, 2009b).

The key idea behind the DBN is that its weights (w), learned by an RBM, define both p(v|h, w) and the prior distribution over hidden vectors, p(h|w) (Hinton, 2009b). The probability of generating a visible vector can therefore be written as

$p(v) = \sum_{h} p(h \mid w)\, p(v \mid h, w)$ (3)

As learning a DBN is a computationally intensive task, Hinton showed that RBMs can be stacked and trained in a greedy manner to form the DBN (Hinton, Osindero, & Teh, 2006), and he introduced a fast algorithm for learning DBNs. The weight update between visible units v and hidden units h is simply

$\Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle^{0} - \langle v_i h_j \rangle^{1} \right)$ (4)

where ε is a learning rate and the superscripts 0 and 1 designate the network data and reconstruction states, respectively.
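Eq. (4) is the core of one-step contrastive divergence (CD-1) training for an RBM. The sketch below shows one such update for a binary RBM, with bias terms omitted for brevity; the function name, matrix orientations, and batch averaging are our illustrative assumptions, not the authors' implementation:

```matlab
% Sketch of a single CD-1 weight update for a binary RBM, Eq. (4).
% v0: batch of visible vectors (numCases x numVisible), values in [0,1];
% W: weights (numVisible x numHidden); epsilon: learning rate.
function W = rbm_cd1_update(W, v0, epsilon)
    sigm = @(z) 1 ./ (1 + exp(-z));
    h0  = sigm(v0 * W);                 % p(h = 1 | v0): data phase
    h0s = double(h0 > rand(size(h0)));  % sample binary hidden states
    v1  = sigm(h0s * W');               % reconstruct the visible units
    h1  = sigm(v1 * W);                 % p(h = 1 | v1): reconstruction phase
    % <v_i h_j>^0 - <v_i h_j>^1 in Eq. (4), estimated as batch averages
    W = W + epsilon * (v0' * h0 - v1' * h1) / size(v0, 1);
end
```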
The DBN is competitive for five reasons: it can be fine-tuned as a neural network; it has many non-linear hidden layers; it is generatively pre-trained; it can act as a non-linear dimensionality reduction for the input feature vector; and the network teacher is another sensory input. In addition, the real performance of the DBN encourages its use. For example, in a pattern recognition application, Hinton reported that the generalization performance of the DBN was 1.25% errors on the 10,000 digits of the MNIST handwritten digits database (Hinton, 2009a). The DBN's performance beats the 1.5% error achieved by the best back propagation nets (Hinton et al., 2006). Platt obtained 1.6% using back propagation, while K-Nearest Neighbor produced a 3.3% classification error (LeCun, Bottou, Bengio, & Haffner, 2001). It is also better than the 1.4% error reported by Decoste and Schölkopf (2002) for SVMs on the same task.

In addition, Hinton discussed the possibility of unsupervised DBN training followed by a back propagation pass, which gives better accuracy if good data priors are available (Hinton, 2009a). In 2010, an experiment performed by Erhan et al. (2010) suggested that unsupervised pre-training prior to supervised learning guides the learning towards basins of attraction of minima that support better generalization from the training dataset; the evidence from these results supports a regularization explanation for the effect of pre-training.

4. Experiment conditions and methodology

We used Matlab 2014a and Palm's DBN implementation (Palm, 2012). After the deep belief network was fully trained, we transferred its weight matrix to a native Matlab back propagation neural network with a similar architecture, i.e., the same number of input, hidden, and output neurons, and then performed several supervised back propagation passes.
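A minimal sketch of this pipeline, assuming the DeepLearnToolbox interface (dbnsetup/dbntrain) and the Neural Network Toolbox, is given below; the epoch count, batch size, and [0,1] feature scaling are illustrative assumptions rather than the settings used in the experiments:

```matlab
% Sketch: unsupervised DBN pre-training with Palm's DeepLearnToolbox,
% then weight transfer to a native MATLAB back propagation network.
% features: N x 9 raw WBCD attributes (values 1..10); labels: N x 1 targets.
x = (features - 1) / 9;        % scale features into [0,1] for the binary RBMs

dbn.sizes = [4 2];             % two hidden layers, matching the 9-4-2-1 architecture
opts.numepochs = 50;           % illustrative values
opts.batchsize = 10;           % must divide the number of training cases
opts.momentum  = 0;
opts.alpha     = 0.1;
dbn = dbnsetup(dbn, x, opts);
dbn = dbntrain(dbn, x, opts);  % greedy layer-wise RBM training

net = feedforwardnet([4 2], 'trainlm');  % native BPNN, Levenberg-Marquardt
net = configure(net, x', labels');       % fix layer dimensions (samples as columns)
net.IW{1,1} = dbn.rbm{1}.W;              % transfer first RBM weights (4 x 9)
net.LW{2,1} = dbn.rbm{2}.W;              % transfer second RBM weights (2 x 4)
net.b{1}    = dbn.rbm{1}.c;              % hidden biases (rbm.c in the toolbox)
net.b{2}    = dbn.rbm{2}.c;
net = train(net, x', labels');           % supervised fine-tuning
```

Since the DBN carries no output layer, the weights into the single output neuron remain randomly initialized; for the conjugate gradient runs, 'trainscg' (scaled conjugate gradient) would replace 'trainlm' as the training function.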
We applied this approach to the Wisconsin breast cancer original database, with nine features and two classes (benign, malignant). 690 of the 699 samples were accepted; nine samples were rejected for incomplete features. We then reduced the used samples to 683 entries to compare our results more easily with others, e.g., Akay (2009) used 683 samples.

Sampling is repeated randomly (random sub-sampling validation) at different (train+validate) to test partitions, which varied from (0.5–99.5%) to (80–20%), while the train-validate partition is fixed at 70–30%. Our methodology is to calculate the misclassified sample percentage from the confusion matrix of a Randomly Initialized Weight Back-Propagation Neural Network (RIW-BPNN) side by side with a back propagation neural network initialized with the weights of a trained deep belief network of similar architecture (DBN-NN) at the different (train+validate) to test partitions.

The RIW-BPNN and DBN-NN architecture is nine inputs - four hidden - two hidden - one output. To increase classifier performance, for both architectures we tested conjugate gradient back propagation and Levenberg-Marquardt in the neural network learning phase. The experiment conducted by Pauline and Santhakumaran indicates that the Levenberg-Marquardt learning algorithm gives better classifier accuracy when used with a back propagation neural network (Paulin, 2011).
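The sweep over partitions might be sketched as follows; the partition grid, the 0.5 decision threshold, and the variable names are our assumptions, while the sensitivity and specificity formulas are those used in Section 5:

```matlab
% Sketch of the random sub-sampling sweep over (train+validate) to test
% partitions. x: 9 x 683 features; t: 1 x 683 logical labels (1 = malignant).
net = feedforwardnet([4 2], 'trainlm');  % or the DBN-initialized network
net.divideParam.trainRatio = 0.7;        % fixed 70-30 train-validate split
net.divideParam.valRatio   = 0.3;
net.divideParam.testRatio  = 0;          % test set is held out manually below

for tv = 0.005:0.05:0.80                 % (train+validate) fraction of the samples
    net   = init(net);                   % fresh random weights (RIW-BPNN case)
    idx   = randperm(683);
    nTV   = round(tv * 683);
    teIdx = idx(nTV+1:end);              % held-out test samples
    net   = train(net, x(:, idx(1:nTV)), double(t(idx(1:nTV))));
    yhat  = net(x(:, teIdx)) > 0.5;      % classify the test samples

    TP = sum( yhat &  t(teIdx));  FN = sum(~yhat &  t(teIdx));
    TN = sum(~yhat & ~t(teIdx));  FP = sum( yhat & ~t(teIdx));
    sens = 100 * TP / (TP + FN);            % sensitivity, as in Section 5
    spec = 100 * TN / (TN + FP);            % specificity
    err  = 100 * (FP + FN) / numel(teIdx);  % misclassified sample percentage
end
```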
5. Results

5.1. Scaled conjugate gradient back propagation

[Fig. 1. Confusion matrix of DBN-NN.]
[Fig. 2. Confusion matrix of RIW-BPNN.]

Figs. 1 and 2 show the confusion matrices obtained from the experiment as the (train+validate) to test partition varied from (0.5–99.5%) to (80–20%), while the train-validate partition is fixed at 70–30%. The results show that the best classifier accuracy was 99.59% for the DBN-NN complex, at a (train+validate) to test partition of 63.84–36.16%. In comparison, the best accuracy of the RIW-BPNN was 98.86%, reached at a (train+validate) to test partition of 61.35–38.65%.

At the best accuracy of DBN-NN, the total test samples are (1 − 0.6384) × 683 ≈ 247 samples, with true positives (TP) = 82, true negatives (TN) = 164, false positives (FP) = 1, and false negatives (FN) = 0:

Sensitivity = 100 × TP/(TP + FN) = 100%
Specificity = 100 × TN/(TN + FP) = 100 × 164/165 = 99.39%

At the best accuracy of RIW-BPNN, the total test samples are (1 − 0.6134) × 683 ≈ 264 samples, with TP = 95, TN = 166, FP = 3, and FN = 0:

Sensitivity = 100 × TP/(TP + FN) = 100%
Specificity = 100 × TN/(TN + FP) = 100 × 166/169 = 98.22%

[Fig. 3. Dual vertical bars represent the misclassified sample percentage; the shorter bar is the better. The first bar (left/blue) is for DBN-NN and the second (right/red) for RIW-BPNN (conjugate gradient case). The horizontal axis concentrates on (train+validate) to test partitions from (60.5–39.5%) to (64.5–35.5%).]

Fig. 3 demonstrates part of the sample partition domain and shows the accuracy reaching 99.6% for the DBN-NN complex at a (train+validate) to test partition of 63.84–36.16%. In comparison, the best RIW-BPNN accuracy, 98.86%, is reached at a (train+validate) to test partition of 61.35–38.65%.

5.2. Levenberg-Marquardt

[Fig. 4. Confusion matrix of DBN-NN.]
[Fig. 5. Confusion matrix of RIW-BPNN.]

Figs. 4 and 5 show the confusion matrices obtained from the experiment at several (train+validate) to test partitions, again varied from (0.5–99.5%) to (80–20%) with the train-validate partition fixed at 70–30%. The best classifier accuracy of the DBN-NN complex was 99.68%, obtained at a (train+validate) to test partition of 54.9–45.1%, with a misclassified sample percentage of 0.32%. In comparison, the best RIW-BPNN accuracy was 99.03%, with a misclassified sample percentage of 0.97%, obtained at a (train+validate) to test partition of 54.76–45.24%.

At the best accuracy of DBN-NN, the total test samples are (1 − 0.549) × 683 ≈ 308 samples, with TP = 118, TN = 189, FP = 1, and FN = 0:

Sensitivity = 100 × TP/(TP + FN) = 100%
Specificity = 100 × TN/(TN + FP) = 100 × 189/190 = 99.47%

At the best accuracy of RIW-BPNN, the total test samples are (1 − 0.548) × 683 ≈ 309 samples, with TP = 114, TN = 192, FP = 2, and FN = 1:

Sensitivity = 100 × TP/(TP + FN) = 99.13%
Specificity = 100 × TN/(TN + FP) = 100 × 192/194 = 98.97%

[Fig. 6. Dual vertical bars represent the misclassified sample percentage; the shorter bar is the better. The first bar (left/blue) is for DBN-NN and the second (right/red) for RIW-BPNN (Levenberg-Marquardt case). The horizontal axis concentrates on (train+validate) to test partitions from (53.5–46.5%) to (57.2–42.8%).]

From Fig. 6, we can observe that the accuracy of the DBN-NN complex reaches 99.68% at a (train+validate) to test partition of 54.9–45.1%. Table 1 summarizes the performance of different classifiers, including our technique.

Table 1. Summary of classifier performance.

| Classifier | Accuracy (%) | Sensitivity (%) | Specificity (%) | (Train+validate) to test partition (%) |
|---|---|---|---|---|
| Ls-SVM, Akay (2009), Wisconsin 683 entries | 99.51 | 100 | 97.91 | 80–20 |
| Übeyli (2007), SVM | 99.54 | – | – | 37–63 |
| Dheeba et al. (2014), PSOWNN | 93.67 | 94.17 | 92.11 | – |
| Mert et al. (2015), ICA-RBFNN | 90.49 | – | – | – |
| Nahato et al. (2015), RS-BPNN | 98.6 | – | – | – |
| Our tested RIW-BPNN, Wisconsin 683 entries, conjugate gradient back propagation | 98.86 | 100 | 98.22 | 61.35–38.65 |
| Our tested DBN-NN, Wisconsin 683 entries, conjugate gradient back propagation | 99.59 | 100 | 99.39 | 63.84–36.16 |
| Our tested RIW-BPNN, Wisconsin 683 entries, Levenberg-Marquardt | 99.03 | 99.13 | 98.97 | 54.76–45.24 |
| Our tested DBN-NN, Wisconsin 683 entries, Levenberg-Marquardt | 99.68 | 100 | 99.47 | 54.9–45.1 |

6. Conclusion

In this research, we presented an automatic diagnosis system for detecting breast cancer based on a DBN unsupervised pre-training phase followed by a supervised back propagation neural network phase (DBN-NN). The back propagation neural network pre-trained with the unsupervised DBN phase achieves higher classification accuracy than a classifier with just one supervised phase. The rationale behind this enhancement could be that learning the statistics of the input feature space in the DBN phase initializes the back propagation neural network to search the objective function near a good local optimum in the supervised learning phase.

In our experiment with the specified network architecture, the DBN-NN complex outperforms the RIW-BPNN when the back propagation neural network uses the conjugate gradient algorithm for learning, and it still outperforms the RIW-BPNN when Levenberg-Marquardt is used for training in the back propagation phase. The enhanced overall neural network accuracy reaches 99.68%, with 100% sensitivity and 99.47% specificity, in the breast cancer case. The results show classifier performance improvements over previous studies.

Although Hinton developed a fast algorithm for training DBNs, the DBN learning process still requires substantial computational effort on legacy hardware. Therefore, the main limitation and challenge of our approach is to build a CAD scheme based on DBNs using commercial hardware to assist medical professionals in the early detection of breast abnormality.

Future research effort should be allocated to evaluating this classifier complex for the automatic diagnosis of other abnormalities such as epilepsy based on EEG datasets, cardiac arrhythmia, and diabetic retinopathy (DR). Further, the availability of general-purpose computing on graphics processing units (GPGPU) and the distributed nature of the DBN may also encourage the development of efficient parallel algorithms for learning such a classifier complex.
References

Abonyi, J., & Szeifert, F. (2003). Supervised fuzzy clustering for the identification of fuzzy classifiers. Pattern Recognition Letters, 24, 2195–2207.

Akay, M. F. (2009). Support vector machines combined with feature selection for breast cancer diagnosis. Expert Systems With Applications, 36, 3240–3247.

Bengio, Y., Courville, A., & Vincent, P. (2013). Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35, 1798–1828.

Boser, B. E., Guyon, I. M., & Vapnik, V. N. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the 5th annual ACM workshop on computational learning theory (pp. 144–152). ACM Press.

Centers for Disease Control and Prevention (2014). Cancer prevention and control. http://www.cdc.gov/cancer/dcpc/data/women.htm. Accessed 03.09.14.

Decoste, D., & Schölkopf, B. (2002). Training invariant support vector machines. Machine Learning, 46, 161–190.

Dheeba, J., Singh, N. A., & Selvi, S. T. (2014). Computer-aided detection of breast cancer on mammograms: A swarm intelligence optimized wavelet neural network approach. Journal of Biomedical Informatics, 49, 45–52.

Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., & Bengio, S. (2010). Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11, 625–660.

Goodman, D. E., Boggess, L. C., & Watkins, A. B. (2002). Artificial immune system classification of multiple-class problems. In Proceedings of the intelligent engineering systems (pp. 179–184). ASME.

Hamilton, H. J., Shan, N., & Cercone, N. (1996). RIAC: A rule induction algorithm based on approximate classification.

Hinton, G. E. (2009a). Deep belief nets. http://www.cs.toronto.edu/~hinton/nipstutorial/nipstut3.pdf. Accessed 03.09.14.

Hinton, G. E. (2009b). Deep belief networks. http://www.scholarpedia.org/article/Deep_belief_networks. Accessed 03.09.14.

Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527–1554.

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (2001). Gradient-based learning applied to document recognition. In Intelligent signal processing (pp. 306–351). IEEE Press.

Mert, A., Kılıç, N. Z., Bilgili, E., & Akan, A. (2015). Breast cancer detection with reduced feature set. Computational and Mathematical Methods in Medicine, 1–11.

Nahato, K. B., Nehemiah, H. K., & Kannan, A. (2015). Knowledge mining from clinical datasets using rough sets and backpropagation neural network. Computational and Mathematical Methods in Medicine, 1–13.

Nauck, D., & Kruse, R. (1999). Obtaining interpretable fuzzy classification rules from medical data. Artificial Intelligence in Medicine, 16, 149–169.

Palm, R. B. (2012). Deep learning toolbox. https://github.com/rasmusbergpalm/DeepLearnToolbox. Accessed 03.09.14.

Paulin, F. (2011). Classification of breast cancer by comparing backpropagation training algorithms. International Journal on Computer Science and Engineering, 3, 327–332.

Pena-Reyes, C. A., & Sipper, M. (1999). A fuzzy-genetic approach to breast cancer diagnosis. Artificial Intelligence in Medicine, 17, 131–155.

Polat, K., & Günes, S. (2007). Breast cancer diagnosis using least square support vector machine. Digital Signal Processing, 17, 694–701.

Quinlan, J. R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 77–90.

Salama, G. I., Abdelhalim, M. B., & Zeid, M. A. (2012). Breast cancer diagnosis on three different datasets using multi-classifiers. International Journal of Computer and Information Technology, 1, 36–43.

Schmidhuber, J. (2014). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.

Setiono, R. (2000). Generating concise and accurate classification rules for breast cancer diagnosis. Artificial Intelligence in Medicine, 18, 205–219.

Suykens, J. A. K., Horvath, G., Basu, S., Micchelli, C., & Vandewalle, J. (2003). Advances in learning theory: Vol. 190. IOS Press.

Übeyli, E. D. (2007). Implementing automated diagnostic systems for breast cancer detection. Expert Systems With Applications, 33, 1054–1062.

Wisconsin breast cancer dataset (WBCD) (original) (2014). https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+%28Original%29. Accessed 03.09.14.

World Cancer Research Fund (2014). Breast cancer statistics. http://www.wcrf.org/int/cancer-facts-figures/data-specific-cancers/breast-cancer-statistics. Accessed 03.09.14.