Title: Novel Deep Learning Architecture for Heart Disease Prediction using Convolutional Neural Network
Authors: Hussain, Shadab; Nanda, Dr. Santosh Kumar; Barigidad, Susmith; Akhtar, Shadab; Suaib, Md; Ray, Niranjan K.
Date: 2021-05-22
DOI: 10.1109/ocit53463.2021.00076

Healthcare is one of the most important aspects of human life. Heart disease is one of the deadliest diseases, and it harms the lives of many people around the world. Detecting it early can prevent loss of life. The availability of large-scale medical diagnosis data has helped develop complex machine learning and deep learning models for automated early diagnosis of heart disease. Classical approaches, however, do not generalize well to data that was not seen during training, as a large gap between training and test accuracy shows. To overcome this limitation, this paper proposes a novel deep learning architecture that uses a 1D convolutional neural network to classify persons as healthy or non-healthy. Various clinical parameters are used to assess the patients' risk profile, which helps early diagnosis, and several techniques are used to avoid overfitting in the proposed network. The proposed network achieves over 97% training accuracy and 96% test accuracy on the dataset. Its performance is compared in detail with other classification algorithms using several performance measures, and the comparison demonstrates the effectiveness of the proposed architecture.

There has been considerable research in healthcare over the last few years, particularly since the COVID-19 pandemic. According to the World Health Organization, heart disease is the leading cause of death worldwide [1].
More than 24% of deaths in India are due to various forms of heart disease [2]. There is therefore a need for an early detection system that prevents mortality caused by cardiac disorders. Heart disease, also known as cardiac disease, is most commonly caused by narrowing of the coronary arteries, which supply blood to the heart. Methods such as angiography can detect heart disease, but they are very costly and can cause adverse reactions in a patient's body. These drawbacks prevent their widespread use in countries with large poor populations. Healthcare products are needed that deliver quality results at an affordable cost, and healthcare organizations are also looking for cheap, non-invasive clinical tests. A computer-based decision support system for diagnosing various diseases can help organizations meet the needs of millions of people worldwide. The rapid growth of machine learning and deep learning algorithms has advanced research in many industries, including healthcare. The availability of large-scale medical diagnosis data has made it possible to train these algorithms. Clinical support systems built on these algorithms can reduce cost and increase accuracy [3]. Machine learning algorithms can use various clinical features to categorize patients' risk profiles. Some features, such as age, sex, and heredity, are outside the patient's control, whereas others, such as blood pressure, smoking, and drinking habits, are within it [2]. The proposed algorithm uses a combination of these features to distinguish healthy from non-healthy patients. The rest of the paper is structured as follows: Section II discusses existing approaches to heart disease classification that use machine learning.
Section III explains the proposed architecture, and Section IV discusses the implementation details and results.

A great deal of work has gone into designing heart disease diagnosis systems for early identification using various clinical criteria. Many methods have been used to identify patients, including Logistic Regression, Support Vector Machine (SVM), Decision Tree, Random Forest, and Artificial Neural Network (ANN). This section summarizes those implementations. S. Radhimeenakshi [4] classified cardiac disease using a decision tree and a support vector machine. The study concluded that, in terms of accuracy measured with a confusion matrix, the decision tree classifier outperforms the SVM. R. W. Jones et al. [5] developed a strategy for predicting cardiac disease using an ANN trained on a self-applied questionnaire. The network contained three hidden layers and was trained with the backpropagation algorithm. The architecture was validated using the Dundee rank factor score and achieved a 98% relative operating characteristic value on the dataset. Ankita Dewan et al. [6] compared genetic algorithms and backpropagation for training a neural network. They concluded that backpropagation performs better, with very low error on the dataset. S. Y. Huang et al. [7] proposed a learning vector quantization algorithm for training an artificial neural network. Using 13 clinical features, they achieved almost 80% accuracy on the dataset. Jayshril S. Sonawane et al. [8] proposed a new artificial neural network architecture trained with a vector quantization technique and random-order incremental training. They also used 13 clinical features and attained an accuracy of 85.55% on the dataset. Majid Ghonji Feshki et al. [9] used four different classification algorithms to detect cardiac disease.
They concluded that particle swarm optimization (PSO) combined with a neural network gave the best accuracy, around 91.94%, on the dataset. R. R. Manza et al. [10] proposed an ANN with a large number of neurons in a hidden layer that uses a radial basis function, and obtained around 97% accuracy. Saba Bashir et al. [2, 10] proposed a hybrid model for heart disease prediction that combines decision tree, SVM, and Naïve Bayes algorithms. It achieved 82% accuracy, 74% sensitivity, and 93% specificity. P. Ramprakash et al. [1] proposed a deep neural network with a χ² statistical model for feature selection. They used various techniques to avoid overfitting and underfitting, and achieved 94% accuracy, 93% sensitivity, and 93% specificity. Turay Karayilan et al. [2] studied the performance of artificial neural networks with varying numbers of hidden layers and achieved around 95.55% accuracy using five hidden layers. Most of these systems use artificial neural networks with some modifications. Such architectures are observed to be prone to overfitting, so they perform poorly on new data. This paper therefore proposes a new architecture based on a one-dimensional convolutional neural network that uses dropout to avoid overfitting. It uses the Cleveland database [12], which includes 13 characteristics for distinguishing between healthy and unhealthy individuals. Other classification algorithms are also implemented so that well-known performance measures can verify the performance of the proposed architecture. The next section explains the proposed architecture and its algorithms and techniques in detail.

This section describes the proposed architecture and each of its constituent layers in detail, along with the techniques used to optimize it. It also gives some theoretical background on the one-dimensional convolutional neural network (1D CNN), which is central to the proposed architecture.
Conventional 2D CNNs have become very popular for pattern recognition problems such as image classification and object detection [13]. Like ANNs, CNNs consist of self-optimizing neurons that are trained to perform a certain task. The success of 2D CNNs has led to the development of 1D CNNs, which operate on one-dimensional data such as time series [13]. Fig. 1 shows the proposed architecture based on this 1D CNN concept. The input to the architecture is the 13 characteristics that are crucial for predicting heart disease. An Embedding layer converts these features into a new dense vector representation (an embedding). The idea is borrowed from word embeddings for text data. It gives a better representation of the dataset based on the unique values present in each feature. The Embedding layer's output is passed to the 1D CNN layer for feature extraction. A 1D CNN is very similar to a conventional 2D CNN, but the convolution is applied along only one dimension. This yields a shallow architecture that can easily be trained on an ordinary CPU or even on embedded development boards [13]. The convolution operation finds hierarchical features in the dataset that are useful for classification. The dimension of the output features after a 1D convolution can be calculated as:

$$x = \left\lfloor \frac{w - f + 2p}{s} \right\rfloor + 1$$

where $x$ is the dimension of the output features, $w$ is the size of the input features, and $f$ is the size of the filter used for convolution. $p$ is the padding, the number of values added at each boundary before the convolution is applied. $s$ is the stride, the number of positions the filter moves between successive applications. The 1D convolution is a linear operation, and a stack of linear operations is still linear, so on its own it cannot classify nonlinearly separable data. Most real-world datasets are nonlinear, so a nonlinear operation is needed after the convolution. This nonlinear function is called an activation function.
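To make the convolution and the output-length formula x = floor((w - f + 2p)/s) + 1 concrete, here is a minimal single-channel sketch in plain Python. It is not the authors' Keras code. The helper names `conv1d` and `conv1d_output_length` are hypothetical. Treating the 13 clinical features as a length-13 sequence is an assumption made only for illustration. Zero padding `p` is added at each end of the input. As in deep learning libraries, the kernel is not flipped, so the operation is technically a cross-correlation.

```python
def conv1d_output_length(w: int, f: int, p: int = 0, s: int = 1) -> int:
    """x = floor((w - f + 2p) / s) + 1 for input length w, filter size f,
    padding p (added to each end) and stride s."""
    return (w - f + 2 * p) // s + 1


def conv1d(signal, kernel, bias=0.0, padding=0, stride=1):
    """Single-channel 1D convolution: slide the kernel over the zero-padded
    signal and take a weighted sum (plus bias) at each position."""
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    f = len(kernel)
    out = []
    for start in range(0, len(padded) - f + 1, stride):
        window = padded[start:start + f]
        out.append(sum(k * v for k, v in zip(kernel, window)) + bias)
    return out


# Illustrative length-13 input (one value per clinical feature), filter size 3.
features = [float(i) for i in range(13)]
y = conv1d(features, kernel=[1.0, 0.0, -1.0])
print(len(y), conv1d_output_length(13, 3))  # 11 11
```

With no padding and stride 1, a size-3 filter reduces a length-13 sequence to 11 outputs, which agrees with the formula. A real Conv1D layer repeats this for every filter (128 in the proposed network) and sums the results over the input channels.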
Some of the most common activation functions are the sigmoid, the hyperbolic tangent, and the rectified linear unit (ReLU). The proposed architecture uses ReLU, which is cheap to compute and therefore speeds up training. It also mitigates the vanishing gradient problem that affects the sigmoid and hyperbolic tangent. An architecture can contain multiple convolution layers, each followed by an activation function. The proposed architecture uses two 1D convolution layers with 128 filters of size 3. The output of the final convolution layer is passed through a global max-pooling layer. This layer takes the maximum value of each channel over the whole sequence, which reduces the output dimension. The pooled output is fed to a fully connected layer with 256 neurons, which extracts useful features for classification. This layer is similar to a hidden layer in an ANN. The final layer contains a single neuron with a sigmoid activation function, which directly gives the probability for binary classification. Table 1 shows the layer-wise details, including the output feature dimensions and trainable weights of every layer. The proposed 1D CNN architecture contains around 0.13 million trainable parameters, which are learned during training. The CNN architecture without regularization was observed to overfit the training data: training accuracy was very high while validation accuracy was low. Dropout was introduced to remove this overfitting. During training, dropout randomly drops neurons with a certain probability, so a different subnetwork is trained at every iteration. This keeps the network from becoming too dependent on any single neuron. A dropout layer follows each trainable layer in the proposed architecture.
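The ReLU activation, global max pooling and dropout can be illustrated on plain Python lists. This is a minimal sketch, not the authors' implementation:

- `relu`, `global_max_pool` and `dropout` are hypothetical helper names.
- The feature-map values are made up.
- The dropout rate of 0.5 in the example is an assumption, because the text does not state the rate used.

Global max pooling follows the common definition (for example, Keras `GlobalMaxPooling1D`): the maximum of each channel over all time steps. Dropout uses the "inverted" form that Keras's `Dropout` layer applies. Kept units are scaled by 1 / (1 - rate) during training, so the layer needs no change at inference time.

```python
import random


def relu(x):
    """Rectified linear unit applied element-wise: max(0, v)."""
    return [max(0.0, v) for v in x]


def global_max_pool(feature_map):
    """feature_map is a list of time steps, each a list of channel values.
    Returns one value per channel: its maximum over all time steps."""
    return [max(channel) for channel in zip(*feature_map)]


def dropout(x, rate, training=True, rng=random):
    """Inverted dropout: during training, zero each unit with probability
    `rate` and scale survivors by 1 / (1 - rate); at inference, pass through."""
    if not training or rate == 0.0:
        return list(x)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in x]


fmap = [[0.2, -1.0, 3.0],   # time step 0, three channels
        [1.5, -0.5, 2.0],   # time step 1
        [0.7, -2.0, 4.0]]   # time step 2
pooled = global_max_pool([relu(step) for step in fmap])
print(pooled)                                     # [1.5, 0.0, 4.0]
print(dropout(pooled, rate=0.5, training=False))  # unchanged at inference
```

Scaling during training rather than at inference means the trained network can be used for prediction exactly as it is. The only requirement is to switch dropout off.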
Adding dropout brought the training and test accuracy very close together, which indicates that the network adapts well to data it has not seen. The next section describes the implementation details and the results obtained after training the proposed architecture.

The proposed architecture for heart disease prediction was built with the scikit-learn and Keras libraries, which support the implementation of various machine learning and deep learning algorithms. The development setup consists of an Intel i5 CPU with 8 GB of RAM and a GeForce 940 GPU, which speeds up training. The paper uses the Cleveland database [12], which has 303 patient samples with 14 attributes. The dataset is split into two parts: 80% is used for training and the remaining 20% for validation. Table 2 explains the clinical parameters (features) used for classification. Many traditional classification algorithms require all attributes to be in the same range. The attributes of this dataset have different ranges, so a standardization technique is applied to bring them all to the same scale. The mean of each attribute is subtracted from its values, and the result is divided by the attribute's standard deviation. The final attribute is the true label, which indicates whether the patient has heart disease. As Fig. 2 shows, the dataset is slightly imbalanced, with more negative than positive examples. A batch size of 32 is used. The loss is the binary cross-entropy between the true and predicted values, and an optimization algorithm minimizes this loss until convergence. Training uses the Adam optimization algorithm because it converges faster and oscillates less around minima [14].
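Standardization and the binary cross-entropy loss can be sketched as follows. The sketch assumes the population standard deviation, which is what scikit-learn's `StandardScaler` uses. It also clips probabilities away from 0 and 1 so that the logarithm is always defined. The helper names and the age values are illustrative and not taken from the paper.

```python
import math
import statistics


def standardize(column):
    """z-score: subtract the column mean, divide by its standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    return [(v - mean) / std for v in column]


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted
    probabilities; probabilities are clipped to [eps, 1 - eps]."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


ages = [63.0, 37.0, 41.0, 56.0, 57.0]  # illustrative values for one attribute
z = standardize(ages)
print([round(v, 2) for v in z])
print(round(binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.6]), 4))  # 0.2798
```

After standardization each attribute has mean 0 and standard deviation 1. For this reason, the scaling statistics are usually computed on the training split only and then applied to the validation split.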
Adam updates the weights at each iteration using exponentially weighted averages of both the gradients and the squared gradients. The exponentially decaying average of past gradients is calculated by

$$\mu_t = \beta_1 \mu_{t-1} + (1 - \beta_1) G_t$$

where $\mu_t$ is the momentum term at time step $t$, $\beta_1$ is a constant taken as 0.9, and $G_t$ is the gradient at time step $t$. The exponentially decaying average of past squared gradients is calculated by

$$V_t = \beta_2 V_{t-1} + (1 - \beta_2) G_t^2$$

where $V_t$ is the velocity term at time step $t$ and $\beta_2$ is a constant taken as 0.99. Both averages are bias-corrected,

$$\hat{\mu}_t = \frac{\mu_t}{1 - \beta_1^t}, \qquad \hat{V}_t = \frac{V_t}{1 - \beta_2^t},$$

and the parameters are then updated using Adam's update rule:

$$W_t = W_{t-1} - \eta \frac{\hat{\mu}_t}{\sqrt{\hat{V}_t} + \epsilon}$$

where $\eta$ is the learning rate, $\epsilon$ is a very small constant that avoids division by zero, and $W_t$ is the parameter value at time step $t$.

Fig. 3 shows the training and test accuracy after each epoch. The proposed architecture with dropout achieves a training accuracy of 97.79% and a test accuracy of 96.77%. Several other well-known classification algorithms are also implemented for comparison with the proposed architecture. Table 3 gives the detailed comparison. When a dataset is imbalanced, accuracy alone can give a misleading picture of a model's performance. Performance is therefore also measured with precision, recall, the F1 score, and the area under the receiver operating characteristic curve (AUC). Precision is the fraction of positive predictions that are correct. Recall is the fraction of actual positive examples that are correctly identified. There is usually a trade-off between precision and recall, so the F1 score, the harmonic mean of the two, is also reported as a single balanced measure. The last measure, AUC, is the area under the ROC curve, which is shown in Fig. 5.
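The Adam update and the evaluation measures discussed here can be sketched in plain Python. This is an illustrative version under stated assumptions, not the training code used in the paper:

- `adam_minimize` optimizes a single scalar on a toy quadratic rather than the network's weights.
- It uses β₁ = 0.9 and β₂ = 0.99 as stated in the text; the default β₂ in the original Adam paper is 0.999.
- The learning rate 0.01 and ε = 1e-8 are assumptions.
- `roc_auc` uses the pairwise-ranking interpretation of AUC: the probability that a random positive example scores higher than a random negative one. This equals the area under the ROC curve.
- The labels and scores are made-up example data.

```python
import math


def adam_minimize(grad, w0, lr=0.01, beta1=0.9, beta2=0.99, eps=1e-8, steps=5000):
    """Minimise a one-parameter function with Adam, given its gradient."""
    w, mu, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        mu = beta1 * mu + (1 - beta1) * g        # decaying average of gradients
        v = beta2 * v + (1 - beta2) * g * g      # decaying average of squared gradients
        mu_hat = mu / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * mu_hat / (math.sqrt(v_hat) + eps)
    return w


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for 0/1 labels and 0/1 predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive example scores higher than a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


# Adam on the toy function f(w) = (w - 3)^2, whose gradient is 2(w - 3).
print(adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0))

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
print(roc_auc(y_true, scores))              # 0.9375
```

On the toy quadratic the iterate should settle near the minimizer w = 3. For real models, scikit-learn's `precision_recall_fscore_support` and `roc_auc_score` compute the same evaluation measures.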
The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies. The ROC curve of the proposed architecture is very close to the ideal curve, which indicates good performance on the test set. The proposed architecture performs well on all of these measures. It was also tested on new data that was not part of either the training or the validation set, and it performs well on this data too. The statistical importance of each feature for classification was also examined.

The purpose of this paper is to use a computer-assisted technique to detect cardiac problems early. It proposes a 1D convolutional neural network design for predicting heart disease. The design also contains an Embedding layer, which converts the feature vector into a new vector embedding that aids classification. The proposed architecture is implemented as a software system on a computer, and it can help diagnose cardiac disease early, at low cost and with high accuracy. The architecture uses overfitting-avoidance techniques that improve its performance on unseen data. The 1D CNN architecture performs best among all the classification algorithms compared: Logistic Regression, Naïve Bayes, SVM, Decision Tree, Random Forest, LightGBM, XGBoost, and ANN. Adding more clinical parameters to the system could classify heart disease more accurately. The system could also be integrated with wearable sensor readings for real-time prediction of heart disease.
[1] New initiative launched to tackle cardiovascular disease, the world's number one killer
[2] Heart Disease Prediction Using Deep Neural Network
[3] Prediction of heart disease using neural network
[4] Classification and prediction of heart disease risk using data mining techniques of Support Vector Machine and Artificial Neural Network
[5] Detecting the risk factors of coronary heart disease by use of neural networks
[6] Prediction of heart disease using a hybrid technique in data mining classification
[7] HDPS: Heart disease prediction system
[8] Prediction of heart disease using learning vector quantization algorithm
[9] Improving the heart disease diagnosis by evolutionary algorithm of PSO and Feed Forward Neural Network
[10] Prediction of heart disease medical prescription using radial basis function
[11] An ensemble-based decision support framework for intelligent heart disease diagnosis
[12] Principle Investigator responsible for data collection
[13] FA-1D-CNN Implementation to Improve Diagnosis of Heart Disease Risk Level
[14] Adam: A method for stochastic optimization
[15] hQChain: Leveraging Towards Blockchain and Queueing Model for Secure Smart Connected Health
[16] Deepfog: Fog computing-based deep neural architecture for prediction of stress types, diabetes and hypertension attacks
[17] Leveraging machine learning in mist computing telemonitoring system for diabetes prediction