Abstract—This paper presents a comparative study of commonly used machine learning algorithms for predicting the prevalence of heart disease. We train several classification techniques on the publicly available Cleveland dataset, highlight the differences between the resulting models, and evaluate their accuracy in predicting heart disease. We show that less complex models, such as logistic regression and support vector machines with a linear kernel, give more accurate results than their more complex counterparts. F1 score and ROC curves are used as evaluation measures. Through this effort, we aim to provide a benchmark, and to improve on earlier ones, in the field of heart disease diagnostics using machine learning classification techniques.
Index Terms—Cleveland heart disease dataset, classification, SVM, neural networks.
Divyansh Khanna, Rohan Sahu, and Bharat Deshpande are with the Department of Computer Science, Birla Institute of Technology and Sciences, Pilani, Goa Campus, 403726 India (e-mail: divyanshkhanna09@gmail.com, rohan9605@gmail.com, bmd@goa.bits-pilani.ac.in).
Veeky Baths is with the Department of Biological Science, Birla Institute of Technology and Sciences, Pilani, Goa Campus, 403726 India (e-mail: veeky@goa.bits-pilani.ac.in).
Cite: Divyansh Khanna, Rohan Sahu, Veeky Baths, and Bharat Deshpande, "Comparative Study of Classification Techniques (SVM, Logistic Regression and Neural Networks) to Predict the Prevalence of Heart Disease," International Journal of Machine Learning and Computing, vol. 5, no. 5, pp. 414-419, 2015.