Implementation of Random Forest and Extreme Gradient Boosting in the Classification of Heart Disease using Particle Swarm Optimization Feature Selection

Keywords: XGBoost, Random Forest, PSO, Heart

Abstract

Heart disease is the leading cause of death worldwide. According to available data, more than 36 million people have died of non-communicable diseases, a category that includes heart disease. This research uses a heart disease dataset from the UCI Repository consisting of 303 instances and 14 categorical features. The data were analyzed with two classification methods, Extreme Gradient Boosting (XGBoost) and Random Forest, combined with Particle Swarm Optimization (PSO) as a feature selection technique to address irrelevant features, which can degrade prediction performance on the heart disease dataset. Without PSO, the XGBoost model obtained an AUC of 0.877 and the Random Forest model an AUC of 0.874. With PSO feature selection, the AUC rose to 0.913 for XGBoost and 0.918 for Random Forest. These results show that PSO can improve the AUC of heart disease prediction. This research therefore contributes to more precise and efficient processing of heart disease patient data, supporting faster and more accurate diagnosis.
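The abstract does not spell out how PSO selects features. One common formulation, assumed here rather than taken from the paper, is binary PSO: each particle is a 0/1 mask over the predictor columns (assumed to be 13 predictors plus the target among the 14 features), and a particle's fitness is the AUC of a classifier trained on the selected columns. The pure-Python sketch below shows that search loop and an AUC helper computed as the probability that a random positive case outranks a random negative one. The names `binary_pso_select`, `toy_fitness` and `roc_auc`, and all hyperparameters (swarm size, inertia `w`, `c1`, `c2`, `v_max`), are hypothetical; `toy_fitness` is only a stand-in for training Random Forest or XGBoost.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def binary_pso_select(
    n_features: int,
    fitness: Callable[[List[int]], float],
    n_particles: int = 20,
    n_iter: int = 30,
    w: float = 0.7,
    c1: float = 1.5,
    c2: float = 1.5,
    v_max: float = 4.0,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Hypothetical binary PSO: particles are 0/1 feature masks, velocities are real."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best mask seen by each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]  # best mask seen by the swarm

    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp so the bit probability never reaches exactly 0 or 1.
                vel[i][d] = max(-v_max, min(v_max, v))
                # Sigmoid transfer: velocity sets the probability that feature d is kept.
                pos[i][d] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-vel[i][d])) else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit


# Hypothetical stand-in for "AUC of a classifier trained on the selected columns":
# reward three made-up informative features and lightly penalise every extra one.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: List[int]) -> float:
    hits = sum(mask[j] for j in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.1 * extras


mask, score = binary_pso_select(13, toy_fitness)
print("selected columns:", [j for j, bit in enumerate(mask) if bit], "fitness:", score)
print("AUC:", roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # AUC: 0.75
```

In a real pipeline, `toy_fitness` would be replaced by something like the mean cross-validated AUC, for example scikit-learn's `cross_val_score(RandomForestClassifier(), X[:, cols], y, cv=5, scoring="roc_auc").mean()`, returning 0 for an empty mask. Comparing that score with and without the selected mask mirrors the with-PSO versus without-PSO comparison reported above.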

Published: 2023-09-24

How to Cite:
M. R. Ansyari, M. I. Mazdadi, F. Indriani, D. Kartini, and T. H. Saragih, “Implementation of Random Forest and Extreme Gradient Boosting in the Classification of Heart Disease using Particle Swarm Optimization Feature Selection”, j.electron.electromedical.eng.med.inform, vol. 5, no. 4, pp. 250-260, Sep. 2023.

Section: Electronics