Implementation of Monarch Butterfly Optimization for Feature Selection in Coronary Artery Disease Classification Using Gradient Boosting Decision Tree

Keywords: Coronary Artery Disease, Classification, GBDT, Feature Selection, MBO

Abstract

Coronary artery disease (CAD), a prevalent form of cardiovascular disease, is a major contributor to premature mortality worldwide. Classifying CAD cases as an early-detection measure can substantially reduce the death rate the disease causes. This study used the Z-Alizadeh dataset, which contains clinical records of 303 patients with coronary artery disease, each described by 55 predictive attribute features and 1 target attribute feature. The Gradient Boosting Decision Tree (GBDT) algorithm was chosen for classification, and a metaheuristic algorithm, Monarch Butterfly Optimization (MBO), was applied to reduce the number of features. The objective of this study is to compare the performance of GBDT before and after applying MBO for feature selection. Results were evaluated using a confusion matrix and the area under the curve (AUC). GBDT alone achieved an accuracy of 87.46%, a precision of 83.85%, a recall of 70.37%, and an AUC of 82.09%. After MBO selected 31 features, GBDT improved to an accuracy of 90.26%, a precision of 86.82%, a recall of 80.79%, and an AUC of 87.33%. This improvement leads to the conclusion that MBO effectively addresses the feature selection problem in this context.
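The pipeline the abstract describes, a wrapper approach in which MBO searches over binary feature masks and GBDT accuracy serves as the fitness, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses scikit-learn's `GradientBoostingClassifier` and synthetic data of the same shape (303 samples, 55 features) in place of the Z-Alizadeh dataset, and evaluates one hand-made candidate mask rather than running the full MBO search.

```python
# Sketch of the fitness function an MBO wrapper would optimize:
# each candidate "butterfly" is a boolean mask over the 55 features,
# scored by cross-validated GBDT accuracy on the selected subset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Synthetic stand-in matching the dataset's shape (303 x 55, binary target).
X, y = make_classification(n_samples=303, n_features=55,
                           n_informative=20, random_state=0)

def fitness(mask):
    """Mean 5-fold CV accuracy of GBDT trained on the masked feature subset."""
    if not mask.any():          # empty subset: nothing to train on
        return 0.0
    clf = GradientBoostingClassifier(random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=5).mean()

all_features = np.ones(55, dtype=bool)
candidate = np.random.default_rng(0).random(55) < 0.5  # one random candidate mask

print(f"all 55 features:  {fitness(all_features):.3f}")
print(f"candidate subset: {fitness(candidate):.3f}")
```

In the full method, MBO's migration and butterfly-adjusting operators would iteratively update a population of such masks toward higher fitness; the study reports that the best mask retained 31 of the 55 features.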



Published
2023-10-23
How to Cite
[1] Siti Napi’ah, Triando Hamonangan Saragih, Dodon Turianto Nugrahadi, Dwi Kartini, and Friska Abadi, “Implementation of Monarch Butterfly Optimization for Feature Selection in Coronary Artery Disease Classification Using Gradient Boosting Decision Tree,” J. Electron. Electromedical Eng. Med. Inform., vol. 5, no. 4, pp. 314–323, Oct. 2023.
Section
Electronics