Enhancing Diabetes Prediction Accuracy Using Random Forest and XGBoost with PSO and GA-Based Feature Selection
Abstract
Diabetes is a non-communicable disease and a global health concern, affecting more than 422 million people worldwide, a number expected to rise each year. This study evaluates the Random Forest and Extreme Gradient Boosting (XGBoost) classification algorithms on a diabetes dataset obtained from Kaggle. To improve prediction performance, feature selection was performed with Particle Swarm Optimization (PSO) and a Genetic Algorithm (GA), each intended to retain the most relevant features. Without feature selection, Random Forest achieved an Area Under the Curve (AUC) of 0.8120 and XGBoost an AUC of 0.7666. With PSO-based feature selection, the AUC rose to 0.8582 for Random Forest and 0.8250 for XGBoost. GA-based feature selection performed better still, with an AUC of 0.8612 for Random Forest and 0.8351 for XGBoost. Relative to the baseline without feature selection, the AUC gains range from 5.7% to 7.6% with PSO and from 6.1% to 8.9% with GA, so GA gave the larger improvement for both models. The study contributes to more accurate diabetes classification, which is expected to support faster and more accurate diagnosis.
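The percentage gains quoted in the abstract are consistent with relative changes in AUC against the no-feature-selection baseline, not absolute differences. The short sketch below recomputes them from the reported AUC values only; the function and variable names are illustrative and are not taken from the study.

```python
# AUC values as reported in the abstract.
BASELINE = {"Random Forest": 0.8120, "XGBoost": 0.7666}
WITH_PSO = {"Random Forest": 0.8582, "XGBoost": 0.8250}
WITH_GA = {"Random Forest": 0.8612, "XGBoost": 0.8351}


def relative_gain_pct(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


def gains(selected):
    """Relative AUC gain per model against the no-selection baseline,
    rounded to one decimal place (hypothetical helper)."""
    return {
        model: round(relative_gain_pct(BASELINE[model], auc), 1)
        for model, auc in selected.items()
    }


pso_gains = gains(WITH_PSO)
ga_gains = gains(WITH_GA)
print("PSO:", pso_gains)
print("GA: ", ga_gains)
```

Under this reading, PSO gives 5.7% for Random Forest and 7.6% for XGBoost, and GA gives 6.1% and 8.9%, matching the ranges stated in the abstract. The corresponding absolute AUC differences are smaller (roughly 0.046 to 0.069).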
Copyright (c) 2025 Dzira Naufia Jawza, Muhammad Itqan Mazdadi, Andi Farmadi, Triando Hamonangan Saragih, Dwi Kartini, Vugar Abdullayev

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.