A Comparative Study of Improved Ensemble Learning Algorithms for Patient Severity Condition Classification
Abstract
The adoption of Electronic Health Records (EHR) has enabled comprehensive patient record-keeping, improving healthcare delivery and decision-making. However, analyzing EHR data with ensemble machine learning (EML) methods poses particular challenges: high data dimensionality, imbalanced class distributions, and the need for effective hyperparameter tuning to optimize model performance. This study presents a comparative analysis of several EML models on EHR datasets. After the class imbalance was addressed and the dimensionality was reduced, the accuracy of the EML models improved significantly. In these experiments, the Gradient Boosting Machine (GBM) and CatBoost models performed best, both reaching an accuracy of 73%. The models were then tuned with two optimization techniques, Grid Search and Random Search. After tuning, the GBM + Random Search model performed best with an accuracy of 74%, followed by the XGBoost + Grid Search model with an accuracy of 73%. The GBM model also achieved the highest Area Under the Curve (AUC) value, 0.78, indicating that it distinguished between the positive and negative classes better than the other models. These results highlight the value of incorporating advanced EML techniques into clinical workflows and the potential of GBM for classifying patient severity conditions. Future research should focus on deep learning (DL) applications and the integration of these models.
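The abstract relies on two ideas that a short sketch can make concrete: how Grid Search and Random Search choose hyperparameter settings to try, and what an AUC value measures. The Python below is a minimal, standard-library-only illustration and not the study's implementation. The search space (`n_estimators`, `learning_rate`, `max_depth` and their values) and the function names are assumptions chosen for the example.

```python
import random
from itertools import product

# Hypothetical search space for a gradient-boosting classifier.
# These names and values are illustrative assumptions, not the study's settings.
SEARCH_SPACE = {
    "n_estimators": [100, 200, 300],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [2, 3, 4],
}


def grid_candidates(space):
    """Grid search: enumerate every combination of the listed values."""
    names = list(space)
    return [dict(zip(names, combo))
            for combo in product(*(space[name] for name in names))]


def random_candidates(space, n_iter, seed=0):
    """Random search: draw n_iter settings, sampling each value independently.

    Sampling is with replacement, so the same setting can be drawn twice.
    """
    rng = random.Random(seed)
    return [{name: rng.choice(values) for name, values in space.items()}
            for _ in range(n_iter)]


def auc(y_true, scores):
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative).

    Computed as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(len(grid_candidates(SEARCH_SPACE)))        # 27 settings (3 * 3 * 3)
print(len(random_candidates(SEARCH_SPACE, 10)))  # 10 settings
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

In a full pipeline, each candidate setting would be used to train a model and scored by cross-validation. Grid Search evaluates all combinations, while Random Search evaluates only a fixed budget of sampled settings. Read this way, an AUC of 0.78 means that a randomly chosen positive case receives a higher score than a randomly chosen negative case about 78% of the time.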
Copyright (c) 2024 Edi Ismanto, Abdul Fadlil, Anton Yudhana, Kodai Kitagawa
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.