Liver Cirrhosis Classification using Extreme Gradient Boosting Classifier and Harris Hawks Optimization for Hyperparameter Tuning
Abstract
This study proposes an early-diagnosis model based on machine learning for liver cirrhosis classification using the Hepatitis C dataset from the UCI Machine Learning Repository, hepatitis C being the leading cause of cirrhosis. Classification is performed with the XGBoost algorithm, which previous studies have shown to provide high accuracy and time efficiency. These advantages, however, depend on the hyperparameter configuration, and XGBoost has a large number of hyperparameters that are time-consuming to tune manually. This study therefore combines XGBoost with the Harris Hawks Optimization (HHO) algorithm for hyperparameter tuning, implemented with a hawk population of 40 and a maximum of 25 iterations. The proposed XGBoost-HHO model achieves an average accuracy, macro-averaged recall (MAR), and macro-averaged precision (MAP) of 99.34%, and a macro F1-score of 99.33%. These results are obtained with the shortest processing time across 25 experiments compared to the other combined models. XGBoost-HHO also shows a larger performance gain and less overfitting than the standard XGBoost, SVM, and RF models, as well as several other combined models, including RF-HHO, SVM-HHO, XGBoost-PSO, and XGBoost-BA. In addition, feature importance analysis of the XGBoost-HHO model indicates that Alanine Aminotransferase (ALT), Protein, and Gamma-glutamyltransferase (GGT) contribute the most to the classification, with gain values of 11.21, 9.51, and 7.98, respectively. Overall, the findings show that the XGBoost-HHO combination delivers competitive performance and is an excellent alternative for liver cirrhosis classification in terms of both accuracy and time efficiency.
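Because the abstract describes the core pipeline (HHO searching XGBoost's hyperparameter space with 40 hawks and 25 iterations, scored by macro-averaged metrics, followed by gain-based feature importance), the following minimal Python sketch illustrates how such a coupling can be wired up. It is not the authors' implementation: the hyperparameter names and bounds, the cross-validation setup, and the X, y placeholders are illustrative assumptions, and the HHO loop is simplified (the progressive rapid-dive phases with Levy flights are omitted and greedy acceptance is added).

# Hedged sketch: simplified HHO tuning of XGBoost hyperparameters.
import numpy as np
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

# Search space: (low, high) per hyperparameter; names and bounds are illustrative.
BOUNDS = {
    "learning_rate": (0.01, 0.5),
    "max_depth": (3, 10),          # rounded to int before use
    "subsample": (0.5, 1.0),
    "colsample_bytree": (0.5, 1.0),
    "n_estimators": (50, 500),     # rounded to int before use
}
KEYS = list(BOUNDS)
LOW = np.array([BOUNDS[k][0] for k in KEYS])
HIGH = np.array([BOUNDS[k][1] for k in KEYS])
DIM = len(KEYS)

def decode(pos):
    """Map a continuous hawk position to an XGBoost parameter dict."""
    params = dict(zip(KEYS, pos))
    params["max_depth"] = int(round(params["max_depth"]))
    params["n_estimators"] = int(round(params["n_estimators"]))
    return params

def fitness(pos, X, y):
    """Negative cross-validated macro F1, so the metaheuristic minimizes."""
    model = XGBClassifier(**decode(pos), eval_metric="mlogloss")
    return -cross_val_score(model, X, y, cv=5, scoring="f1_macro").mean()

def hho_tune(X, y, n_hawks=40, max_iter=25, seed=0):
    """Simplified HHO: rapid-dive phases omitted, greedy acceptance added."""
    rng = np.random.default_rng(seed)
    hawks = rng.uniform(LOW, HIGH, size=(n_hawks, DIM))
    scores = np.array([fitness(h, X, y) for h in hawks])
    best, best_score = hawks[scores.argmin()].copy(), scores.min()
    for t in range(max_iter):
        for i in range(n_hawks):
            E = 2 * rng.uniform(-1, 1) * (1 - t / max_iter)  # prey escaping energy
            if abs(E) >= 1:                                  # exploration phase
                if rng.random() >= 0.5:
                    rand_hawk = hawks[rng.integers(n_hawks)]
                    new = rand_hawk - rng.random() * abs(
                        rand_hawk - 2 * rng.random() * hawks[i])
                else:
                    new = (best - hawks.mean(axis=0)
                           - rng.random() * (LOW + rng.random() * (HIGH - LOW)))
            else:                                            # exploitation phase
                J = 2 * (1 - rng.random())                   # prey jump strength
                if abs(E) >= 0.5:                            # soft besiege
                    new = best - hawks[i] - E * abs(J * best - hawks[i])
                else:                                        # hard besiege
                    new = best - E * abs(best - hawks[i])
            new = np.clip(new, LOW, HIGH)
            new_score = fitness(new, X, y)
            if new_score < scores[i]:                        # greedy acceptance
                hawks[i], scores[i] = new, new_score
                if new_score < best_score:
                    best, best_score = new.copy(), new_score
    return decode(best), -best_score

# Usage (X, y are the preprocessed HCV features and labels; loading not shown):
# best_params, macro_f1 = hho_tune(X, y)
# final = XGBClassifier(**best_params).fit(X, y)
# print(final.get_booster().get_score(importance_type="gain"))  # gain-based importance

Decoding continuous hawk positions into mixed integer/float hyperparameters and minimizing the negative macro F1 keeps the metaheuristic generic, while the gain-based importance call at the end mirrors the feature-contribution analysis reported in the abstract.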
Copyright (c) 2025 Lista Tri Nalasari, Syaiful Anam, Nur Shofianah

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.