A Comparative Analysis of SMOTE and ADASYN for Cervical Cancer Detection using XGBoost with MICE Imputation
Abstract
Cervical cancer remains a significant global health burden for women, with approximately 660,000 new cases and 350,000 associated deaths recorded worldwide in 2022. Machine learning methods have shown great promise in advancing timely detection and accurate diagnosis. This investigation compares two widely used oversampling strategies, the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), applied to cervical cancer identification with the XGBoost classifier, paired with Multiple Imputation by Chained Equations (MICE) to handle incomplete data. The dataset consists of cervical cancer risk factors with four diagnostic outcomes: Hinselmann, Schiller, Cytology, and Biopsy, which are treated as independent binary classification tasks rather than a single multilabel classification problem. The dataset was first prepared through MICE imputation, after which SMOTE and ADASYN were applied to address class imbalance. The XGBoost model was optimized using random search hyperparameter tuning and evaluated across train-test split ratios (50:50, 60:40, 70:30, 80:20, and 90:10) using accuracy, precision (macro, micro, weighted), recall (macro, micro, weighted), F1-score (macro, micro, weighted), and AUC. The XGBoost configuration with MICE and SMOTE outperformed the others, achieving 97.1% accuracy, 97.1% micro-precision, 97.1% micro-recall, 97.1% micro-F1, and 97.1% AUC, while the ADASYN-integrated model showed marginally lower results: 95.4% accuracy, 95.4% micro-precision, 95.4% micro-recall, 95.4% micro-F1, and 55.5% AUC. SMOTE proved more adept at creating evenly distributed synthetic data for the underrepresented class. Overall, this work underscores the value of integrating MICE imputation, SMOTE oversampling, and tuned XGBoost as a reliable approach for cervical cancer detection.
These insights pave the way for automated screening tools that can bolster clinical judgment and improve early diagnosis outcomes.
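The core of the oversampling step described above is SMOTE's interpolation rule: each synthetic minority sample is drawn on the line segment between a real minority point and one of its k nearest minority neighbours. The sketch below is a minimal, self-contained illustration of that idea in NumPy; the function name `smote_sample` and its parameters are illustrative, not the paper's implementation, which in practice would use library routines such as imbalanced-learn's `SMOTE`/`ADASYN` together with `xgboost`.

```python
import numpy as np

def smote_sample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: generate n_new synthetic points by
    interpolating between minority samples and their k nearest
    minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]      # indices of k nearest neighbours
    base = rng.integers(0, n, size=n_new)  # random minority anchors
    nbr = nn[base, rng.integers(0, k, size=n_new)]  # one neighbour per anchor
    gap = rng.random((n_new, 1))           # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nbr] - X_min[base])

# toy minority class: 10 points in 2-D, oversampled to 20 synthetic points
X_min = np.random.default_rng(0).normal(size=(10, 2))
X_new = smote_sample(X_min, n_new=20, k=3, rng=1)
print(X_new.shape)  # (20, 2)
```

ADASYN differs only in how the anchors are chosen: instead of sampling minority points uniformly, it weights them by the fraction of majority-class neighbours around each point, so harder-to-learn regions receive more synthetic samples.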
Copyright (c) 2026 Mita Azzahra Ramadhan, Triando Hamonangan Saragih, Dwi Kartini, Muliadi Muliadi, Muhammad Itqan Mazdadi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

