Refining Diabetes Diagnosis Models: The Impact of SMOTE on SVM, Logistic Regression, and Naïve Bayes
Abstract
Accurate diabetes classification is a significant challenge in medical diagnostics, especially when the dataset is imbalanced. This study addresses the issue by introducing A New Modified Weighted SMOTE (ANMWS), integrated with the Priority of Attribute by Expert Judgement (PAEJ) framework, to improve the performance of machine learning models on imbalanced data. PAEJ assigns each attribute to one of three priority levels (high, medium, or low) based on expert knowledge, and ANMWS uses these priority levels to weight its oversampling, generating synthetic samples that are more representative of real-world cases. The proposed method was evaluated with three algorithms: Support Vector Machine (SVM), Logistic Regression, and Naïve Bayes. Applying the ANMWS algorithm with the PAEJ framework significantly improved predictive performance, raising the AUC to 0.995 for SVM, 0.993 for Logistic Regression, and 0.990 for Naïve Bayes, compared with 0.980, 0.978, and 0.975, respectively, using standard SMOTE. In addition, precision and recall for SVM improved by 5% and 7%, respectively. These findings demonstrate the critical role of the ANMWS algorithm and the PAEJ framework in addressing class imbalance and provide a reliable method for early diabetes diagnosis and informed clinical decision-making.
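The abstract does not specify how the PAEJ priority levels enter the oversampling step. The sketch below shows one plausible reading, offered as an assumption and not as the authors' ANMWS: each priority level maps to a numeric attribute weight (the values 1.0, 0.6, and 0.3 are hypothetical), those weights define a weighted Euclidean distance used to choose each minority sample's nearest minority neighbours, and new samples are then interpolated exactly as in standard SMOTE. All function names and the toy data are illustrative only.

```python
import math
import random

# HYPOTHETICAL mapping from PAEJ priority level to attribute weight;
# the actual ANMWS weights are not given in the abstract.
PRIORITY_WEIGHTS = {"high": 1.0, "medium": 0.6, "low": 0.3}


def attribute_weights(priorities):
    """Convert each attribute's priority label into a numeric weight."""
    return [PRIORITY_WEIGHTS[p] for p in priorities]


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance: differences in high-priority attributes count more."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def k_nearest(sample, candidates, k, weights):
    """Return the k candidates closest to `sample` under the weighted distance."""
    return sorted(candidates, key=lambda c: weighted_distance(sample, c, weights))[:k]


def priority_weighted_smote(minority, priorities, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority samples by SMOTE-style interpolation,
    choosing neighbours with the priority-weighted distance (assumed design)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    weights = attribute_weights(priorities)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]
        neighbour = rng.choice(k_nearest(base, others, k, weights))
        gap = rng.random()  # uniform in [0, 1), as in standard SMOTE
        # The new point lies on the segment from `base` towards `neighbour`.
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    # Weighting changes which neighbour is "nearest": a unit gap in a
    # low-priority attribute (weight 0.3) costs less than in a high-priority one.
    print(k_nearest([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 1, [1.0, 0.3]))
    # -> [[0.0, 1.0]]

    # Toy minority-class rows with three hypothetical attributes.
    minority = [[6.0, 148.0, 33.6], [8.0, 183.0, 23.3],
                [0.0, 137.0, 43.1], [3.0, 78.0, 31.0]]
    for row in priority_weighted_smote(minority, ["low", "high", "medium"], 3, k=2):
        print([round(v, 2) for v in row])
```

Under this reading, the expert priorities affect only which neighbours are paired, so synthetic points cluster along directions where high-priority attributes are similar; the interpolation itself is unchanged from SMOTE. Other designs, such as weighting the interpolation step per attribute, are equally compatible with the abstract's description.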
Copyright (c) 2025 Arief Wibowo, Anis Fitri Nur Masruriyah, Selly Rahmawati

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.