Implementation of SMOTE and whale optimization algorithm on breast cancer classification using backpropagation
Abstract
Breast cancer, characterized by uncontrolled cell growth that forms a mass or tumor, is a leading cause of mortality among women worldwide, largely because the disease often goes without timely and effective treatment. One strategy for supporting early detection and treatment is to classify tumors as malignant or non-malignant. This study used the Breast Cancer Wisconsin (Original) dataset, which comprises 699 instances and 11 attributes, including 1 target (class) attribute. The Synthetic Minority Oversampling Technique (SMOTE) was employed to balance the dataset, backpropagation served as the classification algorithm, and the Whale Optimization Algorithm (WOA) was used as the optimization technique. The main objectives were to analyze the effect of combining backpropagation with SMOTE, with WOA, and with both SMOTE and WOA. Results were evaluated using a confusion matrix and the Area Under the Curve (AUC). Backpropagation alone achieved 96% accuracy, 94% precision, 95% recall, and an AUC of 96%. After applying SMOTE and WOA, performance improved to 99% accuracy, 97% precision, 97% recall, and an AUC of 98%. These results indicate that SMOTE and WOA improve classification performance, although the gain over backpropagation alone is relatively modest.
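To make the pipeline concrete, the sketch below shows how SMOTE oversampling, a backpropagation-trained neural network, and the confusion-matrix/AUC evaluation described above can be wired together in Python. It is an illustrative approximation, not the authors' code: it uses scikit-learn's bundled Wisconsin Diagnostic data as a stand-in for the Wisconsin (Original) dataset, MLPClassifier as the backpropagation classifier, arbitrary hyperparameters, and omits the WOA weight-optimization step.

```python
# Minimal sketch: SMOTE + backpropagation neural network, evaluated with a
# confusion matrix and AUC. Assumptions: scikit-learn's Wisconsin Diagnostic
# dataset stands in for the Wisconsin (Original) data, MLPClassifier stands in
# for the backpropagation network, and the WOA optimization step is not shown.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, roc_auc_score)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Scale features, then oversample only the training split so the test set
# keeps its original class distribution.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

# A single hidden layer trained with backpropagation (layer size is arbitrary here).
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000, random_state=42)
clf.fit(X_bal, y_bal)

y_pred = clf.predict(X_test)
y_prob = clf.predict_proba(X_test)[:, 1]
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("AUC      :", roc_auc_score(y_test, y_prob))
```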
Copyright (c) 2023 Noor Erlianita, Muhammad Itqan Mazdadi, Triando Hamonangan Saragih, Muhammad Reza Faisal, Muliadi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.