Feature Selection Using Firefly Algorithm With Tree-Based Classification In Software Defect Prediction

Keywords: Software Defect Prediction, Firefly, Decision tree, Random forest, Deep forest

Abstract

Defects in software products are a universal occurrence. Software defect prediction is one of the Software Engineering research topics of great concern to researchers. It is usually carried out on various datasets to assess the performance, accuracy and precision of the prediction model or method under study. This research was conducted for two purposes. The first was to determine the performance of tree-based classification algorithms (Decision Tree, Random Forest and Deep Forest), both without feature selection and with firefly feature selection. The second was to identify which tree-based classifier, combined with firefly feature selection, gives the best software defect prediction performance. The study uses the ReLink dataset, which consists of the Apache, Safe and Zxing projects. The data are divided into training and testing data with 10-fold cross validation, and feature selection is then performed with the Firefly Algorithm. Each ReLink dataset is processed by each tree-based classifier (Decision Tree, Random Forest and Deep Forest) using the features selected by the firefly algorithm. Performance is evaluated with the AUC (Area Under the ROC Curve). The experiments were run on Google Colab. The average AUC was 0.66 for Firefly-Decision Tree, 0.77 for Firefly-Random Forest and 0.76 for Firefly-Deep Forest. These results indicate that combining the Firefly Algorithm with Random Forest classification predicts software defects better than the other tree-based algorithms. In previous studies, tree-based classification with hyperparameter tuning obtained quite good results on software defect prediction datasets.
In another study, the classification performance of SVM, Naïve Bayes and K-nearest neighbor improved when firefly feature selection was applied. This research was therefore conducted to determine the performance of tree-based algorithms with firefly feature selection.
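The abstract does not include the authors' code. The following is a minimal sketch of firefly-based wrapper feature selection scored by AUC, written in Python because the experiments ran on Google Colab. It uses one common binary variant of the firefly algorithm. Real-valued firefly positions are mapped to 0/1 feature masks through a sigmoid transfer function, and dimmer fireflies move toward brighter ones with attractiveness `beta0 * exp(-gamma * r^2)`. Everything specific here is an assumption for illustration and not the authors' setting: the function names, the parameter values (`beta0`, `gamma`, `alpha`, the 0.97 decay), the sigmoid transfer and the tiny toy dataset. In the study, the fitness of a feature subset would instead be the 10-fold cross-validated AUC of a Decision Tree, Random Forest or Deep Forest trained on the selected ReLink features.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (a tied pair counts as 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def _sigmoid(v: float) -> float:
    v = max(-30.0, min(30.0, v))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-v))


def binary_firefly_select(
    n_features: int,
    fitness: Callable[[List[int]], float],
    n_fireflies: int = 10,
    n_iter: int = 30,
    beta0: float = 1.0,
    gamma: float = 1.0,
    alpha: float = 0.5,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Search for the 0/1 feature mask that maximises `fitness` (hypothetical helper)."""
    rng = random.Random(seed)

    def to_mask(x: List[float]) -> List[int]:
        # Sigmoid transfer: bit d is 1 with probability S(x_d).
        mask = [1 if rng.random() < _sigmoid(v) else 0 for v in x]
        if not any(mask):  # never evaluate an empty feature subset
            mask[rng.randrange(n_features)] = 1
        return mask

    pos = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)] for _ in range(n_fireflies)]
    masks = [to_mask(x) for x in pos]
    light = [fitness(m) for m in masks]  # brightness = fitness of the current mask
    top = max(range(n_fireflies), key=lambda k: light[k])
    best_mask, best_fit = masks[top][:], light[top]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            moved = False
            for j in range(n_fireflies):
                if light[j] > light[i]:
                    # Attractiveness decays with squared Euclidean distance.
                    r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                    beta = beta0 * math.exp(-gamma * r2)
                    pos[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                              for a, b in zip(pos[i], pos[j])]
                    moved = True
            if not moved:  # a firefly with no brighter neighbour walks randomly
                pos[i] = [a + alpha * (rng.random() - 0.5) for a in pos[i]]
            # Simplification: re-evaluate once after all moves of firefly i.
            masks[i] = to_mask(pos[i])
            light[i] = fitness(masks[i])
            if light[i] > best_fit:
                best_mask, best_fit = masks[i][:], light[i]
        alpha *= 0.97  # gradually reduce the random step
    return best_mask, best_fit


# Hypothetical toy data: features 0 and 2 separate the classes,
# feature 1 carries little class information.
X = [[0.9, 0.2, 0.8], [0.8, 0.9, 0.7], [0.7, 0.1, 0.9],
     [0.2, 0.8, 0.1], [0.1, 0.3, 0.3], [0.3, 0.7, 0.2]]
y = [1, 1, 1, 0, 0, 0]  # 1 = defective module


def subset_auc(mask: List[int]) -> float:
    """Stand-in fitness: AUC of the sum of the selected feature values."""
    scores = [sum(v for v, m in zip(row, mask) if m) for row in X]
    return auc(y, scores)


if __name__ == "__main__":
    mask, score = binary_firefly_select(len(X[0]), subset_auc, seed=1)
    print("selected feature mask:", mask, "AUC:", round(score, 3))
```

To move closer to the study's setup, `subset_auc` could be swapped for a function that keeps the columns where the mask is 1 and returns scikit-learn's `cross_val_score(RandomForestClassifier(), X_sel, y, cv=10, scoring="roc_auc").mean()` (or the same with `DecisionTreeClassifier`). The pure-Python AUC is kept here only so the sketch has no dependencies.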



Published
2023-08-11
How to Cite
V. Maulida, R. Herteno, D. Kartini, F. Abadi, and M. R. Faisal, “Feature Selection Using Firefly Algorithm With Tree-Based Classification In Software Defect Prediction,” j.electron.electromedical.eng.med.inform, vol. 5, no. 4, pp. 223-230, Aug. 2023.
Section
Electronics