Optimizing Software Defect Prediction Models: Integrating Hybrid Grey Wolf and Particle Swarm Optimization for Enhanced Feature Selection with Popular Gradient Boosting Algorithm

Software defects, also referred to as software bugs, are anomalies or flaws in a computer program that cause software to behave unexpectedly or produce incorrect results. These defects can manifest in various forms, including coding errors and design flaws.


I. INTRODUCTION
A. BACKGROUND
Software defect prediction is a crucial task in software engineering that can be utilized to maintain software quality [1]. A software defect is a bug, error, flaw, mistake, fault, or failure in a computer system that can cause unexpected or erroneous results or impair intended software performance [2]. To enhance the reliability of software, developers utilize software defect prediction techniques to identify potential bugs and various errors [3]. Software defect prediction seeks to forecast defective software modules before they are identified [4]. Identifying software defects at an early stage can result in decreased development expenses, reduced rework effort, and more reliable software [5]. Identifying defective software modules is important to continuously improve the quality of software [6].

B. PREVIOUS STUDIES
Software defect prediction datasets often have noisy attributes, high dimensionality, and imbalanced classes. Specifically, in the NASA MDP dataset, several attributes exhibit a wide range of values, resulting in noisy attributes. Additionally, datasets such as JM1 and MC1 have very large dimensions, which can cause algorithms to consume significant time and resources. Moreover, high-dimensional data can lead algorithms to produce suboptimal results. Furthermore, the majority of the NASA MDP datasets exhibit an imbalanced class distribution between defects and non-defects [7,8]. To overcome the problem of imbalanced classes in software defect datasets, Rahardian et al. [9] conducted an experiment on the NASA MDP dataset using several approaches, namely Synthetic Minority Oversampling Technique (SMOTE), Tomek Links (TL), One-Sided Selection (OSS), Random Oversampling (ROS), and Random Undersampling (RUS). The results show that the highest AUC value, 0.7277, was achieved using the SMOTE approach. This research demonstrates that SMOTE is an effective method for addressing imbalanced classes in the NASA MDP dataset. However, this study did not incorporate feature selection into the predictive models. Feature selection involves selecting attributes that have a significant impact on predicting the class. This technique can reduce the number of input features to a classifier and enhance prediction performance. Consequently, predicting software defects without feature selection may yield suboptimal results [10]. To address this issue, a feature selection method is employed to reduce the number of features and improve prediction performance.
Furthermore, a study by [11] conducted an experiment to handle noisy attributes. They utilized two approaches, Particle Swarm Optimization (PSO) and Genetic Algorithm (GA), for feature selection. The researchers conducted several experiments using different classifiers, namely Neural Network, Nearest Neighbor, Support Vector Machine (SVM), Statistical Classifier, and Decision Tree, on the NASA MDP dataset. The results showed that significant values were obtained when using the SVM classifier. The average AUC of PSO-SVM was 0.695, while the average AUC of GA-SVM was 0.631. This research proved that PSO and GA are effective optimization algorithms for handling noisy attributes. However, data balancing methods were not utilized in this study, so the problem of imbalanced classes persisted, leading to poor performance from the algorithm.
Another study was conducted by [12]. In this study, they conducted several experiments to enhance GA performance by employing hyperparameter tuning and SMOTE on the NASA MDP dataset. They utilized several approaches, namely Grid search, Random search, Optuna, Bayesian search, Hyperband, Tree-structured Parzen Estimator (TPE), and Nevergrad. The highest average AUC obtained was 0.806 using Hyperband and 0.805 using Optuna. Other research utilizing PSO for feature selection was conducted by [13] and [14]. In the study conducted by [13], they employed RUS, PSO, and Naïve Bayes to predict software defects in the NASA MDP dataset, with the best AUC obtained being 0.801. Meanwhile, a study conducted by [14] attempted a different balancing method, namely Bootstrap Aggregating (Bagging), to address the issue of class imbalance. In this research, they utilized PSO for feature selection and Logistic Regression as the classification algorithm. The highest AUC result they obtained was 0.794. The results of the three previous studies have shown that it is possible to address noisy attributes and imbalanced classes by implementing balancing methods and then utilizing PSO or GA for feature selection. However, PSO and GA also have weaknesses, especially on high-dimensional datasets. These algorithms tend to generate suboptimal solutions within the search space without achieving better solutions. As a result, feature selection yields suboptimal model performance, consumes valuable time, and gets trapped in local optima [15,16].
Feature selection, especially with PSO, tends to perform poorly without optimization. Generally, the best results are obtained when parameter tuning is performed or when various PSO techniques are utilized [15]. According to [17], there are several techniques to enhance the PSO method, including hybridization, improved strategies such as fuzzy logic and mutation, and the utilization of different PSO variants such as binary and chaotic. These techniques can improve the performance of the PSO algorithm. Furthermore, research was conducted by [18], who attempted to enhance the PSO technique by using a variant of PSO. They employed Binary PSO for feature selection with an Artificial Neural Network (ANN) for classification. This method was used to predict software defects on four NASA MDP datasets: JM1, KC1, KC3, and PC1. They generated AUC values of 0.739, 0.8487, 0.882, and 0.9297, respectively, achieving an average AUC value of 0.84985. However, in this research, premature convergence occurred, leading to PSO being trapped in local optima. This issue can result in PSO yielding suboptimal results. To address this issue, our study hybridizes PSO with an algorithm that has good exploration capabilities, to prevent PSO from getting trapped in local optima in the software defect prediction model.
Based on this background, we propose a model that optimizes the PSO algorithm by hybridizing it with the GWO algorithm; as previously mentioned by [17], hybridizing PSO allows the algorithm to achieve more optimal results. We used PSO over GA because particle swarm optimization algorithms are easier to use, require fewer adjustable parameters, and are simpler to comprehend compared to other bionic algorithms such as genetic algorithms [15]. According to [19], the right classifier is needed to reduce high-dimensional data and obtain better performance. Research conducted by [20] found that gradient boosting algorithms can handle high-dimensional data. Therefore, we propose a new prediction model using HGWOPSO for feature selection and popular gradient boosting algorithms for classification to predict software defects on the NASA MDP dataset. The gradient boosting algorithms used in this study are XGBoost, LightGBM, and CatBoost.

C. OBJECTIVE
The objective of this study is to improve performance in software defect prediction using HGWOPSO as feature selection for XGBoost, LightGBM, and CatBoost as classifiers, measured with the Area Under the ROC Curve (AUC).

II. METHOD
This section describes the dataset used, Synthetic Minority Oversampling Technique (SMOTE), Particle Swarm Optimization (PSO), Grey Wolf Optimizer (GWO), Hybrid Grey Wolf Optimizer and Particle Swarm Optimization (HGWOPSO), 10-fold cross validation, Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), Area Under the ROC Curve (AUC), and the t-test. The flow of this research can be seen in FIGURE 1.

FIGURE 1. Research Flow using proposed Feature Selection and Classification Models
FIGURE 1 shows the flowchart that we used in this study. The first step is collecting the NASA MDP dataset, followed by dividing the data using cross validation. In this study we use 10-fold cross validation as the validation technique. Each NASA MDP dataset is divided into 10 sections; in each fold, nine sections are allocated for training data while the remaining section is used as test data. After the data is divided, SMOTE is performed on the training data to balance the dataset, followed by feature selection and classification executed in three scenarios. Feature selection is executed via PSO, GWO, and HGWOPSO. After feature selection, classification is performed using three different algorithms: XGBoost, LightGBM, and CatBoost.
Research evaluation uses the average AUC value. The experiments were carried out using Jupyter Notebook.

A. DATA COLLECTION
In this study we use a software defect dataset called NASA MDP. These datasets are sourced from the NASA corpus, which encompasses real software projects across diverse domains and programming languages, namely C, C++, and Java. The datasets exhibit considerable variation in code size, complexity, and functionality, offering a comprehensive representation of software development challenges. They comprise numerous software metrics, including lines of code, cyclomatic complexity, and code churn. These metrics provide valuable insights into the characteristics and attributes of software components. The primary purpose of this dataset is to facilitate the evaluation and development of predictive models aimed at identifying potentially defective software components early in the development lifecycle. In the data preprocessing phase, attributes containing categorical values are converted to nominal values, specifically 0 and 1. In the NASA MDP dataset, the Defective attribute represents Y and is converted to 1, while Non-Defective represents N and is converted to 0. The dataset is available for download at the following link: https://github.com/klainfo/NASADefectDataset/tree/master. TABLE 1 shows information and some general statistics about each of the datasets used.

B. 10-FOLD CROSS VALIDATION
To reduce bias or systematic error in estimating the performance of a model, random sampling of the dataset is performed by implementing cross validation [21]. Cross-validation is a statistical method for evaluating the performance of an algorithm. The capability of cross-validation lies in its ability to divide the data into training and testing sets. Cross-validation is a computational method that partitions the data into subsets [22]. Cross-validation also resamples the data to prevent overfitting [23]. One part of the data is utilized to validate the model while the remaining part is utilized for training the classifier [24]. At this phase, the dataset is divided into training and test data using cross-validation with k = 10. The data is split into ten subsets, each containing instances from both classes [25].
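The 10-fold split described above can be sketched in a few lines. `ten_fold_indices` is a hypothetical helper for illustration; the study itself most likely relied on a library implementation such as scikit-learn's:

```python
import random

def ten_fold_indices(n_samples, seed=42):
    """Split sample indices into 10 folds for cross-validation.

    Each fold serves once as the test set while the remaining
    nine folds form the training set.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::10] for i in range(10)]
    splits = []
    for k in range(10):
        test = folds[k]
        train = [i for f in folds if f is not folds[k] for i in f]
        splits.append((train, test))
    return splits

splits = ten_fold_indices(100)
```

Each of the 10 (train, test) pairs covers the whole dataset, with 90% of the indices used for training and 10% held out.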

C. Synthetic Minority Oversampling Technique
SMOTE is a resampling technique that generates additional samples to increase the size of the minority class by selecting a random point on the line segment linking a sample and one of its closest neighbors [10]. The SMOTE method uses oversampling to rebalance the original training set. Instead of simply replicating minority class instances, the primary concept of SMOTE is to offer synthetic samples [26]. The idea of using SMOTE in software defect prediction is to balance the defective and non-defective instances, which can increase detection performance [27]. SMOTE can be mathematically modeled in the following equation (1) [28].

Consider a minority class sample x and one of its k-nearest neighbors y[i]. Equation (1) generates a new synthetic sample x_new by linearly interpolating between x and y[i]:

x_new = x + rand(0,1) × (y[i] − x) (1)

The random factor, denoted rand(0,1), scales the difference between x and y[i], allowing for variability in the synthetic sample generation process. By repeating this process for each sample in the minority class and selecting appropriate nearest neighbors, SMOTE effectively balances the dataset, creating new synthetic samples that reflect the underlying distribution of the minority class. This method helps to rebalance the class distribution, enabling classifiers to learn more effectively from the data and improving their ability to generalize to minority class instances [28]. TABLE 2 shows the class distribution before and after SMOTE.
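The interpolation step of equation (1) can be sketched as follows. `smote_sample` is a hypothetical helper that generates one synthetic sample from a minority instance and an already-chosen neighbor (the nearest-neighbor search itself is omitted):

```python
import random

def smote_sample(x, neighbor, rng=random.random):
    """Generate one synthetic minority sample by linear interpolation
    between a minority instance x and one of its nearest neighbors,
    per equation (1): x_new = x + rand(0,1) * (neighbor - x)."""
    r = rng()
    return [xi + r * (yi - xi) for xi, yi in zip(x, neighbor)]

# With the random factor pinned to 0.5, the synthetic point sits
# halfway along the segment between x and its neighbor.
s = smote_sample([0.0, 0.0], [2.0, 4.0], rng=lambda: 0.5)
```

Repeating this for many minority samples yields the oversampled training set described above.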

D. FEATURE SELECTION
1. PSO FEATURE SELECTION
Particle swarm optimization (PSO) is a remarkably effective metaheuristic approach that has been efficiently employed to acquire an optimal subset of features containing crucial information within a feasible time [29]. PSO begins by generating a set of random solutions and iteratively seeks the optimal solution [15]. The PSO algorithm's concept and development were inspired by the social behaviors of fish schools and flocks of birds. In the wild, a swarm of birds flies across an area, following the leader that has the closest position to the food. Birds' social behavior can be translated into mathematical procedures, such as PSO, to solve optimization problems. In this approach, the swarm of birds is viewed as a swarm of particles, with each particle representing a candidate solution [30]. A swarm of particles updates their relative positions from iteration to iteration to effectively conduct the search process. In order to obtain the optimum solution, each particle moves towards its prior personal best position (Pbest) and the global best position (Gbest) inside the swarm [17]. To produce the optimal feature subset, PSO ends when the stopping requirements are satisfied. PSO position and velocity updates are given by the basic formulas (2) and (3) [31].
Formula (2) illustrates how the position x_i(t+1) of a particle i at time step t+1 is updated from its previous position x_i(t), taking into account the particle's updated velocity v_i(t+1):

x_i(t+1) = x_i(t) + v_i(t+1) (2)

Formula (3) explains how the particle's velocity is updated by considering contributions from the personal best position (Pbest) and the global best position (Gbest) achieved by the particle itself and the entire population, respectively [31]:

v_i(t+1) = w · v_i(t) + c1 · r1 · (Pbest_i − x_i(t)) + c2 · r2 · (Gbest − x_i(t)) (3)

The PSO algorithm's performance is tuned by adjusting the coefficients (c1 and c2) and the random factors (r1 and r2) [32]. In this study we used cognitive coefficient (c1) = 0.5, social coefficient (c2) = 0.3, inertia weight (w) = 0.9, iteration = 50, and population size = 5.
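One iteration of formulas (2) and (3) can be sketched as follows, using the study's stated coefficients as defaults. `pso_step` is a hypothetical helper, and the random draws are injectable purely for illustration:

```python
import random

def pso_step(x, v, pbest, gbest, w=0.9, c1=0.5, c2=0.3, rng=random.random):
    """One PSO update per equations (2) and (3):
    v(t+1) = w*v(t) + c1*r1*(Pbest - x(t)) + c2*r2*(Gbest - x(t))
    x(t+1) = x(t) + v(t+1)
    Defaults mirror the study's parameters (w=0.9, c1=0.5, c2=0.3)."""
    r1, r2 = rng(), rng()
    v_new = [w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
             for vi, xi, pb, gb in zip(v, x, pbest, gbest)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new

# With r1 = r2 = 1 the pull toward Pbest = Gbest = 1 is fully applied.
x_new, v_new = pso_step([0.0], [0.0], [1.0], [1.0], rng=lambda: 1.0)
```

For feature selection the continuous position is typically thresholded into a binary include/exclude mask per feature; that decoding step is omitted here.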

2. GWO FEATURE SELECTION
Grey Wolf Optimizer is a metaheuristic swarm-based algorithm that mimics the social leadership and hunting behavior of grey wolves in nature [33]. The algorithm mimics how grey wolves behave in their natural environment, including their leadership structure and pursuit style [34]. Within the leadership structure of grey wolves, there exist four distinct types: alpha, beta, delta, and omega wolves. Alpha wolves symbolize the solution with the most optimal results, while beta and delta wolves denote the second and third best solutions within the population; the remaining candidate solutions are omega wolves [35]. The hunting behavior of grey wolves consists of three primary parts. The first part is tracking, chasing, and approaching the prey.
After that, the wolves pursue, encircle, and harass the prey until it stops moving. The last part is the wolves attacking the prey [36]. The grey wolf algorithm can be mathematically modeled in the following equations (4) and (5) [33]:

D = |C · Xp(t) − X(t)| (4)
X(t + 1) = Xp(t) − A · D (5)

In these equations the variable t represents the iteration number, Xp denotes the prey position, and X represents the grey wolf's location, while the variables A and C serve as coefficient vectors. Their values are determined through equations (6) and (7) [36]:

A = 2a · r1 − a (6)
C = 2 · r2 (7)

Here, the quantity a decreases linearly from 2 to 0 over the course of the iterations. r1 and r2 represent uniformly selected random numbers in [0,1]. Alpha wolves lead grey wolves to locate prey. Occasionally, beta and delta wolves assist the alpha wolf. The algorithm prioritizes alpha wolves as the optimal option, followed by beta and delta wolves. As a result, the positions of these three wolves influence the movement of the rest of the population [35].
The mathematical formulas are shown in equation (8) [35]:

D_α = |C1 · X_α − X|, D_β = |C2 · X_β − X|, D_δ = |C3 · X_δ − X| (8)

The values X_α, X_β, and X_δ represent the best three wolves in each iteration, respectively, as shown in equations (9) and (10) [36]:

X1 = X_α − A1 · D_α, X2 = X_β − A2 · D_β, X3 = X_δ − A3 · D_δ (9)
X(t + 1) = (X1 + X2 + X3) / 3 (10)
Here, X(t + 1) represents the new position of a wolf, which is the average of the positions derived from the top three wolves within the group. The algorithm finishes the hunt when the grey wolves attack the prey [36]. In this study we utilized step size (a) = 2, alpha coefficient (A) = 0.5, convergence control (C) = 0.3, population size = 5, and iteration = 50. TABLE 4 shows the average number of features selected by GWO.
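The position update in equations (4)-(10) can be sketched as follows. `gwo_step` is a hypothetical helper that moves one wolf toward the average of the positions steered by the alpha, beta, and delta wolves:

```python
import random

def gwo_step(x, alpha, beta, delta, a, rng=random.random):
    """One GWO position update (equations (4)-(10)): each dimension
    moves to the average of three leader-guided positions."""
    def steer(leader, xi):
        r1, r2 = rng(), rng()
        A = 2 * a * r1 - a          # eq. (6)
        C = 2 * r2                  # eq. (7)
        D = abs(C * leader - xi)    # eq. (4), leader as "prey"
        return leader - A * D       # eq. (5) / (9)
    return [(steer(al, xi) + steer(be, xi) + steer(de, xi)) / 3  # eq. (10)
            for xi, al, be, de in zip(x, alpha, beta, delta)]

# With r1 = r2 = 0.5 and a = 1, A collapses to 0, so the wolf jumps
# straight to the mean of the three leaders.
new = gwo_step([0.0], [3.0], [6.0], [9.0], a=1.0, rng=lambda: 0.5)
```

As with PSO, a binary feature mask would be decoded from the continuous positions; that step is omitted here.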

3. HGWOPSO FEATURE SELECTION
Hybrid Grey Wolf Optimizer and Particle Swarm Optimization (HGWOPSO) is developed without altering the fundamental operation of GWO and PSO. The PSO algorithm can successfully solve most real-world issues [17]. However, a solution is needed to prevent PSO from becoming stuck in a local minimum.
The GWO algorithm is used to assist the PSO in minimizing the risk of getting trapped in a local minimum.Rather than sending certain particles to random locations, the exploration ability of the GWO can be used to partially improve some of the particle positions, which decreases the risks entailed.
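The hybrid idea can be sketched as a velocity update pulled toward the three best (GWO-style) leaders rather than only Pbest and Gbest. This follows one published hybrid GWO-PSO variant; `hgwopso_step` and its coefficients are assumptions for illustration, not necessarily the exact scheme of [37], [38]:

```python
import random

def hgwopso_step(x, v, x1, x2, x3, w=0.9, c=(0.5, 0.5, 0.5), rng=random.random):
    """One hybrid update: the particle's velocity is attracted toward
    the GWO alpha, beta, and delta positions (x1, x2, x3), giving PSO
    the exploration pressure of GWO. Coefficients are assumed values."""
    c1, c2, c3 = c
    r1, r2, r3 = rng(), rng(), rng()
    v_new = [w * (vi + c1 * r1 * (a - xi) + c2 * r2 * (b - xi) + c3 * r3 * (d - xi))
             for vi, xi, a, b, d in zip(v, x, x1, x2, x3)]
    x_new = [xi + vi for xi, vi in zip(x, v_new)]
    return x_new, v_new

# With all three leaders at 2 and r = 1, the particle is pulled
# strongly toward them in a single step.
x_new, v_new = hgwopso_step([0.0], [0.0], [2.0], [2.0], [2.0], rng=lambda: 1.0)
```

The point of the hybrid is visible in the update: a stagnating particle is always steered by three distinct leaders instead of a single Gbest, which reduces the chance of the whole swarm collapsing into one local optimum.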
Because the GWO algorithm is used in addition to the PSO algorithm, the running time of the code increases [37], [38]. FIGURE 2 shows the flowchart of the HGWOPSO method. In this study, we used the same parameters for both the PSO and GWO algorithms. TABLE 5 shows the average number of features selected by HGWOPSO.

E. CLASSIFICATION
1. XGBOOST CLASSIFICATION
Extreme Gradient Boosting is a supervised machine learning technique that combines the predictions of multiple weaker or low-performing models. This approach utilizes an ensemble of decision trees within the gradient boosting framework [39]. XGBoost uses gradient boosting at its core. However, unlike the traditional gradient boosting algorithm, XGBoost does not add weak learners strictly sequentially; instead, it adopts a multi-threaded approach that optimizes CPU core utilization [40]. XGBoost is known for its speed and efficiency due to its implementation of parallel processing [41]. The XGBoost approach utilizes the shrinkage technique to combine multiple weak learners and reduce the possibility of model overfitting. The combination of trees can be mathematically modeled in equation (11) [42].
Fm(X) = Fm-1(X) + η · fm(X), 0 < η < 1 (11)

Here, fm(X) denotes the weak learner (tree) newly constructed at the m-th step, Fm(X) represents the integrated learner after m steps, and η is the shrinkage rate. As there exists a substantial negative relationship between η and the number of iterations, the model's generalization properties are frequently improved when η assumes a smaller value [43]. The computational process of XGBoost is shown in a schematic diagram illustrated in FIGURE 3.
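The additive, shrinkage-scaled accumulation of equation (11) can be illustrated with plain functions standing in for fitted trees. `boosted_predict` and the toy trees are hypothetical; a real booster would fit each tree to the current residuals:

```python
def boosted_predict(base_score, trees, x, eta=0.1):
    """Additive prediction of a boosted ensemble (equation (11)):
    each tree's output is added to the running prediction, scaled
    by the shrinkage rate eta (0 < eta < 1)."""
    pred = base_score
    for tree in trees:
        pred += eta * tree(x)
    return pred

# Two toy "trees" that each predict a residual correction.
trees = [lambda x: 1.0, lambda x: 0.5]
```

Because each correction is multiplied by η, no single tree can dominate the ensemble, which is the overfitting control the text describes.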

LIGHTGBM CLASSIFICATION
Light Gradient Boosting Machine is a gradient boosting framework that uses tree-based learning algorithms. LightGBM is mainly characterized by a decision tree algorithm based on gradient-based one-side sampling (GOSS), exclusive feature bundling (EFB), histograms, and a leaf-wise growth strategy with a depth limit [45]. GOSS removes a considerable fraction of data instances with small gradients and only utilizes the remainder to estimate information gain.
Because data records with larger gradients play an important part in the computation of information gain, GOSS can produce a reasonably accurate estimate of information gain with a considerably smaller dataset. EFB reduces the number of features by bundling mutually exclusive features [46]. One unique aspect of the LightGBM algorithm compared to other gradient boosting tree algorithms lies in how it splits trees. While other boosting algorithms split the tree depth-wise or level-wise, LightGBM grows the tree leaf-wise [47]. FIGURE 4 shows how LightGBM splits the tree, while FIGURE 5 shows how other algorithms such as XGBoost split the tree. LightGBM can be mathematically modeled in the following equation (12) [45]:

ŷ_i = Σ_k f_k(x_i) (12)

Here, ŷ_i denotes the prediction generated by the model for the i-th data sample. This prediction stems from combining the predictions of each decision tree f_k, where k indexes the trees within the model. Consequently, if there are K trees in the model, the final prediction is the summation of the predictions yielded by each individual tree. This illustrates the concept of ensemble learning, wherein the combination of multiple weak models can yield a stronger one. By employing this approach, LightGBM models complex relationships between input features and target outputs by integrating the results from several decision trees [45][46][47].
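The GOSS idea described above can be sketched as follows. `goss_sample` is a hypothetical helper, and the deterministic slice stands in for the random sampling of small-gradient instances that the real algorithm performs:

```python
def goss_sample(gradients, a=0.2, b=0.1):
    """Gradient-based One-Side Sampling sketch: keep the top `a`
    fraction of instances by |gradient|, keep `b` of the rest, and
    up-weight the kept small-gradient instances by (1 - a) / b so
    the information-gain estimate stays approximately unbiased."""
    n = len(gradients)
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top = order[:int(a * n)]
    rest = order[int(a * n):][:int(b * n)]  # deterministic stand-in for random sampling
    weights = {i: 1.0 for i in top}
    weights.update({i: (1 - a) / b for i in rest})
    return weights

# 10 instances: keep the 2 largest-gradient rows at weight 1 and one
# small-gradient row re-weighted to compensate for the 7 dropped rows.
weights = goss_sample([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], a=0.2, b=0.1)
```

The compensating weight is what lets GOSS train on roughly a + b of the data while approximating the full-data gain computation.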

CATBOOST CLASSIFICATION
Categorical Boosting is a new gradient boosting tree algorithm that can handle categorical data. It does not use binary substitution of categorical values; instead, it performs a random permutation of the dataset and calculates the average label value [48].
CatBoost uses decision trees as base predictors [49]. When constructing a new split for the tree, CatBoost considers feature combinations in a greedy way, combining all combinations present in the current tree with all categorical features in the dataset [50]. FIGURE 6 shows how CatBoost constructs a tree.
The value M represents the total number of parameters in the model, and λ is a hyperparameter controlling the regularization strength. The regularization component aims to curb the weight of parameters, preventing them from growing excessively large, which could lead to overfitting. Meanwhile, j serves as an index used to iterate through each parameter in the model [49].
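The loss-plus-regularization split described above can be illustrated generically. `regularized_objective` is a hypothetical helper showing the structure of such an objective, not CatBoost's actual implementation:

```python
def regularized_objective(losses, params, lam=1.0):
    """Objective = average per-sample prediction loss plus an L2
    penalty lam * sum(p^2) that keeps parameter weights from growing
    too large (a generic sketch of a loss + regularization split)."""
    loss = sum(losses) / len(losses)
    penalty = lam * sum(p * p for p in params)
    return loss + penalty

# Mean loss 2.0 plus penalty 0.5 * (1 + 4) = 2.5 gives 4.5.
obj = regularized_objective([1.0, 3.0], [1.0, 2.0], lam=0.5)
```

Raising `lam` trades training fit for smaller weights, which is exactly the overfitting control the text attributes to the regularization component.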

F. AREA UNDER THE ROC CURVE
The area under the Receiver Operating Characteristics curve, or simply AUC, is a metric used to measure the performance of classification models. It represents the degree of separability between the model's true positive rate and false positive rate across various threshold values. AUC ranges from 0 to 1, where a higher AUC indicates better model performance [51]. AUC is computed from the False Negative (FN), False Positive (FP), True Negative (TN), and True Positive (TP) counts, as modeled in equation (15). Moreover, interpreting the AUC value provides insights into the model's capacity to differentiate between positive and negative classes. Additionally, AUC serves as a useful tool for model selection and comparison, allowing practitioners to assess the relative effectiveness of different classifiers [53]. TABLE 6 presents a list of several AUC values for categorization [54].
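AUC can equivalently be computed as a rank statistic: the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one. `auc_score` is a hypothetical helper illustrating this equivalence:

```python
def auc_score(labels, scores):
    """AUC as a rank statistic: the fraction of (positive, negative)
    pairs in which the positive is scored higher, with ties counting
    one half. Equivalent to the area under the ROC curve."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Perfect ranking of defective (1) above non-defective (0) gives AUC 1.0.
perfect = auc_score([1, 1, 0, 0], [0.9, 0.8, 0.7, 0.1])
```

This pairwise form makes it clear why AUC is threshold-independent, which is what makes it suitable for comparing the classifiers in this study.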

G. T-TEST
The t-test is a statistical test employed to determine if there is a significant difference between the means of two groups.
It is commonly employed in scientific research to assess whether the means of two populations are statistically different from each other [55]. The t-test calculates the t-value, which signifies the difference between the means of the two groups relative to the variation within each group, factoring in sample sizes and standard deviations. Subsequently, this t-value is compared against a critical value derived from the t-distribution to determine the statistical significance of the observed difference [56]. If the resulting p-value is less than 0.05, then the difference between the two comparisons can be considered significant [57]. The t-test can be calculated using equation (16) below [56].
t = (x̄1 − x̄2) / (sp · √(1/n1 + 1/n2)) (16)

Here, x̄1 and x̄2 are the mean values of groups 1 and 2, sp is an estimate of the pooled standard deviation of the measurements, and n1 and n2 are the numbers of observations in each group [56].
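Equation (16) can be computed directly. `pooled_t` is a hypothetical helper implementing the pooled two-sample t statistic (the p-value lookup against the t-distribution is omitted):

```python
import math

def pooled_t(group1, group2):
    """Two-sample t statistic with pooled standard deviation,
    per equation (16): t = (m1 - m2) / (sp * sqrt(1/n1 + 1/n2))."""
    n1, n2 = len(group1), len(group2)
    m1 = sum(group1) / n1
    m2 = sum(group2) / n2
    ss1 = sum((x - m1) ** 2 for x in group1)
    ss2 = sum((x - m2) ** 2 for x in group2)
    sp = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))  # pooled std estimate
    return (m1 - m2) / (sp * math.sqrt(1 / n1 + 1 / n2))

# Two small groups whose means differ by 1 with pooled std 1.
t = pooled_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this study the groups would be the per-dataset AUC values of two competing models, with the t-value then converted to a p-value and compared against 0.05.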
III. RESULT
TABLE 7 and FIGURE 7 show the performance of each model on the NASA MDP dataset. In this research, we observed that our proposed method of hybridizing the PSO algorithm with the GWO algorithm maximizes the results of the PSO algorithm. TABLE 7 and FIGURE 7 show that the HGWOPSO feature selection outperforms both the PSO and GWO algorithms across all 13 NASA MDP datasets. The average results for these three feature selection methods are presented in TABLE 8, while TABLE 9 shows the improvement achieved by each method. From TABLE 10, it is evident that there is a significant improvement between the HGWOPSO algorithm and the GWO or PSO algorithms.
The results indicate that the highest outcome is achieved by HGWOPSO CatBoost with an average AUC of 0.894. This represents an increase of 6.5% compared to PSO CatBoost, with a significance value of 0.005, and an increase of 6.3% compared to GWO CatBoost, with a significance value of 0.001. These tests prove that our proposed method stands out, demonstrating a consistently higher level of significance compared to the traditional PSO and GWO algorithms without hybridization.

IV. DISCUSSION
The results showed that our proposed method can enhance software defect prediction using HGWOPSO as feature selection and gradient boosted trees, namely XGBoost, LightGBM, and CatBoost, as classifiers. As shown in TABLE 10, we conducted two-tailed t-tests between HGWOPSO and PSO, and between HGWOPSO and GWO, individually. The results of all t-tests showed values smaller than 0.05. This means there is a significant difference between HGWOPSO and PSO, as well as between HGWOPSO and GWO. From the results above, our method has proven successful in optimizing software defect prediction. This is evidence that our method is superior to prior studies; TABLE 11 shows the comparison between our proposed method and other PSO methods.

TABLE 11. Comparison between the proposed method and other PSO methods

Researcher    Method             AUC
[11]          PSO-SVM            0.695
[13]          PSO-NB             0.805
[14]          PSO-LR             0.794
[18]          BPSO(BCO)-ANN      0.849

In previous research on software defect prediction, especially on the NASA MDP dataset, various models were employed. Researchers use different approaches to achieve optimal results, such as parameter tuning, combining multiple learning models, and seeking effective combinations of different methods. For this reason, we also compare our research findings with different methodologies. TABLE 12 shows the comparison between our proposed method and various methodologies. Based on the data presented in TABLE 13 and FIGURE 8, a comparison is made between different methods for predicting software defects. The proposed method using HGWOPSO and CatBoost demonstrates the best performance on the KC1, KC3, and PC1 datasets. HGWOPSO CatBoost achieves superior results compared to other methods because HGWOPSO optimizes the performance of PSO through the exploration capabilities inherited from the GWO algorithm. This enables HGWOPSO to select more relevant features and attain better results. Additionally, CatBoost's unique approach to constructing and splitting trees also plays a crucial role in classification. However, the method used in this study also has limitations. Specifically, the resulting model's performance fails to reach optimal levels on the JM1 dataset. HGWOPSO CatBoost yields an AUC of 0.681, which, as indicated in TABLE 6, falls into the Poor category. This is attributed to the excessively high-dimensional data and class imbalance present in the JM1 dataset, resulting in suboptimal results from the method we employed. In this study, our findings in software defect prediction using HGWOPSO have significant implications both in industry and research. Industrially, the prediction model we developed can be implemented in software development companies to enhance the quality assurance process. By accurately predicting software defects, companies can allocate resources more
efficiently, prioritize testing efforts, and ultimately deliver high-quality software products to their clients.Additionally, IT consultancy firms can leverage our prediction model to offer better risk assessment and mitigation strategies to their clients, helping businesses anticipate potential software defects and take proactive measures to minimize their impact on operations.On the research front, our contribution in developing the HGWOPSO approach as a novel method for defect prediction provides a substantial contribution to the field of software engineering.Our findings can serve as a foundation for future research in building more advanced defect prediction models and improved methodologies.Furthermore, the dataset and methodology we utilized can serve as a benchmark for future studies in software defect prediction, facilitating the evaluation and enhancement of prediction models in the field.Thus, our research not only advances knowledge in software defect prediction but also has practical implications for various industries and research domains.

V. CONCLUSION
Software defect prediction is a crucial task in software engineering that can be utilized to maintain software quality. Identifying software defects at an early stage can result in decreased development expenses, reduced rework effort, and more reliable software. Software defect prediction datasets, specifically the NASA MDP dataset, have noisy attributes, high dimensionality, and imbalanced classes. To overcome these issues, we propose a method using HGWOPSO as feature selection and gradient boosting trees for classification, namely XGBoost, LightGBM, and CatBoost. The proposed method, which utilizes HGWOPSO, has been found to enhance AUC performance compared to the previous PSO study. The average AUC values yielded by HGWOPSO XGBoost, HGWOPSO LightGBM, and HGWOPSO CatBoost are 0.891, 0.881, and 0.894, respectively. We also conducted a two-tailed t-test between HGWOPSO and PSO, as well as between HGWOPSO and GWO, individually. The results of all t-tests showed values smaller than 0.05. This indicates a significant difference between HGWOPSO and PSO, as well as between HGWOPSO and GWO, and proves that our proposed method successfully maximizes the results of the PSO algorithm. The findings of the research show that employing HGWOPSO feature selection with CatBoost classification results in superior performance compared to the method used in the previous study.
This research still has several limitations. As shown in TABLE 13, the method we used yielded suboptimal performance compared to previous studies, specifically on the JM1 dataset. Our best method, HGWOPSO CatBoost, resulted in an AUC of 0.681, falling into the 'poor' category. This could be attributed to the dataset's excessively high-dimensional data and highly imbalanced classes. For future research, we recommend focusing on this dataset, given its high dimensionality and severe class imbalance. To mitigate the imbalanced classes, we suggest changing the sampling method, for example to RUS, ROS, TL, or OSS; this change aims to address class imbalance and improve model performance. Additionally, we recommend changing the classification method in order to select a more suitable approach for high-dimensional data. This is evident where a change of classifier yields a better performance result, as shown in TABLE 7 and FIGURE 7, where the HGWOPSO-XGBoost method outperformed HGWOPSO-CatBoost with an AUC of 0.717. Furthermore, we suggest employing hyperparameter tuning in future research, with the aim of achieving more optimal results in software defect prediction.

FIGURE 6. Depth-wise tree growth in CatBoost [50]. Due to CatBoost's unique way of building trees, CatBoost has two main components in performing optimization, namely a loss component and a regularization component [49]. The loss component measures how well the model predicts the actual target from the training samples; it can be modeled in mathematical form in equation (13) [49].

TABLE 13. Detailed comparison with other research methods
From TABLE 13 it is clear that the results of this research outperform the methodologies of previous studies. TABLE 13 and FIGURE 8 compare our performance with the previous studies in which the highest AUC was achieved, using the BPSO-ANN and BGA-LR methods. The comparison covers the JM1, KC1, KC3, and PC1 datasets.