Multi-Stage CNN: U-Net and Xcep-Dense of Glaucoma Detection in Retinal Images

Glaucoma is a chronic neurological disease of the retina that causes vision loss. It can be detected from abnormalities in the optic disc and optic cup of the retina. Extracting these features requires a segmentation process, and segmentation can improve classification performance. This study combines segmentation and classification using CNN architectures to detect glaucoma. At the segmentation stage, the U-Net CNN architecture is applied. U-Net has encoder and decoder sections that produce output images containing only the required features. At the classification stage, the study proposes Xcep-Dense, a CNN architecture that combines Xception and DenseNet. Xcep-Dense seeks to retain the advantages of the Xception and DenseNet architectures while overcoming the weaknesses of each. At the segmentation stage, U-Net achieves accuracy, recall, precision, and F1-score above 90% and Cohen's kappa above 85%. These results show that U-Net is excellent for optic disc and optic cup segmentation. At the classification stage, accuracy and precision are above 85%, recall and F1-score are above 80%, and Cohen's kappa is 77%. These results show that the Xcep-Dense architecture is robust enough to classify glaucoma into three classes: advanced glaucoma, early glaucoma, and normal. The results indicate that the proposed method is feasible for detecting glaucoma. In the future, the results of this study are expected to be developed into an automatic system for early detection of glaucoma.


I. INTRODUCTION
The optic disc is a circular spot on the retina formed by the axons of the retinal ganglion cells. These axons are responsible for sending signals from the eye's photoreceptors to the optic nerve so that the eye can see. The optic cup is the bright area in the center of the optic disc. Measurements of the optic disc and optic cup are used in the diagnosis of glaucoma. Usually, the optic disc and optic cup can be measured horizontally or vertically [1]. Glaucoma is a chronic neurological disease of the human eye in which the nerves that connect the eye to the brain are gradually damaged, causing vision loss and eventually blindness [2]. Glaucoma is detected by an ophthalmologist through direct observation of the optic disc and optic cup in retinal images taken with a fundus camera. Detecting abnormalities in the optic disc and optic cup can be aided by segmentation. Segmentation extracts important features from images and removes parts that are not needed at the classification stage [7], so it is an important step before classifying glaucoma in retinal images. It separates the optic disc and optic cup from the background, and these features are needed at the glaucoma classification stage. Segmentation also helps detect significant glaucoma-related abnormalities in retinal images. There are two segmentation techniques: manual and automated. Manual segmentation requires expertise, is tedious and time consuming, and produces highly subjective results [3], [4]. Automated segmentation has many benefits, including increased accuracy, time savings, cost savings, and robustness. Many automated segmentation methods have been developed using deep learning. Deep learning has powerful capabilities for integrating very large data sets, learning complex relationships, and incorporating existing knowledge in data [5].
One of the deep learning methods developed for image segmentation and classification is the convolutional neural network (CNN). CNNs have been developed with various architectures for both image segmentation and classification. A CNN architecture that is widely used for segmentation is U-Net [8]. The U-Net architecture consists of two parts, the encoder and the decoder. The encoder extracts features from the image, while the decoder reconstructs the image features [18]. Several studies have used the U-Net architecture. Fu et al. [9] performed blood vessel segmentation using U-Net and obtained accuracy above 90% and recall below 75%, but did not measure precision and F1-score. Venkatesh et al. [10] segmented skin cancer using U-Net and obtained accuracy above 90%, but did not measure recall, precision, and F1-score. Saood and Hatem [11] performed lung segmentation using U-Net and obtained accuracy, recall, and precision above 90%, but did not measure the F1-score. The studies by Fu et al. [9], Venkatesh et al. [10], and Saood and Hatem [11] only performed segmentation to separate the required features and did not carry out classification. Segmentation only provides information about the boundaries and regions of the object being observed, whereas classification assigns labels to images or regions based on a holistic understanding of those images. Segmentation therefore provides information mainly for experts, while classification provides labels that can be understood by the public. Segmentation is carried out before classification to increase the validity and accuracy of the classification.
One automatic approach to detecting glaucoma is the classification of retinal images. Classification is the process of assigning a previously defined category or label to an object based on a particular model [12]. A method commonly used for classification is the convolutional neural network (CNN) because it can handle input data such as retinal images [13]. CNNs have been developed into several architectures, one of which is Extreme Inception (Xception). Xception is a CNN architecture that improves computational efficiency. It uses depthwise separable convolutions and residual connections, so it has a small number of parameters and is computationally efficient [12], [14]. Several studies have classified glaucoma in retinal images using the Xception architecture. Juneja et al. [15] obtained accuracy, recall, and precision above 90%, but did not measure the F1-score and Cohen's kappa. Diaz-Pinto et al. [16] obtained accuracy, recall, precision, and F1-score above 85%, but did not measure Cohen's kappa. Juneja et al. [17] obtained accuracy, recall, and precision above 90%, but did not measure the F1-score and Cohen's kappa. The studies by Juneja et al. [15], Diaz-Pinto et al. [16], and Juneja et al. [17] classified the original images directly without first segmenting the retinal image.
Xception reduces the number of parameters and computations. A small number of parameters can help Xception avoid overfitting, but too few parameters can lead to underfitting, where the model fails to capture even the basic patterns in the training data, especially for image data. The residual blocks in Xception use skip connections to help the network learn better representations, but too many skip connections or a very deep architecture can increase the risk of overfitting, especially when the dataset is small or lacks diversity. Skip connections might also cause significant features and information from the preceding layers to be overlooked or lost [18]. This loss of information from previous layers (vanishing gradient) can be overcome by modifying the Xception architecture. Another CNN architecture is the Densely Connected Convolutional Network (DenseNet). DenseNet uses dense connections and can overcome the problem of vanishing gradients in deep networks [19]. It consists of several dense blocks and works by combining each layer without the skip connection feature, so that each feature map is used as input for the next layer [20]. This layer merging increases the number of parameters. Xception emphasizes depthwise separable convolutions to achieve effective feature extraction. Although it enables some feature reuse across layers, it does not prioritize dense connectivity to the extent that DenseNet does. DenseNet, on the other hand, is known for its dense connectivity, in which feature maps from prior layers are concatenated and passed to successive layers. This dense reuse of features enhances gradient flow, counteracts the vanishing gradient issue, and helps the network learn concise and distinctive representations. Several studies have used DenseNet in classification. Wu et al. [21] obtained accuracy and F1-score above 90%, but did not measure recall and precision. Liao et al. [22] obtained accuracy, recall, and precision above 80%, but did not measure the F1-score. Hasan et al. [23] obtained accuracy, recall, and F1-score above 90%, but did not measure precision. The studies by Wu et al. [21], Liao et al. [22], and Hasan et al. [23] focused only on classification, not segmentation.
This study proposes two stages for detecting glaucoma. In the first stage, segmentation is performed on the retinal image to separate the optic disc and optic cup features needed for classifying glaucoma. The segmentation stage uses the U-Net architecture. In the second stage, glaucoma is classified based on the segmented optic disc and optic cup features of the retinal image.
The new architecture proposed in this study for the classification stage is Xcep-Dense, a combination of the Xception and DenseNet architectures. The combination is made by adding a dense block at the end of Xception. The dense block replaces the residual connection in Xception. Dense blocks are added to combine the outputs of previous and subsequent blocks so that previous and current spatial information can be retained for a long time. The architecture also has a lower number of parameters and lower complexity. The success of the proposed architecture was measured with performance evaluations such as accuracy, recall, precision, F1-score, and Cohen's kappa. The results of this study accommodate the needs of both medical experts and the public. The segmentation results allow medical experts to observe the optic disc and optic cup in the retina. The classification results provide information not only to medical experts but also to the public, such as patients, regarding the possibility of glaucoma in the eye. The results of this study can be further developed into automatic applications for detecting glaucoma in the medical world.

II. MATERIALS AND METHODS
The stages of this research method can be seen in FIGURE 1.

A. DATA DESCRIPTION
The data used in this study are secondary data, namely a dataset from Harvard Dataverse that was collected and uploaded by Ungsoo Kim from Kim's Eye Hospital. The dataset consists of 1,532 images with 3 labels, divided into 788 Normal Control images, 289 Early Glaucoma images, and 467 Advanced Glaucoma images. The dataset can be accessed online at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/1YRRAC. An enlarged example of a retinal image from the Harvard Dataverse dataset can be seen in FIGURE 2.

FIGURE 1. The flow of research stages
FIGURE 2 shows the appearance of the optic disc and optic cup in the retinal image. The optic disc is the bright part of the retinal image circled in green, while the optic cup is the bright part inside the optic disc circled in blue.

B. PREPROCESSING
Preprocessing is carried out to improve the quality of the data. The stages of data preprocessing in this research are as follows.

1) IMAGE RESIZING
Image resizing changes the size of the images so that the data are not too large and match the model to be used. In this study, all images are resized to 224 × 224 pixels.
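The resizing step can be illustrated with a minimal sketch. The study does not state which interpolation method was used, so nearest-neighbour sampling and the helper name `resize_nearest` are assumptions made only for this example.

```python
def resize_nearest(image, new_height, new_width):
    """Resize a 2-D image (a list of rows) by nearest-neighbour sampling.

    Assumption: the interpolation method is illustrative, not the study's.
    """
    old_height, old_width = len(image), len(image[0])
    return [
        [
            image[row * old_height // new_height][col * old_width // new_width]
            for col in range(new_width)
        ]
        for row in range(new_height)
    ]


# A tiny 2 x 2 grid stands in for a fundus photograph of arbitrary size.
tiny_image = [[1, 2],
              [3, 4]]
resized = resize_nearest(tiny_image, 224, 224)
print(len(resized), len(resized[0]))
```

In practice an image library would normally perform this step; the sketch only shows that every image ends up with the same 224 × 224 shape regardless of its original size.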

2) DATA AUGMENTATION
Augmentation increases the amount of data in order to improve model capability and overcome the problem of limited data [24]. In this study, the augmentations used are image rotation, image flipping, and color jitter. The rotation technique applies a random rotation with an angle in [0°, 20°]. The rotation angle is restricted so that unneeded features such as the background are not included and the result does not differ too much from the original image. An example of an image produced by rotation augmentation can be seen in FIGURE 3.

FIGURE 3. Example of an Augmented Rotation Image Result
In FIGURE 3, the original image is rotated randomly by angles of 3°, 4°, and 10°, respectively. The flipping augmentation is a random vertical and horizontal flip, which increases the amount of data without reducing the features in the image. An example of the result of flipping augmentation can be seen in FIGURE 4. In FIGURE 4, the original image is flipped vertically and horizontally to obtain new images. Finally, color jitter increases the amount of data by randomly changing the brightness, contrast, saturation, and color levels of the image.
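The flipping and rotation-angle sampling described above can be sketched as follows. The function names and the fixed random seed are illustrative assumptions, and the rotation itself (which would normally be done by an image library) is not implemented here, only the sampling of the angle from [0°, 20°].

```python
import random


def flip_horizontal(image):
    """Mirror each row, i.e. flip the image left to right."""
    return [row[::-1] for row in image]


def flip_vertical(image):
    """Reverse the row order, i.e. flip the image top to bottom."""
    return [list(row) for row in reversed(image)]


def sample_rotation_angle(rng, max_degrees=20.0):
    """Draw a rotation angle uniformly from [0, max_degrees]."""
    return rng.uniform(0.0, max_degrees)


image = [[1, 2, 3],
         [4, 5, 6]]
rng = random.Random(0)  # fixed seed, only so the example is repeatable
angle = sample_rotation_angle(rng)

print(flip_horizontal(image))  # [[3, 2, 1], [6, 5, 4]]
print(flip_vertical(image))    # [[4, 5, 6], [1, 2, 3]]
```

Flipping only rearranges pixels, which is why it adds data without removing any feature from the image.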

3) RGB CONVERSION
The input image is initially in BGR (Blue Green Red) format. The image is converted from BGR to RGB (Red Green Blue) so that channel selection is more accurate and the image is easier to model.

FIGURE 4. Example of Flipping Augmentation Result Image
Color jitter operates on the matrix of pixel values, in which each pixel combines red, green, and blue values to produce a variety of colors, so that the contrast, saturation, and brightness of the image change. An example of color jitter augmentation can be seen in FIGURE 5.
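A minimal sketch of color jitter on RGB pixel values is given below. It covers only brightness and contrast, and the jitter strength of 0.2, the mid-grey pivot of 128, and all function names are assumptions for illustration rather than the settings used in the study.

```python
import random


def clamp(value):
    """Keep a channel value inside the valid 8-bit range [0, 255]."""
    return max(0, min(255, int(round(value))))


def jitter_pixel(pixel, brightness, contrast):
    """Scale brightness, then stretch contrast around mid-grey (128)."""
    result = []
    for channel in pixel:
        value = channel * brightness
        value = (value - 128.0) * contrast + 128.0
        result.append(clamp(value))
    return tuple(result)


def color_jitter(image, rng, strength=0.2):
    """Apply one random brightness and contrast factor to a whole image."""
    brightness = rng.uniform(1.0 - strength, 1.0 + strength)
    contrast = rng.uniform(1.0 - strength, 1.0 + strength)
    return [[jitter_pixel(p, brightness, contrast) for p in row] for row in image]


image = [[(100, 150, 200), (10, 20, 30)]]
jittered = color_jitter(image, random.Random(0))
print(jittered)
```

Factors of exactly 1.0 leave the pixel unchanged, so the jittered image stays close to the original when the strength is small.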
After the image is converted from BGR to RGB, the RGB image is split into its red, green, and blue channels. The comparison of the channels can be seen in FIGURE 6. Based on FIGURE 6, the red channel is too bright, so the optic disc and optic cup are difficult to detect because they blend into the background. The green channel shows the optic disc and optic cup fairly clearly, but it also shows features that are not needed, such as blood vessels. The blue channel shows the optic disc and optic cup very clearly compared to other parts, and other features are not too prominent. The blue channel is therefore used in this research because it displays the optic disc and optic cup more clearly than the other channels.
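The channel handling described above can be sketched with plain nested lists, where each pixel is a 3-tuple. The helper names are hypothetical; an image library would normally perform the same reordering and slicing.

```python
def bgr_to_rgb(image_bgr):
    """Reorder every pixel from (blue, green, red) to (red, green, blue)."""
    return [[(r, g, b) for (b, g, r) in row] for row in image_bgr]


def extract_channel(image_rgb, index):
    """Return one channel as a 2-D grid: 0 = red, 1 = green, 2 = blue."""
    return [[pixel[index] for pixel in row] for row in image_rgb]


# One row with two pixels, stored in BGR order.
image_bgr = [[(10, 20, 30), (40, 50, 60)]]
image_rgb = bgr_to_rgb(image_bgr)
blue_channel = extract_channel(image_rgb, 2)

print(image_rgb)     # [[(30, 20, 10), (60, 50, 40)]]
print(blue_channel)  # [[10, 40]]
```

The example shows why the conversion matters: taking index 2 from the unconverted BGR image would return the red values instead of the blue ones.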

C. SEGMENTATION WITH U-NET ARCHITECTURE
After the image preprocessing stage, segmentation is carried out on the retinal image. Segmentation separates the features that will be used at the classification stage, namely the optic disc and the optic cup. The segmentation stage uses the U-Net architecture, which consists of two parts, the encoder and the decoder. The encoder extracts useful features from the input image, and the decoder reconstructs the features to produce the final segmentation result. The encoder consists of convolution layers, the ReLU activation function, batch normalization, and max pooling. The U-Net architecture used at the segmentation stage can be seen in FIGURE 7. The convolution layer aims to learn the feature representation of the input. This layer consists of a set of convolutional kernels for extracting local features from the input [25]. The convolution operation in the convolutional layer is calculated using Equation (1). For the ReLU activation function in Equation (2), $r_{i,j}$ is the output of the ReLU activation function and $c_{i,j}$ is the input pixel value resulting from the convolution layer operation.
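The convolution and ReLU steps of the encoder can be sketched in plain Python. The sketch assumes a single kernel, stride 1, and no padding, none of which is stated in the text; the function names are illustrative only.

```python
def conv2d_valid(x, kernel, bias=0.0):
    """Slide one kernel over a 2-D input (stride 1, no padding).

    Each output entry is the sum of input[i + m][j + n] * kernel[m][n]
    over the kernel rows m and columns n, plus the bias.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(x) - kernel_h + 1
    out_w = len(x[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = bias
            for m in range(kernel_h):
                for n in range(kernel_w):
                    total += x[i + m][j + n] * kernel[m][n]
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Replace negative entries with zero and keep the others unchanged."""
    return [[max(0.0, value) for value in row] for row in feature_map]


x = [[1, 2, 0],
     [0, 1, 3],
     [2, 1, 1]]
kernel = [[1, -1],
          [-1, 1]]

convolved = conv2d_valid(x, kernel)
activated = relu(convolved)
print(convolved)  # [[0.0, 4.0], [-2.0, -2.0]]
print(activated)  # [[0.0, 4.0], [0.0, 0.0]]
```

The negative responses in the second row are set to zero by ReLU, which is the behaviour the activation function is described as having.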

FIGURE 7. U-Net Architecture for Optic Disc and Optic Cup Segmentation
The results of the ReLU activation function are normalized using batch normalization. Batch normalization is a normalization process for each layer in the network that is applied before or after the activation function [26]. Batch normalization is calculated by first computing the mean ($\mu_B$) and variance ($\sigma_B^2$) and then normalizing the values. The mean ($\mu_B$), variance ($\sigma_B^2$), and normalization are calculated using Equations (3), (4), and (5).
where $\mu_B$ is the mean of each mini-batch, $\sigma_B^2$ is the variance of each mini-batch, $B$ is the number of mini-batches, $m$ is the amount of data in a mini-batch, $\hat{r}_{i,j}$ is the result of normalizing the input value in the $i$-th row and $j$-th column, $r_{i,j}$ is the input matrix entry in the $i$-th row and $j$-th column resulting from the ReLU activation function, and $\epsilon$ is a small constant. The dimensions of the feature map resulting from batch normalization are then reduced using max pooling. In the decoder section, the same convolution layer operation, ReLU activation function, and batch normalization as in the encoder section are performed. The feature map dimensions are then increased using up-sampling in the decoder section. The results of the encoder and decoder operations are combined by concatenation. The softmax activation function is then calculated using Equation (6).
for $k = 1, \ldots, K$, where $K$ is the number of classes, $\hat{y}_k$ is the output of the softmax activation function, and $z_k$ is the input resulting from batch normalization. The final stage is to calculate the loss function using categorical cross-entropy. Categorical cross-entropy is a loss function used when there are more than two object classes (multi-class). Categorical cross-entropy is calculated using Equation (7) [27].
where $i$ is the row of the prediction matrix, $j$ is the column of the prediction matrix, $\hat{y}_{i,j}$ is the entry in the $i$-th row and $j$-th column of the output matrix of the softmax activation function, $y_{i,j}$ is the entry in the $i$-th row and $j$-th column of the actual result matrix, and $L$ is the categorical cross-entropy.

The process then continues into the middle flow block, which is repeated 8 times, and finally enters the exit flow block, which is applied 3 times with the addition of the dense block. The dense block consists of input, batch normalization, ReLU, convolution, and transition layers. This process, which addresses the vanishing gradient problem, is repeated for the number of layers specified by the model and ends with classification using the softmax activation function.
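The batch normalization, softmax, and categorical cross-entropy computations described above can be sketched as follows. The sketch uses the standard textbook forms of these operations; the learnable scale and shift of batch normalization are omitted, and averaging the loss over samples, the value of `eps`, and all function names are assumptions for illustration.

```python
import math


def batch_norm(values, eps=1e-5):
    """Normalize a mini-batch to zero mean and (almost) unit variance."""
    m = len(values)
    mean = sum(values) / m
    variance = sum((v - mean) ** 2 for v in values) / m
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


def softmax(z):
    """Turn K scores into K probabilities that sum to one."""
    shift = max(z)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average of -sum(y_true * log(y_pred)) over all samples."""
    total = 0.0
    for true_row, pred_row in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(true_row, pred_row))
    return total / len(y_true)


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
probabilities = softmax([2.0, 1.0, 0.1])
loss = categorical_cross_entropy([[1, 0, 0]], [probabilities])
print(normalized)
print(probabilities)
print(loss)
```

A prediction that puts all probability on the correct class gives a loss close to zero, while a uniform prediction over three classes gives a loss of log(3).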

A. PREPROCESSING
In the initial preprocessing stage, the image is resized to 224 × 224 pixels. Augmentation is then performed using image rotation, image flipping, and color jitter. The rotation technique produced 5,902 new images, the flipping technique produced 5,932 new images, and the color jitter technique produced 5,918 new images, so that a total of 19,032 data were obtained. The data consist of three classes: 6,370 images of the advanced glaucoma class, 6,358 images of the early glaucoma class, and 6,304 images of the normal control class. Next, the images are converted from BGR to RGB to avoid errors when selecting channels, and the blue channel is extracted because it shows the optic disc and optic cup features more brightly and clearly than the other channels. The results of the preprocessing stages can be seen in FIGURE 9.

B. SEGMENTATION WITH U-NET ARCHITECTURE
Segmentation is carried out to separate the optic disc and optic cup features that will be used at the classification stage. The segmentation process is divided into training and testing. In the training process, accuracy and loss are measured on the training data and validation data. Accuracy measures the success of the segmentation stage in extracting the desired features from the image, while loss measures the level of error of the segmentation stage in recognizing features from the image. The graph of accuracy and loss in the segmentation training process can be seen in FIGURE 10.
In FIGURE 10(a), it can be seen that the accuracy in the training process is above 90%. Accuracy on the training data and validation data continued to increase and began to stabilize from the 20th epoch. In FIGURE 10(b), it can be seen that the loss for the training data and validation data continues to decrease to below 25%. In addition to accuracy and loss, recall and precision are also measured at the segmentation training stage. Recall indicates how many of the actual features in the image the segmentation stage recognizes correctly, while precision indicates how many of the features predicted by the segmentation stage are correct. The graph of recall and precision in the segmentation training process can be seen in FIGURE 11.
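Recall and precision as described above can be computed from counts of true positives, false positives, and false negatives. The following sketch uses the standard definitions on a flattened binary mask; the sample labels are made up for illustration and are not data from the study.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical flattened masks: 1 = optic disc or cup pixel, 0 = background.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall)  # 0.75 0.75
```

Here three feature pixels are found, one is missed, and one background pixel is wrongly marked, so both measures equal 3/4.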

FIGURE 11. Graph of (a) Recall and (b) Precision in the Segmentation Training Process
In FIGURE 12(a) it can be seen that the F1-score in the training process is close to 100%. The F1-score for the training data and validation data continued to increase and began to stabilize in the 5th epoch for the training data and the 30th epoch for the validation data. In FIGURE 12(b) it can be seen that Cohen's kappa for the training data and validation data continues to increase to above 85%. In the testing process of the segmentation stage, the predicted results of the segmentation stage are compared with the ground truth. The comparison between the segmentation results and the ground truth can be seen in TABLE 1.

TABLE 1 (columns: No, Original Image, Prediction Results, Ground Truth)
TABLE 1 shows a comparison of the segmentation results and the ground truth at the segmentation stage. The predicted results have a similar appearance to the ground truth. This shows that the segmentation stage succeeded in recognizing the optic disc and optic cup features in the retinal image in accordance with the ground truth.

C. CLASSIFICATION WITH XCEP-DENSE ARCHITECTURE
The classification stage consists of two processes, training and testing. Before the training stage, the data were divided into two parts: 80% (15,226 data) as training data and 20% (3,806 data) as testing data. In the training process, the data are trained using the Xcep-Dense model with 200 epochs and a batch size of 32. Three labels are used: advanced glaucoma, early glaucoma, and normal control. The training data were then randomly split into two parts: 90% (13,703 data) as training data and 10% (1,523 data) as validation data. The split data are then used to train each layer until the 200th epoch. In the training process, the accuracy on the training data and validation data is measured to assess the ability of the proposed classification model. The loss is also measured to assess the level of error between the predicted class and the actual class. The graph of accuracy and loss in the classification training process can be seen in FIGURE 13. In FIGURE 13(a) it can be seen that the accuracy in the training process is above 85%. The accuracy on the training data and validation data continues to increase and begins to stabilize from the 50th epoch.

In addition to accuracy and loss, recall and precision are also measured at the classification training stage. Recall indicates the success of the classification stage in correctly identifying the images of each label. Precision measures how many of the images that the architecture assigns to a label actually belong to that label. The graph of recall and precision in the classification training process can be seen in FIGURE 14. In FIGURE 14(a) it can be seen that the recall on the training data and validation data in the training process is very good, since it continues to increase in each epoch. By the 100th epoch, the recall has begun to stabilize and no longer decreases. The recall in the training process is 80%.

In the training process, the accuracy of the model's predictions is also measured by calculating the RMSE; the smaller the error, the better the model. A small RMSE shows that the difference between the predicted results and the actual results is small. The graph of the RMSE in the training process can be seen in FIGURE 16.
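The RMSE mentioned above can be sketched as follows. The text does not say which quantities are compared, so applying it to flattened one-hot labels and predicted class probabilities is an assumption, and the numbers are made up for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical one-hot label and predicted probabilities for one image.
actual = [1.0, 0.0, 0.0]
predicted = [0.8, 0.1, 0.1]

error = rmse(actual, predicted)
print(error)
```

The value is 0 when predictions match the labels exactly and grows as they diverge, which is why a curve approaching 0 indicates small prediction error.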

FIGURE 16. Graph of RMSE in the Classification Training Process
In FIGURE 16 it can be seen that the RMSE on the training data and validation data in the training process is very good. The RMSE is close to 0 and continues to decrease in each epoch. By the 100th epoch, the RMSE has begun to stabilize and no longer increases. The RMSE indicates that the quality of the predictions is very good. At the testing stage, predictions are made using the best weights obtained at the training stage. The testing data obtained from the data split consist of 3,806 data. These data are used to evaluate the model's performance in classifying images. At this stage, model performance is reported as accuracy, recall, precision, F1-score, and Cohen's kappa. A comparison of the performance results for each label is shown in FIGURE 17.
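The reported data counts can be checked with simple arithmetic. The sketch below assumes that split sizes are rounded to the nearest integer, an assumption that happens to reproduce the numbers given in the text; the helper name is hypothetical.

```python
def split_counts(total, first_fraction):
    """Split `total` items into two parts, rounding the first part."""
    first = round(total * first_fraction)
    return first, total - first


class_counts = {"advanced": 6370, "early": 6358, "normal": 6304}
total = sum(class_counts.values())

train_val, test = split_counts(total, 0.8)    # 80% / 20%
train, validation = split_counts(train_val, 0.9)  # 90% / 10% of the 80% part

print(total, train_val, test, train, validation)
# 19032 15226 3806 13703 1523
```

The three class totals add up to the 19,032 images used, and the two successive splits give the 3,806 testing, 13,703 training, and 1,523 validation images stated in this section.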
Based on FIGURE 17, the labels are marked in orange for advanced glaucoma, yellow for early glaucoma, and green for normal eyes. The graph shows that the yellow early glaucoma label has lower results than the other labels. This occurs because the early glaucoma label had less data than the other labels before augmentation. The advanced glaucoma label has the highest accuracy, recall, F1-score, and Cohen's kappa, but its precision is still below that of the normal control label, which has the highest precision of all labels. The results show that Xcep-Dense is excellent for classifying glaucoma from the data provided.

FIGURE 17. Comparison of Classification Performance Results for Each Label
At the testing stage, the AUC for each class was also measured. The AUC obtained for each class can be seen in the ROC graph in FIGURE 18. Based on FIGURE 18, the true positive rate increases as the false positive rate increases for each label. The ROC graph shows the performance of the proposed classification model at all classification thresholds. The ROC graph also reports the AUC for each label, which indicates the model's ability to distinguish that label. The lowest AUC belongs to the normal control label and the highest to the advanced glaucoma label.
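The relation between the ROC curve and the AUC can be sketched for a single label treated as positive against the rest. The scores and labels below are made up for illustration, and the trapezoidal rule is one common way of computing the area, not necessarily the one used in the study.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    Each distinct score is used once as the decision threshold,
    from the highest score down to the lowest.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 1)
        fp = sum(1 for l, s in zip(labels, scores) if s >= threshold and l == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical data: 1 = image belongs to the label, 0 = any other label.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
area = auc(points)
print(points)
print(area)
```

Both rates can only grow as the threshold is lowered, which is why the curve rises from (0, 0) to (1, 1); the closer the area is to 1, the better the label is separated from the others.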

D. ANALYSIS AND DISCUSSION
In this study, glaucoma detection was carried out in two stages, segmentation and classification. At the segmentation stage, the performance on the testing data reached accuracy, recall, precision, and F1-score of 98%, while Cohen's kappa was 88%. The Cohen's kappa result indicates a very high degree of agreement between actual and predicted observations, beyond what is expected by chance. The F1-score of 98% indicates that the U-Net architecture works very well for optic disc and optic cup segmentation. The F1-score also shows that the model has a good balance between

FIGURE 2. Views of the Optic Disc and Optic Cup on the Retina Image in Detail

FIGURE 6. Comparison of Display of Image Channels: (a) Original Image, (b) Red Channel, (c) Green Channel, and (d) Blue Channel

Equation (1) has the form
$$c_{i,j} = \sum_{m}\sum_{n} x_{i+m,\,j+n}\, k_{m+1,\,n+1} + b_d \qquad (1)$$
for $i = 1, 2, \ldots, p$ and $j = 1, 2, \ldots, q$, where $c_{i,j}$ is the convolution matrix entry in the $i$-th row and $j$-th column, $x_{i+m,j+n}$ is the input matrix entry in the $(i+m)$-th row and $(j+n)$-th column, $k_{m+1,n+1}$ is the kernel matrix entry in the $(m+1)$-th row and $(n+1)$-th column, and $b_d$ is the bias for the $d$-th kernel. The ReLU activation function is then applied to the results of the convolution layer. The Rectified Linear Unit (ReLU) is one of the activation functions used in CNNs: if the input is negative, the output is zero, and if the input is positive, the output is the input itself. Mathematically, ReLU is defined in Equation (2).
$$r_{i,j} = f(c_{i,j}) = \max(0, c_{i,j}) = \begin{cases} c_{i,j}, & c_{i,j} \ge 0 \\ 0, & c_{i,j} < 0 \end{cases} \qquad (2)$$
After the segmentation stage, the Xception and dense block architectures are combined. The combination is made by adding a dense block at the end of the Xception model after the flattening process to overcome the vanishing gradient problem caused by the use of skip connections. The combination forms a new architecture, Xcep-Dense, which can be seen in FIGURE 8.
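The difference between the residual (skip) connection of Xception and the dense connection of DenseNet can be shown with a toy example. Feature maps are reduced to short lists of numbers here, and the function names are hypothetical; the sketch only contrasts element-wise addition with concatenation and is not the Xcep-Dense implementation.

```python
def residual_connection(block_input, block_output):
    """Residual style: add the input to the output element by element."""
    return [a + b for a, b in zip(block_input, block_output)]


def dense_connection(block_input, block_output):
    """Dense style: keep the input and append the new features to it."""
    return block_input + block_output


features_in = [1.0, 2.0]
features_new = [10.0, 20.0]

print(residual_connection(features_in, features_new))  # [11.0, 22.0]
print(dense_connection(features_in, features_new))     # [1.0, 2.0, 10.0, 20.0]
```

With addition, the earlier features are merged into the new ones and the size stays the same; with concatenation, the earlier features remain available unchanged to later layers, at the cost of a growing number of feature maps.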

FIGURE 8. Xcep-Dense Architecture
Based on FIGURE 8, the base architecture is Xception, which consists of 3 main blocks: entry flow, middle flow, and exit flow. Each block in Xception uses depthwise separable convolutions with residual connections.

FIGURE 9. Image Preprocessing Results in Stages: (a) Augmentation, (b) BGR, (c) RGB, and (d) Blue Channel
In FIGURE 11(a) it can be seen that the recall in the training process is close to 100%. The recall on the training data and validation data continues to increase and begins to stabilize in the 5th epoch for the training data and the 30th epoch for the validation data. In FIGURE 11(b) it can be seen that the precision for the training data and validation data continues to increase to above 95% and begins to stabilize in the 10th epoch. At the segmentation training stage, the F1-score and Cohen's kappa were also measured. The F1-score is the harmonic mean of recall and precision. Cohen's kappa measures the degree of agreement between the predicted features and the actual features on a nominal scale. The graph of the F1-score and Cohen's kappa in the segmentation training process can be seen in FIGURE 12.
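The F1-score and Cohen's kappa can be sketched with their standard definitions. The labels below are made up for illustration and are not results from the study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def cohens_kappa(y_true, y_pred):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    expected = sum(
        (y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes
    )
    if expected == 1.0:
        return 1.0  # both labelings use one identical class throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for six items.
y_true = ["advanced", "advanced", "early", "early", "normal", "normal"]
y_pred = ["advanced", "advanced", "early", "normal", "normal", "normal"]

kappa = cohens_kappa(y_true, y_pred)
print(f1_score(0.5, 1.0))
print(kappa)
```

In the example, five of six labels agree (observed agreement 5/6) while chance agreement is 1/3, so kappa is 0.75, lower than the raw agreement because part of that agreement would occur by chance.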

FIGURE 10. Graph of (a) Accuracy and (b) Loss in the Segmentation Training Process

FIGURE 12. Graph of (a) F1-Score and (b) Cohen's Kappa in the Segmentation Training Process
In FIGURE 13(b) it can be seen that the loss in the training process for the training data and validation data continues to decrease to below 20%.

FIGURE 13. Graph of (a) Accuracy and (b) Loss in the Classification Training Process

FIGURE 14. Graph of (a) Recall and (b) Precision in the Classification Training Process
In FIGURE 14(b) it can be seen that the precision on the training data and validation data in the training process is 88%. The precision begins to stabilize at the 60th epoch. During the training process, the F1-score and Cohen's kappa were also measured. The F1-score is the harmonic mean of recall and precision. Cohen's kappa measures the degree of agreement between the predicted results and the actual labels on a nominal scale with two or more classes. The graph of the F1-score and Cohen's kappa in the training process can be seen in FIGURE 15. In FIGURE 15(a) it can be seen that the F1-score on the training data and validation data continues to increase in each epoch. By the 100th epoch, the F1-score has begun to stabilize and no longer decreases. In FIGURE 15(b) it can be seen that Cohen's kappa on the training data and validation data approaches 80%. The graph of Cohen's kappa shows no sign of overfitting; it continues to increase in each epoch and begins to stabilize at the 100th epoch.

FIGURE 15. Graph of (a) F1-Score and (b) Cohen's Kappa in the Classification Training Process