QCML: Qualified Contrastive Machine Learning methodology for infectious disease diagnosis in CT images

The COVID-19 pandemic has had a severe effect on human health, and computer-aided diagnostic (CAD) systems for chest computed tomography (CT) have emerged as a potential alternative for COVID-19 diagnosis. However, because data annotation is expensive in the medical domain, annotated data are scarce, while training a CAD system to a high level of accuracy requires a considerable quantity of labelled data. This study describes an automatic and precise COVID-19 diagnostic method that uses a restricted amount of labelled CT images to address this problem. The framework is called Qualified Contrastive Machine Learning (QCML), and our improvements can be summarised as follows: 1) To make use of all of the image's characteristics, we combine features with a two-dimensional discrete wavelet transform. 2) We employ the COVID-Net encoder with a redesign that focuses on learning efficiency and the task specificity of the data. 3) To strengthen generalization, we implement a novel pretraining technique based on Qualified Contrastive Machine Learning. 4) To obtain better classification results, we include an additional auxiliary task. Applied to infectious disease diagnosis in CT images, the QCML methodology achieves an accuracy of 93.55%, a recall of 91.59%, a precision of 96.92%, and an F1-score of 94.18%, demonstrating the potential for accurate and efficient COVID-19 diagnosis with limited labelled data.


I. INTRODUCTION
Coronavirus disease 2019 (Covid-19), which first appeared in Wuhan, China in December 2019 and swiftly became a worldwide pandemic [1], is caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). More than 410 million cases have been documented to date, and about 6 million fatalities have been attributed to the disease, according to the World Health Organization (WHO) [2]. Because of the rising number of cases and fatalities, the WHO declared Covid-19 a pandemic in March 2020. This led a number of nations to close their borders and institute curfews as preventative measures [3]. Covid-19 predominantly affects the respiratory system, particularly the lungs, and often produces symptoms comparable to those of pneumonia [4]. Fever, coughing, sneezing, and shortness of breath are among the most common symptoms, and the disease is highly infectious because it spreads through respiratory droplets, for example when an infected person coughs or sneezes. The elderly and those with ongoing medical conditions are at greater risk of contracting Covid-19.
In addition to RT-PCR testing, imaging techniques such as computed tomography (CT) and chest X-ray are used to identify Covid-19, which mostly affects the lungs [5]. Patients with pneumonia caused by Covid-19 can be accurately diagnosed by using CT imaging to identify characteristic radiographic findings. However, specialised medical doctors are needed to analyse these images properly. CT and X-ray imaging both offer certain benefits for early detection in comparison with RT-PCR assays, but both methods also have limitations. Artificial intelligence (AI) and deep learning (DL) techniques are being investigated as a way to support the early diagnosis of Covid-19 and to expedite treatment by enabling medical professionals to diagnose the disease quickly and accurately from CT and X-ray images [13][14][15]. Researchers make extensive use of AI and DL approaches to identify Covid-19 infection in X-ray and CT images. The adoption of deep learning techniques has increased significantly because they are more efficient than conventional approaches [16]. In contrast to classical machine learning and other conventional approaches, deep learning architectures do not require hand-crafted feature extraction during pre-processing. This is one of the most significant factors that has drawn researchers to this area of study.
Deep hybrid learning (DHL) and deep boosted hybrid learning (DBHL) are two novel deep learning-based models that were proposed in [17] for detecting Covid-19 in X-ray datasets. Using X-ray images of two classes (Covid and Non-Covid), that method reaches an accuracy of 98.53%. According to the findings of this research, evaluating performance using only binary-class X-ray images is detrimental to the overall performance of the architectures being investigated. It is therefore essential to use a variety of datasets when detecting Covid-19 [18]. Both the number of classes and the type of image affect how successful a CNN architecture is.
Our strategy draws on data from a variety of sources and thereby overcomes the lack of large-scale labelled data. Specifically, we employ the recently developed COVID-Net [19] for our experiments, with its architecture redesigned to better suit our task. In addition, for the downstream task of the proposed system, we include a cooperative learning method that uses a contrastive learning objective [20]. This approach not only boosts speed but also gathers semantic representations that lie close together yet have hazy boundaries within a category domain. To stabilise the pipeline and capture additional characteristics before training, the DWT technique is applied. Using a publicly available COVID-19 CT dataset, we conduct in-depth experiments to test and assess our methodology for COVID-19 identification tasks [21]. According to the findings, our method obtains substantial CT feature representations, which allow COVID-19 patients and normal patients to be classified appropriately using the annotated data source.
The research gap in this context is the need for an automated and accurate technique for diagnosing COVID-19 using a small number of annotated CT scans and a large number of unlabeled CT images. Existing approaches may struggle because of the lack of labelled data, which limits their ability to diagnose the condition properly. The proposed method addresses this gap by combining QCML and the discrete wavelet transform to increase the accuracy and efficiency of COVID-19 detection. The most important findings and contributions of this research are as follows:
1. To address the issue of inadequate labelled data, we present an automated and precise method for diagnosing COVID-19 based on a small number of labelled CT scans and a large number of unlabeled CT images.
2. To make the most of all of the characteristics contained in the images, we combine QCML with the discrete wavelet transform.
3. Spectral and temporal features are extracted from a CNN trained on heatmaps derived from a multilevel discrete wavelet transform (DWT). These spectral-temporal features are then combined with spatial features extracted from another CNN trained on the original multiview CT images. This integration of spectral-temporal and spatial features is intended to improve the accuracy and effectiveness of the overall classification task.
The paper is structured as follows. Section 2 describes the methods, datasets, and proposed pipeline in detail. Section 3 outlines the experimental setup, including the parameters adjusted for the CNNs and the evaluation metrics. Section 4 presents the results and analyses the performance of the proposed method, including any limitations or challenges encountered. Section 5 discusses the results in more detail, with a critical analysis of the findings, their implications, the contributions of the research, and potential future applications. Finally, Section 6 concludes the paper by summarising the main findings, highlighting the significance of the research, and providing recommendations for future work.

II. PROPOSED MODEL
Convolutional neural networks (CNNs) are powerful models that can achieve high accuracy when classifying data in multi-class problems. One of their main advantages is the ability to learn and improve their classification performance during training without human intervention. The architecture of a CNN builds on a coordinated arrangement of multilayer perceptrons, in which each neuron in one layer is connected to all the neurons in the next layer. This highly interconnected structure allows CNNs to extract and identify features from input data with remarkable accuracy. Sample images from the dataset are shown in FIGURE 1. The dataset contains 6432 X-ray images for training and 1286 images for testing. Chest X-ray images are 2D and vary in resolution; a common size is around 1024x1024 pixels. The dataset is available at https://www.kaggle.com/datasets/prashant268/chest-xray-covid19-pneumonia.
The building blocks of a CNN are the convolutional layer and the rectified linear unit (ReLU). The convolutional layer is the most important component of a CNN and is responsible for extracting key features from the input data. The ReLU is an activation function that introduces non-linearity into the CNN, allowing it to better represent complex relationships between features. Overall, the architecture and design of CNNs enable them to perform well in complex classification tasks, making them a popular choice for image recognition, natural language processing, and other AI applications.
The input data is processed by convolving it with kernels, that is, convolution filters that slide across the entire visual field. As a result, the convolutional operation enables the extraction of both simple, small patterns and more complex, detailed patterns from the input data. This hierarchical network structure allows the most important feature maps to be extracted, improves the generalization capability of the model, and reduces the computational complexity of training and inference [22]. The convolution operation is the fundamental mathematical operation underlying this process, and its mathematical expression is given in [23]. The proposed model is shown in FIGURE 2.
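As an illustration of the convolution operation described above, the following is a minimal pure-Python sketch, not the implementation used in this work. The function name `conv2d_valid`, the toy image, and the difference kernel are all assumptions chosen for illustration. As in most deep learning libraries, the kernel is not flipped, so strictly speaking the code computes a cross-correlation.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map.

    The kernel is not flipped, so this is a cross-correlation, which is what
    convolutional layers typically compute in practice.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            # Weighted sum of the image patch under the kernel.
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


# Hypothetical 3x4 "image" and a 1x2 horizontal difference kernel.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 1],
    [7, 8, 9, 2],
]
edge_kernel = [[1, -1]]
feature_map = conv2d_valid(image, edge_kernel)
```

Each output value responds to a local horizontal intensity change, which is the sense in which a small filter extracts a simple pattern; stacking such layers yields the more complex patterns mentioned in the text.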

A. WAVELET DECOMPOSITION
The wavelet transform can localize a signal in both the time domain and the frequency domain, as shown in FIGURE 3 and FIGURE 4, which makes it a helpful tool for signal classification. This property sets the wavelet transform apart from other transformation techniques. Wavelets are used for analysing CT scans primarily because they can efficiently detect obscure or hidden characteristics within the images. By scaling and shifting an existing mother wavelet, it is possible to generate a wavelet of a higher order, as shown in Eq. (1) and Eq. (2) [24].

FIGURE 3. Wavelet Decomposition
A higher-order wavelet is derived from a fixed mother wavelet by applying scaling and shifting operations. Let f(t) be a continuous, square-integrable function. For a real-valued wavelet ψ(t), the continuous wavelet transform of f(t) is defined by Eq. (4) and Eq. (5) [26]. These equations describe how the wavelet is scaled and shifted to analyse different components of the input function f(t). The parameters a and b are the wavelet scale and translation factors, respectively. Orthogonal wavelets, Eq. (6), dilated by 2^j carry signal variations at the resolution 2^(-j) [27].
where a_{j,k} refers to the low-frequency component and d_{j,k} refers to the high-frequency component. From these, the inverse DWT (IDWT), Eq. (7), reconstructs f from a_{j,k} and d_{j,k} [28].
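To make the decomposition into low-frequency and high-frequency components and the IDWT reconstruction concrete, here is a small pure-Python sketch of a single-level 2D Haar transform. The Haar wavelet, the function names `haar_dwt2` and `haar_idwt2`, and the 4x4 test image are assumptions for illustration; the source does not state which wavelet family or decomposition depth was used.

```python
def haar_dwt2(img):
    """One level of the orthonormal 2D Haar DWT (even height and width assumed).

    Returns four half-size sub-bands: the low-frequency approximation and
    three high-frequency details (row, column, and diagonal differences).
    """
    approx, d_row, d_col, d_diag = [], [], [], []
    for r in range(0, len(img), 2):
        a_line, r_line, c_line, g_line = [], [], [], []
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]
            s, t = img[r + 1][c], img[r + 1][c + 1]
            a_line.append((p + q + s + t) / 2.0)  # local average (low frequency)
            r_line.append((p + q - s - t) / 2.0)  # top rows minus bottom rows
            c_line.append((p - q + s - t) / 2.0)  # left columns minus right columns
            g_line.append((p - q - s + t) / 2.0)  # diagonal difference
        approx.append(a_line)
        d_row.append(r_line)
        d_col.append(c_line)
        d_diag.append(g_line)
    return approx, d_row, d_col, d_diag


def haar_idwt2(approx, d_row, d_col, d_diag):
    """Invert haar_dwt2 and rebuild the original image from the four sub-bands."""
    img = []
    for i in range(len(approx)):
        top, bottom = [], []
        for j in range(len(approx[0])):
            a, r, c, g = approx[i][j], d_row[i][j], d_col[i][j], d_diag[i][j]
            top += [(a + r + c + g) / 2.0, (a + r - c - g) / 2.0]
            bottom += [(a - r + c - g) / 2.0, (a - r - c + g) / 2.0]
        img += [top, bottom]
    return img


# Hypothetical 4x4 image patch.
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
approx, d_row, d_col, d_diag = haar_dwt2(image)
reconstructed = haar_idwt2(approx, d_row, d_col, d_diag)
```

A multilevel transform, as mentioned in the contributions, would apply `haar_dwt2` again to the approximation sub-band.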

B. ENCODERS
A traditional encoder is designed to be versatile and compatible with various network architectures, so that it can be applied to different types of tasks. However, encoders can differ from one another substantially or only slightly. In this research, we use COVID-Net, a popular and well-known network, as our encoders f_q and f_k. These encoders were trained to learn representations from lung CT images for COVID-19 detection. This architectural choice allows effective representation learning while controlling computational complexity. The architectural advantages of COVID-Net, particularly its lightweight residual layers, align well with the internal structure of our QCML method, which contributes to the successful integration of COVID-Net as the encoder in our framework, as shown in FIGURE 5. COVID-Net was originally designed for classifying chest X-ray (CXR) images; nevertheless, it can benefit pattern recognition on CT data because CT scans contain more precise features than CXR images do. On the other hand, without adequate adaptation, this difference may result in a greater computational load and reduced output accuracy. To solve this problem, we incorporate batch normalisation (BN) into the COVID-Net architecture. This helps reduce internal covariate shift (ICS) [24] and increases the stability of representation learning during training. BN layers are deliberately placed after the input images and at each central hub in the top sector of the architecture. This is because the bottom sector is densely packed with convolution layers, and adding more BN layers there would considerably increase the number of parameters and the computational complexity. To keep the amount of processing as efficient as possible, we choose a neuron x_i at random from a layer. The BN operation is given in Eq. (8) [29].
$$y_i = \gamma \hat{x}_i + \beta, \quad \text{where} \quad \hat{x}_i = \frac{x_i - E(x_i)}{\sqrt{\mathrm{Var}(x_i)}} \tag{8}$$
Here E(.) and Var(.) denote the mean and variance of the input neurons, respectively, while γ and β are trainable parameters that rescale and shift the normalized activation and make the network more flexible.
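The BN operation can be sketched in a few lines of pure Python. This is an illustrative example rather than the authors' code: the function name `batch_norm`, the sample activations, and the small constant `eps` (added to the variance for numerical stability, as is common practice) are assumptions that do not appear in the source.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of activations of one neuron, then scale and shift.

    eps is an assumed stabilizer that avoids division by zero when the
    batch variance is (close to) zero.
    """
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in xs]


# Hypothetical activations of a single neuron across a batch of four inputs.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
out_mean = sum(normalized) / len(normalized)
out_var = sum((v - out_mean) ** 2 for v in normalized) / len(normalized)
```

With the default γ = 1 and β = 0, the output has approximately zero mean and unit variance; learned values of γ and β let the network rescale and shift the normalized activation where that helps.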

C. QUALIFIED CONTRASTIVE MACHINE LEARNING (QCML)
In recent years, contrastive machine learning has become an increasingly prominent research subject, particularly in medical image processing.
This methodology, which is based on deep learning networks, aims to extract meaningful representations from unlabelled data by using a variety of pretext tasks, such as masked image modelling and contrastive learning. The auxiliary task is completed by designing a loss function, which also serves to fine-tune the features used in subsequent tasks such as semantic segmentation, object detection, and image classification [25]. To maximise accuracy, we use a contrastive task as the pretext task for the COVID-19 diagnostic downstream task. Contrastive learning is one of the most effective approaches of this kind because it permits the learning of excellent representations without the need for annotated datasets.
In this method, several unlabelled images are compared with one another, and the contrastive loss is determined by how similar to and how different from one another the images are [26]. After the pre-training phase, which takes place at the QCML stage, the encoder is moved into the classification stage for COVID-19 detection in input images. In contrast to the contrastive loss imposed during contrastive pre-training, we now add a similarity function, as shown in Eq. (9) [30].
The notation (u, v) refers to a pair of embedding features taken from the classifier's average pooling layer. For dimensionality reduction, we add a projection micro-network Z(.). This reduces the dimension of the embeddings to 128, and the similarity function is modified as in Eq. (10) [31].
where I is an indicator that takes the value 0 or 1 depending on whether the pair (u, v) is positive or negative. For each batch of data, the cross-entropy loss and the contrastive loss are computed simultaneously. This strategy improves the performance of the model in a domain-invariant manner, which in turn improves the performance of the binary classification.
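The interplay between a similarity function, a temperature, and a contrastive loss can be sketched as follows. Because Eq. (9) to Eq. (11) are not reproduced here, this example assumes a cosine similarity and an InfoNCE-style loss, which are common choices in contrastive learning but may differ from the exact formulation used in this work. The function names, the two-dimensional toy embeddings, and the temperature value 0.07 are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(query, positive, negatives, tau=0.07):
    """InfoNCE-style loss for one query (assumed form, not taken from the source).

    The loss is small when the query is more similar to its positive key
    than to every negative key; tau is the temperature.
    """
    logits = [cosine_similarity(query, positive) / tau]
    logits += [cosine_similarity(query, k) / tau for k in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]


# Toy embeddings; in the paper the projected embeddings are 128-dimensional.
q = [1.0, 0.0]
loss_aligned = info_nce_loss(q, [1.0, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
loss_misaligned = info_nce_loss(q, [-1.0, 0.0], [[1.0, 0.1], [0.0, 1.0]])
```

In a joint training setup such as the one described above, a loss of this kind would be added to the cross-entropy loss for each batch; the relative weighting of the two terms is not specified in the source.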

Algorithm:
1. Input: batch sizes N_s and N_c, dataset D_1, dataset D_2, temperature τ, encoders f_q and f_k, wavelet filter T_f, g, augmentation function T, dictionary q, and projection head z.
Enqueue(q, z_{2i-1}); Dequeue(q);
end for
return f_k

The heat maps of the various datasets are shown in FIGURE 6 and FIGURE 7.
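The names f_q, f_k, Enqueue, and Dequeue in the algorithm suggest a momentum-contrast style of training, in which a key encoder slowly tracks a query encoder and a fixed-size dictionary of keys is maintained as a queue. The sketch below illustrates only these two mechanisms under that assumption; the function `momentum_update`, the class `KeyQueue`, the momentum value, and the toy parameters are hypothetical and are not taken from the source.

```python
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move each key-encoder parameter slightly toward the query encoder."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO dictionary of encoded keys."""

    def __init__(self, max_size):
        # A deque with maxlen discards the oldest entries automatically,
        # which plays the role of the explicit Dequeue step.
        self.keys = deque(maxlen=max_size)

    def enqueue(self, batch_keys):
        self.keys.extend(batch_keys)


# Toy usage: scalar "parameters" and string placeholders for key embeddings.
f_k = momentum_update([1.0, 2.0], [0.0, 0.0], m=0.9)

queue = KeyQueue(max_size=4)
queue.enqueue(["k1", "k2", "k3"])
queue.enqueue(["k4", "k5"])  # "k1" is dropped to keep the size at 4
current_keys = list(queue.keys)
```

Keeping the key encoder as a slow moving average makes the keys in the dictionary more consistent with one another across training steps than they would be if both encoders were updated by backpropagation.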

III. TRAINING AND OPTIMIZATION OF THE PROPOSED ARCHITECTURE
The proposed architecture is trained with the backpropagation algorithm, which adjusts the network's weights and biases to optimize its performance. For multi-class datasets, the chosen cost function is the cross-entropy, Eq. (12). This cost function measures the dissimilarity between the predicted output probabilities and the true labels of the multi-class dataset. It takes the logarithm of the predicted probabilities, so the model is penalized for incorrect predictions and encouraged to make more accurate ones. For two-class datasets, the binary cross-entropy, Eq. (13), is used as the cost function. It is tailored to binary classification tasks: it evaluates the discrepancy between the predicted probabilities and the true binary labels and aims to minimize the divergence between them. By taking the logarithm of the predicted probabilities, it captures the errors made by the model and guides the learning process towards better classification performance [23].
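The two cost functions can be written out as a short pure-Python sketch. It uses the standard textbook definitions of categorical and binary cross-entropy averaged over n samples, which is assumed to match Eq. (12) and Eq. (13); the function names, the clipping constant `eps`, and the example labels and probabilities are illustrative.

```python
import math


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy.

    y_true holds one-hot label rows; y_pred holds predicted probability rows.
    eps is an assumed guard that avoids log(0).
    """
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)


def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean binary cross-entropy for labels in {0, 1}."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Hypothetical predictions for a three-class and a two-class problem.
multi_loss = cross_entropy([[1, 0, 0]], [[0.5, 0.25, 0.25]])
confident = binary_cross_entropy([1, 0], [0.9, 0.1])
unsure = binary_cross_entropy([1, 0], [0.5, 0.5])
```

Both losses shrink toward zero as the probability assigned to the true class approaches 1, which is the penalizing behaviour described in the paragraph above.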
In Eq. (12) and Eq. (13), n is the total number of samples in the dataset, y denotes the true value associated with each sample, and ŷ (y-hat) denotes the predicted value for that sample. The weights of the architecture are updated during training with the Adam optimization algorithm [27], a popular choice for optimizing neural network models. It combines the advantages of the adaptive gradient algorithm (AdaGrad) and root mean square propagation (RMSprop). During optimization, Adam uses a learning coefficient η (eta), which determines the step size at which the weights are updated [28,29]. The effective step can differ at each time step t of the training process and is determined by the Adam optimization algorithm as given in Eq. (14) to Eq. (16) [24].
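The Adam update can be sketched as below. This follows the widely used formulation of Adam with bias correction, which is assumed to correspond to Eq. (14) to Eq. (16); the function name `adam_step`, the default hyperparameters, the constant `eps`, and the toy quadratic objective are illustrative and not taken from the source.

```python
import math


def adam_step(w, g, v, s, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar weight.

    v and s are the exponential moving averages of the gradient and of the
    squared gradient; t is the 1-based time step used for bias correction.
    """
    v = beta1 * v + (1.0 - beta1) * g
    s = beta2 * s + (1.0 - beta2) * g * g
    v_hat = v / (1.0 - beta1 ** t)  # bias-corrected first moment
    s_hat = s / (1.0 - beta2 ** t)  # bias-corrected second moment
    w = w - eta * v_hat / (math.sqrt(s_hat) + eps)
    return w, v, s


# First step on the toy objective f(w) = (w - 3)^2, starting from w = 0.
first_w, _, _ = adam_step(0.0, 2.0 * (0.0 - 3.0), 0.0, 0.0, t=1, eta=0.05)

# Run the optimizer for a while on the same objective.
w, v, s = 0.0, 0.0, 0.0
for t in range(1, 2001):
    g = 2.0 * (w - 3.0)  # gradient of (w - 3)^2
    w, v, s = adam_step(w, g, v, s, t, eta=0.05)
```

On the first step the update has magnitude close to η regardless of the gradient's scale, because the bias-corrected ratio of the two moving averages is close to ±1; this scale invariance is one reason Adam is less sensitive to the raw gradient magnitude than plain gradient descent.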
In Eq. (14) to Eq. (16), w represents the weights of the architecture, and β1 and β2 are hyperparameters of the Adam optimization algorithm. The coefficient η denotes the learning rate, which determines the step size for weight updates during training and may vary with the time step t. The gradient at time step t is denoted g_t. V_t and S_t are the exponential moving averages of the gradients and of the squared gradients, respectively, and w_t denotes the weights at time step t.
Within the proposed architecture, the rectified linear unit (ReLU) activation function is applied after each convolution operation. This activation function, which is widely used in neural networks and returns max(0, x) for an input x, is given in Eq. (17) [30,31].
Fully connected (FC) layers are integral components of convolutional neural network (CNN) architectures. In FC layers, every neuron in the preceding layer is connected to every neuron in the subsequent layer. These connections make it possible to calculate the degree to which each value aligns with a particular class or category. In the final layer of the architecture, the output of the FC layer is passed to a classifier or activation function such as sigmoid, support vector machine (SVM), or softmax to predict the classes.
In this study, the softmax activation function is used for classification. It computes a probability distribution over the set of output categories, assigning a probability to each category based on the input values. Eq. (18) [23] gives the softmax activation function used in this study.
Here, x represents the input vector to the softmax activation function, n is the number of classes in the classification task, and the index k ranges from 1 to n, indicating each individual class. The output vector of the softmax activation function is denoted Z. It consists of n elements, each representing the probability that the input belongs to the corresponding class k. A key property of the softmax function is that the elements of the output vector Z sum to 1, which ensures that Z represents a valid probability distribution over the classes.
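A minimal pure-Python sketch of the softmax function illustrates these properties. The function name `softmax` and the example scores are illustrative; subtracting the maximum before exponentiating is a standard numerical-stability step that does not change the result.

```python
import math


def softmax(x):
    """Map a vector of n class scores to n probabilities that sum to 1."""
    m = max(x)  # subtract the maximum so exp() cannot overflow
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores from the last FC layer for a three-class problem.
Z = softmax([2.0, 1.0, 0.1])
predicted_class = Z.index(max(Z))
```

The predicted class is the index of the largest element of Z, and larger input scores always map to larger probabilities.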

IV. RESULTS
Typically, studies that involve only two classes tend to report higher success rates than studies involving multiple classes. In comparison, the proposed architecture achieved an accuracy of 99.85%. Nevertheless, the success rates of these approaches depend on the datasets used by the researchers, including the number of samples and classes in each dataset, both of which can affect the effectiveness of the architectures. Eq. (19), Eq. (20), and Eq. (21) define how classification algorithms are assessed in terms of accuracy, sensitivity, and specificity [24,25].
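The evaluation metrics can be computed from the entries of a binary confusion matrix, as in the following sketch. The formulas are the standard definitions, assumed to match Eq. (19) to Eq. (21); precision and F1-score are included because they are reported alongside accuracy in this paper. The function name and the counts of true and false positives and negatives are hypothetical and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard metrics from true/false positive and negative counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical confusion-matrix counts, for illustration only.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

Sensitivity measures how many Covid-19 cases are found, whereas specificity measures how many normal cases are correctly left unflagged, so the two are usually reported together.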

V. DISCUSSION
The Dataset-X-ray dataset, which comprises four classes, was used in the third experimental investigation on identifying Covid-19. The findings are shown in TABLE 4. The proposed QCML method achieved the highest performance in terms of accuracy (96.69%), precision (0.99), recall (0.98), and F1-score (0.98). Convolutional neural networks (CNNs) are frequently used for classification and recognition tasks. These networks use fully connected layers together with feature maps produced by the convolution operation. The feature maps, which help identify the characteristics recognised or preserved within the input, are produced by applying the convolution filters to the input image. Although the feature maps closest to the output tend to capture broader information, the network was built to recognise even the smallest details in the input image. The initial two convolution layers each highlight a particular aspect of the images, which results in visualisations that can be interpreted. These important traits are referred to as attributes; although humans may have difficulty comprehending them, CNN models are able to use them. Moreover, as feature maps progress deeper into the network, they tend to expose fewer and fewer visible details, even though these details are relevant characteristics used in the decision-making process of CNN models. The heat maps of the proposed model are shown in FIGURE 9.
[FIGURE 9 panels: X-ray images, heat map, GAP; CT images, heat map, GAP]
Taken as a whole, the experimental results show that the proposed model makes accurate predictions on both X-ray and CT images. Compared with X-ray images, CT images yielded better results. One possible explanation is that CT scans have higher sensitivity and a finer level of detail. FIGURE 10, FIGURE 11, and FIGURE 12 present the performance comparison for all three datasets.

VI. CONCLUSION
The study aims to address the urgent need for accurate and efficient diagnostic methods for COVID-19, given the rising number of cases and associated deaths. To date, the disease has infected millions of people and caused millions of deaths. The appearance of new strains presents significant dangers to human health, since the illness is associated with a variety of adverse health effects. A number of states have taken preventative actions to stop the spread of the illness and reduce the number of fatalities. In most cases, RT-PCR assays are used to diagnose the disease. However, this approach has a number of drawbacks, including the limited availability of RT-PCR assays, the danger of transmission to healthcare professionals, the discomfort experienced by patients, and the expense. In this context, a variety of research projects have been carried out and a variety of potential solutions proposed. One line of this research examines high-performance deep learning architectures. With an accuracy of 99.85%, a recall of 91.59%, a precision of 96.92%, and an F1-score of 94.18%, our proposed system outperformed other current systems. In future, it could be developed into epidemiological models that predict disease spread, identify high-risk populations, and inform public health interventions to control outbreaks effectively.

Models compared in FIGURE 10, FIGURE 11, and FIGURE 12: CovXNet [22], CoroNet [20], CovidXrayNet [24], DarkCovidNet [31], and the proposed QCML model.

FIGURE 5. Network structure of the feature encoding block.

FIGURE 11. Performance comparison of proposed model using dataset II.

TABLE 2. Performance comparison with dataset I
The results of an experimental investigation using dataset I, which consists of CT scans used for identifying Covid-19, are shown in TABLE 2. The model achieved the best performance across all measures, with a success rate of one hundred percent. The confusion matrix of the proposed system is shown in FIGURE 8. In the second experiment, dataset I and dataset II, both of which comprise CT scans of Covid-19 and normal cases, were combined. The findings are shown in TABLE 3, where the proposed QCML method reached the highest success rate, 95.52% in terms of accuracy.