https://jeeemi.org/index.php/jeeemi/issue/feed Journal of Electronics, Electromedical Engineering, and Medical Informatics2025-07-16T18:06:45+07:00Dr. Triwiyantoeditorial.jeeemi@gmail.comOpen Journal Systems<p>The Journal of Electronics, Electromedical Engineering, and Medical Informatics (JEEEMI) is a peer-reviewed periodical scientific journal aimed at publishing research results within the Journal's focus areas. The Journal is published by the Department of Electromedical Engineering, Health Polytechnic of Surabaya, Ministry of Health, Indonesia. The role of the Journal is to facilitate contact between research centers and industry. The aspiration of the Editors is to publish high-quality scientific professional papers presenting the work of significant scientific teams and experienced, well-established authors, as well as postgraduate students and beginning researchers. All articles are subject to an anonymous review process by at least two independent expert reviewers prior to publication on the International Journal of Electronics, Electromedical Engineering, and Medical Informatics website.</p>https://jeeemi.org/index.php/jeeemi/article/view/799Predicting Construction Costs with Machine Learning: A Comparative Study on Ensemble and Linear Models2025-07-04T21:29:11+07:00Lifei Chen1002372862@ucsiuniversity.edu.mySew Sun Tiangtiangss@ucsiuniversity.edu.myKim Soon ChongChongKS@ucsiuniversity.edu.myAbhishek Sharmaabhishek15491@gmail.coTarek Berghoutt.berghout@univ-batna2.dzWei Hong Limlimwh@ucsiuniversity.edu.my<p>Accurate prediction of construction costs plays a pivotal role in ensuring successful project delivery, influencing budget formulation, resource allocation, and financial risk management. However, traditional estimation methods often struggle to handle the complex, nonlinear relationships inherent in construction datasets. This study proposes a process innovation by systematically evaluating six machine learning (ML) models, i.e., Ridge Regression, Lasso Regression, Elastic Net, K-Nearest Neighbors (KNN), XGBoost, and CatBoost, on a standardized RSMeans dataset comprising 4,477 real-world construction data points. The primary aim is to benchmark the predictive performance, generalizability, and stability of both linear and ensemble models in construction cost forecasting. Each model is subjected to rigorous hyperparameter tuning using grid search with 5-fold cross-validation. Performance is assessed using <em>R</em>² (coefficient of determination), RMSE (root mean squared error), and MBE (mean bias error), while confidence intervals are computed to quantify predictive uncertainty. Results indicate that linear models achieve modest accuracy (<em>R</em>² ≈ 0.83) but struggle to model nonlinear interactions. In contrast, ensemble-based models significantly outperform them: XGBoost and CatBoost achieve R² values of 0.988 and 0.987, respectively, RMSE values below 0.5, and near-zero MBE. Moreover, confidence interval visualization and feature importance analysis provide transparency and interpretability, enhancing the models' practical applicability. Unlike prior studies that compare models in isolation, this work introduces a unified, interpretable framework and highlights the trade-offs between accuracy, overfitting, and deployment readiness. The findings have real-world implications for contractors, project managers, and cost engineers seeking reliable, data-driven decision support systems.
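The tuning-and-evaluation protocol described in the construction-cost abstract above (grid search with 5-fold cross-validation, scored with R², RMSE, and MBE) can be sketched as follows. This is a minimal illustration assuming scikit-learn and the xgboost package; the parameter grid and the random stand-in data are hypothetical, not the authors' exact configuration.

```python
# Illustrative sketch: grid search with 5-fold CV, then R2/RMSE/MBE scoring.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import r2_score, mean_squared_error
from xgboost import XGBRegressor

X, y = np.random.rand(4477, 10), np.random.rand(4477)   # stand-in for RSMeans features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

grid = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid={"n_estimators": [200, 500], "max_depth": [4, 6],
                "learning_rate": [0.05, 0.1]},                 # illustrative grid
    cv=5, scoring="r2",
)
grid.fit(X_tr, y_tr)

pred = grid.best_estimator_.predict(X_te)
r2 = r2_score(y_te, pred)
rmse = mean_squared_error(y_te, pred) ** 0.5
mbe = float(np.mean(pred - y_te))        # mean bias error: average signed deviation
print(f"R2={r2:.3f} RMSE={rmse:.3f} MBE={mbe:.3f}")
```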
In summary, this study presents a scalable and robust ML-based framework that facilitates process innovation in construction cost estimation, paving the way for more intelligent, efficient, and risk-aware construction project management.</p>2025-05-20T00:00:00+07:00Copyright (c) 2025 Lifei Chen, Sew Sun Tiang, Kim Soon Chong, Abhishek Sharma, Tarek Berghout, Wei Hong Limhttps://jeeemi.org/index.php/jeeemi/article/view/822Power-Efficient 8-Bit ALU Design Using Squirrel Search and Swarm Intelligence Algorithms2025-05-31T18:48:10+07:00Ashish Pasayaashishec447@gmail.comSarman Hadiaasso_s_k_hadia@gtu.edu.inKiritkumar Bhattkrbhatt2022@gmail.com<p><strong>The Arithmetic Logic Unit (ALU) serves as a core digital computing element that performs arithmetic functions along with logic operations. Conventional ALU designs suffer from increased power consumption because of signal redundancy and continuous operation even when new data inputs are unavailable. This research implements the Squirrel Search Algorithm (SSA) combined with the Swarm Intelligence Algorithm (SIA) for 8-bit ALU optimization to achieve maximum resource efficiency alongside computational accuracy. The optimization properties of SSA and SIA make them well suited to digital circuit design applications because they have yielded successful results in power-aware systems. The proposed method pairs SSA-based conditional execution with SIA-based transition minimization so that operations execute only when input data changes, thus eliminating unnecessary calculations. Studies confirm that SSA and SIA save power more effectively than distributed clock gating because they enable runtime-dependent optimization without significant computational overhead. Experimental tests in Xilinx Vivado on an AMD Spartan-7 FPGA (XC7S50FGGA484) running at 100 MHz established that SSA reduces power consumption from 6 mW to 2 mW, while SIA achieves a power level of 4 mW. The SSA algorithm yields a worst negative slack (WNS) of 8.740 ns, while SIA produces 6.531 ns, improving system timing performance. The SSA-optimized ALU requires the same number of LUTs as the unoptimized design (42), whereas SIA uses 50 LUTs because of added logic elements. Flip-flop usage is unchanged under SSA (nine FFs), while SIA increases usage to 29 FFs due to input tracking. The study shows that bio-inspired methods create energy-efficient platforms, making them well suited for implementing ALU designs on FPGAs. These results demonstrate that hybrid swarm intelligence techniques represent untapped potential for optimizing power-efficient architectures, reinforcing their significance for future high-performance, energy-efficient digital systems.</strong></p>2025-05-28T08:11:13+07:00Copyright (c) 2025 Sarman Hadia, Ashish Pasaya, Kiritkumar Bhatthttps://jeeemi.org/index.php/jeeemi/article/view/948Advanced Traffic Flow Optimization Using Hybrid Machine Learning and Deep Learning Techniques 2025-06-27T12:48:19+07:00Mohammed El Kaim Billahmohammed.kaimbillah@gmail.comAbdelfettah Mabroukmabroukdes@gmail.com<p><strong>Road traffic congestion remains a persistent and critical challenge in modern urban environments, adversely affecting travel times, fuel consumption, air quality, and overall urban livability. To address this issue, this study proposes a hybrid ensemble learning framework for accurate short-term traffic flow prediction across signalized urban intersections. The model integrates Random Forest, Gradient Boosting, and Multi-Layer Perceptron within a weighted voting ensemble mechanism, wherein model contributions are dynamically scaled based on individual validation performance, as sketched below.
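A minimal scikit-learn sketch of this validation-weighted voting mechanism follows; the stand-in data, member hyperparameters, and the clamping of negative R² weights are illustrative assumptions, not the authors' exact pipeline.

```python
# Sketch: RF + GB + MLP in a voting ensemble, each member weighted by its
# validation R2 (random data stands in for the sliding-window traffic features).
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, VotingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

X, y = np.random.rand(500, 12), np.random.rand(500)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

members = [("rf", RandomForestRegressor(random_state=0)),
           ("gb", GradientBoostingRegressor(random_state=0)),
           ("mlp", MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0))]

weights = []
for _, m in members:
    m.fit(X_tr, y_tr)
    weights.append(max(r2_score(y_val, m.predict(X_val)), 1e-3))  # scale by validation skill

ensemble = VotingRegressor(estimators=members, weights=weights)
ensemble.fit(X_tr, y_tr)   # member contributions now proportional to validation performance
```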
Benchmarking is performed against traditional and advanced baselines, including Linear Regression, Support Vector Regression, and Long Short-Term Memory (LSTM) networks. A real-world traffic dataset, comprising 56 consecutive days of readings from six intersections, is utilized to validate the approach. A robust preprocessing pipeline is implemented, encompassing anomaly detection, temporal feature engineering (notably time-of-day and day-of-week normalization), and sliding-window encoding to preserve temporal dependencies. Experimental evaluations on 4-intersection and 6-intersection scenarios reveal that the ensemble consistently outperforms all baselines, achieving a peak R² of 0.954 and an RMSE of 0.045. Statistical significance testing using Welch’s t-test confirms the reliability of these improvements. Furthermore, SHAP-based interpretability analysis reveals the dominant influence of temporal features during high-variance periods. While computational overhead and data sparsity during rare events remain limitations, the framework demonstrates strong applicability for deployment in smart traffic systems. Its predictive accuracy and model transparency make it a viable candidate for adaptive signal control, congestion mitigation, and urban mobility planning. Future work may explore real-time streaming adaptation, external event integration, and generalization across heterogeneous urban networks.</strong></p>2025-06-27T12:47:01+07:00Copyright (c) 2025 Mohammed El Kaim Billah, Abdelfettah Mabroukhttps://jeeemi.org/index.php/jeeemi/article/view/690Grad-CAM based Visualization for Interpretable Lung Cancer Categorization using Deep CNN Models2025-05-22T10:32:22+07:00Rashmi Mothkurrashmimothkur@gmail.comPullagura Soubhagyalakshmirashmimothkur@gmail.comSwetha C. B.rashmimothkur@gmail.com<p>The Grad-CAM (Gradient-weighted Class Activation Mapping) technique has emerged as a crucial tool for elucidating deep learning models, particularly convolutional neural networks (CNNs), by visually accentuating the regions of input images that contribute most to a model's predictions. In the context of lung cancer histopathological image classification, this approach provides insight into the decision-making process of models like InceptionV3, XceptionNet, and VGG19. These CNN architectures, renowned for their high performance in image categorization tasks, can be leveraged for automated diagnosis of lung cancer from histopathological images. By applying Grad-CAM to these models, heatmaps can be generated that reveal the areas of the tissue samples most influential in categorizing the images as lung adenocarcinoma, squamous cell carcinoma, or benign patches. This technique allows for the visualization of the network's focus on specific regions, such as cancerous cells or abnormal tissue structures, which may otherwise be difficult to interpret. Using pre-trained models fine-tuned for the task, the Grad-CAM method computes the gradients of the target class with respect to the final convolutional layer, generating a heatmap that can be overlaid on the input image. The results of Grad-CAM for InceptionV3, XceptionNet, and VGG19 offer distinct insights, as each model has unique characteristics.
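The Grad-CAM computation just described (gradients of the target-class score with respect to the final convolutional layer, pooled into channel weights and combined into a heatmap) can be sketched in Keras as follows; the convolutional layer name is model-specific and assumed here for illustration.

```python
# Minimal Grad-CAM sketch for a Keras CNN; conv_layer_name is model-specific
# (e.g., the last conv block of InceptionV3), given here as an assumption.
import tensorflow as tf

def grad_cam(model, image, conv_layer_name, class_idx):
    grad_model = tf.keras.Model(
        model.inputs, [model.get_layer(conv_layer_name).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[None, ...])   # add batch dimension
        score = preds[:, class_idx]                      # target-class score
    grads = tape.gradient(score, conv_out)               # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))         # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1)
    cam = tf.nn.relu(cam)[0]                             # keep positive evidence only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()   # normalized heatmap to overlay
```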
InceptionV3 focuses on multi-scale features, XceptionNet captures deeper patterns with separable convolutions, and VGG19 emphasizes simpler, more global attributes. By juxtaposing the heatmaps generated by each architecture, one can assess each model’s focus areas, facilitating better comprehension of, and confidence in, the model's predictions, which is crucial for clinical applications. Ultimately, the Grad-CAM approach not only enhances model transparency but also improves the interpretability of lung cancer diagnosis in histopathological image categorization.</p>2025-05-03T20:00:02+07:00Copyright (c) 2025 Rashmi Mothkur, Pullagura Soubhagyalakshmi, Swetha C. B.https://jeeemi.org/index.php/jeeemi/article/view/719Applied Machine Learning in EEG data Classification to Classify Major Depressive Disorder by Critical Channels2025-05-22T10:32:40+07:00Sudhir Dhekanesudhir.dhekane@djsce.ac.inAnand Khandareanand.khandare@thakureducation.org<p><strong>The electroencephalogram (EEG) stands out as a promising non-invasive tool for assessing depression. However, efficient channel selection is crucial for pinpointing the key channels that can differentiate between different stages of depression within the vast dataset. This study presents a comprehensive strategy for optimizing EEG channels to classify Major Depressive Disorder (MDD) using machine learning (ML) and deep learning (DL) approaches, and monitors the effect of central lobe channels. A thorough review underscores the vital significance of EEG channel selection in the analysis of mental disorders. Neglecting this optimization step could result in heightened computational expense, squandered resources, and potentially inaccurate classification results. Our assessment encompassed a range of techniques, such as Asymmetric Variance Ratio (AVR), Amplitude Asymmetry Ratio (AAR), Entropy-based selection employing the Probability Mass Function (PMF), and Recursive Feature Elimination (RFE), of which RFE exhibited superior performance, particularly in pinpointing the most pertinent EEG channels while including central lobe channels like Fz, Cz, and Pz. With this configuration, the Electroencephalography Neural Network (EEGNet) recorded accuracy between 97% and 99%. Our experimental findings indicate that models using RFE achieved improved accuracy in classifying depressive disorders across diverse classifiers: EEGNet (96%), Random Forest (95%), Long Short-Term Memory (LSTM, 97.4%), 1D-CNN (95%), and Multi-Layer Perceptron (98%), irrespective of central lobe incorporation. A pivotal contribution of this research is the development of a robust Multilayer Perceptron (MLP) model trained on EEG data from 382 participants, which achieved an accuracy of 98.7%, a perfect precision score of 1.00, an F1-score of 0.983, and a recall of 0.966, making it an enhanced technique for depression classification. Significant channels identified include Fp1, Fp2, F7, F4, F8, T3, C3, Cz, T4, T5, and P3, offering critical insights into depression.
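A minimal sketch of RFE-based channel selection of the kind described above, assuming scikit-learn and one aggregate feature per channel; the channel list, estimator, and stand-in data are illustrative, not the study's exact configuration.

```python
# Sketch: recursive feature elimination over per-channel EEG features.
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier

channels = ["Fp1", "Fp2", "F7", "F4", "F8", "T3", "C3", "Cz", "T4", "T5", "P3", "Pz", "Fz"]
X = np.random.rand(382, len(channels))   # subjects x per-channel features (stand-in)
y = np.random.randint(0, 2, 382)         # MDD vs. healthy labels (stand-in)

rfe = RFE(RandomForestClassifier(n_estimators=200, random_state=0),
          n_features_to_select=8)        # keep the 8 most discriminative channels
rfe.fit(X, y)
selected = [ch for ch, keep in zip(channels, rfe.support_) if keep]
print("Selected channels:", selected)
```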
Our findings show that optimized EEG channel selection via RFE enhances depression classification accuracy in the field of brain-computer interfaces.</strong></p>2025-05-05T21:22:13+07:00Copyright (c) 2025 SUDHIR DHEKANE, Anand Khandarehttps://jeeemi.org/index.php/jeeemi/article/view/784Computational Analysis of Medical Image Generation Using Generative Adversarial Networks (GANs)2025-05-22T10:33:19+07:00Shrina Patelshrinapatel310@yahoo.comAshwin Makwanaashwinmakwana.ce@charusat.ac.in<p>The limited availability of diverse, high-quality medical images constitutes a significant obstacle to training reliable deep-learning models that can be used in clinical settings. Traditional data augmentation methods generate inadequate medical images that result in poor model performance and weak generalization. This research studies the effectiveness of four GAN architectures, i.e., DCGAN, cGAN, CycleGAN, and SRGAN, through performance testing on five essential medical imaging datasets: Diabetic Retinopathy, Pneumonia, Brain Tumor, Skin Cancer, and Leukemia. The main achievement of this research was an extensive evaluation of these GAN models through three key criteria: generation results, training loss metrics, and computational resource utilization. DCGAN generated stable, high-quality synthetic images, with generator losses from 0.59 (Pneumonia) to 6.24 (Skin Cancer) and discriminator losses between 0.29 and 6.25. CycleGAN showed the best convergence potential for Diabetic Retinopathy, with generator and discriminator losses of 2.403 and 2.02, and for Leukemia, with losses of 3.325 and 3.129. The SRGAN network produced high-definition images at a generator loss of 6.253 and a discriminator loss of 6.119 for the Skin Cancer dataset, but it failed to maintain crucial medical characteristics in grayscale images. cGAN exhibited stable performance across all loss metrics and datasets. The DCGAN model required the lowest computing resources, training in 4 to 7 hours with 0.9M to 1.4M parameters. The SRGAN framework consumed between 7 and 10 hours and needed 1.7M to 2.3M parameters, and CycleGAN required identical computational resources. DCGAN was determined to be the ideal model for synthetic medical image generation since it presented an optimal combination of output quality and resource efficiency. The research indicates that using DCGAN-generated images to augment medical datasets is a viable solution for boosting AI-based diagnostic system capabilities within healthcare.</p>2025-05-08T22:19:35+07:00Copyright (c) 2025 Shrina Patel, Ashwin Makwanahttps://jeeemi.org/index.php/jeeemi/article/view/779Breast Cancer Classification on Ultrasound Images Using DenseNet Framework with Attention Mechanism2025-06-20T14:28:18+07:00Hanina Nafisa Azkahanina.nafisa.azka@student.uns.ac.idWiharto Wihartowiharto@staff.uns.ac.idEsti Suryaniestisuryani@staff.uns.ac.id<p><strong>Breast cancer is one of the most prevalent and life-threatening diseases among women worldwide, and early detection is critical for increasing survival rates. Ultrasound imaging is commonly used for breast cancer screening because it is non-invasive, safe, and cost-effective. However, ultrasound images are often of low quality and contain significant noise, which can hinder the effectiveness of classification models.
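Returning to the GAN comparison above: the generator and discriminator loss curves it reports come from an adversarial training loop like the following compact PyTorch sketch. Network definitions are elided; `G`, `D`, and the optimizers are assumed to be standard DCGAN components.

```python
# Sketch of one DCGAN training step; D is assumed to end in a sigmoid
# producing shape (batch, 1), per the standard DCGAN recipe.
import torch
import torch.nn.functional as F

def dcgan_step(G, D, real, opt_g, opt_d, z_dim=100):
    b = real.size(0)
    ones = torch.ones(b, 1, device=real.device)
    zeros = torch.zeros(b, 1, device=real.device)
    z = torch.randn(b, z_dim, 1, 1, device=real.device)
    fake = G(z)

    opt_d.zero_grad()                         # discriminator: real -> 1, fake -> 0
    d_loss = (F.binary_cross_entropy(D(real), ones) +
              F.binary_cross_entropy(D(fake.detach()), zeros))
    d_loss.backward()
    opt_d.step()

    opt_g.zero_grad()                         # generator: make D label fakes as real
    g_loss = F.binary_cross_entropy(D(fake), ones)
    g_loss.backward()
    opt_g.step()
    return g_loss.item(), d_loss.item()       # per-step losses form the reported curves
```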
This study proposes an enhanced breast cancer classification model that leverages transfer learning in combination with attention mechanisms to improve diagnostic performance. The main contribution of this research is the introduction of Dense-SASE, a novel architecture that combines DenseNet-121 with two powerful attention modules: Scaled Dot-Product Attention and the Squeeze-and-Excitation (SE) Block. These mechanisms are integrated to improve feature representation and allow the model to focus on the most relevant regions of the ultrasound images. The proposed method was evaluated on a publicly available breast ultrasound image dataset, with classification performed across three categories: normal, benign, and malignant. Experimental results demonstrate that the Dense-SASE model achieves an accuracy of 98.29%, a precision of 97.97%, a recall of 98.98%, and an F1-score of 98.44%. Additionally, Grad-CAM visualizations demonstrated the model's capability to localize lesion areas effectively, avoiding non-informative regions and confirming the model's interpretability. In conclusion, the Dense-SASE model significantly improves the accuracy and reliability of breast cancer classification in ultrasound images. By effectively learning and focusing on clinically relevant features, this approach offers a promising solution for computer-aided diagnosis (CAD) systems and has the potential to assist radiologists in early and accurate breast cancer detection.</strong></p>2025-05-09T19:27:11+07:00Copyright (c) 2025 Wiharto, Hanina Nafisa Azka, Esti Suryanihttps://jeeemi.org/index.php/jeeemi/article/view/829Classification of Cervical Cell Types Based on Machine Learning Approach: A Comparative Study2025-07-04T21:56:33+07:00Wan Azani Wan Mustafawanazani@unimap.edu.myKhalis Khiruddinkhalisdanial@studentmail.unimap.edu.myKhairur Rijal Jamaludinkhairur.kl@utm.myFirdaus Yuslan Khusairis201361529@studentmail.unimap.edu.myShahrina Ismailshahrinaismail@usim.edu.my<p>Cervical cancer remains a major global health issue and is the second most common cancer affecting women worldwide. Early detection is crucial for effective treatment but remains challenging due to the asymptomatic nature of the disease and the visual complexity of cervical cell structures, which are often affected by inconsistent staining, poor contrast, and overlapping cells. This study aims to classify cervical cell images using Artificial Intelligence (AI) techniques by comparing the performance of Convolutional Neural Networks (CNNs), Support Vector Machines (SVMs), and K-Nearest Neighbors (KNN). The Herlev Pap smear image dataset was used for experimentation. In the preprocessing phase, images were resized to 100 × 100 pixels and enhanced through grayscale conversion, Gaussian smoothing for noise reduction, contrast stretching, and intensity normalization. Segmentation was performed using region-growing and active contour methods to isolate cell nuclei accurately. All classifiers were implemented using MATLAB. Experimental results show that CNN achieved the highest performance, with an accuracy of 85%, a precision of 86.7%, and a sensitivity of 83%, outperforming both SVM and KNN. These findings indicate that CNN is the most effective approach for cervical cell classification in this study. However, limitations such as class imbalance and occasional segmentation inconsistencies impacted overall performance, particularly in detecting abnormal cells.
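The preprocessing pipeline described above (resize to 100 × 100, grayscale conversion, Gaussian smoothing, contrast stretching, intensity normalization) was implemented in MATLAB in the study; an equivalent Python/OpenCV sketch is shown below, with the percentile-based stretch as an assumption.

```python
# Sketch of the Pap smear preprocessing steps; file path and percentile
# bounds are illustrative.
import cv2
import numpy as np

def preprocess(path):
    img = cv2.imread(path)
    img = cv2.resize(img, (100, 100))
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    smooth = cv2.GaussianBlur(gray, (5, 5), 0)              # noise reduction
    lo, hi = np.percentile(smooth, (2, 98))
    stretched = np.clip((smooth - lo) * 255.0 / (hi - lo + 1e-8), 0, 255)  # contrast stretch
    return stretched.astype(np.float32) / 255.0             # intensity normalization to [0, 1]
```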
Future work will focus on improving classification accuracy, especially for abnormal samples, by exploring data augmentation techniques such as Generative Adversarial Networks (GANs) and implementing ensemble learning strategies. Additionally, integrating the proposed system into a real-time diagnostic platform using a graphical user interface (GUI) could support clinical decision-making and enhance cervical cancer screening programs.</p>2025-05-20T21:45:22+07:00Copyright (c) 2025 Wan Azani Mustafa, Khalis Khiruddin, Khairur Rijal Jamaludin, Firdaus Yuslan Khusairi and Shahrina Ismailhttps://jeeemi.org/index.php/jeeemi/article/view/893Dual Attention and Channel Atrous Spatial Pyramid Pooling Half-UNet for Polyp Segmentation2025-05-28T19:01:45+07:00Beatrix Datu Sarirabeatrixdatusarira@student.uns.ac.idHeri Prasetyoheri.prasetyo@staff.uns.ac.id<p><strong>Colorectal cancer (CRC) is a leading cause of cancer-related deaths, with two million cases detected in 2020 and one million deaths annually. Approximately 95% of CRC cases originate from colorectal adenomatous polyps. Early detection through accurate polyp segmentation is crucial for preventing and treating CRC effectively. While colonoscopy screening remains the primary detection method, its limitations have prompted the development of Computer-Aided Diagnostic (CAD) systems enhanced by deep learning models. This study proposes a novel neural network architecture called Dual Attention and Channel Atrous Spatial Pyramid Pooling Half-UNet (DACHalf-UNet) for medical polyp image segmentation that balances optimal performance with computational efficiency. The proposed model builds upon the U-Net framework by integrating Double Squeeze-and-Excitation (DSE) blocks in the encoder after the Ghost Module, Channel Atrous Spatial Pyramid Pooling (CASPP) in the bottleneck and decoder, and Attention Gate (AG) mechanisms within the architecture. DACHalf-UNet was trained and evaluated on the CVC-ClinicDB and Kvasir-SEG datasets for 70 epochs. Evaluations demonstrated superior performance with F1-Score and IoU values of 94.23% and 89.28% on CVC-ClinicDB, and 88.40% and 81.47% on Kvasir-SEG, respectively. Comparative analysis showed that DACHalf-UNet outperforms existing architectures including U-Net, U-Net++, ResU-Net, AGU-Net, CSAP-UNet, PRCNet, UNeXt, and UNeSt. Notably, the model achieves this performance with only 0.56 million trainable parameters and 30.29 GFLOPs, significantly reducing computational complexity compared to previous methods. These results demonstrate that DACHalf-UNet effectively addresses the need for accurate and efficient polyp segmentation, potentially enhancing CAD systems and contributing to improved CRC detection and treatment outcomes.</strong></p>2025-05-28T18:57:20+07:00Copyright (c) 2025 Beatrix Datu Sarira, Heri Prasetyohttps://jeeemi.org/index.php/jeeemi/article/view/713Performance Evaluation of Classification Algorithms for Parkinson’s Disease Diagnosis: A Comparative Study2025-05-30T19:29:30+07:00Dhiraj Baruahbaruahd5@gmail.comRizwan Rehmanrizwan@dibru.ac.inPranjal Kumar Borapranjaly2k@gmail.comPriyakshi Mahantapriyakshi.online@gmail.comKankana Duttakankanadutta@dibru.ac.inPinakshi Konwarpinakshikonwar@dibru.ac.in<p>Selection and implementation of classification algorithms, along with proper preprocessing methods, are important for the accuracy of predictive models. This paper compares several well-known and frequently used classification algorithms and performs an in-depth analysis.
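A minimal PyTorch sketch of the Squeeze-and-Excitation recalibration underlying the DSE blocks in the DACHalf-UNet abstract above; this is the generic SE block, not the authors' exact double-SE module.

```python
# Generic SE block: squeeze spatial dims, learn per-channel weights, rescale.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                     # x: (N, C, H, W)
        s = x.mean(dim=(2, 3))                # squeeze: global average pool
        w = self.fc(s)[:, :, None, None]      # excitation: per-channel weights in (0, 1)
        return x * w                          # recalibrate the feature maps
```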
In this study, we analyzed four of the most frequently used algorithms, viz. random forest (RF), decision tree (DT), logistic regression (LR), and support vector machine (SVM). The study was conducted on the well-known Oxford Parkinson’s Disease Detection dataset obtained from the UCI Machine Learning Repository. We evaluated the algorithms' performance using six distinct approaches. First, we applied the classifiers without any method to enhance their performance. Second, we applied Principal Component Analysis (PCA) to reduce the dimensionality of the dataset. Third, we used a collinearity-based feature elimination (CFE) method, computing correlations among the features and eliminating one feature from any pair whose correlation exceeded a threshold of 0.9. Fourth, we adopted the synthetic minority oversampling technique (SMOTE) to synthetically increase the instances of the minority class. Fifth, we combined PCA + SMOTE, and sixth, we combined CFE + SMOTE. The study demonstrates that SVM is highly effective for Parkinson’s disease classification. SVM maintained high accuracy, precision, recall, and F1-score across the various preprocessing techniques, including PCA, CFE, and SMOTE, making it robust and reliable for clinical applications. RF showed improved results with SMOTE. However, it experienced reduced performance with PCA and CFE, indicating its dependence on original feature interactions. DT benefited from PCA, while LR showed limited improvements and sensitivity to oversampling. These findings emphasize the importance of selecting appropriate preprocessing techniques to enhance model performance.</p>2025-05-30T19:23:32+07:00Copyright (c) 2025 Dhiraj Baruah, Rizwan Rehman, Pranjal Kumar Bora, Priyakshi Mahanta, Kankana Dutta, Pinakshi Konwarhttps://jeeemi.org/index.php/jeeemi/article/view/904Performance Comparison of Extreme Learning Machine (ELM) and Hierarchical Extreme Learning Machine (H-ELM) Methods for Heart Failure Classification on Clinical Health Datasets2025-06-01T05:49:00+07:00Ichwan Dwi Nugrahaichwandwinugraha@gmail.comTriando Hamonangan Saragihtriando.saragih@ulm.ac.idIrwan Budimanirwan.budiman@ulm.ac.idDwi Kartinidwikartini@ulm.ac.idFatma Indrianif.indriani@ulm.ac.idWahyu Caesarendrawahyucaesarendra@gmail.com<p>Heart failure is one of the leading causes of death worldwide and requires accurate and timely diagnosis to improve patient outcomes. However, early detection remains a significant challenge due to the complexity of clinical data, the high dimensionality of features, and variability in patient conditions. Traditional clinical methods often fall short in identifying subtle patterns that indicate early stages of heart failure, motivating the need for intelligent computational techniques to support diagnostic decisions. This study aims to enhance predictive modeling for heart failure classification by comparing two supervised machine learning approaches: Extreme Learning Machine (ELM) and Hierarchical Extreme Learning Machine (HELM). The main contribution of this research is the empirical evaluation of HELM's performance improvements over conventional ELM using 10-fold cross-validation on a publicly available clinical dataset. Unlike traditional neural networks, ELM offers fast training by randomly assigning weights and analytically computing output connections, while HELM extends this with a multi-layer structure that allows for more complex feature representation and improved generalization.
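The ELM training scheme just described (random, untrained hidden weights with output weights computed analytically) reduces to a few lines of NumPy; the hidden-layer size and activation here are illustrative. HELM, by contrast, stacks several such randomly initialized layers before the analytic output step.

```python
# Sketch of ELM: random input weights, analytic output weights via the
# Moore-Penrose pseudo-inverse (no gradient descent involved).
import numpy as np

def elm_train(X, y_onehot, n_hidden=100, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights, never trained
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    beta = np.linalg.pinv(H) @ y_onehot           # analytic least-squares output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)
```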
Both models were assessed based on classification accuracy and Area Under the Curve (AUC), two critical metrics in medical classification tasks. The ELM model achieved an accuracy of 73.95% ± 8.07 and an AUC of 0.7614 ± 0.093, whereas the HELM model obtained a comparable accuracy of 73.55% ± 7.85 but with a higher AUC of 0.7776 ± 0.085. In several validation folds, HELM outperformed ELM, notably reaching 90% accuracy and 0.9250 AUC in specific cases. In conclusion, HELM demonstrates improved robustness and discriminatory capability in identifying heart failure cases. These findings suggest that HELM is a promising candidate for implementation in clinical decision support systems. Future research may incorporate feature selection, hyperparameter optimization, and evaluation across multi-center datasets to improve generalizability and real-world applicability.</p>2025-05-31T17:01:08+07:00Copyright (c) 2025 Ichwan Dwi Nugraha, Triando Hamonangan Saragih, Irwan Budiman, Dwi Kartini, Fatma Indriani, Wahyu Caesarendrahttps://jeeemi.org/index.php/jeeemi/article/view/704Advancement of Lung Cancer Diagnosis with Transfer Learning: Insights from VGG16 Implementation 2025-06-01T20:58:03+07:00Vedavrath Lakidevedavrathlakide@gmail.comV. Ganesanvganesh1711@gmail.com<p><strong>Lung cancer continues to be one of the leading causes of cancer-related mortality globally, largely due to the challenges associated with its early and accurate detection. Timely diagnosis is critical for improving survival rates, and advances in artificial intelligence (AI), particularly deep learning, are proving to be valuable tools in this area. This study introduces an enhanced deep learning-based approach for lung cancer classification using the VGG16 neural network architecture. While previous research has demonstrated the effectiveness of ResNet-50 in this domain, the proposed method leverages the strengths of VGG16, particularly its deep architecture and robust feature extraction capabilities, to improve diagnostic performance. To address the limitations posed by scarce labelled medical imaging data, the model incorporates transfer learning and fine-tuning techniques. It was trained and validated on a well-curated dataset of lung CT images. The VGG16 model achieved a high training accuracy of 99.09% and a strong validation accuracy of 95.41%, indicating its ability to generalize well across diverse image samples. These results reflect the model’s capacity to capture intricate patterns and subtle features within medical imagery, which are often critical for accurate disease classification. A comparative evaluation between VGG16 and ResNet-50 reveals that VGG16 outperforms its predecessor in terms of both accuracy and reliability. The improved performance underscores the potential of the proposed approach as a reliable and scalable AI-driven diagnostic solution. Overall, this research highlights the growing role of deep learning in enhancing clinical decision-making, offering a promising path toward earlier detection of lung cancer and ultimately contributing to better patient outcomes</strong>.</p>2025-06-01T00:00:00+07:00Copyright (c) 2025 Vedavrath Lakide and V. Ganesan
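The transfer-learning recipe described in the VGG16 abstract above (freeze the pre-trained convolutional base, train a new classification head, then unfreeze the top layers for fine-tuning) can be sketched in Keras as follows; the input size, head layout, learning rates, and two-class output are common defaults assumed for illustration, not the authors' exact configuration.

```python
# Sketch of VGG16 transfer learning with staged fine-tuning.
import tensorflow as tf

base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False                        # stage 1: freeze the convolutional base

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(2, activation="softmax"),   # cancer vs. normal (assumed)
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Stage 2 (after the head converges): unfreeze the top block at a lower rate.
for layer in base.layers[-4:]:
    layer.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```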
https://jeeemi.org/index.php/jeeemi/article/view/838Exploring Dataset Variability in Diabetic Retinopathy Classification Using Transfer Learning Approaches2025-06-07T22:33:23+07:00Kinjal Patnikinjalpatni11@gmail.comShruti Yagnikshrutiyagnik.ce@indusuni.ac.inPratik Patelpratik.patel2988@paruluniversity.ac.in<p>Diabetic retinopathy (DR) is a leading global cause of vision impairment and requires effective, rapid diagnostic services to protect the eye from progressive deterioration. Variations in imaging data between sources create major obstacles to achieving consistent model performance. This work examines how to eliminate performance fluctuations in DR classification across two benchmark datasets, EYE-PACS and APTOS, through a systematic transfer learning analysis of high-performing CNN architectures including VGG16, VGG19, ResNet50, Xception, InceptionV3, MobileNetV2, and InceptionResNetV2. The research evaluates how data heterogeneity and augmentation approaches affect the accuracy and robustness of deep learning models, providing new insights through an extensive investigation of generalization performance under dataset shifts using data augmentation methods adapted for retinal images. Data transformations such as rotation, flipping, zooming, and brightness modification simulate realistic scenarios and help handle imbalanced classes. Pre-trained CNNs were fine-tuned via transfer learning on both datasets, and the models were evaluated on both original and augmented test images. InceptionResNetV2 outperformed its counterparts with 96.2% accuracy and Xception delivered 95.7% accuracy on APTOS, while the two models scored 95.9% and 95.4%, respectively, on EYE-PACS. Augmentation increased performance by 3% to 5% across all models. The experimental outcomes demonstrate that sufficiently varied training allows these models to generalize across heterogeneous datasets. This analysis confirms that combining reliable deep learning architectures with purposeful data augmentation substantially enhances the reliability of DR diagnosis, paving the way for scalable diagnostic solutions in ophthalmology practice.</p>2025-06-07T05:28:28+07:00Copyright (c) 2025 Kinjal Patni, Shruti Yagnik, Pratik Patelhttps://jeeemi.org/index.php/jeeemi/article/view/877BHMI: A Multi-Sensor Biomechanical Human Model Interface for Quantifying Ergonomic Stress in Armored Vehicle 2025-07-04T20:54:51+07:00Giva Andriana Mutiaragivamz@telkomuniversity.ac.idHardy Adiluhunghardydil@telkomuniversity.ac.idPeriyadi Periyadiperiyadi@telkomuniversity.ac.idMuhammad Rizqy Alfarisimrizkyalfarisi@telkomuniversity.ac.idLisda Meisarohlisdameisaroh@telkomuniversity.ac.id<p>Ergonomic stress inside armored military vehicles presents a critical yet often overlooked risk to soldier safety, operational effectiveness, and long-term health. Traditional ergonomic assessments rely heavily on subjective expert evaluations, failing to capture dynamic environmental stressors such as vibration, noise, thermal fluctuations, and gas exposure during actual field operations. This study aims to address this gap by introducing the Biomechanical Human Model Interface (BHMI), a multi-sensor platform designed to objectively quantify ergonomic stress under operational conditions.
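Returning to the diabetic-retinopathy study above: its augmentation recipe (rotation, flips, zoom, brightness shifts) can be sketched with Keras' ImageDataGenerator; the parameter values and dataset path are illustrative assumptions.

```python
# Sketch of fundus-image augmentation for DR training data.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,
    horizontal_flip=True,
    vertical_flip=True,
    zoom_range=0.15,
    brightness_range=(0.8, 1.2),
    rescale=1.0 / 255,
)
# flow_from_directory pairs each image with its DR-grade folder (path assumed).
train_iter = augmenter.flow_from_directory("aptos/train", target_size=(299, 299),
                                           batch_size=32, class_mode="categorical")
```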
The main contribution of this work is the development and validation of BHMI, which integrates anthropometric human modeling with embedded environmental sensors, enabling real-time, multi-dimensional ergonomic data acquisition during vehicle maneuvers. BHMI was deployed in high-speed off-road vehicle operations, simulating the seated posture of the 50th-percentile Indonesian soldier. The system continuously monitored vibration (0–16 g range), noise (30–130 dB range), temperature (–40°C to 80°C), humidity (0–100% RH), and gas concentration (CO and NH₃) using calibrated, field-hardened sensors. Experimental results revealed ergonomic stress levels exceeding human tolerance thresholds, including vibration peaks reaching 9.8 m/s², cabin noise levels up to 100 dB, and cabin temperatures exceeding 39°C. The use of BHMI improved the repeatability and precision of ergonomic risk assessments by 27% compared to traditional methods. Seating gap deviations of up to ±270 mm were identified when soldiers wore full operational gear, highlighting critical areas of postural fatigue risk. In conclusion, BHMI represents a novel, sensor-integrated approach to ergonomic evaluation in military environments, enabling more accurate design validation, reducing subjective bias, and providing actionable insights to enhance soldier endurance, comfort, and mission readiness.</p>2025-06-09T13:23:21+07:00Copyright (c) 2025 Giva Andriana Mutiara, Hardy Adiluhung, Periyadi Periyadi, Muhammad Rizqy Alfarisi, Lisda Meisarohhttps://jeeemi.org/index.php/jeeemi/article/view/934AMIN-CNN: Enhancing Brain Tumor Segmentation through Modality-Aware Normalization and Deep Learning2025-07-04T16:26:37+07:00Sivakumar Depurusiva.depur@gmail.comM. Sunil Kumarsunilmalchi@gmail.com<p>Accurate segmentation for reliable brain tumor detection is essential for early diagnosis and treatment, helping to increase patient survival rates. However, the inherent variability in tumor shape, size, and intensity across different MRI modalities makes automated segmentation a challenging task. Traditional deep learning approaches, such as U-Net and its variants, provide robust results but often struggle with modality-specific inconsistencies and generalization across diverse datasets. This research presents AMIN-CNN, an adaptive multimodal invariant normalization scheme incorporated into a novel 3D convolutional neural network to improve brain tumor segmentation across various MRI modalities. Through adaptive normalization, AMIN-CNN handles modality-specific differences more effectively than a basic CNN or U-Net, leading to improved integration of multimodal MRI input data. The model maintains strong learning performance, with mild overfitting beyond epoch 50 that regularization techniques can reduce. AMIN-CNN stands out with the best Dice scores (about 0.92 WT, 0.87 ET, and 0.89 TC), precision (0.3), and an accuracy of 93.2%, and it decreases false positives. AMIN-CNN's lower sensitivity reflects its identification of smaller but more accurate tumor regions, making it more precise. Compared with traditional methods, AMIN-CNN demonstrates competitive or better segmentation results while maintaining computational efficiency. The model has also demonstrated strong robustness, with a Hausdorff Distance of 20, compared to 100 for other models.
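Two building blocks behind the AMIN-CNN evaluation above can be sketched in NumPy: per-modality intensity normalization of a multimodal MRI volume (a simple z-score stand-in for the paper's adaptive normalization) and the Dice score used for the WT/ET/TC regions.

```python
# Sketch: modality-wise normalization and Dice coefficient for binary masks.
import numpy as np

def normalize_per_modality(volume):          # volume: (modalities, D, H, W)
    out = np.empty_like(volume, dtype=np.float32)
    for m in range(volume.shape[0]):         # z-score each MRI modality separately
        v = volume[m]
        out[m] = (v - v.mean()) / (v.std() + 1e-8)
    return out

def dice(pred, target):                      # pred, target: boolean masks
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + 1e-8)
```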
According to these test results, AMIN-CNN is the most effective and clinically reliable method among the compared architectures, mainly due to its high precision and its ability to measure tumors accurately.</p>2025-07-03T22:04:29+07:00Copyright (c) 2025 Sivakumar Depuru, M. Sunil Kumarhttps://jeeemi.org/index.php/jeeemi/article/view/947Advanced Deep Learning for Stroke Classification Using Multi-Slice CT Image Analysis2025-07-03T23:14:21+07:00Fouzi Lezzarfouzi.lezzar@univ-constantine2.dzSeif Eddine Milimili.seifeddine@ensc.dz<p>Brain stroke is a leading cause of mortality and disability globally, necessitating rapid and accurate diagnosis for timely intervention. While Computed Tomography (CT) imaging is the gold standard for stroke detection, manual interpretation is time-consuming, prone to error, and subject to inter-observer variability. Although deep learning models have shown promise in automating stroke detection, many rely on 2D analysis, ignore 3D spatial relationships, or require labour-intensive slice-level annotations, which limits their scalability and clinical applicability. To address these challenges, we propose MedHybridNet, a novel hybrid deep learning architecture that integrates convolutional neural networks (CNNs) for local feature extraction with Transformer-based modules to model global contextual dependencies across volumetric brain scans. Our main contribution is twofold: (1) the SliceAttention mechanism, which dynamically identifies diagnostically relevant slices using only patient-level labels, eliminating the need for costly slice-level annotations while enhancing interpretability through attention maps and Grad-CAM visualizations; and (2) a cGAN-based augmentation strategy that generates high-quality, pathology-informed synthetic CT slices to overcome data scarcity and class imbalance. The framework processes complete 3D brain volumes, leveraging both CNNs and Transformers in a dual-path design, and incorporates hierarchical attention for refined feature selection and classification. Evaluated via patient-wise 5-fold cross-validation on a real-world dataset of 2501 CT scans from 82 patients, MedHybridNet achieves an accuracy of 98.31%, outperforming existing methods under weak supervision. These results demonstrate its robustness, generalization capability, and superior interpretability. By combining architectural innovation with clinically relevant design choices, MedHybridNet advances the integration of Artificial Intelligence (AI) into real-world stroke care, offering a scalable, accurate, and explainable solution that can significantly improve diagnostic efficiency and patient outcomes in routine clinical practice.</p>2025-07-03T00:00:00+07:00Copyright (c) 2025 Fouzi Lezzar, Seif Eddine Milihttps://jeeemi.org/index.php/jeeemi/article/view/949Improving Accuracy and Efficiency of Medical Image Segmentation Using One-Point-Five U-Net Architecture with Integrated Attention and Multi-Scale Mechanisms2025-07-04T07:13:36+07:00Muhammad Anang Fathur Rohmananangmuhammad245@student.uns.ac.idHeri Prasetyoheri.prasetyo@staff.uns.ac.idEry Permana Yudhaerypermana@staff.uns.ac.idChih-Hsien Hsiahsiach@niu.edu.tw<p>Medical image segmentation is essential for supporting computer-aided diagnosis (CAD) systems by enabling accurate identification of anatomical and pathological structures across various imaging modalities.
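The SliceAttention idea in the MedHybridNet abstract above follows the general pattern of attention pooling over slice embeddings, sketched generically in PyTorch below: per-slice relevance scores aggregate a CT volume into one patient-level representation, so only patient-level labels are needed and the weights double as an interpretability map. This illustrates the mechanism class, not the authors' exact module.

```python
# Generic attention pooling over slice features from a CNN backbone.
import torch
import torch.nn as nn

class SlicePooling(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)                # one relevance score per slice

    def forward(self, slice_feats):                   # slice_feats: (n_slices, dim)
        attn = torch.softmax(self.score(slice_feats), dim=0)   # inspectable weights
        pooled = (attn * slice_feats).sum(dim=0)      # patient-level representation
        return pooled, attn
```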
However, automated medical image segmentation remains challenging due to low image contrast, significant anatomical variability, and the need for computational efficiency in clinical applications. Furthermore, the scarcity of annotated medical images, due to high labelling costs and the requirement of expert knowledge, further complicates the development of robust segmentation models. This study aims to address these challenges by proposing One-Point-Five U-Net, a novel deep learning architecture designed to improve segmentation accuracy while maintaining computational efficiency. The main contribution of this work lies in the integration of multiple advanced mechanisms into a compact architecture: ghost modules, Multi-scale Residual Attention (MRA), Enhanced Parallel Attention (EPA) in skip connections, the Convolutional Block Attention Module (CBAM), and Multi-scale Depthwise Convolution (MSDC) in the decoder. The proposed method was trained and evaluated on four public datasets: CVC-ClinicDB, Kvasir-SEG, BUSI, and ISIC2018. One-Point-Five U-Net achieved sensitivity, specificity, accuracy, DSC, and IoU of 94.89%, 99.63%, 99.23%, 95.41%, and 91.27% on CVC-ClinicDB; 91.11%, 98.60%, 97.33%, 90.93%, and 83.84% on Kvasir-SEG; 85.35%, 98.65%, 96.81%, 87.02%, and 78.18% on BUSI; and 87.67%, 98.11%, 93.68%, 89.27%, and 83.06% on ISIC2018. These results outperform several state-of-the-art segmentation models. In conclusion, One-Point-Five U-Net demonstrates superior segmentation accuracy with only 626,755 parameters and 28.23 GFLOPs, making it a highly efficient and effective model for clinical implementation in medical image analysis.</p>2025-07-04T06:50:17+07:00Copyright (c) 2025 Muhammad Anang Fathur Rohman, Heri Prasetyo, Ery Permana Yudha, Chih-Hsien Hsiahttps://jeeemi.org/index.php/jeeemi/article/view/588Combination Of Gamma Correction and Vision Transformer In Lung Infection Classification On CT-Scan Images2025-07-09T20:46:20+07:00Lucky Indra KesumaLuckyindra25@gmail.comPipin Octavia pipinoctavia@unisti.ac.idPurwita Sari wita@ilkom.unsri.ac.idGracia Mianda Caroline Batubaragraciamianda2002@gmail.comKarina Karinakorikarina28@gmail.com<p>Lung infection is an inflammatory condition of the lungs with a high mortality rate. Lung infections can be identified using CT-Scan images, where the affected areas are analyzed to determine the infection type. However, manual interpretation of CT-Scan results by medical specialists is often time-consuming, subjective, and demands a high level of accuracy. To address these challenges, this study proposes an automated classification method for lung infections using deep learning techniques. Convolutional Neural Networks (CNNs) are widely used for image classification tasks. However, CNNs operate locally with limited receptive fields, making it challenging to capture global patterns in complex lung CT images. CNNs also struggle to model long-range pixel dependencies, which are crucial for analyzing visually similar regions in lung CT-Scans. This study uses a Vision Transformer (ViT) to overcome these CNN limitations. ViT employs self-attention mechanisms to capture global dependencies across the entire image. The main contribution of this study is the implementation of ViT to enhance classification performance on lung CT-Scan images by capturing complex, global image patterns that CNNs fail to model. However, ViT requires a large dataset to perform optimally.
To overcome these challenges, augmentation techniques such as flipping, rotation, and gamma correction are applied to increase the amount of data without altering the important features. The dataset comprises lung CT-scan images sourced from Kaggle and is divided into Covid and Non-Covid classes. The proposed method demonstrated excellent classification performance, achieving accuracy, sensitivity, specificity, precision, and F1-Score above 90%. Additionally, Cohen's kappa coefficient reached 89%. These results show that the proposed method effectively classifies lung infections using CT-Scan images and has strong potential as a clinical decision-support tool, particularly in reducing diagnostic time and improving consistency in medical evaluations.</p>2025-07-09T20:40:27+07:00Copyright (c) 2025 Lucky Indra Kesuma, Pipin Octavia, Purwita Sari, Gracia Mianda Caroline Batubara, Karinahttps://jeeemi.org/index.php/jeeemi/article/view/792A Quantum Convolutional Neural Network for Breast Cancer Classification using Boruta and GA-Based Feature Selection with Quantum Feature Maps2025-07-13T08:05:29+07:00Veeranjaneyulu Pagadalapagadalaveeru@gmail.comVenkatesh Bvenkatesh.cse88@gmail.comSindhu Boinapallisindhu1209@gmail.comRamya Krishna Dhulipalladramyakrishna28@gmail.comS Annapoornaalekya.venky58@gmail.com<p>Accurate and computationally efficient classification systems are essential for the early detection of breast cancer, particularly when dealing with complex and high-dimensional medical datasets. Traditional machine learning models often face limitations in capturing intricate nonlinear relationships inherent in such data, potentially compromising diagnostic performance. In this study, we introduce QBG-QCNN, a quantum-enhanced framework named the Boruta-GA optimized Quantum Convolutional Neural Network, designed for breast cancer classification. The model is trained on the Breast Cancer Wisconsin (Diagnostic) Dataset, which contains 30 numerical features extracted from fine needle aspiration (FNA) images of breast tissue samples. To reduce dimensionality while preserving critical diagnostic information, a hybrid Boruta-GA feature selection strategy is applied to extract key features such as radius_mean, texture_mean, area_mean, and concavity_mean. These selected features are then encoded into a 4-qubit quantum circuit using the ZZFeatureMap quantum feature map together with the RealAmplitudes and EfficientSU2 variational circuits, eliminating the need for manual feature engineering. The encoded quantum data is processed through a QCNN that incorporates quantum convolution, pooling, and parameterized ansatz layers, leveraging quantum entanglement and parallelism for more efficient learning. Implemented using PennyLane and IBM Qiskit, and optimized with the COBYLA optimizer, the model achieves outstanding performance: 94.3% accuracy, 95.2% precision, 94.6% recall, and a 93.0% F1-score. These results significantly outperform those of classical CNNs, standard QNNs, and other hybrid models. In conclusion, QBG-QCNN demonstrates that quantum machine learning, when integrated with intelligent feature selection, offers a powerful, scalable, and interpretable solution for early-stage breast cancer diagnosis.
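The 4-qubit encoding named in the QBG-QCNN abstract above can be assembled from Qiskit's circuit library as follows; the circuit depths (reps) are illustrative, and the trainable ansatz parameters are what a classical optimizer such as COBYLA would tune.

```python
# Sketch: 4-qubit data-encoding feature map composed with a variational ansatz.
from qiskit.circuit.library import ZZFeatureMap, RealAmplitudes

feature_map = ZZFeatureMap(feature_dimension=4, reps=2)   # encodes the 4 selected features
ansatz = RealAmplitudes(num_qubits=4, reps=3)             # trainable variational layers
circuit = feature_map.compose(ansatz)                     # full parameterized circuit
print(circuit.num_parameters, "input + trainable parameters")
```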
Future research will extend this framework to multi-modal datasets and to deployment on real quantum devices under noise constraints.</p>2025-07-13T08:04:35+07:00Copyright (c) 2025 Veeranjaneyulu Pagadala, Venkatesh B, Sindhu Boinapalli, Ramya Krishna Dhulipalla, S Annapoornahttps://jeeemi.org/index.php/jeeemi/article/view/788EEG Performance Signal Analysis for Diagnosing Autism Spectrum Disorder using Butterworth and Empirical Mode Decomposition2025-07-16T18:06:45+07:00Imam Fathur Rahmanimamfr@mhs.usk.ac.idMelindamelinda@usk.ac.idMuhammad Irhamsyahirham.ee@usk.ac.idYunidar Yunidaryunidar@usk.ac.idYudha Nurdinyudha.nurdin@usk.ac.idW.K. WongWeiKitt.w@curtin.edu.myLailatul Qadri ZakariaWeiKitt.w@curtin.edu.my<p>Electroencephalography (EEG) is a technique used to measure electrical activity in the brain by placing electrodes on the scalp. EEG plays an essential role in analyzing a variety of neurological conditions, including autism spectrum disorder (ASD). However, during recording, EEG signals are often contaminated by noise, hindering further analysis. Therefore, an effective signal processing method is needed to improve data quality before feature extraction is performed. This study applied the Butterworth Band-Pass Filter (BPF) as a preprocessing method to reduce noise in EEG signals and then used the Empirical Mode Decomposition (EMD) method to extract relevant features. The performance of this method was evaluated using three main parameters, namely Mean Square Error (MSE), Mean Absolute Error (MAE), and Signal-to-Noise Ratio (SNR). The results showed that EMD was able to retain important information in EEG signals better than filtering with the BPF alone. EMD produces lower MAE and MSE values than Butterworth, suggesting that this method is more accurate in maintaining the original shape of the signal. In subject 3, EMD recorded the lowest MAE of 0.622 compared to Butterworth, which reached 20.0, and an MSE of 0.655 compared to 771.5 for Butterworth. In addition, EMD also produced a higher SNR, with the highest value of 23.208 in subject 5, compared to Butterworth, which reached only 1.568. These results demonstrate that the combination of BPF as a preprocessing method and EMD as a feature extraction method is more effective in maintaining EEG signal quality and improving analysis accuracy than the use of the Butterworth Band-Pass Filter alone.</p>2025-07-16T17:58:35+07:00Copyright (c) 2025 Imam Fathur Rahman, Melinda, Muhammad Irhamsyah, Yunidar Yunidar, Yudha Nurdin, W.K. Wong, Lailatul Qadri Zakariahttps://jeeemi.org/index.php/jeeemi/article/view/835Addressing Intrinsic Data Characteristics Issues of Imbalance Medical Data Using Nature Inspired Percolation Clustering2025-06-20T14:36:17+07:00Kaikashan Siddavatamkaikashansiddavatam@gmail.comSubhash Shindeskshinde@ltce.in<p><strong>Data on diseases are generally skewed towards either positive or negative cases, depending on their prevalence. The problem of imbalance can significantly impact the performance of classification models, resulting in biased predictions and reduced model accuracy for the underrepresented class. Other factors that affect the performance of classifiers include intrinsic data characteristics, such as noise, outliers, and within-class imbalance, which complicate the learning task.
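The Butterworth band-pass preprocessing and the MAE/MSE/SNR evaluation used in the EEG study above can be sketched with SciPy; the pass band, sampling rate, and the particular SNR formulation here are common defaults assumed for illustration, not necessarily the study's exact settings.

```python
# Sketch: zero-phase Butterworth band-pass filtering of one EEG channel,
# plus simple MAE/MSE/SNR scoring (noise taken as the removed component).
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(x, fs=256.0, lo=1.0, hi=40.0, order=4):
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)                     # filtfilt gives zero phase distortion

def snr_db(reference, processed):
    noise = reference - processed
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12))

raw = np.random.randn(2560)                      # stand-in for a recorded EEG channel
filtered = bandpass(raw)
mae = np.mean(np.abs(raw - filtered))
mse = np.mean((raw - filtered) ** 2)
print(f"MAE={mae:.3f} MSE={mse:.3f} SNR={snr_db(raw, filtered):.3f} dB")
```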
Contemporary imbalance handling techniques employ clustering with SMOTE (Synthetic Minority Oversampling Technique) to generate realistic synthetic data that preserves the underlying data distribution, generalizes to unseen data, and mitigates overfitting to noisy points. Centroid-based clustering methods (e.g., K-means) often produce synthetic samples that are too clustered or poorly spaced, while density-based methods (e.g., DBSCAN) may fail to generate sufficient meaningful synthetic samples in sparse regions. This work aims to develop a nature-inspired clustering method that, combined with SMOTE, generates synthetic samples that adhere to the underlying data distribution and maintain sparsity among the data points, thereby enhancing classifier performance. We propose PC-SMOTE, which leverages Percolation Clustering (PC), a novel clustering algorithm inspired by percolation theory. PC utilizes a connectivity-driven framework to effectively handle irregular cluster shapes, varying densities, and sparse minority instances. The experiment was designed using a hybrid approach: first, PC-SMOTE was assessed using synthetically generated data with variable spread and other parameters; second, the algorithm was evaluated on eight real medical datasets. The results show that the PC-SMOTE method works excellently for the Breast Cancer, Parkinson's, and Cervical Cancer datasets, where AUC is in the range of 96% to 99%, which is high compared to the other two methods. This demonstrates the effectiveness of the PC-SMOTE algorithm in handling datasets with both low and high imbalance ratios; it often shows competitive or superior performance compared to K-means and DBSCAN combined with SMOTE in terms of AUC, F1-score, G-mean, and PR-AUC.</strong></p>2025-06-05T00:00:00+07:00Copyright (c) 2025 Kaikashan Siddavatam, Subhash Shindehttps://jeeemi.org/index.php/jeeemi/article/view/775Automated ICD Medical Code Generation for Radiology Reports using BioClinicalBERT with Multi-Head Attention Network 2025-06-13T18:08:23+07:00Sasikala D.sasikalaradhasri.puscholar2019@gmail.comSarrvesh N.nsarrvesh6710@gmail.comSabarinath J.sabarinath.jsn@gmail.comTheetchenya S.theetchenya@gmail.comKalavathi S.kalavathi@svce.ac.in<p><strong>International Classification of Diseases (ICD) coding plays a pivotal role in healthcare systems by providing a standard method for classifying medical diagnoses, treatments, and procedures. However, the process of manually applying ICD codes to clinical records is both time-consuming and error-prone, particularly considering the large magnitude of medical terminologies and the periodic changes to the coding system. This work introduces a Hierarchical Multi-Head Attention Network (HMHAN) that aims to automate ICD coding using domain-related embeddings with an attention mechanism. The proposed method uses BioClinicalBERT for feature extraction from clinical text and then a two-level attention mechanism to learn hierarchical dependencies between labels. BioClinicalBERT is pre-trained on large biomedical and clinical corpora, enabling it to capture complex contextual relationships specific to medical language more effectively. The multi-head attention mechanism enables the model to focus on different parts of the input text simultaneously, learning intricate associations between medical terms and corresponding ICD codes at various levels.
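The BioClinicalBERT feature-extraction step just described can be sketched with the Hugging Face transformers library; "emilyalsentzer/Bio_ClinicalBERT" is the widely used public checkpoint and is assumed here, and the sample report text is invented for illustration.

```python
# Sketch: extract contextual token features from a radiology report.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
bert = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

report = "Chest radiograph shows right lower lobe consolidation consistent with pneumonia."
inputs = tok(report, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state    # (1, seq_len, 768) token features
# These token-level features would then feed the hierarchical multi-head
# attention layers that score each candidate ICD code.
```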
This method uses SMOTE-based (Synthetic Minority Oversampling Technique) multi-label resampling to address class imbalance. SMOTE generates synthetic examples for underrepresented classes, allowing the model to learn better from imbalanced data without overfitting. For this work, the MIMIC-IV dataset of de-identified radiology reports and corresponding ICD codes is used. The performance of the model is assessed with F1 score, Hamming loss, and ROC-AUC metrics. The model achieved an F1 score of 0.91, a Hamming loss of 0.07, and a ROC-AUC of 0.92, demonstrating a promising direction for automating the ICD coding process. This system will improve the effectiveness of healthcare workflows by automating ICD code generation for advanced clinical care.</strong></p>2025-06-13T05:57:12+07:00Copyright (c) 2025 Sasikala D., Sarrvesh N., Sabarinath J., Theetchenya S., Kalavathi S.https://jeeemi.org/index.php/jeeemi/article/view/946SympTextML: Leveraging Natural Language Symptom Descriptions for Accurate Multi-Disease Prediction2025-07-14T20:55:42+07:00Dhairya Vyasdhairya.vyas-cse@msubaroda.ac.inMilind Shahmilindshahcomputer@gmail.comHarsh Kantawalaharsh.kantawala@cvmu.edu.inBrijesh Patelbrijes.patel@cvmu.edu.inTejas Pateltejas.patel@cvmu.edu.inJalaja Enamalajalajae@gmail.com<p>This research presents an AI-driven framework for multi-disease classification using natural language symptom descriptions, optimized through large language model (LLM)-oriented preprocessing techniques. The proposed system integrates essential NLP steps (text normalization, lemmatization, and n-gram vectorization) to convert unstructured clinical symptom data into machine-readable form. A publicly available dataset comprising 8,498 samples across ten common diseases, including pneumonia, heart attack, diabetes, stroke, asthma, and depression, was used for training and evaluation. Data balancing and cleaning ensured uniform class representation with 1,200 samples per disease category. The processed dataset was subjected to supervised machine learning models, including SVM, KNN, Decision Tree, Random Forest, and Extra Trees, to identify the most effective classifier. Experimental results, obtained in Google Colab, showed that ensemble models (Random Forest and Extra Trees) significantly outperformed the others, achieving 99% accuracy, precision, recall, and F1-scores, while SVM and Decision Tree followed closely with 98% performance across metrics. Notably, the models consistently predicted pneumonia with high confidence for relevant input queries, validating the framework's robustness. This work demonstrates the efficacy of integrating LLM-compatible preprocessing with traditional ML classifiers for accurate disease detection based on symptom narratives. The proposed approach serves as a foundational step toward developing scalable, intelligent healthcare support systems capable of real-time disease prediction and decision-making assistance.</p>2025-07-14T20:53:20+07:00Copyright (c) 2025 Dhairya Vyas, Milind Shah, Harsh Kantawala, Brijesh Patel, Tejas Patel, Jalaja Enamala
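A minimal scikit-learn sketch of a SympTextML-style pipeline as described above: normalization and n-gram vectorization feeding an ensemble classifier. The lemmatization step is elided, and the two training samples and parameters are invented for illustration.

```python
# Sketch: TF-IDF n-gram features over symptom text, classified by a forest.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

texts = ["fever cough chest pain shortness of breath",
         "increased thirst frequent urination fatigue"]
labels = ["pneumonia", "diabetes"]

clf = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2), stop_words="english"),
    RandomForestClassifier(n_estimators=300, random_state=0),
)
clf.fit(texts, labels)
print(clf.predict(["persistent cough and high fever with chest pain"]))
```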