Robustness Under Attack: Assessing Adversarial Fragility in Deep Learning Models for COVID-19 Radiography Prediction

Keywords: adversarial attacks, chest X-ray, convolutional neural network, COVID-19, Grad-CAM

Abstract

Deep learning, especially Convolutional Neural Network (CNN) architectures, has significantly improved medical image analysis for predicting lung diseases such as pneumonia and COVID-19 from chest X-ray (CXR) images. However, despite achieving high diagnostic precision, CNN models remain highly susceptible to adversarial attacks: small, visually imperceptible perturbations optimized to exploit non-linear decision boundaries and induce high-confidence mispredictions. This vulnerability is a critical concern in clinical settings, where such diagnostic errors directly compromise patient safety. This paper systematically implements white-box adversarial attacks to quantify the resilience of CNN models in multi-class CXR image classification, using the COVID-19 Radiography Dataset, which comprises four diagnostic categories: COVID-19, Lung Opacity, Normal, and Viral Pneumonia. A DenseNet-121 architecture was employed for feature extraction, and the trained model was subsequently subjected to Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks under varying L∞-bounded epsilon settings. The experiments reveal three critical findings: 1) sub-pixel adversarial perturbations cause severe performance degradation, with the PGD attack constrained at an epsilon of 0.1/255 reducing global model accuracy from a baseline of 95.42% to 25.32%; 2) iterative attacks (PGD) represent the worst case for model reliability by efficiently discovering gaps in the high-dimensional decision manifold, whereas the model shows relative resilience to linear, single-step FGSM perturbations; and 3) Gradient-weighted Class Activation Mapping (Grad-CAM) analysis verifies that this performance collapse is accompanied by a deterministic semantic shift, displacing the model's spatial attention from clinically relevant pulmonary regions toward spurious background noise. In conclusion, this paper empirically demonstrates that, despite high accuracy on clean data, unprotected CNNs remain fundamentally unsafe for autonomous clinical deployment due to their acute vulnerability to gradient-based perturbations, necessitating the future integration of robust adversarial training frameworks.
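
For illustration, the sketch below shows how the two attacks named in the abstract are commonly implemented in PyTorch against a trained classifier such as DenseNet-121. It is a minimal sketch under stated assumptions, not a reproduction of the paper's pipeline: inputs are assumed to be scaled to [0, 1], the loss is cross-entropy, the model is in eval mode, and the helper names fgsm_attack and pgd_attack (and the alpha/steps defaults) are illustrative. FGSM takes a single step x_adv = x + ε·sign(∇x L(f(x), y)); PGD iterates smaller steps of size α and projects the result back into the L∞ ε-ball around the clean input.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon):
    # Single-step FGSM: shift every pixel by +/- epsilon along the sign
    # of the input gradient of the classification loss.
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()

def pgd_attack(model, x, y, epsilon, alpha, steps=10):
    # Iterative PGD: gradient-sign steps of size alpha, each projected
    # back into the L-inf ball of radius epsilon around the clean input.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        F.cross_entropy(model(x_adv), y).backward()
        with torch.no_grad():
            x_adv = x_adv + alpha * x_adv.grad.sign()
            x_adv = x + (x_adv - x).clamp(-epsilon, epsilon)  # projection step
            x_adv = x_adv.clamp(0.0, 1.0)
    return x_adv.detach()

For a four-class DenseNet-121 classifier (e.g., torchvision.models.densenet121 with its classifier head replaced), a call such as pgd_attack(model, x, y, epsilon=0.1/255, alpha=0.1/255/4, steps=10) would perform the kind of worst-case evaluation the abstract reports. On the [0, 1] pixel scale, 0.1/255 ≈ 3.9e-4 per pixel, which is why such perturbations are described as visually imperceptible.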

Published
2026-04-28

How to Cite
[1] M. H. Kamil, E. P. T. Farma, and S. Basuki, “Robustness Under Attack: Assessing Adversarial Fragility in Deep Learning Models for COVID-19 Radiography Prediction”, j.electron.electromedical.eng.med.inform, vol. 8, no. 2, pp. 769–788, Apr. 2026.

Section
Medical Engineering