Hybrid CNN-Transformer Architecture for Robust Liver Tumor Segmentation in 2D CT Slices

Huda Dham Bader; Mohammed Sabah Jarjees

doi:10.35882/jeeemi.v8i3.1602

Huda Dham Bader, Department of Medical Physiology, College of Medicine, University of Mosul, Mosul, Iraq. https://orcid.org/0009-0005-7620-846X
Mohammed Sabah Jarjees, Department of Medical Instrumentation Techniques Engineering, Technical Engineering College of Mosul, Northern Technical University, Mosul, Iraq. https://orcid.org/0000-0002-0340-9085

Keywords: Medical image segmentation, CNN, Transformer architecture, Liver CT imaging, Tumor detection

Abstract

Liver tumor segmentation from CT scans is complicated by class imbalance, low contrast, and small lesion size, while manual segmentation is time-consuming and subject to inter-observer variability. We propose a 2D hybrid CNN-Transformer model with 20.3M parameters, built as an encoder–decoder with four transformer layers (8 attention heads, feedforward dimension 2048). The model processes 2D axial slices because of GPU memory limits. The loss function combines Cross-Entropy, Dice, and Focal losses, with focal parameters α = 0.25 and γ = 2.0. Preprocessing includes CLAHE (clip limit = 2.0, 8×8 tiles) and gamma correction (γ = 1.2). From the LiTS dataset (131 volumes), 11 volumes comprising 1,688 slices were selected based on tumor presence, annotation quality, and the exclusion of volumes with artifacts. A patient-level split of 80% for training, 10% for validation, and 10% for testing was used to prevent data leakage. The model achieved liver Dice = 0.916 ± 0.122 and tumor Dice = 0.810 ± 0.304, with bootstrapped 95% confidence intervals (1,000 resamples) of [0.897, 0.934] for liver and [0.765, 0.856] for tumor. The best validation results, at epoch 98, were liver Dice = 0.938, tumor Dice = 0.823, and accuracy = 0.992. Pixel accuracy reached 99.20% but was not used as the main metric because of class imbalance: background pixels account for more than 90% of all pixels. An ablation study showed that CLAHE and gamma correction improved tumor Dice by 8.6% and liver Dice by 3.3% relative to a baseline without preprocessing. These results indicate that the model segments liver and tumor effectively on a LiTS subset, but external validation on the full dataset and on multi-center data is required before clinical use.
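
To illustrate how figures such as the reported Dice scores and bootstrapped 95% confidence intervals can be computed, the following is a minimal sketch and not the authors' code. It assumes a percentile bootstrap over per-slice Dice scores, which the abstract does not specify. The function names `dice_score` and `bootstrap_ci`, the fixed seed, and the convention that two empty masks score 1.0 are illustrative assumptions.

```python
import random
from statistics import mean


def dice_score(pred, target):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as perfect agreement.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def bootstrap_ci(scores, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of the scores."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(scores)
    # Resample the scores with replacement and record the mean each time.
    means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    # Indices of the lower and upper percentiles in the sorted means.
    lo_idx = max(0, round((1 - level) / 2 * n_resamples))
    hi_idx = min(n_resamples - 1, round((1 + level) / 2 * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


if __name__ == "__main__":
    # Toy masks: one overlapping pixel, 2 predicted, 1 true.
    print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))  # 2*1/(2+1), about 0.667

    # Hypothetical per-slice tumor Dice values, for illustration only.
    per_slice = [0.91, 0.88, 0.0, 0.95, 0.79, 0.93, 0.86, 1.0]
    low, high = bootstrap_ci(per_slice, n_resamples=1000)
    print(f"mean={mean(per_slice):.3f}, 95% CI=[{low:.3f}, {high:.3f}]")
```

A wide standard deviation alongside a fairly narrow interval, as in the reported tumor Dice, is expected with this kind of procedure: the interval describes uncertainty in the mean, which shrinks with the number of resampled cases, while the standard deviation describes the spread of individual scores.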
