Comparative Analysis of Attention Mechanisms in Pix2Pix for Multimodal MRI Fusion

Ali-Abdelatif Betouil; Abdelmadjid Benmachiche; Khadija Rais; Amel Sahki; Imene Soualmia

doi:10.35882/jeeemi.v8i3.1720

Ali-Abdelatif Betouil, Laboratory of Computer Science and Applied Mathematics, Dept. of Computer Science, Faculty of Science and Technology, Chadli Bendjedid University, El-Tarf, Algeria. https://orcid.org/0000-0003-0859-6948
Abdelmadjid Benmachiche, Laboratory of Computer Science and Applied Mathematics, Dept. of Computer Science, Faculty of Science and Technology, Chadli Bendjedid University, El-Tarf, Algeria. https://orcid.org/0000-0002-0690-2625
Khadija Rais, Laboratory of Mathematics, Informatics and Systems (LAMIS), Echahid Cheikh Larbi Tebessi University, Tebessa, Algeria. https://orcid.org/0009-0004-3907-7782
Amel Sahki, Laboratory of Computer Science and Applied Mathematics, Dept. of Computer Science, Faculty of Science and Technology, Chadli Bendjedid University, El-Tarf, Algeria. https://orcid.org/0009-0008-9431-5914
Imene Soualmia, Laboratory of Computer Science and Applied Mathematics, Dept. of Computer Science, Faculty of Science and Technology, Chadli Bendjedid University, El-Tarf, Algeria. https://orcid.org/0009-0005-1872-1986

Abstract

Medical image fusion (MIF) is a key technique in medical imaging: it combines complementary information from different imaging modalities to improve diagnostic accuracy, particularly for lesion detection and treatment planning. Deep learning has advanced this area considerably, with generative models and transformers improving fidelity and accuracy. However, existing studies of how attention mechanisms affect these models are typically limited to a single attention type or a single architectural placement. This paper presents a comparative analysis of Pix2Pix architectures augmented with three attention mechanisms (spatial attention, channel attention (Squeeze-and-Excitation), and self-attention), each tested under three placement strategies (encoder-only, decoder-only, and encoder-decoder). Experiments use the BraTS2020 dataset, with training supervised by a pseudo-ground-truth derived from arithmetic averaging. We fused six MRI modality pairs (FLAIR-T1, FLAIR-T1ce, FLAIR-T2, T1-T1ce, T1-T2, T1ce-T2) and evaluated the results with metrics including SSIM, PSNR, NMI, entropy, and Q^AB/F. In all cases, integrating attention can significantly improve fusion quality over the baseline methods, including cGAN and standard Pix2Pix. Spatial attention with encoder-decoder placement gives the best results, with SSIM values up to 0.91 and PSNR above 25 dB for the heterogeneous modality pair FLAIR-T1. Channel attention and self-attention are also effective, especially with encoder-decoder placement. These findings offer practical guidance for designing attention-based systems for multimodal medical image fusion (MMIF) and underline the importance of matching the attention design to the nature of the modalities being fused. The study may serve as a foundation for future research.
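
To make the experimental setup concrete, the following is a minimal sketch of three ingredients named in the abstract: enumerating the six modality pairs, building a pseudo-ground-truth by arithmetic averaging, and computing PSNR. It is an illustration under stated assumptions, not the authors' implementation. The function names (`modality_pairs`, `pseudo_ground_truth`, `psnr`) are hypothetical, and the slices are assumed to be co-registered, equally sized, and normalized to [0, 1]; the abstract does not specify the preprocessing.

```python
from itertools import combinations

import numpy as np

# The four MRI sequences named in the abstract.
MODALITIES = ["FLAIR", "T1", "T1ce", "T2"]


def modality_pairs(modalities=MODALITIES):
    """Return every unordered pair of modalities (4 choose 2 = 6 pairs)."""
    return list(combinations(modalities, 2))


def pseudo_ground_truth(slice_a, slice_b):
    """Pixel-wise arithmetic mean of two co-registered slices.

    Assumption: both inputs share the same shape and intensity scale.
    """
    a = np.asarray(slice_a, dtype=np.float64)
    b = np.asarray(slice_b, dtype=np.float64)
    if a.shape != b.shape:
        raise ValueError("slices must have the same shape")
    return (a + b) / 2.0


def psnr(reference, fused, data_range=1.0):
    """Peak signal-to-noise ratio in dB.

    Assumption: intensities lie in [0, data_range].
    """
    ref = np.asarray(reference, dtype=np.float64)
    out = np.asarray(fused, dtype=np.float64)
    mse = np.mean((ref - out) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random arrays stand in for real MRI slices.
    flair = rng.random((64, 64))
    t1 = rng.random((64, 64))
    target = pseudo_ground_truth(flair, t1)
    print(modality_pairs())
    print(psnr(target, flair))
```

In a supervised setup like the one described, `target` would play the role of the training label for the generator, and `psnr` would be one of several scores computed between a fused output and a reference image.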
