Comparative Analysis of Attention Mechanisms in Pix2Pix for Multimodal MRI Fusion
Abstract
Medical image fusion (MIF) is a key technique in medical imaging that combines complementary information from different imaging modalities, improving diagnostic accuracy, particularly for lesion detection and treatment planning. Deep learning has significantly advanced this area, with generative models and transformers improving fidelity and accuracy. However, existing studies of how attention mechanisms influence these models are largely limited to a single attention type or a single architectural placement. This paper presents a comparative analysis of Pix2Pix architectures augmented with three attention mechanisms (spatial attention, channel attention (Squeeze-and-Excitation), and self-attention), each tested under three placement strategies (encoder-only, decoder-only, and encoder-decoder). Experiments use the BraTS2020 dataset, with training supervised by a pseudo-ground-truth derived from arithmetic averaging. We fused six MRI modality pairs (FLAIR-T1, FLAIR-T1ce, FLAIR-T2, T1-T1ce, T1-T2, T1ce-T2) and evaluated the results with SSIM, PSNR, NMI, entropy, and QAB/F. Results show that, in all cases, attention integration significantly improves fusion quality over baseline methods, including cGAN and standard Pix2Pix. Spatial attention with encoder-decoder placement gives the best results, with SSIM values up to 0.91 and PSNR above 25 dB for the heterogeneous modality pair FLAIR-T1. Channel attention and self-attention are also effective, especially with encoder-decoder placement. These findings offer practical guidance for designing attention-based systems for multimodal medical image fusion (MMIF) and underline the importance of matching the attention design to the nature of the modalities. The study may serve as a foundation for future research.
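As an illustration of the setup described in the abstract, the following is a minimal NumPy sketch of three of its ingredients: enumerating the six modality pairs, building an averaging-based pseudo-ground-truth, and computing PSNR. It is not the authors' implementation. The function names are hypothetical, and the sketch assumes that the average is taken pixel-wise over the two source images of a pair and that intensities are normalized to [0, 1]; the abstract does not state these details.

```python
from itertools import combinations

import numpy as np

# The four BraTS MRI modalities named in the abstract.
MODALITIES = ["FLAIR", "T1", "T1ce", "T2"]


def modality_pairs(modalities=MODALITIES):
    """Return all unordered modality pairs (six pairs for four modalities)."""
    return list(combinations(modalities, 2))


def pseudo_ground_truth(img_a, img_b):
    """Pixel-wise arithmetic mean of two co-registered images.

    Assumption: the pseudo-ground-truth is the plain average of the two
    source images of a pair; the abstract only says "arithmetic averaging".
    """
    return (img_a.astype(np.float64) + img_b.astype(np.float64)) / 2.0


def psnr(reference, test, data_range=1.0):
    """Peak signal-to-noise ratio in dB.

    Assumption: intensities are normalized so that data_range is 1.0.
    """
    diff = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-ins for two normalized MRI slices (not real data).
    flair = rng.random((240, 240))
    t1 = rng.random((240, 240))
    target = pseudo_ground_truth(flair, t1)
    print(modality_pairs())
    print(f"PSNR of FLAIR vs. averaged target: {psnr(target, flair):.2f} dB")
```

In a training loop, `target` would play the role of the supervision image for the generator, and `psnr` would be applied to the generator output rather than to a source image as in this toy usage.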
Copyright (c) 2026 Ali-Abdelatif Betouil, Abdelmadjid Benmachiche, Khadija Rais, Amel Sahki, Imene Soualmia

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

