Comparative Analysis of YOLO11 and Mask R-CNN for Automated Glaucoma Detection
Abstract
Glaucoma is a progressive optic neuropathy and a major cause of irreversible blindness. Early detection is crucial, yet current practice depends on manual estimation of the vertical cup-to-disc ratio (vCDR), which is subjective and inefficient. Automated fundus image analysis provides scalable solutions but is challenged by low optic cup contrast, dataset variability, and the need for clinically interpretable outcomes. This study aimed to develop and evaluate an automated glaucoma screening pipeline based on optic disc (OD) and optic cup (OC) segmentation, comparing a single-stage model (YOLO11-Segmentation) with a two-stage model (Mask R-CNN with ResNet50-FPN) and validating screening decisions against a vCDR threshold of 0.7. The contributions are fourfold: establishing a benchmark comparison of YOLO11 and Mask R-CNN across three datasets (REFUGE, ORIGA, G1020); linking segmentation accuracy to vCDR-based screening; analyzing precision–recall trade-offs between the models; and providing a reproducible baseline for future studies. The pipeline employed standardized preprocessing (optic nerve head cropping, resizing to 1024×1024, conservative augmentation). YOLO11 was trained for 200 epochs and Mask R-CNN for 75 epochs. Evaluation metrics included Dice, Intersection over Union (IoU), mean absolute error (MAE), correlation, and classification performance. Results showed that Mask R-CNN achieved higher disc Dice (0.947 in G1020, 0.938 in REFUGE) and recall (0.880 in REFUGE), while YOLO11 attained stronger vCDR correlation (r = 0.900 in ORIGA) and perfect precision (1.000 in G1020). Overall accuracy exceeded 0.92 in REFUGE and G1020. In conclusion, YOLO11 favored conservative screening with fewer false positives, while Mask R-CNN improved sensitivity. These complementary strengths highlight the importance of matching model selection to the screening context and suggest future research on hybrid frameworks and multimodal integration.
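To make the screening logic concrete, the sketch below shows how a vCDR can be derived from binary OD/OC masks and compared against the 0.7 cut-off, alongside the Dice and IoU overlap metrics named above. This is a minimal illustration under stated assumptions, not the authors' released pipeline: the function names are ours, and the masks are assumed to be binary NumPy arrays taken from either model's segmentation output.

```python
import numpy as np

VCDR_THRESHOLD = 0.7  # screening cut-off used in the study

def vertical_extent(mask: np.ndarray) -> float:
    """Vertical (row-wise) extent of a binary mask, in pixels."""
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing any foreground
    return float(rows[-1] - rows[0] + 1) if rows.size else 0.0

def vcdr(cup: np.ndarray, disc: np.ndarray) -> float:
    """Vertical cup-to-disc ratio: cup height divided by disc height."""
    disc_h = vertical_extent(disc)
    return vertical_extent(cup) / disc_h if disc_h else 0.0

def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice overlap between predicted and ground-truth binary masks."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 2.0 * inter / total if total else 1.0

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over Union between predicted and ground-truth masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return inter / union if union else 1.0

def screen(cup: np.ndarray, disc: np.ndarray) -> str:
    """Flag an eye as a glaucoma suspect when vCDR >= 0.7."""
    return "suspect" if vcdr(cup, disc) >= VCDR_THRESHOLD else "normal"
```

Because these helpers operate only on binary masks, the same vCDR and overlap criteria can be applied to YOLO11 and Mask R-CNN outputs alike, which is what enables the head-to-head comparison reported in the abstract.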
References
A. A. Jafer Chardoub, M. Zeppieri, and K. Blair, Juvenile Glaucoma. StatPearls Publishing, 2024.
X. Cao, X. Sun, S. Yan, and Y. Xu, “A narrative review of glaucoma screening from fundus images,” Ann. Eye Sci., vol. 6, p. 27, 2021, doi: 10.21037/aes-2020-lto-005.
L. Wang et al., “Automated segmentation of the optic disc from fundus images using an asymmetric deep learning network,” Pattern Recognit., vol. 112, p. 107810, 2021, doi: 10.1016/j.patcog.2020.107810.
J. Shen, Y. Hu, X. Zhang, Y. Gong, R. Kawasaki, and J. Liu, “Structure-Oriented Transformer for retinal diseases grading from OCT images,” Comput. Biol. Med., vol. 152, p. 106445, 2023, doi: 10.1016/j.compbiomed.2022.106445.
T. Nazir, A. Irtaza, and V. Starovoitov, “Optic Disc and Optic Cup Segmentation for Glaucoma Detection from Blur Retinal Images Using Improved Mask-RCNN,” Int. J. Opt., vol. 2021, Art. no. 6641980, pp. 1–12, 2021, doi: 10.1155/2021/6641980.
N. Chen and X. Lv, “Research on segmentation model of optic disc and optic cup in fundus,” BMC Ophthalmol., vol. 24, no. 1, 2024, doi: 10.1186/s12886-024-03532-4.
E. Moris et al., “Assessing Coarse-to-Fine Deep Learning Models for Optic Disc and Cup Segmentation in Fundus Images,” arXiv preprint arXiv:2209.14383, 2022. [Online]. Available: https://arxiv.org/abs/2209.14383
A. Bansal, J. Kubíček, M. Penhaker, and M. Augustynek, “A comprehensive review of optic disc segmentation methods in adult and pediatric retinal images: from conventional methods to artificial intelligence (CR-ODSeg-AP-CM2AI),” Artif. Intell. Rev., vol. 58, no. 4, 2025, doi: 10.1007/s10462-024-11056-y.
A. K. Chaurasia et al., “Highly accurate and precise automated cup-to-disc ratio quantification for glaucoma screening,” Ophthalmol. Sci., vol. 4, no. 5, p. 100540, 2024, doi: 10.1016/j.xops.2024.100540.
M. Khanna, L. K. Singh, S. Thawkar, and M. Goyal, “Deep learning based computer-aided automatic prediction and grading system for diabetic retinopathy,” Multimed. Tools Appl., vol. 82, no. 25, pp. 39255–39302, 2023, doi: 10.1007/s11042-023-14970-5.
Z. D. Soh et al., “Asian-specific vertical cup-to-disc ratio cut-off for glaucoma screening: An evidence-based recommendation from a multi-ethnic Asian population,” Clin. Exp. Ophthalmol., vol. 48, no. 9, pp. 1210–1218, 2020, doi: 10.1111/ceo.13836.
B. P. Yap et al., “Generalizability of Deep Neural Networks for Vertical Cup-to-Disc Ratio Estimation in Ultra-Widefield and Smartphone-Based Fundus Images,” Transl. Vis. Sci. Technol., vol. 13, no. 4, p. 6, 2024, doi: 10.1167/tvst.13.4.6.
C. Mishra and K. Tripathy, Fundus Camera. StatPearls Publishing, 2023. [Online]. Available: https://www.ncbi.nlm.nih.gov/books/NBK585111/
U. Iqbal, “Smartphone fundus photography: a narrative review,” Int. J. Retin. Vitr., vol. 7, no. 1, pp. 1–8, 2021, doi: 10.1186/s40942-021-00313-9.
S. Molière et al., “Reference standard for the evaluation of automatic segmentation algorithms: Quantification of inter observer variability of manual delineation of prostate contour on MRI,” Diagn. Interv. Imaging, vol. 105, no. 2, pp. 65–73, 2023, doi: 10.1016/j.diii.2023.08.001.
M. L. Ali and Z. Zhang, “The YOLO Framework: A Comprehensive Review of Evolution, Applications, and Benchmarks in Object Detection,” Computers, vol. 13, no. 12, p. 336, 2024, doi: 10.3390/computers13120336.
M. N. Bajwa, G. A. P. Singh, W. Neumeier, M. I. Malik, A. Dengel, and S. Ahmed, “G1020: A Benchmark Retinal Fundus Image Dataset for Computer-Aided Glaucoma Detection,” arXiv preprint arXiv:2006.09158, 2020. [Online]. Available: https://arxiv.org/abs/2006.09158
J. I. Orlando et al., “REFUGE Challenge: A unified framework for evaluating automated methods for glaucoma assessment from fundus photographs,” Med. Image Anal., vol. 59, p. 101570, 2020, doi: 10.1016/j.media.2019.101570.
Z. Zhang et al., “ORIGA(-light): an online retinal fundus image database for glaucoma analysis and research,” in Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2010, pp. 3065–3068. doi: 10.1109/IEMBS.2010.5626137.
F. Isensee, P. F. Jaeger, S. A. A. Kohl, J. Petersen, and K. H. Maier-Hein, “nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation,” Nat. Methods, vol. 18, no. 2, pp. 203–211, 2021, doi: 10.1038/s41592-020-01008-z.
A. Septiarini, H. Hamdani, E. Setyaningsih, E. Junirianto, and F. Utaminingrum, “Automatic Method for Optic Disc Segmentation Using Deep Learning on Retinal Fundus Images,” Healthc. Inform. Res., vol. 29, no. 2, pp. 145–151, 2023, doi: 10.4258/hir.2023.29.2.145.
O. Kovalyk, J. Morales-Sánchez, R. Verdú-Monedero, I. Sellés-Navarro, A. Palazón-Cabanes, and J.-L. Sancho-Gómez, “PAPILA: Dataset with fundus images and clinical data of both eyes of the same patient for glaucoma assessment,” Sci. Data, vol. 9, no. 1, pp. 1–7, 2022, doi: 10.1038/s41597-022-01388-1.
Y. Xu, R. Quan, W. Xu, Y. Huang, X. Chen, and F. Liu, “Advances in Medical Image Segmentation: A Comprehensive Review of Traditional, Deep Learning and Hybrid Approaches,” Bioengineering, vol. 11, no. 10, p. 1034, 2024, doi: 10.3390/bioengineering11101034.
K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2961–2969. doi: 10.1109/ICCV.2017.322.
A. Dutta and A. Zisserman, “The VIA Annotation Software for Images, Audio and Video,” in Proceedings of the 27th ACM International Conference on Multimedia, 2019, pp. 2276–2279. doi: 10.1145/3343031.3350535.
T.-Y. Lin et al., “Microsoft COCO: Common Objects in Context,” in European Conference on Computer Vision (ECCV), 2014, pp. 740–755. doi: 10.1007/978-3-319-10602-1_48.
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788. doi: 10.1109/CVPR.2016.91.
A. C. Thompson, A. A. Jammal, S. I. Berchuck, E. B. Mariottoni, and F. A. Medeiros, “Assessment of a Segmentation-Free Deep Learning Algorithm for Diagnosing Glaucoma From Optical Coherence Tomography Scans,” JAMA Ophthalmol., vol. 138, no. 4, pp. 333–340, 2020, doi: 10.1001/jamaophthalmol.2019.5983.
M. Carranza-García, J. Torres-Mateo, P. Lara-Benítez, and J. García-Gutiérrez, “On the Performance of One-Stage and Two-Stage Object Detectors in Autonomous Vehicles Using Camera Data,” Remote Sens., vol. 13, no. 1, p. 89, 2020, doi: 10.3390/rs13010089.
M. Song, L. Das, and K. Comuy, “Enhancing Retinal Imaging with Data Augmentation and Preprocessing,” 2024. [Online]. Available: https://www.researchgate.net/publication/387751248
Y. Shi, W. Wang, M. Yuan, and X. Wang, “Self-Paced Dual-Axis Attention Fusion Network for Retinal Vessel Segmentation,” Electronics, vol. 12, no. 9, p. 2107, 2023, doi: 10.3390/electronics12092107.
X. R. Gao, F. Wu, P. T. Yuhas, R. K. Rasel, and M. Chiariglione, “Automated vertical cup-to-disc ratio determination from fundus images for glaucoma detection,” Sci. Rep., vol. 14, no. 1, pp. 1–11, 2024, doi: 10.1038/s41598-024-55056-y.
D. Bolya, C. Zhou, F. Xiao, and Y. J. Lee, “YOLACT: Real-time Instance Segmentation,” arXiv preprint arXiv:1904.02689, 2019. [Online]. Available: https://arxiv.org/abs/1904.02689
M. Hussain, “YOLOv5, YOLOv8 and YOLOv10: The Go-To Detectors for Real-time Vision,” arXiv preprint arXiv:2407.02988, 2024. [Online]. Available: https://arxiv.org/html/2407.02988v1
K. He, G. Gkioxari, P. Dollar, and R. Girshick, “Mask R-CNN,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 386–397, 2020, doi: 10.1109/TPAMI.2018.2844175.
Y. Zhang, J. Chu, L. Leng, and J. Miao, “Mask-Refined R-CNN: A Network for Refining Object Details in Instance Segmentation,” Sensors, vol. 20, no. 4, p. 1010, 2020, doi: 10.3390/s20041010.
V. K. Velpula, J. Vadlamudi, P. P. Kasaraneni, and Y. V. P. Kumar, “Automated Glaucoma Detection in Fundus Images Using Comprehensive Feature Extraction and Advanced Classification Techniques,” in ECSA-11, Basel, Switzerland: MDPI, Nov. 2024, p. 33. doi: 10.3390/ecsa-11-20437.
F. Renard, S. Guedria, N. D. Palma, and N. Vuillerme, “Variability and reproducibility in deep learning for medical image segmentation,” Sci. Rep., vol. 10, no. 1, pp. 1–16, 2020, doi: 10.1038/s41598-020-69920-0.
Ultralytics, “Configuration,” 2023. [Online]. Available: https://docs.ultralytics.com/usage/cfg/
PyTorch Contributors, “TorchVision Object Detection Finetuning Tutorial,” 2023. [Online]. Available: https://docs.pytorch.org/tutorials/intermediate/torchvision_tutorial.html
H. Fu et al., “A Retrospective Comparison of Deep Learning to Manual Annotations for Optic Disc and Optic Cup Segmentation in Fundus Photographs,” Transl. Vis. Sci. Technol., vol. 9, no. 2, p. 33, 2020, doi: 10.1167/tvst.9.2.33.
Y. Gao, X. Yu, C. Wu, W. Zhou, X. Wang, and Y. Zhuang, “Accurate Optic Disc and Cup Segmentation from Retinal Images Using a Multi-Feature Based Approach for Glaucoma Assessment,” Symmetry, vol. 11, no. 10, p. 1267, 2019, doi: 10.3390/sym11101267.
H. Alanazi, “Optimizing Medical Image Analysis: A Performance Evaluation of YOLO-Based Segmentation Models,” Int. J. Adv. Comput. Sci. Appl., vol. 16, no. 4, 2025, doi: 10.14569/ijacsa.2025.01604111.
Viso.ai, “Understanding Intersection over Union for Model Accuracy,” 2024. [Online]. Available: https://viso.ai/computer-vision/intersection-over-union-iou/
F. Wu, M. Chiariglione, and X. R. Gao, “Automated Optic Disc and Cup Segmentation for Glaucoma Detection from Fundus Images Using the Detectron2’s Mask R-CNN,” in 2022 International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT), 2022, pp. 1–6. doi: 10.1109/ISMSIT56059.2022.9932660.
S. Saha, J. Vignarajan, and S. Frost, “A fast and fully automated system for glaucoma detection using color fundus photographs,” Sci. Rep., vol. 13, no. 1, pp. 1–12, 2023, doi: 10.1038/s41598-023-44473-0.
A. Aljohani and R. Y. Aburasain, “A hybrid framework for glaucoma detection through federated machine learning and deep learning models,” BMC Med. Inform. Decis. Mak., vol. 24, no. 1, pp. 1–12, 2024, doi: 10.1186/s12911-024-02518-y.
M. AlShawabkeh, S. A. AlRyalat, M. Al Bdour, A. Alni’mat, and M. Al-Akhras, “The utilization of artificial intelligence in glaucoma: diagnosis versus screening,” Front. Ophthalmol., vol. 4, p. 1368081, 2024, doi: 10.3389/fopht.2024.1368081.
Y. Deng, W. Zhang, W. Xu, W. Lei, T.-S. Chua, and W. Lam, “A Unified Multi-task Learning Framework for Multi-goal Conversational Recommender Systems,” arXiv preprint arXiv:2204.06923, 2022. [Online]. Available: https://arxiv.org/abs/2204.06923
S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path Aggregation Network for Instance Segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8759–8768. doi: 10.1109/CVPR.2018.00913.
T.-Y. Lin, P. Dollar, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature Pyramid Networks for Object Detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2117–2125. doi: 10.1109/CVPR.2017.106.
M. Ennab and H. Mcheick, “Enhancing interpretability and accuracy of AI models in healthcare: a comprehensive review on challenges and future directions,” Front. Robot. AI, vol. 11, p. 1444763, 2024, doi: 10.3389/frobt.2024.1444763.
A. Nikolaidou and K. T. Tsaousis, “Teleophthalmology and Artificial Intelligence As Game Changers in Ophthalmic Care After the COVID-19 Pandemic,” Cureus, vol. 13, no. 7, p. e16392, 2021, doi: 10.7759/cureus.16392.
E. Noury et al., “Deep Learning for Glaucoma Detection and Identification of Novel Diagnostic Areas in Diverse Real-World Datasets,” Transl. Vis. Sci. Technol., vol. 11, no. 5, p. 11, 2022, doi: 10.1167/tvst.11.5.11.
A. Holzinger, “Explainable AI and Multi-Modal Causability in Medicine,” i-com, vol. 19, no. 3, pp. 171–179, 2020, doi: 10.1515/icom-2020-0024.
I. Loshchilov and F. Hutter, “Decoupled Weight Decay Regularization,” arXiv preprint arXiv:1711.05101, 2017, doi: 10.48550/arXiv.1711.05101.
Copyright (c) 2025 Muhammad Naufaldi Fayyadh, Triando Hamonangan Saragih, Andi Farmadi, Muhammad Itqan Mazdadi, Rudy Herteno, Vugar Abdullayev

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0) that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).

