Automatic Target Recognition using Unmanned Aerial Vehicle Images with Proposed YOLOv8-SR and Enhanced Deep Super-Resolution Network

Keywords: Deep Learning, High Resolution, Image Processing, Object Detection, YOLOv8

Abstract

Modern surveillance requires automatic target recognition (ATR) to identify targets quickly and accurately; for unmanned aerial vehicles (UAVs), this means multiclass classification of targets such as pedestrians, people, bicycles, cars, vans, trucks, tricycles, buses, and motors. The low recognition rate in UAV target detection stems largely from the poor resolution of photos captured from the distinctive aerial perspective of UAVs. The VisDrone dataset used for image analysis comprises 10,209 UAV photos. This work presents a comprehensive framework for multiclass target classification on VisDrone UAV imagery. YOLOv8-SR, which stands for "You Only Look Once version 8 with Super-Resolution," is a model that builds on the YOLOv8s model with the Enhanced Deep Super-Resolution (EDSR) network. YOLOv8-SR uses EDSR to convert low-resolution images to high-resolution images, estimating the missing pixel values so the detector has more detail to work with. The high-resolution images generated by the EDSR model achieve a Peak Signal-to-Noise Ratio (PSNR) of 25.32 and a Structural Similarity Index (SSIM) of 0.781. Over the range of confidence thresholds, the YOLOv8-SR model attains a precision of 63.44%, a recall of 46.64%, an F1-score of 52.69%, a mean average precision (mAP@50) of 51.58%, and an mAP@50–95 of 50.67%. By using an enhanced deep super-resolution network to produce super-resolution images from low-resolution inputs, the approach substantially improves the precision and effectiveness of ATR. The YOLOv8-SR model, an enhanced version of the YOLOv8s framework, is central to this improvement.
By combining the EDSR method with the YOLOv8-SR framework, the system generates high-resolution images rich in detail, markedly exceeding the informational quality of their low-resolution counterparts.
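As an illustration of how the reported image-quality figures can be computed, the sketch below implements PSNR and a simplified single-window SSIM in NumPy. The function names (`psnr`, `ssim_global`) and the 8-bit `max_val` default are our assumptions for illustration; published SSIM scores are normally averaged over local sliding windows, so this global variant only approximates the metric used in the paper.

```python
import numpy as np

def psnr(ref, img, max_val=255.0):
    """Peak Signal-to-Noise Ratio (dB) between a reference and a test image."""
    mse = np.mean((ref.astype(np.float64) - img.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(ref, img, max_val=255.0):
    """Single-window SSIM with the usual constants K1=0.01, K2=0.03.

    A simplified, whole-image variant of SSIM: it uses global means,
    variances, and covariance rather than a mean over local windows.
    """
    c1, c2 = (0.01 * max_val) ** 2, (0.03 * max_val) ** 2
    x = ref.astype(np.float64)
    y = img.astype(np.float64)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )

# Example: a uniform error of 10 gray levels gives MSE = 100,
# hence PSNR = 10 * log10(255**2 / 100) ≈ 28.13 dB.
ref = np.zeros((64, 64), dtype=np.uint8)
noisy = np.full((64, 64), 10, dtype=np.uint8)
print(round(psnr(ref, noisy), 2))  # ≈ 28.13
```

Higher PSNR and an SSIM closer to 1 indicate that the super-resolved image is closer to the ground-truth high-resolution image, which is how values such as PSNR 25.32 and SSIM 0.781 above would be interpreted.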

Author Biographies

Gangeshwar Mishra, Department of CST, Manav Rachna University, Faridabad, Haryana, India

Gangeshwar Mishra is a Director specializing in Artificial Intelligence and Machine Learning. With over 10 years of experience in technology leadership, he has demonstrated expertise in designing and architecting complex, high-volume products. His professional journey includes significant roles in various organisations, contributing to advancements in AI and ML applications. Gangeshwar's work has been instrumental in developing innovative solutions that have received recognition at both national and international levels.

Prinima Gupta, Department of CST, Manav Rachna University, Faridabad, Haryana, India

Dr. Prinima Gupta is a Professor in the Department of Computer Science & Technology at Manav Rachna University, Faridabad, India. She holds a Ph.D. in Computer Science and Engineering. Her research interests include Information Security and Data Mining. Dr. Gupta has contributed to the academic community through publications in refereed journals and conferences.

Published
2025-10-15
How to Cite
[1]
G. Mishra, R. Tanwar, and P. Gupta, “Automatic Target Recognition using Unmanned Aerial Vehicle Images with Proposed YOLOv8-SR and Enhanced Deep Super-Resolution Network”, j.electron.electromedical.eng.med.inform, vol. 7, no. 4, pp. 1240-1258, Oct. 2025.
Section
Electronics