Baby Cry Sound Detection: A Comparison of Mel Spectrogram Image on Convolutional Neural Network Models
Abstract
Baby cries contain patterns that indicate their needs, such as pain, hunger, discomfort, colic, or fatigue. This study explores the use of Convolutional Neural Network (CNN) architectures for classifying baby cries from Mel Spectrogram images. The primary objective is to compare the effectiveness of several CNN architectures, namely VGG-16, VGG-19, LeNet-5, AlexNet, ResNet-50, and ResNet-152, in detecting baby needs based on their cries. The datasets used are the Donate-a-Cry Corpus and Dunstan Baby Language. The results show that AlexNet achieved the best performance, with an accuracy of 84.78% on the Donate-a-Cry Corpus dataset and 72.73% on the Dunstan Baby Language dataset. ResNet-50 and LeNet-5 also performed well, although their computational efficiency varied, while VGG-16 and VGG-19 exhibited lower performance. This research contributes to the understanding and application of CNN models for baby cry classification. Practical implications include the development of baby cry detection applications that can assist parents and healthcare providers.
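The pipeline summarized above first converts each cry recording into a Mel spectrogram image before feeding it to a CNN. As an illustration only (the study's own preprocessing parameters are not given here), the sketch below computes a log-scaled Mel spectrogram from raw audio using plain NumPy; the sample rate, FFT size, hop length, and number of Mel bands are assumed values, and a synthetic tone stands in for a real cry recording.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    frames = []
    for start in range(0, len(y) - n_fft + 1, hop):
        frame = y[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames).T                 # (n_fft//2 + 1, n_frames)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power
    return 10.0 * np.log10(mel + 1e-10)        # log scale, as in spectrogram images

# One second of a synthetic 440 Hz tone stands in for a cry recording.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = mel_spectrogram(y, sr=sr)
print(S.shape)  # (n_mels, n_frames) — this 2-D array is what gets rendered as an image
```

In practice the resulting 2-D array is saved or rendered as an image and resized to each CNN's expected input shape (e.g., 224x224 for VGG and ResNet variants).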
Copyright (c) 2024 Ridha Fahmi Junaidi, Mohammad Reza Faisal, Andi Farmadi, Rudy Herteno, Dodon Turianto Nugrahadi, Luu Duc Ngo, Bahriddin Abapihi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).