Baby Cry Sound Detection: A Comparison of Mel Spectrogram Image on Convolutional Neural Network Models
Abstract
Baby cries contain patterns that indicate their needs, such as pain, hunger, discomfort, colic, or fatigue. This study explores the use of Convolutional Neural Network (CNN) architectures for classifying baby cries from Mel Spectrogram images. The primary objective is to compare the effectiveness of several CNN architectures, namely VGG-16, VGG-19, LeNet-5, AlexNet, ResNet-50, and ResNet-152, in detecting baby needs based on their cries. The datasets used are the Donate-a-Cry Corpus and Dunstan Baby Language. The results show that AlexNet achieved the best performance, with an accuracy of 84.78% on the Donate-a-Cry Corpus dataset and 72.73% on the Dunstan Baby Language dataset. ResNet-50 and LeNet-5 also performed well, although their computational efficiency varied, while VGG-16 and VGG-19 exhibited lower performance. This research contributes to the understanding and application of CNN models for baby cry classification. Practical implications include the development of baby cry detection applications that can assist parents and healthcare providers.
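The pipeline summarized above first converts each cry recording into a Mel spectrogram image before feeding it to a CNN. As an illustration only (the study's own preprocessing parameters are not given here), the sketch below computes a log-scaled Mel spectrogram from raw audio using plain NumPy; the sample rate, FFT size, hop length, and number of Mel bands are assumed values, and a synthetic tone stands in for a real cry recording.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    # Triangular filters spaced evenly on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            if center > left:
                fb[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):
            if right > center:
                fb[i - 1, k] = (right - k) / (right - center)
    return fb

def mel_spectrogram(y, sr=16000, n_fft=512, hop=256, n_mels=40):
    # Frame the signal, apply a Hann window, take the magnitude STFT.
    frames = []
    for start in range(0, len(y) - n_fft + 1, hop):
        frame = y[start:start + n_fft] * np.hanning(n_fft)
        frames.append(np.abs(np.fft.rfft(frame)) ** 2)
    power = np.array(frames).T                 # (n_fft//2 + 1, n_frames)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power
    return 10.0 * np.log10(mel + 1e-10)        # log scale, as in spectrogram images

# One second of a synthetic 440 Hz tone stands in for a cry recording.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
S = mel_spectrogram(y, sr=sr)
print(S.shape)  # (n_mels, n_frames) — this 2-D array is what gets rendered as an image
```

In practice the resulting 2-D array is saved or rendered as an image and resized to each CNN's expected input shape (e.g., 224x224 for VGG and ResNet variants).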
Copyright (c) 2024 Ridha Fahmi Junaidi, Mohammad Reza Faisal, Andi Farmadi, Rudy Herteno, Dodon Turianto Nugrahadi, Luu Duc Ngo, Bahriddin Abapihi

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See The Effect of Open Access).