Cybersentinel: The Cyberbullying Detection Application Based on Machine Learning and VADER Lexicon with GridSearchCV Optimization

Siti Ernawati; Frieyadie Frieyadie; Eka Rini Yulia

doi:10.35882/jeeemi.v6i4.580

Siti Ernawati, Universitas Nusa Mandiri
Frieyadie Frieyadie, Universitas Nusa Mandiri
Eka Rini Yulia, Universitas Nusa Mandiri

Keywords: Cyberbullying, Design Science Research, GridSearchCV, Machine Learning, Sentiment Analysis, VADER Lexicon

Abstract

Cyberbullying is an increasingly troubling issue in today's digital age, with serious impacts on the well-being of individuals, especially children and adolescents, and of society as a whole. As the number of social media users continues to rise, there is an urgent need for effective solutions that detect cyberbullying. The Big Data era also brings new challenges, including the ability of organizations to manage, process, and extract value from available data to generate useful information. This research aims to develop Cybersentinel, a cyberbullying detection application that combines Machine Learning and VADER Lexicon approaches to improve classification accuracy. Several Machine Learning algorithms are compared, each optimized with GridSearchCV to find the best combination of parameters. The dataset consists of social media comments labeled as bullying or non-bullying. The best-performing model uses the Support Vector Machine algorithm and achieves an accuracy of 98.83%. The system is developed in Python with the Streamlit framework. Development follows the Design Science Research (DSR) approach, which integrates principles, practices, and procedures to facilitate problem-solving and support the design and creation of applications. The application is tested using black-box testing. The results show that parameter optimization with GridSearchCV can significantly enhance model performance, and that applying the DSR method allows Cybersentinel to be tailored to specific needs. Cybersentinel thus provides an effective solution for detecting cyberbullying and contributes to improving the safety of social media users.
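
The GridSearchCV step mentioned in the abstract can be illustrated with a minimal sketch using scikit-learn. This is not the authors' implementation. The toy comments, the TF-IDF features, the parameter grid, and the 3-fold setting are all assumptions made for illustration, since the abstract does not specify them. The VADER Lexicon component is left out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy comments invented for illustration; NOT the dataset used in the study.
comments = [
    "you are so stupid and ugly",
    "nobody likes you, just leave",
    "you are a worthless loser",
    "shut up, you idiot",
    "everyone hates you",
    "you are pathetic and dumb",
    "great photo, love it",
    "congratulations on your new job",
    "thanks for sharing this recipe",
    "have a wonderful day",
    "this tutorial was really helpful",
    "what a beautiful sunset",
]
labels = [1] * 6 + [0] * 6  # 1 = bullying, 0 = non-bullying

# Assumed feature extraction (TF-IDF) followed by a Support Vector Machine.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svm", SVC()),
])

# Hypothetical search grid; the grid used in the paper is not given in the abstract.
param_grid = {
    "svm__C": [0.1, 1, 10],
    "svm__kernel": ["linear", "rbf"],
}

# Try every parameter combination with 3-fold cross-validation
# and keep the one with the highest mean accuracy.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(comments, labels)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds the pipeline refit on all the data with the selected parameters, so `search.predict([...])` can classify new comments directly.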
