Cybersentinel: The Cyberbullying Detection Application Based on Machine Learning and VADER Lexicon with GridSearchCV Optimization
Abstract
Cyberbullying is a growing problem in the digital age, with serious consequences for the well-being of individuals, especially children and adolescents, and for society as a whole. As the number of social media users continues to rise, effective solutions for detecting cyberbullying are urgently needed. The Big Data era also brings new challenges, including the ability of organizations to manage, process, and extract value from available data. This research aims to develop Cybersentinel, a cyberbullying detection application that combines Machine Learning with the VADER Lexicon to improve classification accuracy. Several Machine Learning algorithms are compared, each optimized with GridSearchCV to find the best combination of parameters. The dataset consists of social media comments labeled as bullying or non-bullying. The final model uses the Support Vector Machine algorithm and achieves a best accuracy of 98.83%. The system is built in Python with the Streamlit framework. Development follows the Design Science Research (DSR) approach, which integrates principles, practices, and procedures to facilitate problem-solving and support the design and creation of applications. The application is tested using black-box testing. The results show that parameter optimization with GridSearchCV can significantly enhance model performance, and that applying the DSR method allows Cybersentinel to be tailored to specific needs. Cybersentinel thus provides an effective solution for detecting cyberbullying and contributes to improving the safety of social media users.
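The abstract does not include the Cybersentinel code, so the following is only a minimal sketch of the idea behind the optimization step: an exhaustive search over a parameter grid, where each combination is scored by k-fold cross-validation. To stay dependency-free it uses the standard library only. The toy lexicon (a stand-in, not the VADER lexicon), the made-up comments, the simple word-count classifier (used in place of the Support Vector Machine), and the parameter names `alpha` and `lexicon_weight` are all assumptions for illustration. scikit-learn's GridSearchCV automates this same kind of loop for real estimators.

```python
import math
from itertools import product

# Hypothetical toy lexicon (NOT the VADER lexicon): negative values mark hostile words.
TOY_LEXICON = {"stupid": -2.0, "ugly": -2.0, "hate": -3.0, "loser": -2.5,
               "great": 2.0, "love": 3.0, "nice": 1.5, "thanks": 1.5}

# Made-up labelled comments: 1 = bullying, 0 = non-bullying.
DATA = [
    ("you are so stupid", 1), ("thanks for the great post", 0),
    ("nobody likes you loser", 1), ("i love this song", 0),
    ("you are ugly and stupid", 1), ("nice photo from the trip", 0),
    ("i hate you so much", 1), ("great game last night", 0),
    ("what a loser you are", 1), ("love the new design", 0),
    ("everyone will hate you", 1), ("thanks that was nice", 0),
]


def tokenize(text):
    return text.lower().split()


def train(samples, alpha):
    """Count words per class; alpha is the additive smoothing constant (> 0)."""
    counts = {0: {}, 1: {}}
    for text, label in samples:
        for tok in tokenize(text):
            counts[label][tok] = counts[label].get(tok, 0) + 1
    vocab = set(counts[0]) | set(counts[1])
    totals = {c: sum(counts[c].values()) for c in (0, 1)}
    return {"counts": counts, "totals": totals,
            "vocab_size": len(vocab), "alpha": alpha}


def predict(model, text, lexicon_weight):
    """Sum word log-odds of bullying, then shift by the weighted lexicon score.

    Class priors are ignored because the toy data is balanced.
    """
    a, v = model["alpha"], model["vocab_size"]
    score = 0.0
    for tok in tokenize(text):
        p1 = (model["counts"][1].get(tok, 0) + a) / (model["totals"][1] + a * v)
        p0 = (model["counts"][0].get(tok, 0) + a) / (model["totals"][0] + a * v)
        score += math.log(p1 / p0)
        # Hostile (negative) lexicon words push the score towards "bullying".
        score -= lexicon_weight * TOY_LEXICON.get(tok, 0.0)
    return 1 if score > 0 else 0


def cross_val_accuracy(samples, k, alpha, lexicon_weight):
    """Mean accuracy over k folds; fold i holds out every k-th sample from i."""
    scores = []
    for fold in range(k):
        held_out = samples[fold::k]
        training = [s for i, s in enumerate(samples) if i % k != fold]
        model = train(training, alpha)
        hits = sum(predict(model, text, lexicon_weight) == label
                   for text, label in held_out)
        scores.append(hits / len(held_out))
    return sum(scores) / k


def grid_search(samples, grid, k=3):
    """Try every parameter combination and keep the best cross-validated one."""
    names = sorted(grid)
    results = []
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((cross_val_accuracy(samples, k, **params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_params, best_score, results


PARAM_GRID = {"alpha": [0.1, 1.0], "lexicon_weight": [0.0, 0.5, 1.0]}
best_params, best_score, results = grid_search(DATA, PARAM_GRID)
print(best_params, round(best_score, 3))
```

Because every combination is evaluated on held-out folds rather than on the training data, the selected parameters are less likely to reflect overfitting to a single split. The cost grows with the product of the grid sizes, which is why grids are usually kept small.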
Copyright (c) 2024 Siti Ernawati, Frieyadie Frieyadie, Eka Rini Yulia
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.