Gender Classification on Social Media Messages Using fastText Feature Extraction and Long Short-Term Memory

Halimatus Sa’diah; Mohammad Reza Faisal; Andi Farmadi; Friska Abadi; Fatma Indriani; Muhammad Alkaff; Vugar Abdullayev

doi:10.35882/jeeemi.v6i3.407

Halimatus Sa’diah Department of Computer Science, Lambung Mangkurat University, Banjarbaru, South Kalimantan, Indonesia
Mohammad Reza Faisal Department of Computer Science, Lambung Mangkurat University, Banjarbaru, South Kalimantan, Indonesia
Andi Farmadi Department of Computer Science, Lambung Mangkurat University, Banjarbaru, South Kalimantan, Indonesia
Friska Abadi Department of Computer Science, Lambung Mangkurat University, Banjarbaru, South Kalimantan, Indonesia
Fatma Indriani Department of Computer Science, Lambung Mangkurat University, Banjarbaru, South Kalimantan, Indonesia
Muhammad Alkaff Department of Computer Science, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Information Technology, Lambung Mangkurat University, Banjarmasin, South Kalimantan, Indonesia
Vugar Abdullayev Department of Computer Engineering, Azerbaijan State Oil and Industry University, Baku, Azerbaijan https://orcid.org/0009-0003-6051-3107

DOI: https://doi.org/10.35882/jeeemi.v6i3.407

Keywords: feature extraction, gender classification, fastText, RNN, LSTM

Abstract

Social media is widely used for interacting with many people and has become a source of information for researchers and analysts. Twitter is one of the platforms most commonly used for research, especially through tweets written by individuals. However, Twitter does not explicitly display user information such as gender in the account profile, even though a large amount of unstructured text contains cues to it that often go unnoticed. This research aims to classify gender from tweet data and account description data and to determine the accuracy of gender classification using machine learning methods. The method uses fastText for feature extraction and LSTM for classification of the extracted features. To find the most accurate configuration, classification is performed on tweet data, account description data, and a combination of both. LSTM classification achieved an accuracy of 70% on account description data and on the combined data, and 69% on tweet data. The research concludes that fastText feature extraction with LSTM classification can be implemented for gender classification, although accuracy does not differ significantly across the three datasets. The results nevertheless demonstrate that the two methods can work well together.
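
As a rough illustration of the feature-extraction side of the approach described above, the following sketch builds fastText-style word vectors from character n-grams and turns a tweet, an account description, and their combination into fixed-length sequences of the kind an LSTM could consume. It is not the authors' implementation: the embedding size, the MD5-based hashing, the random stand-in weights, the padding length, the sample texts, and the choice to combine the two texts by simple concatenation are all assumptions made for the example. Only the n-gram range (3 to 6) and the bucket count follow fastText's unsupervised defaults.

```python
import hashlib
import random

DIM = 8              # toy embedding size (assumption; real models are much larger)
BUCKETS = 2_000_000  # number of n-gram hash buckets (fastText's unsupervised default)


def char_ngrams(word, n_min=3, n_max=6):
    """Character n-grams of a word wrapped in fastText-style boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def bucket_vector(key):
    """Deterministic pseudo-random vector for a key's hash bucket.

    Stand-in for trained weights; MD5 replaces fastText's own hash (assumption).
    """
    bucket = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16) % BUCKETS
    rng = random.Random(bucket)
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]


def word_vector(word):
    """Average of the whole-word vector and the vectors of its character n-grams."""
    keys = ["word:" + word] + char_ngrams(word)
    vectors = [bucket_vector(k) for k in keys]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def text_to_sequence(text, max_len=10):
    """Map a text to max_len word vectors, truncating or zero-padding as needed."""
    tokens = text.lower().split()[:max_len]
    sequence = [word_vector(t) for t in tokens]
    sequence += [[0.0] * DIM for _ in range(max_len - len(sequence))]
    return sequence


# Hypothetical sample texts, used only to show the three input variants.
tweet = "just finished my morning run"
description = "runner and coffee lover"
inputs = {
    "tweet": text_to_sequence(tweet),
    "description": text_to_sequence(description),
    "combined": text_to_sequence(tweet + " " + description),
}
for name, seq in inputs.items():
    print(name, len(seq), "x", len(seq[0]))
```

Each of the three sequences has the same shape, so the same LSTM classifier (built with a deep learning library, which this sketch leaves out) could in principle be trained on any of them. Because the word vectors are composed from subword n-grams, misspelled or unseen words, which are common in tweets, still receive a vector.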
