LSTM and Bi-LSTM Models For Identifying Natural Disasters Reports From Social Media

Natural disasters cause significant losses, mainly environmental and property damage and, in the worst cases, loss of life. During some disasters, social media, especially platforms such as Twitter, has served as the fastest channel for informing large numbers of people. Text mining can be used to categorize this information accurately. This study implements two combinations, word2vec with LSTM and word2vec with Bi-LSTM, to determine which is more accurate for the case study of news related to disaster events. Word2vec serves as the feature extraction method, transforming textual data into vector form for the classification stage, while LSTM and Bi-LSTM serve as the classification techniques that categorize the vectorized data. The experimental results show an accuracy of 70.67% for word2vec with LSTM and 72.17% for word2vec with Bi-LSTM, an improvement of 1.5 percentage points for the Bi-LSTM combination. The contribution of this research is a performance comparison of the two combinations, word2vec + LSTM and word2vec + Bi-LSTM, to determine which performs best when classifying data related to earthquake disasters. The study also offers insights into the parameters of the word2vec, LSTM, and Bi-LSTM methods that researchers can configure. The findings show that the Bidirectional Long Short-Term Memory (Bi-LSTM) approach outperforms the Long Short-Term Memory (LSTM) approach when combined with word2vec feature extraction.


I. INTRODUCTION
Every year, several natural disasters in the United States and worldwide cause structural damage, deaths, and chaos [1]. Disasters can have significant social impacts on cultural heritage. They can also destroy cultural heritage, leading to long-term financial losses and livelihood disruptions [2]. In some cases of natural disasters, the presence of social media plays a crucial role in assisting every activity within the disaster management cycle. During the pre-disaster stage, social media can serve as an early warning before a disaster occurs [3]. Therefore, numerous studies have been conducted utilizing social media data as a social network sensor, as demonstrated by [4], which can function as a primary information dissemination platform. The concept of a social network sensor is a new concept originating from physical sensors. This concept aims to blend the idea of physical sensors into cyberspace through social media [5]. The social media ecosystem, such as online platforms like Twitter, provides an environment where various individuals, both experts and non-experts, can easily share, discuss, and engage with knowledge. Its usage is evidence of the value Twitter brings to researchers. A study in 2017 reported that 1%-5% of 187 million Twitter users are active scientists [6]. Given the widespread and global use of social media, platforms like Twitter, a prominent communication medium, especially during disasters, can be utilized as emergency communication channels [7], [8].
One of the techniques that can be used to conduct research and extract information from social media is text mining. Text mining is a scientific field that provides methods to analyze and process unstructured data, which constitutes around 95% of big data [9]. The text mining domain has a subfield known as text classification. Text classification, also known as document classification or text categorization, encompasses topics such as sentiment analysis, emotion detection, spam analysis, and document indexing [10]. Sentiment analysis involves classifying text opinions into categories such as positive, negative, and neutral. It is also referred to as subjectivity analysis, opinion mining, or sentiment extraction [11].
To perform sentiment analysis, the data needs to be preprocessed first. Word embedding, also known as word vector representation, is one technique to transform words into vectors or arrays consisting of numerical values. Word2vec is a neural network-based model [12] that is effectively used to identify synonyms frequently appearing in similar contexts of extensive data [13]. Word2vec has proven effective in representing the meanings of words. However, hyperparameter configuration and feature selection impact the performance of Word2vec [14].
For the classification process, there is a field that adopts the capabilities of recurrent neural networks (RNNs). RNNs are a type of Artificial Neural Network (ANN) that allows past knowledge to be used through a recurrent architecture [15]. Representative RNNs, such as Long Short-Term Memory (LSTM), have made breakthrough progress in speech and video processing, social applications, text sentiment analysis, and more [16]. The outputs for each RNN layer employ dense layers, and spatial contributions are captured by combining information using additional fully connected layers [17]. Two classification methods that adopt the RNN architecture are LSTM and Bi-LSTM. Long Short-Term Memory (LSTM) is a gated recurrent neural network that adds a mechanism to control information flow within the network on top of a more complex recurrent architecture [18]. As in [19], numerous misclassified categories and unknown types exist. Therefore, functional and algorithmic work is needed to organize categories, requiring an LSTM-based classification model to address that issue.
In addition, the study [20] showed that LSTM classification yielded good performance in case studies of detection and diagnosis of motor electrical disorders. Owing to this performance, the classifier was suggested as a benchmark for the development of new and improved motor electrical fault classification algorithms. Apart from the LSTM method used as a classification technique, another technique is a variant of RNN known as Bidirectional LSTM (Bi-LSTM). In a bidirectional LSTM network, LSTM neurons are divided into two directions: one for the forward state and the other for the backward state. The forward state and the backward state are the two states maintained by the LSTM cells within the Bi-LSTM architecture.
The forward state is the internal state of the LSTM cells in the forward portion of the Bi-LSTM, which run from the start to the end of the sequence. These states reflect the model's understanding of the sequence as it processes it in the forward direction. The backward state refers to the states generated by the LSTM cells that run in the backward direction (from the end to the start of the sequence). These states reflect the model's understanding of the sequence as it processes it in the reverse direction [21], [22].
Several case studies and applications of these methods have been previously implemented in research. One of them is the research in [23], which conducted sentiment analysis by comparing Bag-of-Words (BoW) with TF-IDF feature extraction and Word2vec, using LR and SVM classification. The results showed comparable performance for SVM classification, while for LR classification, Word2vec performed best, with the highest accuracy of 87.4%. A study [24] also analyzed sentiments towards products, services, politics, social events, and corporate strategies. Reviews (from sources like TripAdvisor, Amazon, and IMDB) and social media posts (primarily from Twitter and Facebook) were subjected to LSTM classification, showing good performance with 85% accuracy when more training data was available.
Furthermore, a study by [25] successfully implemented Bi-LSTM for text classification with a precision of 91.54%, recall of 92.82%, and an F1-score of 92.18%. Applying Bi-LSTM made the model work more optimally as contextual information from comment text data was effectively absorbed. The best model obtained outperformed RNN, CNN, regular LSTM, and Naïve Bayes algorithms. There is also research by [26] that conducted sentiment analysis in the Indonesian language, using LSTM with Word2vec as the word embedding method. The model achieved an accuracy of up to 85.96%.
Based on research [23]-[26], various methods are presented that are considered superior in feature extraction and classification processes. However, there has not been a study that explicitly combines these superior methods in a case study. For example, research [26] only focused on LSTM classification with Word2vec extraction, and research [25] only discussed the advantages of Bi-LSTM. Thus, based on the previous literature review, this study proposes a text classification method focused on a case study of earthquakes based on information gathered from Twitter. Several relevant methods are used in the data processing, including Word2vec for feature extraction from the text data obtained from Twitter, followed by LSTM and Bi-LSTM as text categorization techniques.
In addition to identifying the case study, this research also aims to contribute: (a) insights into which method yields the best results, by comparing the accuracy obtained from LSTM and Bi-LSTM classification with feature extraction from Word2vec; (b) information on the level of accuracy achieved using the LSTM and Bi-LSTM methods; and (c) a reference that can be used by future researchers interested in studying earthquake case studies and as a guide for appropriate method selection.

II. MATERIAL AND METHODS
In general, the research process involves comparing the classification outcomes of two methods: LSTM and Bi-LSTM. To facilitate this comparison, several stages are carried out. These stages include data collection, data preprocessing, feature extraction using word2vec, data partitioning into training and testing sets, model testing, and evaluation. The proposed model can be observed in FIGURE 1.

A. DATASET
The dataset used in this study is available in [27], consisting of Twitter data related to earthquake disasters from https://github.com/rezafaisal/NaturalDisasterOnTwitter. Each record in the dataset is already labeled with one of three classes that identify the source of the tweets, namely (i) eyewitness, (ii) non-eyewitness, and (iii) don't know. The total number of data used is 3000, with an equal distribution of 1000 data points for each class. The eyewitness category contains messages about natural disasters posted by eyewitnesses at the disaster's location. Messages in the non-eyewitness category are messages about natural disasters uploaded by users who are not eyewitnesses. Messages in the don't know category, on the other hand, contain words related to natural disasters, but their meaning is not about natural disasters [28]. Examples from the dataset can be seen in TABLE 1.

B. PREPROCESSING
Preprocessing is the process of enhancing the quality of raw data before using it in the subsequent stages. The preprocessing steps in this study are as follows. Cleansing removes characters that do not contribute to sentiment analysis, leaving only alphabet characters; this helps eliminate unnecessary noise from the text data. Case folding converts all words to lowercase; this ensures uniformity in the text data and makes it easier to process and analyze, as capitalization differences are disregarded. Tokenization converts text into tokens, or smaller units such as words or phrases; this is done before transforming the text into vectors, making it easier to filter out unnecessary tokens. By conducting these preprocessing steps, the text data is refined and prepared for further analysis, enhancing the quality and consistency of the dataset for sentiment analysis.
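The three preprocessing steps can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the study: the function name `preprocess`, the regular expression, and the whitespace tokenizer are assumptions, and the sample tweet is invented.

```python
import re

def preprocess(text):
    """Apply cleansing, case folding, and tokenization, in that order."""
    # Cleansing: keep alphabet characters only; every other run of
    # characters (digits, punctuation, symbols) becomes a single space.
    cleaned = re.sub(r"[^A-Za-z]+", " ", text)
    # Case folding: convert everything to lowercase.
    folded = cleaned.lower()
    # Tokenization: split on whitespace (an assumed, simple tokenizer).
    return folded.split()

# Invented sample text, for illustration only.
tokens = preprocess("Did anyone else feel that?! EARTHQUAKE in LA #earthquake 5.1")
print(tokens)
```

Note that this simple cleansing rule keeps the letters of URLs and user mentions as ordinary tokens; whether the study removed them separately is not stated.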

C. FEATURE EXTRACTION
Feature extraction is the process of obtaining characteristics that describe a piece of data. The feature extraction method used is word vector representation, also known as word2vec.
Word2vec is a natural language processing technique. The word2vec algorithm learns word associations from an enormous corpus of text using a neural network model [29]. Word2vec is a feature extraction method applied to map words into vectors, capturing the meanings and contexts of words within documents. This method has two architectural models: Continuous Bag-of-Words (CBOW) and Skip-Gram. Both models consist of input, projection, and output layers, although their processes for generating output differ. The input layer takes Wn = {W(t-2), W(t-1), ..., W(t+1), W(t+2)} as arguments, where Wn represents a word. The projection layer adapts to a multidimensional vector array and accumulates the sum of several vectors. The output layer then displays the resulting vector from the projection layer [30]. Word embedding schemes like Word2vec and others assign equal weights to each word in a sentence and compute the average embedding of each word. In both supervised and unsupervised Natural Language Processing (NLP) tasks, it has been proven that weighted word embeddings can enhance performance [31], [32].
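To make the equal-weight averaging concrete, the sketch below turns a tokenized document into a single fixed-length vector by averaging the vectors of its words. It is an illustration under stated assumptions: the study does not spell out how word vectors are aggregated per document, the function name `document_vector` is hypothetical, and the 3-dimensional `toy_embeddings` table stands in for a trained word2vec model.

```python
def document_vector(tokens, embeddings, dim):
    """Average the vectors of all tokens found in the embedding table.

    Tokens without a vector are skipped; a document with no known
    tokens maps to the zero vector.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

# Toy embedding table (invented values); a real model would hold one
# vector per vocabulary word with many more dimensions.
toy_embeddings = {
    "earthquake": [1.0, 0.0, 2.0],
    "felt": [0.0, 1.0, 0.0],
    "strong": [2.0, 2.0, 1.0],
}

doc = document_vector(["felt", "a", "strong", "earthquake"], toy_embeddings, dim=3)
print(doc)
```

Applying such a function to every document yields one row per document and one column per embedding dimension.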

D. LSTM
LSTM (Long Short-Term Memory) is an advancement over conventional RNNs, which only possess a single type of memory. The "A" unit structure of LSTM incorporates gate mechanisms that regulate the flow of information within the memory or cell state. LSTM introduces three gate mechanisms: the input gate, forget gate, and output gate. The forget gate determines which information from the cell state is discarded. The input gate determines which new information is stored within the cell state and computes a new value to update it, while the output gate determines the output value based on the cell state. The structure of LSTM can be observed in FIGURE 2.
In the LSTM gates, a sigmoid (σ) function is the activation function, which produces values in the range from 0 to 1. A value close to 0 serves to discard or forget a feature, while a value close to 1 signifies that the feature should be stored within the network. The equations for the input gate are represented by equation (1), the forget gate by equation (2), and the output gate by equation (3).
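For reference, the standard textbook formulation of the three gates is given below, in the order the text names them (input, forget, output), followed by the cell-state and hidden-state updates. The notation is an assumption and may differ from the notation of the original equations: x_t is the input at time step t, h_{t-1} the previous hidden state, c_t the cell state, W, U, and b the learned weights and biases, and ⊙ element-wise multiplication.

```latex
\begin{align}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(input gate)} \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(forget gate)} \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(output gate)} \\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(candidate value)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh\left(c_t\right) && \text{(hidden state)}
\end{align}
```

Because each gate value lies between 0 and 1, the product f_t ⊙ c_{t-1} scales down what is kept from the previous cell state, and i_t ⊙ c̃_t scales how much of the candidate value is written into it.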

E. BI-LSTM
Bidirectional means two directions. Bi-LSTM applies bidirectional input to the recurrent LSTM layer. The initial idea is to utilize information available from both the past and the future to optimize the utilization of the existing information. The application of bidirectionality in recurrent neural networks involves combining two independent recurrent neural networks so that the network possesses both backward and forward information about the sequence at each time step [21], [22]. Bi-LSTM is particularly useful for sequential labeling tasks when access to information before and after a given point is crucial. The hidden state in LSTM only captures information from the past, while information following it is not known. This issue can be addressed using Bi-LSTM [25]. Fundamentally, Bi-LSTM consists of two LSTM networks, the forward LSTM and the backward LSTM, which capture information from both directions and mitigate the vanishing gradient problem in RNN methods. Bi-LSTM has demonstrated excellent results in various Natural Language Processing tasks.
Bi-LSTM encompasses numerous parameters and hyperparameters, among which commonly used ones include epochs, batch size, dropout rate, optimizer, learning rate, word vector dimension, the number of neurons in the LSTM hidden layer, L2 regularization lambda, and loss function. The architecture of Bi-LSTM can be observed in FIGURE 3.
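The bidirectional idea can be illustrated independently of the LSTM cell itself. In the sketch below, a deliberately trivial recurrent step (a running sum) stands in for an LSTM cell, which is an assumption made to keep the example short; the function names are hypothetical. The point is the wiring: one pass reads the sequence from start to end, a second pass reads it from end to start, and the two states are paired at every time step.

```python
def run_direction(sequence, step, initial_state):
    """Apply a recurrent step over the sequence and return the state
    reached after each time step."""
    states, state = [], initial_state
    for x in sequence:
        state = step(state, x)
        states.append(state)
    return states

def bidirectional(sequence, step, initial_state):
    """Pair forward and backward states at each time step."""
    forward = run_direction(sequence, step, initial_state)
    # Process the reversed sequence, then flip the states back so that
    # index t again refers to time step t of the original sequence.
    backward = run_direction(sequence[::-1], step, initial_state)[::-1]
    return list(zip(forward, backward))

def toy_step(state, x):
    """Stand-in for an LSTM cell: accumulates the inputs seen so far."""
    return state + x

outputs = bidirectional([1, 2, 3], toy_step, 0)
print(outputs)
```

At the middle position the forward state summarizes the inputs up to and including that step, while the backward state summarizes that step and everything after it, which is the information a unidirectional LSTM lacks.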

III. RESULT
This section compares the accuracy produced by LSTM and Bi-LSTM, which is used to determine the success rate of each method. Before accuracy can be measured, the dataset must be preprocessed. The first step involves data preprocessing, which encompasses cleansing, case folding, and tokenizing. The preprocessing results on the data can be observed in TABLE 2. The outcomes of this preprocessing phase then proceed to the feature extraction stage, where all text data is transformed into vector form and placed into an array. This extraction process uses the word2vec method to map words into vectors that capture meanings and contextual understanding within documents. An example of the word2vec feature extraction results can be seen in TABLE 3, which displays data attributes. In the table, each row holds one document transformed into vector form, and the extraction process produces 100 columns. Reading down the rows gives the documents used in this study, 3000 rows in total, corresponding to the amount of data used. After the preprocessing and feature extraction processes are complete, the extracted data is divided into two sets: training data and testing data. Training data is used to train the model, while testing data is used to evaluate predictions of the trained model. The consideration for dividing the dataset in such a manner is to create a training set and provide the best model estimation [34].

TABLE 3 The Results of feature extraction using
In this study, the data is divided using an 80% training and 20% testing split, which will be implemented for testing both the LSTM and Bi-LSTM models. However, before proceeding with the data splitting process, the extracted data is first normalized using the standard scaler normalization technique. The Standard Scaler technique standardizes attributes by subtracting the mean from each value and dividing the result by the standard deviation of the attribute, resulting in a distribution with a mean of zero and unit variance [35].
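The normalization and split described above can be sketched with the Python standard library. This is an illustration, not the study's code: the function names, the seeded shuffle, and the toy feature matrix are assumptions (the text does not say whether the data was shuffled before splitting). The order follows the text, with scaling applied to the full data before the split; fitting the scaler on the training portion only is a common alternative.

```python
import random
from statistics import mean, pstdev

def standard_scale(rows):
    """Standardize each column: subtract its mean, divide by its
    standard deviation."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Population standard deviation; a constant column falls back to
    # 1.0 so that the division is always defined.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def split_80_20(rows, labels, seed=0):
    """Shuffle row indices with a fixed seed (assumption), then cut at 80%."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = len(indices) * 4 // 5
    train, test = indices[:cut], indices[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

# Toy data: 5 documents with 2 features each (invented values).
features = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0], [4.0, 10.0], [5.0, 10.0]]
labels = ["a", "b", "c", "a", "b"]

scaled = standard_scale(features)
x_train, y_train, x_test, y_test = split_80_20(scaled, labels)
print(len(x_train), len(x_test))
```

With the 3000 documents of this study, the same 80/20 cut gives 2400 training rows and 600 testing rows.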
Subsequently, the parameters for word2vec used in this testing are as follows: 100 for the vector size, 100 for the number of epochs, a window size of 3, dropout rates of (0.1, 0.2, 0.3, 0.4), and learning rates of (0.001, 0.002, 0.003, 0.004).
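The listed values define a small search space. The sketch below enumerates every pairing of dropout rate and learning rate with the fixed settings; whether the study evaluated all pairings or only selected ones is not stated, so the full grid is an assumption, as are the dictionary key names.

```python
from itertools import product

# Values taken from the text above.
dropout_rates = [0.1, 0.2, 0.3, 0.4]
learning_rates = [0.001, 0.002, 0.003, 0.004]

# One configuration per (dropout, learning rate) pair; key names are
# illustrative and not tied to any particular library.
grid = [
    {"vector_size": 100, "window": 3, "epochs": 100,
     "dropout": dropout, "learning_rate": learning_rate}
    for dropout, learning_rate in product(dropout_rates, learning_rates)
]

print(len(grid))
print(grid[0])
```

Each configuration would then be used to train and evaluate both models so that LSTM and Bi-LSTM are compared under identical settings.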
The first testing phase begins with the LSTM model. After testing with LSTM, the evaluation continues with Bi-LSTM. In the LSTM method testing, the highest accuracy obtained from the experiments is 70.67% at epoch 30. Using the same parameters, the testing with the Bi-LSTM method yields the highest accuracy of 72.17%, also achieved at epoch 30. The obtained accuracies can be compared in TABLE 4 and FIGURE 4.
Based on these accuracy results, it can be concluded that the best classification ability with the word2vec feature extraction is achieved using the Bi-LSTM method.

IV. DISCUSSION
The evaluation results of the combined word2vec and LSTM model align with previous studies conducted by [23], [24], [26]. However, a fundamental difference between this study and prior research can be seen in the obtained accuracy. In the study by [26], an accuracy of 85.96% was achieved, while in this research, the accuracy achieved is only 70.67%. This difference can be attributed to variations in factors such as the dataset used and the parameter values applied to word2vec. In [26], the vector size parameters used were 100, 200, and 300, with dropout values of (0.2, 0.5, and 0.7), and learning rates of (0.0001 and 0.001). In contrast, this study employed vector dimensions up to 100, dropout rates of (0.1, 0.2, 0.3, 0.4), and learning rates of (0.001, 0.002, 0.003, 0.004). These differing parameter values are considered a potential reason for the lower accuracy. Furthermore, the research on the combination of word2vec and Bi-LSTM aligns with studies conducted by [23], [25]. However, [25] utilized TF-IDF as the feature extraction method. The commonality between [25] and this study lies in both being text classification studies, where feature extraction is necessary prior to classification. Both studies also employ the Bi-LSTM classification method. The final results from [25] only present precision (91.54%), recall (92.82%), and F1-score (92.18%) values. In contrast, the present study yields significantly different results, including accuracy (72.17%), recall (72.17%), precision (72.32%), and F1-score (72.22%).
The underlying reasons for this notable discrepancy in accuracy are akin to the explanations provided earlier, influenced by the parameters and data used. Predictive results per class for each method are also available, and in the overall comparison in TABLE 4, Bi-LSTM achieves higher accuracy than LSTM.
Various factors influence the significant difference in accuracy results. Different feature extraction methods are employed, leading to variations in the input data used for the Bi-LSTM model. Additionally, this study includes the standard scaler normalization method after the extraction process, which was not used in [25]. Similar to the LSTM model explanation, the results of the Bi-LSTM model are also influenced by the chosen parameters. The parameters used in this study are standard ones frequently used in many other research projects. Consequently, the study's weakness lies in the parameter value selection, as no optimization was performed to find the best parameters. While this study does not address this issue, it serves as a point for further investigation.
Furthermore, the findings from this study contribute to the knowledge by showcasing the results of the combination of word2vec and LSTM compared to the combination of word2vec and Bi-LSTM. The comparison between these combinations provides insight that the combination of word2vec and Bi-LSTM performs better than the combination of word2vec and LSTM. However, the study's weakness lies in the determination of parameter values. Specifically, the researchers randomly determined the parameter values for the word2vec feature extraction without prior testing to identify the optimal parameters for use. This approach can negatively impact the quality of the data results obtained and subsequently lead to inaccuracies in the classification results.
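The accuracy, precision, recall, and F1-score values discussed in this section follow the usual definitions, which the sketch below computes per class from true and predicted labels. It is an illustration only: the function names are hypothetical, the six example predictions are invented, and how the study averaged the per-class values into a single figure is not stated.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_metrics(y_true, y_pred, labels):
    """Precision, recall, and F1-score for each class (one-vs-rest)."""
    metrics = {}
    pairs = list(zip(y_true, y_pred))
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics

# The three classes of the dataset; the predictions are invented.
classes = ["eyewitness", "non-eyewitness", "don't know"]
y_true = ["eyewitness", "eyewitness", "non-eyewitness",
          "non-eyewitness", "don't know", "don't know"]
y_pred = ["eyewitness", "non-eyewitness", "non-eyewitness",
          "non-eyewitness", "don't know", "eyewitness"]

overall = accuracy(y_true, y_pred)
report = per_class_metrics(y_true, y_pred, classes)
print(overall)
print(report)
```

When recall is averaged over classes with weights proportional to class size, it equals the overall accuracy, which is consistent with the identical accuracy and recall values (72.17%) reported for Bi-LSTM.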

V. CONCLUSION
This study discusses Recurrent Neural Network (RNN) algorithms, specifically the Long Short-Term Memory (LSTM) and Bidirectional LSTM (Bi-LSTM) methods. To assess the performance of these methods in text mining, the textual data must first be transformed into vectors, which is done here using word2vec. The combination of word2vec and LSTM classification yielded an accuracy of 70.67% for classifying textual data related to earthquake disasters. Using the same data, the combination of word2vec and Bi-LSTM classification achieved an accuracy of 72.17%, an improvement of 1.5 percentage points over the previous combination.
Therefore, Bi-LSTM shows potential for further combination with word2vec, which could involve reconfiguring the parameter values. Adjusting the parameters is likely to produce different accuracy values. This presents an avenue for future researchers to explore parameter adjustments or other possible combinations of methods. The aim would be to achieve even better model performance.

FIGURE 3. The General Architecture of Bi-LSTM

TABLE 2
TABLE 5 and FIGURE 5; then, the precision, recall, and F1-score of the other classes are shown in TABLE 6 and FIGURE 6.

Comparison of Precision, Recall, and F1-Score Results for Each
The graphs and tables comparing prediction results, recall, precision, and F1-score per class show that Bi-LSTM outperforms LSTM. The accuracy for each class is a crucial factor that impacts the overall accuracy, as indicated in TABLE