RNN for Sequence Labelling

What is RNN?

A Recurrent Neural Network, or RNN, is a kind of artificial neural network that works with time series or other sequential data. RNNs are often employed for ordinal or temporal problems such as language translation, natural language processing (NLP), speech recognition, and image captioning, and they have been used in applications such as Siri, voice search, and Google Translate. Like feedforward and convolutional neural networks (CNNs), an RNN learns from training data.

Sequence labeling assigns a label to each element in a sequence, for purposes such as part-of-speech tagging, named entity recognition, and sentiment analysis. The RNN does well at processing sequential data because it keeps a hidden state that captures information from prior time steps. This makes RNNs a natural choice for sequence labeling, which requires a label for every element in a series.

Architecture of Recurrent Neural Network

The recurrent cell is the fundamental building block of an RNN. It receives an input at each time step and produces an output. The hidden state serves as the network's memory and is updated based on the current input and the prior hidden state. Each time step's output can be used for various purposes, such as classification or additional processing.

Recurrent Neural Network for Sequence Labeling

In sequence labeling, a label is assigned to each element in a sequence based on its context. For example, in part-of-speech tagging, every word of a sentence is labeled with its part of speech. Sequence labeling uses a many-to-many RNN architecture, in which the input and output sequences have the same length. Each input element is fed into the RNN in turn, and the output at each time step is used to predict the label for that element.
Let's look at the mathematical form and implementation of sequence labeling using an RNN.

Mathematical Implementation

Let's take an input sequence X and an output sequence Y of the same length n:

X = [x_{1}, x_{2}, x_{3}, …, x_{n}]
Y = [y_{1}, y_{2}, y_{3}, …, y_{n}]

The RNN computation can be represented as:

h_{t} = f(x_{t}, h_{t-1})

where f represents the recurrent cell function, x_{t} is the input at time step t, and h_{t-1} is the hidden state from the previous time step.
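The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the article's own code: it assumes a plain (Elman-style) cell in which f is an affine map followed by tanh, plus a softmax output layer, and the names rnn_forward, W_xh, W_hh, W_hy, b_h and b_y are hypothetical.

```python
import numpy as np

def rnn_forward(X, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a vanilla RNN over X and return one label distribution per time step."""
    h = np.zeros(W_hh.shape[0])              # h_0: initial hidden state
    outputs = []
    for x_t in X:                            # one time step per input element
        # h_t = f(x_t, h_{t-1}); f is assumed here to be affine + tanh
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        logits = W_hy @ h + b_y              # one score per label at this step
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        outputs.append(exp / exp.sum())
    return np.stack(outputs)

# Toy dimensions and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
n_steps, input_dim, hidden_dim, n_labels = 5, 8, 16, 3
X = rng.normal(size=(n_steps, input_dim))
params = dict(
    W_xh=rng.normal(scale=0.1, size=(hidden_dim, input_dim)),
    W_hh=rng.normal(scale=0.1, size=(hidden_dim, hidden_dim)),
    W_hy=rng.normal(scale=0.1, size=(n_labels, hidden_dim)),
    b_h=np.zeros(hidden_dim),
    b_y=np.zeros(n_labels),
)

probs = rnn_forward(X, **params)
print(probs.shape)            # (5, 3): one distribution over labels per input element
print(probs.argmax(axis=1))   # predicted label index at each time step
```

Because the loop emits one output per input, the input and output sequences have the same length, which is the many-to-many arrangement used for sequence labeling.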
Process of Sequence Labeling using RNN

Let's see the steps involved in sequence labeling using an RNN, with their mathematical expressions and examples.

1. Data Preparation: The first step is to prepare the dataset. For sequence labeling, we need a labeled dataset for training and evaluation. The dataset must consist of input sequences together with their output labels. For instance, the dataset for a named entity recognition task consists of sentences and the label of each word in them.

2. Input and Output Encoding: The next step is to encode the input and output data in a format accepted by the RNN. In NLP, inputs are generally represented as word embeddings or one-hot vectors, whereas outputs are frequently represented as numerical labels or one-hot vectors.

3. Architecture: The next step is to choose a suitable RNN architecture for sequence labeling. Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) are the most common RNN architectures because of their ability to capture long-term dependencies.

4. Forward Pass: After selecting the appropriate RNN architecture, we must process the input sequence and generate the output sequence. This process of generating an output sequence from the input sequence is known as the forward pass. Let's take an input sequence X and an output sequence Y:

X = [x_{1}, x_{2}, x_{3}, …, x_{n}]
Y = [y_{1}, y_{2}, y_{3}, …, y_{n}]

With an LSTM, the hidden state and output are calculated using these equations:
Here, σ is the sigmoid activation function
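The LSTM update referred to above is commonly written as follows. This is the standard textbook formulation, given here as an assumption about what the step intends rather than as the article's own notation. W, U and b denote learned weights and biases, ⊙ is the element-wise product, c_t is the cell state, and the final softmax line is an assumed output layer that produces the per-step prediction O_t used in the loss step. Note that f_t here is the forget gate, not the generic recurrent cell function f.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)} \\
O_t &= \operatorname{softmax}(W_y h_t + b_y) && \text{(predicted label distribution)}
\end{aligned}
```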
5. Calculating Loss: A loss function is needed to measure the difference between the true label y_{t} and the predicted output O_{t} at each time step. Cross-entropy loss (categorical cross-entropy) is the usual choice for sequence labeling tasks. It can be calculated as:

L_{t} = loss(y_{t}, O_{t})

The overall loss L is the sum of the losses over all time steps: L = ∑ L_{t}.

6. Backpropagation Through Time (BPTT): BPTT is used to update the model parameters and minimize the loss. The network is unrolled across the time steps, the gradients of the loss with respect to the parameters are calculated at each time step and accumulated, and the parameters are updated using gradient descent optimization.

7. Training and Evaluation: The next step is to train the RNN on the labeled dataset. The loss is also calculated on a validation dataset; a validation loss that stops decreasing while the training loss keeps falling is a sign of overfitting. After training, the RNN is evaluated on held-out data using metrics such as accuracy and precision.

Let's understand sequence labeling using the Named Entity Recognition task, in which we have to identify and label the entities in a sentence. For example, take the simple sentence "Red Fort is in Delhi." The model assigns a label to each word in the sentence: [FAC, FAC, O, O, GPE]. Here, FAC (facility) marks the name of a building, GPE (geopolitical entity) marks a location such as a city or country, and O marks words that are not part of any entity.

Explanation: Each word in the input text is mapped to a label according to its meaning, and the labels are represented as a vector. We process the input sequence word by word with an LSTM-based RNN, updating the hidden state at each time step. The output at each time step is used to predict the label for that word. During training, the loss function compares the predicted labels with the actual ones. The gradients are calculated using BPTT, and the parameters of the LSTM are updated through gradient descent. Finally, the trained model is evaluated on a test dataset to check its accuracy.
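The label encoding and the summed per-step loss can be illustrated on the "Red Fort is in Delhi" example. This is a small sketch under stated assumptions: the tag_to_id mapping is a hypothetical label inventory, and the predicted distributions in O are made-up numbers standing in for a model's softmax outputs.

```python
import numpy as np

tokens = ["Red", "Fort", "is", "in", "Delhi"]
tags = ["FAC", "FAC", "O", "O", "GPE"]

# Hypothetical label inventory; a real dataset defines its own.
tag_to_id = {"O": 0, "FAC": 1, "GPE": 2}

y = np.array([tag_to_id[t] for t in tags])   # numerical labels, one per word
Y_onehot = np.eye(len(tag_to_id))[y]         # the same labels as one-hot vectors

# Made-up predicted distributions O_t (each row sums to 1), one per time step.
O = np.array([
    [0.10, 0.80, 0.10],
    [0.20, 0.70, 0.10],
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.30, 0.10, 0.60],
])

# Cross-entropy at each step: L_t = -log(probability given to the true label).
L_t = -np.log(O[np.arange(len(y)), y])
L = L_t.sum()                                # overall loss: L = sum over t of L_t

print(y)   # [1 1 0 0 2]
print(L_t)
print(L)
```

The per-step loss is small where the model puts high probability on the correct tag and grows as that probability falls, so the last word ("Delhi", predicted GPE with probability 0.6) contributes the most to L in this toy example.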
Implementation of Sequence Labeling using RNN in Python

We can implement sequence labeling with an RNN using TensorFlow, a deep learning library for Python. We will implement RNN sequence labeling for Named Entity Recognition (NER). In this implementation, we train an LSTM model to predict the labels of the entities in a sentence. First, we take a dataset of labeled sentences in which each word is tagged with its label. Then, we train the model to learn the patterns and relationships between the words and the labels.

Program: Program to implement RNN sequence labeling using TensorFlow and to understand the LSTM model

Code:

Output:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                 Output Shape              Param #
=================================================================
 embedding_1 (Embedding)      (None, 80, 200)           3000000
 lstm_1 (LSTM)                (None, 80, 74)            81400
 time_distributed_1 (TimeDi   (None, 80, 5)             375
 stributed)
=================================================================
Total params: 3081775 (11.76 MB)
Trainable params: 3081775 (11.76 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Explanation:

Step 1: We have imported the required libraries, including TensorFlow and its model and layer classes.

Step 2: We have defined a function that creates an RNN model for sequence labeling. It takes multiple arguments.
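Step 3: Inside the function that creates the RNN model (rnn_model), we have created a sequential model and added multiple layers: an embedding layer, an LSTM layer, a Dense layer, and a TimeDistributed layer. These layers work as follows:

Embedding layer: maps each word index to a dense vector (200 dimensions in the summary above).
LSTM layer: processes the sequence step by step and returns a hidden state for every time step (74 units in the summary above).
Dense layer: maps a hidden state to one score per label (5 labels in the summary above).
TimeDistributed layer: applies the same Dense layer to every time step, so that each word gets its own label prediction.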
Step 4: Returning the model: The function rnn_model returns the RNN model.

Step 5: Using the model: We have given values to the parameters of the RNN model, such as the vocabulary size and the number of labels. Finally, we have printed the summary of the LSTM model using summary() to see the model's layers and output shapes.
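The steps above can be sketched with the Keras API in TensorFlow. This is a reconstruction under assumptions, not the original listing: the function name rnn_model comes from the explanation, while the argument names and the values (vocabulary size 15000, embedding size 200, 74 LSTM units, 5 labels, sequence length 80) are inferred from the shapes and parameter counts in the printed summary. The softmax activation and the compile settings are also assumptions.

```python
import tensorflow as tf

def rnn_model(vocab_size, embedding_dim, lstm_units, num_labels, max_len):
    """Build an Embedding -> LSTM -> TimeDistributed(Dense) sequence labeler."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(max_len,)),               # a sequence of word indices
        tf.keras.layers.Embedding(input_dim=vocab_size,
                                  output_dim=embedding_dim),
        # return_sequences=True keeps one hidden state per time step,
        # which many-to-many labeling requires.
        tf.keras.layers.LSTM(lstm_units, return_sequences=True),
        # The same Dense layer is applied at every time step.
        tf.keras.layers.TimeDistributed(
            tf.keras.layers.Dense(num_labels, activation="softmax")),
    ])
    # Assumed training setup: integer labels with cross-entropy loss.
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Values inferred from the printed summary (assumptions, see above).
model = rnn_model(vocab_size=15000, embedding_dim=200,
                  lstm_units=74, num_labels=5, max_len=80)
model.summary()
```

With these values the parameter counts agree with the summary: the embedding has 15000 × 200 = 3,000,000 weights, the LSTM has 4 × (74 × (200 + 74) + 74) = 81,400, and the Dense layer has 74 × 5 + 5 = 375, for a total of 3,081,775.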
