
RNN for Sequence Labeling

What is RNN?

A Recurrent Neural Network, or RNN, is a kind of artificial neural network that works with time series or sequential data. These deep learning models are often employed for ordinal and temporal problems such as language translation, natural language processing (NLP), speech recognition, and image captioning, and they power applications like Siri, voice search, and Google Translate. Like feedforward and convolutional neural networks (CNNs), an RNN learns from its training input.

Recurrent Neural Networks work efficiently on Natural Language Processing (NLP) tasks. Sequence labeling assigns a label to each element of a sequence and is used for purposes such as part-of-speech tagging, named entity recognition, and sentiment analysis.

The RNN does well at processing sequential data because of its ability to keep a hidden state that captures information from prior time steps. This makes RNNs a natural choice for sequence labeling, which entails assigning a label to each element in a series.

Architecture of Recurrent Neural Network

The recurrent cell is the fundamental building block of an RNN; it receives an input at each time step and produces an output. The hidden state serves as the network's memory and is updated from the current input and the previous hidden state. The output at each time step can be used for various purposes, such as classification or further processing.

Recurrent Neural Network for Sequence Labeling

In sequence labeling, a label is assigned to each element in a sequence based on its context. For example, in part-of-speech tagging, each word in a sentence is labeled with its part of speech: in "The cat sat", the words would be tagged DET, NOUN, and VERB, respectively.

RNNs use a many-to-many architecture for sequence labeling, in which the input and output sequences have the same length. As each input element is fed into the RNN, the output at that time step is used to predict the label for that element.

Let's look at the mathematical formulation and implementation of sequence labeling using an RNN.

Mathematical Implementation

Let's take an input sequence named X and an output sequence named Y. We can represent these input and output sequences as:

X = [x1, x2, x3, …, xn]
Y = [y1, y2, y3, …, yn]

The RNN computation can be represented as:

  • Hidden State Update: ht = f(ht-1, xt)
  • Output Computation: ŷt = g(ht)
  • Loss Function: L = ∑ loss(yt, ŷt)

Here:

  • f is the recurrent cell function,
  • g is the output function, and
  • loss is the function that measures the difference between the predicted and actual labels.
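
To make these equations concrete, here is a minimal NumPy sketch of the recurrence, assuming f is a simple tanh cell and g is a softmax output layer; the weight names (W_xh, W_hh, W_hy) and dimensions are illustrative, not from any particular library.

import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, num_labels = 4, 8, 3

# Illustrative parameters for the cell function f and output function g.
W_xh = rng.normal(size=(hidden_dim, input_dim)) * 0.1
W_hh = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1
W_hy = rng.normal(size=(num_labels, hidden_dim)) * 0.1
b_h, b_y = np.zeros(hidden_dim), np.zeros(num_labels)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def step(h_prev, x_t):
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # ht = f(ht-1, xt)
    y_hat_t = softmax(W_hy @ h_t + b_y)              # ŷt = g(ht)
    return h_t, y_hat_t

h = np.zeros(hidden_dim)
for x_t in rng.normal(size=(5, input_dim)):  # a 5-step input sequence
    h, y_hat = step(h, x_t)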

Process of Sequence Labeling using RNN

Let's see the steps involved in Sequence Labeling using RNN with their mathematical expressions and examples.

1. Data Preparation: The first step is to prepare the dataset. Sequence labeling needs a labeled dataset for training and evaluation, consisting of input sequences paired with their output labels. For instance, a dataset for a named entity recognition task consists of sentences together with a label for each word.

2. Input and Output Encoding: The next step is to encode the input and output data in a format the RNN accepts. In NLP, inputs are generally represented as word embeddings or one-hot vectors, whereas outputs are frequently represented as numerical labels or one-hot vectors.
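
As a small illustration of this encoding step (using the NER example discussed later), the sketch below maps words and labels to integer indices, pads the sequences, and one-hot encodes the labels; the toy vocabulary, label set, and maximum length of 80 are assumptions for demonstration.

from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

sentences = [["Red", "Fort", "is", "in", "Delhi"]]
tags      = [["FAC", "FAC", "O", "O", "GPE"]]

# Build toy index mappings (index 0 is reserved for padding).
word2idx  = {w: i + 1 for i, w in enumerate(sorted({w for s in sentences for w in s}))}
label2idx = {"O": 0, "FAC": 1, "GPE": 2}

X = pad_sequences([[word2idx[w] for w in s] for s in sentences],
                  maxlen=80, padding="post")       # integer-encoded inputs
y = pad_sequences([[label2idx[t] for t in seq] for seq in tags],
                  maxlen=80, padding="post")
y = to_categorical(y, num_classes=len(label2idx))  # one-hot label per word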

3. Architecture: The next step is to choose a suitable RNN architecture for sequence labeling. Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) are the most common choices because of their ability to capture long-term dependencies.

4. Forward Pass: After selecting the appropriate RNN architecture, we must process the input sequence and generate the output sequence. This process of generating an output sequence from the input sequence is known as the forward pass.

The hidden state and output are calculated in an LSTM using these equations:

  • Forget Gate: ft = σ(Wf · [ht-1, xt] + bf)
  • Input Gate: it = σ(Wi · [ht-1, xt] + bi)
  • Candidate State: C̃t = tanh(WC · [ht-1, xt] + bC)
  • Updated Cell State: Ct = ft ⊙ Ct-1 + it ⊙ C̃t
  • Output Gate: ot = σ(Wo · [ht-1, xt] + bo)
  • Hidden State: ht = ot ⊙ tanh(Ct)
  • Output: ŷt = g(ht)

Here:

  • σ is the sigmoid activation function,
  • ⊙ denotes element-wise (component-wise) multiplication,
  • the W terms are the weight matrices, and
  • the b terms are the bias parameters of the LSTM.
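
As a concrete illustration, here is a minimal NumPy sketch of a single LSTM step that follows these equations; the dimensions and weight names are illustrative.

import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 6

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix and bias per gate, each acting on [ht-1, xt].
W_f, W_i, W_C, W_o = (rng.normal(size=(hidden_dim, hidden_dim + input_dim)) * 0.1
                      for _ in range(4))
b_f = b_i = b_C = b_o = np.zeros(hidden_dim)

def lstm_step(h_prev, C_prev, x_t):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ z + b_f)         # forget gate
    i_t = sigmoid(W_i @ z + b_i)         # input gate
    C_tilde = np.tanh(W_C @ z + b_C)     # candidate state
    C_t = f_t * C_prev + i_t * C_tilde   # updated cell state
    o_t = sigmoid(W_o @ z + b_o)         # output gate
    h_t = o_t * np.tanh(C_t)             # hidden state
    return h_t, C_t

h, C = np.zeros(hidden_dim), np.zeros(hidden_dim)
h, C = lstm_step(h, C, rng.normal(size=input_dim))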

5. Calculating Loss: A loss function is needed to measure the difference between the predicted output ŷt and the actual label yt. Cross-entropy (categorical cross-entropy) is a common loss function for sequence labeling tasks. The loss at time step t is calculated as:

Lt = loss(yt, ŷt)

The overall loss L is the sum of the per-step losses: L = ∑ Lt.
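
A tiny worked example of this loss computation, assuming one-hot true labels and predicted probability distributions over three labels at two time steps:

import numpy as np

y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])                   # yt: true one-hot labels
y_pred = np.array([[0.1, 0.8, 0.1],
                   [0.7, 0.2, 0.1]])             # ŷt: predicted probabilities

L_t = -np.sum(y_true * np.log(y_pred), axis=1)   # cross-entropy loss per time step
L = L_t.sum()                                    # overall loss L = ∑ Lt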

6. Backpropagation Through Time (BPTT): Backpropagation through time is used to update the model parameters and minimize the loss. The gradients of the loss with respect to the parameters are calculated across all time steps, and the parameters are updated using gradient descent optimization.
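
As an illustration of how a framework performs this update, the sketch below implements one BPTT training step with TensorFlow's automatic differentiation; model, x_batch, and y_batch are assumed placeholders for objects built elsewhere.

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)
loss_fn = tf.keras.losses.CategoricalCrossentropy()

def train_step(model, x_batch, y_batch):
    with tf.GradientTape() as tape:
        predictions = model(x_batch, training=True)  # forward pass over all time steps
        loss = loss_fn(y_batch, predictions)         # loss over the whole sequence
    # Gradients flow backward through every time step (BPTT).
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss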

7. Training and Evaluation: The next step is to train the RNN on the labeled dataset while monitoring the loss on a validation set, which helps detect and prevent overfitting. After training, the RNN is evaluated with metrics such as accuracy and precision to check its effectiveness.
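
In practice, Keras wraps this training loop; here is a brief sketch, assuming placeholder data variables (X_train, y_train, and so on) and a model that has already been compiled:

def train_and_evaluate(model, X_train, y_train, X_val, y_val, X_test, y_test):
    model.fit(X_train, y_train,
              validation_data=(X_val, y_val),  # validation loss reveals overfitting
              epochs=10, batch_size=32)
    return model.evaluate(X_test, y_test)      # e.g., [loss, accuracy]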

Let's understand sequence labeling using a Named Entity Recognition task, in which we must identify and label the entities in a sentence. For example, the simple sentence "Red Fort is in Delhi." yields one label per token: [FAC, FAC, O, O, GPE].

Here, FAC refers to a facility such as a building or monument, GPE stands for a geopolitical entity (a location), and O marks tokens that are not part of any named entity.

Explanation:

We convert each word of the input text to a label according to its meaning and represent the labels as a vector. The LSTM-based RNN processes the input sequence word by word, updating the hidden state at each time step, and the output at each time step is used to predict the label for that word.

During training, a loss function compares the predicted labels with the actual ones. The gradients are calculated using BPTT, and the parameters of the LSTM are updated through gradient descent.

Finally, the trained model is evaluated on a test dataset to check its accuracy.

Implementation of Sequence Labeling using RNN in Python

We can implement sequence labeling with an RNN using TensorFlow, a deep learning library for Python. We will implement RNN sequence labeling to perform Named Entity Recognition (NER).

In this implementation, we will train an LSTM model to predict the labels of the entities in a sentence.

First, we take a dataset of labeled sentences in which each word is tagged with its label. Then, we train the model to learn the patterns and relationships between the words and the labels.

Program: Implementing RNN sequence labeling with TensorFlow and an LSTM model

Code:
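
Below is a minimal sketch of such a program, assuming the TensorFlow 2.x Keras API; the hyperparameter values (vocabulary size of 15,000, 5 labels, 200-dimensional embeddings, 74 LSTM units, and a maximum sequence length of 80) are inferred from the parameter counts and output shapes in the summary that follows, not taken from the original source.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, TimeDistributed

def rnn_model(vocab, labels, embedding_dimen, lstm_model_units):
    model = Sequential()
    # Embedding layer: maps each word index to a dense vector;
    # input_length is the maximum sequence length (assumed to be 80).
    model.add(Embedding(input_dim=vocab, output_dim=embedding_dimen,
                        input_length=80))
    # LSTM layer: return_sequences=True yields a hidden state per time step.
    model.add(LSTM(lstm_model_units, return_sequences=True))
    # TimeDistributed Dense layer: predicts a label distribution per word.
    model.add(TimeDistributed(Dense(labels, activation="softmax")))
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

vocab = 15000             # vocabulary size (assumed)
labels = 5                # number of entity labels, e.g., FAC, GPE, O, ...
embedding_dimen = 200     # matches 15000 x 200 = 3,000,000 embedding params
lstm_model_units = 74     # matches the 81,400 LSTM params in the summary

model = rnn_model(vocab, labels, embedding_dimen, lstm_model_units)
model.summary()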

Output:

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 embedding_1 (Embedding)     (None, 80, 200)           3000000   
                                                                 
 lstm_1 (LSTM)               (None, 80, 74)            81400     
                                                                 
 time_distributed_1 (TimeDi  (None, 80, 5)             375       
 stributed)                                                      
                                                                 
=================================================================
Total params: 3081775 (11.76 MB)
Trainable params: 3081775 (11.76 MB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

Explanation:

Step 1: We have imported the required libraries, including TensorFlow and its Keras modules.

Step 2: We have defined a function that creates the RNN model for sequence labeling. It takes multiple arguments:

  • vocab: the vocabulary size, i.e., the number of unique words in the dataset.
  • labels: the number of entity labels the model predicts.
  • embedding_dimen: the dimension of the word embeddings that represent words in vector space.
  • lstm_model_units: the number of LSTM units in the LSTM layer.

Step 3: In the function that creates the RNN model (rnn_model), we have created a sequential model and added multiple layers: an embedding layer, an LSTM layer, and a Dense layer wrapped in a TimeDistributed layer.

These layers work as follows:

  1. Embedding layer: This layer learns word embeddings from the input data. It converts each word index into a dense vector representation with the dimension given by embedding_dimen; input_length specifies the maximum length of the input sequences.
  2. LSTM layer: The LSTM layer performs the sequence-to-sequence mapping. With return_sequences=True, it returns the hidden state at every time step of the input sequence, which is required for sequence labeling.
  3. Dense layer: This fully connected layer outputs the predicted probability of each label using the softmax activation function.
  4. TimeDistributed layer: This wrapper applies the Dense layer to every time step, so each word in the sequence receives a label.

Step 4: Returning the model: The function rnn_model returns the RNN model.

Step 5: Using the Model: We have given values to the parameters of the RNN model, such as the vocabulary size and the number of labels. Finally, we printed the summary of the LSTM model using summary() to see the model's layers and output shapes.
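
Once trained, the model can label a new, already encoded sentence. Here is a brief sketch, where idx2label is an assumed inverse mapping from label indices back to label names:

import numpy as np

def predict_labels(model, encoded_sentence, idx2label):
    probs = model.predict(encoded_sentence[np.newaxis, :])  # shape (1, 80, labels)
    label_ids = probs.argmax(axis=-1)[0]                    # best label per position
    return [idx2label[i] for i in label_ids]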






