Text Generation using Gated Recurrent Unit Networks

What are Gated Recurrent Unit Networks?

A Gated Recurrent Unit Network is a Recurrent Neural Network alternative to Long Short-Term Memory Networks (LSTM). The GRU can work on sequential data like text, speech, and time series.

The core idea behind GRU is to employ gating techniques to selectively update the network's hidden state at each time step. The gating mechanisms govern the flow of information into and out of the network. The GRU contains two gating mechanisms: the reset and update gates.

Problem Statement

We will build a Text Generator using the Gated Recurrent Unit. We will train the network by passing a text file that will map each character to a unique number. Then, we will hot-encode each character into a vector that the network needs.

We will use a collection of famous singers' song lyrics for the Text Generation in .txt format.

It can be downloaded from Song Lyrics.

Approach to implementation of the Text Generation using the Gated Recurrent Unit Network

Step 1: Libraries and Dataset

The first step is to import the required libraries like numpy, tensorflow, and keras and their different models like LSTM, Sequential, etc. Then, we will load and read the text file., and store it in a string.

Code:

from __future__ import absolute_import, division, print_function, unicode_literals
import numpy as np 
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.layers import LSTM
from keras.optimizers import RMSprop
from keras.callbacks import LambdaCallback
from keras.callbacks import ModelCheckpoint
from keras.callbacks import ReduceLROnPlateau
import random
import sys
with open('song.txt', 'r') as file:
    file = file.read()   
print(file)

Output:

Text Generation using Gated Recurrent Unit Networks

Step 2: Mapping of the characters

After reading the file, we will store all the unique characters in a list. Then, we will make dictionaries that map all the characters to indexes.

Code:

# Store all the unique characters present in the text
vocab = sorted(list(set(text)))
  
# Dictionaries made to map each character to an index
convert_char_to_indices = dict((c, i) for i, c in enumerate(vocabulary))
convert_indices_to_char = dict((i, c) for i, c in enumerate(vocabulary))
  
print(vocab)

Output:

['\n', ' ', '!', '"', '&', "'", '(', ')', ',', '-', '.', ';', '?', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'Y', '[', ']', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

Step 3: Preprocessing the data

After getting the unique words, we will preprocess the data, divide the text into the longest subsequence of the maximum length and then hot encode each character into a vector.

Code:

# Shortening the text into a subsequence of the maximum length
max_len = 80
step = 7
sentence = []
next_character = []
for i in range(0, len(file) - max_len, steps):
    sentence.append(file[i: i + max_len])
    next_character.append(file[i + max_len])
      
# Hot encoding acter into a boolean vector
x = np.zeros((len(sentence), max_len, len(vocab)), dtype = bool)
y = np.zeros((len(sentence), len(vocab)), dtype = bool)
  
for i sent in enumerate(sentence):
    for t, char in enumerate(sent):
        X[i, t, convert_char_to_indices[char]] = 1
    y[i, convert_char_to_indices[next_character[i]]] = 1

Step 4: Making the GRU network

We will make a sequential model and add multiple layers like GRU, Dense, Activation, etc. We can build the network with as many layers as we want.

Code:

model = Sequential()
  
# Defining the cell type
model.add(GRU(140, input_shape =(max_len, len(vocab))))
  
# Defining the densely connected Neural Network layer
model.add(Dense(len(vocab)))
# Defining the activation function for the cell
model.add(Activation('softmax'))
  
# Defining the optimizing function
optimizer = RMSprop(learning_rate = 0.01)
  
# Configuring the model for training
model.compile(loss ='categorical_crossentropy', optimizer = optimizer)
model

Output:

Step 5: Making the helper functions

We will make some helper functions that will be used at the time of training of the model.

Function to sample the next character

It will sample an index from the probability array and convert the vector into a numpy array.

Code:

# Helper function used to sample an index from a probability array
def sa_ind(pred, temp = 1.0):
    preds = np.asarray(pred).astype('float64')
  
    # Normalizing the predictions array
    pred = np.log(pred) / temperature
    exp_pred = np.exp(pred)
    pred = exp_preds / np.sum(exp_pred)
  
    # sampling
    probab = np.random.multinomial(1, pred, 1)
  
    return np.argmax(probab)

Function to generate text after each epoch in the model.

This function will generate random text after the execution of each epoch.

Code:

# Helper function used to generate text after the end of each epoch
def on_epoch_end(epoch, logs):
    print()
    print('----- Generating text after Epoch: % d' % epoch)
  
    # A random starting index for the text generation
    start_ind = random.randint(0, len(file) - max_len - 1)
  
    # Sampling for different values of diversity
    for divers in [0.4, 0.8, 1.2, 1.6]:
        print('----- Diversity:', divers)
  
        generated = ''
  
        # Seed sentence
        sentences = file[start_ind: start_ind + max_len]
  
        generated += sentences
        print('----- Generating with seed: "' + sentence + '"')
        sys.stdout.write(generated)
  
        for i in range(400):
            x_pred = np.zeros((1, max_len, len(vocab)))
  
            for t, char in enumerate(sentences):
                x_pred[0, t, convert_char_to_indices[char]] = 1.
  
            # Predictions for the next character
            pred = model.predict(x_pred, verbose = 0)[0]
  
            # Getting the index of the most probable next character
            next_ind = sample_index(pred, divers)
  
            # Getting the most probable next character using the mapping built
            next_char = convert_indices_to_char[next_ind]
  
            # Building the generated text
            generated += next_char
            sentence = sentences[1:] + next_char
  
            sys.stdout.write(next_char)
            sys.stdout.flush()
        print()
  
# A custom callback function 
callback_func = LambdaCallback(on_epoch_end = on_epoch_end)

Function which saves the model which has decreased loss after every epoch.

The random text will generate whenever the model executes with epochs, and the model will decrease loss after every epoch and get saved to the path defined in the function.

Code:

# helper function used to save the model after each epoch with decreased loss
path = "file.tensor"
checkpoint = ModelCheckpoint(path, monitor ='loss',
                             verbose = 1, save_best_only = True,
                             mode ='min')

Function which reduces the learning rate

This function helps to reduce the learning rate after each epoch.

Code:

# helper function to reduce the learning rate
reduce_alpha = ReduceLROnPlateau(monitor ='loss', factor = 0.2,
                              patience = 1, min_lr = 0.001)
callbacks = [print_callback, checkpoint, reduce_alpha]

Step 5: Training of the Model

Now, we will train the model with different batches and epochs. Here, we have taken a batch size of 140 and an epoch of 40.

Code:

Output:

Step 6: Generate the text

It will generate random and new text.

Code:

def generate_text(length, divers):
    # Get random starting text
    start_ind = random.randint(0, len(text) - max_len - 1)
  
    # Defining the generated text
    generated = ''
    sentence = file[start_ind: start_ind + max_len]
    generated += sentences
  
    # Generating new text of a given length
    for i in range(length):
  
            # Initializing the prediction vector
            x_pred = np.zeros((1, max_len, len(vocab)))
            for t, char in enumerate(sentences):
                x_pred[0, t, conver_char_to_indices[char]] = 1.
  
            # Making the predictions
            preds = model.predict(x_pred, verbose = 0)[0]
  
            # Getting the index of the next most probable index
            next_ind = sample_ind(pred, divers)
  
            # Getting the most probable next character using the mapping built
            next_char = convert_indices_to_char[next_ind]
  
            # Generating new text
            generated += next_char
            sentences = sentences[1:] + next_char
    return generated
  
print(generate_text(500, 0.2))

Output:

nd kept out of sight
But other girls were never quite
Like this, di-di-di-di'n'd say stame tome trre tars tarl ther stand that there tars in ther stars tame to me st man tars tome trre tars that that on ther stars that on ther stars that the ske stars in ther stars tarl ing and warl that that thatting san that stack in that there tome stass and that the can stars that the trre to ther can tars tome trre tars that the ske stand and that that the skn tars tome trre tome tore tome tore tome
And you say stame tome trre tome that to grin a long tome trre that long tore thars tom.

Next TopicViterbi Algorithm in NLP

← prev next →