Keras Models

Keras has come up with two types of in-built models; Sequential Model and an advanced Model class with functional API. The Sequential model tends to be one of the simplest models as it constitutes a linear set of layers, whereas the functional API model leads to the creation of an arbitrary network structure.

Keras Sequential Model

The layers within the sequential models are sequentially arranged, so it is known as Sequential API. In most of the Artificial Neural Network, the layers are sequentially arranged, such that the data flow in between layers is in a specified sequence until it hit the output layer.

Getting started with the Keras Sequential model

The sequential model can be simply created by passing a list of instances of layers to the constructor:

from keras.models import Sequential 
from keras.layers import Dense, Activation 
model = Sequential([
          Dense(32, inpuit_shape=(784,)),
          Activation('relu'),
         Dense(10), 
         Activation('softmax'), 
])

The .add() method is used to add layers:

model = Sequential()
model.add(Dense(32, input_dim=784 )) 
model.add(Activation('relu')) 

Specifying the input shape

Since the model must be aware of the input size that it is expecting, so the very first layer in the sequential model necessitates particulars about its input shape as the rest of the other layers can automatically speculate the shape. It can be done in the following ways:

The input_shape argument is passed to the foremost layer. It comprises of a tuple shape, i.e., a tuple of integers or None, such that None means that any positive integer might anticipate). It excludes the batch dimension.
Some of the 2D layers, such as Dense, supports input shape specification through input_dim argument, whereas some of the 3D temporal layers support input_dim and input_length
The batch_size argument is passed to the layer to define a batch size for the inputs. If batch_size=32 and input_shape=(6,8) is passed to a layer, then, in that case, it is expected for every batch of inputs will have a batch shape (32,6,8).

These are the following snippets that are strictly equivalent:

model=Sequential ()
model.add(Dense(32, input_shape=(784,))) 

model=Sequential ()
model.add(Dense(32, input_dim=784))

Compilation

At first the model is compiled for which the compile process is used for constructing the learning procedure afterward the model undergoes the training in the next step. The compilation include three parameter, which are as follows:

An optimizer: As the name suggest, an optimizer can be a string of an existing optimizer like (rmsprop or adagrad), or simply an instance of class optimizer.
A loss function: A loss function act as an objective that every model tries to minimize for example categorical_crossentropy or mse. It is also known as objective function.
A list of metrics: A list of metrics refers to a string of identifiers of an existing metric or custom metric function. It is suggested to set to metrics=['accuracy'] for any classification problem.

#for a multi-class classification problem
model.compile(optimizer='rmsprop',
                          loss='categorical_crossentropy',
                          metrics=['accuracy'])
#for a binary classification problem 
model.compile(optimizer='rmsprop',
	              loss='binary_crossentropy',
                           metrics=['accuracy']
#for a mean squared error regression problem
model.compile(optimizer='rmsprop',
                          loss='mse')
#for custom metrics 
import keras.backend as K
def mean _pred(y_true, y_pred):
      return K.mean(y_pred)
model.compile(optimizer='rsmprop',
                          loss='binary_crossentropy'
                          metrics=['accuracy', mean_pred])

Training

The Numpy arrays of input data or labels are incorporated for training the Keras model and so it make use of fit function.

#for a single-input model with 2 classes (binary classification)
model = Sequential ()
model.add(Dense(32, activation='relu', input_dim=100 )) 
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
                         loss='binary_crossentropy',
                         metrics=['accuracy'])
#generate dummy data
import numpy as np
data =  np.random.random((1000, 100))
labels =  np.random.randint(2, size=(1000, 1))
#train the model, iterating on the data in batches of 32 samples 
model.fit(data, labels, epochs=10, batch_size=32)

#for a single input model with 10 classes (categorical classification)
model = Sequential ()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='rmsprop',
                         loss='categorical_crossentropy',
                         metrics=['accuracy'])
#generate dummy data
import numpy as np
data = np.random.random((1000,100))
labels=np.random.randint(10, size=(1000,1))
#convert labels to categorical one-hot encoding
one_hot_labels = keras.utils.to_categorical(labels, num_classes=10)
#train the model, iterating on the data in the batches of 32 samples
model.fit(data, one_hot_labels, epochs=10, batch_size=32)

Example: Training a simple deep learning Neural Network on the MNIST dataset

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.optimizers import RMSprop

batch_size = 128
num_classes = 10
epochs = 20

# split the data between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

x_train = x_train.reshape(60000, 784)
x_test = x_test.reshape(10000, 784)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(784,)))
model.add(Dropout(0.2))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(num_classes, activation='softmax'))

model.summary()

model.compile(loss='categorical_crossentropy',
              optimizer=RMSprop(),
              metrics=['accuracy'])

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Stacked LSTM for Sequence Classification

To make model capable enough to learn high-level temporal representation, 3 LSTM layers are stacked on above one another.

The layers are stacked in such a way that first two layers yields complete sequences of output and the third one produces final phase in its output sequence, which helps in successful transformation of input sequence to the single vector (i.e. dropdown of temporal dimension).

from keras.models import Sequential 
from keras.layers import LSTM, Dense 
import numpy as np
data_dim = 16
timesteps = 8
num_classes = 10
#expected input data shape: (batch_size, timesteps, data_dim)
model = Sequential ()
model.add(LSTM(32, return_sequences=True,
                  input_shape(timesteps, data_dim))) #returns a sequence of sequence of vectors of dimension 32
model.add(LSTM(32, return_sequences=True)) #returns a sequence of vectors of dimension 32
model.add(LSTM(32)) #return a single vector of dimension 32
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy',
                         optimizer='rmsprop',
                         metrics=['accuracy'])
#generate dummy training data
x_train = np.random.random((1000, timesteps, data_dim))
y_train = np.random.random((1000, num_classes))
#generate dummy validation data
x_val = np.random.random((100, timesteps, data_dim))
y_val = np.random.random((100, num_classes))
model.fit(x_train, y_train, 
                batch_size=64, epochs=5,
                validation_data=(x_val, y_val))

Same Stacked LSTM model, rendered "stateful"

A model whose central (internal) states are used again as initial states for another batch's sample, which were acquired after a batch of samples were processed is called as a 'stateful recurrent model'. It not only manages the computational complexity but also permit to process longer sequence.

from keras.models import Sequential
from keras.layers import LSTM, Dense
import numpy as np
data_dim = 16
timesteps = 8
num_classes = 10
batch_size = 32
# Expected input batch shape: (batch_size, timesteps, data_dim)
# Note that we have to provide the full batch_input_shape since the network is stateful.
# the sample of index i in batch k is the follow-up for the sample i in batch k-1.
model = Sequential()
model.add(LSTM(32, return_sequences=True, stateful=True,
               batch_input_shape=(batch_size, timesteps, data_dim)))
model.add(LSTM(32, return_sequences=True, stateful=True))
model.add(LSTM(32, stateful=True))
model.add(Dense(10, activation='softmax'))

model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])

# Generate dummy training data
x_train = np.random.random((batch_size * 10, timesteps, data_dim))
y_train = np.random.random((batch_size * 10, num_classes))

# Generate dummy validation data
x_val = np.random.random((batch_size * 3, timesteps, data_dim))
y_val = np.random.random((batch_size * 3, num_classes))

model.fit(x_train, y_train,
          batch_size=batch_size, epochs=5, shuffle=False,
          validation_data=(x_val, y_val))

Keras Functional API

Keras Functional API is used to delineate complex models, for example, multi-output models, directed acyclic models, or graphs with shared layers. In other words, it can be said that the functional API lets you outline those inputs or outputs that are sharing layers.

First Example: A densely-connected network

To implement a densely-connected network, the sequential model results better, but it would not be a bad decision if we try it out with another model.

The implementation of Keras Functional API is similar to that of the Keras Sequential model.

An instance layer is called by a tensor and returns a tensor as an output.
To define a model, both input tensor(s) and output tensor(s) are used.

from keras.layers import Input, Dense
from keras.models import Model
# Returns a Tensor
inputs = Input(shape=(784,))
# An instance layer is callable on a tensor and returns a tensor
output_1 = Dense(64, activation='relu')(inputs)
output_2 = Dense(64, activation='relu')(output_1)
predictions = Dense(10, activation='softmax')(output_2)
# Creates a model that includes the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(data, labels)  # start training

All models are callable, just like layers

Since we are discussing the functional API model, we can reuse the trained models simply by treating any such model as if it is a layer. It is done by calling a model on a tensor.

When we call a model on tenor, it should be noted that we aren't only reusing its architecture but its weights too.

x = Input(shape=(784,))
# It works, and returns the 10-way softmax we defined above.
y = model(x)

The code given above allows an instance to build a model for processing input sequences. Also, with the help of an individual line, we can convert an image classification model into a video classification model.

from keras.layers import TimeDistributed

# Input tensor for sequences of 20 timesteps, such that each contains a 784-dimensional vector
input_sequences = Input(shape=(20, 784))

# It applies our previous model to every timestep in the input sequences. The output of the previous model was a 10-way softmax, so the output of the layer given below will be a sequence of 20 vectors of size 10.
processed_sequences = TimeDistributed(model)(input_sequences)

Multi-input and multi-output models

Since functional API explains well multi-input and multi-output models, it handles a large number of intertwined datastreams by manipulating them. Let us look at an example given below to understand more briefly about its concept. Basically, we are going to forecast how many retweets and likes a news headline on social media like twitter will get.

Both the headline, which is a sequence of words, and an auxiliary input will be given to the model that accepts data, for example, at what time or the date the headline got posted, etc. The two-loss functions are also used to oversee the model, such that if we use the main loss function in the initial steps, it would be the best choice for regularizing the deep learning models.

Here the main_input obtains the headline as a sequence of integers for which each integer will encode each word. The integers are in a range from 1 to 10,000, and the sequences are of 100 words.

from keras.layers import Input, Embedding, LSTM, Dense
from keras.models import Model
import numpy as np
np.random.seed(0)  # Sets a random seed for reproducibility.

# Headline input receive sequences of 100 integers in between 1 and 10000.
# Here we can name any layer by passing it a "name" argument.
main_input = Input(shape=(100,), dtype='int32', name='main_input')

# The embedding layer encodes the input sequence into a sequence of dense 512-dimensional vectors.
x = Embedding(output_dim=512, input_dim=10000, input_length=100)(main_input)

# The LSTM transforms the vector sequence into a single vector that contains the information about an entire sequence.
lstm_out = LSTM(32)(x)

Then the auxiliary loss will be inserted that will permit the LSTM and Embedding layer to train itself smoothly even when the main loss in the model is higher.

Next we will input the aux_input to our model, which is done by concatenating it with LSTM output.

auxiliary_input = Input(shape=(5,), name='aux_input')
x = keras.layers.concatenate([lstm_out, auxiliary_input])

# Stacks a densely-connected deep network on the top.
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(64, activation='relu')(x)

# Then add the main logistic regression layer.
main_output = Dense(1, activation='sigmoid', name='main_output')(x)

# Defines a model with two inputs and outputs
model = Model(inputs=[main_input, auxiliary_input], outputs=[main_output, auxiliary_output])

Thereafter, we will compile our model by assigning 0.2 weight on the auxiliary loss. And then we will use a list or a directory to identify the loss or loss_weight for all of the distinct outputs. To use same loss on every output, a single loss argument (loss) will be passed.

model.compile(optimizer='rmsprop', loss='binary_crossentropy',
              loss_weights=[1., 0.2])

Next we will train our model by passing a lists of an input array as well as target arrays.

headline_data = np.round(np.abs(np.random.rand(12, 100) * 100))
additional_data = np.random.randn(12, 5)
headline_labels = np.random.randn(12, 1)
additional_labels = np.random.randn(12, 1)
model.fit([headline_data, additional_data], [headline_labels, additional_labels],
          epochs=50, batch_size=32) 

As we have named inputs and outputs, the model will be compiled as follows;

model.compile(optimizer='rmsprop',
              loss={'main_output': 'binary_crossentropy', 'aux_output': 'binary_crossentropy'},
              loss_weights={'main_output': 1., 'aux_output': 0.2})

# And train it through:
model.fit({'main_input': headline_data, 'aux_input': additional_data},
          {'main_output': headline_labels, 'aux_output': additional_labels},
          epochs=50, batch_size=32)

The model can be inferenced by;

or,

Shared layers

Another example to be taken into consideration to understand the functional API model would be the shared layers. For this purpose, we will be examining the tweet's dataset. Since we are willing to compose such a model that can determine if two tweets belong to the same person or not, this will make it easy for an instance to compare users based on the similarities of tweets.

We will construct a model that will go through the encoding of two tweets into vectors, followed by concatenating them, and then we will include logistic regression. The model will output a probability for two tweets belonging to the same person. Next, we will train our model on pairs of both positive as well as negative tweets.

Since here our chosen problem is symmetric, our mechanism must reuse the first encoded tweet so as to encode the other tweet for which we will be using a shared LSTM layer.

To build this model with a functional API, we will input a binary matrix of shape (280,256) for a tweet. Here 280 is a vector sequence of size 256, such that each 256-dimensional vector will encode the presence or absence of a character.

import keras
from keras.layers import Input, LSTM, Dense
from keras.models import Model

tweet_a = Input(shape=(280, 256))
tweet_b = Input(shape=(280, 256))

Next we will input a layer and then will call it on various inputs as per the requirement, so that we can share a layer on several inputs.

# The layer takes an input as a matrix and will return a vector of size 64
shared_lstm = LSTM(64)

# When we reuse the same layer instance multiple times, the weights of the layer will also be reused (it is effectively *the same* layer)
encoded_a = shared_lstm(tweet_a)
encoded_b = shared_lstm(tweet_b)

# Next we will concatenate the two vectors:
merged_vector = keras.layers.concatenate([encoded_a, encoded_b], axis=-1)

# And then we will add logistic regression on top
predictions = Dense(1, activation='sigmoid')(merged_vector)

# After that we will define a trainable model by linking the tweet inputs to the predictions
model = Model(inputs=[tweet_a, tweet_b], outputs=predictions)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit([data_a, data_b], labels, epochs=10)

Now to understand how to read the shared layer's output or output shape, we will briefly look at the concept of layer "node".

At the time of calling a layer on any input, we are actually generating new tensor by appending a node to the layer and linking the input tensors to the output tensor. If the same layer is called several times, then that layer will own so many nodes, which will be indexed as 0,1,2,..

To get the tensor output of a layer instance, we used layer.get_output() and for its output shape, layer.output_shape in the older versions of Keras. But now get_output() has been replaced by output.

The layer will return one output of the layer as long as one layer is connected to a single input.

 a = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)

assert lstm.output == encoded_a

In case if the layer comprises of multiple inputs;

 a = Input(shape=(280, 256))
b = Input(shape=(280, 256))

lstm = LSTM(32)
encoded_a = lstm(a)
encoded_b = lstm(b)

lstm.output

Output:

>> AttributeError: Layer lstm_1 has multiple inbound nodes,
hence the notion of "layer output" is ill-defined.
Use `get_output_at(node_index)` instead.

Now the following will execute it;

assert lstm.get_output_at(0) == encoded_a
assert lstm.get_output_at(1) == encoded_b

So, the same carries for characters such as input_shape and output_shape. If a layer comprises of individual layer or all the nodes are having similar input/output, only then we can say that the conception of "layer input/output shape" is completely defined and the shape will be return by layer.output_shape/ layer.input_shape.

In case if we apply conv2D layer to an input of shapes (32, 32, 3) and then to (64, 64, 3), then the layer will encompass several shapes of input/output. And to fetch them we will require to specify the index of nodes to which they belong to.

a = Input(shape=(32, 32, 3))
b = Input(shape=(64, 64, 3))

conv = Conv2D(16, (3, 3), padding='same')
conved_a = conv(a)

# Only one input so far, the following will work:
assert conv.input_shape == (None, 32, 32, 3)
	
conved_b = conv(b)
# now the "input_shape" property wouldn't work, but this does:
assert conv.get_input_shape_at(0) == (None, 32, 32, 3)
assert conv.get_input_shape_at(1) == (None, 64, 64, 3)

Next TopicKeras layers

← prev next →