Recurrent Layers
RNN
The RNN layer acts as the base class for the recurrent layers.
Arguments
 cell: An RNN cell instance. An RNN cell is a class that has:
 A call(input_at_t, states_at_t) method that returns (output_at_t, states_at_t_plus_1). The call method may optionally take a constants argument, which is explained in more detail in the section "Note on passing external constants to RNNs" below.
 A state_size attribute, which can be a single integer (a single state) or a list/tuple of integers (one size per state). In the case of a single integer, it is the size of the recurrent state, which must be the same as the size of the cell output.
 An output_size attribute, which can be a single integer or a TensorShape that represents the shape of the output. For backward-compatibility reasons, if this attribute is not available for the cell, the value is inferred from the first element of state_size.
The cell can also be a list of RNN cell instances, in which case the cells are stacked one after another in the RNN, giving an efficient implementation of a stacked RNN.
 return_sequences: A Boolean that indicates whether to return only the last output of the output sequence or the full sequence.
 return_state: A Boolean that indicates whether to return the last state in addition to the output.
 go_backwards: A Boolean (False by default). If True, the input sequence is processed backwards and the reversed sequence is returned.
 stateful: A Boolean (False by default). If True, the last state for each sample at index i in a batch is used as the initial state for the sample at index i in the following batch.
 unroll: A Boolean (False by default). If True, the network is unrolled; otherwise a symbolic loop is used. Unrolling can speed up an RNN, although it tends to be more memory-intensive, so it is only suitable for short sequences.
 input_dim: An integer that gives the dimensionality of the input. This argument (or, alternatively, the keyword argument input_shape) is required when the layer is used as the first layer in a model.
 input_length: The length of the input sequences, to be specified when it is constant. It is required if you want to connect Flatten and then Dense layers after the recurrent layer, because without it the output shape of the Dense layer cannot be computed. If the recurrent layer is not the first layer in the model, you have to specify the input length at the level of the first layer instead, for example via the input_shape argument.
Input shape
It is a 3D tensor of shape (batch_size, timesteps, input_dim).
Output shape
 If return_state is set: a list of tensors. The first tensor is the output and the remaining tensors are the last states, each of shape (batch_size, units). For example, the number of state tensors is 1 for RNN and GRU and 2 for LSTM.
 If return_sequences is set: a 3D tensor of shape (batch_size, timesteps, units); otherwise, a 2D tensor of shape (batch_size, units).
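The shape rules above can be summarized in a small, dependency-free helper. This is only a sketch: rnn_output_shapes and its num_states parameter are hypothetical names used for illustration and are not part of Keras.

```python
def rnn_output_shapes(batch_size, timesteps, units,
                      return_sequences=False, return_state=False,
                      num_states=1):
    """Return the output shape(s) described above as tuples.

    num_states is an illustrative parameter: 1 for RNN/GRU, 2 for LSTM.
    """
    if return_sequences:
        output = (batch_size, timesteps, units)   # full sequence, 3D
    else:
        output = (batch_size, units)              # last output only, 2D
    if not return_state:
        return output
    # The output comes first, followed by one (batch_size, units) shape per state.
    return [output] + [(batch_size, units)] * num_states


print(rnn_output_shapes(32, 10, 64))
print(rnn_output_shapes(32, 10, 64, return_sequences=True))
print(rnn_output_shapes(32, 10, 64, return_state=True, num_states=2))
```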
Masking
This layer supports masking for input data with a variable number of timesteps. To introduce masks to your data, use an Embedding layer with the mask_zero parameter set to True.
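To illustrate the idea, the plain-Python sketch below pads a sequence with zeros and skips the padded timesteps, so the state after the last real timestep is simply carried forward. It assumes that index 0 is reserved for padding, as mask_zero=True implies; run_with_mask and add_step are hypothetical names, and the running sum only stands in for a real recurrent update.

```python
def run_with_mask(token_ids, step, initial_state):
    """Run `step` over a padded sequence, skipping timesteps whose id is 0."""
    state = initial_state
    for token in token_ids:
        if token == 0:
            # Padding: the timestep is masked, so the state is carried forward.
            continue
        state = step(token, state)
    return state


def add_step(token, state):
    """Toy step function: a running sum stands in for a recurrent update."""
    return state + token


padded = [5, 3, 7, 0, 0]     # true length 3, padded to length 5 with zeros
unpadded = [5, 3, 7]

# Both calls end in the same state because the padding is ignored.
print(run_with_mask(padded, add_step, 0))
print(run_with_mask(unpadded, add_step, 0))
```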
Note on using statefulness in RNNs
Setting an RNN layer to be 'stateful' means that the states computed for the samples in one batch are reused as the initial states for the samples in the next batch. This assumes a one-to-one mapping between the samples in different consecutive batches.
To enable statefulness, specify stateful=True in the layer constructor and specify a fixed batch size for the model. For a Sequential model, pass batch_input_shape=(…) to the first layer in the model; for a functional model with one or more Input layers, pass batch_shape=(…) to all the first layers in the model. This is the expected shape of the inputs including the batch size, and it should be a tuple of integers, for example (32, 10, 100). Also specify shuffle=False when calling fit().
To reset the states of your model, call .reset_states() either on a specific layer or on the entire model.
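The toy class below sketches why the batch size must be fixed and what reset_states() does. It is a plain-Python stand-in, not Keras code: ToyStatefulRNN and process_batch are hypothetical names, and each "state" is just a running sum per sample index.

```python
class ToyStatefulRNN:
    """Plain-Python stand-in for a stateful layer (hypothetical, not Keras)."""

    def __init__(self, batch_size, stateful=False):
        self.batch_size = batch_size
        self.stateful = stateful
        self.states = [0.0] * batch_size     # one state per sample index

    def reset_states(self):
        self.states = [0.0] * self.batch_size

    def process_batch(self, batch):
        # batch: a list of `batch_size` sequences of numbers.
        if len(batch) != self.batch_size:
            raise ValueError("the batch size must match the fixed batch size")
        if not self.stateful:
            self.reset_states()              # every batch starts from scratch
        # Sample i continues from the state left by sample i of the previous batch.
        self.states = [s + sum(seq) for s, seq in zip(self.states, batch)]
        return list(self.states)


layer = ToyStatefulRNN(batch_size=2, stateful=True)
first = layer.process_batch([[1, 2], [10, 20]])
second = layer.process_batch([[3, 4], [30, 40]])    # continues from `first`
layer.reset_states()
after_reset = layer.process_batch([[3, 4], [30, 40]])
print(first, second, after_reset)
```

Because sample i of one batch continues sample i of the previous batch, the order of the data matters, which is why shuffling has to be disabled during training.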
Note on specifying the initial states of RNNs
The initial state of RNN layers can be specified symbolically by calling them with the initial_state keyword argument. Its value should be a tensor or a list of tensors representing the initial state of the RNN layer.
The initial state of RNN layers can be specified numerically by calling reset_states with the states keyword argument. Its value should be a NumPy array or a list of NumPy arrays representing the initial state of the RNN layer.
Note on passing external constants to RNNs
External constants can be passed to the cell by using the constants keyword argument of the RNN.__call__ (as well as RNN.call) method. This requires the cell.call method to accept the same keyword argument constants. Such constants can be used to condition the cell transformation on additional static inputs (inputs that do not change over time).
Example
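As a minimal, dependency-free sketch of the cell contract described in this section, the code below defines a toy scalar cell and a loop that threads the state through the timesteps. MinimalCell, run_rnn and the weights w_in, w_rec and w_const are hypothetical names chosen for illustration; a real Keras cell is a layer that operates on tensors.

```python
import math


class MinimalCell:
    """Toy scalar cell following the call/state_size contract (not a Keras class)."""

    state_size = 1     # a single recurrent state
    output_size = 1

    def __init__(self, w_in, w_rec, w_const=0.0):
        self.w_in, self.w_rec, self.w_const = w_in, w_rec, w_const

    def call(self, input_at_t, states_at_t, constants=None):
        prev_output = states_at_t[0]
        pre_activation = self.w_in * input_at_t + self.w_rec * prev_output
        if constants is not None:
            # Static conditioning input, identical at every timestep.
            pre_activation += self.w_const * constants[0]
        output_at_t = math.tanh(pre_activation)
        return output_at_t, [output_at_t]    # (output_at_t, states_at_t_plus_1)


def run_rnn(cell, inputs, initial_state=0.0, constants=None,
            return_sequences=False, go_backwards=False):
    """Apply `cell` to each timestep in turn, threading the state through."""
    if go_backwards:
        inputs = list(reversed(inputs))
    states, outputs = [initial_state], []
    for x_t in inputs:
        out, states = cell.call(x_t, states, constants=constants)
        outputs.append(out)
    return outputs if return_sequences else outputs[-1]


cell = MinimalCell(w_in=0.5, w_rec=0.8, w_const=0.1)
sequence = [1.0, 0.0, -1.0]
print(run_rnn(cell, sequence, return_sequences=True))   # one output per timestep
print(run_rnn(cell, sequence))                          # last output only
print(run_rnn(cell, sequence, constants=[2.0]))         # conditioned on a constant
```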
SimpleRNN
It is a fully connected RNN in which the output is fed back to the input.
Arguments
 units: A positive integer that gives the dimensionality of the output space.
 activation: The activation function to use, which is the hyperbolic tangent (tanh) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 use_bias: A Boolean that indicates whether the layer uses a bias vector.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 activity_regularizer: The regularizer function applied to the output of the layer (its "activation").
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the inputs.
 recurrent_dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the recurrent state.
 return_sequences: A Boolean that indicates whether to return only the last output of the output sequence or the full sequence.
 return_state: A Boolean that indicates whether to return the last state in addition to the output.
 go_backwards: A Boolean (False by default). If True, the input sequence is processed backwards and the reversed sequence is returned.
 stateful: A Boolean (False by default). If True, the last state for each sample at index i in a batch is used as the initial state for the sample at index i in the following batch.
 unroll: A Boolean (False by default). If True, the network is unrolled; otherwise a symbolic loop is used. Unrolling can speed up an RNN, although it tends to be more memory-intensive, so it is only suitable for short sequences.
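Read together, these arguments describe a single recurrence. As a sketch, with symbols chosen here for illustration ($W$ for kernel, $U$ for recurrent_kernel, $b$ for the bias vector, $\phi$ for activation, $h_0$ for the initial state) and ignoring dropout, each timestep computes:

```latex
h_t = \phi\left(x_t W + h_{t-1} U + b\right), \qquad t = 1, \dots, T
```

With return_sequences set, the layer returns all of $h_1, \dots, h_T$; otherwise it returns only $h_T$.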
GRU
It is the Gated Recurrent Unit layer, which comes in two variants. The default one, based on 1406.1078v3, applies the reset gate to the hidden state before the matrix multiplication. The other one, based on the original 1406.1078v1, has the order reversed.
The second variant is compatible with CuDNNGRU (GPU-only) and allows inference on CPU. For that reason, it has separate biases for kernel and recurrent_kernel. To use it, set reset_after=True and recurrent_activation='sigmoid'.
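The two variants differ only in where the reset gate enters the candidate state. One common way to write them is sketched below; the notation is an assumption made for illustration: $z_t$ is the update gate, $r_t$ the reset gate, $\sigma$ the recurrent activation, $\phi$ the activation, $W_\ast$ and $U_\ast$ the per-gate slices of kernel and recurrent_kernel, $b'_h$ the separate recurrent bias, and $\odot$ the elementwise product.

```latex
\begin{aligned}
z_t &= \sigma\left(x_t W_z + h_{t-1} U_z + b_z\right) \\
r_t &= \sigma\left(x_t W_r + h_{t-1} U_r + b_r\right) \\
\tilde{h}_t &= \phi\left(x_t W_h + (r_t \odot h_{t-1})\, U_h + b_h\right)
  && \text{(reset gate before the multiplication)} \\
\tilde{h}_t &= \phi\left(x_t W_h + b_h + r_t \odot (h_{t-1} U_h + b'_h)\right)
  && \text{(reset gate after the multiplication)} \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```

In the second variant the gates also receive their own recurrent biases; in this sketch they are folded into $b_z$ and $b_r$ for brevity.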
Arguments
 units: A positive integer that gives the dimensionality of the output space.
 activation: The activation function to use, which is the hyperbolic tangent (tanh) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 recurrent_activation: The activation function to use for the recurrent step, which is the hard sigmoid (hard_sigmoid) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 use_bias: A Boolean that indicates whether the layer uses a bias vector.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 activity_regularizer: The regularizer function applied to the output of the layer (its "activation").
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the inputs.
 recurrent_dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the recurrent state.
 implementation: The implementation mode, either 1 or 2. Mode 1 structures its operations as a larger number of smaller dot products and additions, whereas mode 2 batches them into fewer, larger operations. The two modes have different performance profiles on different hardware and for different applications.
 return_sequences: A Boolean that indicates whether to return only the last output of the output sequence or the full sequence.
 return_state: A Boolean that indicates whether to return the last state in addition to the output.
 go_backwards: A Boolean (False by default). If True, the input sequence is processed backwards and the reversed sequence is returned.
 stateful: A Boolean (False by default). If True, the last state for each sample at index i in a batch is used as the initial state for the sample at index i in the following batch.
 unroll: A Boolean (False by default). If True, the network is unrolled; otherwise a symbolic loop is used. Unrolling can speed up an RNN, although it tends to be more memory-intensive, so it is only suitable for short sequences.
 reset_after: The GRU convention, i.e. whether the reset gate is applied before or after the matrix multiplication. False = "before" (default), True = "after" (CuDNN compatible).
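The hard sigmoid mentioned above is a cheap, piecewise-linear stand-in for the logistic sigmoid. The sketch below assumes the definition commonly used by the Keras backend, 0.2 * x + 0.5 clipped to the range [0, 1]; that formula is not stated in this text, so treat it as an assumption.

```python
import math


def hard_sigmoid(x):
    """Piecewise-linear approximation of the sigmoid, clipped to [0, 1].

    Assumed definition: 0.2 * x + 0.5, saturating below -2.5 and above 2.5.
    """
    return max(0.0, min(1.0, 0.2 * x + 0.5))


def sigmoid(x):
    """Logistic sigmoid, shown for comparison."""
    return 1.0 / (1.0 + math.exp(-x))


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, round(hard_sigmoid(x), 3), round(sigmoid(x), 3))
```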
LSTM
It is the Long Short-Term Memory layer, introduced by Hochreiter and Schmidhuber in 1997.
Arguments
 units: A positive integer that gives the dimensionality of the output space.
 activation: The activation function to use, which is the hyperbolic tangent (tanh) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 recurrent_activation: The activation function to use for the recurrent step, which is the hard sigmoid (hard_sigmoid) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 use_bias: A Boolean that indicates whether the layer uses a bias vector.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 unit_forget_bias: A Boolean. If True, 1 is added to the bias of the forget gate at initialization. Setting it to True also forces bias_initializer="zeros".
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 activity_regularizer: The regularizer function applied to the output of the layer (its "activation").
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the inputs.
 recurrent_dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the recurrent state.
 implementation: The implementation mode, either 1 or 2. Mode 1 structures its operations as a larger number of smaller dot products and additions, whereas mode 2 batches them into fewer, larger operations. The two modes have different performance profiles on different hardware and for different applications.
 return_sequences: A Boolean that indicates whether to return only the last output of the output sequence or the full sequence.
 return_state: A Boolean that indicates whether to return the last state in addition to the output.
 go_backwards: A Boolean (False by default). If True, the input sequence is processed backwards and the reversed sequence is returned.
 stateful: A Boolean (False by default). If True, the last state for each sample at index i in a batch is used as the initial state for the sample at index i in the following batch.
 unroll: A Boolean (False by default). If True, the network is unrolled; otherwise a symbolic loop is used. Unrolling can speed up an RNN, although it tends to be more memory-intensive, so it is only suitable for short sequences.
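For orientation, one standard way to write the LSTM step is sketched below. The symbols are chosen here for illustration: $i_t$, $f_t$ and $o_t$ are the input, forget and output gates, $c_t$ is the cell state, $h_t$ is the output, $\sigma$ is the recurrent activation, $\phi$ is the activation, $W_\ast$ and $U_\ast$ are the per-gate slices of kernel and recurrent_kernel, and $\odot$ is the elementwise product. Dropout is ignored.

```latex
\begin{aligned}
i_t &= \sigma\left(x_t W_i + h_{t-1} U_i + b_i\right) \\
f_t &= \sigma\left(x_t W_f + h_{t-1} U_f + b_f\right) \\
o_t &= \sigma\left(x_t W_o + h_{t-1} U_o + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \phi\left(x_t W_c + h_{t-1} U_c + b_c\right) \\
h_t &= o_t \odot \phi\left(c_t\right)
\end{aligned}
```

The layer therefore carries two states, $h_t$ and $c_t$, and the forget-gate bias $b_f$ is the term that unit_forget_bias initializes to 1.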
ConvLSTM2D
It is a convolutional LSTM layer. It is similar to an LSTM layer, except that both the input transformations and the recurrent transformations are convolutional.
Arguments
 filters: An integer that gives the dimensionality of the output space, i.e. the number of output filters in the convolution.
 kernel_size: An integer or a tuple/list of n integers that specifies the dimensions of the convolution window.
 strides: An integer or a tuple/list of n integers that specifies the strides of the convolution. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
 padding: One of "valid" or "same" (case-insensitive).
 data_format: A string, either "channels_last" or "channels_first", that gives the ordering of the dimensions in the inputs. "channels_last" corresponds to inputs of shape (batch, time, ..., channels), and "channels_first" corresponds to inputs of shape (batch, time, channels, ...). It defaults to the image_data_format value found in the Keras config file at ~/.keras/keras.json. If you never set it, it is "channels_last".
 dilation_rate: An integer or a tuple/list of n integers that specifies the dilation rate to use for dilated convolution. Specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.
 activation: The activation function to use. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 recurrent_activation: The activation function to use for the recurrent step.
 use_bias: A Boolean that indicates whether the layer uses a bias vector.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 unit_forget_bias: A Boolean. If True, 1 is added to the bias of the forget gate at initialization. Setting it to True also forces bias_initializer="zeros".
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 activity_regularizer: The regularizer function applied to the output of the layer (its "activation").
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 return_sequences: A Boolean that indicates whether to return only the last output of the output sequence or the full sequence.
 go_backwards: A Boolean (False by default). If True, the input sequence is processed backwards and the reversed sequence is returned.
 stateful: A Boolean (False by default). If True, the last state for each sample at index i in a batch is used as the initial state for the sample at index i in the following batch.
 dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the inputs.
 recurrent_dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the recurrent state.
Input shape
If data_format is "channels_first", the input is a 5D tensor of shape (samples, time, channels, rows, cols). If data_format is "channels_last", the input is a 5D tensor of shape (samples, time, rows, cols, channels).
Output shape
 If return_sequences is set:
 if data_format is "channels_first", the output is a 5D tensor of shape (samples, time, filters, output_row, output_col).
 if data_format is "channels_last", the output is a 5D tensor of shape (samples, time, output_row, output_col, filters).
 Otherwise:
 if data_format is "channels_first", the output is a 4D tensor of shape (samples, filters, output_row, output_col).
 if data_format is "channels_last", the output is a 4D tensor of shape (samples, output_row, output_col, filters).
Here output_row and output_col depend on the shape of the filter and on the padding.
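How the output rows and columns follow from the kernel size, the padding and the strides can be sketched with the usual convolution arithmetic. The helpers below are hypothetical (conv_output_length and convlstm2d_output_shape are names chosen for this sketch, not the layer's own methods) and assume the standard "valid"/"same" formulas.

```python
def conv_output_length(input_length, kernel_size, padding="valid",
                       stride=1, dilation=1):
    """Size of one spatial output dimension (a row count or a column count)."""
    if padding == "same":
        length = input_length
    elif padding == "valid":
        # A dilated kernel spans (kernel_size - 1) * dilation + 1 input positions.
        effective_kernel = (kernel_size - 1) * dilation + 1
        length = input_length - effective_kernel + 1
    else:
        raise ValueError("padding must be 'valid' or 'same'")
    return (length + stride - 1) // stride      # ceiling division by the stride


def convlstm2d_output_shape(input_shape, filters, kernel_size, padding="valid",
                            strides=(1, 1), data_format="channels_last",
                            return_sequences=False):
    """Output shape for a 5D input shape, following the rules listed above."""
    if data_format == "channels_first":
        samples, time, _, rows, cols = input_shape
    else:
        samples, time, rows, cols, _ = input_shape
    out_row = conv_output_length(rows, kernel_size[0], padding, strides[0])
    out_col = conv_output_length(cols, kernel_size[1], padding, strides[1])
    if data_format == "channels_first":
        spatial = (filters, out_row, out_col)
    else:
        spatial = (out_row, out_col, filters)
    # The time axis is kept only when the full sequence is returned.
    lead = (samples, time) if return_sequences else (samples,)
    return lead + spatial


print(convlstm2d_output_shape((8, 10, 40, 40, 3), filters=16,
                              kernel_size=(3, 3), return_sequences=True))
```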
Raises
 ValueError: Raised in case of invalid constructor arguments.
ConvLSTM2DCell
It is a cell class for the ConvLSTM2D layer.
Arguments
 filters: An integer that gives the dimensionality of the output space, i.e. the number of output filters in the convolution.
 kernel_size: An integer or a tuple/list of n integers that specifies the dimensions of the convolution window.
 strides: An integer or a tuple/list of n integers that specifies the strides of the convolution. Specifying any stride value != 1 is incompatible with specifying any dilation_rate value != 1.
 padding: One of "valid" or "same" (case-insensitive).
 data_format: A string, either "channels_last" or "channels_first", that gives the ordering of the dimensions in the inputs. It defaults to the image_data_format value found in the Keras config file at ~/.keras/keras.json. If you never set it, it is "channels_last".
 dilation_rate: An integer or a tuple/list of n integers that specifies the dilation rate to use for dilated convolution. Specifying any dilation_rate value != 1 is incompatible with specifying any strides value != 1.
 activation: The activation function to use. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 recurrent_activation: The activation function to use for the recurrent step.
 use_bias: A Boolean that indicates whether the layer uses a bias vector.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 unit_forget_bias: A Boolean. If True, 1 is added to the bias of the forget gate at initialization. Setting it to True also forces bias_initializer="zeros".
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the inputs.
 recurrent_dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the recurrent state.
SimpleRNNCell
It is a cell class for SimpleRNN.
Arguments
 units: A positive integer that gives the dimensionality of the output space.
 activation: The activation function to use, which is the hyperbolic tangent (tanh) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 use_bias: A Boolean that indicates whether the layer uses a bias vector.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the inputs.
 recurrent_dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the recurrent state.
GRUCell
It is a cell class for the GRU layer.
Arguments
 units: A positive integer that gives the dimensionality of the output space.
 activation: The activation function to use, which is the hyperbolic tangent (tanh) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 recurrent_activation: The activation function to use for the recurrent step, which is the hard sigmoid (hard_sigmoid) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 use_bias: A Boolean that indicates whether the layer uses a bias vector.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the inputs.
 recurrent_dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the recurrent state.
 implementation: The implementation mode, either 1 or 2. Mode 1 structures its operations as a larger number of smaller dot products and additions, whereas mode 2 batches them into fewer, larger operations. The two modes have different performance profiles on different hardware and for different applications.
 reset_after: The GRU convention, i.e. whether the reset gate is applied before or after the matrix multiplication. False = "before" (default), True = "after" (CuDNN compatible).
LSTMCell
It is a cell class for the LSTM layer.
Arguments
 units: A positive integer that gives the dimensionality of the output space.
 activation: The activation function to use, which is the hyperbolic tangent (tanh) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 recurrent_activation: The activation function to use for the recurrent step, which is the hard sigmoid (hard_sigmoid) by default. If None is passed, no activation is applied (i.e. "linear" activation a(x) = x).
 use_bias: A Boolean that indicates whether the layer uses a bias vector.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 unit_forget_bias: A Boolean. If True, 1 is added to the bias of the forget gate at initialization. Setting it to True also forces bias_initializer="zeros".
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the inputs.
 recurrent_dropout: A float between 0 and 1 that gives the fraction of the units to drop for the linear transformation of the recurrent state.
 implementation: The implementation mode, either 1 or 2. Mode 1 structures its operations as a larger number of smaller dot products and additions, whereas mode 2 batches them into fewer, larger operations. The two modes have different performance profiles on different hardware and for different applications.
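The effect of unit_forget_bias on the initial bias vector can be pictured with a short plain-Python sketch. It assumes that the bias is stored as four consecutive blocks of size units in the order input, forget, cell, output; that ordering is an assumption made for this illustration, and initial_lstm_bias is a hypothetical helper, not a Keras function.

```python
def initial_lstm_bias(units, unit_forget_bias=True):
    """Initial bias laid out as four gate blocks: [input, forget, cell, output].

    The block order is an assumption of this sketch. All blocks start at zero
    (bias_initializer="zeros"); with unit_forget_bias the forget block is ones.
    """
    zeros = [0.0] * units
    forget = [1.0] * units if unit_forget_bias else list(zeros)
    return zeros + forget + zeros + zeros


print(initial_lstm_bias(2))
print(initial_lstm_bias(2, unit_forget_bias=False))
```

Starting the forget gate with a positive bias keeps the gate mostly open early in training, so the cell state is retained rather than erased.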
CuDNNGRU
It is a fast GRU implementation backed by CuDNN. It can only be run on GPU, with the TensorFlow backend.
Arguments
 units: A positive integer that gives the dimensionality of the output space.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 activity_regularizer: The regularizer function applied to the output of the layer (its "activation").
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 return_sequences: A Boolean that indicates whether to return only the last output of the output sequence or the full sequence.
 return_state: A Boolean that indicates whether to return the last state in addition to the output.
 stateful: A Boolean (False by default). If True, the last state for each sample at index i in a batch is used as the initial state for the sample at index i in the following batch.
CuDNNLSTM
It is a fast LSTM implementation backed by CuDNN. It can only be run on GPU, with the TensorFlow backend.
Arguments
 units: A positive integer that gives the dimensionality of the output space.
 kernel_initializer: The initializer for the kernel weights matrix, which is used for the linear transformation of the inputs.
 recurrent_initializer: The initializer for the recurrent_kernel weights matrix, which is used for the linear transformation of the recurrent state.
 bias_initializer: The initializer for the bias vector.
 unit_forget_bias: A Boolean. If True, 1 is added to the bias of the forget gate at initialization. Setting it to True also forces bias_initializer="zeros".
 kernel_regularizer: The regularizer function applied to the kernel weights matrix.
 recurrent_regularizer: The regularizer function applied to the recurrent_kernel weights matrix.
 bias_regularizer: The regularizer function applied to the bias vector.
 activity_regularizer: The regularizer function applied to the output of the layer (its "activation").
 kernel_constraint: The constraint function applied to the kernel weights matrix.
 bias_constraint: The constraint function applied to the bias vector.
 recurrent_constraint: The constraint function applied to the recurrent_kernel weights matrix.
 return_sequences: A Boolean that indicates whether to return only the last output of the output sequence or the full sequence.
 return_state: A Boolean that indicates whether to return the last state in addition to the output.
 stateful: A Boolean (False by default). If True, the last state for each sample at index i in a batch is used as the initial state for the sample at index i in the following batch.
