## Artificial Neural Networks

Conventional computers take an algorithmic approach: the computer follows a set of instructions to solve a problem, and unless the specific steps are known in advance, it cannot solve the problem. Someone therefore has to work out the solution first and supply it to the computer as instructions. This restricts the problem-solving capacity of conventional computers to problems that we already understand and know how to solve. For problems whose solution steps are not clear, this traditional approach fails, and that is where Neural Networks come in.

Neural networks process information in a way loosely similar to the human brain. They learn from examples rather than being programmed to perform a specific task, which is why you don't need to provide all the information about a task up front.

An Artificial Neural Network is biologically inspired by the network of neurons that makes up the human brain, and it is modeled to imitate that functionality. Just as the human brain is made up of many neurons, an Artificial Neural Network is made up of many perceptrons.
A neural network comprises three main kinds of layers:

- **Input layer:** The input layer accepts all the inputs that are provided by the programmer.
- **Hidden layer:** Between the input and output layers there is a set of hidden layers, on which the computations that produce the output are performed.
- **Output layer:** After the inputs undergo a series of transformations while passing through the hidden layers, the result is delivered by the output layer.
## Motivation behind Neural Network

The neural network is based on neurons, which are the cells of the brain. A biological neuron receives input from other sources, combines the inputs in some way, performs a nonlinear operation on the result, and outputs the final result.

## What are Artificial Neural Networks?

Artificial Neural Networks are computing systems designed to simulate the way the human brain analyzes and processes information. They have self-learning capabilities that enable them to produce better results as more data becomes available: because these networks learn from examples, a network trained on more data will generally be more accurate. A neural network can be configured for specific applications such as data classification and pattern recognition. Much recent technology, from translating webpages into other languages to virtual assistants that order groceries online, builds on neural networks. An artificial neural network is simply a network of artificial neurons.

## Importance of Neural Network:

**Without Neural Network:** Consider a machine that we have trained with four types of cats. Once training is done, we give the machine a random image that contains a cat. Since this cat is not similar to the cats we trained the system on, the machine, without a neural network, would not identify the cat in the picture. The machine gets confused in figuring out where the cat is.

**With Neural Network:** With a neural network, even if we have not trained the machine with that particular cat, it can still identify certain features of the cats it was trained on, match those features against the cat in the new image, and identify the cat. This example shows the importance of the concept of a neural network.
## Working of Artificial Neural Networks

Instead of getting directly into the working of Artificial Neural Networks, let's break things down and first understand the neural network's basic unit, which is called a perceptron. A perceptron can be defined as a single-layer neural network that classifies linearly separable data. It has four major components:

- Inputs
- Weights and Bias
- Summation Function
- Activation or transformation function
The main logic behind the perceptron is as follows: the inputs (x) are fed into the input layer, multiplied by their allotted weights (w), and then added together to form a weighted sum. This weighted sum is then passed to the relevant activation function.

## Weights and Bias

When an input variable is fed into the network, a random value is initially assigned as the weight of that input. Each weight represents the importance of its input in making a correct prediction. The bias shifts the curve of the activation function so that a more precise output can be reached.

## Summation Function

After the weights are assigned to the inputs, the product of each input and its weight is computed. The summation function then adds all of these products together to give the weighted sum.

## Activation Function

The main objective of the activation function is to map the weighted sum to the output. Common activation functions include tanh, ReLU, and sigmoid. Activation functions fall into two main categories:

- Linear Activation Function
- Non-Linear Activation Function
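The perceptron logic described above (inputs, weights and bias, summation, activation) can be sketched in a few lines of plain Python. This is only an illustration: the step activation, the function names, and the hand-picked weights are assumptions made for the example, not values from the tutorial.

```python
def weighted_sum(inputs, weights, bias):
    """Summation function: add up input * weight products, then add the bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


def step(z):
    """A simple threshold activation (assumed here): 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias):
    """Weighted sum followed by the activation function."""
    return step(weighted_sum(inputs, weights, bias))


# Hypothetical, hand-picked weights that make the perceptron act like logical AND.
weights = [1.0, 1.0]
bias = -1.5

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, perceptron(x, weights, bias))
```

In a trained network the weights and bias would be learned from data rather than chosen by hand.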
## Linear Activation Function

With a linear activation function, the output is not restricted to any range; it spans from -infinity to infinity. For each neuron, the inputs are multiplied by the neuron's weights, which produces an output signal proportional to the input. If all the layers are linear, then the final activation of the last layer is just a linear function of the first layer's input.

## Non-linear Activation Function

These are the most widely used activation functions. They help the model generalize and adapt to a variety of data so that it can differentiate correctly among outputs. They solve the following problems faced by linear activation functions:

- The derivative of a non-linear function depends on its input, so backpropagation has a meaningful gradient to work with.
- They permit the stacking of several layers of neurons, which is what makes deep neural networks possible.
The non-linear activation functions are further divided into the following types:

**Sigmoid or Logistic Activation Function**
It provides a smooth gradient, preventing sudden jumps in the output values. Its output ranges between 0 and 1, which normalizes each neuron's output. For X between roughly -2 and 2 the curve is steep, which means that even a small change in X brings a large change in Y; outside that range the curve flattens out. Because its value lies between 0 and 1, it is well suited to binary classification, whose result is either 0 or 1.

**Tanh or Hyperbolic Tangent Activation Function**
The tanh activation function often works better than the sigmoid function in hidden layers and can be seen as a rescaled version of it. Its output ranges between -1 and 1, so it is centered on zero, which tends to make learning easier for the hidden layers of the network.

**ReLU (Rectified Linear Unit) Activation Function**
ReLU is one of the most widely used activation functions in the hidden layers of a neural network. Its value ranges from 0 to infinity. It helps with gradient problems during backpropagation, and it is computationally cheaper than the sigmoid and tanh activation functions. It lets only some of the neurons be active at a given instant, which leads to efficient and easier computations.

**Softmax Function**
It is a generalization of the sigmoid function used for classification problems with multiple classes. It squeezes the output for each class between 0 and 1 by dividing the exponential of each class score by the sum over all classes, so the outputs add up to 1. This function is typically used in the output layer of a classifier.
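As a rough illustration of the four functions just described, here is a minimal plain-Python sketch. The function names are chosen for this example; the `softmax` version subtracts the largest score before exponentiating, a common numerical-stability trick that does not change the result.

```python
import math


def sigmoid(z):
    """Squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z):
    """Squashes z into the range (-1, 1), centered on zero."""
    return math.tanh(z)


def relu(z):
    """Passes positive values through and clips negatives to 0."""
    return max(0.0, z)


def softmax(scores):
    """Turns a list of class scores into probabilities that sum to 1."""
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(sigmoid(0.0), tanh(0.0), relu(-3.0), relu(2.5))
print(softmax([1.0, 2.0, 3.0]))
```

Note how the sigmoid changes quickly near zero and slowly far from it: `sigmoid(0.5) - sigmoid(0.0)` is much larger than `sigmoid(5.5) - sigmoid(5.0)`.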
## Gradient Descent Algorithm

Gradient descent is an optimization algorithm used to minimize the cost function of a machine learning model by repeatedly updating the model's parameters. In linear regression these parameters are the coefficients, whereas in a neural network they are the weights.

The procedure starts with an initial value for the coefficient, which may be 0.0 or any small arbitrary value.

`coefficient = 0.0`

To estimate the cost of the coefficient, it is plugged into the function being evaluated.

`cost = f(coefficient)` or `cost = evaluate(f(coefficient))`

Next, the derivative is calculated. The derivative is a concept from calculus that gives the slope of the function at a given point. We need the slope to know in which direction to move the coefficient so as to get a lower cost in the next iteration.

`delta = derivative(cost)`

Now that we know the downhill direction, we can update the coefficient. For that we need to specify alpha, the learning rate parameter, which controls how much the coefficient changes on each update.

`coefficient = coefficient - (alpha * delta)`

This process is repeated until the cost stops decreasing meaningfully.

Gradient descent is a simple and straightforward concept. It only requires you to know the gradient of the cost function, or of whatever function you want to optimize.

## Batch Gradient Descent

Batch gradient descent processes all of the training examples on every iteration of gradient descent. When the number of training examples is large, batch gradient descent becomes very expensive and is less preferable.
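The generic update rule `coefficient = coefficient - (alpha * delta)` can be tried out on a toy problem. The sketch below assumes a made-up cost function, `(coefficient - 3)^2`, whose minimum is at 3; the cost function, the learning rate, and the iteration count are illustrative choices only.

```python
def cost(coefficient):
    """Hypothetical cost function with its minimum at coefficient = 3."""
    return (coefficient - 3.0) ** 2


def derivative(coefficient):
    """Slope of the cost function above at the given coefficient."""
    return 2.0 * (coefficient - 3.0)


coefficient = 0.0   # initial value
alpha = 0.1         # learning rate (assumed for this example)

for _ in range(100):
    delta = derivative(coefficient)              # slope at the current point
    coefficient = coefficient - alpha * delta    # step in the downhill direction

print(coefficient, cost(coefficient))
```

With a much larger alpha (above 1.0 for this particular cost function) the updates would overshoot and diverge, which is why the learning rate has to be chosen with care.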
Let $h_\theta(x)$ be the hypothesis for linear regression, and let $\sum$ denote the sum over all training examples from $i = 1$ to $m$. Then the cost function is computed by:

$$J_{train}(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

Repeat {

$$\theta_j = \theta_j - \frac{\text{learning rate}}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

for every $j = 0 \dots n$

}

Here $x_j^{(i)}$ is the $j^{th}$ feature of the $i^{th}$ training example. If $m$ is very large, every single update has to sum over all $m$ examples, so progress towards the minimum becomes very slow.

## Stochastic Gradient Descent

Stochastic gradient descent processes only one training example per iteration, so all the parameters are updated after each single training example. Each iteration is therefore much cheaper than in batch gradient descent, but with a huge number of training examples it still handles one example at a time, so the system may need a large number of iterations. To train the parameters evenly across all kinds of data, shuffle the dataset properly.
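To make the difference concrete, here is a small sketch that fits a straight line with both variants: the batch update averages the gradient over all examples, while the stochastic update changes the parameters after every single example. The toy data (`y = 1 + 2x`), the learning rates, and the function names are assumptions made for this illustration.

```python
import random

# Toy data generated from y = 1 + 2x (no noise), chosen only for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(theta, x):
    """Linear hypothesis h(x) = theta0 + theta1 * x."""
    return theta[0] + theta[1] * x


def batch_step(theta, lr):
    """One batch update: the gradient is averaged over ALL training examples."""
    m = len(xs)
    errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    return [theta[0] - lr * grad0, theta[1] - lr * grad1]


def sgd_epoch(theta, lr, rng):
    """One pass over shuffled data: parameters change after EACH example."""
    order = list(range(len(xs)))
    rng.shuffle(order)
    for i in order:
        error = predict(theta, xs[i]) - ys[i]
        theta = [theta[0] - lr * error, theta[1] - lr * error * xs[i]]
    return theta


theta_batch = [0.0, 0.0]
for _ in range(2000):
    theta_batch = batch_step(theta_batch, lr=0.05)

rng = random.Random(0)
theta_sgd = [0.0, 0.0]
for _ in range(2000):
    theta_sgd = sgd_epoch(theta_sgd, lr=0.02, rng=rng)

print(theta_batch, theta_sgd)
```

Both runs should end close to the intercept 1 and slope 2 used to generate the data. On noisy real-world data the stochastic variant would keep jittering around the minimum unless the learning rate is reduced over time.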
Suppose that $(x^{(i)}, y^{(i)})$ is a training example. Then:

$$Cost(\theta, (x^{(i)}, y^{(i)})) = \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

$$J_{train}(\theta) = \frac{1}{m} \sum_{i=1}^{m} Cost(\theta, (x^{(i)}, y^{(i)}))$$

Repeat {

For $i = 1$ to $m$ {

$$\theta_j = \theta_j - (\text{learning rate}) \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

for every $j = 0 \dots n$

}

}

## Convergence trends in different variants of Gradient Descent

The Batch Gradient Descent algorithm follows a fairly direct path towards the minimum. In the case of Stochastic Gradient Descent, however, the algorithm fluctuates around the global minimum rather than settling on it. The learning rate is decreased slowly so that it can converge. Since it processes only one example per iteration, its path tends to be noisy.

## Backpropagation

A network trained with backpropagation consists of an input layer of neurons, an output layer, and at least one hidden layer. The neurons compute a weighted sum of their inputs, which is then passed to an activation function, commonly the sigmoid activation function. Backpropagation uses supervised learning to teach the network: it keeps updating the weights of the network until the network produces the desired output. The following factors affect the training and performance of the network:

- The random (initial) values of the weights.
- The number of training cycles.
- The number of hidden neurons.
- The training set.
- Teaching parameter values such as learning rate and momentum.
## Working of Backpropagation

Backpropagation proceeds through the following steps:

- The preconnected paths transfer the inputs **X**.
- Then the weights **W** are randomly selected; they are used to model the input.
- After that, the output is calculated for every individual neuron, passing from the input layer to the hidden layer and then to the output layer.
- Next, the errors in the outputs are evaluated: **Error**_{B} = Actual Output - Desired Output
- The errors are sent back from the output layer to the hidden layer to adjust the weights and lessen the error.
- The whole process is repeated until the desired result is achieved.
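The steps above can be sketched for a tiny network with two inputs, two hidden neurons, and one output, all using the sigmoid activation. Everything specific here is an assumption for illustration: the toy task (logical OR), the network size, the learning rate, the number of cycles, and the random seed.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Toy training set (logical OR): inputs X and the desired outputs.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
desired = [0.0, 1.0, 1.0, 1.0]

rng = random.Random(42)
n_hidden = 2
# Randomly selected weights W; the last entry of each list is a bias.
w_hidden = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n_hidden)]
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]


def forward(x):
    """Input layer -> hidden layer -> output layer."""
    hidden = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_hidden]
    out = sigmoid(sum(w_out[j] * hidden[j] for j in range(n_hidden)) + w_out[-1])
    return hidden, out


def mean_squared_error():
    return sum((forward(x)[1] - d) ** 2 for x, d in zip(X, desired)) / len(X)


initial_error = mean_squared_error()
learning_rate = 0.5

for _ in range(5000):                         # training cycles
    for x, d in zip(X, desired):
        hidden, out = forward(x)
        error = out - d                       # actual output - desired output
        delta_out = error * out * (1 - out)   # error scaled by the sigmoid slope
        # Send the error back to the hidden layer (using w_out before it changes).
        delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                        for j in range(n_hidden)]
        # Adjust the weights to lessen the error.
        for j in range(n_hidden):
            w_out[j] -= learning_rate * delta_out * hidden[j]
            w_hidden[j][0] -= learning_rate * delta_hidden[j] * x[0]
            w_hidden[j][1] -= learning_rate * delta_hidden[j] * x[1]
            w_hidden[j][2] -= learning_rate * delta_hidden[j]
        w_out[-1] -= learning_rate * delta_out

final_error = mean_squared_error()
print(initial_error, final_error)
```

The printed error after training should be clearly lower than the error with the initial random weights.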
## Need of Backpropagation

- It is fast and simple, which makes it easy to implement.
- Apart from the number of inputs, it has no other parameters to tune.
- It requires no prior knowledge about the network, which makes it flexible.
- It is a standard method that generally works well.
## Building an ANN

Before building an ANN model, we need a dataset for the model to work on. The dataset is the collection of data for a particular problem, here in the form of a CSV file.

We are going to solve a business problem using artificial neural networks. The problem that we are going to deal with is a classification problem. We start by installing the required libraries. If TensorFlow is already installed, the installer output reports that it is present. Keras is installed with `pip install keras`.

Now that we are done with the installation, the next step is to update these libraries to their latest versions. When doing this for the very first time, the tool asks whether to proceed or not; confirm with y and press Enter. After the libraries are updated successfully, we close the Anaconda prompt and get back to the Spyder IDE.

We will build our model in parts: in Part 1 we pre-process the data, and in Part 2 we create the ANN model. Data pre-processing is necessary to prepare the data correctly for the deep learning model. Since this is a classification problem, we have independent variables holding information about customers in a bank, and we are trying to predict a binary outcome for the dependent variable, i.e., whether or not the customer leaves the bank.

## Part1: Data Pre-processing

We start by importing some Python libraries, NumPy, Matplotlib, and Pandas, to perform data pre-processing. Each of these libraries has a specific role.
NumPy is a Python library whose name stands for Numerical Python. It is used for numerical computation on arrays.

Matplotlib is also an open-source library, with which charts can be plotted in Python. Its purpose is to visualize data, for which we import its pyplot sub-library.

Pandas is also an open-source library that provides high-performance data manipulation and analysis tools. It is mainly used to load the data, handle it, and analyze it.

Next, we import the data file from the current working directory with the help of Pandas. We then create the matrix of features, which contains all the independent variables from the dataset, and the vector holding the dependent variable.
Next, we will split the dataset into the training set and test set. Before that, we need to encode the matrix of features, because it contains categorical data, and the categorical data must be encoded before the split. We can inspect the matrix from the console to find the categorical independent variables.

We have only two categorical independent variables, the country and the gender. So we need to create two label encoder objects, one for each variable, starting with the country variable. After executing the encoding code, we take another look at the matrix of features.
From the output, we can see that France became 0, Germany became 1, and Spain became 2. In a similar manner, we do the same for the other variable, i.e., the Gender variable, but with a new encoder object.

We can see that females became 0 and males became 1. Since there is no relational order between the categories of our categorical variables, we need to create dummy variables for the country variable, as it contains three categories. The gender variable has only two categories, so it does not need this step. We will also remove one dummy column to avoid the dummy variable trap.

To remove one dummy variable, we take the matrix of features X and update it by keeping all the rows of the matrix and all the columns except the first one.

We are then left with only two dummy variables for the country, so there is no more dummy variable trap. Now we are ready to split the dataset into the training set and test set. By executing the split, we get four different variables, which can be seen in the variable explorer section.
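The encoding steps for the country variable can be illustrated without any library. The sketch below is a hand-rolled stand-in for the label encoder and dummy-variable step, not the tutorial's own code; the sample values and variable names are made up for the example.

```python
# Hypothetical sample of the country column.
countries = ["France", "Spain", "Germany", "France", "Spain"]

# Label encoding: categories in alphabetical order get codes 0, 1, 2, ...
categories = sorted(set(countries))
codes = {name: index for index, name in enumerate(categories)}
encoded = [codes[c] for c in countries]

# Dummy variables: one 0/1 column per category.
dummies = [[1 if code == k else 0 for k in range(len(categories))]
           for code in encoded]

# Drop the first column to avoid the dummy variable trap: it carries no
# extra information, because France is simply "neither Germany nor Spain".
reduced = [row[1:] for row in dummies]

print(codes)
print(encoded)
print(reduced)
```

With three categories, two dummy columns are enough to tell every country apart, which is why one column can be removed safely.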
Training a neural network involves highly compute-intensive calculations, and we also don't want one independent variable to dominate the others, so we apply feature scaling to ease all the calculations. After executing the scaling code, we can take a quick look at the scaled data.
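One common form of feature scaling is standardization, where each column is shifted to mean 0 and scaled to standard deviation 1. The text does not say which scaling method is used, so treat standardization as an assumption here; the sample values are also made up. The key detail is that the mean and standard deviation come from the training set only and are then reused for the test set.

```python
import statistics

# Hypothetical values of one independent variable.
train_column = [20.0, 30.0, 40.0, 50.0]
test_column = [35.0, 60.0]

# Learn the scaling parameters from the training set only.
mean = statistics.fmean(train_column)
std = statistics.pstdev(train_column)

# Apply the same transformation to both sets.
scaled_train = [(value - mean) / std for value in train_column]
scaled_test = [(value - mean) / std for value in test_column]

print(scaled_train)
print(scaled_test)
```

After scaling, the training column has mean 0 and standard deviation 1, so no variable dominates merely because of its units.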
Now that our data is pre-processed, we can start building the artificial neural network.

## Part2: Building an ANN

We start by importing the Keras library and the required packages; Keras builds the neural network on top of TensorFlow. After importing Keras, we import two modules: the Sequential module, which is required to initialize our neural network, and the Dense module, which is needed to build the layers of our ANN.

Next, we initialize the ANN by defining it as a sequence of layers. A deep learning model can be initialized in two ways, either by defining a sequence of layers or by defining a graph. Since we are going to build our ANN from successive layers, we initialize it by creating an object of the Sequential class.

The object that we create is the model itself. Because we are solving a classification problem, where we have to predict a class, our neural network is going to be a classifier, so we name the object classifier. We will use the same name in a later step to predict the test set results. We don't pass any argument when creating the Sequential object, because we will define the layers step by step, starting with the input layer, followed by some hidden layers, and then the output layer.

After this, we start by adding the input layer and the first hidden layer.
We take the classifier that we initialized in the previous step and use the add method to add a Dense layer, passing the following arguments:

- The first argument is **units**, which is the number of nodes that we want to add in the hidden layer.
- The second argument is **kernel_initializer**, which randomly initializes the weights as small numbers close to zero. Here we use the simple **uniform** function, which initializes the weights according to a uniform distribution.
- The third argument is **activation**, which is the activation function that we want to use in the layer. We will be using the **rectifier function** for the **hidden layers** and the **sigmoid function** for the **output layer**. Since we are in the hidden layer, we pass the "**relu**" parameter, as it corresponds to the rectifier function.
- The last is the **input_dim** argument, which specifies the number of nodes in the input layer, i.e., the number of independent variables. This argument is necessary because so far we have only initialized our ANN and haven't created any layer yet, so the network does not know how many inputs this first hidden layer should expect. After the first hidden layer has been created, we don't need to specify this argument for the next hidden layers.
Next, we add the second hidden layer by using the same add method and passing the same parameters, apart from input_dim, which is only needed for the first layer.

After adding the two hidden layers, we add the final output layer. This is again similar to the previous step, except that we change the units parameter: our dependent variable is a categorical variable with a binary outcome, and for a binary outcome we need only one node in the output layer. So we set units equal to 1, and since we are in the output layer, we replace the rectifier function with the sigmoid activation function.

As we are done with adding the layers of our ANN, we now compile the whole artificial neural network, which applies stochastic gradient descent to it. We call the compile method on our classifier object and pass the following arguments:

- The first argument is the **optimizer**, which is the algorithm that we want to use to find the optimal set of weights in the neural network. The algorithm we use is stochastic gradient descent. There are several variants of stochastic gradient descent, and an efficient, widely used one is "**adam**", which is the value we pass to this parameter.
- The second argument is the loss, which is the loss function that the stochastic gradient descent algorithm minimizes to find the optimal weights. Since our dependent variable has a **binary outcome**, we use the **binary_crossentropy** logarithmic loss; when the outcome has more than two categories, we would use **categorical_crossentropy** instead.
- The last argument is the metrics, which is the criterion used to evaluate our model, and we are using "**accuracy**". The accuracy is reported as the weights are updated, so we can watch the model's performance improve; the weight updates themselves are driven by the loss.
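To give a feel for what the binary cross-entropy loss measures, here is a small plain-Python version of its formula. It is a sketch for intuition only, not Keras's implementation; the function name and the `eps` clipping value are choices made for this example.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-12):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all observations."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)   # keep log() away from 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


actual = [1, 0, 1, 0]
print(binary_crossentropy(actual, [0.9, 0.1, 0.8, 0.2]))  # confident and right: small loss
print(binary_crossentropy(actual, [0.5, 0.5, 0.5, 0.5]))  # unsure: medium loss
print(binary_crossentropy(actual, [0.1, 0.9, 0.2, 0.8]))  # confident and wrong: large loss
```

The loss grows as the predicted probabilities move away from the actual outcomes, which is what the optimizer tries to push down during training.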
Next, we fit the ANN to the training set using the fit method. In the fit method, we pass the following arguments:

- The first argument is the dataset on which we want to train our classifier, which is the training set given as two arguments: **X_train** (the matrix of features containing the observations of the training set) and **y_train** (the actual outcomes of the dependent variable for all the observations in the training set).
- The next argument is the **batch_size**, which is the number of observations after which we want to update the weights.
- And lastly, the number of **epochs**, i.e., how many times the whole training set is passed through the network. Running several epochs lets us see the algorithm in action as well as the improvement in accuracy from one epoch to the next.
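The relationship between batch_size, epochs, and the number of weight updates can be made concrete with a few lines of plain Python. The numbers below (25 observations, batches of 10, 3 epochs) are made up for illustration, and the helper names are hypothetical.

```python
def iterate_batches(n_observations, batch_size):
    """Yield the index ranges of consecutive batches within one epoch."""
    for start in range(0, n_observations, batch_size):
        yield range(start, min(start + batch_size, n_observations))


def count_updates(n_observations, batch_size, epochs):
    """Weights are updated once per batch, in every epoch."""
    return sum(1 for _ in range(epochs)
               for _ in iterate_batches(n_observations, batch_size))


batches = list(iterate_batches(25, 10))
print([len(b) for b in batches])                             # sizes of the batches in one epoch
print(count_updates(25, batch_size=10, epochs=3))            # total number of weight updates
```

A smaller batch_size means more frequent (and noisier) weight updates per epoch, while a larger one means fewer, smoother updates.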
From the training output, you can see that our model is ready, along with the accuracy it has reached.

## Part3: Making the Predictions and Evaluating the Model

Since we are done with training the ANN on the training set, we will now make predictions on the test set.
From the output, we can see the probabilities that each of the 2,000 customers of the test set will leave the bank. For example, the first probability, about 21%, means that the first customer of the test set, indexed by zero, has roughly a 21% chance of leaving the bank.

The predict method returns the probability that each customer leaves the bank, but to build the confusion matrix we don't need these probabilities; we need the predicted results in the form of True or False. So we need to transform the probabilities into predicted results. We choose a threshold value to decide when the predicted result is one and when it is zero: probabilities above the threshold are predicted as True, and the rest as False.

Looking at the result, the first five customers of the test set don't leave the bank according to the model, whereas the sixth customer in the test set leaves the bank. Next, we compute the confusion matrix.
From the confusion matrix, we can see that out of 2000 new observations, we get 1542 + 141 = 1683 correct predictions. We now compute the accuracy, which is the number of correct predictions divided by the total number of predictions: 1683 / 2000 ≈ 84%. So we got an accuracy of 84% on new observations on which we didn't train our ANN, which is a good result. This is about the same accuracy that we obtained on the training set, so we can validate our model. The bank can now use it to rank its customers by their probability of leaving the bank, from the customer with the highest probability down to the customer with the lowest.
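The evaluation steps (thresholding the probabilities, building the confusion matrix, computing the accuracy) can be sketched in plain Python. The probabilities and actual outcomes below are made up, the threshold of 0.5 is an assumption for this example, and the helper names are hypothetical.

```python
def to_labels(probabilities, threshold=0.5):
    """True where the predicted probability exceeds the (assumed) threshold."""
    return [p > threshold for p in probabilities]


def confusion_matrix(actual, predicted):
    """Rows are actual classes (False, True); columns are predicted classes."""
    matrix = [[0, 0], [0, 0]]
    for a, p in zip(actual, predicted):
        matrix[int(a)][int(p)] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up probabilities and outcomes for six customers.
probabilities = [0.21, 0.35, 0.10, 0.05, 0.08, 0.92]
actual = [False, False, False, True, False, True]

predicted = to_labels(probabilities)
matrix = confusion_matrix(actual, predicted)
print(predicted)
print(matrix)
print(accuracy(matrix))

# The same arithmetic with the counts quoted in the text.
print((1542 + 141) / 2000)
```

The diagonal of the matrix holds the correct predictions, so the last line reproduces the roughly 84% accuracy from the two diagonal counts quoted in the text.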