Gradient Descent Algorithm

Gradient descent is an optimization algorithm used to minimize the cost function of many machine learning models; it works by iteratively updating the parameters of the learning model. The following are the different types of gradient descent:

1. Batch gradient descent
2. Stochastic gradient descent
3. Mini-batch gradient descent
Variables used:
Let 'k' be the number of training examples and 'j' be the number of features in the dataset. Note: if p == k, mini-batch gradient descent behaves exactly like batch gradient descent (where 'p' is the batch size, i.e., the number of examples in one mini-batch).

Algorithm used for batch gradient descent:
Let h_θ(a) be the hypothesis for linear regression, and let Σ denote the sum over all training examples from t = 1 to k. The cost function is then given by:

G_train(θ) = (1/2k) Σ (h_θ(a^(t)) - b^(t))²

Repeat {
    θ_g = θ_g - (learning rate/k) * Σ (h_θ(a^(t)) - b^(t)) a_g^(t)
    for every g = 0, ..., j
}

Here a_g^(t) represents the g-th feature of the t-th training example. If 'k' is very large (for example, 7 million training examples), batch gradient descent can take hours or even days to finish, because every single update sums over all k examples. For large training sets, batch gradient descent is therefore not recommended, as it slows down the learning process. A runnable sketch of this update is the first code example after this section.

Algorithm used for mini-batch gradient descent:
Suppose 'p' is the number of examples in one batch, where p < k; for instance, let p = 10 and k = 100. Users can adjust the batch size, which is generally written as a power of 2. Each update then averages the gradient over one batch of p examples instead of all k, so with p = 10 and k = 100 the loop looks like:

Repeat {
    for t = 1, 11, 21, ..., 91 {
        θ_g = θ_g - (learning rate/p) * Σ (h_θ(a^(d)) - b^(d)) a_g^(d)    (Σ sums d from t to t + 9)
        for every g = 0, ..., j
    }
}

A sketch of this variant is the second code example after this section.
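To make the batch update concrete, here is a minimal NumPy sketch of batch gradient descent for linear regression. The function name, learning rate, epoch count, and synthetic dataset are all illustrative choices, not part of the original tutorial.

```python
import numpy as np

def batch_gradient_descent(A, b, learning_rate=0.5, epochs=2000):
    """Minimize G_train(theta) = (1/2k) * sum((A @ theta - b)**2)."""
    k, j = A.shape                     # k training examples, j parameters
    theta = np.zeros(j)                # theta_0 ... theta_{j-1}
    for _ in range(epochs):
        errors = A @ theta - b         # h_theta(a^(t)) - b^(t) for every t
        gradient = (A.T @ errors) / k  # summed over all k examples, averaged
        theta -= learning_rate * gradient
    return theta

# Illustrative usage on a tiny synthetic dataset: b is roughly 3 + 2 * feature
rng = np.random.default_rng(0)
A = np.column_stack([np.ones(100), rng.random(100)])   # bias column + one feature
b = 3.0 + 2.0 * A[:, 1] + 0.01 * rng.standard_normal(100)
print(batch_gradient_descent(A, b))                    # roughly [3.0, 2.0]
```

Note that every iteration touches all k rows of A, which is exactly why this method becomes slow when k is in the millions.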
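And here is a matching sketch of the mini-batch variant, reusing the synthetic A and b defined in the batch example above; the shuffling step and the default values of p, the learning rate, and the epoch count are again assumptions for illustration.

```python
import numpy as np

def minibatch_gradient_descent(A, b, p=10, learning_rate=0.5, epochs=200):
    """Like batch gradient descent, but each update averages over p examples."""
    k, j = A.shape
    theta = np.zeros(j)
    rng = np.random.default_rng(1)
    for _ in range(epochs):
        order = rng.permutation(k)          # shuffle so batches differ per epoch
        for start in range(0, k, p):
            batch = order[start:start + p]  # one mini-batch of p indices
            errors = A[batch] @ theta - b[batch]
            theta -= learning_rate * (A[batch].T @ errors) / p
    return theta

print(minibatch_gradient_descent(A, b))     # roughly [3.0, 2.0] again
```

With p == k, the inner loop runs exactly once per epoch over the whole dataset, recovering batch gradient descent, which matches the note in the "Variables used" section.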
Algorithm used for stochastic gradient descent:
In stochastic gradient descent, the parameters are updated using one training example at a time. Let (a^(t), b^(t)) be a single training example. Its cost is

Cost(θ, (a^(t), b^(t))) = (1/2) (h_θ(a^(t)) - b^(t))²

and the overall training cost is the average of these costs over all k examples:

G_train(θ) = (1/k) Σ Cost(θ, (a^(t), b^(t)))

Repeat {
    for t = 1 to k {
        θ_g = θ_g - (learning rate) * (h_θ(a^(t)) - b^(t)) a_g^(t)
        for every g = 0, ..., j
    }
}

Unlike batch gradient descent, each update here uses only the single example (a^(t), b^(t)), so there is no sum over the training set inside the update rule.
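Below is a minimal sketch of this per-example update, again reusing A and b from the batch example; the learning rate, epoch count, and random visiting order are illustrative assumptions.

```python
import numpy as np

def stochastic_gradient_descent(A, b, learning_rate=0.1, epochs=50):
    """Update theta after every single training example (a^(t), b^(t))."""
    k, j = A.shape
    theta = np.zeros(j)
    rng = np.random.default_rng(2)
    for _ in range(epochs):
        for t in rng.permutation(k):     # visit examples in random order
            error = A[t] @ theta - b[t]  # scalar h_theta(a^(t)) - b^(t)
            theta -= learning_rate * error * A[t]
    return theta

print(stochastic_gradient_descent(A, b))  # roughly [3.0, 2.0]
```

Each individual update is very cheap, but the path toward the minimum is noisier than with batch or mini-batch updates.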
Conclusion
In this tutorial, we discussed the different types of gradient descent algorithms and their variants.
