Understanding Stochastic Gradient Descent with an Example

Stochastic gradient descent, or SGD, is a very popular and powerful optimization algorithm used in machine learning. SGD is an iterative algorithm that evaluates candidate solutions until an optimum is found. It is widely used in training neural networks.

Let's now understand what stochastic gradient descent is, but first, let's understand what gradient descent is.

Gradient Descent

Gradient means a slope, and descent means stepping downward. Hence, gradient descent simply means stepping down a slope to reach its lowest point. In machine learning, the objective of gradient descent is to find the minimum of the objective function, so that the final result is optimal or at least satisfactory.

Let's take an example: imagine you are blindfolded and want to reach the top of a hill in as few steps as possible. Initially, you take big steps in the steepest direction, but as you get close to the top, your steps become smaller and smaller to avoid overshooting it. Now, instead of climbing up, think of gradient descent as hiking down to the bottom of a valley. This is a better way to visualize the algorithm, since it is a minimizing algorithm.

Gradient descent initially takes bigger steps; then, as it gets closer to the minimum of the objective function, its steps get smaller and smaller. Gradient descent tries to find a point where the slope is close to zero, but not exactly zero, because at a gradient of zero the model stops learning.

  1. Gradient descent starts by picking a random initial value for the parameters.

  2. Then we compute the gradient of the objective function at these parameter values (the partial derivatives with respect to each parameter).

  3. The gradient is used to calculate the step size: step size = gradient x learning rate

  4. Choosing the value of the learning rate is very important: if it is too large, the algorithm may skip over the optimum point, and if it is too small, it will take more time to find the minimum.

  5. Now calculate the new parameters by:

    new params = old params - step size

  6. This process is repeated until an optimum or satisfactory value of the objective function is found.
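The steps above can be sketched in a few lines of Python. As an illustration (the objective function and the learning rate below are my own choices for the demo, not from the article), we minimize the simple one-dimensional function f(w) = (w - 3)^2, whose gradient is 2(w - 3):

```python
import numpy as np

def gradient(w):
    # derivative of the objective f(w) = (w - 3)**2
    return 2 * (w - 3)

# 1. pick a random initial value for the parameter
rng = np.random.default_rng(0)
w = rng.uniform(-10, 10)

learning_rate = 0.1
for _ in range(1000):
    step_size = gradient(w) * learning_rate   # step size = gradient x learning rate
    w = w - step_size                         # new params = old params - step size
    if abs(gradient(w)) < 1e-6:               # slope close to zero: stop
        break

print(round(w, 4))  # converges to the minimum at w = 3
```

Notice how the step size shrinks automatically as w approaches the minimum: the gradient itself gets smaller, so gradient x learning rate gets smaller too.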

The downside of the gradient descent algorithm is the amount of computation it takes in each iteration. Suppose we have 50,000 data points and 50 features. We compute the derivative of the function with respect to each feature for each data point, so that is 50,000 x 50 = 2,500,000 computations per iteration. It is common to run at least 1,000 iterations, giving 2,500,000 x 1,000 = 2,500,000,000 computations to complete the algorithm. This shows that gradient descent is very slow on huge datasets.
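A quick check of the arithmetic above:

```python
n_samples, n_features, n_iterations = 50_000, 50, 1_000

per_iteration = n_samples * n_features    # derivatives computed in one iteration
total = per_iteration * n_iterations      # over the whole run

print(per_iteration)  # 2500000
print(total)          # 2500000000
```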

Stochastic Gradient Descent (SGD)

'Stochastic' means involving a random variable. Instead of using the whole dataset to compute the derivative in each iteration, SGD selects a few samples from the data points, called 'batches', and computes the derivatives on them. This reduces the computation enormously. These batches are randomly shuffled on each iteration of the algorithm.


The path taken by SGD to find the optimum is usually noisier than that of gradient descent, since only a sample of the dataset is used on each iteration. SGD usually takes a higher number of iterations to reach the optimum than gradient descent, but its computational cost is much lower, since we process only samples instead of the whole dataset.
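Before turning to scikit-learn, here is a minimal from-scratch sketch of mini-batch SGD. The problem it solves (fitting a line y = 2x + 1 by least squares), the batch size, and the learning rate are all illustrative assumptions for the demo, not from the article:

```python
import numpy as np

rng = np.random.default_rng(42)

# synthetic linear data: y = 2*x + 1 plus a little noise
X = rng.uniform(-1, 1, size=(1000, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=1000)

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 32

for epoch in range(50):
    order = rng.permutation(len(X))          # randomly shuffle samples each pass
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx, 0], y[idx]
        error = (w * Xb + b) - yb
        # gradients of the mean squared error on this batch only
        grad_w = 2 * np.mean(error * Xb)
        grad_b = 2 * np.mean(error)
        w -= learning_rate * grad_w          # new params = old params - step size
        b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # close to the true values 2 and 1
```

Each update looks at only 32 of the 1,000 points, so a single step is cheap; the random batches make the path noisy, but on average the parameters still drift toward the minimum.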

Now let's look at an example of how to implement SGD.

First, let's import the necessary libraries:

import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_blobs
from sklearn import metrics
import matplotlib.pyplot as plt

Here we will be using SGDClassifier, which fits a linear model but uses stochastic gradient descent (SGD) for learning.

Now let's get our dataset. Here we use make_blobs to create our own dataset, which contains 500,000 data points with 2 features and 2 centers.

def get_datasets():
    X, y = make_blobs(n_samples=500000, n_features=2, centers=2, cluster_std=2.5, random_state=40)
    return X, y

Let us now run SGDClassifier on the above dataset:

if __name__ == '__main__':
    X, y = get_datasets()
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=40)
    # note: in scikit-learn >= 1.1 the logistic loss is named "log_loss"
    model = SGDClassifier(loss="log", max_iter=1000, n_jobs=-1)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(f"Accuracy : {metrics.accuracy_score(y_test, y_pred)}")

Output:



© 2022 Another Techs. All rights reserved.