Linear Regression
As we know, there are three types of learning in Machine Learning:
- supervised learning
- unsupervised learning
- reinforcement learning
Among these, Linear Regression comes under supervised learning.
So what exactly is Linear Regression?
As the name (Regression) suggests, we are going to predict real-number values as output.
Why should we use Linear Regression?
Consider a dataset full of real numbers as both input and output. When the relationship between input and output is roughly linear, Linear Regression is a simple algorithm that works remarkably well.
Before we get into it, let us learn about a few important parameters.
Weights: Weights are the parameters that denote the importance of each feature (column). They are written with various symbols; in Machine Learning the usual one is θ. Weights are also called the slope, since they decide the angle of the fitted line.
Example: suppose you are going to buy a mobile phone, and security is an important feature for you. Here "security" gets a higher weight than the other features, so our model will predict a phone with better security and suggest it to you (probably an iPhone).
Bias: The bias is the parameter that decides where our line sits relative to the data. It is also called the intercept.
Hypothesis: The hypothesis describes the functionality of the model, i.e. the function that maps inputs to predictions.
Cost function: This is used to check how well our model predicts. It is denoted as J(θ), and it measures the difference between the predicted values and the real values. For Linear Regression, the cost function is simply the Mean Squared Error.
Now let's get back to Linear Regression!
The main formula for a Linear Regression model is

Y = c + m*x

Here,
Y = the hypothesis (output) of the model
c = the intercept of the model
m = the slope of the model (and x is a feature of the dataset)
If there are exactly two features in the dataset, the fitted model is a plane (a flat 2D surface); with more than two features it is a hyperplane.
Remember that having only one feature in the training data gives you a straight line.
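As a quick sketch of this hypothesis in code (a minimal NumPy version; the values of m, c, and x here are made up for illustration, not learned from data):

```python
import numpy as np

def hypothesis(x, m, c):
    """The Linear Regression hypothesis: predict Y = c + m*x."""
    return c + m * x

# Made-up slope and intercept, purely to show the function in action.
m, c = 0.8, 0.4
x = np.array([1.0, 2.0, 3.0])
print(hypothesis(x, m, c))   # [1.2 2.  2.8]
```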
Loss Function
The loss is the error in our predicted values, which depends on m and c. Our goal is to minimize this error to obtain the most accurate values of m and c.
We will use the cost function (Mean Squared Error) to calculate the loss. There are three steps in this function:
- Find the difference between the actual value y and the predicted value ȳ = mx + c for a given x.
- Square this difference.
- Find the mean of the squares over every value in X.
Mean Squared Error Equation:

E = (1/n) Σᵢ (yᵢ - ȳᵢ)²

Here yᵢ is the actual value and ȳᵢ is the predicted value. Let's substitute the value of ȳᵢ = mxᵢ + c:

E = (1/n) Σᵢ (yᵢ - (mxᵢ + c))²
So we square the error and find the mean, hence the name Mean Squared Error. Now that we have defined the loss function, let's get into the interesting part: minimizing it and finding m and c.
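Those three steps translate almost line for line into code. Here is a minimal NumPy sketch (the helper name mse is my own):

```python
import numpy as np

def mse(y_actual, y_predicted):
    """Mean Squared Error: the average of the squared differences."""
    return np.mean((y_actual - y_predicted) ** 2)

# Toy example: an untrained line (m = 0, c = 0) predicts 0 everywhere.
x = np.array([1, 2, 4, 3, 5], dtype=float)
y = np.array([1, 3, 3, 2, 5], dtype=float)
m, c = 0.0, 0.0
print(mse(y, m * x + c))    # 9.6, the loss before any training
```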
The Gradient Descent Algorithm
Gradient descent is an iterative optimization algorithm to find the minimum of a function. Here that function is our Loss Function.
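At each step, gradient descent nudges every parameter a little in the direction that reduces the loss:

θⱼ = θⱼ - α · ∂J(θ)/∂θⱼ

where α is the learning rate. The per-sample update rules used in the walkthrough below (θ0 = θ0 - α·error and θ1 = θ1 - α·error·x) are exactly this derivative taken for each parameter, with constant factors absorbed into α.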
Understanding Gradient Descent
First we initialize the parameters with some small values (often zeros). The reason is that without starting values, the model has nothing to update from.
Let's see an analogy for the learning rate. Picture the loss curve as a steep slope. There are two friends, A and B: A stands at the starting point and B stands at the convergence point at the bottom. A has to reach his friend B, so he walks down, and the speed of his walking is the learning rate. Since the path from A to B is steep, A has to walk slowly; if he walks too fast, he will fall. That is why we use a fairly low learning rate.
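To see the effect numerically, here is a small sketch (my own toy comparison, using the same dataset and update loop as the walkthrough below): a small learning rate settles down, while a large one overshoots and diverges.

```python
# Compare two learning rates on the same per-sample update loop.
X = [1, 2, 4, 3, 5]
Y = [1, 3, 3, 2, 5]

def train(learning_rate, epochs=30):
    theta0, theta1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            error = (theta0 + theta1 * x) - y   # predicted minus actual
            theta0 -= learning_rate * error
            theta1 -= learning_rate * error * x
    return theta0, theta1

print(train(0.01))   # small steps: values settle near the best-fit line
print(train(0.5))    # steps too big: values explode instead of converging
```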
Now let's see how Gradient Descent works.
Consider the following dataset:

| X | Y |
|---|---|
| 1 | 1 |
| 2 | 3 |
| 4 | 3 |
| 3 | 2 |
| 5 | 5 |
Initial model (take the learning rate as 0.01):
θ0 = 0.0
θ1 = 0.0
p = θ0 + θ1*x
error = p - y
Iteration 1
X = 1, Y = 1 (as per dataset)
p = 0.0 + 0.0*1
p = 0
error = 0 - 1
error = -1
Update rule for θ0: θ0 = θ0 - learning_rate*error
= 0.0 - 0.01*(-1)
= 0.01
Update rule for θ1: θ1 = θ1 - learning_rate*error*X
= 0.0 - 0.01*(-1)*1
= 0.01
Iteration 2
X = 2, Y = 3 (as per dataset)
θ0 = 0.01
θ1 = 0.01
p = 0.01 + 0.01*2
p = 0.03
error = 0.03 - 3
error = -2.97
Update rule for θ0: θ0 = θ0 - learning_rate*error
= 0.01 - 0.01*(-2.97)
= 0.0397
Update rule for θ1: θ1 = θ1 - learning_rate*error*X
= 0.01 - 0.01*(-2.97)*2
= 0.0694
Iteration 3
X = 4, Y = 3 (as per dataset)
θ0 = 0.0397
θ1 = 0.0694
p = 0.0397 + 0.0694*4
p = 0.3173
error = 0.3173 - 3
error = -2.6827
Update rule for θ0: θ0 = θ0 - learning_rate*error
= 0.0397 - 0.01*(-2.6827)
= 0.066527
Update rule for θ1: θ1 = θ1 - learning_rate*error*X
= 0.0694 - 0.01*(-2.6827)*4
= 0.176708
We keep cycling through the dataset and repeating these updates until the model reaches the convergence point, as sketched in the code below.
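Here is a minimal Python sketch of this per-sample (stochastic) gradient descent loop. It reproduces the hand calculations above exactly for the first three updates (learning rate 0.01, both parameters starting at 0.0); the epoch count is an arbitrary illustrative choice:

```python
# Per-sample gradient descent for the model p = theta0 + theta1*x,
# using the same toy dataset and update rules as the walkthrough above.
X = [1, 2, 4, 3, 5]
Y = [1, 3, 3, 2, 5]

theta0, theta1 = 0.0, 0.0
learning_rate = 0.01

for epoch in range(100):                # keep cycling through the data
    for x, y in zip(X, Y):
        p = theta0 + theta1 * x         # prediction with current parameters
        error = p - y                   # predicted minus actual
        theta0 -= learning_rate * error         # update the intercept
        theta1 -= learning_rate * error * x     # update the slope

print(theta0, theta1)   # fitted intercept and slope after training
```

In practice you would stop when the loss stops decreasing rather than after a fixed number of epochs.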
Gradient descent is one of the simplest and most widely used algorithms in machine learning, mainly because it can be applied to optimize almost any differentiable function. Learning it lays the foundation for mastering machine learning.
Got questions? Need help? Contact me!
Email: joe101richard@gmail.com
Instagram: joe___richard
Twitter: @JoeRichard101