Linear Regression — Mathematical intuition and Python implementation

Rohan Paris
5 min read · Mar 24, 2022

Table of contents:
1. What is Linear Regression
2. Math behind Simple and Multiple Linear Regression
3. Performance Metrics
4. Python Implementation

1. What is Linear Regression

Linear Regression is a supervised Machine Learning model that finds the best-fit straight line between the independent and dependent variables to predict the desired values. There are two types of linear regression: Simple Linear Regression and Multiple Linear Regression. In Simple Linear Regression (also known as Univariate Linear Regression) there is one independent variable and one dependent variable, and the model has to find the best-fit line between them. In Multiple Linear Regression (also known as Multivariate Linear Regression) there is more than one independent variable, and the model has to find the relationship between them and the dependent variable.

Image credit: https://www.researchgate.net/figure/Linear-Regression-model-sample-illustration_fig3_333457161

2. Math behind Simple Linear Regression

Suppose ‘x’ denotes our independent variable, ‘y’ denotes our dependent variable and ‘h’ denotes the hypothesis function. Then,

x → (h) → y; the hypothesis function takes in the input feature ‘x’ and predicts the desired output ‘y’. The hypothesis function is defined as:

h(x) = θ0 + θ1·x

In the above equation, θ0 is the y-axis intercept and θ1 is the slope of the line.
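In code, the hypothesis is simply the equation of a line (a minimal sketch):

def h(x, theta0, theta1):
    # Predicted y for input x, given intercept theta0 and slope theta1
    return theta0 + theta1 * x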

The main idea is to keep the distance between the actual data points ‘y’ and the predicted points ‘h(x)’ as small as possible. To do so, we need to update θ0 and θ1 until we get the line that best fits the training data.

This distance is measured by the cost function:

J(θ0, θ1) = (1/2m) · Σ (h(xᵢ) − yᵢ)²

where m is the number of training samples. The cost function is also known as the ‘Squared Error Function’, and our goal is to minimize the cost function ‘J’ over θ0 and θ1.
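As a concrete illustration, the cost of a candidate line can be computed with a few lines of NumPy (a minimal sketch; the data points are made up for demonstration):

import numpy as np

x = np.array([1.0, 2.0, 3.0])  # made-up inputs
y = np.array([1.0, 2.0, 3.0])  # made-up targets

def cost(theta0, theta1, x, y):
    # J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2)
    m = len(x)
    h = theta0 + theta1 * x  # predictions for every sample
    return np.sum((h - y) ** 2) / (2 * m)

print(cost(0.0, 0.5, x, y))  # poor fit, cost ≈ 0.583
print(cost(0.0, 1.0, x, y))  # perfect fit, cost 0.0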

Gradient Descent

Consider θ0 = 0 (which means the line passes through the origin) and m = 3 (sample size). To find the best fit line, we need to plot several lines by changing the value of θ1; in such a plot the y-axis is h(x).

Substituting the values of h(x) obtained from each of these lines into the cost function formula and plotting J against θ1 produces a bowl-shaped curve; gradient descent walks down this curve toward its minimum.

Gradient descent is an optimization algorithm used to minimize a function by iteratively moving in the direction of the steepest descent.

The update rule is:

θj := θj − α · ∂J(θ0, θ1)/∂θj

In the above equation, ‘α’ is the learning rate; the parameters for j = 0 and j = 1 need to be updated simultaneously.

Effect of learning rate ‘α’ on Gradient Descent

i. If ‘α’ is too small, then Gradient Descent can be slow.

ii. If ‘α’ is too large, then Gradient Descent can overshoot the minimum. It may fail to converge, or even diverge.

iii. If θ1 is already at the minimum, then its value remains the same, since the slope at a minimum is 0. (Getting stuck at a local minimum is not a concern in Linear Regression, as its cost function is convex and has only a single global minimum.)

iv. Gradient Descent can converge to the minimum even with a fixed ‘α’, because the slope keeps decreasing as we approach the minimum, so the update steps naturally become smaller.

After working out the partial derivatives, the Gradient Descent algorithm looks as follows (repeat until convergence, updating both parameters simultaneously):

θ0 := θ0 − α · (1/m) · Σ (h(xᵢ) − yᵢ)
θ1 := θ1 − α · (1/m) · Σ (h(xᵢ) − yᵢ) · xᵢ
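These update rules translate directly into code. Below is a minimal NumPy sketch of gradient descent for Simple Linear Regression (the data, learning rate, and iteration count are illustrative assumptions, not tuned values):

import numpy as np

def gradient_descent(x, y, alpha=0.1, n_iters=1000):
    m = len(x)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        error = (theta0 + theta1 * x) - y  # h(x) - y for every sample
        # simultaneous update: compute both new values before assigning either
        new_theta0 = theta0 - alpha * error.sum() / m
        new_theta1 = theta1 - alpha * (error * x).sum() / m
        theta0, theta1 = new_theta0, new_theta1
    return theta0, theta1

x = np.array([1.0, 2.0, 3.0])  # made-up data following y = 2x
y = np.array([2.0, 4.0, 6.0])
print(gradient_descent(x, y))  # approaches (0.0, 2.0)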

The math behind Multiple Linear Regression

Consider an example where we have four independent variables (x1, x2, x3, x4) and one dependent variable (y).

The hypothesis function for this problem will be:

h(x) = θ0 + θ1·x1 + θ2·x2 + θ3·x3 + θ4·x4

And the gradient descent update will be (for every j = 0, 1, …, 4, updated simultaneously, with xᵢ₀ = 1):

θj := θj − α · (1/m) · Σ (h(xᵢ) − yᵢ) · xᵢⱼ

where xᵢⱼ is the value of feature j in the i-th sample.
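In vectorized form this becomes one matrix-vector update per iteration. A minimal NumPy sketch (the shapes and data below are assumptions for illustration):

import numpy as np

def gradient_step(theta, X, y, alpha):
    # One simultaneous update of all parameters.
    # X has shape (m, p + 1) with a leading column of ones, so theta[0] is the intercept.
    m = X.shape[0]
    error = X @ theta - y  # h(x) - y for every sample
    return theta - alpha * (X.T @ error) / m

rng = np.random.default_rng(0)
X = np.hstack([np.ones((10, 1)), rng.random((10, 4))])  # 10 samples, 4 made-up features
y = rng.random(10)                                      # made-up targets
theta = np.zeros(5)
for _ in range(1000):
    theta = gradient_step(theta, X, y, alpha=0.1)
print(theta)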

3. Performance Metrics

1. R²:

R² = 1 − Σ (yᵢ − ŷᵢ)² / Σ (yᵢ − y̅)²

Here ŷ is the predicted value of y and y̅ is the mean of y. For a least-squares line fitted on the training data, R² will not be negative, as the fitted line is at least as good a fit as the horizontal line y = y̅.
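Computed by hand, R² compares the model's squared error with that of always predicting the mean (a minimal sketch; the numbers are made up for demonstration):

import numpy as np

y_true = np.array([3.0, 5.0, 7.0])  # made-up observations
y_pred = np.array([2.8, 5.1, 7.3])  # made-up predictions
ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
print(1 - ss_res / ss_tot)  # R² ≈ 0.9825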

2. Adjusted R²:

R² increases when we add features that are correlated with the target. But it also increases slightly even when an added feature is not correlated with the target. So if we were to choose a model based on the R² score alone, we might end up with a less efficient one. Adjusted R² solves this issue by penalizing the score as the number of features P grows:

Adjusted R² = 1 − (1 − R²) · (N − 1) / (N − P − 1)

In the above equation, R² is the R² score, N is the number of data points and P is the number of features.

The value of R² will always be greater than the value of Adjusted R².
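scikit-learn does not ship an adjusted R² function, but it follows directly from the formula above (a minimal sketch; the example values of N and P are assumptions):

def adjusted_r2(r2, n, p):
    # Adjusted R² for n data points and p features
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(0.90, n=100, p=5))  # ≈ 0.8947, slightly below R²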

4. Python Implementation

# Import the libraries
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Example data (synthetic, for illustration; substitute your own X and y)
rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = X @ np.array([1.5, -2.0, 3.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Split the data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Build a model
regressor = LinearRegression()
# Train the model
regressor.fit(X_train, y_train)
# Get predictions on test data
y_pred = regressor.predict(X_test)
# R² metric
print(r2_score(y_test, y_pred))
