Logistic Regression (Classification): Mathematical Intuition
Table of contents:
1. Need for Logistic Regression and math behind it
2. Logistic Regression for multiclass classification
3. Python library to implement Logistic Regression
4. References
Classification is a predictive modeling problem that involves assigning a class label to an example. A classification problem with two classes is known as Binary Classification, and one with more than two classes is known as Multi-class Classification.
1. Need for Logistic Regression and math behind it
Given a set of values of x with a binary label y, we can easily fit a best-fit line using Linear Regression. We can classify the data points to the left of the line as Class1 and the data points to the right of the line as Class2. This best-fit line is also known as the Decision Boundary, and it is a property of θ and not of the training set.
Please refer to the following link for Linear Regression and its mathematical intuition: https://parisrohan.medium.com/simple-linear-regression-mathematical-intuition-and-python-implementation-f8ca5c17a4c5
The problem arises when we have an outlier: the best-fit line shifts to accommodate it, and the decision threshold moves with it.
As a result, some of the data points that were previously classified as Class2 are now classified as Class1. Wrongly classifying data can be fatal in situations such as deciding whether a tumor is malignant or benign, so we need an algorithm that classifies the data more reliably.
Logistic Regression is a Machine Learning classification algorithm that is used to predict the probability of a categorical dependent variable. In logistic regression, the dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
For Linear Regression, the hypothesis function is defined as hθ(x) = θᵀx.
For Logistic Regression, this hypothesis is modified by passing θᵀx through a squashing function: hθ(x) = g(θᵀx) = 1 / (1 + e^(−θᵀx)).
The function g(z) = 1 / (1 + e^(−z)) is known as the Sigmoid function or Logistic function. Its graph is an S-shaped curve that maps any real input into the interval (0, 1): it passes through 0.5 at z = 0, flattens towards 1 as z → +∞, and flattens towards 0 as z → −∞.
From this function, it can be observed that:
i) we predict y = 1 when g(z) ≥ 0.5, which happens exactly when z = θᵀx ≥ 0
ii) we predict y = 0 when g(z) < 0.5, which happens exactly when z = θᵀx < 0
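To make the decision rule concrete, here is a minimal NumPy sketch of the sigmoid and the 0.5 threshold (the function name sigmoid and the sample z values are my own, for illustration):
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real z into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# g(z) >= 0.5 exactly when z >= 0, so the predicted class flips at z = 0
for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(z, round(float(sigmoid(z)), 3), "-> predict y =", int(sigmoid(z) >= 0.5))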
1.1 Cost function for Logistic Regression
By substituting the new hypothesis function into the squared-error cost function of Linear Regression, we get a non-convex function of the parameter θ. The problem with this is that there are multiple local minima, so there is no guarantee that Gradient Descent will converge to the global minimum.
So we need a cost function that is convex in θ: a bowl-shaped function with a single global minimum that Gradient Descent is guaranteed to reach.
In order to obtain a convex function, the cost of a single training example in Logistic Regression is defined as follows:
Cost(hθ(x), y) = −log(hθ(x)) if y = 1
Cost(hθ(x), y) = −log(1 − hθ(x)) if y = 0
This heavily penalizes confident wrong answers: if y = 1 but hθ(x) approaches 0, the cost grows without bound.
Since y is always either 0 or 1, the two cases can be combined into a single expression, and the overall cost function over m training examples becomes:
J(θ) = −(1/m) Σᵢ [ y⁽ⁱ⁾ log(hθ(x⁽ⁱ⁾)) + (1 − y⁽ⁱ⁾) log(1 − hθ(x⁽ⁱ⁾)) ]
To get θ we minimize J(θ) using Gradient Descent, repeatedly applying the update θⱼ := θⱼ − α (1/m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾ simultaneously for all parameters θⱼ, where α is the learning rate; see the sketch below.
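Here is a minimal NumPy sketch of this cost function and one gradient descent update (the function names and the default learning rate are my own, for illustration):
import numpy as np

def cost(theta, X, y):
    # Cross-entropy cost J(theta); X is the m-by-n feature matrix, y the 0/1 labels
    h = 1.0 / (1.0 + np.exp(-X @ theta))   # h_theta(x) for every training example
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent_step(theta, X, y, alpha=0.1):
    # One simultaneous update: theta_j := theta_j - alpha * (1/m) * sum((h - y) * x_j)
    h = 1.0 / (1.0 + np.exp(-X @ theta))
    return theta - alpha * (X.T @ (h - y)) / len(y)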
1.2 Interpretation of hypothesis function output
The output hθ(x) is interpreted as the estimated probability that y = 1 for input x, parameterized by θ: hθ(x) = P(y = 1 | x; θ), and consequently P(y = 0 | x; θ) = 1 − hθ(x). For example, hθ(x) = 0.7 for a patient's tumor features means the model estimates a 70% chance that the tumor is malignant (y = 1).
2. Logistic Regression for multiclass classification
Binary classification models like logistic regression do not support multi-class classification natively and require meta-strategies. The One-vs-Rest (OvR) strategy splits a multi-class problem into one binary classification problem per class. A binary classifier is then trained on each of these problems, and predictions are made using the model that is most confident.
Suppose we have a classification problem with three classes.
We will train one logistic regression classifier hθ⁽ⁱ⁾(x) for each class i to predict the probability that y = i, i.e. hθ⁽ⁱ⁾(x) = P(y = i | x; θ).
To make a prediction on a new data point x, pick the class i that maximizes hθ⁽ⁱ⁾(x), as in the scikit-learn sketch below.
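For instance, scikit-learn's OneVsRestClassifier wraps one binary logistic regression per class; here is a minimal sketch on the three-class iris dataset (the dataset choice is my own, for illustration):
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)   # three classes, labeled 0, 1, 2
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)
print(ovr.predict(X[:5]))   # each prediction is the class whose classifier is most confident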
3. Python library to implement Logistic Regression
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()   # L2-regularized logistic regression with default settings
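As an end-to-end illustration, here is a minimal sketch that trains and evaluates the classifier on scikit-learn's built-in breast-cancer dataset (the dataset choice and the split parameters are my own, for illustration):
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Binary task: classify tumors as malignant or benign
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

classifier = LogisticRegression(max_iter=10000)   # raise max_iter so the solver converges
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)               # hard 0/1 class labels
y_prob = classifier.predict_proba(X_test)[:, 1]   # estimated P(y = 1 | x)
print("Accuracy:", accuracy_score(y_test, y_pred))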