What is a Confusion Matrix?

Mar 29, 2022

Table of Contents:
1. Confusion Matrix
2. Metrics derived from Confusion Matrix
3. When to reduce FN and FP?
4. Confusion Matrix implementation using Python
5. References

1. Confusion Matrix

Classification accuracy is often calculated as

classification accuracy = (correct predictions / total predictions) * 100

In the case of an imbalanced dataset, the classification accuracy alone can be misleading.
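To see why, here is a minimal sketch with made-up numbers. The dataset (95 negatives, 5 positives) and the always-negative "classifier" are assumptions chosen for illustration, not results from a real model.

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
actual = [0] * 95 + [1] * 5

# A "classifier" that ignores its input and always predicts the majority class.
predicted = [0] * len(actual)

# classification accuracy = (correct predictions / total predictions) * 100
correct = sum(1 for a, p in zip(actual, predicted) if a == p)
accuracy = correct * 100 / len(actual)

# How many of the actual positives did the model find?
positives_found = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)

print(f"classification accuracy = {accuracy:.1f}%")
print(f"positives found = {positives_found} of {sum(actual)}")
```

The accuracy looks high even though the model never finds a single positive, which is exactly the kind of failure a confusion matrix makes visible.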

A confusion matrix is a technique for summarizing the performance of a classification algorithm. Calculating a confusion matrix on our classifier model gives us a fair idea of what our model is doing right and what type of errors it is making.

TN stands for True Negative. These are correct predictions in which the event did not occur. Example: the model predicts Person_A (who is actually innocent) as not guilty.

TP stands for True Positive. These are correct predictions in which the event did occur. Example: the model predicts Person_B (who has actually committed a crime) as guilty.

FN stands for False Negative. These are incorrect predictions in which the event did occur but the model predicted that it did not. This is also known as a Type 2 error. Example: the model predicts Person_C (who has actually committed a crime) as not guilty.

FP stands for False Positive. These are incorrect predictions in which the event did not occur but the model predicted that it did. This is also known as a Type 1 error. Example: the model predicts Person_D (who is actually innocent) as guilty.
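The four definitions can be sketched as a small counting function. The function name `count_outcomes` and the encoding (1 = guilty, 0 = innocent) are assumptions made for this illustration.

```python
def count_outcomes(actual, predicted):
    """Count TN, TP, FN and FP for binary labels (1 = event occurred, 0 = it did not)."""
    counts = {"TN": 0, "TP": 0, "FN": 0, "FP": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1   # event occurred, predicted correctly
        elif a == 0 and p == 0:
            counts["TN"] += 1   # event did not occur, predicted correctly
        elif a == 1 and p == 0:
            counts["FN"] += 1   # event occurred, model missed it (Type 2 error)
        else:
            counts["FP"] += 1   # event did not occur, model raised it (Type 1 error)
    return counts


# Person_A, Person_B, Person_C, Person_D from the examples above
# (assumed encoding: 1 = guilty, 0 = innocent).
actual = [0, 1, 1, 0]
predicted = [0, 1, 0, 1]

outcomes = count_outcomes(actual, predicted)
print(outcomes)
```

With these four people, each outcome occurs exactly once.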

2. Metrics derived from Confusion Matrix

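1. Accuracy:
It is the fraction of predictions that the model got right, that is, the ratio of the number of correct predictions to the total number of predictions.

accuracy = (TP + TN) / (TP + TN + FP + FN)

2. Error Rate:
It is the complement of accuracy: the fraction of predictions that the model got wrong.

error rate = (FP + FN) / (TP + TN + FP + FN) = 1 - accuracy

3. Precision:
It is the fraction of the model's positive predictions that are actually positive.

precision = TP / (TP + FP)

4. Recall:
It is the fraction of actual positives that the model correctly predicts as positive. It is also known as sensitivity or the true positive rate.

recall = TP / (TP + FN)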
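As a rough sketch, the four metrics can be computed directly from the confusion matrix counts. The function name `metrics_from_counts`, the example counts, and the choice to return 0.0 when a denominator is zero are all assumptions made for this illustration.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Compute accuracy, error rate, precision and recall from the four counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        # Assumption: report 0.0 when the model made no positive predictions.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Assumption: report 0.0 when there are no actual positives.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical counts for a 10-sample test set.
metrics = metrics_from_counts(tp=3, tn=4, fp=2, fn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

For these counts the model is right on 7 of 10 samples, 3 of its 5 positive predictions are correct, and it finds 3 of the 4 actual positives.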

3. When to reduce FN and FP?

This depends entirely on the use case. Consider the following scenarios, in which we need to reduce either FP or FN.

i. In disease prediction, it benefits the patient if the disease is identified in its early stages. Here we need to reduce FN: if a patient who actually has the disease is diagnosed as healthy, they will not bother to take further tests, and they may later face serious health issues that could even prove fatal.

ii. In a use case where an individual must be found guilty or innocent, we need to reduce FP: if an innocent person is found guilty, they may lose several years of their life in jail.
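One common way to trade one error type for the other, not covered in this article, is to move the decision threshold of a model that outputs scores. The sketch below assumes such a model; the scores, labels, and the helper `errors_at` are hypothetical and only illustrate the direction of the trade-off.

```python
# Hypothetical model scores (higher = more likely positive) and true labels.
scores = [0.95, 0.80, 0.65, 0.45, 0.40, 0.30, 0.20, 0.10]
actual = [1, 1, 0, 1, 0, 1, 0, 0]


def errors_at(threshold):
    """Return (FP, FN) when every score >= threshold is predicted positive."""
    fp = fn = 0
    for score, label in zip(scores, actual):
        prediction = 1 if score >= threshold else 0
        if prediction == 1 and label == 0:
            fp += 1
        elif prediction == 0 and label == 1:
            fn += 1
    return fp, fn


for threshold in (0.25, 0.50, 0.70):
    fp, fn = errors_at(threshold)
    print(f"threshold={threshold:.2f}  FP={fp}  FN={fn}")
```

On this toy data, the low threshold removes every FN at the cost of more FP (the disease-screening preference), while the high threshold removes every FP but leaves FN in place (the guilty-or-innocent preference).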

4. Confusion Matrix implementation using Python

from sklearn.metrics import confusion_matrix

expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]
results = confusion_matrix(expected, predicted)
print(results)
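To read the printed matrix, it helps to know that scikit-learn puts the actual labels on the rows and the predicted labels on the columns, with labels in sorted order, so for 0/1 labels the layout is [[TN, FP], [FN, TP]]. The plain-Python sketch below rebuilds the same table for the same data without scikit-learn; the helper `build_confusion_matrix` is a hypothetical name used only for this illustration.

```python
def build_confusion_matrix(expected, predicted, labels=(0, 1)):
    """Build a matrix with rows = actual label and columns = predicted label."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for e, p in zip(expected, predicted):
        matrix[index[e]][index[p]] += 1
    return matrix


expected = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 1, 1, 0]

matrix = build_confusion_matrix(expected, predicted)

# Row 0 holds the actual negatives, row 1 the actual positives.
(tn, fp), (fn, tp) = matrix

print(matrix)  # [[4, 2], [1, 3]]
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

So on this data the model makes 7 correct predictions (4 TN and 3 TP) and 3 errors (2 FP and 1 FN).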

5. References

1. https://www.youtube.com/watch?v=AyP85ocS-8Y
2. https://machinelearningmastery.com/confusion-matrix-machine-learning/