Confusion Matrix
Definition
A Confusion Matrix is a table used to evaluate the performance of a classification model. It compares the model's predicted values against the actual (true) values and summarises the results into four categories.
It is called a "confusion" matrix because it reveals how often a model confuses one class for another.
Structure of the Confusion Matrix
For a binary classification problem (Positive / Negative):
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
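As a quick check of this layout in code, here is a minimal sketch assuming scikit-learn is available. Note that `sklearn.metrics.confusion_matrix` puts actual classes on rows and predicted classes on columns, and with 0/1 labels sorted ascending the array reads `[[TN, FP], [FN, TP]]`. The labels below are invented for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes.
# With 0/1 labels sorted ascending, the layout is [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)              # [[3 1]
                       #  [1 3]]
print(tp, tn, fp, fn)  # 3 3 1 1
```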
The Four Core Terms
True Positive (TP): the model correctly predicted Positive. The patient has cancer and the test says cancer.
True Negative (TN): the model correctly predicted Negative. The patient is healthy and the test says healthy.
False Positive (FP), Type I Error: the model predicted Positive, but the actual class is Negative. A healthy patient is incorrectly flagged as sick. Also called a false alarm.
False Negative (FN), Type II Error: the model predicted Negative, but the actual class is Positive. A sick patient is missed. This is the more dangerous error in medical and safety-critical systems.
Performance Metrics Derived from the Matrix
1. Accuracy
The proportion of all correct predictions out of all predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Limitation: misleading when the dataset is imbalanced (e.g. 95% negative class), since a model that always predicts the majority class already scores 95%.
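A minimal sketch of this failure mode, using made-up labels: on a 95%-negative dataset, a baseline that always predicts Negative reaches 95% accuracy yet has zero recall.

```python
# Hypothetical imbalanced dataset: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a "model" that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy)  # 0.95 -- looks strong
print(recall)    # 0.0  -- misses every positive
```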
2. Precision (Positive Predictive Value)
Of all positive predictions made, how many were actually positive?
Precision = TP / (TP + FP)
High precision means few false alarms.
3. Recall (Sensitivity / True Positive Rate)
Of all actual positives, how many did the model correctly identify?
Recall = TP / (TP + FN)
High recall means few missed positives. Critical in disease detection.
4. F1 Score
The harmonic mean of Precision and Recall:
F1 = 2 · Precision · Recall / (Precision + Recall)
A useful single-number summary when classes are imbalanced and both error types matter.
5. Specificity (True Negative Rate)
Of all actual negatives, how many did the model correctly reject?
Specificity = TN / (TN + FP)
6. False Positive Rate (FPR)
Of all actual negatives, how many did the model incorrectly flag as positive?
FPR = FP / (FP + TN) = 1 - Specificity
7. Numerical Example
Ten animals are classified as Cow (treated as the positive class) or Deer:

| Actual Class | Cow | Cow | Deer | Deer | Cow | Deer | Cow | Deer | Cow | Cow |
|---|---|---|---|---|---|---|---|---|---|---|
| Predicted Class | Deer | Cow | Deer | Deer | Deer | Cow | Cow | Cow | Cow | Deer |
| Outcome | FN | TP | TN | TN | FN | FP | TP | FP | TP | FN |

Counting the outcomes gives TP = 3, TN = 2, FP = 2, FN = 3, so Accuracy = 5/10 = 0.5, Precision = 3/5 = 0.6, Recall = 3/6 = 0.5, and F1 ≈ 0.55.
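These counts can be reproduced in plain Python, treating Cow as the positive class:

```python
actual    = ["Cow", "Cow", "Deer", "Deer", "Cow", "Deer", "Cow", "Deer", "Cow", "Cow"]
predicted = ["Deer", "Cow", "Deer", "Deer", "Deer", "Cow", "Cow", "Cow", "Cow", "Deer"]
POS = "Cow"  # positive class

# Tally the four outcome types
tp = sum(a == POS and p == POS for a, p in zip(actual, predicted))
tn = sum(a != POS and p != POS for a, p in zip(actual, predicted))
fp = sum(a != POS and p == POS for a, p in zip(actual, predicted))
fn = sum(a == POS and p != POS for a, p in zip(actual, predicted))

accuracy    = (tp + tn) / len(actual)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
f1          = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)

print(tp, tn, fp, fn)  # 3 2 2 3
print(round(accuracy, 3), round(precision, 3), round(recall, 3),
      round(f1, 3), round(specificity, 3))
# 0.5 0.6 0.5 0.545 0.5
```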
Summary Table of Metrics
| Metric | Formula | Focus |
|---|---|---|
| Accuracy | (TP+TN) / Total | Overall correctness |
| Precision | TP / (TP+FP) | Quality of positive predictions |
| Recall | TP / (TP+FN) | Coverage of actual positives |
| F1 Score | 2·P·R / (P+R) | Precision-Recall balance |
| Specificity | TN / (TN+FP) | Coverage of actual negatives |
| FPR | FP / (FP+TN) | Rate of false alarms |
Precision vs. Recall Trade-off
For a given model, there is typically a trade-off between Precision and Recall, controlled by the decision threshold:
Increasing the classification threshold → higher Precision, lower Recall
Decreasing the threshold → higher Recall, lower Precision
The F1 Score balances both. In applications like spam filtering, you may prioritise Precision (avoid blocking good emails). In cancer screening, you prioritise Recall (don't miss sick patients).
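To make the trade-off concrete, the sketch below sweeps a threshold over made-up prediction scores: as the threshold rises, Precision improves while Recall drops. Scores and labels are invented for illustration.

```python
# Hypothetical predicted probabilities and true labels (1 = positive)
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    0,    1,    0,    1,    0,    0,    0]

for threshold in (0.25, 0.5, 0.75):
    # Predict positive whenever the score clears the threshold
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and l == 1 for p, l in zip(preds, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(preds, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")

# threshold=0.25  precision=0.62  recall=1.00
# threshold=0.50  precision=0.80  recall=0.80
# threshold=0.75  precision=1.00  recall=0.60
```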
Multi-class Confusion Matrix
For problems with more than 2 classes (e.g. classifying digits 0–9), the matrix expands to an n×n grid. Each row represents actual class, each column represents predicted class. The diagonal shows correct predictions; off-diagonal cells show misclassifications.
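A small sketch, again assuming scikit-learn, for a hypothetical 3-class problem; the diagonal counts correct predictions and every off-diagonal cell names a specific confusion.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical 3-class labels, invented for illustration
classes = ["cat", "dog", "rabbit"]
y_true = ["cat", "cat", "dog", "dog", "dog", "rabbit", "rabbit", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "rabbit", "dog", "cat"]

# Rows = actual class, columns = predicted class, in the order given by `labels`
cm = confusion_matrix(y_true, y_pred, labels=classes)
print(cm)
# [[2 1 0]     <- actual cat:    2 correct, 1 confused with dog
#  [1 2 0]     <- actual dog:    2 correct, 1 confused with cat
#  [0 1 1]]    <- actual rabbit: 1 correct, 1 confused with dog

# The diagonal holds the correct predictions
print(cm.trace() / cm.sum())  # overall accuracy: 5/8 = 0.625
```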
Applications
The confusion matrix is used across classification tasks in spam detection (email spam vs. not spam), medical diagnosis (disease vs. healthy), fraud detection (fraud vs. genuine), image recognition, and sentiment analysis.
Key Takeaway
The confusion matrix is the foundation of classifier evaluation. Accuracy alone is often misleading: the matrix reveals where a model is making mistakes, which types of errors dominate, and which metric to optimise for based on the real-world cost of each error type.
