Confusion Matrix
Definition
A Confusion Matrix is a table used to evaluate the performance of a classification model. It compares the model's predicted values against the actual (true) values and summarises the results into four categories.
It is called a "confusion" matrix because it reveals how often a model confuses one class for another.
Structure of the Confusion Matrix
For a binary classification problem (Positive / Negative):
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actual Positive | True Positive (TP) | False Negative (FN) |
| Actual Negative | False Positive (FP) | True Negative (TN) |
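As a quick check of this layout in code, here is a minimal sketch assuming scikit-learn is available. Note that `sklearn.metrics.confusion_matrix` puts actual classes on rows and predicted classes on columns, and with 0/1 labels sorted ascending the array reads `[[TN, FP], [FN, TP]]`. The labels below are invented for illustration.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth and predicted labels (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes.
# With 0/1 labels sorted ascending, the layout is [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)              # [[3 1]
                       #  [1 3]]
print(tp, tn, fp, fn)  # 3 3 1 1
```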
The Four Core Terms
True Positive (TP): the model correctly predicted Positive. The patient has cancer and the test says cancer.
True Negative (TN): the model correctly predicted Negative. The patient is healthy and the test says healthy.
False Positive (FP), Type I Error: the model predicted Positive, but the actual class is Negative. A healthy patient is incorrectly flagged as sick. Also called a false alarm.
False Negative (FN), Type II Error: the model predicted Negative, but the actual class is Positive. A sick patient is missed. This is the more dangerous error in medical and safety-critical systems.
Performance Metrics Derived from the Matrix
1. Accuracy
The proportion of all correct predictions out of all predictions:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Limitation: misleading when the dataset is imbalanced (e.g. 95% negative class), since a model that always predicts the majority class already scores 95%.
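A minimal sketch of this failure mode, using made-up labels: on a 95%-negative dataset, a baseline that always predicts Negative reaches 95% accuracy yet has zero recall.

```python
# Hypothetical imbalanced dataset: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a "model" that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

print(accuracy)  # 0.95 -- looks strong
print(recall)    # 0.0  -- misses every positive
```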
2. Precision (Positive Predictive Value)
Of all positive predictions made, how many were actually positive?
Precision = TP / (TP + FP)
High precision means few false alarms.
3. Recall (Sensitivity / True Positive Rate)
Of all actual positives, how many did the model correctly identify?
Recall = TP / (TP + FN)
High recall means few missed positives. Critical in disease detection.
4. F1 Score
The harmonic mean of Precision and Recall:
F1 = 2 · Precision · Recall / (Precision + Recall)
A useful single-number summary when classes are imbalanced and both error types matter.
5. Specificity (True Negative Rate)
Of all actual negatives, how many did the model correctly reject?
Specificity = TN / (TN + FP)
6. False Positive Rate (FPR)
Of all actual negatives, how many did the model incorrectly flag as positive?
FPR = FP / (FP + TN) = 1 - Specificity
7. Numerical Example
Ten animals are classified as Cow (treated as the positive class) or Deer:

| Actual Class | Cow | Cow | Deer | Deer | Cow | Deer | Cow | Deer | Cow | Cow |
|---|---|---|---|---|---|---|---|---|---|---|
| Predicted Class | Deer | Cow | Deer | Deer | Deer | Cow | Cow | Cow | Cow | Deer |
| Outcome | FN | TP | TN | TN | FN | FP | TP | FP | TP | FN |

Counting the outcomes gives TP = 3, TN = 2, FP = 2, FN = 3, so Accuracy = 5/10 = 0.5, Precision = 3/5 = 0.6, Recall = 3/6 = 0.5, and F1 ≈ 0.55.
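These counts can be reproduced in plain Python, treating Cow as the positive class:

```python
actual    = ["Cow", "Cow", "Deer", "Deer", "Cow", "Deer", "Cow", "Deer", "Cow", "Cow"]
predicted = ["Deer", "Cow", "Deer", "Deer", "Deer", "Cow", "Cow", "Cow", "Cow", "Deer"]
POS = "Cow"  # positive class

# Tally the four outcome types
tp = sum(a == POS and p == POS for a, p in zip(actual, predicted))
tn = sum(a != POS and p != POS for a, p in zip(actual, predicted))
fp = sum(a != POS and p == POS for a, p in zip(actual, predicted))
fn = sum(a == POS and p != POS for a, p in zip(actual, predicted))

accuracy    = (tp + tn) / len(actual)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
f1          = 2 * precision * recall / (precision + recall)
specificity = tn / (tn + fp)

print(tp, tn, fp, fn)  # 3 2 2 3
print(round(accuracy, 3), round(precision, 3), round(recall, 3),
      round(f1, 3), round(specificity, 3))
# 0.5 0.6 0.5 0.545 0.5
```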
Summary Table of Metrics
| Metric | Formula | Focus |
|---|---|---|
| Accuracy | (TP+TN) / Total | Overall correctness |
| Precision | TP / (TP+FP) | Quality of positive predictions |
| Recall | TP / (TP+FN) | Coverage of actual positives |
| F1 Score | 2·P·R / (P+R) | Precision-Recall balance |
| Specificity | TN / (TN+FP) | Coverage of actual negatives |
| FPR | FP / (FP+TN) | Rate of false alarms |
Precision vs. Recall Trade-off
For a given model, there is typically a trade-off between Precision and Recall, controlled by the decision threshold:
Increasing the classification threshold → higher Precision, lower Recall
Decreasing the threshold → higher Recall, lower Precision
The F1 Score balances both. In applications like spam filtering, you may prioritise Precision (avoid blocking good emails). In cancer screening, you prioritise Recall (don't miss sick patients).
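To make the trade-off concrete, the sketch below sweeps a threshold over made-up prediction scores: as the threshold rises, Precision improves while Recall drops. Scores and labels are invented for illustration.

```python
# Hypothetical predicted probabilities and true labels (1 = positive)
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.45, 0.40, 0.30, 0.20, 0.10]
labels = [1,    1,    1,    0,    1,    0,    1,    0,    0,    0]

for threshold in (0.25, 0.5, 0.75):
    # Predict positive whenever the score clears the threshold
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(p == 1 and l == 1 for p, l in zip(preds, labels))
    fp = sum(p == 1 and l == 0 for p, l in zip(preds, labels))
    fn = sum(p == 0 and l == 1 for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"threshold={threshold:.2f}  precision={precision:.2f}  recall={recall:.2f}")

# threshold=0.25  precision=0.62  recall=1.00
# threshold=0.50  precision=0.80  recall=0.80
# threshold=0.75  precision=1.00  recall=0.60
```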
Multi-class Confusion Matrix
For problems with more than 2 classes (e.g. classifying digits 0–9), the matrix expands to an n×n grid. Each row represents actual class, each column represents predicted class. The diagonal shows correct predictions; off-diagonal cells show misclassifications.
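A small sketch, again assuming scikit-learn, for a hypothetical 3-class problem; the diagonal counts correct predictions and every off-diagonal cell names a specific confusion.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical 3-class labels, invented for illustration
classes = ["cat", "dog", "rabbit"]
y_true = ["cat", "cat", "dog", "dog", "dog", "rabbit", "rabbit", "cat"]
y_pred = ["cat", "dog", "dog", "dog", "cat", "rabbit", "dog", "cat"]

# Rows = actual class, columns = predicted class, in the order given by `labels`
cm = confusion_matrix(y_true, y_pred, labels=classes)
print(cm)
# [[2 1 0]     <- actual cat:    2 correct, 1 confused with dog
#  [1 2 0]     <- actual dog:    2 correct, 1 confused with cat
#  [0 1 1]]    <- actual rabbit: 1 correct, 1 confused with dog

# The diagonal holds the correct predictions
print(cm.trace() / cm.sum())  # overall accuracy: 5/8 = 0.625
```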
Applications
The confusion matrix is used across classification tasks in spam detection (email spam vs. not spam), medical diagnosis (disease vs. healthy), fraud detection (fraud vs. genuine), image recognition, and sentiment analysis.
Key Takeaway
The confusion matrix is the foundation of classifier evaluation. Accuracy alone is often misleading: the matrix reveals where a model is making mistakes, which types of errors dominate, and which metric to optimise for based on the real-world cost of each error type.
