ID3 Algorithm

Apr 28, 2025

Understanding the ID3 Algorithm for Decision Tree Learning

The ID3 algorithm (Iterative Dichotomiser 3) is a foundational approach to building decision trees for classification. Introduced by Ross Quinlan, ID3 recursively selects the most informative attribute at each step, using entropy to measure uncertainty and information gain to rank attributes, until every branch reaches a pure class label or no attributes remain to split on. If you're studying classification or decision tree learning, ID3 is a good starting point because its logic is clean, visual, and fully traceable by hand.

In this note, we'll walk through a complete worked example, building a decision tree step by step from a 10-row dataset, so you can see exactly how entropy and information gain drive every split.

Dataset

S.no	Age	Competition	Type	Profit (Class)
1	Old	Yes	Soft	Down
2	Old	No	Soft	Down
3	Old	No	Hard	Down
4	Mid	Yes	Soft	Down
5	Mid	Yes	Hard	Down
6	Mid	No	Hard	Up
7	Mid	No	Soft	Up
8	New	Yes	Soft	Up
9	New	No	Hard	Up
10	New	No	Soft	Up

Step 1: Initial Entropy of the Dataset

$$\mathrm{Entropy}(S') = -p_{\mathrm{Up}} \log_2 p_{\mathrm{Up}} - p_{\mathrm{Down}} \log_2 p_{\mathrm{Down}}$$

Where:

Total = 10
Down = 5 (rows 1–5)
Up = 5 (rows 6–10)
$p_{\mathrm{Up}} = \frac{5}{10}$
$p_{\mathrm{Down}} = \frac{5}{10}$

So:

$$\mathrm{Entropy}(S') = -\frac{5}{10} \log_2 \frac{5}{10} - \frac{5}{10} \log_2 \frac{5}{10}$$

$$= -\frac{5}{10} \log_2 \frac{1}{2} - \frac{5}{10} \log_2 \frac{1}{2}$$

Using $\log_a b^c = c \log_a b$ and $\log_a a = 1$, we have $\log_2 \frac{1}{2} = \log_2 2^{-1} = -1$, so:

$$= \frac{5}{10} \cdot 1 + \frac{5}{10} \cdot 1$$

$$= 0.5 + 0.5 = 1$$

✅ Entropy(S') = 1
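
The hand calculation above can be checked with a minimal Python sketch. The function name `entropy` and the list `profit` are illustrative choices, not part of the original note; the list simply restates the Profit column (rows 1 to 5 are Down, rows 6 to 10 are Up).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Only classes that actually occur appear in `counts`, so log2(0) never arises.
    return sum(-(n / total) * log2(n / total) for n in counts.values())

# Profit column of the 10-row dataset (assumed encoding of the table).
profit = ["Down"] * 5 + ["Up"] * 5

print(entropy(profit))  # 1.0
```

A perfectly balanced two-class set gives the maximum of 1 bit, while a set with a single class gives 0.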

Step 2: Information Gain for Each Attribute
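
Each calculation in this step applies the standard definition of information gain, restated here for reference. The symbols $A$ (an attribute), $v$ (one of its values), and $S'_v$ (the rows of $S'$ where $A = v$) are notation assumed for this restatement and are not used elsewhere in the note.

$$\mathrm{Gain}(S', A) = \mathrm{Entropy}(S') - \sum_{v \in \mathrm{Values}(A)} \frac{|S'_v|}{|S'|} \, \mathrm{Entropy}(S'_v)$$

In words: start from the entropy of the whole set, then subtract the entropy of each subset produced by the split, weighted by the fraction of rows that fall into that subset.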

Attribute 1: Age (Old, Mid, New)

Entropy(Old)

Old: $S$ = [Down, Down, Down]

$$E = -\frac{3}{3} \log_2 \frac{3}{3} = 0$$

Entropy(Mid)

Mid: $S$ = [Down, Down, Up, Up]

$$E = -\frac{2}{4} \log_2 \frac{2}{4} - \frac{2}{4} \log_2 \frac{2}{4} = 1$$

Entropy(New)

New: $S$ = [Up, Up, Up]

$$E = -\frac{3}{3} \log_2 \frac{3}{3} = 0$$

$$\mathrm{Gain}(S', \mathrm{Age}) = 1 - \left(\frac{3}{10} \cdot 0 + \frac{4}{10} \cdot 1 + \frac{3}{10} \cdot 0\right)$$

$$= 1 - 0.4 = 0.6$$

✅ Gain(Age) = 0.6
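
The Age calculation can be reproduced with a short Python sketch. The names `ROWS`, `entropy`, and `information_gain` are illustrative, not part of the original note, and the tuple layout (Age, Competition, Type, Profit) is an assumed encoding of the dataset table.

```python
from collections import Counter, defaultdict
from math import log2

# Rows 1-10 of the dataset as (Age, Competition, Type, Profit) -- assumed encoding.
ROWS = [
    ("Old", "Yes", "Soft", "Down"),
    ("Old", "No", "Soft", "Down"),
    ("Old", "No", "Hard", "Down"),
    ("Mid", "Yes", "Soft", "Down"),
    ("Mid", "Yes", "Hard", "Down"),
    ("Mid", "No", "Hard", "Up"),
    ("Mid", "No", "Soft", "Up"),
    ("New", "Yes", "Soft", "Up"),
    ("New", "No", "Hard", "Up"),
    ("New", "No", "Soft", "Up"),
]

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, column):
    """Entropy of the class label minus the weighted entropy after splitting on `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[-1])  # collect class labels per attribute value
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - weighted

age_gain = information_gain(ROWS, 0)  # column 0 is Age
print(round(age_gain, 3))  # 0.6
```

Passing column index 1 or 2 instead gives the gain for Competition or Type.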

Attribute 2: Competition (Yes, No)

Entropy(Yes)

Yes: $S$ = [Down, Down, Down, Up]

$$E = -\frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4} \approx 0.811$$

Entropy(No)

No: $S$ = [Down, Down, Up, Up, Up, Up]

$$E = -\frac{2}{6} \log_2 \frac{2}{6} - \frac{4}{6} \log_2 \frac{4}{6} \approx 0.918$$

$$\mathrm{Gain}(S', \mathrm{Competition}) = 1 - \left(\frac{4}{10} \cdot 0.811 + \frac{6}{10} \cdot 0.918\right)$$

$$= 1 - (0.3244 + 0.5508) \approx 0.125$$

✅ Gain(Competition) = 0.125

Attribute 3: Type (Soft, Hard)

Entropy(Soft)

Soft: $S$ = [Down, Down, Down, Up, Up, Up]

$$E = -\frac{3}{6} \log_2 \frac{3}{6} - \frac{3}{6} \log_2 \frac{3}{6} = 1$$

Entropy(Hard)

Hard: $S$ = [Down, Down, Up, Up]

$$E = -\frac{2}{4} \log_2 \frac{2}{4} - \frac{2}{4} \log_2 \frac{2}{4} = 1$$

$$\mathrm{Gain}(S', \mathrm{Type}) = 1 - \left(\frac{6}{10} \cdot 1 + \frac{4}{10} \cdot 1\right) = 1 - 1 = 0$$

✅ Gain(Type) = 0

Step 3: Selecting the Root Node

Attribute	Information Gain
Age	0.6 ✅
Competition	0.125
Type	0

$\max(0.6, 0.125, 0) = 0.6$

Age gives the highest information gain (0.6), so it becomes the root of the decision tree.
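
As a cross-check on the root selection, the sketch below recomputes all three gains from class counts alone and picks the maximum. The dictionary `SPLITS` and the helper names are hypothetical; the (Down, Up) counts are read off the dataset table.

```python
from math import log2

def entropy_from_counts(counts):
    """Entropy, in bits, from a tuple of per-class counts; zero counts are skipped."""
    total = sum(counts)
    return sum(-(c / total) * log2(c / total) for c in counts if c > 0)

# (Down, Up) counts for each attribute value, taken from the dataset table.
SPLITS = {
    "Age": {"Old": (3, 0), "Mid": (2, 2), "New": (0, 3)},
    "Competition": {"Yes": (3, 1), "No": (2, 4)},
    "Type": {"Soft": (3, 3), "Hard": (2, 2)},
}

def gain(split, parent_counts=(5, 5)):
    """Parent entropy minus the size-weighted entropy of the child subsets."""
    n = sum(parent_counts)
    remainder = sum(sum(c) / n * entropy_from_counts(c) for c in split.values())
    return entropy_from_counts(parent_counts) - remainder

gains = {attr: gain(split) for attr, split in SPLITS.items()}
root = max(gains, key=gains.get)  # attribute with the highest information gain
print(root)  # Age
```

Working from counts rather than rows mirrors the hand calculation: each entropy term depends only on how many Down and Up examples land in a subset.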

[Figure: initial decision tree structure, with Age as the root node splitting into Old, Mid, and New branches]

Step 4: Expanding Each Branch

Age = Old

$S^{1}$ — subset where Age = Old:

Age	Competition	Type	Profit
Old	Yes	Soft	Down
Old	No	Soft	Down
Old	No	Hard	Down

All 3 examples → Down ✅ Pure group!

$$\mathrm{Entropy}(S^{1}) = 0$$

Leaf Node = "Down"

Age = New

$S^{3}$ — subset where Age = New:

Age	Competition	Type	Profit
New	Yes	Soft	Up
New	No	Hard	Up
New	No	Soft	Up

All 3 examples → Up ✅ Pure group!

$$\mathrm{Entropy}(S^{3}) = 0$$

Leaf Node = "Up"

Age = Mid — Further Splitting Required

$S^{2}$ — subset where Age = Mid:

Age	Competition	Type	Profit
Mid	Yes	Soft	Down
Mid	Yes	Hard	Down
Mid	No	Hard	Up
Mid	No	Soft	Up

2 Down, 2 Up → impure (Entropy = 1). Need to split further.

Competition on $S^{2}$

Entropy(Yes): Yes → [Down, Down] → $E = 0$

Entropy(No): No → [Up, Up] → $E = 0$

$$\mathrm{Gain}(S^{2}, \mathrm{Competition}) = 1 - \left(\frac{2}{4} \cdot 0 + \frac{2}{4} \cdot 0\right) = 1$$

Type on $S^{2}$

Entropy(Soft): Soft → [Down, Up] → $E = 1$

Entropy(Hard): Hard → [Down, Up] → $E = 1$

$$\mathrm{Gain}(S^{2}, \mathrm{Type}) = 1 - \left(\frac{2}{4} \cdot 1 + \frac{2}{4} \cdot 1\right) = 0$$

Competition wins with Gain = 1 — it splits the Mid group perfectly.

Competition = Yes → [Down, Down] → Leaf: Down
Competition = No → [Up, Up] → Leaf: Up
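
Steps 1 to 4 can be combined into a compact recursive sketch. This is one possible way to write the procedure, not a reference implementation: the names `DATA`, `gain`, and `id3`, the nested-dict tree representation, and the majority-class fallback when no attributes remain are all assumptions made for illustration.

```python
from collections import Counter
from math import log2

COLUMNS = ("Age", "Competition", "Type", "Profit")
# The 10 dataset rows, one per line, parsed into dicts keyed by column name.
DATA = [dict(zip(COLUMNS, line.split())) for line in """
Old Yes Soft Down
Old No Soft Down
Old No Hard Down
Mid Yes Soft Down
Mid Yes Hard Down
Mid No Hard Up
Mid No Soft Up
New Yes Soft Up
New No Hard Up
New No Soft Up
""".strip().splitlines()]

def entropy(rows, target="Profit"):
    counts = Counter(r[target] for r in rows)
    return sum(-(n / len(rows)) * log2(n / len(rows)) for n in counts.values())

def gain(rows, attr):
    subsets = [[r for r in rows if r[attr] == v] for v in {r[attr] for r in rows}]
    return entropy(rows) - sum(len(s) / len(rows) * entropy(s) for s in subsets)

def id3(rows, attrs, target="Profit"):
    labels = Counter(r[target] for r in rows)
    # Stop on a pure subset; fall back to the majority class if no attributes remain.
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))  # greedy choice: highest gain
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest, target)
                   for v in sorted({r[best] for r in rows})}}

tree = id3(DATA, ["Age", "Competition", "Type"])
print(tree)
```

On this dataset the sketch picks Age at the root and Competition under the Mid branch, matching the hand-built tree.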

Final Decision Tree
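
The tree derived in Step 4 has Age at the root: Old leads directly to Down, New leads directly to Up, and Mid tests Competition (Yes gives Down, No gives Up). A minimal sketch of that tree as nested Python dicts, with a hypothetical `classify` helper that walks it, might look like this:

```python
# The tree from Step 4 as nested dicts: {attribute: {value: subtree_or_leaf}}.
# This representation is an assumption chosen for illustration.
TREE = {
    "Age": {
        "Old": "Down",
        "New": "Up",
        "Mid": {"Competition": {"Yes": "Down", "No": "Up"}},
    }
}

def classify(tree, example):
    """Follow branches until a leaf (a plain string) is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

# Row 6 of the dataset: Mid, No, Hard.
print(classify(TREE, {"Age": "Mid", "Competition": "No", "Type": "Hard"}))  # Up
```

Type never appears in the tree, so its value in an example has no effect on the prediction.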

Summary

The ID3 algorithm selects attributes greedily — always the one with the highest information gain. In this example:

Age split the dataset most cleanly (Gain = 0.6), making it the root
The Old and New branches resolved immediately to pure leaf nodes
The Mid branch required a second split on Competition (Gain = 1), which resolved it completely
Type contributed zero gain and was never used

This greedy, entropy-driven approach is what makes ID3 easy to trace and understand. Because ID3 keeps splitting until every leaf is pure and has no pruning step, it can also overfit noisy real-world data, a weakness that later algorithms such as C4.5 and CART address with pruning.
