ID3 Algorithm

Apr 28, 2025
Updated 1 day ago
3 min read

Understanding the ID3 Algorithm for Decision Tree Learning

The ID3 algorithm (Iterative Dichotomiser 3) is one of the most foundational approaches to building decision trees in machine learning classification. Introduced by Ross Quinlan, the ID3 algorithm works by recursively selecting the most informative feature at each step — using Entropy to measure uncertainty and Information Gain to rank attributes — until every branch reaches a pure class label. If you're studying machine learning classification or decision tree learning, ID3 is the best starting point because its logic is clean, visual, and fully traceable by hand.

In this note, we'll walk through a complete worked example — constructing a decision tree step by step from a 10-row dataset — so you can see exactly how entropy and information gain drive every split.

Related: Decision Tree → · Decision Tree Algorithm → · Bayesian Classifier →


Dataset

S.no

Age

Competition

Type

Profit (Class)

1

Old

Yes

Soft

Down

2

Old

No

Soft

Down

3

Old

No

Hard

Down

4

Mid

Yes

Soft

Down

5

Mid

Yes

Hard

Down

6

Mid

No

Hard

Up

7

Mid

No

Soft

Up

8

New

Yes

Soft

Up

9

New

No

Hard

Up

10

New

No

Soft

Up


Step 1: Initial Entropy of the Dataset

Where:

  • Total = 10

  • Down = 5 (rows 1–5)

  • Up = 5 (rows 6–10)

So:

Entropy(S') = 1


Step 2: Information Gain for Each Attribute

Attribute 1: Age (Old, Mid, New)

Entropy(Old)

Old: = [Down, Down, Down]

Entropy(Mid)

Mid: = [Down, Down, Up, Up]

Entropy(New)

New: = [Up, Up, Up]

Gain(S', Age)

Gain(Age) = 0.6


Attribute 2: Competition (Yes, No)

Entropy(Yes)

Yes: = [Down, Down, Down, Up]

Entropy(No)

No: = [Down, Down, Up, Up, Up, Up]

Gain(S', Competition)

Gain(Competition) = 0.125


Attribute 3: Type (Soft, Hard)

Entropy(Soft)

Soft: = [Down, Down, Down, Up, Up, Up]

Entropy(Hard)

Hard: = [Down, Down, Up, Up]

Gain(S', Type)

Gain(Type) = 0


Step 3: Selecting the Root Node

Attribute

Information Gain

Age

0.6

Competition

0.125

Type

0

Age gives the highest information gain (0.6), so it becomes the root of the decision tree.

ID3 algorithm decision tree initial structure with Age as root node splitting into Old, Mid, and New branches

Step 4: Expanding Each Branch

Age = Old

— subset where Age = Old:

Age

Competition

Type

Profit

Old

Yes

Soft

Down

Old

No

Soft

Down

Old

No

Hard

Down

All 3 examples → Down ✅ Pure group!

Leaf Node = "Down"


Age = New

— subset where Age = New:

Age

Competition

Type

Profit

New

Yes

Soft

Up

New

No

Hard

Up

New

No

Soft

Up

All 3 examples → Up ✅ Pure group!

Leaf Node = "Up"


Age = Mid — Further Splitting Required

— subset where Age = Mid:

Age

Competition

Type

Profit

Mid

Yes

Soft

Down

Mid

Yes

Hard

Down

Mid

No

Hard

Up

Mid

No

Soft

Up

2 Down, 2 Up → impure (Entropy = 1). Need to split further.

Competition on

Entropy(Yes): Yes → [Down, Down] →

Entropy(No): No → [Up, Up] →

Type on

Entropy(Soft): Soft → [Down, Up] →

Entropy(Hard): Hard → [Down, Up] →

Competition wins with Gain = 1 — it splits the Mid group perfectly.

  • Competition = Yes → [Down, Down] → Leaf: Down

  • Competition = No → [Up, Up] → Leaf: Up


Final Decision Tree

ID3 algorithm final decision tree with Age as root, Competition as child node, and leaf nodes showing Up and Down profit classification

Summary

The ID3 algorithm selects attributes greedily — always the one with the highest information gain. In this example:

  • Age split the dataset most cleanly (Gain = 0.6), making it the root

  • The Old and New branches resolved immediately to pure leaf nodes

  • The Mid branch required a second split on Competition (Gain = 1), which resolved it completely

  • Type contributed zero gain and was never used

This greedy, entropy-driven approach is what makes ID3 easy to trace and understand — and also why it can overfit on noisy real-world data, which extensions like C4.5 and CART address.

Further reading on NoteHub: Random Forest & ID3 → · Ensemble Learning → · K-Means Clustering →