Random Forest (ID3 algorithm)

Aug 7, 2025

Random Forest Algorithm (ID3): A Complete Step-by-Step Guide

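Random Forest is an ensemble machine-learning algorithm that builds many decision trees on random subsets of the training data and combines their predictions. For classification it takes a majority vote; for regression it averages the outputs. Because each tree is trained on a different bootstrap sample, the trees make different mistakes, and combining them reduces overfitting compared with a single tree. The standard algorithm also restricts each split to a random subset of the attributes; this guide skips that step and lets every tree consider all four attributes, which keeps the hand calculations short.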

The Core Idea: Why Multiple Trees?

A single decision tree is prone to overfitting — it memorises the training data and performs poorly on unseen data. Random Forest fixes this by:

Bootstrapping — sampling the training data with replacement to create different subsets for each tree.
Aggregating — combining predictions from all trees so individual errors cancel out.
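The bootstrapping step can be sketched in a few lines of Python. This is a minimal illustration, assuming the records are referred to by their day labels; the function name `bootstrap_sample` and the fixed seed are illustrative choices, not part of the walkthrough.

```python
import random

def bootstrap_sample(records, rng):
    """Draw len(records) items with replacement (one bootstrap sample)."""
    return rng.choices(records, k=len(records))

days = [f"D{i}" for i in range(1, 15)]   # D1 .. D14
rng = random.Random(0)                    # fixed seed, only for repeatability

sample = bootstrap_sample(days, rng)
# Days that were never drawn for this tree are "out of bag".
out_of_bag = [d for d in days if d not in sample]

print(sample)       # the same day may appear more than once
print(out_of_bag)   # days this tree never sees during training
```

Each tree in the forest would get its own call to `bootstrap_sample`, so each one trains on a slightly different view of the data.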

The Dataset

Day	Outlook	Temp	Humidity	Wind	Can Play
D1	Sunny	Hot	High	Weak	No
D2	Sunny	Hot	High	Strong	No
D3	Overcast	Mild	High	Weak	Yes
D4	Rain	Cool	High	Weak	Yes
D5	Rain	Cool	Normal	Weak	Yes
D6	Rain	Cool	Normal	Strong	No
D7	Overcast	Cool	Normal	Strong	Yes
D8	Sunny	Mild	High	Weak	No
D9	Sunny	Cool	Normal	Weak	Yes
D10	Rain	Mild	Normal	Weak	Yes
D11	Sunny	Mild	Normal	Strong	Yes
D12	Overcast	Mild	High	Strong	Yes
D13	Overcast	Hot	Normal	Weak	Yes
D14	Rain	Mild	High	Strong	No

Unseen Data Point (to classify)

Outlook	Temp	Humidity	Wind
Overcast	Mild	Normal	Weak

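We will build 3 trees on 3 different samples of 10 records each and take a majority vote. To keep the arithmetic easy to follow, the samples are overlapping windows of the dataset (D1–D10, D3–D12, D5–D14) rather than random draws with replacement.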

Entropy and Information Gain — Quick Recap

Entropy measures the impurity of a set:

$$\text{Entropy}(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
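As a rough sketch, the entropy formula translates directly into Python. The function below takes the number of records in each class; the name `entropy` and the list-of-counts input are assumptions made for this example.

```python
from math import log2

def entropy(class_counts):
    """Shannon entropy (in bits) of a set, given the record count per class."""
    total = sum(class_counts)
    # Empty classes are skipped: p * log2(p) tends to 0 as p -> 0.
    probs = [n / total for n in class_counts if n > 0]
    return sum(-p * log2(p) for p in probs)

print(round(entropy([6, 4]), 3))   # 6 Yes, 4 No -> 0.971
print(round(entropy([5, 5]), 3))   # even split  -> 1.0
print(round(entropy([3, 0]), 3))   # pure set    -> 0.0
```

A pure set has entropy 0, and a two-class set split evenly has the maximum entropy of 1 bit.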

Information Gain measures how much an attribute reduces impurity:

$$\text{Gain}(S, A) = \text{Entropy}(S) - \sum_{v \in \text{Values}(A)} \frac{|S_v|}{|S|} \cdot \text{Entropy}(S_v)$$

The attribute with the highest gain becomes the splitting node.
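The gain formula can be sketched the same way. In this illustration a split is described only by class counts: one `[yes, no]` pair for the parent set and one pair per attribute value. The function names and the example counts are made up for the sketch.

```python
from math import log2

def entropy(counts):
    total = sum(counts)
    return sum(-(n / total) * log2(n / total) for n in counts if n > 0)

def information_gain(parent_counts, child_counts):
    """Gain(S, A): parent entropy minus the size-weighted child entropies.

    parent_counts: class counts in S, e.g. [yes, no]
    child_counts:  one list of class counts per value v of attribute A
    """
    total = sum(parent_counts)
    remainder = sum(sum(c) / total * entropy(c) for c in child_counts)
    return entropy(parent_counts) - remainder

# Hypothetical set of 4 Yes and 4 No records:
print(information_gain([4, 4], [[4, 0], [0, 4]]))   # perfect split -> 1.0
print(information_gain([4, 4], [[2, 2], [2, 2]]))   # useless split -> 0.0
```

A split that separates the classes completely recovers all of the parent entropy, while a split that leaves every branch as mixed as the parent gains nothing.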

Model 1 — Bootstrap Sample (D1–D10)

Day	Outlook	Temp	Humidity	Wind	Can Play
D1	Sunny	Hot	High	Weak	No
D2	Sunny	Hot	High	Strong	No
D3	Overcast	Mild	High	Weak	Yes
D4	Rain	Cool	High	Weak	Yes
D5	Rain	Cool	Normal	Weak	No ← resampled
D6	Rain	Cool	Normal	Strong	Yes ← resampled
D7	Overcast	Cool	Normal	Strong	Yes
D8	Sunny	Mild	High	Weak	No
D9	Sunny	Cool	Normal	Weak	Yes
D10	Rain	Mild	Normal	Weak	Yes

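6 Yes, 4 No (10 records). Note that the two rows marked ← resampled carry Can Play labels that are swapped relative to the original dataset (D5 is Yes and D6 is No there); the calculations below use the labels shown in this table.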

Step 1 — Root Entropy

$$\text{Entropy}(S) = -\frac{6}{10} \log_2 \frac{6}{10} - \frac{4}{10} \log_2 \frac{4}{10} = 0.442 + 0.529 = 0.971$$

Step 2 — Information Gain for Each Attribute

Outlook

Value	Records	Entropy
Sunny	D1, D2, D8, D9 → [No, No, No, Yes]	$-\frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4} = 0.811$
Overcast	D3, D7 → [Yes, Yes]	0 (pure)
Rain	D4, D5, D6, D10 → [Yes, No, Yes, Yes]	$-\frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4} = 0.811$

$$\text{Gain}(S, \text{Outlook}) = 0.971 - \left(\frac{4}{10} \times 0.811 + \frac{2}{10} \times 0 + \frac{4}{10} \times 0.811\right) = 0.971 - 0.649 = 0.322$$

Temp

Value	Records	Entropy
Hot	D1, D2 → [No, No]	0 (pure)
Mild	D3, D8, D10 → [Yes, No, Yes]	$-\frac{2}{3} \log_2 \frac{2}{3} - \frac{1}{3} \log_2 \frac{1}{3} = 0.918$
Cool	D4, D5, D6, D7, D9 → [Yes, No, Yes, Yes, Yes]	$-\frac{4}{5} \log_2 \frac{4}{5} - \frac{1}{5} \log_2 \frac{1}{5} = 0.722$

$$\text{Gain}(S, \text{Temp}) = 0.971 - \left(\frac{2}{10} \times 0 + \frac{3}{10} \times 0.918 + \frac{5}{10} \times 0.722\right) = 0.971 - 0.636 = 0.335$$

Humidity

Value	Records	Entropy
High	D1, D2, D3, D4, D8 → [No, No, Yes, Yes, No]	$-\frac{2}{5} \log_2 \frac{2}{5} - \frac{3}{5} \log_2 \frac{3}{5} = 0.971$
Normal	D5, D6, D7, D9, D10 → [No, Yes, Yes, Yes, Yes]	$-\frac{4}{5} \log_2 \frac{4}{5} - \frac{1}{5} \log_2 \frac{1}{5} = 0.722$

$$\text{Gain}(S, \text{Humidity}) = 0.971 - \left(\frac{5}{10} \times 0.971 + \frac{5}{10} \times 0.722\right) = 0.971 - 0.847 = 0.124$$

Wind

Value	Records	Entropy
Weak	D1, D3, D4, D5, D8, D9, D10 → [No, Yes, Yes, No, No, Yes, Yes]	$-\frac{4}{7} \log_2 \frac{4}{7} - \frac{3}{7} \log_2 \frac{3}{7} = 0.985$
Strong	D2, D6, D7 → [No, Yes, Yes]	$-\frac{2}{3} \log_2 \frac{2}{3} - \frac{1}{3} \log_2 \frac{1}{3} = 0.918$

$$\text{Gain}(S, \text{Wind}) = 0.971 - \left(\frac{7}{10} \times 0.985 + \frac{3}{10} \times 0.918\right) = 0.971 - 0.965 = 0.006$$

Summary of Gains:

Attribute	Gain
Outlook	0.322
Temp	0.335 ← Max
Humidity	0.124
Wind	0.006

✅ Temp has the highest gain → Root Node = Temp
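The four gains can be re-checked with a short script. This is a sketch under stated assumptions: the rows are typed in from the Model 1 table (with the labels exactly as shown there), and the helper names `entropy` and `gain` are illustrative. The script works at full precision, so a value can differ from the hand-rounded figure by about 0.001.

```python
from collections import Counter, defaultdict
from math import log2

COLUMNS = ("Outlook", "Temp", "Humidity", "Wind", "Play")
# Model 1 sample (D1..D10), labels as shown in the Model 1 table.
ROWS = [dict(zip(COLUMNS, line.split())) for line in """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Mild High Weak Yes
Rain Cool High Weak Yes
Rain Cool Normal Weak No
Rain Cool Normal Strong Yes
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
""".strip().splitlines()]

def entropy(rows):
    counts = Counter(r["Play"] for r in rows)
    return sum(-(n / len(rows)) * log2(n / len(rows)) for n in counts.values())

def gain(rows, attr):
    # Group the rows by the value they take for this attribute.
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder

gains = {a: gain(ROWS, a) for a in COLUMNS[:-1]}
best = max(gains, key=gains.get)

for attr, g in gains.items():
    print(f"{attr:<9}{g:.3f}")
print("root:", best)   # root: Temp
```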

Model 1 – Step 1: Initial ID3 decision tree setup for Random Forest Model 1 showing root node selection. Temp is selected as the root attribute with the highest information gain (0.335), branching into Hot, Mild, and Cool.

Expanding the Temp = Hot Branch

Records: D1 [No], D2 [No] → Pure leaf: No

Model 1 – Step 2: Expansion of the Temp = Hot branch in Model 1. The Hot branch becomes a pure leaf node classified as “No,” while Mild and Cool remain unexpanded.

Expanding the Temp = Mild Branch (S1)

Records: D3 [Overcast, Yes], D8 [Sunny, No], D10 [Rain, Yes]

$$\text{Entropy}(S_1) = -\frac{2}{3} \log_2 \frac{2}{3} - \frac{1}{3} \log_2 \frac{1}{3} = 0.918$$

Outlook on S1

Value	Records	Entropy
Overcast	[Yes]	0 (pure)
Sunny	[No]	0 (pure)
Rain	[Yes]	0 (pure)

$$\text{Gain}(S_1, \text{Outlook}) = 0.918 - \left(\frac{1}{3} \times 0 + \frac{1}{3} \times 0 + \frac{1}{3} \times 0\right) = 0.918$$

Humidity on S1

Value	Records	Entropy
High	D3, D8 → [Yes, No]	1.0
Normal	D10 → [Yes]	0

$$\text{Gain}(S_1, \text{Humidity}) = 0.918 - \left(\frac{2}{3} \times 1.0 + \frac{1}{3} \times 0\right) = 0.918 - 0.667 = 0.251$$

Wind on S1

All records have Wind = Weak → only one value, no split possible.

$$\text{Gain}(S_1, \text{Wind}) = 0$$

✅ Outlook has the highest gain (0.918) → Split Temp=Mild on Outlook

Overcast → Yes
Sunny → No
Rain → Yes

Model 1 – Step 3: Expansion of the Temp = Mild branch in Model 1. Outlook is selected as the best splitting attribute (gain = 0.918), creating Overcast → Yes, Sunny → No, and Rain → Yes branches.

Expanding the Temp = Cool Branch (S2)

Records: D4 [Rain, High, Weak, Yes], D5 [Rain, Normal, Weak, No], D6 [Rain, Normal, Strong, Yes], D7 [Overcast, Normal, Strong, Yes], D9 [Sunny, Normal, Weak, Yes]

$$\text{Entropy}(S_2) = -\frac{4}{5} \log_2 \frac{4}{5} - \frac{1}{5} \log_2 \frac{1}{5} = 0.722$$

Outlook on S2

Value	Records	Entropy
Overcast	D7 → [Yes]	0 (pure)
Sunny	D9 → [Yes]	0 (pure)
Rain	D4, D5, D6 → [Yes, No, Yes]	$-\frac{2}{3} \log_2 \frac{2}{3} - \frac{1}{3} \log_2 \frac{1}{3} = 0.918$

$$\text{Gain}(S_2, \text{Outlook}) = 0.722 - \left(\frac{1}{5} \times 0 + \frac{1}{5} \times 0 + \frac{3}{5} \times 0.918\right) = 0.722 - 0.551 = 0.171$$

Humidity on S2

Value	Records	Entropy
High	D4 → [Yes]	0 (pure)
Normal	D5, D6, D7, D9 → [No, Yes, Yes, Yes]	$-\frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4} = 0.811$

$$\text{Gain}(S_2, \text{Humidity}) = 0.722 - \left(\frac{1}{5} \times 0 + \frac{4}{5} \times 0.811\right) = 0.722 - 0.649 = 0.073$$

Wind on S2

Value	Records	Entropy
Weak	D4, D5, D9 → [Yes, No, Yes]	0.918
Strong	D6, D7 → [Yes, Yes]	0 (pure)

$$\text{Gain}(S_2, \text{Wind}) = 0.722 - \left(\frac{3}{5} \times 0.918 + \frac{2}{5} \times 0\right) = 0.722 - 0.551 = 0.171$$

Summary for S2:

Attribute	Gain
Outlook	0.171 ← tied max
Humidity	0.073
Wind	0.171 ← tied max

Outlook and Wind tie at 0.171. Any consistent tie-breaking rule works; here we pick Outlook, which comes first alphabetically.
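One way to make such a tie-break explicit in code is sketched below. The helper `pick_attribute` and the rounding to three decimals (to absorb floating-point noise) are assumptions for this illustration, not a rule that ID3 itself defines.

```python
def pick_attribute(gains, order):
    """Return the highest-gain attribute; ties go to the earliest in `order`."""
    best = max(round(g, 3) for g in gains.values())
    return next(a for a in order if round(gains[a], 3) == best)

# Gains for the Temp = Cool subset, as computed by hand.
s2_gains = {"Outlook": 0.171, "Humidity": 0.073, "Wind": 0.171}

print(pick_attribute(s2_gains, order=sorted(s2_gains)))   # alphabetical -> Outlook
```

Passing a different `order` changes which of the tied attributes wins, which is one reason two implementations can grow different trees from the same sample.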

✅ Split Temp=Cool on Outlook:

Overcast → Yes (pure)
Sunny → Yes (pure)
Rain → [D4: Yes, D5: No, D6: Yes] — needs further split

Further split: Temp=Cool, Outlook=Rain

Records: D4 [High, Weak, Yes], D5 [Normal, Weak, No], D6 [Normal, Strong, Yes]

Entropy = 0.918

Wind:

Weak	D4, D5 → [Yes, No]	E = 1.0
Strong	D6 → [Yes]	E = 0

$$\text{Gain}(\text{Wind}) = 0.918 - \left(\frac{2}{3} \times 1.0 + \frac{1}{3} \times 0\right) = 0.918 - 0.667 = 0.251$$

Humidity:

High	D4 → [Yes]	E = 0
Normal	D5, D6 → [No, Yes]	E = 1.0

$$\text{Gain}(\text{Humidity}) = 0.918 - \left(\frac{1}{3} \times 0 + \frac{2}{3} \times 1.0\right) = 0.918 - 0.667 = 0.251$$

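Both tie at 0.251. This time we pick Wind (the choice is again arbitrary):

Strong → Yes
Weak → [D4: Yes, D5: No] → still impure, and a 1–1 split has no true majority; this walkthrough labels the leaf Yes. ID3 could instead split once more on the unused attribute, Humidity (High → Yes, Normal → No).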

Model 1 Classification

Unseen point: Outlook=Overcast, Temp=Mild, Humidity=Normal, Wind=Weak

Root: Temp = Mild → go to Mild branch
Mild → Outlook = Overcast → Yes

Model 1 Prediction: ✅ Yes
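The finished tree and the walk from root to leaf can be written down as a small sketch. The nested-dictionary layout and the name `classify` are choices made for this example; the branches themselves are the ones derived above.

```python
# Model 1 as nested dicts: {attribute: {value: subtree or leaf label}}.
MODEL_1 = {"Temp": {
    "Hot": "No",
    "Mild": {"Outlook": {"Overcast": "Yes", "Sunny": "No", "Rain": "Yes"}},
    "Cool": {"Outlook": {
        "Overcast": "Yes",
        "Sunny": "Yes",
        "Rain": {"Wind": {"Strong": "Yes", "Weak": "Yes"}},
    }},
}}

def classify(tree, record):
    """Walk from the root to a leaf, following the record's attribute values."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[record[attribute]]
    return tree

unseen = {"Outlook": "Overcast", "Temp": "Mild", "Humidity": "Normal", "Wind": "Weak"}
print(classify(MODEL_1, unseen))   # Yes
```

In this sketch, a value that a branch never saw during training (for example Temp = Hot with a further split) would raise a `KeyError`; a fuller implementation would need a fallback such as the majority class at that node.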

Model 1 – Step 4: Final expansion of the Temp = Cool branch in Model 1. Outlook is selected as the next split, and the Rain subset further splits on Wind into Strong → Yes and Weak → Yes (a 1–1 tie labelled Yes).

Model 2 — Bootstrap Sample (D3–D12)

Day	Outlook	Temp	Humidity	Wind	Can Play
D3	Overcast	Mild	High	Weak	Yes
D4	Rain	Cool	High	Weak	Yes
D5	Rain	Cool	Normal	Weak	Yes
D6	Rain	Cool	Normal	Strong	No
D7	Overcast	Cool	Normal	Strong	Yes
D8	Sunny	Mild	High	Weak	No
D9	Sunny	Cool	Normal	Weak	Yes
D10	Rain	Mild	Normal	Weak	Yes
D11	Sunny	Mild	Normal	Strong	Yes
D12	Overcast	Mild	High	Strong	Yes

8 Yes, 2 No (10 records)

$$\text{Entropy}(S) = -\frac{8}{10} \log_2 \frac{8}{10} - \frac{2}{10} \log_2 \frac{2}{10} = 0.258 + 0.464 = 0.722$$

Information Gain for Each Attribute

Outlook

Value	Records	Entropy
Sunny	D8, D9, D11 → [No, Yes, Yes]	0.918
Overcast	D3, D7, D12 → [Yes, Yes, Yes]	0 (pure)
Rain	D4, D5, D6, D10 → [Yes, Yes, No, Yes]	0.811

$$\text{Gain}(S, \text{Outlook}) = 0.722 - \left(\frac{3}{10} \times 0.918 + \frac{3}{10} \times 0 + \frac{4}{10} \times 0.811\right) = 0.722 - 0.600 = 0.122$$

Temp

Value	Records	Entropy
Mild	D3, D8, D10, D11, D12 → [Yes, No, Yes, Yes, Yes]	$-\frac{4}{5} \log_2 \frac{4}{5} - \frac{1}{5} \log_2 \frac{1}{5} = 0.722$
Cool	D4, D5, D6, D7, D9 → [Yes, Yes, No, Yes, Yes]	$-\frac{4}{5} \log_2 \frac{4}{5} - \frac{1}{5} \log_2 \frac{1}{5} = 0.722$

$$\text{Gain}(S, \text{Temp}) = 0.722 - \left(\frac{5}{10} \times 0.722 + \frac{5}{10} \times 0.722\right) = 0.722 - 0.722 = 0.000$$

Humidity

Value	Records	Entropy
High	D3, D4, D8, D12 → [Yes, Yes, No, Yes]	0.811
Normal	D5, D6, D7, D9, D10, D11 → [Yes, No, Yes, Yes, Yes, Yes]	$-\frac{5}{6} \log_2 \frac{5}{6} - \frac{1}{6} \log_2 \frac{1}{6} = 0.650$

$$\text{Gain}(S, \text{Humidity}) = 0.722 - \left(\frac{4}{10} \times 0.811 + \frac{6}{10} \times 0.650\right) = 0.722 - 0.714 = 0.008$$

Wind

Value	Records	Entropy
Weak	D3, D4, D5, D8, D9, D10 → [Yes, Yes, Yes, No, Yes, Yes]	$-\frac{5}{6} \log_2 \frac{5}{6} - \frac{1}{6} \log_2 \frac{1}{6} = 0.650$
Strong	D6, D7, D11, D12 → [No, Yes, Yes, Yes]	0.811

$$\text{Gain}(S, \text{Wind}) = 0.722 - \left(\frac{6}{10} \times 0.650 + \frac{4}{10} \times 0.811\right) = 0.722 - 0.714 = 0.008$$

Summary of Gains (Model 2):

Attribute	Gain
Outlook	0.122 ← Max
Temp	0.000
Humidity	0.008
Wind	0.008

✅ Outlook has the highest gain → Root Node = Outlook

Model 2 – Step 1: Initial ID3 tree construction for Random Forest Model 2 showing root node selection. Outlook is chosen as the root attribute with the highest information gain (0.122), branching into Overcast, Sunny, and Rain.

Outlook = Overcast → Pure Yes

All 3 records are Yes → Leaf: Yes

Model 2 – Step 2: Expansion of the Outlook = Overcast branch in Model 2. Since all records are positive, the branch becomes a pure “Yes” leaf node.

Outlook = Sunny

Records: D8 [Mild, High, Weak, No], D9 [Cool, Normal, Weak, Yes], D11 [Mild, Normal, Strong, Yes]

Entropy = 0.918

Humidity:

High	D8 → [No]	E = 0
Normal	D9, D11 → [Yes, Yes]	E = 0

$$\text{Gain}(\text{Humidity}) = 0.918 - 0 = 0.918$$

Split on Humidity:

High → No
Normal → Yes

Model 2 – Step 3: Expansion of the Outlook = Sunny branch in Model 2. Humidity is selected as the best splitting attribute (gain = 0.918), producing High → No and Normal → Yes leaf nodes.

Outlook = Rain

Records: D4 [Cool, High, Weak, Yes], D5 [Cool, Normal, Weak, Yes], D6 [Cool, Normal, Strong, No], D10 [Mild, Normal, Weak, Yes]

Entropy = $-\frac{3}{4} \log_2 \frac{3}{4} - \frac{1}{4} \log_2 \frac{1}{4} = 0.811$

Wind:

Weak	D4, D5, D10 → [Yes, Yes, Yes]	E = 0
Strong	D6 → [No]	E = 0

$$\text{Gain}(\text{Wind}) = 0.811 - 0 = 0.811$$

Split on Wind:

Weak → Yes
Strong → No

Model 2 Classification

Unseen point: Outlook=Overcast, Temp=Mild, Humidity=Normal, Wind=Weak

Root: Outlook = Overcast → Yes

Model 2 Prediction: ✅ Yes
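The whole procedure for one tree (compute gains, split on the best attribute, recurse until the subset is pure) can be sketched as a short recursive function. This is an illustrative implementation, not a reference one: the names `id3`, `gain`, and `entropy` are assumptions, ties go to the first attribute in the list, and the rows are typed in from the Model 2 table.

```python
from collections import Counter
from math import log2

COLUMNS = ("Outlook", "Temp", "Humidity", "Wind", "Play")
# Model 2 sample (D3..D12), as tabulated above.
ROWS = [dict(zip(COLUMNS, line.split())) for line in """
Overcast Mild High Weak Yes
Rain Cool High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
""".strip().splitlines()]

def entropy(rows):
    counts = Counter(r["Play"] for r in rows)
    return sum(-(n / len(rows)) * log2(n / len(rows)) for n in counts.values())

def gain(rows, attr):
    values = {r[attr] for r in rows}
    subsets = [[r for r in rows if r[attr] == v] for v in values]
    return entropy(rows) - sum(len(s) / len(rows) * entropy(s) for s in subsets)

def id3(rows, attrs):
    labels = Counter(r["Play"] for r in rows)
    # Stop on a pure subset, or when no attribute is left to split on.
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))   # ties: first in attrs
    rest = [a for a in attrs if a != best]
    return {best: {v: id3([r for r in rows if r[best] == v], rest)
                   for v in sorted({r[best] for r in rows})}}

tree = id3(ROWS, ["Outlook", "Temp", "Humidity", "Wind"])
print(tree)
```

On this sample the sketch arrives at the same structure as the hand derivation: Outlook at the root, Humidity under Sunny, and Wind under Rain.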

Model 2 – Step 4: Final expansion of the Outlook = Rain branch in Model 2. Wind is selected as the splitting attribute (gain = 0.811), producing Weak → Yes and Strong → No classifications.

Model 3 — Bootstrap Sample (D5–D14)

Day	Outlook	Temp	Humidity	Wind	Can Play
D5	Rain	Cool	Normal	Weak	Yes
D6	Rain	Cool	Normal	Strong	No
D7	Overcast	Cool	Normal	Strong	Yes
D8	Sunny	Mild	High	Weak	No
D9	Sunny	Cool	Normal	Weak	Yes
D10	Rain	Mild	Normal	Weak	Yes
D11	Sunny	Mild	Normal	Strong	Yes
D12	Overcast	Mild	High	Strong	Yes
D13	Overcast	Hot	Normal	Weak	Yes
D14	Rain	Mild	High	Strong	No

7 Yes, 3 No (10 records)

$$\text{Entropy}(S) = -\frac{7}{10} \log_2 \frac{7}{10} - \frac{3}{10} \log_2 \frac{3}{10} = 0.881$$

Information Gain for Each Attribute

Outlook

Value	Records	Entropy
Sunny	D8, D9, D11 → [No, Yes, Yes]	0.918
Overcast	D7, D12, D13 → [Yes, Yes, Yes]	0 (pure)
Rain	D5, D6, D10, D14 → [Yes, No, Yes, No]	1.0

$$\text{Gain}(S, \text{Outlook}) = 0.881 - \left(\frac{3}{10} \times 0.918 + \frac{3}{10} \times 0 + \frac{4}{10} \times 1.0\right) = 0.881 - 0.675 = 0.206$$

Temp

Value	Records	Entropy
Hot	D13 → [Yes]	0
Mild	D8, D10, D11, D12, D14 → [No, Yes, Yes, Yes, No]	$-\frac{3}{5} \log_2 \frac{3}{5} - \frac{2}{5} \log_2 \frac{2}{5} = 0.971$
Cool	D5, D6, D7, D9 → [Yes, No, Yes, Yes]	0.811

$$\text{Gain}(S, \text{Temp}) = 0.881 - \left(\frac{1}{10} \times 0 + \frac{5}{10} \times 0.971 + \frac{4}{10} \times 0.811\right) = 0.881 - 0.810 = 0.071$$

Humidity

Value	Records	Entropy
High	D8, D12, D14 → [No, Yes, No]	0.918
Normal	D5, D6, D7, D9, D10, D11, D13 → [Yes, No, Yes, Yes, Yes, Yes, Yes]	$-\frac{6}{7} \log_2 \frac{6}{7} - \frac{1}{7} \log_2 \frac{1}{7} = 0.592$

$$\text{Gain}(S, \text{Humidity}) = 0.881 - \left(\frac{3}{10} \times 0.918 + \frac{7}{10} \times 0.592\right) = 0.881 - 0.690 = 0.191$$

Wind

Value	Records	Entropy
Weak	D5, D8, D9, D10, D13 → [Yes, No, Yes, Yes, Yes]	0.722
Strong	D6, D7, D11, D12, D14 → [No, Yes, Yes, Yes, No]	0.971

$$\text{Gain}(S, \text{Wind}) = 0.881 - \left(\frac{5}{10} \times 0.722 + \frac{5}{10} \times 0.971\right) = 0.881 - 0.847 = 0.034$$

Summary of Gains (Model 3):

Attribute	Gain
Outlook	0.206 ← Max
Temp	0.071
Humidity	0.191
Wind	0.034

✅ Outlook has the highest gain → Root Node = Outlook

Model 3 – Step 1: Initial ID3 tree construction for Random Forest Model 3 showing root node selection. Outlook is selected as the root node with the highest information gain (0.206).

Outlook = Overcast

Records: D7, D12, D13 → all Yes → Leaf: Yes

Outlook = Sunny

Records: D8 [Mild, High, Weak, No], D9 [Cool, Normal, Weak, Yes], D11 [Mild, Normal, Strong, Yes]

Entropy = 0.918

Humidity splits perfectly (same as Model 2):

High → No
Normal → Yes

Model 3 – Step 3: Expansion of the Outlook = Sunny branch in Model 3. Humidity is selected as the best splitting attribute, producing High → No and Normal → Yes outcomes.

Outlook = Rain

Records: D5 [Cool, Normal, Weak, Yes], D6 [Cool, Normal, Strong, No], D10 [Mild, Normal, Weak, Yes], D14 [Mild, High, Strong, No]

Entropy = 1.0 (2 Yes, 2 No)

Wind:

Weak	D5, D10 → [Yes, Yes]	E = 0
Strong	D6, D14 → [No, No]	E = 0

$$\text{Gain}(\text{Wind}) = 1.0 - 0 = 1.0$$

Split on Wind:

Weak → Yes
Strong → No

Model 3 Classification

Unseen point: Outlook=Overcast, Temp=Mild, Humidity=Normal, Wind=Weak

Root: Outlook = Overcast → Yes

Model 3 Prediction: ✅ Yes

Model 3 – Step 4: Final expansion of the Outlook = Rain branch in Model 3. Wind is selected as the optimal split with perfect information gain (1.000), producing Weak → Yes and Strong → No leaf nodes.

Final Prediction — Majority Vote

Model	Prediction
Model 1 (Temp root)	Yes
Model 2 (Outlook root)	Yes
Model 3 (Outlook root)	Yes

Yes: 3 votes, No: 0 votes

✅ Final Answer: The unseen data point (Overcast, Mild, Normal, Weak) is classified as Yes — the person can play.
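The voting step itself is small enough to sketch in a few lines. The helper name `majority_vote` is an assumption for this example; the three votes are the predictions obtained above.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label and the full vote tally."""
    tally = Counter(predictions)
    winner, _ = tally.most_common(1)[0]
    return winner, dict(tally)

votes = {"Model 1": "Yes", "Model 2": "Yes", "Model 3": "Yes"}
print(majority_vote(votes.values()))   # ('Yes', {'Yes': 3})
```

With two classes, using an odd number of trees guarantees that the vote cannot end in a tie.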

Summary — Key Takeaways

Concept	Explanation
Bootstrap sampling	Each tree trains on a random subset (with replacement) of the data
ID3 algorithm	Uses entropy and information gain to pick the best splitting attribute
Root node selection	The attribute with the highest information gain becomes the split
Pure leaf	When all records in a subset belong to one class, stop splitting
Majority vote	Final prediction = class chosen by most trees
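Ensemble benefit	Errors of individual trees tend to cancel out in the vote, which usually improves accuracy over a single tree

Random Forest is powerful because its trees are trained on different slices of the data and therefore tend to grow differently: here, Model 1 chose Temp as its root while Models 2 and 3 chose Outlook. When the trees disagree, their individual errors tend to cancel out in the vote, so the ensemble is usually more reliable than any single tree.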