Decision Tree Algorithm
✅ Steps to Construct a Decision Tree
1. Place the best feature (attribute) of the dataset at the root of the tree.
2. Split the training set into subsets, so that each subset contains examples with the same value for that feature.
3. Repeat steps 1 and 2 recursively for each subset until:
   - all branches lead to leaf nodes, and
   - each leaf node carries a class label (the decision).
🧠 Popular Decision Tree Algorithms
ID3 (Iterative Dichotomiser 3) – Developed by Ross Quinlan
C4.5 – Successor to ID3, also developed by Ross Quinlan
CART (Classification and Regression Trees)
OneR (One Rule Algorithm) – Developed by Robert Holte
📌 The ID3 Algorithm (based on Information Gain)
Used here for binary classification, where:
The examples carry one of two class labels, "+" (positive) and "−" (negative)
🔤 Notation
Symbol | Description |
|---|---|
S | Set of examples (training data) |
C | Set of class labels {+, −} |
F | Set of features (attributes) |
A | A feature in F |
V(A) | Set of possible values of feature A |
v | A single value from V(A) |
Sᵥ | Subset of S where A = v |
🔁 ID3 Algorithm – Steps
1. Create a root node for the tree.
2. If all examples in S are positive, return a leaf node labeled "+".
3. If all examples in S are negative, return a leaf node labeled "−".
4. If there are no features left, return a leaf node labeled with the most common class in S.
5. Otherwise:
   - Choose the feature A with the highest Information Gain.
   - Assign A to the root node.
   - For each value v in V(A):
     - Create a branch below the root labeled A = v.
     - If Sᵥ is empty, add a leaf node labeled with the most common class in S.
     - Otherwise, recursively apply the algorithm to Sᵥ with the features F \ {A}.
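The steps above can be sketched in Python. This is a minimal illustration under my own naming (`entropy`, `information_gain`, `id3` are not from any particular library); note that branches are only created for values that actually occur in S, so the empty-Sᵥ case does not arise in this sketch:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, feature):
    """Entropy(S) minus the weighted entropy of the subsets S_v."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for v in set(x[feature] for x, _ in examples):
        sub = [label for x, label in examples if x[feature] == v]
        remainder += len(sub) / len(examples) * entropy(sub)
    return entropy(labels) - remainder

def id3(examples, features):
    """examples: list of (feature-dict, label) pairs; features: list of names."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:        # all "+" or all "−": pure leaf
        return labels[0]
    if not features:                 # no features left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    # Choose the feature A with the highest information gain.
    best = max(features, key=lambda f: information_gain(examples, f))
    rest = [f for f in features if f != best]
    tree = {}
    for v in set(x[best] for x, _ in examples):
        subset = [(x, label) for x, label in examples if x[best] == v]
        tree[(best, v)] = id3(subset, rest)   # recurse on S_v with F \ {A}
    return tree
```

The returned tree is a nested dict keyed by `(feature, value)` branches, with class labels at the leaves.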
🧪 Training Dataset
Day | OUTLOOK | TEMP | HUMIDITY | WIND | PLAY TENNIS |
|---|---|---|---|---|---|
D1 | Sunny | Hot | High | Weak | No |
D2 | Sunny | Hot | High | Strong | No |
D3 | Overcast | Hot | High | Weak | Yes |
D4 | Rain | Mild | High | Weak | Yes |
D5 | Rain | Cool | Normal | Weak | Yes |
D6 | Rain | Cool | Normal | Strong | No |
D7 | Overcast | Cool | Normal | Strong | Yes |
D8 | Sunny | Mild | High | Weak | No |
D9 | Sunny | Cool | Normal | Weak | Yes |
D10 | Rain | Mild | Normal | Weak | Yes |
D11 | Sunny | Mild | Normal | Strong | Yes |
D12 | Overcast | Mild | High | Strong | Yes |
D13 | Overcast | Hot | Normal | Weak | Yes |
D14 | Rain | Mild | High | Strong | No |
🔧 Step 1: Calculate Initial Entropy of the Dataset
Total = 14 (Yes = 9, No = 5)

✅ Entropy(S) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) = 0.940
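The same calculation can be checked in a few lines of Python (the `entropy` helper is my own, not a library function):

```python
import math

def entropy(pos, neg):
    """Binary entropy from positive/negative class counts."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # skip empty classes: 0 * log2(0) is taken as 0
            p = count / total
            result -= p * math.log2(p)
    return result

# 9 Yes and 5 No examples in the PlayTennis dataset:
e = entropy(9, 5)  # ≈ 0.940
```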
🔍 Step 2: Calculate Information Gain for Each Attribute
Attribute 1: OUTLOOK (Sunny, Overcast, Rain)
Sunny: [No, No, No, Yes, Yes] → Entropy = 0.971
Overcast: [Yes, Yes, Yes, Yes] → Entropy = 0 ✅ Pure node
Rain: [Yes, Yes, Yes, No, No] → Entropy = 0.971

✅ Gain(Outlook) = 0.940 − (5/14)·0.971 − (4/14)·0 − (5/14)·0.971 = 0.2464
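The gain for OUTLOOK can be reproduced directly from the table. This sketch groups the PlayTennis labels by outlook value (helper names are my own); with full floating-point precision it gives ≈ 0.2467, matching the 0.2464 above, which uses rounded intermediate entropies:

```python
import math
from collections import Counter

# PlayTennis labels grouped by OUTLOOK value, taken from the dataset above:
outlook_groups = {
    "Sunny":    ["No", "No", "No", "Yes", "Yes"],
    "Overcast": ["Yes", "Yes", "Yes", "Yes"],
    "Rain":     ["Yes", "Yes", "Yes", "No", "No"],
}

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(groups):
    """Gain = Entropy(S) minus the weighted entropy of each value's subset."""
    all_labels = [label for g in groups.values() for label in g]
    n = len(all_labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(all_labels) - remainder

gain = info_gain(outlook_groups)  # ≈ 0.2467
```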
