Created: Jun 13, 2025

Decision Tree Algorithm


Steps to Construct a Decision Tree

  1. Place the best feature (attribute) of the dataset at the root of the tree.

  2. Split the training set into subsets, where each subset contains data with the same value for a feature.

  3. Repeat steps 1 and 2 recursively for each subset until:

    • All branches lead to leaf nodes.

    • Leaf nodes contain class labels (decisions).

Common Decision Tree Algorithms

  • ID3 (Iterative Dichotomiser 3) – Developed by Ross Quinlan

  • C4.5 – Successor to ID3, also developed by Ross Quinlan

  • CART (Classification and Regression Trees)

  • OneR (One Rule Algorithm) – Developed by Robert Holte


📌 The ID3 Algorithm (based on Information Gain)

Used when:

  • The target has two class labels, "+" (positive) and "−" (negative)

  • The features are categorical (discrete-valued)


🔤 Notation

  • S – Set of examples (training data)

  • C – Set of class labels {+, −}

  • F – Set of features (attributes)

  • A – A feature in F

  • V(A) – Set of possible values of feature A

  • v – A single value from V(A)

  • Sᵥ – Subset of S where A = v


🔁 ID3 Algorithm – Steps

  1. Create a root node for the tree.

  2. If all examples in S are positive, return a leaf node labeled "+".

  3. If all examples in S are negative, return a leaf node labeled "−".

  4. If there are no features left, return a leaf node with the most common class in S.

  5. Else:

    • Choose the feature A with the highest Information Gain.

    • Assign A to the root node.

    • For each value v in V(A):

      • Create a branch below the root labeled A = v.

      • If Sᵥ is empty, add a leaf node with the most common label in S.

      • Else, recursively apply the algorithm to Sᵥ with features F \ {A}.
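The steps above can be sketched in plain Python (a minimal sketch; the function and variable names are my own, examples are dicts mapping feature name to value, and the returned tree is a nested dict):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Entropy(S) minus the weighted entropies of the subsets S_v."""
    n = len(labels)
    gain = entropy(labels)
    for v in set(r[attr] for r in rows):
        sub = [l for r, l in zip(rows, labels) if r[attr] == v]
        gain -= len(sub) / n * entropy(sub)
    return gain

def id3(rows, labels, features):
    # Steps 2-3: all examples share one label -> leaf with that label
    if len(set(labels)) == 1:
        return labels[0]
    # Step 4: no features left -> leaf with the majority label
    if not features:
        return Counter(labels).most_common(1)[0][0]
    # Step 5: split on the feature with the highest information gain
    best = max(features, key=lambda a: info_gain(rows, labels, a))
    remaining = [f for f in features if f != best]
    node = {best: {}}
    # Only values observed in the data are branched on, so S_v is never empty here
    for v in set(r[best] for r in rows):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == v]
        node[best][v] = id3([r for r, _ in sub], [l for _, l in sub], remaining)
    return node
```

Run on the Play-Tennis table in the next section, this picks OUTLOOK as the first split, which is the classic ID3 result for that dataset.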


🧪 Training Dataset

| Day | OUTLOOK  | TEMP | HUMIDITY | WIND   | PLAY TENNIS |
|-----|----------|------|----------|--------|-------------|
| D1  | Sunny    | Hot  | High     | Weak   | No          |
| D2  | Sunny    | Hot  | High     | Strong | No          |
| D3  | Overcast | Hot  | High     | Weak   | Yes         |
| D4  | Rain     | Mild | High     | Weak   | Yes         |
| D5  | Rain     | Cool | Normal   | Weak   | Yes         |
| D6  | Rain     | Cool | Normal   | Strong | No          |
| D7  | Overcast | Cool | Normal   | Strong | Yes         |
| D8  | Sunny    | Mild | High     | Weak   | No          |
| D9  | Sunny    | Cool | Normal   | Weak   | Yes         |
| D10 | Rain     | Mild | Normal   | Weak   | Yes         |
| D11 | Sunny    | Mild | Normal   | Strong | Yes         |
| D12 | Overcast | Mild | High     | Strong | Yes         |
| D13 | Overcast | Hot  | Normal   | Weak   | Yes         |
| D14 | Rain     | Mild | High     | Strong | No          |
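For the worked steps that follow, the table can be encoded as Python data (a hypothetical encoding; the tuple order follows the column headers):

```python
# Play-Tennis training set: (OUTLOOK, TEMP, HUMIDITY, WIND, PLAY TENNIS)
DATASET = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Strong", "No"),
]

labels = [row[-1] for row in DATASET]
print(len(DATASET), labels.count("Yes"), labels.count("No"))  # 14 9 5
```

The counts confirm the class balance used in Step 1: 9 Yes and 5 No out of 14 examples.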


🔧 Step 1: Calculate Initial Entropy of the Dataset

  • Total = 14

  • Yes = 9, No = 5

Entropy(S) = −(9/14)·log₂(9/14) − (5/14)·log₂(5/14) = 0.940
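The 0.940 figure can be checked with a few lines of Python (the function name is my own):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a set with `pos` positive and `neg` negative examples."""
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

print(f"{entropy(9, 5):.3f}")  # 0.940
```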


🔍 Step 2: Calculate Information Gain for Each Attribute


Attribute 1: OUTLOOK (Sunny, Overcast, Rain)

Sunny: [No, No, No, Yes, Yes] → Entropy = 0.971

Overcast: [Yes, Yes, Yes, Yes] → Entropy = 0 (pure node)

Rain: [Yes, Yes, Yes, No, No] → Entropy = 0.971

Gain(Outlook) = 0.940 − (5/14)(0.971) − (4/14)(0) − (5/14)(0.971) = 0.2464
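The gain can be reproduced directly from the subsets above (a minimal sketch; variable names are my own). Exact arithmetic gives ≈ 0.2467; the 0.2464 above comes from rounding the entropies to three decimals before combining them.

```python
from math import log2

# PLAY TENNIS labels grouped by OUTLOOK value (from the table above)
subsets = {
    "Sunny":    ["No", "No", "No", "Yes", "Yes"],
    "Overcast": ["Yes", "Yes", "Yes", "Yes"],
    "Rain":     ["Yes", "Yes", "Yes", "No", "No"],
}

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * log2(labels.count(c) / n)
        for c in set(labels)
    )

all_labels = [l for sub in subsets.values() for l in sub]
gain = entropy(all_labels) - sum(
    len(sub) / len(all_labels) * entropy(sub) for sub in subsets.values()
)
print(f"{gain:.4f}")  # 0.2467
```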