Decision Tree

Apr 24, 2025

Updated 1 month ago


A decision tree is a tree-shaped diagram used to determine a course of action. Each branch of the tree represents a possible decision, occurrence, or reaction.

Example

Consider a situation where someone is searching for a job. At the beginning of the process, they decide to consider only those jobs offering a monthly salary of at least ₹50,000. Additionally, they dislike spending excessive time commuting and are only comfortable if the travel time to work is less than an hour. They also expect the company to provide free coffee every morning.

The decision-making process regarding whether to accept or reject a job offer can be schematically represented using a decision tree.

This is a figure representing a decision tree:

Decision tree example showing classification based on salary, commute time, and incentives leading to accept or decline decision
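The job-offer tree above can be sketched as a chain of checks. This is only an illustration: the function name, the parameter names, and the order of the checks (salary, then commute, then coffee) are assumptions, since the figure itself is not reproduced here.

```python
def decide_job_offer(monthly_salary_inr, commute_hours, free_coffee):
    """Walk a hypothetical version of the job-offer tree, one test per node."""
    # Root node: the salary must be at least 50,000 rupees per month.
    if monthly_salary_inr < 50_000:
        return "Decline"
    # Second node: the commute must take less than one hour.
    if commute_hours >= 1:
        return "Decline"
    # Third node: the company must offer free coffee every morning.
    if not free_coffee:
        return "Decline"
    # Leaf node: every condition was satisfied.
    return "Accept"


print(decide_job_offer(60_000, 0.5, True))
print(decide_job_offer(60_000, 1.5, True))
```

Each `if` corresponds to one internal node, and each `return` corresponds to a leaf.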

Structure of a Decision Tree

A decision tree is a tree in the graph-theoretic sense, where:

Leaf nodes (represented as ellipses) indicate final decisions or outcomes.
Internal nodes (all nodes except the leaves, including the root) represent intermediate decisions based on various conditions.

Types of Decision Trees

Classification Trees: These are tree models where the target variable can take a discrete set of values. In a classification tree:
- Leaves represent class labels.
- Branches represent conjunctions of features leading to specific class labels.
Regression Trees: These trees handle target variables that take continuous values (i.e., real numbers). Examples include:
- Predicting the price of a house.
- Estimating the duration of a phone call.

Classification Tree

We explain this with the help of an example.

Suppose we are given a dataset with features and class labels.

Advantages of Decision Tree

It is simple to understand, interpret, and visualize.
It requires little effort for data preparation.
It can handle both categorical and numerical data.
Non-linear relationships between features do not affect its performance.

Disadvantages of decision trees

Overfitting occurs when the algorithm captures noise in the data.
High variance: the model can become unstable, because even slight variations in the data can produce a very different tree.
Low bias: a highly complicated decision tree tends to have low bias, which makes it difficult for the model to work with new data.

Entropy

Entropy is a measure of impurity in a dataset. Sets with high entropy are very diverse and provide little information about other items that may also belong in the set, as there is no apparent commonality. Entropy is measured in bits. If there are only two possible classes, entropy values range from 0 to 1.
For $n$ classes, entropy ranges from 0 to $\log_{2} n$. In each case, the minimum value indicates that the sample is completely homogeneous, while the maximum value indicates that the data is as diverse as possible.

Definition

Consider a segment $S'$ of a dataset having $C$ class labels, and let $p_{i}$ be the proportion of the examples in $S'$ having the $i^{th}$ class label. Then the entropy of $S'$ is defined as:

$$\text{Entropy}(S') = -\sum_{i=1}^{C} p_{i} \log_{2}(p_{i})$$

🔍 Where:

$S'$ = the dataset (or subset of examples),
$C$ = number of classes (e.g., 2 for the classes "Up" and "Down"),
$p_{i}$ = proportion of examples in class $i$,
$\log_{2}$ = logarithm base 2.

Decision tree visualization showing probability curve or decision boundary used to evaluate splits in classification

In this expression for entropy, the value of $0 \times \log_{2}(0)$ is taken as $0$.

💡 Example:

If you have:

4 "Up"
6 "Down"

Then:

$$p_{Up} = \frac{4}{10}, \quad p_{Down} = \frac{6}{10}$$

$$\text{Entropy} = -\left(\frac{4}{10} \log_{2} \frac{4}{10} + \frac{6}{10} \log_{2} \frac{6}{10}\right) \approx 0.971$$

That’s how you get the uncertainty of a dataset 🔥
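As a quick check of the entropy formula, here is a minimal Python sketch that computes entropy from class counts. The helper name `entropy_from_counts` is an assumption made for illustration; it skips empty classes, following the convention that $0 \times \log_{2}(0)$ is taken as $0$.

```python
import math


def entropy_from_counts(counts):
    """Entropy in bits of a dataset described by its per-class counts."""
    total = sum(counts)
    result = 0.0
    for count in counts:
        if count == 0:
            continue  # 0 * log2(0) is taken as 0
        p = count / total
        result -= p * math.log2(p)
    return result


# 4 "Up" and 6 "Down" examples
print(round(entropy_from_counts([4, 6]), 3))  # 0.971
```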

Special Case

Let the data segment $S'$ have only two class labels, say "yes" and "no". If $p$ is the proportion of examples having class label "yes", then the proportion of examples having class label "no" is $(1 - p)$. In this case, the entropy of $S'$ is given by:

$$\text{Entropy}(S') = -p \log_{2}(p) - (1 - p) \log_{2}(1 - p)$$
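To get a feel for the two-class formula, the sketch below evaluates it at a few values of $p$. The function name `binary_entropy` is an assumption; the endpoints are handled separately so that $0 \times \log_{2}(0)$ is treated as $0$.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a two-class dataset where one class has proportion p."""
    if p in (0.0, 1.0):
        return 0.0  # a pure dataset has no uncertainty
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, round(binary_entropy(p), 3))
```

The values rise from 0 at $p = 0$ to a maximum of 1 at $p = 0.5$ and fall back to 0 at $p = 1$, which matches the stated range of 0 to 1 for two classes.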

💡 Example:

Let $i$ be some class label; we denote by $p_{i}$ the proportion of examples with class label $i$.

S.no	Name	Give Birth	Aquatic Animal	Aerial Animal	Has Legs	Class Label
1	Human	Yes	No	No	Yes	Mammal
2	Python	No	No	No	No	Reptile
3	Salmon	No	Yes	No	No	Fish
4	Frog	No	Semi	No	Yes	Amphibian
5	Bat	Yes	No	Yes	Yes	Bird
6	Pigeon	No	No	Yes	Yes	Bird
7	Cat	Yes	No	No	Yes	Mammal
8	Shark	Yes	Yes	No	No	Fish

Let $S'$ be the data in the given table. The class labels and their proportions are:

Class Label	Count	$p_{i}$
Mammal	2	2/8
Reptile	1	1/8
Fish	2	2/8
Amphibian	1	1/8
Bird	2	2/8

$$\text{Entropy}(S') = -\sum_{\text{all classes } i} p_{i} \log_{2}(p_{i})$$

$$= -\frac{2}{8} \log_{2} \frac{2}{8} - \frac{1}{8} \log_{2} \frac{1}{8} - \frac{2}{8} \log_{2} \frac{2}{8} - \frac{1}{8} \log_{2} \frac{1}{8} - \frac{2}{8} \log_{2} \frac{2}{8}$$

$$= 0.5 + 0.375 + 0.5 + 0.375 + 0.5$$

$$= 2.25$$
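The same calculation can be reproduced from the class-label column of the table. This is a minimal sketch; the list `labels` copies the Class Label column row by row, and the function name `entropy` is an assumption.

```python
import math
from collections import Counter

# Class Label column of the animal table, rows 1 to 8
labels = ["Mammal", "Reptile", "Fish", "Amphibian", "Bird", "Bird", "Mammal", "Fish"]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    # Counter only reports classes that occur, so log2(0) never arises.
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


print(entropy(labels))  # 2.25
```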

Information Gain:

Let $S'$ be a set of examples, $A$ be a feature (or attribute), $S_{v}$ be the subset of $S'$ with $A = v$, and $Values(A)$ be the set of all possible values of $A$. Then the information gain of attribute $A$ relative to the set $S'$, denoted by $Gain(S', A)$, is defined as:

📘 Gain Formula:

$$\text{Gain}(S', A) = \text{Entropy}(S') - \sum_{v \in Values(A)} \frac{|S_{v}|}{|S'|} \cdot \text{Entropy}(S_{v})$$

🔍 Where:

$S'$ = full dataset
$A$ = attribute (like Age, Type, etc.)
$Values(A)$ = all possible values of attribute $A$
$S_{v}$ = subset of $S'$ where attribute $A$ has value $v$
$|S_{v}|$ = size of that subset
$|S'|$ = size of total dataset
$\text{Entropy}(S_{v})$ = entropy of that subset

Computation of $\text{Gain}(S', \text{Give Birth})$

🔷 Attribute: Give Birth (Yes, No)

$\text{Entropy}(Yes)$

Yes: $S_{yes}$ = [Mammal, Mammal, Fish, Bird]

$$E = -\frac{2}{4} \log_{2} \frac{2}{4} - \frac{1}{4} \log_{2} \frac{1}{4} - \frac{1}{4} \log_{2} \frac{1}{4}$$

$$E = 0.5 + 0.5 + 0.5$$

$$E = 1.5$$

$\text{Entropy}(No)$

No: $S_{no}$ = [Reptile, Fish, Amphibian, Bird]

$$E = -\frac{1}{4} \log_{2} \frac{1}{4} - \frac{1}{4} \log_{2} \frac{1}{4} - \frac{1}{4} \log_{2} \frac{1}{4} - \frac{1}{4} \log_{2} \frac{1}{4}$$

$$E = 0.5 + 0.5 + 0.5 + 0.5$$

$$E = 2$$

$\text{Gain}(S', \text{Give Birth})$

$$= \text{Entropy}(S') - \frac{|S_{yes}|}{|S'|} \cdot \text{Entropy}(S_{yes}) - \frac{|S_{no}|}{|S'|} \cdot \text{Entropy}(S_{no})$$

$$= 2.25 - \left(\frac{4}{8} \cdot 1.5 + \frac{4}{8} \cdot 2\right)$$

$$= 2.25 - 1.75 = 0.5$$
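The gain computation can be checked with a short sketch. The list `rows` pairs the Give Birth and Class Label columns of the animal table; the function names `entropy` and `information_gain` are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# (Give Birth, Class Label) for rows 1 to 8 of the animal table
rows = [
    ("Yes", "Mammal"), ("No", "Reptile"), ("No", "Fish"), ("No", "Amphibian"),
    ("Yes", "Bird"), ("No", "Bird"), ("Yes", "Mammal"), ("Yes", "Fish"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(pairs):
    """Gain of splitting (attribute value, class label) pairs on the attribute."""
    groups = defaultdict(list)
    for value, label in pairs:
        groups[value].append(label)
    # Weighted average entropy of the subsets S_v
    remainder = sum(len(g) / len(pairs) * entropy(g) for g in groups.values())
    return entropy([label for _, label in pairs]) - remainder


print(information_gain(rows))  # 0.5
```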

Gini Indices

The Gini split index of a dataset is another feature selection measure used in the construction of classification trees. This measure is used in the CART algorithm.

Consider a dataset $S'$ having $\tau$ class labels $C_{1}, C_{2}, C_{3}, \ldots, C_{\tau}$, and let $p_{i}$ be the proportion of examples having the class label $C_{i}$. The Gini index of the dataset $S'$, denoted by $Gini(S')$, is defined by:

📘 Gini Formula:

$$\text{Gini}(S') = 1 - \sum_{i=1}^{\tau} p_{i}^{2}$$
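As an illustration of the Gini formula, the sketch below applies it to the class labels of the animal table used in the entropy example. The function name `gini` is an assumption, and the source does not work this value out by hand, so treat the printed number as a check of the formula rather than a result quoted from the text.

```python
from collections import Counter

# Class Label column of the animal table, rows 1 to 8
labels = ["Mammal", "Reptile", "Fish", "Amphibian", "Bird", "Bird", "Mammal", "Fish"]


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


print(gini(labels))          # 0.78125
print(gini(["Bird", "Bird"]))  # 0.0 for a pure set
```

Like entropy, the Gini index is 0 for a pure set and grows as the class proportions become more even.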

Construct the decision tree using the ID3 algorithm for the given dataset.

S.no	Age	Competition	Type	Profit (Class)
1	Old	Yes	Soft	Down
2	Old	No	Soft	Down
3	Old	No	Hard	Down
4	Mid	Yes	Soft	Down
5	Mid	Yes	Hard	Down
6	Mid	No	Hard	Up
7	Mid	No	Soft	Up
8	New	Yes	Soft	Up
9	New	No	Hard	Up
10	New	No	Soft	Up

🔧 Step 1: Initial Entropy of Dataset

$$\text{Entropy}(S') = -p_{Up} \log_{2}(p_{Up}) - p_{Down} \log_{2}(p_{Down})$$

Where:

Total = 10
Down = 5 (rows 1 to 5)
Up = 5 (rows 6 to 10)

$p_{Up} = \frac{\text{count of Up}}{\text{total}} = \frac{5}{10}$
$p_{Down} = \frac{\text{count of Down}}{\text{total}} = \frac{5}{10}$

So:

$$\text{Entropy}(S') = -\frac{5}{10} \log_{2}\left(\frac{5}{10}\right) - \frac{5}{10} \log_{2}\left(\frac{5}{10}\right)$$

$$= -\frac{5}{10} \log_{2}\left(\frac{1}{2}\right) - \frac{5}{10} \log_{2}\left(\frac{1}{2}\right)$$

$$= -\frac{5}{10} \log_{2}\left(2^{-1}\right) - \frac{5}{10} \log_{2}\left(2^{-1}\right)$$

$$= \frac{5}{10} \log_{2}(2) + \frac{5}{10} \log_{2}(2)$$

$$= \frac{5}{10} \cdot 1 + \frac{5}{10} \cdot 1$$

$$= 0.5 + 0.5 = 1$$

The simplification uses the identities:

$\log_{a} b^{c} = c \log_{a} b$
$\log_{a} a = 1$

✅ Entropy($S'$) = 1

Step 2: Calculate Information Gain for all attributes

Attribute 1: Age (Old, Mid, New)

$\text{Entropy}(Old)$

Old: $S$ = [Down, Down, Down]

$$E = -\frac{3}{3} \log_{2} \frac{3}{3}$$

$$E = 0$$

$\text{Entropy}(Mid)$

Mid: $S$ = [Down, Down, Up, Up]

$$E = -\frac{2}{4} \log_{2} \frac{2}{4} - \frac{2}{4} \log_{2} \frac{2}{4}$$

$$E = -0.5 \log_{2} 0.5 - 0.5 \log_{2} 0.5$$

$$E = 1$$

$\text{Entropy}(New)$

New: $S$ = [Up, Up, Up]

$$E = -\frac{3}{3} \log_{2} \frac{3}{3}$$

$$E = 0$$

$\text{Gain}(S', Age)$

$$= \text{Entropy}(S') - \frac{|S_{old}|}{|S'|} \cdot \text{Entropy}(S_{old}) - \frac{|S_{mid}|}{|S'|} \cdot \text{Entropy}(S_{mid}) - \frac{|S_{new}|}{|S'|} \cdot \text{Entropy}(S_{new})$$

$$= 1 - \left(\frac{3}{10} \cdot 0 + \frac{4}{10} \cdot 1 + \frac{3}{10} \cdot 0\right)$$

$$= 1 - 0.4 = 0.6$$

✅ Gain(Age) = 0.6

Attribute 2: Competition (Yes, No)

$\text{Entropy}(S'_{Yes})$

Yes: $S$ = [Down, Down, Down, Up]

$$E = -\frac{3}{4} \log_{2} \frac{3}{4} - \frac{1}{4} \log_{2} \frac{1}{4}$$

$$E = -0.75 \cdot \log_{2} 0.75 - 0.25 \cdot \log_{2} 0.25$$

$$E \approx 0.811$$

$\text{Entropy}(S'_{No})$

No: $S$ = [Down, Down, Up, Up, Up, Up]

$$E = -\frac{2}{6} \log_{2} \frac{2}{6} - \frac{4}{6} \log_{2} \frac{4}{6}$$

$$E \approx -0.333 \cdot \log_{2} 0.333 - 0.667 \cdot \log_{2} 0.667$$

$$E \approx 0.918$$

$\text{Gain}(S', Competition)$

$$= 1 - \left(\frac{4}{10} \cdot 0.811 + \frac{6}{10} \cdot 0.918\right)$$

$$\approx 1 - (0.3244 + 0.5508) = 0.1248$$

✅ Gain(Competition) ≈ 0.125

Attribute 3: Type (Soft, Hard)

$\text{Entropy}(Soft)$

Soft: $S$ = [Down, Down, Down, Up, Up, Up]

$$E = -\frac{3}{6} \log_{2} \frac{3}{6} - \frac{3}{6} \log_{2} \frac{3}{6}$$

$$E = -\frac{1}{2} \log_{2} \frac{1}{2} - \frac{1}{2} \log_{2} \frac{1}{2}$$

$$E = -0.5 \log_{2} 2^{-1} - 0.5 \log_{2} 2^{-1}$$

$$E = 0.5 \cdot 1 + 0.5 \cdot 1$$

$$E = 1$$

$\text{Entropy}(Hard)$

Hard: $S$ = [Down, Down, Up, Up]

$$E = -\frac{2}{4} \log_{2} \frac{2}{4} - \frac{2}{4} \log_{2} \frac{2}{4}$$

$$E = -\frac{1}{2} \log_{2} \frac{1}{2} - \frac{1}{2} \log_{2} \frac{1}{2}$$

$$E = -0.5 \log_{2} 2^{-1} - 0.5 \log_{2} 2^{-1}$$

$$E = 0.5 \cdot 1 + 0.5 \cdot 1$$

$$E = 1$$

$\text{Gain}(S', Type)$

$$= 1 - \left(\frac{6}{10} \cdot 1 + \frac{4}{10} \cdot 1\right)$$

$$= 1 - (0.6 + 0.4) = 0$$

✅ Gain(Type) = 0

Find the maximum Gain

✅ Gain(Age) = 0.6

✅ Gain(Competition) = 0.125

✅ Gain(Type) = 0

max (0.6, 0.125, 0) = 0.6

Age gives the highest gain (0.6).

Thus, "Age" will be placed at the root of Decision Tree.
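The root selection can be reproduced with a short sketch that computes the gain of every attribute on the ten-row dataset and picks the largest. The names `ROWS`, `ATTRIBUTES`, `entropy`, and `gain` are assumptions made for illustration; the data is copied from the table above.

```python
import math
from collections import Counter, defaultdict

ATTRIBUTES = ("Age", "Competition", "Type")
# (Age, Competition, Type, Profit) for rows 1 to 10
ROWS = [
    ("Old", "Yes", "Soft", "Down"), ("Old", "No", "Soft", "Down"),
    ("Old", "No", "Hard", "Down"), ("Mid", "Yes", "Soft", "Down"),
    ("Mid", "Yes", "Hard", "Down"), ("Mid", "No", "Hard", "Up"),
    ("Mid", "No", "Soft", "Up"), ("New", "Yes", "Soft", "Up"),
    ("New", "No", "Hard", "Up"), ("New", "No", "Soft", "Up"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain(rows, column):
    """Information gain of splitting rows on the attribute at index column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[-1])  # the last field is the class label
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


gains = {name: gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
root = max(gains, key=gains.get)
for name, value in gains.items():
    print(f"Gain({name}) = {value:.3f}")
print("Root:", root)
```

Because the code does not round the intermediate entropies, its Competition gain is about 0.1245 rather than the hand-computed 0.1248; both round to 0.125.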

The Decision Tree Formation

Decision tree diagram with root node “Age” splitting into branches (Old, Mid, New) leading to child nodes

$Age = Old$

$S^{1}$: the subset of $S'$ for which Age = Old

Age	Competition	Type	Profit (Class)
Old	Yes	Soft	Down
Old	No	Soft	Down
Old	No	Hard	Down

Observation:

All 3 examples → Profit = Down
✅ Pure group!

Thus:

$$\text{Entropy}(S^{1}) = 0$$

Leaf Node = "Down"

$Age = Mid$

$S^{2}$: the subset of $S'$ for which Age = Mid

Age	Competition	Type	Profit (Class)
Mid	Yes	Soft	Down
Mid	Yes	Hard	Down
Mid	No	Hard	Up
Mid	No	Soft	Up

Observation:

2 examples → Down
2 examples → Up

$$\text{Entropy}(S^{2}) = -\frac{2}{4} \log_{2} \frac{2}{4} - \frac{2}{4} \log_{2} \frac{2}{4}$$

$$\text{Entropy}(S^{2}) = 1$$

✅ Mid group is impure (entropy = 1).
Further splitting needed.

🔷 Attribute 1: Competition (Yes, No)

$\text{Entropy}(Yes)$

Yes: $S$ = [Down, Down]

$$\text{Entropy}(Yes) = 0$$

$\text{Entropy}(No)$

No: $S$ = [Up, Up]

$$\text{Entropy}(No) = 0$$

$\text{Gain}(S^{2}, Competition)$

$$= 1 - \left(\frac{2}{4} \cdot 0 + \frac{2}{4} \cdot 0\right)$$

$$= 1 - 0 = 1$$

✅ Gain(Competition) = 1

🔷 Attribute 2: Type (Soft, Hard)

$\text{Entropy}(Soft)$

Soft: $S$ = [Down, Up]

$$-\frac{1}{2} \log_{2} \frac{1}{2} - \frac{1}{2} \log_{2} \frac{1}{2} = 1$$

$\text{Entropy}(Soft) = 1$

$\text{Entropy}(Hard)$

Hard: $S$ = [Down, Up]

$$-\frac{1}{2} \log_{2} \frac{1}{2} - \frac{1}{2} \log_{2} \frac{1}{2} = 1$$

$\text{Entropy}(Hard) = 1$

$\text{Gain}(S^{2}, Type)$

$$= 1 - \left(\frac{2}{4} \cdot 1 + \frac{2}{4} \cdot 1\right)$$

$$= 1 - 1 = 0$$

✅ Gain(Type) = 0

Since the gain is maximum for the attribute Competition (1 versus 0 for Type), Competition is placed at Node 2.

$Age = New$

$S^{3}$: the subset of $S'$ for which Age = New

Age	Competition	Type	Profit (Class)
New	Yes	Soft	Up
New	No	Hard	Up
New	No	Soft	Up

All 3 examples → Profit = Up
✅ Pure group!

$$\text{Entropy}(S^{3}) = 0$$

Leaf Node = "Up"

Decision tree with root “Age” and secondary split “Competition,” leading to final outcomes Down or Up.

$Competition = Yes$

All the corresponding class labels in $S^{2}$ are Down.
✅ Pure group!

$$\text{Entropy}(S^{2}_{\text{Competition} = \text{Yes}}) = 0$$

Leaf Node = "Down"

$Competition = No$

All the corresponding class labels in $S^{2}$ are Up.
✅ Pure group!

$$\text{Entropy}(S^{2}_{\text{Competition} = \text{No}}) = 0$$

Leaf Node = "Up"
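The whole construction can be sketched as a small recursive function. This is a simplified illustration of the ID3 procedure followed above, not a production implementation: the names `id3`, `entropy`, `FIELDS`, and `DATA`, and the nested-dictionary representation of the tree, are all assumptions.

```python
import math
from collections import Counter

FIELDS = ("Age", "Competition", "Type", "Profit")
DATA = [dict(zip(FIELDS, values)) for values in [
    ("Old", "Yes", "Soft", "Down"), ("Old", "No", "Soft", "Down"),
    ("Old", "No", "Hard", "Down"), ("Mid", "Yes", "Soft", "Down"),
    ("Mid", "Yes", "Hard", "Down"), ("Mid", "No", "Hard", "Up"),
    ("Mid", "No", "Soft", "Up"), ("New", "Yes", "Soft", "Up"),
    ("New", "No", "Hard", "Up"), ("New", "No", "Soft", "Up"),
]]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(labels).values())


def id3(rows, attributes, target="Profit"):
    """Return a leaf label, or a nested dict {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure group: make a leaf
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # no attribute left: majority label

    def gain(attribute):
        remainder = 0.0
        for value in {row[attribute] for row in rows}:
            subset = [row[target] for row in rows if row[attribute] == value]
            remainder += len(subset) / len(rows) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)  # attribute with the highest information gain
    remaining = [a for a in attributes if a != best]
    return {best: {
        value: id3([row for row in rows if row[best] == value], remaining, target)
        for value in sorted({row[best] for row in rows})
    }}


tree = id3(DATA, ["Age", "Competition", "Type"])
print(tree)
```

For this dataset the result has Age at the root, leaves Down and Up under Old and New, and a Competition split under Mid, matching the tree derived by hand.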
