Apriori Algorithm in Machine Learning
The Apriori Algorithm is used for association rule learning on transactional databases. It identifies frequent itemsets and uses them to generate association rules that show how strongly items are related.
It uses Breadth-First Search (BFS) and a Hash Tree to count candidate itemsets efficiently.
Proposed by: R. Agrawal and R. Srikant (1994)
Applications:
Market Basket Analysis
Healthcare (e.g., Drug Interaction Prediction)
🔸 What is a Frequent Itemset?
A frequent itemset is a group of items whose support is greater than or equal to a minimum support threshold.
💡 If {A, B} is frequent, then both A and B must be frequent individually (the downward-closure, or Apriori, property).
Example:
Transactions:
A = {1, 2, 3, 4, 5}
B = {2, 3, 7}
Frequent itemset = {2, 3} (its items appear in both transactions)
🔹 Important Terms:
Support = how frequently an itemset occurs (as a count or fraction of transactions)
Confidence = strength of the implication: the fraction of transactions containing the antecedent that also contain the consequent
Lift = strength of the association: confidence divided by the consequent's support (lift > 1 indicates a positive association)
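These three measures can be computed directly from a list of transactions. The sketch below uses plain Python (no libraries) and a transaction list that mirrors the worked example later in this article; support is measured as a fraction of transactions:

```python
transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
    {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
    {"A", "B", "C"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)

print(round(support({"A", "B"}), 3))       # 0.444 (4 of 9 transactions)
print(round(confidence({"A"}, {"B"}), 3))  # 0.667 (A → B)
print(round(lift({"A"}, {"B"}), 3))        # 0.857 (slightly below 1)
```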
🔸 Apriori Algorithm Steps
Set the minimum support and minimum confidence thresholds.
Scan the transactional database and keep all itemsets whose support is greater than or equal to the minimum support (the frequent itemsets).
From these frequent itemsets, generate all rules whose confidence is greater than or equal to the minimum confidence.
Sort the rules in decreasing order of lift.
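The level-wise search behind these steps can be sketched as follows. This is a minimal implementation (my own sketch, not an optimized one; it assumes support is measured as a raw count) showing the characteristic join and prune steps:

```python
from itertools import combinations
from collections import defaultdict

def apriori(transactions, min_support_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    # L1: count individual items in one scan of the database.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    frequent = {s: n for s, n in counts.items() if n >= min_support_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-candidates.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count surviving candidates (one database scan per level).
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent
```

On the nine-transaction example worked out below, `apriori(transactions, 2)` finds {A, B, C} as the only frequent triplet, matching the hand calculation.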
🔹 Apriori Example
We will understand the Apriori algorithm using an example and mathematical calculation:
Example:
Suppose we have the following dataset of transactions. From this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.
| TID | ITEMSETS |
|---|---|
| T1 | A, B |
| T2 | B, D |
| T3 | B, C |
| T4 | A, B, D |
| T5 | A, C |
| T6 | B, C |
| T7 | A, C |
| T8 | A, B, C, E |
| T9 | A, B, C |
Minimum Support = 2
Minimum Confidence = 50%
🧠 Solution
✅ Step 1: C1 and L1 (Single Items)
| Itemset | Support Count |
|---|---|
| A | 6 |
| B | 7 |
| C | 6 |
| D | 2 |
| E | 1 ❌ (removed: below minimum support) |
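This first pass is a single scan counting each individual item; a quick sketch using `collections.Counter`:

```python
from collections import Counter

transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
    {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
    {"A", "B", "C"},
]

min_support = 2
item_counts = Counter(item for t in transactions for item in t)
# Keep only items meeting the minimum support (L1).
L1 = {item for item, n in item_counts.items() if n >= min_support}

print(dict(sorted(item_counts.items())))  # {'A': 6, 'B': 7, 'C': 6, 'D': 2, 'E': 1}
print(sorted(L1))                         # ['A', 'B', 'C', 'D'] -- E is pruned
```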
✅ Step 2: C2 and L2 (Pairs)
| Itemset | Support |
|---|---|
| {A, B} | 4 |
| {A, C} | 4 |
| {A, D} | 1 ❌ |
| {B, C} | 4 |
| {B, D} | 2 |
| {C, D} | 0 ❌ |
✔️ Frequent Pairs (L2):
→ {A, B}, {A, C}, {B, C}, {B, D}
✅ Step 3: C3 and L3 (Triplets)
| Itemset | Support |
|---|---|
| {A, B, C} | 2 ✔️ |
| {B, C, D} | 0 ❌ |
| {A, C, D} | 0 ❌ |
| {A, B, D} | 1 ❌ |
✔️ Only Frequent Triplet (L3): {A, B, C}
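The Apriori property explains why {A, B, C} is the only triplet that genuinely needs counting: a triplet is a valid candidate only if all three of its pairs are frequent. A small check of this prune step (the candidate list and `L2` are taken from the tables above):

```python
from itertools import combinations

# Frequent pairs (L2) from the previous step.
L2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]

def survives_prune(candidate, frequent_prev):
    """True if every (k-1)-subset of the candidate is frequent."""
    k = len(candidate) - 1
    return all(set(sub) in frequent_prev
               for sub in combinations(sorted(candidate), k))

for cand in ({"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "D"}):
    print(sorted(cand), "kept" if survives_prune(cand, L2) else "pruned")
```

{A, B, D} is pruned because {A, D} is not in L2, and {B, C, D} because {C, D} is not; the pruned candidates never require a database scan.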
✅ Step 4: Generate Association Rules
From {A, B, C} with Support = 2:
| Rule | Support | Confidence |
|---|---|---|
| A, B → C | 2 | 2/4 = 50% ✔️ |
| A, C → B | 2 | 2/4 = 50% ✔️ |
| B, C → A | 2 | 2/4 = 50% ✔️ |
| A → B, C | 2 | 2/6 = 33.33% ❌ |
| B → A, C | 2 | 2/7 = 28.57% ❌ |
| C → A, B | 2 | 2/6 = 33.33% ❌ |
✔️ Strong Rules (confidence ≥ 50%):
→ {A, B} → C, {A, C} → B, {B, C} → A
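Rule generation from the frequent triplet can be sketched by enumerating every non-empty proper antecedent (a minimal version of step 4, using raw support counts over the same nine transactions):

```python
from itertools import combinations

transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
    {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
    {"A", "B", "C"},
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

frequent = frozenset({"A", "B", "C"})
min_confidence = 0.5

strong_rules = []
for r in range(1, len(frequent)):            # antecedent sizes 1 and 2
    for ante in combinations(sorted(frequent), r):
        antecedent = frozenset(ante)
        consequent = frequent - antecedent
        conf = count(frequent) / count(antecedent)
        if conf >= min_confidence:
            strong_rules.append((antecedent, consequent))
        print(f"{sorted(antecedent)} -> {sorted(consequent)}: {conf:.2%}")

print(len(strong_rules))  # 3 -- only the two-item antecedents reach 50%
```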
✅ Advantages of Apriori
Simple and easy to understand
Effective join and prune steps
Good for interpretable rules in large datasets
❌ Disadvantages of Apriori
Slow performance for large datasets
Multiple database scans reduce efficiency.
Time and space complexity: O(2^D) in the worst case, i.e., exponential in the number of distinct items D, since candidate generation can enumerate every subset of items
