Apriori Algorithm in Machine Learning
The Apriori Algorithm is used for association rule learning on transactional databases. It identifies frequent itemsets and uses them to generate association rules that show how strongly items are related.
It uses Breadth-First Search (BFS) and a Hash Tree to count candidate itemsets efficiently.
Proposed by: R. Agrawal and R. Srikant (1994)
Applications:
Market Basket Analysis
Healthcare (e.g., Drug Interaction Prediction)
🔸 What is a Frequent Itemset?
A frequent itemset is a group of items whose support is greater than or equal to a minimum support threshold.
💡 If {A, B} is frequent, then both A and B must be frequent individually (the downward-closure, or Apriori, property).
Example:
Transactions:
A = {1, 2, 3, 4, 5}
B = {2, 3, 7}
Frequent itemset = {2, 3} (its items appear in both transactions)
🔹 Important Terms:
Support = how frequently an itemset occurs (as a count or fraction of transactions)
Confidence = strength of the implication: the fraction of transactions containing the antecedent that also contain the consequent
Lift = strength of the association: confidence divided by the consequent's support (lift > 1 indicates a positive association)
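These three measures can be computed directly from a list of transactions. The sketch below uses plain Python (no libraries) and a transaction list that mirrors the worked example later in this article; support is measured as a fraction of transactions:

```python
transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
    {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
    {"A", "B", "C"},
]

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """P(consequent | antecedent) = support(A ∪ C) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)

print(round(support({"A", "B"}), 3))       # 0.444 (4 of 9 transactions)
print(round(confidence({"A"}, {"B"}), 3))  # 0.667 (A → B)
print(round(lift({"A"}, {"B"}), 3))        # 0.857 (slightly below 1)
```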
🔸 Apriori Algorithm Steps
Set the minimum support and minimum confidence thresholds.
Scan the transactional database and keep all itemsets whose support is greater than or equal to the minimum support (the frequent itemsets).
From these frequent itemsets, generate all rules whose confidence is greater than or equal to the minimum confidence.
Sort the rules in decreasing order of lift.
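The level-wise search behind these steps can be sketched as follows. This is a minimal implementation (my own sketch, not an optimized one; it assumes support is measured as a raw count) showing the characteristic join and prune steps:

```python
from itertools import combinations
from collections import defaultdict

def apriori(transactions, min_support_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    # L1: count individual items in one scan of the database.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    frequent = {s: n for s, n in counts.items() if n >= min_support_count}
    all_frequent = dict(frequent)

    k = 2
    while frequent:
        # Join step: union pairs of frequent (k-1)-itemsets into k-candidates.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count surviving candidates (one database scan per level).
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support_count}
        all_frequent.update(frequent)
        k += 1
    return all_frequent
```

On the nine-transaction example worked out below, `apriori(transactions, 2)` finds {A, B, C} as the only frequent triplet, matching the hand calculation.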
🔹 Apriori Example
We will understand the Apriori algorithm using an example and mathematical calculation:
Example:
Suppose we have the following dataset of transactions. From this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.
| TID | ITEMSETS |
|---|---|
| T1 | A, B |
| T2 | B, D |
| T3 | B, C |
| T4 | A, B, D |
| T5 | A, C |
| T6 | B, C |
| T7 | A, C |
| T8 | A, B, C, E |
| T9 | A, B, C |
Minimum Support = 2
Minimum Confidence = 50%
🧠 Solution
✅ Step 1: C1 and L1 (Single Items)
| Itemset | Support Count |
|---|---|
| A | 6 |
| B | 7 |
| C | 6 |
| D | 2 |
| E | 1 ❌ (removed: below minimum support) |
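This first pass is a single scan counting each individual item; a quick sketch using `collections.Counter`:

```python
from collections import Counter

transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
    {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
    {"A", "B", "C"},
]

min_support = 2
item_counts = Counter(item for t in transactions for item in t)
# Keep only items meeting the minimum support (L1).
L1 = {item for item, n in item_counts.items() if n >= min_support}

print(dict(sorted(item_counts.items())))  # {'A': 6, 'B': 7, 'C': 6, 'D': 2, 'E': 1}
print(sorted(L1))                         # ['A', 'B', 'C', 'D'] -- E is pruned
```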
✅ Step 2: C2 and L2 (Pairs)
| Itemset | Support |
|---|---|
| {A, B} | 4 |
| {A, C} | 4 |
| {A, D} | 1 ❌ |
| {B, C} | 4 |
| {B, D} | 2 |
| {C, D} | 0 ❌ |
✔️ Frequent Pairs (L2):
→ {A, B}, {A, C}, {B, C}, {B, D}
✅ Step 3: C3 and L3 (Triplets)
| Itemset | Support |
|---|---|
| {A, B, C} | 2 ✔️ |
| {B, C, D} | 0 ❌ |
| {A, C, D} | 0 ❌ |
| {A, B, D} | 1 ❌ |
✔️ Only Frequent Triplet (L3): {A, B, C}
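The Apriori property explains why {A, B, C} is the only triplet that genuinely needs counting: a triplet is a valid candidate only if all three of its pairs are frequent. A small check of this prune step (the candidate list and `L2` are taken from the tables above):

```python
from itertools import combinations

# Frequent pairs (L2) from the previous step.
L2 = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"B", "D"}]

def survives_prune(candidate, frequent_prev):
    """True if every (k-1)-subset of the candidate is frequent."""
    k = len(candidate) - 1
    return all(set(sub) in frequent_prev
               for sub in combinations(sorted(candidate), k))

for cand in ({"A", "B", "C"}, {"A", "B", "D"}, {"B", "C", "D"}):
    print(sorted(cand), "kept" if survives_prune(cand, L2) else "pruned")
```

{A, B, D} is pruned because {A, D} is not in L2, and {B, C, D} because {C, D} is not; the pruned candidates never require a database scan.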
✅ Step 4: Generate Association Rules
From {A, B, C} with Support = 2:
| Rule | Support | Confidence |
|---|---|---|
| A, B → C | 2 | 2/4 = 50% ✔️ |
| A, C → B | 2 | 2/4 = 50% ✔️ |
| B, C → A | 2 | 2/4 = 50% ✔️ |
| A → B, C | 2 | 2/6 = 33.33% ❌ |
| B → A, C | 2 | 2/7 = 28.57% ❌ |
| C → A, B | 2 | 2/6 = 33.33% ❌ |
✔️ Strong Rules (confidence ≥ 50%):
→ {A, B} → C, {A, C} → B, {B, C} → A
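Rule generation from the frequent triplet can be sketched by enumerating every non-empty proper antecedent (a minimal version of step 4, using raw support counts over the same nine transactions):

```python
from itertools import combinations

transactions = [
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
    {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
    {"A", "B", "C"},
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions)

frequent = frozenset({"A", "B", "C"})
min_confidence = 0.5

strong_rules = []
for r in range(1, len(frequent)):            # antecedent sizes 1 and 2
    for ante in combinations(sorted(frequent), r):
        antecedent = frozenset(ante)
        consequent = frequent - antecedent
        conf = count(frequent) / count(antecedent)
        if conf >= min_confidence:
            strong_rules.append((antecedent, consequent))
        print(f"{sorted(antecedent)} -> {sorted(consequent)}: {conf:.2%}")

print(len(strong_rules))  # 3 -- only the two-item antecedents reach 50%
```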
✅ Advantages of Apriori
Simple and easy to understand
Effective join and prune steps
Good for interpretable rules in large datasets
❌ Disadvantages of Apriori
Slow performance for large datasets
Multiple database scans reduce efficiency.
Time and space complexity: O(2^D) in the worst case, i.e., exponential in the number of distinct items D, since candidate generation can enumerate every subset of items
