Created
May 11, 2025

Apriori Algorithm in Machine Learning

The Apriori Algorithm is used for association rule learning on transactional databases. It identifies frequent itemsets and uses them to generate association rules that show how strongly items are related.

It uses Breadth-First Search (BFS) and a hash tree to count candidate itemsets efficiently.

Proposed by: R. Agrawal & R. Srikant (1994)
Applications:

  • Market Basket Analysis

  • Healthcare (e.g., Drug Interaction Prediction)


🔸 What is a Frequent Itemset?

A frequent itemset is a group of items whose support is greater than or equal to a minimum support threshold.

💡 Apriori property: if {A, B} is frequent, then both A and B must be frequent individually — every subset of a frequent itemset is itself frequent.

Example:

  • Transactions:

    • A = {1,2,3,4,5}

    • B = {2,3,7}

  • With a minimum support of 2, the frequent items are {2, 3} (they appear in both transactions)
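As a quick check, the mini-example above can be reproduced in Python with a simple count (a sketch; names are illustrative):

```python
from collections import Counter

# The two transactions from the example above; minimum support = 2.
transactions = [{1, 2, 3, 4, 5}, {2, 3, 7}]
min_support = 2

counts = Counter(item for t in transactions for item in t)
frequent = {item for item, n in counts.items() if n >= min_support}
print(frequent)  # {2, 3}: the only items appearing in both transactions
```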


🔹 Important Terms:

  • Support = frequency of occurrence: Support(X) = (transactions containing X) / (total transactions)

  • Confidence = strength of implication: Confidence(X → Y) = Support(X ∪ Y) / Support(X)

  • Lift = strength of association: Lift(X → Y) = Confidence(X → Y) / Support(Y); lift > 1 indicates a positive association
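These three measures can be written directly as small Python helpers (a minimal sketch, with a made-up basket dataset for illustration):

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Confidence of the rule X -> Y: Support(X ∪ Y) / Support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Lift of X -> Y: Confidence(X -> Y) / Support(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)

# Tiny illustrative basket data (invented for this sketch).
baskets = [{'milk', 'bread'}, {'milk', 'eggs'},
           {'bread', 'eggs'}, {'milk', 'bread', 'eggs'}]
print(support({'milk', 'bread'}, baskets))       # 2 of 4 baskets -> 0.5
print(confidence({'milk'}, {'bread'}, baskets))  # 0.5 / 0.75 -> 2/3
```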


🔸 Apriori Algorithm Steps

  1. Set the minimum support and confidence thresholds, then determine the support of each itemset in the transactional database.

  2. Keep all itemsets whose support is greater than or equal to the minimum support.

  3. From these frequent itemsets, generate all rules whose confidence is greater than or equal to the minimum confidence.

  4. Sort the rules in decreasing order of lift.
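The frequent-itemset part of these steps can be sketched end to end in Python. This is a minimal, unoptimized version (plain dictionary counting rather than the hash tree a real implementation would use), showing the join and prune steps explicitly:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining (minimal sketch)."""
    transactions = [frozenset(t) for t in transactions]

    def frequent_of(candidates):
        # Count support and keep candidates meeting min_support.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {i for t in transactions for i in t}
    current = frequent_of([frozenset([i]) for i in items])   # L1
    result = dict(current)
    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        current = frequent_of(candidates)                    # Lk
        result.update(current)
        k += 1
    return result
```

For example, `apriori([{'a','b'}, {'b','c'}, {'a','b','c'}], min_support=2)` keeps {a,b} and {b,c} (support 2 each) but not {a,c} or {a,b,c}.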


🔹 Apriori Example

We will work through the Apriori algorithm with an example and the corresponding calculations.

Example:

Suppose we have the following dataset of transactions. From this dataset, we need to find the frequent itemsets and generate the association rules using the Apriori algorithm.

| TID | Itemsets |
|-----|----------|
| T1  | A, B |
| T2  | B, D |
| T3  | B, C |
| T4  | A, B, D |
| T5  | A, C |
| T6  | B, C |
| T7  | A, C |
| T8  | A, B, C, E |
| T9  | A, B, C |

Minimum Support = 2
Minimum Confidence = 50%

🧠 Solution

✅ Step 1: C1 and L1 (Single Items)

| Itemset | Support Count |
|---------|---------------|
| A | 6 |
| B | 7 |
| C | 6 |
| D | 2 |
| E | 1 ❌ (removed: below minimum support) |
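These single-item counts can be verified with a quick Python check over the nine transactions (a sketch, not part of the original worked example):

```python
from collections import Counter

# The nine transactions from the dataset above.
transactions = [
    {'A', 'B'}, {'B', 'D'}, {'B', 'C'}, {'A', 'B', 'D'}, {'A', 'C'},
    {'B', 'C'}, {'A', 'C'}, {'A', 'B', 'C', 'E'}, {'A', 'B', 'C'},
]

c1 = Counter(item for t in transactions for item in t)
l1 = {item: n for item, n in sorted(c1.items()) if n >= 2}  # min support = 2
print(l1)  # E occurs only once, so it is pruned
```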


✅ Step 2: C2 and L2 (Pairs)

| Itemset | Support |
|---------|---------|
| {A, B} | 4 |
| {A, C} | 4 |
| {A, D} | 1 ❌ |
| {B, C} | 4 |
| {B, D} | 2 |
| {C, D} | 0 ❌ |

✔️ Frequent Pairs (L2):
{A, B}, {A, C}, {B, C}, {B, D}


✅ Step 3: C3 and L3 (Triplets)

| Itemset | Support |
|---------|---------|
| {A, B, C} | 2 ✔️ |
| {A, B, D} | 1 ❌ |
| {B, C, D} | 0 ❌ |
| {A, C, D} | 0 ❌ |

✔️ Only Frequent Triplet (L3): {A, B, C}


✅ Step 4: Generate Association Rules

From {A, B, C} with Support = 2:

| Rule | Support | Confidence |
|------|---------|------------|
| A, B → C | 2 | 2/4 = 50% ✔️ |
| A, C → B | 2 | 2/4 = 50% ✔️ |
| B, C → A | 2 | 2/4 = 50% ✔️ |
| A → B, C | 2 | 2/6 = 33.33% ❌ |
| B → A, C | 2 | 2/7 = 28.57% ❌ |
| C → A, B | 2 | 2/6 = 33.33% ❌ |

✔️ Strong Rules (confidence ≥ 50%):
A, B → C; A, C → B; B, C → A
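The rule-generation step can be reproduced with a short script (a sketch; it enumerates every non-empty proper subset of the frequent triplet {A, B, C} as an antecedent and keeps the rules meeting the confidence threshold):

```python
from itertools import combinations

# The nine transactions from the dataset above.
transactions = [
    {'A', 'B'}, {'B', 'D'}, {'B', 'C'}, {'A', 'B', 'D'}, {'A', 'C'},
    {'B', 'C'}, {'A', 'C'}, {'A', 'B', 'C', 'E'}, {'A', 'B', 'C'},
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

triplet = {'A', 'B', 'C'}      # the only frequent triplet
min_confidence = 0.5

strong_rules = []
for r in range(1, len(triplet)):
    for antecedent in combinations(sorted(triplet), r):
        consequent = triplet - set(antecedent)
        conf = count(triplet) / count(antecedent)  # Support(X ∪ Y) / Support(X)
        if conf >= min_confidence:
            strong_rules.append((set(antecedent), consequent, conf))

for x, y, conf in strong_rules:
    print(sorted(x), '->', sorted(y), f'{conf:.0%}')
```

Only the three two-item antecedents survive, each at exactly 50% confidence, matching the table above.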


✅ Advantages of Apriori

  • Simple and easy to understand

  • Effective join and prune steps

  • Good for interpretable rules in large datasets


❌ Disadvantages of Apriori

  • Slow on large datasets

  • Requires multiple database scans, which reduces efficiency

  • Worst-case time and space complexity is O(2^D), since up to 2^D candidate itemsets may be generated (D = number of distinct items)