
Apriori Algorithm

In data science and machine learning, discovering patterns in large datasets is essential. One of the most popular algorithms for this task is the Apriori Algorithm, widely used for association rule mining and market basket analysis.

What is the Apriori Algorithm?

The Apriori Algorithm is a classical data mining technique used to find frequent itemsets and generate association rules from transactional databases.

  • It works on databases containing transactions

  • It determines how strongly or weakly items are related

  • It was proposed by R. Agrawal and R. Srikant (1994)


Main Purpose:

  • Identify items that are frequently bought together

  • Discover hidden patterns in data

Real-Life Applications

  • E-commerce → Product recommendation (e.g., “Customers who bought X also bought Y”)

  • Retail → Market basket analysis

  • Healthcare → Drug interaction analysis

  • Banking → Customer behavior analysis

What is a Frequent Itemset?

A Frequent Itemset is a group of items that appear together frequently in transactions.

Key properties:

  • An itemset is frequent if its support ≥ minimum support threshold

  • If {A, B} is frequent, then A and B must each be frequent individually (the Apriori, or downward-closure, property)

Example:

Transactions:

  • A = {1,2,3,4,5}

  • B = {2,3,7}

  • C = {1,2,3,5}

Items 2 and 3 appear in all three transactions, so they are frequent; with a minimum support of 2, items 1 and 5 qualify as well.
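To make this concrete, here is a minimal Python sketch that counts item occurrences in the three transactions above and keeps the items meeting a minimum support of 2 (all names here are illustrative, not from any library):

```python
# Minimal frequent-item check for the example above (illustrative names).
min_support = 2  # an item must appear in at least 2 transactions

transactions = [
    {1, 2, 3, 4, 5},  # A
    {2, 3, 7},        # B
    {1, 2, 3, 5},     # C
]

# Count how many transactions contain each item.
counts = {}
for t in transactions:
    for item in t:
        counts[item] = counts.get(item, 0) + 1

frequent_items = {item for item, c in counts.items() if c >= min_support}
print(frequent_items)  # items 1, 2, 3, 5 meet the threshold; 2 and 3 appear in all three
```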

Key Terms in Apriori

Support

Measures how often an itemset appears:

Support(X) = (number of transactions containing X) / (total number of transactions)

Apriori can equally work with the raw count (the support count), which is what the worked example below uses.

Confidence

Measures the reliability of a rule X → Y:

Confidence(X → Y) = Support(X ∪ Y) / Support(X)
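As a rough sketch, both measures can be computed directly from a transaction list; `support` and `confidence` below are our own helper names, assuming transactions are represented as Python sets:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support(antecedent ∪ consequent) / Support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Small demo on the first four transactions of the worked example below.
transactions = [{"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}]
print(support({"A", "B"}, transactions))       # 0.5  (2 of 4 transactions)
print(confidence({"A"}, {"B"}, transactions))  # 1.0  (every A-transaction has B)
```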

Steps of Apriori Algorithm

1. Set Minimum Support & Confidence

  • Define threshold values

2. Generate Frequent Itemsets

  • Find items with support ≥ minimum support

3. Candidate Generation

  • Create combinations (pairs, triplets, etc.)

4. Pruning

  • Remove itemsets with low support

5. Rule Generation

  • Generate association rules

6. Sort Rules

  • Rank rules based on confidence or importance
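Putting these steps together, here is a minimal self-contained sketch of the frequent-itemset part of the algorithm (steps 1 to 4; rule generation, steps 5 and 6, is sketched later under Step 8). The function name `apriori` and the data layout are our own choices, not a library API:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Step 2: frequent 1-itemsets (L1).
    items = {item for t in transactions for item in t}
    frequent = {c: n for c, n in count({frozenset([i]) for i in items}).items()
                if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Step 3: candidate generation -- join frequent (k-1)-itemsets.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Step 4: prune candidates with any infrequent (k-1)-subset,
        # then keep only candidates meeting the support threshold.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {c: n for c, n in count(candidates).items()
                    if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Step 1: thresholds; the dataset is the worked example below (T1-T9).
transactions = [frozenset(t) for t in (
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"})]
for itemset, n in sorted(apriori(transactions, 2).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), n)
```

Run on the worked dataset below, this prints exactly the L1, L2, and L3 tables derived in the next section.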

Working of Apriori Algorithm

Problem Setup

We are given a transactional dataset and need to:

  • Find frequent itemsets

  • Generate association rules

Dataset Example

TID | Itemsets
T1  | A, B
T2  | B, D
T3  | B, C
T4  | A, B, D
T5  | A, C
T6  | B, C
T7  | A, C
T8  | A, B, C, E
T9  | A, B, C

Given:

  • Minimum Support = 2 (as an absolute transaction count)

  • Minimum Confidence = 50%
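For readers who want to follow along in code, the dataset and thresholds can be encoded as below (a sketch; the names are ours, and each later snippet repeats the `transactions` literal so it runs on its own):

```python
# The worked example's dataset as Python sets (encoding is ours).
transactions = [
    {"A", "B"},            # T1
    {"B", "D"},            # T2
    {"B", "C"},            # T3
    {"A", "B", "D"},       # T4
    {"A", "C"},            # T5
    {"B", "C"},            # T6
    {"A", "C"},            # T7
    {"A", "B", "C", "E"},  # T8
    {"A", "B", "C"},       # T9
]
MIN_SUPPORT = 2       # absolute transaction count
MIN_CONFIDENCE = 0.5  # 50%
```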

Step 1: Generate C1 (Candidate 1-itemsets)

Count frequency of each item.

Item | Support Count
A    | 6
B    | 7
C    | 6
D    | 2
E    | 1

👉 This is called C1 (Candidate Set)

Step 2: Generate L1 (Frequent 1-itemsets)

Remove items with support < minimum support (2)

Item | Support Count
A    | 6
B    | 7
C    | 6
D    | 2

👉 E is removed because support = 1
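Steps 1 and 2 can be reproduced in a few lines (a sketch; the dataset literal re-encodes the table above):

```python
from collections import Counter

transactions = [{"A","B"}, {"B","D"}, {"B","C"}, {"A","B","D"}, {"A","C"},
                {"B","C"}, {"A","C"}, {"A","B","C","E"}, {"A","B","C"}]

c1 = Counter(item for t in transactions for item in t)  # candidate set C1
l1 = {i: n for i, n in c1.items() if n >= 2}            # L1: E (support 1) dropped
print(l1)  # A: 6, B: 7, C: 6, D: 2 (dict order may differ)
```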

Step 3: Generate C2 (Candidate 2-itemsets)

Form pairs from L1:

Itemset | Support Count
{A, B}  | 4
{A, C}  | 4
{A, D}  | 1
{B, C}  | 4
{B, D}  | 2
{C, D}  | 0

Step 4: Generate L2 (Frequent 2-itemsets)

Remove itemsets with support < 2:

Itemset | Support Count
{A, B}  | 4
{A, C}  | 4
{B, C}  | 4
{B, D}  | 2
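Steps 3 and 4 amount to forming pairs from L1 and filtering by support (a sketch):

```python
from itertools import combinations

transactions = [{"A","B"}, {"B","D"}, {"B","C"}, {"A","B","D"}, {"A","C"},
                {"B","C"}, {"A","C"}, {"A","B","C","E"}, {"A","B","C"}]
l1 = ["A", "B", "C", "D"]  # frequent 1-itemsets from Step 2

c2 = {pair: sum(1 for t in transactions if set(pair) <= t)
      for pair in combinations(l1, 2)}   # candidate pairs with counts
l2 = {pair: n for pair, n in c2.items() if n >= 2}
print(l2)  # ('A','B'): 4, ('A','C'): 4, ('B','C'): 4, ('B','D'): 2
```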

Step 5: Generate C3 (Candidate 3-itemsets)

Form triplets from L2:

Itemset   | Support Count
{A, B, C} | 2
{B, C, D} | 0
{A, C, D} | 0
{A, B, D} | 1

Step 6: Generate L3 (Frequent 3-itemsets)

Only keep itemsets with support ≥ 2:

Itemset   | Support Count
{A, B, C} | 2
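The same pattern yields the triplet step (a sketch; a full implementation would also prune candidates whose pairs are not all in L2, which here leaves only {A, B, C} anyway):

```python
from itertools import combinations

transactions = [{"A","B"}, {"B","D"}, {"B","C"}, {"A","B","D"}, {"A","C"},
                {"B","C"}, {"A","C"}, {"A","B","C","E"}, {"A","B","C"}]
l1 = ["A", "B", "C", "D"]

c3 = {trip: sum(1 for t in transactions if set(trip) <= t)
      for trip in combinations(l1, 3)}   # candidate triplets with counts
l3 = {trip: n for trip, n in c3.items() if n >= 2}
print(l3)  # {('A', 'B', 'C'): 2}
```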

Step 7: Stop Condition

No larger frequent itemsets can be generated, so the algorithm stops here.

Step 8: Generate Association Rules

From the frequent itemset {A, B, C}, we generate candidate rules and calculate each one's confidence as support({A, B, C}) divided by the support of the rule's antecedent:

Rule 1: (A ∧ B) → C, confidence = 2/4 = 50%

Rule 2: (B ∧ C) → A, confidence = 2/4 = 50%

Rule 3: (A ∧ C) → B, confidence = 2/4 = 50%

Rule 4: C → (A ∧ B), confidence = 2/6 ≈ 33%

Rule 5: A → (B ∧ C), confidence = 2/6 ≈ 33%

Rule 6: B → (A ∧ C), confidence = 2/7 ≈ 29%
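These confidences can be checked mechanically; the sketch below enumerates every rule derivable from {A, B, C} (helper names are ours):

```python
from itertools import combinations

transactions = [{"A","B"}, {"B","D"}, {"B","C"}, {"A","B","D"}, {"A","C"},
                {"B","C"}, {"A","C"}, {"A","B","C","E"}, {"A","B","C"}]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

frequent = {"A", "B", "C"}
for r in (2, 1):  # two-item antecedents first, matching the rule order above
    for antecedent in combinations(sorted(frequent), r):
        a = set(antecedent)
        conf = support_count(frequent) / support_count(a)
        print(f"{sorted(a)} -> {sorted(frequent - a)}: {conf:.0%}")
```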

Final Strong Rules

Only rules with confidence ≥ 50% are selected:

  • (A ∧ B) → C

  • (B ∧ C) → A

  • (A ∧ C) → B
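In practice you would rarely hand-roll this. As one example, the mlxtend library (assuming it and pandas are installed; API as shown in the mlxtend documentation) exposes `apriori` and `association_rules`, which reproduce these results from a one-hot-encoded DataFrame:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["A","B"], ["B","D"], ["B","C"], ["A","B","D"], ["A","C"],
                ["B","C"], ["A","C"], ["A","B","C","E"], ["A","B","C"]]
items = sorted({i for t in transactions for i in t})
df = pd.DataFrame([{i: i in t for i in items} for t in transactions])

# mlxtend expects support as a fraction: 2 of 9 transactions ≈ 0.22.
frequent = apriori(df, min_support=2 / 9, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```

Note that `association_rules` reports every rule meeting the threshold, including rules from 2-itemsets such as A → B, not only those derived from {A, B, C}.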

Advantages of Apriori

  • Easy to understand and implement

  • Works well for small datasets

  • Reduces search space using pruning

  • Useful in recommendation systems

Limitations of Apriori

  • Computationally expensive for large datasets

  • Generates many candidate sets

  • Multiple database scans required

  • Not efficient for dense data