
Apriori Algorithm

In data science and machine learning, discovering patterns in large datasets is essential. One of the most popular algorithms for this task is the Apriori Algorithm, widely used for association rule mining and market basket analysis.

What is the Apriori Algorithm?

The Apriori Algorithm is a classical data mining technique used to find frequent itemsets and generate association rules from transactional databases.

  • It works on databases containing transactions

  • It determines how strongly or weakly items are related

  • It was proposed by R. Agrawal and R. Srikant (1994)


Main Purpose:

  • Identify items that are frequently bought together

  • Discover hidden patterns in data

Real-Life Applications

  • E-commerce → Product recommendation (e.g., “Customers who bought X also bought Y”)

  • Retail → Market basket analysis

  • Healthcare → Drug interaction analysis

  • Banking → Customer behavior analysis

What is a Frequent Itemset?

A Frequent Itemset is a group of items that appear together frequently in transactions.

Key properties:

  • An itemset is frequent if its support ≥ minimum support threshold

  • If {A, B} is frequent, then A and B must each be frequent individually (the Apriori, or downward-closure, property)

Example:

Transactions:

  • A = {1,2,3,4,5}

  • B = {2,3,7}

  • C = {1,2,3,5}

Items 2 and 3 appear in all three transactions, so they are frequent; with a minimum support of 2, items 1 and 5 qualify as well.
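To make this concrete, here is a minimal Python sketch that counts item occurrences in the three transactions above and keeps the items meeting a minimum support of 2 (all names here are illustrative, not from any library):

```python
# Minimal frequent-item check for the example above (illustrative names).
min_support = 2  # an item must appear in at least 2 transactions

transactions = [
    {1, 2, 3, 4, 5},  # A
    {2, 3, 7},        # B
    {1, 2, 3, 5},     # C
]

# Count how many transactions contain each item.
counts = {}
for t in transactions:
    for item in t:
        counts[item] = counts.get(item, 0) + 1

frequent_items = {item for item, c in counts.items() if c >= min_support}
print(frequent_items)  # items 1, 2, 3, 5 meet the threshold; 2 and 3 appear in all three
```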

Key Terms in Apriori

Support

Measures how often an itemset appears:

Support(X) = (number of transactions containing X) / (total number of transactions)

Apriori can equally work with the raw count (the support count), which is what the worked example below uses.

Confidence

Measures the reliability of a rule X → Y:

Confidence(X → Y) = Support(X ∪ Y) / Support(X)
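As a rough sketch, both measures can be computed directly from a transaction list; `support` and `confidence` below are our own helper names, assuming transactions are represented as Python sets:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support(antecedent ∪ consequent) / Support(antecedent)."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

# Small demo on the first four transactions of the worked example below.
transactions = [{"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}]
print(support({"A", "B"}, transactions))       # 0.5  (2 of 4 transactions)
print(confidence({"A"}, {"B"}, transactions))  # 1.0  (every A-transaction has B)
```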

Steps of Apriori Algorithm

1. Set Minimum Support & Confidence

  • Define threshold values

2. Generate Frequent Itemsets

  • Find items with support ≥ minimum support

3. Candidate Generation

  • Create combinations (pairs, triplets, etc.)

4. Pruning

  • Remove itemsets with low support

5. Rule Generation

  • Generate association rules

6. Sort Rules

  • Rank rules based on confidence or importance
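Putting these steps together, here is a minimal self-contained sketch of the frequent-itemset part of the algorithm (steps 1 to 4; rule generation, steps 5 and 6, is sketched later under Step 8). The function name `apriori` and the data layout are our own choices, not a library API:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Step 2: frequent 1-itemsets (L1).
    items = {item for t in transactions for item in t}
    frequent = {c: n for c, n in count({frozenset([i]) for i in items}).items()
                if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Step 3: candidate generation -- join frequent (k-1)-itemsets.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Step 4: prune candidates with any infrequent (k-1)-subset,
        # then keep only candidates meeting the support threshold.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        frequent = {c: n for c, n in count(candidates).items()
                    if n >= min_support}
        result.update(frequent)
        k += 1
    return result

# Step 1: thresholds; the dataset is the worked example below (T1-T9).
transactions = [frozenset(t) for t in (
    {"A", "B"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
    {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"})]
for itemset, n in sorted(apriori(transactions, 2).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), n)
```

Run on the worked dataset below, this prints exactly the L1, L2, and L3 tables derived in the next section.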

Working of Apriori Algorithm

Problem Setup

We are given a transactional dataset and need to:

  • Find frequent itemsets

  • Generate association rules

Dataset Example

TID | Itemsets
T1  | A, B
T2  | B, D
T3  | B, C
T4  | A, B, D
T5  | A, C
T6  | B, C
T7  | A, C
T8  | A, B, C, E
T9  | A, B, C

Given:

  • Minimum Support = 2 (as an absolute transaction count)

  • Minimum Confidence = 50%
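For readers who want to follow along in code, the dataset and thresholds can be encoded as below (a sketch; the names are ours, and each later snippet repeats the `transactions` literal so it runs on its own):

```python
# The worked example's dataset as Python sets (encoding is ours).
transactions = [
    {"A", "B"},            # T1
    {"B", "D"},            # T2
    {"B", "C"},            # T3
    {"A", "B", "D"},       # T4
    {"A", "C"},            # T5
    {"B", "C"},            # T6
    {"A", "C"},            # T7
    {"A", "B", "C", "E"},  # T8
    {"A", "B", "C"},       # T9
]
MIN_SUPPORT = 2       # absolute transaction count
MIN_CONFIDENCE = 0.5  # 50%
```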

Step 1: Generate C1 (Candidate 1-itemsets)

Count frequency of each item.

Item | Support Count
A    | 6
B    | 7
C    | 6
D    | 2
E    | 1

👉 This is called C1 (Candidate Set)

Step 2: Generate L1 (Frequent 1-itemsets)

Remove items with support < minimum support (2)

Item | Support Count
A    | 6
B    | 7
C    | 6
D    | 2

👉 E is removed because support = 1
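Steps 1 and 2 can be reproduced in a few lines (a sketch; the dataset literal re-encodes the table above):

```python
from collections import Counter

transactions = [{"A","B"}, {"B","D"}, {"B","C"}, {"A","B","D"}, {"A","C"},
                {"B","C"}, {"A","C"}, {"A","B","C","E"}, {"A","B","C"}]

c1 = Counter(item for t in transactions for item in t)  # candidate set C1
l1 = {i: n for i, n in c1.items() if n >= 2}            # L1: E (support 1) dropped
print(l1)  # A: 6, B: 7, C: 6, D: 2 (dict order may differ)
```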

Step 3: Generate C2 (Candidate 2-itemsets)

Form pairs from L1:

Itemset | Support Count
{A, B}  | 4
{A, C}  | 4
{A, D}  | 1
{B, C}  | 4
{B, D}  | 2
{C, D}  | 0

Step 4: Generate L2 (Frequent 2-itemsets)

Remove itemsets with support < 2:

Itemset | Support Count
{A, B}  | 4
{A, C}  | 4
{B, C}  | 4
{B, D}  | 2
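Steps 3 and 4 amount to forming pairs from L1 and filtering by support (a sketch):

```python
from itertools import combinations

transactions = [{"A","B"}, {"B","D"}, {"B","C"}, {"A","B","D"}, {"A","C"},
                {"B","C"}, {"A","C"}, {"A","B","C","E"}, {"A","B","C"}]
l1 = ["A", "B", "C", "D"]  # frequent 1-itemsets from Step 2

c2 = {pair: sum(1 for t in transactions if set(pair) <= t)
      for pair in combinations(l1, 2)}   # candidate pairs with counts
l2 = {pair: n for pair, n in c2.items() if n >= 2}
print(l2)  # ('A','B'): 4, ('A','C'): 4, ('B','C'): 4, ('B','D'): 2
```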

Step 5: Generate C3 (Candidate 3-itemsets)

Form triplets from L2:

Itemset   | Support Count
{A, B, C} | 2
{B, C, D} | 0
{A, C, D} | 0
{A, B, D} | 1

Step 6: Generate L3 (Frequent 3-itemsets)

Only keep itemsets with support ≥ 2:

Itemset   | Support Count
{A, B, C} | 2
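The same pattern yields the triplet step (a sketch; a full implementation would also prune candidates whose pairs are not all in L2, which here leaves only {A, B, C} anyway):

```python
from itertools import combinations

transactions = [{"A","B"}, {"B","D"}, {"B","C"}, {"A","B","D"}, {"A","C"},
                {"B","C"}, {"A","C"}, {"A","B","C","E"}, {"A","B","C"}]
l1 = ["A", "B", "C", "D"]

c3 = {trip: sum(1 for t in transactions if set(trip) <= t)
      for trip in combinations(l1, 3)}   # candidate triplets with counts
l3 = {trip: n for trip, n in c3.items() if n >= 2}
print(l3)  # {('A', 'B', 'C'): 2}
```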

Step 7: Stop Condition

No larger frequent itemsets can be generated, so the algorithm stops here.

Step 8: Generate Association Rules

From the frequent itemset {A, B, C}, we generate candidate rules and calculate each one's confidence as support({A, B, C}) divided by the support of the rule's antecedent:

Rule 1: (A ∧ B) → C, confidence = 2/4 = 50%

Rule 2: (B ∧ C) → A, confidence = 2/4 = 50%

Rule 3: (A ∧ C) → B, confidence = 2/4 = 50%

Rule 4: C → (A ∧ B), confidence = 2/6 ≈ 33%

Rule 5: A → (B ∧ C), confidence = 2/6 ≈ 33%

Rule 6: B → (A ∧ C), confidence = 2/7 ≈ 29%
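These confidences can be checked mechanically; the sketch below enumerates every rule derivable from {A, B, C} (helper names are ours):

```python
from itertools import combinations

transactions = [{"A","B"}, {"B","D"}, {"B","C"}, {"A","B","D"}, {"A","C"},
                {"B","C"}, {"A","C"}, {"A","B","C","E"}, {"A","B","C"}]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

frequent = {"A", "B", "C"}
for r in (2, 1):  # two-item antecedents first, matching the rule order above
    for antecedent in combinations(sorted(frequent), r):
        a = set(antecedent)
        conf = support_count(frequent) / support_count(a)
        print(f"{sorted(a)} -> {sorted(frequent - a)}: {conf:.0%}")
```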

Final Strong Rules

Only rules with confidence ≥ 50% are selected:

  • (A ∧ B) → C

  • (B ∧ C) → A

  • (A ∧ C) → B
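In practice you would rarely hand-roll this. As one example, the mlxtend library (assuming it and pandas are installed; API as shown in the mlxtend documentation) exposes `apriori` and `association_rules`, which reproduce these results from a one-hot-encoded DataFrame:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [["A","B"], ["B","D"], ["B","C"], ["A","B","D"], ["A","C"],
                ["B","C"], ["A","C"], ["A","B","C","E"], ["A","B","C"]]
items = sorted({i for t in transactions for i in t})
df = pd.DataFrame([{i: i in t for i in items} for t in transactions])

# mlxtend expects support as a fraction: 2 of 9 transactions ≈ 0.22.
frequent = apriori(df, min_support=2 / 9, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```

Note that `association_rules` reports every rule meeting the threshold, including rules from 2-itemsets such as A → B, not only those derived from {A, B, C}.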

Advantages of Apriori

  • Easy to understand and implement

  • Works well for small datasets

  • Reduces search space using pruning

  • Useful in recommendation systems

Limitations of Apriori

  • Computationally expensive for large datasets

  • Generates many candidate sets

  • Multiple database scans required

  • Not efficient for dense data