Gaussian Mixture Model

Nov 19, 2025

Gaussian Mixture Model (GMM)

A Gaussian Mixture Model is a probabilistic clustering method that assumes data points are generated from a mixture of multiple Gaussian distributions whose parameters are unknown.

Unlike K-Means, which does hard clustering (each point belongs to exactly one cluster), GMM performs soft clustering, meaning every point has a probability of belonging to each cluster.

Figure: Gaussian mixture model showing multiple clusters represented by normal distributions with different means and variances.

Working of GMM (By Dimri Sir)

Assume we have K Gaussian clusters.
Each cluster corresponds to a Gaussian distribution with its own mean and covariance.

For a data point $x_{n}$, the probability that it belongs to cluster $k$ (its responsibility) is:

P (z_{n} = k ∣ x_{n}) = \frac{π _{k} \cdot N ( x _{n} ∣ μ _{k} , Σ _{k} )}{\sum _{j = 1}^{K} π _{j} \cdot N ( x _{n} ∣ μ _{j} , Σ _{j} )}

Where:

$z_{n} = k$ → latent variable indicating point $x_{n}$ belongs to cluster $k$
$π_{k}$ → mixing coefficient (prior weight) of the $k^{th}$ Gaussian, with $\sum_{k=1}^{K} π_{k} = 1$
$N (x_{n} ∣ μ_{k}, Σ_{k})$ → Gaussian distribution with mean $μ_{k}$ and covariance $Σ_{k}$
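As a small worked example (numbers chosen purely for illustration), take two 1-D clusters with $π_{1} = π_{2} = 0.5$, unit variances, $μ_{1} = 0$, $μ_{2} = 2$, and the point $x_{n} = 0$. The Gaussian normalizing constant $1/\sqrt{2π}$ is the same for both clusters, so it cancels between numerator and denominator:

P(z_{n} = 1 ∣ x_{n} = 0) = \frac{0.5 \cdot e^{-(0 - 0)^{2}/2}}{0.5 \cdot e^{-(0 - 0)^{2}/2} + 0.5 \cdot e^{-(0 - 2)^{2}/2}} = \frac{1}{1 + e^{-2}} \approx 0.881

So $P(z_{n} = 2 ∣ x_{n} = 0) \approx 0.119$: the point belongs mostly, but not entirely, to cluster 1.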

The overall likelihood of observing data point $x_{n}$ is:

p(x_{n}) = \sum_{j = 1}^{K} π_{j} \cdot N(x_{n} ∣ μ_{j}, Σ_{j})
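A minimal sketch of this sum in Python, assuming the 1-D case (each $Σ_{j}$ is a scalar variance) and hypothetical helper names `normal_pdf`, `mixture_pdf`, and `log_likelihood`; the parameter values in the usage lines are made up. Summing $\log p(x_{n})$ over all points gives the log-likelihood that EM tries to increase.

```python
import math


def normal_pdf(x, mean, var):
    """1-D Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """p(x) = sum over j of pi_j * N(x | mu_j, sigma_j^2)."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def log_likelihood(data, weights, means, variances):
    """Sum of log p(x_n) over the dataset."""
    return sum(math.log(mixture_pdf(x, weights, means, variances)) for x in data)


if __name__ == "__main__":
    weights, means, variances = [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]
    print(mixture_pdf(1.0, weights, means, variances))
    print(log_likelihood([-0.3, 0.1, 1.9, 2.4], weights, means, variances))
```

For points far from every component the density can underflow to zero and `math.log` will fail; practical implementations work in log space (the log-sum-exp trick) for that reason.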

Expectation–Maximization (EM) Algorithm

To fit a GMM to data, we use the EM algorithm, an iterative method that estimates the parameters $(μ_{k}, Σ_{k}, π_{k})$ of every cluster by maximizing the likelihood of the data.

1. E-Step (Expectation)

Calculate the responsibility of each cluster for every data point:

How likely point $x_{n}$ is to belong to cluster $k$
Based on the current estimates of the means, covariances, and mixing coefficients
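A minimal Python sketch of the E-step, again assuming 1-D data with scalar variances; `e_step` is a hypothetical name and the parameters in the usage lines are illustrative. Each row of the result holds the soft memberships of one point and sums to 1.

```python
import math


def normal_pdf(x, mean, var):
    """1-D Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(data, weights, means, variances):
    """Return resp[n][k] = P(z_n = k | x_n) under the current parameters."""
    resp = []
    for x in data:
        # Numerator of the posterior formula for every cluster k.
        scores = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(scores)  # denominator p(x_n), shared by all clusters
        resp.append([s / total for s in scores])
    return resp


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0]
    for x, row in zip(data, e_step(data, [0.5, 0.5], [0.0, 2.0], [1.0, 1.0])):
        print(x, [round(r, 3) for r in row])
```

A point halfway between two identical clusters (here $x = 1$) gets a 50/50 split, which is exactly the soft assignment that K-Means cannot express.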

2. M-Step (Maximization)

Update the parameters:

Update means $μ_{k}$
Update covariances $Σ_{k}$
Update mixing coefficients $π_{k}$

Each update maximizes the expected log-likelihood under the current responsibilities, so the likelihood of the data never decreases from one iteration to the next.
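A minimal Python sketch of these three updates for 1-D data, where each covariance is a scalar variance. The name `m_step` and the `var_floor` guard are assumptions added for illustration (library implementations use a similar safeguard, e.g. scikit-learn's `reg_covar`). The sketch also assumes no cluster receives zero total responsibility, which would make `n_k` zero.

```python
def m_step(data, resp, var_floor=1e-6):
    """Re-estimate (weights, means, variances) from responsibilities resp[n][k]."""
    n_points, n_clusters = len(data), len(resp[0])
    weights, means, variances = [], [], []
    for k in range(n_clusters):
        # Effective number of points assigned to cluster k.
        n_k = sum(row[k] for row in resp)
        mean_k = sum(row[k] * x for row, x in zip(resp, data)) / n_k
        var_k = sum(row[k] * (x - mean_k) ** 2 for row, x in zip(resp, data)) / n_k
        weights.append(n_k / n_points)
        means.append(mean_k)
        # Floor the variance so a cluster that shrinks onto one point cannot reach zero.
        variances.append(max(var_k, var_floor))
    return weights, means, variances


if __name__ == "__main__":
    data = [0.0, 2.0, 10.0, 12.0]
    # Hard (0/1) responsibilities reduce the M-step to per-cluster sample statistics.
    resp = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    print(m_step(data, resp))
```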

The process repeats until convergence.

Working of GMM (My Own)

1. Initialization

Choose:

Number of clusters $(K)$
For each cluster:
- Mean $(μ)$
- Covariance $(Σ)$
- Weight $(π)$

Initial values are chosen randomly or taken from a K-Means run.
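Both initialization options can be sketched in Python for 1-D data. The function names, the fixed `seed`, the 20 Lloyd iterations, and the fallback variance for groups with fewer than two points are all illustrative assumptions, not part of the method itself.

```python
import random
import statistics


def random_init(data, k, seed=0):
    """Random start: k data points as means, the overall variance for every cluster, equal weights."""
    rng = random.Random(seed)
    means = rng.sample(data, k)  # k points drawn without replacement
    var = statistics.pvariance(data)
    return [1.0 / k] * k, means, [var] * k


def kmeans_init(data, k, iters=20, seed=0):
    """K-Means start: run Lloyd's algorithm, then use each group's share, mean, and variance."""
    means = random_init(data, k, seed)[1]
    overall_var = statistics.pvariance(data)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - means[j]) ** 2)
            groups[nearest].append(x)
        # Keep the previous center if a group ends up empty.
        means = [statistics.fmean(g) if g else m for g, m in zip(groups, means)]
    weights = [len(g) / len(data) for g in groups]
    # Groups with fewer than two points fall back to the overall variance.
    variances = [statistics.pvariance(g) if len(g) > 1 else overall_var for g in groups]
    return weights, means, variances


if __name__ == "__main__":
    data = [0.0, 0.2, 0.4, 10.0, 10.2, 10.4]
    print(random_init(data, 2))
    print(kmeans_init(data, 2))
```

The K-Means start usually places the means near the final answer, so EM tends to need fewer iterations than from a purely random start.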

2. E-Step (Expectation Step)

Compute responsibility for each data point:

👉 Probability that a data point belongs to each cluster.

Formula (conceptually):

If the point is close to a cluster's mean (relative to that cluster's covariance) → high probability
If it is far → low probability

This assigns soft memberships.

3. M-Step (Maximization Step)

Update the parameters based on responsibilities:

New means = weighted average of points
New covariances = weighted spread
New weights = share of the total responsibility each cluster takes

So the clusters move and reshape to fit the data.

4. Repeat until convergence

Keep repeating:
E-Step → M-Step → E-Step → M-Step
until the parameters stop changing.
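The whole loop can be sketched in Python for 1-D data. As assumptions, this version watches the log-likelihood rather than the parameters and stops once it improves by less than `tol`; `fit_gmm_1d`, `max_iter`, and the `1e-6` variance floor are illustrative choices.

```python
import math


def normal_pdf(x, mean, var):
    """1-D Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, weights, means, variances, tol=1e-6, max_iter=500):
    """Alternate E- and M-steps until the log-likelihood gain drops below tol."""
    prev_ll = -math.inf
    n, k_range = len(data), range(len(weights))
    for _ in range(max_iter):
        # E-step: soft memberships plus the log-likelihood of the current parameters.
        resp, ll = [], 0.0
        for x in data:
            scores = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(scores)
            ll += math.log(total)
            resp.append([s / total for s in scores])
        if ll - prev_ll < tol:  # likelihood has stopped improving
            break
        prev_ll = ll
        # M-step: weighted shares, means, and spreads (spreads use the new means).
        nk = [sum(r[k] for r in resp) for k in k_range]
        weights = [c / n for c in nk]
        means = [sum(r[k] * x for r, x in zip(resp, data)) / nk[k] for k in k_range]
        variances = [
            max(sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk[k], 1e-6)
            for k in k_range
        ]
    return weights, means, variances


if __name__ == "__main__":
    data = [-1.0, 0.0, 1.0, 9.0, 10.0, 11.0]
    print(fit_gmm_1d(data, [0.5, 0.5], [1.0, 8.0], [4.0, 4.0]))
```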

Output

GMM gives:

The final clusters
For each point → probability of belonging to each cluster
Cluster shapes (elliptical, rather than the spherical clusters K-Means implicitly assumes)
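Two of these outputs can be sketched in Python. `hard_labels` collapses the soft probabilities to one label per point, and `ellipse_axes` reads a 2-D cluster's shape off its covariance matrix using the closed-form eigenvalues of a symmetric 2x2 matrix. Both are hypothetical helpers, not library functions.

```python
import math


def hard_labels(resp):
    """Most probable cluster for each point: argmax over each row of responsibilities."""
    return [max(range(len(row)), key=row.__getitem__) for row in resp]


def ellipse_axes(cov):
    """One-standard-deviation semi-axes and major-axis angle (radians) of a 2x2 covariance."""
    (a, b), (_, d) = cov  # assumes a symmetric matrix [[a, b], [b, d]]
    mid = (a + d) / 2.0
    half_gap = math.sqrt(((a - d) / 2.0) ** 2 + b ** 2)
    major, minor = mid + half_gap, mid - half_gap  # eigenvalues, largest first
    angle = 0.5 * math.atan2(2.0 * b, a - d)
    return math.sqrt(major), math.sqrt(max(minor, 0.0)), angle


if __name__ == "__main__":
    print(hard_labels([[0.9, 0.1], [0.4, 0.6]]))
    print(ellipse_axes([[2.0, 1.0], [1.0, 2.0]]))  # tilted ellipse
    print(ellipse_axes([[1.0, 0.0], [0.0, 1.0]]))  # circle
```

Equal variances with zero correlation give two equal axes, i.e. a circle; any other covariance gives a stretched or tilted ellipse.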

EM Algorithm

11 Oct 2025

1. Initialization

Randomly initialize the parameters for each of the $K$ Gaussians:

Means $μ_{k}$
Covariances $Σ_{k}$
Mixing coefficients $π_{k}$

2. E-Step (Expectation)

Calculate the responsibility $r_{ik}$ of cluster $k$ for each point $x_{i}$:

r_{ik} = \frac{π _{k} \cdot N ( x _{i} ∣ μ _{k} , Σ _{k} )}{\sum _{j = 1}^{K} π _{j} \cdot N ( x _{i} ∣ μ _{j} , Σ _{j} )}

3. M-Step (Maximization)

Update the parameters.

π_{k}^{new} = \frac{1}{N} \sum_{i = 1}^{N} r_{ik}

μ_{k}^{new} = \frac{\sum _{i = 1}^{N} r _{ik} \cdot x _{i}}{\sum _{i = 1}^{N} r _{ik}}

Σ_{k}^{new} = \frac{\sum_{i = 1}^{N} r_{ik} (x_{i} - μ_{k}^{new})(x_{i} - μ_{k}^{new})^{T}}{\sum_{i = 1}^{N} r_{ik}}
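These three updates can be sketched in plain Python for multivariate data, one cluster at a time, where `points` is a list of coordinate tuples and `r[i]` holds $r_{ik}$ for a fixed $k$. The function name `update_cluster` is hypothetical, and a real implementation would normally use a linear-algebra library instead of explicit loops.

```python
def update_cluster(points, r):
    """Return (pi_k, mu_k, Sigma_k) for one cluster from its responsibilities r[i] = r_ik."""
    n_k = sum(r)  # effective number of points in the cluster
    dim = len(points[0])
    pi_k = n_k / len(points)
    mu = [sum(ri * p[d] for ri, p in zip(r, points)) / n_k for d in range(dim)]
    sigma = [[0.0] * dim for _ in range(dim)]
    for ri, p in zip(r, points):
        diff = [p[d] - mu[d] for d in range(dim)]  # x_i - mu_k^new
        for a in range(dim):
            for b in range(dim):
                sigma[a][b] += ri * diff[a] * diff[b]  # r_ik times the outer product
    sigma = [[s / n_k for s in row] for row in sigma]
    return pi_k, mu, sigma


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 0.0)]
    print(update_cluster(pts, [1.0, 1.0, 1.0, 0.0]))
```

In the usage line, the three points on the diagonal produce equal off-diagonal covariance entries, the signature of an elongated, tilted cluster; the fourth point has zero responsibility and does not contribute.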

4. Convergence

Repeat E-step and M-step until:

Parameters stop changing significantly, or
Likelihood converges.

Applications of GMM

1. Clustering

Find hidden groups in data.
Used in:

Marketing
Medicine
Genetics
Customer segmentation

2. Anomaly Detection

Identify rare or unusual patterns, e.g.,

Fraud detection
Medical error detection
Network intrusion detection
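One common way to use a fitted GMM for this is to flag points whose mixture density is unusually low. A minimal 1-D sketch, assuming the parameters are already fitted and that "unusually low" means below a chosen quantile of the training densities; the 2% default and all function names are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    """1-D Gaussian density N(x | mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_pdf(x, weights, means, variances):
    """p(x) under the fitted mixture."""
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def flag_anomalies(train, test, weights, means, variances, quantile=0.02):
    """Mark test points whose density falls below a low quantile of the training densities."""
    train_scores = sorted(mixture_pdf(x, weights, means, variances) for x in train)
    cutoff = train_scores[int(quantile * (len(train_scores) - 1))]
    return [mixture_pdf(x, weights, means, variances) < cutoff for x in test]


if __name__ == "__main__":
    train = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
    params = ([0.5, 0.5], [0.0, 10.0], [1.0, 1.0])
    print(flag_anomalies(train, [0.0, 5.0, 10.0, 30.0], *params))
```

A point like 5.0 lies between the two clusters, so neither Gaussian explains it well and it gets flagged even though it is inside the overall data range.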

3. Image Segmentation

Divide images into meaningful regions.
Used in:

Medical imaging
Remote sensing
Military applications

4. Density Estimation

Model complex probability distributions for:

Generative modeling
Sampling
Feature understanding
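Because a GMM is a generative model, sampling from it follows the latent-variable story directly: draw a component index $k$ with probability $π_{k}$, then draw a point from that Gaussian. A minimal 1-D sketch using only Python's `random` module; `sample_gmm` and the example parameters are hypothetical.

```python
import math
import random


def sample_gmm(n, weights, means, variances, seed=None):
    """Draw n points from a 1-D GMM by ancestral sampling."""
    rng = random.Random(seed)
    components = range(len(weights))
    samples = []
    for _ in range(n):
        k = rng.choices(components, weights=weights)[0]  # latent z ~ Categorical(pi)
        samples.append(rng.gauss(means[k], math.sqrt(variances[k])))  # x ~ N(mu_k, sigma_k^2)
    return samples


if __name__ == "__main__":
    print(sample_gmm(5, [0.3, 0.7], [0.0, 10.0], [1.0, 1.0], seed=42))
```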

Advantages of GMM

Flexible cluster shapes
Can model ellipsoidal / overlapping clusters (unlike K-Means).
Soft assignment
Assigns probabilities instead of hard labels.
Handles missing data
EM can treat missing feature values as extra latent variables, since any marginal of a Gaussian is still Gaussian.
Interpretable parameters
Each Gaussian has:
- Mean
- Covariance
- Mixing coefficient

All of these are easy to analyze and interpret.
