Dimensionality Reduction
In statistics and machine learning, dimensionality reduction is the process of reducing the number of variables under consideration by obtaining a smaller set of principal variables. Dimensionality reduction can be implemented in two primary ways:
1. Feature Selection:
In feature selection, we aim to identify k features out of a total of n that provide the most meaningful information. The remaining (n-k) dimensions are discarded, as they contribute less to the model's performance.
2. Feature Extraction:
In feature extraction, we generate a new set of k features that are combinations of the original n features. This approach transforms the data into a lower-dimensional space while retaining important information.
Some well-known feature extraction techniques include:
Principal Component Analysis (PCA) – an unsupervised linear projection method.
Linear Discriminant Analysis (LDA) – a supervised linear projection method.
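As an illustration of the two techniques above, both are available in scikit-learn (a sketch assuming scikit-learn is installed; the toy data and labels below are made up):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))      # 100 examples, 5 features (rows = examples here)
y = rng.integers(0, 3, size=100)   # 3 class labels, used only by LDA

# Unsupervised: PCA ignores the labels entirely
X_pca = PCA(n_components=2).fit_transform(X)

# Supervised: LDA uses the labels to find class-discriminative directions
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (100, 2) (100, 2)
```

Note that scikit-learn expects examples as rows, whereas the derivation later in these notes arranges features as rows.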
Usefulness of Dimensionality Reduction:
For most learning algorithms, the computational complexity depends on the number of input dimensions (d) and the size of the dataset (n). Reducing dimensionality helps in lowering both memory and computational costs.
When an input feature is determined to be unnecessary, we reduce the cost of extracting and processing it.
Simpler models tend to be more robust, especially when working with small datasets.
When the data can be described with fewer features, we gain better insight into the underlying processes and patterns, enabling knowledge extraction.
If data can be represented in fewer dimensions without significant loss of information, it can be plotted and visually analyzed to detect patterns, structures, and outliers.
Principal Component Analysis
The PCA (Principal Component Analysis) algorithm relies on concepts from linear algebra and statistics, and proceeds in the following steps.
Step 1: Data Representation
We consider a dataset with n features, denoted $x_1, x_2, \ldots, x_n$, observed over N examples. Let $x_{ij}$ denote the value of feature $x_i$ in the j-th example. The data can then be arranged as an $n \times N$ table:

| Features | Example 1 | Example 2 | ... | Example N |
|---|---|---|---|---|
| $x_1$ | $x_{11}$ | $x_{12}$ | ... | $x_{1N}$ |
| $x_2$ | $x_{21}$ | $x_{22}$ | ... | $x_{2N}$ |
| ... | ... | ... | ... | ... |
| $x_n$ | $x_{n1}$ | $x_{n2}$ | ... | $x_{nN}$ |
Step 2: Compute the Mean of Each Feature
For each feature $x_i$, compute its mean over the N examples:
$$\bar{x}_i = \frac{1}{N} \sum_{j=1}^{N} x_{ij}$$
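The per-feature means can be computed directly with NumPy (a minimal sketch; the toy data values below are made up):

```python
import numpy as np

# Toy dataset: n = 2 features (rows), N = 4 examples (columns)
X = np.array([[2.5, 0.5, 2.2, 1.9],
              [2.4, 0.7, 2.9, 2.2]])

# Mean of each feature, taken across the examples (axis=1)
means = X.mean(axis=1)
# means[0] = (2.5 + 0.5 + 2.2 + 1.9) / 4 = 1.775
# means[1] = (2.4 + 0.7 + 2.9 + 2.2) / 4 = 2.05
```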
Step 3: Compute the Covariance Matrix
Consider the variables $x_i$ and $x_j$ (i and j need not be different). The covariance of the ordered pair $(x_i, x_j)$ is defined as
$$\operatorname{Cov}(x_i, x_j) = \frac{1}{N-1} \sum_{k=1}^{N} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)$$
We compute the following matrix S, called the covariance matrix of the data; the element in the i-th row and j-th column is the covariance $\operatorname{Cov}(x_i, x_j)$:
$$S = \begin{bmatrix} \operatorname{Cov}(x_1, x_1) & \cdots & \operatorname{Cov}(x_1, x_n) \\ \vdots & \ddots & \vdots \\ \operatorname{Cov}(x_n, x_1) & \cdots & \operatorname{Cov}(x_n, x_n) \end{bmatrix}$$
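NumPy computes this matrix with `np.cov`, which treats each row as a variable and divides by N-1 by default (a sketch continuing the toy data above):

```python
import numpy as np

# Rows are features, columns are examples (made-up toy values)
X = np.array([[2.5, 0.5, 2.2, 1.9],
              [2.4, 0.7, 2.9, 2.2]])

# Covariance matrix: np.cov uses rows as variables and the 1/(N-1) divisor
S = np.cov(X)

# Manual check of one entry against the definition: Cov(x1, x2)
x1, x2 = X[0], X[1]
c12 = ((x1 - x1.mean()) * (x2 - x2.mean())).sum() / (X.shape[1] - 1)
# S[0, 1] matches c12, and S is symmetric
```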
Step 4: Compute Eigenvalues and Eigenvectors
To obtain the principal components, we solve for the eigenvalues and eigenvectors of the covariance matrix S. Set the characteristic determinant to zero:
$$\det(S - \lambda I) = 0$$
This is a polynomial equation of degree n in $\lambda$. Because S is a real symmetric matrix, it has n real roots, and those roots are the eigenvalues of S; we find the n roots $\lambda_1, \lambda_2, \ldots, \lambda_n$.
If $\lambda_i$ is an eigenvalue, then the corresponding eigenvector is a vector $U_i$ satisfying
$$(S - \lambda_i I)U_i = 0$$
This is a system of n homogeneous linear equations, and it always has a nontrivial solution.
We next find a set of n orthogonal eigenvectors $U_1, U_2, \ldots, U_n$ such that $U_i$ is the eigenvector corresponding to $\lambda_i$.
We then normalize the eigenvectors. Given any vector
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$$
we normalize it by dividing X by its length $\|X\| = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$. Given the eigenvector $U_i$, the corresponding normalized eigenvector is computed as
$$e_i = \frac{U_i}{\|U_i\|}$$
In this way we compute the n normalized eigenvectors $e_1, e_2, \ldots, e_n$.
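In NumPy, `np.linalg.eigh` handles this step for symmetric matrices, and the eigenvectors it returns are already normalized to unit length (a sketch continuing the toy data above):

```python
import numpy as np

# Covariance matrix of the made-up toy data used earlier
S = np.cov(np.array([[2.5, 0.5, 2.2, 1.9],
                     [2.4, 0.7, 2.9, 2.2]]))

# eigh is specialized for symmetric matrices; eigenvalues come back in
# ascending order, and the eigenvectors are the COLUMNS of the result
eigenvalues, eigenvectors = np.linalg.eigh(S)

# Each column is already a unit vector, so no extra normalization is needed
lengths = np.linalg.norm(eigenvectors, axis=0)

# Verify the defining equation S u = lambda u for the first pair
u, lam = eigenvectors[:, 0], eigenvalues[0]
# S @ u equals lam * u (up to floating-point tolerance)
```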
Step 5: Derive the New Data
Order the eigenvalues from highest to lowest. The unit eigenvector corresponding to the largest eigenvalue is the first principal component, the unit eigenvector corresponding to the next-highest eigenvalue is the second principal component, and so on.
Let the eigenvalues in descending order be $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, and let the corresponding unit eigenvectors be $e_1, e_2, \ldots, e_n$. Choose a positive integer $p < n$ so that the first p eigenvalues capture most of the total variance.
Take the eigenvectors corresponding to the eigenvalues $\lambda_1, \ldots, \lambda_p$ and form the matrix whose columns are these unit eigenvectors:
$$W = \begin{bmatrix} e_1 & e_2 & \cdots & e_p \end{bmatrix}$$
We also form the mean-adjusted data matrix $\bar{X}$ by subtracting each feature's mean from the corresponding row of the data table. Next, compute the matrix
$$X_{\text{new}} = W^T \bar{X}$$
Step 6: The New Dataset
The matrix $X_{\text{new}}$ is the new dataset. Each of its rows gives the values of one of the new features; since it has only p rows, the new dataset has only p features.
This is how PCA achieves dimensionality reduction of the dataset.
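Putting all six steps together, here is a minimal from-scratch PCA sketch in NumPy, following the conventions of these notes (features as rows, examples as columns; the random data is made up):

```python
import numpy as np

def pca(X, p):
    """Reduce an (n_features x N_examples) matrix X to p features."""
    means = X.mean(axis=1, keepdims=True)          # Step 2: feature means
    X_adj = X - means                              # mean-adjusted data
    S = np.cov(X)                                  # Step 3: covariance matrix
    eigenvalues, eigenvectors = np.linalg.eigh(S)  # Step 4: ascending order
    order = np.argsort(eigenvalues)[::-1]          # Step 5: sort descending
    W = eigenvectors[:, order[:p]]                 # columns = top-p unit eigenvectors
    return W.T @ X_adj                             # Step 6: new p x N dataset

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 100))   # 5 features, 100 examples
X_new = pca(X, 2)
print(X_new.shape)  # (2, 100)
```

Each row of `X_new` is one of the p new features, exactly as described in Step 6.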
