Created: May 31, 2025

Dimensionality Reduction

In statistics and machine learning, dimensionality reduction is the process of reducing the number of variables under consideration by obtaining a smaller set of principal variables. Dimensionality reduction can be implemented in two primary ways:


1. Feature Selection:

In feature selection, we aim to identify k features out of a total of n that provide the most meaningful information. The remaining (n-k) dimensions are discarded, as they contribute less to the model's performance.
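As a concrete illustration of feature selection, the sketch below keeps the k columns of a data matrix with the highest variance. This is a minimal NumPy example, not from the notes; variance is just one possible scoring criterion (correlation with a target or mutual information are common alternatives), and the function name is hypothetical.

```python
import numpy as np

def select_top_k_features(X, k):
    """Keep the k columns of X with the highest variance.

    Variance is only one possible score; it stands in here for any
    per-feature usefulness measure.
    """
    scores = X.var(axis=0)                 # one score per feature (column)
    top_k = np.argsort(scores)[::-1][:k]   # indices of the k highest-scoring features
    kept = np.sort(top_k)                  # keep original column order
    return X[:, kept], kept

# 5 examples, 4 features; the last two columns are (nearly) constant
X = np.array([[1.0, 10.0, 0.5, 7.0],
              [2.0, 20.0, 0.5, 7.1],
              [3.0, 30.0, 0.5, 6.9],
              [4.0, 40.0, 0.5, 7.0],
              [5.0, 50.0, 0.5, 7.0]])

X_reduced, kept = select_top_k_features(X, k=2)
print(kept)            # columns 0 and 1 carry the most variance; the other 2 are discarded
```

The (n − k) discarded columns are simply dropped, so the surviving features keep their original meaning, which is the defining property of feature selection as opposed to feature extraction.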


2. Feature Extraction:

In feature extraction, we generate a new set of k features that are combinations of the original n features. This approach transforms the data into a lower-dimensional space while retaining important information.

Some well-known feature extraction techniques include:

  • Principal Component Analysis (PCA) – an unsupervised linear projection method.

  • Linear Discriminant Analysis (LDA) – a supervised linear projection method.


Usefulness of Dimensionality Reduction:

  • For most learning algorithms, the computational complexity depends on the number of input dimensions (d) and the size of the dataset (n). Reducing dimensionality helps in lowering both memory and computational costs.

  • When an input feature is determined to be unnecessary, we save the cost of extracting and processing it.

  • Simpler models tend to be more robust, especially when working with small datasets.

  • When the data can be described with fewer features, we gain better insight into the underlying processes and patterns, enabling knowledge extraction.

  • If data can be represented in fewer dimensions without significant loss of information, it can be plotted and visually analyzed to detect patterns, structures, and outliers.


Principal Component Analysis

The PCA (principal component analysis) algorithm proceeds in the following steps, which rest on a few concepts from linear algebra and statistics.


Step 1: Data Representation

We consider a dataset having n features, denoted X_1, X_2, ..., X_n, and N examples. Let x_kj be the value of feature X_j in the k-th example, so the data can be laid out as an N × n table:

              X_1     X_2     ...     X_n
  Example 1   x_11    x_12    ...     x_1n
  Example 2   x_21    x_22    ...     x_2n
  ...         ...     ...     ...     ...
  Example N   x_N1    x_N2    ...     x_Nn

✅ Step 2: Compute the Mean of Each Feature

For each feature X_j, compute its mean over the N examples:

μ_j = (1/N) Σ_{k=1}^{N} x_kj,   for j = 1, 2, ..., n.
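With the data stored as an N × n NumPy array (rows are examples, columns are features), the per-feature means are a one-liner. The numbers below are made up for illustration.

```python
import numpy as np

# N = 4 examples, n = 3 features
X = np.array([[2.0, 4.0, 1.0],
              [0.0, 2.0, 3.0],
              [4.0, 6.0, 5.0],
              [2.0, 4.0, 3.0]])

# mu_j = (1/N) * sum_k x_kj : the mean of each feature (column)
mu = X.mean(axis=0)
print(mu)   # [2. 4. 3.]
```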


✅ Step 3: Compute the Covariance Matrix

Consider the features X_i and X_j (i and j need not be different). The covariance of the ordered pair (X_i, X_j) is defined as

Cov(X_i, X_j) = (1/N) Σ_{k=1}^{N} (x_ki − μ_i)(x_kj − μ_j).

We compute the following n × n matrix S, called the covariance matrix of the data; the element in the i-th row, j-th column is the covariance Cov(X_i, X_j):

  S = [ Cov(X_1, X_1)   Cov(X_1, X_2)   ...   Cov(X_1, X_n) ]
      [ Cov(X_2, X_1)   Cov(X_2, X_2)   ...   Cov(X_2, X_n) ]
      [ ...             ...             ...   ...           ]
      [ Cov(X_n, X_1)   Cov(X_n, X_2)   ...   Cov(X_n, X_n) ]

✅ Step 4: Compute Eigenvalues and Eigenvectors

To obtain the principal components, we solve for the eigenvalues and eigenvectors of the covariance matrix S:

  1. Set up the characteristic equation det(S − λI) = 0.

This is a polynomial equation of degree n in λ. Since S is a real symmetric matrix, all n of its roots are real, and those roots λ_1, λ_2, ..., λ_n are the eigenvalues of S.

  2. If λ is an eigenvalue, then the corresponding eigenvector is a nonzero vector U satisfying (S − λI)U = 0.

This is a system of n homogeneous linear equations, and since det(S − λI) = 0, it always has a nontrivial solution. We next find a set of n orthogonal eigenvectors U_1, U_2, ..., U_n such that U_i is the eigenvector corresponding to λ_i.

  3. We now normalize the eigenvectors. Given any vector

X = (x_1, x_2, ..., x_n)^T,

we normalize it by dividing X by its length ||X|| = √(x_1² + x_2² + ... + x_n²). Given an eigenvector U_i, the corresponding normalized (unit) eigenvector is computed as

e_i = U_i / ||U_i||,   for i = 1, 2, ..., n.
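In practice, steps 1–3 are handled by a single library call. `np.linalg.eigh` is designed for symmetric matrices: it returns real eigenvalues (in ascending order) and orthonormal eigenvectors, so the results come back already normalized. The covariance matrix below is illustrative.

```python
import numpy as np

# Covariance matrix of some 2-feature dataset (illustrative values)
S = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh: real eigenvalues in ascending order, orthonormal eigenvectors as columns
eigvals, eigvecs = np.linalg.eigh(S)
print(eigvals)   # [1. 3.]

# Each column e of eigvecs is a unit eigenvector: S @ e = lambda * e
for lam, e in zip(eigvals, eigvecs.T):
    assert np.allclose(S @ e, lam * e)
    assert np.isclose(np.linalg.norm(e), 1.0)
```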


Step 5: Derive the New Data

Order the eigenvalues from highest to lowest. The unit eigenvector corresponding to the largest eigenvalue is the first principal component, the unit eigenvector corresponding to the next highest eigenvalue is the second principal component, and so on.

  1. Let the eigenvalues in descending order be λ_1 ≥ λ_2 ≥ ... ≥ λ_n, and let the corresponding unit eigenvectors be e_1, e_2, ..., e_n.

  2. Choose a positive integer p (p ≤ n), the number of principal components to keep; a common criterion is to pick the smallest p for which the first p eigenvalues account for a large enough fraction of the total variance, (λ_1 + ... + λ_p) / (λ_1 + ... + λ_n).

  3. Choose the unit eigenvectors corresponding to the eigenvalues λ_1, ..., λ_p and form the p × n matrix W whose rows are these eigenvectors:

     W = [ e_1^T ]
         [ e_2^T ]
         [ ...   ]
         [ e_p^T ]

  4. Form the n × N matrix Z whose k-th column is the mean-adjusted k-th example, i.e. the j-th entry of the k-th column is x_kj − μ_j.

  5. Next compute the p × N matrix

     Y = W Z.
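Putting steps 1–5 together, the following sketch runs the whole pipeline on a small 2-feature dataset and projects it onto its first principal component (p = 1). The data values are illustrative, and the 1/N covariance convention from Step 3 is used throughout.

```python
import numpy as np

# N = 10 examples, n = 2 features (illustrative values)
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
              [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
N, n = X.shape

mu = X.mean(axis=0)
Z = (X - mu).T                        # n x N matrix of mean-adjusted examples (Step 5.4)
S = Z @ Z.T / N                       # covariance matrix, 1/N convention (Step 3)

eigvals, eigvecs = np.linalg.eigh(S)  # ascending eigenvalues, unit eigenvectors (Step 4)
order = np.argsort(eigvals)[::-1]     # re-sort into descending order (Step 5.1)

p = 1                                 # keep only the first principal component (Step 5.2)
W = eigvecs[:, order[:p]].T           # p x n matrix whose rows are e_1, ..., e_p (Step 5.3)

Y = W @ Z                             # p x N matrix: the new, reduced dataset (Step 5.5)
print(Y.shape)                        # (1, 10)
```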


Step 6: The New Dataset

The matrix Y is the new dataset: each row of the matrix represents the values of one new feature across the N examples. Since Y has only p rows, the new dataset has only p features.

This is how PCA helps us in dimensionality reduction of the dataset.
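To decide how many components p to keep in Step 5, one can inspect the fraction of total variance captured by the leading eigenvalues; keeping, say, at least 90% is a common rule of thumb (the threshold and the eigenvalues below are illustrative, not from the notes).

```python
import numpy as np

# Hypothetical eigenvalues of a covariance matrix, already in descending order
eigvals = np.array([4.0, 2.5, 1.0, 0.4, 0.1])

# Cumulative fraction of total variance captured by the first p components
explained = np.cumsum(eigvals) / eigvals.sum()
print(explained)   # e.g. keeping p = 2 components retains ~81% of the variance

# Smallest p that retains at least 90% of the total variance
p = int(np.searchsorted(explained, 0.90) + 1)
print(p)           # smallest p with >= 90% of the variance retained
```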