Convolutional Neural Networks (CNNs)
Introduction
CNNs are a specialized type of neural network mainly designed for image processing and classification.
They work by recognizing edges, textures, shapes, and patterns in images.
Digital images are represented as matrices of pixel values (0–255).
Images can be:
Grayscale: Single channel (intensity values).
RGB (Red, Green, Blue): Three channels, each represented by a separate matrix.
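The matrix view above can be made concrete with NumPy arrays; the 28×28 size here is illustrative, not prescribed by the notes:

```python
import numpy as np

# Illustrative example: images as matrices of pixel values (0-255).
gray = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)    # grayscale: one channel
rgb = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)  # RGB: three channels

print(gray.shape)         # (28, 28)      - a single intensity matrix
print(rgb.shape)          # (28, 28, 3)   - one matrix per color channel
print(rgb[..., 0].shape)  # (28, 28)      - the red channel alone is a matrix
```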
Neurons
A neuron is the most basic unit in a neural network.
It applies a linear function (weighted sum + bias), followed by a non-linear activation function.
Mathematical Representation of a Neuron:
z = Σᵢ (wᵢ · xᵢ) + b,  a = f(z)
Where:
xᵢ = input
wᵢ = weight
b = bias
z = weighted sum
f = activation function
Common Activation Functions
ReLU (Rectified Linear Unit) – keeps positive values, converts negative values to 0.
Leaky ReLU – allows a small gradient for negative values (avoids dead neurons).
Sigmoid – outputs between 0 and 1, often used for probabilities.
Tanh (Hyperbolic Tangent) – outputs between -1 and 1.
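A minimal sketch of a single neuron with the four activation functions above; the input, weight, and bias values are made up for illustration:

```python
import numpy as np

def relu(z):
    # keeps positive values, converts negative values to 0
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # small slope alpha for negative inputs avoids "dead" neurons
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):
    # squashes any input into (0, 1), often read as a probability
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # squashes any input into (-1, 1)
    return np.tanh(z)

# A single neuron: weighted sum + bias, then an activation.
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.4, 0.3, -0.2])   # weights
b = 0.1                          # bias
z = np.dot(w, x) + b             # weighted sum: approx. -0.4
print(relu(z))                   # 0.0 (negative input is clipped)
print(leaky_relu(z))             # small negative value survives
```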
Convolutional Layer
The core layer in CNNs, responsible for extracting features.
Uses kernels/filters:
A small matrix (e.g., 3×3 or 5×5).
Slides over the input image, performing element-wise multiplication and summing the results into a single output value.
Produces feature maps that highlight edges, textures, and shapes.
Multiple feature maps can be generated for the same input image by using different kernel values.
Stride: Number of pixels the kernel moves at each step.
Large stride → reduces output size but may miss details.
Small stride → captures more detail but increases computation (risk of overfitting).
Padding: Adding extra rows/columns around the image.
Helps preserve the spatial size after convolution.
Formula: Output size = (W − K + 2P) / S + 1
where W = input width/height, K = kernel size, P = padding, S = stride.
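The sliding-kernel procedure can be sketched directly in NumPy; the 5×5 image and Sobel-like vertical-edge kernel are illustrative. The printed shapes follow the output-size rule (W − K + 2P) / S + 1:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """2D convolution (cross-correlation) with zero padding and stride."""
    if padding > 0:
        image = np.pad(image, padding)           # zero rows/columns on every side
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1                 # output size = (W - K + 2P)/S + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(region * kernel)  # element-wise multiply, then sum
    return out

img = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 image
k = np.array([[1, 0, -1],                        # vertical-edge (Sobel-like) kernel
              [2, 0, -2],
              [1, 0, -1]], dtype=float)

print(conv2d(img, k).shape)                       # (5-3+0)/1 + 1 = 3 -> (3, 3)
print(conv2d(img, k, padding=1).shape)            # (5-3+2)/1 + 1 = 5 -> (5, 5)
print(conv2d(img, k, stride=2, padding=1).shape)  # (5-3+2)/2 + 1 = 3 -> (3, 3)
```

Padding=1 keeps the 5×5 spatial size after convolution, matching the note above; a larger stride shrinks the feature map further.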
Pooling Layer
Reduces the spatial size of the feature maps (down-sampling).
Makes computation faster, reduces memory usage, and helps reduce overfitting.
Types of pooling:
Max Pooling – takes the maximum value from the region.
Average Pooling – takes the average value from the region.
Pooling reduces size while keeping the most informative features.
If pooling would discard important detail, it can be reduced or skipped.
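Both pooling types can be sketched with one small function; the 4×4 feature map values below are made up to make the max/average difference visible:

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Down-sample a 2D feature map with max or average pooling."""
    h, w = feature_map.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = region.max() if mode == "max" else region.mean()
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 1., 2.],
               [7., 2., 9., 8.],
               [4., 1., 3., 0.]])

print(pool2d(fm, mode="max"))  # max of each 2x2 region: 6, 4 / 7, 9
print(pool2d(fm, mode="avg"))  # mean of each 2x2 region: 3.75, 2.25 / 3.5, 5.0
```

With a 2×2 window and stride 2, the 4×4 map shrinks to 2×2, i.e. a 4× reduction in values to process.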
Flattening Layer
Converts the 2D feature maps into a 1D column vector.
This vector becomes input for the fully connected layers.
Fully Connected (FC) Layer
Each neuron is connected to every neuron in the previous layer.
Responsible for combining features into final classification or regression results.
Output Layer
Applies an activation function (Sigmoid for binary, Softmax for multi-class) to generate probabilities.
For classification, it outputs the probability score for each class.
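A minimal sketch of the flatten → fully connected → output chain; the feature-map values, random weights, and three-class setup are illustrative (a trained network would have learned weights):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; outputs sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical: two 2x2 pooled feature maps feeding a 3-class classifier.
feature_maps = np.array([[[6., 4.], [7., 9.]],
                         [[3., 1.], [2., 5.]]])

flat = feature_maps.flatten()        # flattening layer: 2x2x2 -> vector of 8 values
rng = np.random.default_rng(0)
W = rng.normal(size=(3, flat.size))  # FC weights: every output connects to every input
b = np.zeros(3)                      # biases

logits = W @ flat + b                # fully connected layer
probs = softmax(logits)              # output layer: one probability per class

print(flat.shape)    # (8,)
print(probs.shape)   # (3,)
print(probs.sum())   # 1.0 (up to floating-point error)
```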
Applications of CNNs
Image classification (e.g., cats vs. dogs).
Object detection and segmentation.
Autonomous vehicles.
Security camera systems.
Summary of CNN Architecture
Input Layer – Image data (height × width × depth).
Convolutional Layer – Extracts features using kernels/filters.
Activation Layer – Applies non-linearity (ReLU, Tanh, etc.).
Pooling Layer – Reduces dimensions (Max/Average pooling).
Flattening Layer – Converts 2D maps into a 1D vector.
Fully Connected Layer – Learns complex patterns for classification.
Output Layer – Produces class probabilities.
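The seven layers above can be chained into one toy forward pass; all sizes and weights here are illustrative and random (a real CNN learns them via backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, k, stride=1):
    kh, kw = k.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    return np.array([[np.sum(img[i*stride:i*stride+kh, j*stride:j*stride+kw] * k)
                      for j in range(ow)] for i in range(oh)])

def max_pool(fm, size=2):
    oh, ow = fm.shape[0] // size, fm.shape[1] // size
    return np.array([[fm[i*size:(i+1)*size, j*size:(j+1)*size].max()
                      for j in range(ow)] for i in range(oh)])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

image = rng.random((8, 8))                    # 1. input layer: 8x8 grayscale image
kernel = rng.normal(size=(3, 3))              # 2. convolutional layer: one 3x3 filter
fmap = np.maximum(0, conv2d(image, kernel))   # 3. activation layer: ReLU -> 6x6
pooled = max_pool(fmap)                       # 4. pooling layer: 6x6 -> 3x3
flat = pooled.flatten()                       # 5. flattening layer: 3x3 -> 9-vector
W, b = rng.normal(size=(2, 9)), np.zeros(2)   # 6. fully connected layer (2 classes)
probs = softmax(W @ flat + b)                 # 7. output layer: class probabilities

print(probs.shape)  # (2,) - e.g. P(cat), P(dog)
```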
