Convolutional Neural Networks (CNNs)
Introduction
CNNs are a specialized type of neural network mainly designed for image processing and classification.
They work by recognizing edges, textures, shapes, and patterns in images.
Digital images are represented as matrices of pixel values (0–255).
Images can be:
Grayscale: Single channel (intensity values).
RGB (Red, Green, Blue): Three channels, each represented by a separate matrix.
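The matrix view above can be made concrete with NumPy arrays; the 28×28 size here is illustrative, not prescribed by the notes:

```python
import numpy as np

# Illustrative example: images as matrices of pixel values (0-255).
gray = np.random.randint(0, 256, size=(28, 28), dtype=np.uint8)    # grayscale: one channel
rgb = np.random.randint(0, 256, size=(28, 28, 3), dtype=np.uint8)  # RGB: three channels

print(gray.shape)         # (28, 28)      - a single intensity matrix
print(rgb.shape)          # (28, 28, 3)   - one matrix per color channel
print(rgb[..., 0].shape)  # (28, 28)      - the red channel alone is a matrix
```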
Neurons
A neuron is the most basic unit in a neural network.
It applies a linear function (weighted sum + bias), followed by a non-linear activation function.
Mathematical Representation of a Neuron:
z = Σᵢ (wᵢ · xᵢ) + b,  a = f(z)
Where:
xᵢ = input
wᵢ = weight
b = bias
z = weighted sum
f = activation function
Common Activation Functions
ReLU (Rectified Linear Unit) – keeps positive values, converts negative values to 0.
Leaky ReLU – allows a small gradient for negative values (avoids dead neurons).
Sigmoid – outputs between 0 and 1, often used for probabilities.
Tanh (Hyperbolic Tangent) – outputs between -1 and 1.
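A minimal sketch of a single neuron with the four activation functions above; the input, weight, and bias values are made up for illustration:

```python
import numpy as np

def relu(z):
    # keeps positive values, converts negative values to 0
    return np.maximum(0, z)

def leaky_relu(z, alpha=0.01):
    # small slope alpha for negative inputs avoids "dead" neurons
    return np.where(z > 0, z, alpha * z)

def sigmoid(z):
    # squashes any input into (0, 1), often read as a probability
    return 1 / (1 + np.exp(-z))

def tanh(z):
    # squashes any input into (-1, 1)
    return np.tanh(z)

# A single neuron: weighted sum + bias, then an activation.
x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.4, 0.3, -0.2])   # weights
b = 0.1                          # bias
z = np.dot(w, x) + b             # weighted sum: approx. -0.4
print(relu(z))                   # 0.0 (negative input is clipped)
print(leaky_relu(z))             # small negative value survives
```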
Convolutional Layer
The core layer in CNNs, responsible for extracting features.
Uses kernels/filters:
A small matrix (e.g., 3×3 or 5×5).
Slides over the input image, performing element-wise multiplication and summing the results into a single output value.
Produces feature maps that highlight edges, textures, and shapes.
Multiple feature maps can be generated for the same input image by using different kernel values.
Stride: Number of pixels the kernel moves at each step.
Large stride → reduces output size but may miss details.
Small stride → captures more detail but increases computation (risk of overfitting).
Padding: Adding extra rows/columns around the image.
Helps preserve the spatial size after convolution.
Formula: Output size = (W − K + 2P) / S + 1
where W = input width/height, K = kernel size, P = padding, S = stride.
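The sliding-kernel procedure can be sketched directly in NumPy; the 5×5 image and Sobel-like vertical-edge kernel are illustrative. The printed shapes follow the output-size rule (W − K + 2P) / S + 1:

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """2D convolution (cross-correlation) with zero padding and stride."""
    if padding > 0:
        image = np.pad(image, padding)           # zero rows/columns on every side
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1                 # output size = (W - K + 2P)/S + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(region * kernel)  # element-wise multiply, then sum
    return out

img = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 image
k = np.array([[1, 0, -1],                        # vertical-edge (Sobel-like) kernel
              [2, 0, -2],
              [1, 0, -1]], dtype=float)

print(conv2d(img, k).shape)                       # (5-3+0)/1 + 1 = 3 -> (3, 3)
print(conv2d(img, k, padding=1).shape)            # (5-3+2)/1 + 1 = 5 -> (5, 5)
print(conv2d(img, k, stride=2, padding=1).shape)  # (5-3+2)/2 + 1 = 3 -> (3, 3)
```

Padding=1 keeps the 5×5 spatial size after convolution, matching the note above; a larger stride shrinks the feature map further.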
Pooling Layer
Reduces the spatial size of the feature maps (down-sampling).
Makes computation faster, reduces memory usage, and helps reduce overfitting.
Types of pooling:
Max Pooling – takes the maximum value from the region.
Average Pooling – takes the average value from the region.
Pooling reduces size while keeping the most informative features.
If pooling would discard important detail, it can be reduced or skipped.
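Both pooling types can be sketched with one small function; the 4×4 feature map values below are made up to make the max/average difference visible:

```python
import numpy as np

def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Down-sample a 2D feature map with max or average pooling."""
    h, w = feature_map.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            region = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = region.max() if mode == "max" else region.mean()
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 6., 1., 2.],
               [7., 2., 9., 8.],
               [4., 1., 3., 0.]])

print(pool2d(fm, mode="max"))  # max of each 2x2 region: 6, 4 / 7, 9
print(pool2d(fm, mode="avg"))  # mean of each 2x2 region: 3.75, 2.25 / 3.5, 5.0
```

With a 2×2 window and stride 2, the 4×4 map shrinks to 2×2, i.e. a 4× reduction in values to process.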
Flattening Layer
Converts the 2D feature maps into a 1D column vector.
This vector becomes input for the fully connected layers.
Fully Connected (FC) Layer
Each neuron is connected to every neuron in the previous layer.
Responsible for combining features into final classification or regression results.
Output Layer
Applies an activation function (Sigmoid for binary, Softmax for multi-class) to generate probabilities.
For classification, it outputs the probability score for each class.
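A minimal sketch of the flatten → fully connected → output chain; the feature-map values, random weights, and three-class setup are illustrative (a trained network would have learned weights):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; outputs sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

# Hypothetical: two 2x2 pooled feature maps feeding a 3-class classifier.
feature_maps = np.array([[[6., 4.], [7., 9.]],
                         [[3., 1.], [2., 5.]]])

flat = feature_maps.flatten()        # flattening layer: 2x2x2 -> vector of 8 values
rng = np.random.default_rng(0)
W = rng.normal(size=(3, flat.size))  # FC weights: every output connects to every input
b = np.zeros(3)                      # biases

logits = W @ flat + b                # fully connected layer
probs = softmax(logits)              # output layer: one probability per class

print(flat.shape)    # (8,)
print(probs.shape)   # (3,)
print(probs.sum())   # 1.0 (up to floating-point error)
```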
Applications of CNNs
Image classification (e.g., cats vs. dogs).
Object detection and segmentation.
Autonomous vehicles.
Security camera systems.
Summary of CNN Architecture
Input Layer – Image data (height × width × depth).
Convolutional Layer – Extracts features using kernels/filters.
Activation Layer – Applies non-linearity (ReLU, Tanh, etc.).
Pooling Layer – Reduces dimensions (Max/Average pooling).
Flattening Layer – Converts 2D maps into a 1D vector.
Fully Connected Layer – Learns complex patterns for classification.
Output Layer – Produces class probabilities.
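The seven layers above can be chained into one toy forward pass; all sizes and weights here are illustrative and random (a real CNN learns them via backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(img, k, stride=1):
    kh, kw = k.shape
    oh = (img.shape[0] - kh) // stride + 1
    ow = (img.shape[1] - kw) // stride + 1
    return np.array([[np.sum(img[i*stride:i*stride+kh, j*stride:j*stride+kw] * k)
                      for j in range(ow)] for i in range(oh)])

def max_pool(fm, size=2):
    oh, ow = fm.shape[0] // size, fm.shape[1] // size
    return np.array([[fm[i*size:(i+1)*size, j*size:(j+1)*size].max()
                      for j in range(ow)] for i in range(oh)])

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

image = rng.random((8, 8))                    # 1. input layer: 8x8 grayscale image
kernel = rng.normal(size=(3, 3))              # 2. convolutional layer: one 3x3 filter
fmap = np.maximum(0, conv2d(image, kernel))   # 3. activation layer: ReLU -> 6x6
pooled = max_pool(fmap)                       # 4. pooling layer: 6x6 -> 3x3
flat = pooled.flatten()                       # 5. flattening layer: 3x3 -> 9-vector
W, b = rng.normal(size=(2, 9)), np.zeros(2)   # 6. fully connected layer (2 classes)
probs = softmax(W @ flat + b)                 # 7. output layer: class probabilities

print(probs.shape)  # (2,) - e.g. P(cat), P(dog)
```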
