Created: Sep 5, 2025

Convolutional Neural Networks (CNNs)

Introduction

  • CNNs are a specialized type of neural network mainly designed for image processing and classification.

  • They work by recognizing edges, textures, shapes, and patterns in images.

  • Digital images are represented as matrices of pixel values (0–255).

  • Images can be:

    • Grayscale: Single channel (intensity values).

    • RGB (Red, Green, Blue): Three channels, each represented by a separate matrix.


Neurons

  • A neuron is the most basic unit in a neural network.

  • It applies a linear function (weighted sum + bias), followed by a non-linear activation function.

Mathematical Representation of a Neuron:

z = w₁x₁ + w₂x₂ + … + wₙxₙ + b,   a = f(z)

Where:

  • xᵢ = input

  • wᵢ = weight

  • b = bias

  • z = weighted sum

  • f = activation function
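The weighted-sum-plus-activation step can be sketched in a few lines of Python (NumPy assumed; the weights and inputs are illustrative values, not learned ones):

```python
import numpy as np

def neuron(x, w, b, activation):
    """One artificial neuron: weighted sum + bias, then a non-linearity."""
    z = np.dot(w, x) + b      # z = w1*x1 + ... + wn*xn + b
    return activation(z)      # a = f(z)

def relu(z):
    return max(z, 0.0)

# Example: two inputs with hand-picked (illustrative) weights
a = neuron(x=np.array([1.0, 2.0]),
           w=np.array([0.5, -0.25]),
           b=0.1,
           activation=relu)
# z = 0.5*1.0 + (-0.25)*2.0 + 0.1 = 0.1, so a = relu(0.1) = 0.1
```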


Common Activation Functions

  • ReLU (Rectified Linear Unit) – keeps positive values, converts negative values to 0.

  • Leaky ReLU – allows a small gradient for negative values (avoids dead neurons).

  • Sigmoid – outputs between 0 and 1, often used for probabilities.

  • Tanh (Hyperbolic Tangent) – outputs between -1 and 1.
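The four activation functions above are each a one-liner; a minimal NumPy sketch:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)           # negatives become 0

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)  # small slope for negatives

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # squashes into (0, 1)

def tanh(z):
    return np.tanh(z)                   # squashes into (-1, 1)

z = np.array([-2.0, 0.0, 3.0])
relu(z)        # -> [0.0, 0.0, 3.0]
leaky_relu(z)  # -> [-0.02, 0.0, 3.0]
```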


Convolutional Layer

  • The core layer in CNNs, responsible for extracting features.

  • Uses kernels/filters:

    • A small matrix (e.g., 3×3 or 5×5).

    • Slides over the input image, performing element-wise multiplication followed by a sum (a dot product) at each position.

    • Produces feature maps that highlight edges, textures, and shapes.

  • Multiple feature maps can be generated for the same input image by using different kernel values.

  • Stride: Number of pixels the kernel moves at each step.

    • Large stride → reduces output size but may miss details.

    • Small stride → captures more detail but increases computation and produces larger feature maps (more parameters downstream, with a higher risk of overfitting).

  • Padding: Adding extra rows/columns around the image.

    • Helps preserve the spatial size after convolution.

    • Formula: Output size = (W − K + 2P) / S + 1, where W = input size, K = kernel size, P = padding, S = stride (take the floor if the division is not exact).
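The sliding-kernel operation and the output-size formula can be sketched together (a minimal, loop-based NumPy version; the image and kernel values are illustrative):

```python
import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    """Minimal 2D convolution (cross-correlation, as used in CNNs)."""
    if padding:
        image = np.pad(image, padding)          # zero-pad all sides
    H, W = image.shape
    k = kernel.shape[0]
    out_h = (H - k) // stride + 1               # matches (W - K + 2P)/S + 1
    out_w = (W - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * kernel)  # element-wise multiply, then sum
    return out

# 5x5 image, 3x3 edge-detecting (Laplacian-style) kernel
img = np.arange(25, dtype=float).reshape(5, 5)
kern = np.array([[0, 1, 0],
                 [1, -4, 1],
                 [0, 1, 0]], dtype=float)
fmap = conv2d(img, kern, stride=1, padding=1)   # padding=1 preserves 5x5
```

With no padding the same call yields a 3×3 feature map, exactly as the formula predicts: (5 − 3 + 0)/1 + 1 = 3.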


Pooling Layer

  • Reduces the spatial size of the feature maps (down-sampling).

  • Makes computation faster, reduces memory usage, and helps control overfitting.

  • Types of pooling:

    • Max Pooling – takes the maximum value from the region.

    • Average Pooling – takes the average value from the region.

  • Pooling aims to shrink the feature maps while keeping the most important features.

  • If the task depends on fine spatial detail that pooling would discard, pool less aggressively or skip it.
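Max pooling can be sketched with the same sliding-window pattern as convolution (illustrative values):

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Max pooling: keep only the largest value in each window."""
    H, W = fmap.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()
    return out

fmap = np.array([[1., 3., 2., 4.],
                 [5., 6., 7., 8.],
                 [9., 2., 1., 0.],
                 [3., 4., 5., 6.]])
pooled = max_pool(fmap)   # 4x4 -> 2x2: [[6, 8], [9, 6]]
```

Swapping `window.max()` for `window.mean()` turns this into average pooling.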


Flattening Layer

  • Converts the 2D feature maps into a 1D column vector.

  • This vector becomes input for the fully connected layers.
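In NumPy terms, flattening is a single reshape (the feature-map values here are illustrative):

```python
import numpy as np

# A hypothetical stack of two 2x2 feature maps (channels x height x width)
fmaps = np.array([[[1., 2.], [3., 4.]],
                  [[5., 6.], [7., 8.]]])
vector = fmaps.reshape(-1)   # 1D vector of length 2*2*2 = 8
```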


Fully Connected (FC) Layer

  • Each neuron is connected to every neuron in the previous layer.

  • Responsible for combining features into final classification or regression results.
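Because every output neuron sees every input, a fully connected layer is one matrix multiply plus a bias (random weights here, purely illustrative; a real network would learn them):

```python
import numpy as np

rng = np.random.default_rng(0)

def fully_connected(x, W, b):
    """Every output neuron is connected to every input: W @ x + b."""
    return W @ x + b

x = rng.standard_normal(8)         # flattened feature vector (illustrative size)
W = rng.standard_normal((10, 8))   # 10 output neurons, each wired to all 8 inputs
b = np.zeros(10)
logits = fully_connected(x, W, b)  # raw scores, one per output neuron
```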


Output Layer

  • Applies an activation function (Sigmoid for binary classification, Softmax for multi-class) to turn raw scores into probabilities.

  • For classification, it outputs the probability score for each class.
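Softmax turns a vector of raw scores into class probabilities; a minimal, numerically stable sketch:

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    e = np.exp(logits - logits.max())   # subtract max for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
# probs sums to 1, and the largest logit gets the highest probability
```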


Applications of CNNs

  • Image classification (e.g., cats vs. dogs).

  • Object detection and segmentation.

  • Autonomous vehicles.

  • Security camera systems.


Summary of CNN Architecture

  1. Input Layer – Image data (height × width × depth).

  2. Convolutional Layer – Extracts features using kernels/filters.

  3. Activation Layer – Applies non-linearity (ReLU, Tanh, etc.).

  4. Pooling Layer – Reduces dimensions (Max/Average pooling).

  5. Flattening Layer – Converts 2D maps into a 1D vector.

  6. Fully Connected Layer – Learns complex patterns for classification.

  7. Output Layer – Produces class probabilities.
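The seven steps above can be strung together into one toy forward pass (NumPy only, random weights, purely illustrative; the layer sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

def conv2d(img, kern):                        # 2. convolution (valid, stride 1)
    k = kern.shape[0]
    H, W = img.shape
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+k, j:j+k] * kern)
    return out

def relu(z):                                  # 3. activation
    return np.maximum(0.0, z)

def max_pool(f, s=2):                         # 4. pooling (non-overlapping windows)
    H, W = f.shape
    return f[:H - H % s, :W - W % s].reshape(H // s, s, W // s, s).max(axis=(1, 3))

def softmax(z):                               # 7. output probabilities
    e = np.exp(z - z.max())
    return e / e.sum()

image = rng.random((8, 8))                    # 1. input: 8x8 grayscale image
kernel = rng.standard_normal((3, 3))
fmap = max_pool(relu(conv2d(image, kernel)))  # 6x6 feature map -> 3x3
vec = fmap.reshape(-1)                        # 5. flatten: 9 values
W = rng.standard_normal((4, vec.size))        # 6. fully connected: 4 classes
probs = softmax(W @ vec)                      # class probabilities
```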